Saturday, June 22, 2024

Public Data Collection Is Advancing, But Still Far From Its Full Potential

The web scraping industry is maturing from both a technology and a business perspective; however, it still lacks proper regulation. For this reason, key market players are launching the Ethical Web Data Collection Initiative (EWDCI) to share best practices and advocate for common principles. These were some of the main takeaways from this year’s edition of the prominent industry conference, OxyCon.


Organized by Oxylabs, a leading provider of public web data gathering solutions, OxyCon brought together global web scraping experts for a two-day online event. The speakers covered topics ranging from practical tips for engineers to high-level panel discussions, reviewing the most recent developments in the field.

Allen O’Neill, CEO and CTO at The DataWorks, argued that while the web scraping industry has developed rapidly over the years, a great deal of its potential remains untapped:

“The web scraping industry hasn’t even scratched the surface with its potential yet. There will be many new unicorns in the industry in the upcoming ten years – those who will be able to harness the power of information extraction (not data extraction, but information extraction) and use that to gain insights that have never been seen before,” said Allen.

The industry’s fast growth was reflected in scaling being the hottest topic at OxyCon. Karsten Madsen, CEO of the SEO company Morningscore, shared how his team moved from handling small data requests to competing with SEO industry giants. According to him, success is not always about having the most data or the smartest data; it is about having smarter algorithms to manage it.

Glen De Cauwsemaecker, Lead Crawler Engineer at OTA Insight, had another tip for fast-growing data companies looking to scale their operations: “Be pragmatic and look for cost-reward balance,” he recommended.

Beyond the technical challenges of scaling, legal issues are often near the top of the list of concerns. Participants in the panel discussion “Lawyers discuss scraping” emphasized the ambiguity and the many unclear areas that stem from the lack of proper industry regulation. As a result, the industry itself must be proactive in safeguarding itself from within and in sharing best practices.

In this light, Christian Dawson, Executive Director at I2Coalition, announced a new web scraping industry initiative. I2Coalition, together with five public data aggregators (Oxylabs, Zyte, Smartproxy, Coresignal, and Sprious), has launched the Ethical Web Data Collection Initiative (EWDCI). The group aims to promote the industry’s best practices and advocate for beneficial technical standards.

Hernaldo Turrillo
Hernaldo Turrillo is a writer and author specialised in innovation, AI, DLT, SMEs, trading, investing and new trends in technology and business. He has been working for ztudium group since 2017. He is the editor of openbusinesscouncil.org, tradersdna.com and hedgethink.com, and writes regularly for intelligenthq.com and socialmediacouncil.eu. Hernaldo was born in Spain and, after a few years of personal growth, settled in London, United Kingdom. He earned his bachelor’s degree in Journalism at the University of Seville, Spain, and began working as a reporter for the newspaper Europa Sur, writing about politics and society. He also worked as a community manager and marketing advisor in Los Barrios, Spain. Innovation, technology, politics and economy are his main interests, with a special focus on new trends and ethical projects. He enjoys getting lost in words, explaining what he understands about the world and helping others. Besides being a journalist, he is also a thinker and is proactive in digital transformation strategies. Knowledge and ideas have no limits.
