An interview with Oxylabs' pro bono "Project 4β" manager Gabija Birgile, on how web intelligence collection solutions drive academic and nonprofit initiatives forward
Kindly brief us about Oxylabs, the "Project 4β" program, and its primary objectives.
GB: Oxylabs is a market-leading web intelligence platform and premium proxy provider, enabling companies of all sizes to utilize the power of big data. In simple terms, we offer tools and solutions for businesses looking to extract publicly available data on a large scale. Our services have become essential in the corporate world, but we also see the impact they can have in supporting initiatives for social good.
Recognizing the growing interest from the academic community and nonprofit organizations, we launched "Project 4β" in 2022. This pro bono initiative provides free access to Oxylabs' expertise, advanced web scraping solutions, and the world's largest ethical proxy pool to academics, researchers, and organizations dedicated to the public good.
Through "Project 4β," we aim to empower these groups to tackle critical research questions and pursue impactful missions, ultimately driving positive change across various fields.
That sounds like a significant initiative. Who are some of your key partners, and what kind of work do they do?
GB: We are proud to partner with notable organizations, such as the Pulitzer Center, Bellingcat, Global Witness, Confirmado, Debunk.org, and the #SRNT Project, along with students and researchers from various universities, including Stanford University, the University of Pennsylvania, London's Global University (UCL), Northwestern University, the University of Michigan, and the University of Edinburgh, to name a few.
These partners are engaged in a wide range of missions, from investigative journalism and verifying misinformation to environmental, human rights, and social research.
For example, the Pulitzer Center leads in groundbreaking and impactful reporting, providing journalists worldwide with access to cutting-edge methods and tools. Bellingcat uses open-source data to conduct investigative journalism, while Global Witness focuses on exposing ecological crimes and human rights abuses. Debunk.org is an organization dedicated to countering online disinformation and state-sponsored internet propaganda, and the #SRNT Project aims to build the EU's largest database of social assistance organizations, addressing critical needs like domestic violence and mental health services.
What are some of these partners' main challenges in their research efforts?
GB: One of the primary challenges is dealing with the immense volume and complexity of data available on the web. Researchers often need to sift through vast amounts of information to find relevant data points. Additionally, many websites have anti-bot measures that can hinder automated data collection, so there are technical barriers as well, such as CAPTCHAs, blocked IPs, and dynamic web designs. On top of that, researchers have to handle data in various formats and ensure ethical and legal compliance in data collection.
These challenges can be significant obstacles for many organizations, especially those with limited technical resources.
How do web intelligence collection solutions help overcome these challenges for organizations working toward the public good?
GB: Web intelligence collection solutions are essential tools for overcoming these challenges. Advanced web scraping and proxy technologies enable efficient and effective data collection from various online sources, even from sites with robust anti-bot measures. These tools help automate the process of gathering publicly available data, enabling researchers and organizations to create new datasets and making it easier to gather valuable insights for their research.
For our partners, these solutions are essential for speeding up the fact-checking process, supporting investigations, uncovering hidden patterns, tracking developments in real time, and conducting large-scale studies on economic, political, and social trends.
In essence, web intelligence tools amplify the impact of these organizations by equipping them with the data needed to drive meaningful change. Without access to robust public web data collection solutions, their research capabilities would be significantly limited.
Can you provide an example of how a partner has benefited from "Project 4β" support?
GB: Absolutely. Once a year, we roll up our sleeves and take on a bigger, more complicated challenge, where we allocate more company resources. Last year, in collaboration with the Environmental Protection Department of Lithuania, the Oxylabs team created a dedicated "Ads-Sites Web Crawler."
This solution now helps the department's specialists save time by automatically collecting online ads that offer potentially illegal products and services, such as trade in animals and plants of protected or invasive species, prohibited hunting and fishing equipment, processing and removal of waste and sewage without the necessary permits, dismantling of unserviceable vehicles, and the use and sale of fossil resources without the right to do so.
The department can now be more proactive in ensuring that citizens and legal persons comply with legislation regulating environmental protection and the use of natural resources.
This year, we continue our pro bono partnership with the Communications Regulatory Authority of Lithuania (CRA). A few years ago, Oxylabs created a unique AI-powered web scraping solution that scans Lithuanian IP address spaces and searches for potentially illegal content online related to child abuse and p--nography. It would be physically impossible to monitor all sites on the web manually, and automation makes this task much easier.
However, AI and ML tools have advanced significantly during the last few years, and we're now working to improve the solution, especially its accuracy. This project is truly one of our most impactful, as it helps to protect the most vulnerable group in our society: our children.
What is the future of "Project 4β" and its role in supporting research and the public good?
GB: Looking ahead, we aim to expand our support to more researchers and organizations globally. We're always open to new partnerships and invite researchers, academics, NGOs, nonprofit organizations, and other public initiatives to join "Project 4β." We want to further enhance the capabilities of our partners by providing them with our advanced web intelligence tools and expertise, and by being a sidekick in their efforts to tackle increasingly complex research questions and societal missions.
We envision including more diverse organizations and ensuring that critical fields like human rights, environmental protection, public health, and social justice benefit from our technology. Ultimately, our goal is to drive meaningful change and contribute to a more informed, equitable, and sustainable world.