Reddit reportedly blocking data scraping from Google and other search crawlers

Recent reports claim that Reddit, the news aggregator and community site, is planning to block AI startups from scraping data from its website. If the company does so, search crawlers such as those used by Google and Bing could be affected.

The claims originate from a Washington Post report stating that Reddit could remove the ability to log in to the site using Google credentials and could prevent the tech giant’s crawlers from scraping the site. The Post cited Reddit’s recent difficulties in reaching an agreement with AI companies, such as Google, to pay for the data they obtain from the site.

Reddit later denied the report, but only in part: it explicitly rejected the claim about Google login and did not address the claim about blocking crawlers, which leaves that part open to interpretation.

What’s going on with data scraping?

Recently, the way AI startups train their chatbots has become a point of controversy for platforms such as Reddit and X. Several of these sites have responded by restricting scraping through API blocks and limits. X owner Elon Musk criticized AI startups for scraping data from his platform and cited the practice as the reason for recent API changes he implemented on the site.

Reddit had a similar problem a few months ago, which led the company to follow X’s lead by restricting API access, a move that caused a great deal of controversy and prompted many subreddits to go dark in protest. However, the problem now appears to be search bots, which continue to crawl the site for free.

AI startups have traditionally relied on publicly available web data to train their chatbots and other AI models. This allows them to avoid the expensive and time-consuming process of creating their own datasets. However, news organizations and other content creators are increasingly expressing frustration with the practice, arguing that AI startups profit from their work without paying for it.

However, blocking search engine bots from accessing its website would mean that Reddit content would no longer appear in Google and Bing search results. This would be a significant setback for Reddit, as search engines are a major source of traffic for the website.
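For readers curious how such a block usually works in practice, a common mechanism is a robots.txt file that tells named crawlers which paths they may fetch. The sketch below uses Python's standard urllib.robotparser to show the effect. The rules, the example.com URL, and the helper name is_allowed are all hypothetical and chosen for illustration; nothing in the reports says Reddit would use these exact rules or this mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; this is NOT Reddit's actual robots.txt.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: bingbot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.example.com/r/technology/"  # placeholder URL
    for bot in ("Googlebot", "bingbot", "ExampleBot"):
        print(bot, is_allowed(HYPOTHETICAL_ROBOTS_TXT, bot, page))
```

Under these assumed rules, the two search crawlers are refused everywhere while any other bot is still allowed. Note that robots.txt is only a request that well-behaved crawlers honor; a site that wants to enforce a block would also need server-side measures.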

This doesn’t seem to worry Reddit: an anonymous source believed to be a Reddit representative reportedly said that “Reddit can survive without search.” As AI becomes more powerful and widespread, the demand for data to train AI models will only increase, so let’s hope that the search giants and content sites reach an agreement soon.