Reddit blocks major search engines and AI bots, except the ones that pay
Reddit is intensifying its crackdown on web crawlers. Over the past few weeks, Reddit has begun blocking search engines and AI bots from displaying recent posts and comments unless they pay, according to a report from 404 Media.
Currently, Google is the only mainstream search engine showing recent results when you search for posts on Reddit using the “site:reddit.com” trick, 404 Media reports. This excludes Bing, DuckDuckGo, and other alternatives.
The likely reason is that Google has struck a $60 million deal allowing the company to train its AI models on Reddit's content. 404 Media summed it up in its headline: "Google Is the Only Search Engine That Works on Reddit Now Thanks to AI Deal."
Reddit announced last month that it would update its Robots Exclusion Protocol file (robots.txt) to block automated data scraping. It has now become clear that the move wasn't solely aimed at thwarting AI companies like Perplexity and its controversial "answer engine."
“It’s a signal to those who don’t have an agreement with us that they shouldn’t be accessing Reddit data,” said Ben Lee, Reddit’s chief legal officer.
It's a bold move for a massive website like Reddit to block some of the most popular search engines, but it's not entirely surprising. Over the past year, Reddit has become increasingly protective of its data as it seeks to establish new revenue streams and satisfy new investors. After raising the cost of its API for some third-party developers, Reddit reportedly threatened to cut off Google if it continued using the platform's data to train AI for free.
Ironically, Reddit's robots.txt file once stated, “Reddit believes in an open internet, but not the misuse of public content.” Now, however, the file essentially reads, “Do not scrape.” It seems Reddit now views search engines that don’t engage in exclusive deals as misusing its content.
The updated robots.txt file will instruct web crawlers on which areas of Reddit are off-limits, with minimal impact on regular users and well-intentioned parties like researchers. However, AI companies lacking agreements with Reddit will be blocked from scraping the platform.
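To make the mechanism concrete, here is a minimal Python sketch of how a well-behaved crawler might consult a robots.txt file before fetching a page. The rules shown are a hypothetical illustration, not a copy of Reddit's actual file, and the crawler names (PartnerBot, ExampleSearchBot, ExampleAIBot) are invented for the example. It relies only on the standard library's urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules (an assumption for illustration only):
# one named crawler is allowed everywhere, every other crawler is blocked.
ROBOTS_TXT = """\
User-agent: PartnerBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.reddit.com/r/technology/"
    for agent in ("PartnerBot", "ExampleSearchBot", "ExampleAIBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

Under these assumed rules, only the hypothetical PartnerBot is permitted. Note that robots.txt is advisory: the check runs on the crawler's side, so the file only restrains crawlers that choose to honor it.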
The situation reflects a broader problem: AI chatbots scrape the web for data, and with courts slow to settle what counts as fair use, companies like Reddit are tightening controls to protect their content. Reddit has effectively blocked most search engines except Google, which has a $60 million deal for training AI on Reddit’s content.
Colin Hayhurst, CEO of the no-tracking search engine Mojeek, criticized Reddit for excluding all search engines but Google and mentioned his unsuccessful attempts to contact Reddit. Neither Google nor Reddit responded when contacted.
Reddit has been clear about its aim to prevent AI companies from scraping its data. That aim has driven significant changes, including higher API prices, which hurt some third-party apps but didn't significantly impact Reddit’s user base. The strategy seems to have paid off: Reddit successfully went public in March.