OpenAI has limited ChatGPT’s ability to read content from web pages to summaries of 100 words. And those summaries are not reliable.
When OpenAI pulled ChatGPT’s browsing beta in July, it did so with a flimsy justification: It wanted “to do right by content owners,” according to the pull announcement. ChatGPT with browsing was able to get behind publishers’ paywalls and copy entire articles verbatim into the chat.
The justification is flimsy in that bypassing paywalls is the lesser of two evils. The real problem is that chatbots like ChatGPT could potentially capture a large portion of Internet traffic if users only consume content from other sites through the chatbot.
It’s an issue that OpenAI acknowledged when it first released the browsing feature. The same problem applies to Google Bard and Bing Chat.
ChatGPT is limited to 100-word summaries
Tests of the now-released browsing feature show that OpenAI has made compromises since its withdrawal in July: ChatGPT now only outputs up to 100-word summaries of web text. For more information, you need to follow a link to the source.
The system does not distinguish between text it is legally allowed to copy, such as from Wikipedia or OpenAI’s own blog, and text that is under copyright, such as from news sites. In my tests, the system reliably rejected requests for longer quotes from sources.
“I can’t provide verbatim excerpts from copyrighted texts due to OpenAI’s policy to respect copyright laws and prevent potential misuse of the service. However, I can provide summaries, answer specific questions, or elaborate on particular aspects to help convey the information you’re looking for.”
Content from sites that block the ChatGPT crawler, such as the New York Times, can no longer be consumed in chat, even in summary form.
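Sites usually opt out of crawlers through their robots.txt file. The following is a minimal sketch of how such a block can be checked, not a description of how OpenAI or any particular publisher implements it. It assumes the site addresses the user-agent tokens GPTBot and ChatGPT-User, which OpenAI has documented for its crawler and its browsing agent; the robots.txt content, the example.com URL, and the helper name is_allowed are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a publisher that blocks OpenAI's agents
# but leaves the site open to every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/article.html"  # placeholder URL
    for agent in ("GPTBot", "ChatGPT-User", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

Note that robots.txt is a voluntary convention: it only has an effect if the crawler chooses to honor it.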
Asked why it doesn’t copy Wikipedia text, even though it’s legal to do so, ChatGPT says it’s trained to provide “information in a summarized, rephrased, or synthesized manner, rather than copying large amounts of text verbatim.” It’s guided to avoid “excessive quoting from external sources” and to refer users to the original source for more information.
ChatGPT also refuses requests to rephrase a text as a way around potential copyright issues, saying that this would violate its guidelines.
The system does offer to answer questions about the text in the chat. However, in my tests, ChatGPT was unable to answer simple questions about facts that are clearly described in the source but go beyond the summary.
For example, I asked ChatGPT about the duration of the peak intensity of Hurricane Walaka in the corresponding Wikipedia article. Although the text includes the phrase “Walaka maintained its peak intensity for six hours,” ChatGPT said that this information was not included in the article.
OpenAI makes ChatGPT more copyright-compliant, but also less useful
When OpenAI first introduced ChatGPT browsing, the company acknowledged that it is “a new method of interacting with the web” and that it would welcome feedback on how it could still contribute to the “overall health of the ecosystem” by driving traffic back to sources.
The new ChatGPT browsing seems to be the compromise the company has worked out with itself. The functionality is on par with common search engines that quote teaser text and sometimes short paragraphs from web pages directly into search results. This approach has been criticized in some cases, at least by publishers, and is the subject of legal disputes, but it has been common practice for the past decade or so.
Compared to traditional search engines, however, ChatGPT search has one major disadvantage: While search engines usually display relevant snippets unchanged from the source, ChatGPT rewrites information from the source.
The AI tool often does a good job. But not always.
ChatGPT browsing is unreliable, slow and cumbersome to control
Over the past few months, I’ve reviewed hundreds of AI summaries of articles I’ve written. By my estimate, about 20 percent of the time the summary misses the point or omits an important aspect. Occasionally, the facts are wrong.
If I know the original source, the feature is still a great writing aid for me as an editor, especially since I can create multiple drafts and decide between them in seconds. That’s good.
But I wouldn’t rely on an AI summary as a source of information without knowing the original source. Doing so is sloppy.
If you take a deep breath and think this through step by step, it means that there is no reason to prefer an AI summary to a meaningful citation from the original source, as provided by search engines. The latter is more direct and more accurate.
This unreliability alone makes ChatGPT browsing less useful than search engines, if not basically useless. You would always have to go to the original source anyway. Why take the detour through the AI summary?
Google Bard is even worse: its ability to fact-check sources runs through another AI loop, which means you can be sure neither that the information is correct nor that its source is cited correctly. It’s messy and complicated.
Moreover, I have less control over the source selection unless I explicitly specify it in the prompt, which is again cumbersome. Bing’s source selection was modest, at least in my tests, and certainly not what I would have chosen.
Other drawbacks include less transparency, slower responses, and spotty access to web pages if many sites block the chatbot.
Of course, search engines like Google Search have their problems.
They are often cluttered with ads and rank spam sites that spread disinformation for profit. Their ranking criteria are not transparent and can disadvantage people and businesses.
But even ChatGPT with Bing is just accessing the same web content as Google via a search algorithm. The web doesn’t magically become better just because a chatbot automatically searches for me on sites I wouldn’t choose and may even misrepresent their content. On the contrary, the chatbot just adds another potential source of error to the information chain.
Perhaps one day chatbots will be able to compete with search engines. Google’s Search Generative Experience comes closest because it is a well-thought-out hybrid. But in terms of reliability and copyright, the two big hurdles, we haven’t made much progress since Bing Chat kicked off the chatbot search hype.