- Newspaper publishers in California, Colorado, Illinois, Florida, Minnesota and New York said that Microsoft and OpenAI used millions of their articles without payment or permission to develop artificial intelligence models for ChatGPT and other products.
- The publishers provided examples of ChatGPT and Microsoft's Copilot chatbot allegedly regurgitating information from their articles without directing people to newspaper websites.
Eight U.S. newspaper publishers filed suit against Microsoft and OpenAI in a New York federal court on Tuesday, claiming the technology companies reuse their articles without permission in generative artificial intelligence products and incorrectly attribute inaccurate information to them.
The group of eight newspaper publishers takes issue with ChatGPT and with Microsoft's Copilot assistant, which is available in the Windows operating system, the Bing search engine and other Microsoft products. ChatGPT and Copilot have been "purloining millions of the publishers' copyrighted articles without permission and without payment," according to the complaint, which was filed in the U.S. District Court for the Southern District of New York.
The newspaper publishers in the lawsuit operate the New York Daily News, the Chicago Tribune, the Orlando Sentinel, the Sun Sentinel in Florida, The Mercury News in California, The Denver Post, The Orange County Register in California and the Pioneer Press of Minnesota. All fall under the ownership of hedge fund Alden Global Capital.
"We take great care in our products and design process to support news organizations," an OpenAI spokesperson said in a statement. "While we were not previously aware of Alden Global Capital's concerns, we are actively engaged in constructive partnerships and conversations with many news organizations around the world to explore opportunities, discuss any concerns, and provide solutions. Along with our news partners, we see immense potential for AI tools like ChatGPT to deepen publishers' relationships with readers and enhance the news experience."
Microsoft declined to comment.
The newspaper publishers said in the lawsuit that OpenAI has drawn on data sets containing text from their newspapers to train its GPT-2 and GPT-3 large language models, which can spit out text in response to a few words of human input.
"The current GPT-4 LLM will output near-verbatim copies of significant portions of the publishers' works when prompted to do so," the complaint said. It includes several examples of ChatGPT and Copilot allegedly doing so.
The publishers said Microsoft copies information from their newspapers for the Bing search index, which helps inform Copilot's answers. But that output doesn't always link to the newspapers' websites, where readers can view ads alongside articles or pay for subscriptions.
The legal challenge comes four months after The New York Times sued OpenAI over copyright infringement in the ChatGPT chatbot that the startup released in late 2022. OpenAI said in a January blog post that the case is without merit, adding it wants to support "a healthy news ecosystem." That same month, Sam Altman, OpenAI's CEO, said the startup wanted to pay The New York Times and was surprised to learn about the lawsuit.
In recent months, OpenAI has signed deals with a handful of media companies, including Axel Springer and the Financial Times, enabling the Microsoft-backed startup to draw on the publishers' content to improve AI models.
Google, which has its own general-purpose chatbot for responding to user queries, said in February that it had reached an agreement with Reddit that includes the right to train AI models on the platform's content.
The New York Times case also touched on the matter of OpenAI models regurgitating information from its articles. In its blog post, OpenAI characterized such behavior as "a rare failure of the learning process that we are continually making progress on."
Correction: This article has been updated to reflect the correct day the lawsuit against Microsoft and OpenAI was filed.