OpenAI may have accidentally deleted important data related to the ongoing New York Times copyright case. As first reported by TechCrunch, lawyers for the Times and co-plaintiff Daily News sent a letter to the judge overseeing the case detailing how “a full week of work by experts and lawyers” was “irretrievably lost.” OpenAI had provided the plaintiffs with two dedicated virtual machines to investigate the alleged cases of copyright infringement. The letter said that on November 14, “programs and search result data stored on one of the dedicated virtual machines were deleted by OpenAI engineers.”
The Times has accused OpenAI and Microsoft, which uses OpenAI’s models for its Bing AI chatbot, of copyright infringement for using its paid content without authorization. The lawsuit details several examples of near-verbatim copying in ChatGPT’s responses. OpenAI has rejected the argument, saying its models were trained on publicly available data and that this constitutes fair use under copyright law. The case hinges on whether the Times can prove that OpenAI’s models copied and used its content without compensation or attribution.
OpenAI was able to recover most of the deleted data but not the “folder structure and file names” of the works, which makes the recovered data unusable. As a result, the plaintiffs’ lawyers must rebuild their evidence from scratch. In the letter, the lawyers reiterated that “there is no reason to believe that [the deletions] were intentional,” but also noted that “OpenAI is in the best position to search its own records.” AI companies have avoided revealing details about their training data.
Similar copyright lawsuits have been filed against OpenAI. However, the lawsuits filed by Raw Story and AlterNet were recently dismissed because the plaintiffs failed to prove sufficient damages to support their claims. OpenAI currently has licensing agreements with several media companies, allowing it to use their work for training and to add citations to ChatGPT answers. Recently, Adweek reported that OpenAI is paying publishing giant Dotdash Meredith at least $16 million a year in content licensing fees.