NYT says it didn't hack ChatGPT, only exposed copyright infringement - The UpStream


posted Saturday Mar 16, 2024 by Scott Ertz

The lawsuit between The New York Times and ChatGPT maker OpenAI has heated up in the past few weeks after NYT cited examples of ChatGPT spitting out exact text from NYT articles. OpenAI responded by claiming that the publication had "hacked" the system in order to get it to do things it shouldn't do. The publication has countered that it did nothing wrong, used only publicly available capabilities, and exposed ChatGPT as a system of plagiarism.

What did NYT do?

The process by which NYT proved its case was simple: staffers fed the system a single line from an article and asked for the next line. From there, ChatGPT happily repeated the articles word for word. For ChatGPT to do this, OpenAI obviously had to have full access to the articles. That would be one thing if the articles were public, like this one is. However, the articles NYT staffers tested were paywalled content.
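The technique described above can be sketched in a few lines. This is a hypothetical illustration only: the payload shape follows OpenAI's Chat Completions request format, but the model name, wording, and article text are placeholders, not the actual prompts from the court filing.

```python
# Sketch of the kind of prompt described: supply an article's opening
# line and ask the model to produce what comes next. The request shape
# mirrors OpenAI's Chat Completions API; all specifics are placeholders.

def build_continuation_request(opening_line: str, model: str = "gpt-4") -> dict:
    """Build a request payload asking a chat model to continue a given text."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": (
                    f'Here is the first sentence of an article: "{opening_line}" '
                    "What is the next sentence?"
                ),
            }
        ],
    }

# Placeholder text stands in for a real (paywalled) article's opening line.
request = build_continuation_request("The first sentence of the article goes here.")
print(request["messages"][0]["content"])
```

Repeating this step with each returned sentence is what would let a user walk through an article line by line, which is the behavior the filing describes.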

This means that OpenAI was, in one manner or another, accessing content that was not publicly available but only available to subscribers, and either using it as training data or loading it on demand. Either way, it was deliberately accessing data it was not supposed to have. Worse, it was serving that content up exactly as it existed on the NYT website, allowing people to bypass the company's paywall and essentially steal from it directly.

OpenAI's accusation

OpenAI has accused NYT of hacking the system in order to set the company up for a lawsuit. Its reasoning is that no real user would actually use ChatGPT in this manner. And while, on the surface, this seems like a silly way to use the application, it is actually quite popular. In fact, entire Reddit communities are dedicated to the practice.

The popularity lies in the result, not in the practice itself. Of course, no one wants a computer system to retrieve and display an article one sentence or paragraph at a time. But when an article is behind a paywall and you do not have a subscription to that site, this method lets you read the article without paying for it. Just because OpenAI doesn't want people to use the system this way does not mean that people won't.

NYT's response

The company's legal team responded in a court filing, saying:

In OpenAI's telling, The Times engaged in wrongdoing by detecting OpenAI's theft of The Times's own copyrighted content. OpenAI's true grievance is not about how The Times conducted its investigation, but instead what that investigation exposed: that Defendants built their products by copying The Times's content on an unprecedented scale – a fact that OpenAI does not, and cannot, dispute.

The argument seems solid: OpenAI's complaint is based not on the manner in which NYT discovered the data but simply on the fact that it was discovered. Either way, NYT has a strong case because OpenAI has been aware of the infringement within its system, and that alone may be enough. The legal theory comes from the landmark case against Napster, in which the company was found liable for copyright infringement occurring on its network because it was aware of the activity and took no action to prevent it.

In this case, OpenAI was made aware early on that its system was capable of directly plagiarizing content from the web, including content that was supposed to be unreachable behind a paywall. Because the company was made aware of the issue and continued to allow the infringement to happen, NYT hopes to use the same argument to establish liability on OpenAI's part.

