Debate on Artificial Intelligence & Copyright
Dec. 29, 2023

Why in News?

  • The New York Times has sued OpenAI – the firm behind generative Artificial Intelligence (AI) platform ChatGPT – and Microsoft for unlawful use of the paper’s copyrighted content.

What’s in Today’s Article?

  • About OpenAI (ChatGPT, How it Works)
  • OpenAI vs NYT Debate
  • News Summary

About OpenAI:

  • OpenAI is an Artificial Intelligence research company.
  • The company is best known for creating ‘ChatGPT’, a conversational AI chatbot.
  • Users can ask ChatGPT questions on almost any topic, and the chatbot responds with answers, stories and essays.
  • It can even help programmers write software code.

Debate over ChatGPT’s Source for Data:

  • Software products like ChatGPT are based on what AI researchers call ‘Large Language Models’ (LLMs).
    • LLMs require enormous amounts of information to train their systems.
  • For chatbots or digital assistants to understand the questions that humans put to them, they need to study human language patterns.
  • Tech companies that work on LLMs, such as Google, Meta and OpenAI, are secretive about the training data they use.
  • Tech companies use software called ‘crawlers’ to scan web pages, hoover up content and put it together in a dataset that can be used to train their LLMs.
  • This led news outlets like the New York Times (NYT) to voice concerns over copyright violations. The NYT and other news outlets blocked a web crawler known as GPTBot.
    • OpenAI used GPTBot to scrape data.
  • The outlets told OpenAI that it could no longer use their published material and journalism to train its chatbots.
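
The article does not say how the outlets blocked the crawler. A common mechanism is a robots.txt file that names the crawler's user agent, so the following is only a minimal sketch under that assumption. The robots.txt content, the hostname news.example.com and the helper is_allowed are hypothetical; the parsing uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the GPTBot crawler everywhere,
# but leave the site open to every other crawler (e.g. search engines).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://news.example.com/world/some-article"  # hypothetical URL
    for agent in ("GPTBot", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, url))
```

Note that robots.txt is a request rather than an enforcement mechanism: it only works if the crawler chooses to honour it.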

Reason Behind News Outlets’ Decision:

  • Search engines like Google or Bing also use web crawlers to index websites and present relevant results when users search for topics.
  • However, search engines offer news outlets a mutually beneficial relationship.
  • Google, for instance, takes a snippet of a news article (a headline, a blurb and perhaps a couple of sentences) and reproduces it to make its search results useful.
  • And while Google profits from that content, it also directs a significant amount of user traffic to news websites.
  • On the other hand, OpenAI provides no benefit, monetary or otherwise, to news companies.
  • It simply collects publicly available data and uses it for the company’s own purposes.

News Summary:

  • The New York Times (NYT) has become the first major news publisher to sue OpenAI and Microsoft, the creators of ChatGPT and other popular artificial intelligence (AI) platforms, citing “unlawful” use of copyrighted content.
  • The lawsuit says the defendants largely scrape the NYT’s original content to build their models and manufacture responses.
  • The NYT accuses OpenAI and Microsoft of using content “without payment to create products that substitute for The Times and steal audiences away from it”.
    • Microsoft has a sizable investment in OpenAI.
  • Earlier this year, two US authors also sued OpenAI, claiming in a proposed class action that the company misused their works to “train” ChatGPT.

What is NYT’s main Contention against OpenAI and Microsoft?

  • The lawsuit contends that millions of articles published by the NYT were used to train automated chatbots which now compete with the news outlet as a source of reliable information.
  • The publication also alleges that OpenAI and Microsoft’s large language models, which power ChatGPT and Copilot, “can generate output that recites Times content verbatim, closely summarises it, and mimics its expressive style.”
  • According to the lawsuit, this undermines and damages the Times’ relationship with readers, while also depriving it of “subscription, licensing, advertising, and affiliate revenue.”