News Media versus OpenAI’s ChatGPT

Aug. 30, 2023

Why in News?

A group of news media organisations recently shut off OpenAI’s ability to access their content.
The New York Times is planning to sue OpenAI over copyright violations.

What’s in Today’s Article?

About Artificial Intelligence:

Artificial intelligence (AI) is the ability of a computer or a robot controlled by a computer to do tasks that are usually done by humans because they require human intelligence and discernment.
The term is frequently applied to the project of developing systems endowed with the intellectual processes characteristic of humans, such as the ability to reason, discover meaning, generalize, or learn from past experience.
AI algorithms are trained on large datasets so that they can identify patterns, make predictions and recommend actions, much like a human would, but often much faster.

About OpenAI:

OpenAI is an artificial intelligence research company.
The company is best known for creating ‘ChatGPT’, which is an AI conversational chatbot.
Users can ask ChatGPT questions on almost any topic, and the chatbot will respond with answers, stories and essays, although its responses are not always accurate.
It can even help programmers write software code.

Debate over ChatGPT’s Source for Data:

Software products like ChatGPT are based on what AI researchers call ‘Large Language Models’ (LLMs).
LLMs require enormous amounts of information to train their systems.
For chatbots or digital assistants to understand the questions that humans throw at them, they need to study human language patterns.
Tech companies that work on LLMs, such as Google, Meta and OpenAI, are secretive about what kind of training data they use.
Tech companies use software called ‘crawlers’ to scan web pages, hoover up content and put it together in a dataset that can be used to train their LLMs.
This is what news outlets took a stand against last week, when The New York Times and others blocked a web crawler known as GPTBot.
- GPTBot is the crawler OpenAI used to scrape data from websites.
News outlets told OpenAI that it can no longer use their published material and journalism to train its chatbots.
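
The article does not say how the outlets blocked the crawler. One widely used mechanism is a site's robots.txt file, which names crawlers by user agent and tells them which paths they may fetch. The sketch below assumes that mechanism. It uses a hypothetical robots.txt for an imaginary site (news.example.com) and a hypothetical helper, is_allowed, to show how a rule aimed at GPTBot can coexist with continued access for search-engine crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary news site (an assumption for
# illustration, not the actual file of any outlet named in the article).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://news.example.com/world/some-story"
    print(is_allowed(ROBOTS_TXT, "GPTBot", url))     # False
    print(is_allowed(ROBOTS_TXT, "Googlebot", url))  # True
```

A robots.txt rule is a request rather than an enforcement mechanism: it only works when the crawler chooses to honour it.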

Reason Behind News Outlets’ Decision:

Search engines like Google or Bing also use web crawlers to index websites and present relevant results when users search for topics.
However, the relationship between these search engines and news outlets is mutually beneficial.
Google, for instance, takes a snippet of a news article (a headline, a blurb and perhaps a couple of sentences) and reproduces it to make its search results useful.
And while Google profits from that content, it also directs a significant amount of user traffic to news websites.
On the other hand, OpenAI provides no benefit, monetary or otherwise, to news companies.
It simply collects publicly available data and uses it for the company’s own purposes.

Way Forward:

Last month, OpenAI signed a licensing arrangement with The Associated Press, in a deal that allows the company to use the news agency’s archival content as a training dataset.
However, it remains to be seen whether other publishers will refuse payment and instead sue OpenAI for copyright infringement, the way a group of novelists did earlier this year.
The legal battles ahead will have interesting implications for journalism, intellectual property and the future of artificial intelligence.
