The AI Copyright War Is Just Starting — And the Courts Have No Map
Every major AI company is being sued for training on copyrighted material. The outcome will reshape not just AI, but the entire concept of intellectual property.
The New York Times's lawsuit against OpenAI has reached the motion-to-dismiss stage with no clear precedent: the legal framework for AI training data is genuinely uncharted.
- Fair Use Is a Doctrine, Not a Rule
- The International Dimension
- The Licensing Alternative
- What Happens Either Way
In December 2023, the New York Times filed a lawsuit against OpenAI and Microsoft that legal scholars are still unpacking. The core claim: that ChatGPT was trained on millions of Times articles without permission, and that the resulting model can reproduce Times content verbatim when prompted correctly. OpenAI's defence is essentially that training on publicly available text is transformative use, and therefore fair use under copyright law.
The case has not yet been decided. But it is one of at least forty major copyright suits currently working through US courts involving AI training data. The outcomes will not just determine whether OpenAI owes the Times damages. They will determine what kind of AI industry the United States — and by extension the world — is permitted to build.
Fair Use Is a Doctrine, Not a Rule
Fair use in US copyright law is a four-factor balancing test, not a bright line. Courts weigh: the purpose and character of the use; the nature of the copyrighted work; the amount taken; and the effect on the market for the original. The AI training argument leans on the first factor — that training is transformative because the output is not copying but generating. Critics argue the fourth factor destroys that defence: if AI can reproduce the NYT at scale, the market for NYT content is directly harmed.
Both arguments are coherent. Neither has clear precedent in an AI context. The closest analogies — the Google Books digitisation case, Napster, MP3.com — were resolved in specific factual contexts that do not map cleanly onto foundation model training. The judges handling these cases are working from a genuinely blank map.
The International Dimension
US copyright doctrine is not global copyright doctrine. The EU Artificial Intelligence Act requires transparency about training data but does not resolve ownership. The UK's proposed training data exemption was withdrawn under industry pressure. Japan explicitly permits training on copyrighted works for data analysis, not only research, regardless of the rights holder's wishes. China's AI regulation focuses on output filtering rather than input licensing.
This creates a patchwork that sophisticated actors will exploit immediately. If US courts rule against training on copyrighted text, the compute can move to jurisdictions where such training is legal. The content still gets trained on. The AI still ships globally. The US rights holder wins the legal battle and loses the economic one.
The Licensing Alternative
Some publishers are not waiting for courts. The Associated Press licensed its archive to OpenAI. Axel Springer did the same. News Corp entered negotiations. The terms of these deals are not public, but the logic is clear: if your content will be used regardless, extract value from the extraction. Build a revenue stream before the courts make the decision for you.
This creates a two-tier content economy. Large publishers with leverage to negotiate are building AI partnerships. Small creators — the individual writers, photographers, illustrators whose work made up the bulk of training corpora — have no negotiating position. They can sue individually, but class action certification has been inconsistently granted, and the legal timelines stretch well beyond most creators' economic horizons.
What Happens Either Way
If fair use prevails, AI companies will train on anything. The internet's corpus becomes fuel, and the people who made it receive nothing for their contribution. Content creation economics shift further toward AI augmentation as the only viable model.
If copyright prevails, training data licensing becomes the next major cost input for AI development. Large players can afford it. Startups cannot. The result is accelerated consolidation around the companies that can afford to license from every major publisher simultaneously — which is a very short list.
Neither outcome is obviously good. But one of them is arriving, probably within the next two years. The map is being drawn in courtrooms right now.
The WokHei editorial desk continuously monitors hundreds of sources across technology, science, culture, and business — detecting emerging patterns, surfacing overlooked angles, and writing analysis grounded in what the data actually shows. It does not speculate beyond its sources and cites everything it draws from.