Creative Copyright & AI
Prepared by Rick Craig, Director of Information Technology
College of Architecture, Arts, and Design | Virginia Tech
March 2026
This document is a resource for the AAD community. In parallel, the AAD AI Working Group is working through what responsible AI use looks like in a creative college. Information is current as of March 2026 and will be updated as the landscape changes.
Why This Matters for AAD
Your designs, images, writing, and architectural renderings may already be in the datasets AI models were trained on. The tools our university provides may produce outputs that relate to, draw from, or in some cases closely resemble creative work from fields like yours.
This document gives you the facts — what is happening, what the courts are saying, what protections exist, and what is still unresolved — so you can form your own informed position.
How AI Models Learn From Creative Work
AI models do not “understand” art the way a human does. They learn statistical patterns from enormous datasets of examples. Understanding how this works matters for evaluating the legal and ethical questions that follow.
Text models (LLMs)
Large language models like ChatGPT, Gemini, and Claude ingest massive text corpora and learn statistical patterns in language. A significant portion of this training data comes from Common Crawl, a nonprofit that has been crawling the open web since 2008. As of 2026, Common Crawl’s archive spans over 9.5 petabytes of web data across billions of URLs. In one review of 47 LLMs, at least 64% were trained on some filtered version of Common Crawl, and Meta’s original LLaMA model drew approximately 67% of its training data from it (Mozilla Foundation, “Training Data for the Price of a Sandwich”).
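To make “statistical patterns” concrete, here is a toy bigram model in Python: it counts which word follows which in a tiny corpus, then predicts the most common successor. This is a deliberate simplification for illustration only; production LLMs use neural networks with billions of parameters, but the core idea of learning word statistics from a text corpus is the same.

```python
from collections import Counter, defaultdict

# Count, for each word, which words follow it and how often.
corpus = "the quick brown fox jumps over the lazy dog the quick fox".split()
follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the word seen most often after `word` in the corpus."""
    candidates = follows.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("the"))    # "quick" (follows "the" twice; "lazy" only once)
print(predict_next("brown"))  # "fox"
```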
Image models (diffusion models)
Image-generating models like Stable Diffusion, DALL-E, and Midjourney use a technique called diffusion. Stable Diffusion was trained on subsets of LAION-5B, a dataset of 5.85 billion image-text pairs collected by parsing Common Crawl for image tags with alt-text values (LAION, arXiv, 2022). Any artwork, architectural photograph, or design image published on the open web with alt text could have been captured.
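As a rough illustration of how such a dataset gets assembled, the sketch below pairs image URLs with alt text pulled from crawled HTML. This is a simplified stand-in, not LAION’s actual pipeline (which also filtered pairs by CLIP similarity, language, and image size), and the URLs shown are hypothetical.

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Crawled HTML (hypothetical). Only <img> tags with alt text yield pairs.
html = """
<html><body>
  <img src="https://example.edu/studio/render-01.jpg"
       alt="Concept rendering of a timber pavilion at dusk">
  <img src="https://example.edu/logo.png">
</body></html>
"""

pairs = []
for img in BeautifulSoup(html, "html.parser").find_all("img"):
    src, alt = img.get("src"), img.get("alt")
    if src and alt:  # images without alt text are skipped
        pairs.append({"url": src, "caption": alt})

print(pairs)
# [{'url': 'https://example.edu/studio/render-01.jpg',
#   'caption': 'Concept rendering of a timber pavilion at dusk'}]
```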
A note on the dataset’s history: LAION-5B was taken offline in December 2023 after researchers at the Stanford Internet Observatory discovered that it contained child sexual abuse material (CSAM). LAION subsequently released a cleaned version called Re-LAION-5B in August 2024.
The “regurgitation” question
A critical technical and legal question is whether AI models merely learn patterns from training data or sometimes memorize and reproduce specific content. The NYT v. OpenAI litigation has surfaced evidence that ChatGPT can reproduce Times articles near-verbatim — what researchers call “regurgitation.” This distinction is reshaping fair use arguments in the courts.
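One crude way to quantify regurgitation is n-gram overlap: what fraction of the word sequences in a model’s output appear verbatim in a source text. The sketch below is a minimal illustration of that idea, not the methodology used by any party’s experts in the litigation.

```python
def ngram_overlap(source: str, output: str, n: int = 8) -> float:
    """Fraction of n-word sequences in `output` that appear verbatim in
    `source`. Values near 1.0 suggest memorized (regurgitated) text;
    values near 0.0 suggest independently generated prose."""
    def ngrams(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    output_grams = ngrams(output)
    if not output_grams:
        return 0.0
    return len(output_grams & ngrams(source)) / len(output_grams)

# A fully copied passage scores 1.0; unrelated text scores 0.0.
article = ("the city council approved the waterfront redevelopment plan "
           "after months of contentious public hearings")
print(ngram_overlap(article, article))  # 1.0
print(ngram_overlap(article, "students presented their final studio "
                             "projects to a panel of visiting critics"))  # 0.0
```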
Is Your Work in a Training Dataset?
Possibly. If your creative work has been published on the open web, there is a meaningful chance it was captured by Common Crawl or included in LAION-5B.
Have I Been Trained (haveibeentrained.com), built by Spawning AI, allows you to search the LAION-5B dataset by keyword or by uploading an image. Approximately 80 million images have been opted out, largely through platform partnerships with ArtStation and Shutterstock rather than individual artist submissions. That represents only about 3% of the 2+ billion images in the dataset (Spawning, 2023; MIT Technology Review, December 2022).
The Getty Images lawsuit alleges that approximately 12 million Getty images were scraped for Stable Diffusion training (Getty Images v. Stability AI, D. Del. 1:23-cv-00135).
What the Courts Are Saying
There is no definitive ruling on whether AI training on copyrighted work is legal. Multiple cases are working through the courts. Here is where things stand.
Thomson Reuters v. Ross Intelligence (D. Del., decided February 2025)
The first completed federal ruling on fair use in AI training. The court granted summary judgment for Thomson Reuters, finding that Ross infringed 2,243 Westlaw headnotes, and rejected Ross’s fair use defense. However, the court expressly limited its holding to non-generative AI, leaving the question open for generative AI systems.
NYT v. OpenAI and Microsoft (S.D.N.Y., Case No. 1:23-cv-11195, active)
Filed December 2023. The New York Times alleges OpenAI and Microsoft used millions of Times articles to train ChatGPT without permission. In April 2025, Judge Sidney Stein denied OpenAI’s motions to dismiss. In January 2026, Judge Stein affirmed an order compelling OpenAI to produce a sample of 20 million anonymized ChatGPT conversation logs, a major discovery victory for plaintiffs. The case is in active discovery. No trial date. No final ruling on fair use.
Andersen v. Stability AI, Midjourney, DeviantArt, Runway AI (N.D. Cal., Case No. 3:23-cv-00201, active)
Filed January 2023 by artists Sarah Andersen, Kelly McKernan, and Karla Ortiz. In August 2024, Judge William Orrick issued a mixed ruling on motions to dismiss — Andersen’s direct copyright claim against Stability AI survived, but some other claims were dismissed. The case is in active discovery. Trial is set for September 8, 2026.
Getty Images v. Stability AI
UK (decided, November 2025): The High Court of England and Wales rejected Getty’s secondary copyright infringement claims, finding that AI model weights and parameters do not store or reproduce visual information from training images in a way that constitutes infringement under UK copyright law. Getty had dropped its primary copyright infringement claims during trial due to lack of evidence that training occurred in the UK. The court found limited trademark infringement for some Stable Diffusion iterations. Note: this ruling is narrower than a broad holding that “model weights are not copies” — it addressed secondary infringement on the specific facts presented.
US (active, N.D. Cal., Case No. 3:25-cv-06891): Originally filed in Delaware, voluntarily dismissed August 2025 and refiled in Northern California. Alleges unauthorized reproduction of approximately 12 million Getty images. Early stages.
Bartz v. Anthropic (N.D. Cal., proposed settlement pending)
In June 2025, Judge William Alsup ruled that training on lawfully acquired books is fair use — “quintessentially transformative.” But training on pirated copies (from Library Genesis and Pirate Library Mirror) is not fair use. Anthropic had downloaded over 7 million pirated books. A proposed settlement was announced in late August 2025 and given preliminary court approval in September 2025 for $1.5 billion — one of the largest proposed copyright settlements ever, covering approximately 500,000 pirated works at roughly $3,000 per book. As of late March 2026, a federal judge has temporarily halted the settlement pending further review; final approval remains pending.
Important caveat: because this is a settlement, it does not create binding legal precedent. Judge Alsup’s summary judgment ruling on fair use is a district court opinion, not an appellate ruling.
Other active cases
The music industry is seeking over $3 billion from Anthropic for alleged infringement of more than 20,000 songs. Disney and Universal have filed against Midjourney over AI-generated reproductions of Marvel and Star Wars characters.
Fair Use: The Central Legal Question
The fair use doctrine (17 U.S.C. Section 107) involves four factors, and courts are reaching different conclusions on each:
Factor 1 — Purpose and character: Courts are split. In Bartz, Judge Alsup called AI training “transformative — spectacularly so.” In Thomson Reuters, the court found this factor favored the copyright holder.
Factor 2 — Nature of the copyrighted work: Generally weighs in favor of copyright holders when creative works are involved.
Factor 3 — Amount used: AI training typically involves ingesting entire works, which traditionally weighs against fair use.
Factor 4 — Effect on the market: The US Copyright Office stated in its May 2025 Part 3 report: “The speed and scale at which AI systems generate content pose a serious risk of diluting markets for works of the same kind as in their training data.”
The overall trend: transformativeness alone will not save AI companies if there is demonstrable market harm or if the data was obtained unlawfully. But no appellate court has ruled on the merits yet, so the law remains unsettled.
Who Owns AI-Generated Output?
Purely AI-generated content is not copyrightable. In Thaler v. Perlmutter, courts ruled that copyright requires a human author. The US Supreme Court denied certiorari on March 2, 2026, making this settled law for now.
AI-assisted work can be copyrightable. In the Zarya of the Dawn decision (February 2023), the Copyright Office ruled that AI-generated images within a graphic novel were individually not copyrightable, but the human-authored text and the overall selection and arrangement were copyrightable as a compilation.
What this means for students
The key principle: the more human creative judgment and modification a student applies to AI output, the stronger the copyright position. Pure generation is the weakest position; AI-assisted creation with substantial human contribution is much stronger.
Documenting your creative contribution
For faculty and students who want to use AI as a tool while retaining copyright protection (a sketch of one possible record format follows this list):
- Save your prompts and iterations
- Document your selection process
- Keep records of modifications you made to AI output
- For architecture and design work, show how AI-generated elements were integrated into a larger human-authored design framework
- Maintain a clear creative narrative
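A minimal sketch, in Python, of what one such record might look like. The field names are hypothetical rather than any official schema; adapt them to your medium, and keep the log in whatever format you will actually maintain.

```python
import json
from datetime import date

# Hypothetical provenance record for one AI-assisted work session.
record = {
    "date": date.today().isoformat(),
    "tool": "image generator, name and version here",
    "prompt": "timber pavilion, dusk, site plan overlay",
    "iterations_reviewed": 14,
    "selected_output": "grid-03.png",
    "selection_rationale": "closest match to the studio's massing study",
    "modifications": [
        "repainted facade materials by hand",
        "integrated into hand-drafted site section",
    ],
}

# Append one JSON line per session to a running log.
with open("provenance-log.jsonl", "a") as log:
    log.write(json.dumps(record) + "\n")
```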
What this means for faculty
AI-generated content in faculty research may not be independently copyrightable. The human-authored analysis, selection, arrangement, and modification around AI outputs would still be protectable. Faculty should document their creative contribution when using AI tools, particularly for work they intend to publish.
Academic journals are increasingly establishing their own policies on AI-generated content. Nature, Science, and many field-specific journals now require disclosure of AI tool use. Check your target journal’s AI policy before submission.
The International Landscape
Different countries are taking different approaches.
European Union: The EU AI Act requires providers of general-purpose AI models to implement a copyright policy and publish a “sufficiently detailed summary” of training content. Enforcement begins August 2, 2026 — fines up to 3% of global annual turnover or EUR 15 million.
United Kingdom: Narrow text and data mining exception limited to non-commercial research. The Getty UK ruling rejected secondary copyright infringement claims against AI model weights on the specific facts presented.
Japan: Article 30-4 of the Japanese Copyright Act broadly permits AI training, as long as the purpose is not to “enjoy the thoughts or sentiments expressed in that work.”
| Jurisdiction | Training on Copyrighted Work | Output Liability | Opt-Out Mechanism? |
|---|---|---|---|
| US | Unsettled; fair use case-by-case | Standard infringement analysis | No statutory mechanism |
| EU | Allowed with transparency requirements | Standard infringement | Yes (CDSM Directive Article 4) |
| UK | Narrow exception (non-commercial only) | Under development | Proposed but not enacted |
| Japan | Broadly permitted for training | Infringement analysis at output stage | No formal mechanism |
What Creators Can Do Right Now
Check if your work is in training datasets
Have I Been Trained (haveibeentrained.com) lets you search LAION-5B by keyword or image upload and opt out of future versions.
Protect your work from AI style mimicry
Glaze, developed by Ben Zhao’s lab at the University of Chicago, makes subtle changes to artwork that disrupt AI style mimicry. It showed over 92% effectiveness in controlled testing and has been downloaded over 8.5 million times since its March 2023 release.
Nightshade, from the same lab, goes further: it “poisons” training data by altering images so that models trained on them learn incorrect associations. It has been downloaded over 2.5 million times since its January 2024 release.
Control AI crawling of your websites
Major AI crawlers (GPTBot, ClaudeBot, Google-Extended, PerplexityBot) state that they respect robots.txt directives. However, compliance is voluntary; robots.txt asks crawlers to stay away but cannot technically enforce it.
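For example, a robots.txt file at the root of your site can ask these crawlers to skip it entirely. The directives below use the user-agent tokens the vendors currently publish; verify them against each vendor’s documentation, since token names change over time.

```
# robots.txt -- requests (but does not enforce) that AI training crawlers skip this site
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /
```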
What Remains Unresolved
- No appellate court has ruled on whether generative AI training is fair use.
- The NYT v. OpenAI case has no ruling on the merits.
- The UK ruling rejecting secondary copyright infringement claims against AI model weights has no binding effect in the US and was narrower than widely reported.
- Opt-out tools have limited reach. Have I Been Trained covers only LAION-5B.
- EU AI Act transparency requirements are new and untested. Enforcement begins August 2026.
- Copyright Office reports are guidance, not law.
Looking Forward
The questions are real: How should faculty think about AI in their own creative practice? What should syllabi say about AI-generated work? How do we support students in developing their own creative capacities while preparing them for a world where these tools exist? The AAD AI Working Group is looking at some of these questions as part of a broader conversation about AI in our college.
This document is a starting point, not an endpoint. The legal landscape is moving fast — several of the cases described here could produce significant rulings within the next year. As new developments emerge, I will update this resource.