The "We Didn't Know" Defense: How AI Companies Avoid Copyright Liability
Erik Svilich, Founder & CEO | Encypher | C2PA Text Co-Chair

AI companies claim innocent infringement because they 'didn't know' whose content they scraped. Here's how this defense works—and how publishers can eliminate it.

In the high-stakes copyright battles between publishers and AI companies, one defense has emerged as particularly effective: "We didn't know it was yours."

󠇟󠇠󠇡󠇢󠆧󠅗󠄹︁󠅶󠆰󠄽󠇄󠆮󠅧󠆺󠇋󠄉󠇬︋󠅃󠇑︇󠄥󠇧󠄑󠄪󠅡󠆃︅󠅄󠅤󠄴󠅯󠆚󠇃󠇋󠆣󠅧󠄇󠆉󠇮󠆇︅󠅕This isn't just a rhetorical strategy. 󠇟󠇠󠇡󠇢󠆋󠄪󠆡󠅟󠄢󠄹󠄰󠄥󠆡󠅏󠅉󠅁󠆷󠇆︋󠆶󠅬󠆵󠆋󠅙󠅾󠄃󠅣󠄰󠅂󠅉󠇜󠇃󠅌󠇜󠄽󠅒︆󠅒󠇘󠆳󠆄󠅣󠄔󠅹It's a legal doctrine with real teeth—and it's costing publishers billions in potential damages. 󠇟󠇠󠇡󠇢󠆲󠇚󠅐󠄡󠄆󠄾󠄺󠅟󠅳󠄦󠅈󠅆󠄓󠅅󠇝󠅫󠆁󠆄󠄤󠄜󠇠󠆛︆󠇢󠅢󠆔󠆷󠇘󠄯󠆑󠅁󠄢󠆴󠆐󠄻󠆩︄󠆌󠆎󠅩Understanding how this defense works is the first step to defeating it.

󠇟󠇠󠇡󠇢󠇭󠇤󠄓󠄭󠄧󠇀󠄳󠄯󠅰󠇜󠄗󠅯󠆫󠄙󠅸󠄗󠇖󠇅󠅯󠅌󠄮󠅟󠆽󠅞󠄹󠅑󠄼󠇌󠆦󠆪󠆮󠅇󠅺󠄚󠇤󠄀︄󠄭󠄦󠅡The Innocent Infringement Doctrine

Under U.S. copyright law, there's an important distinction between types of infringement:

Innocent Infringement

When an infringer didn't know, and had no reason to know, that their actions constituted copyright infringement, they may qualify for "innocent infringer" status. This can significantly reduce damages.

17 U.S.C. § 504(c)(2):

"In a case where the infringer sustains the burden of proving... that such infringer was not aware and had no reason to believe that his or her acts constituted an infringement of copyright, the court in its discretion may reduce the award of statutory damages to a sum of not less than $200."

󠇟󠇠󠇡󠇢󠆻󠇒󠅮󠆋󠄞󠅆󠄸󠄼󠆏󠇇󠅙󠄺󠅰󠄒󠇯󠄗󠆴󠅢󠄑󠅕󠇨󠆟󠅉󠇋󠄉󠅭󠆛󠅐︆󠅱󠅥󠆪󠆽󠄥󠇦󠆃󠅗︃󠆰󠅈Compare this to willful infringement, where statutory damages can reach $150,000 per work.
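To make the gap concrete, here is a rough sketch of the per-work arithmetic. The $200 floor and $150,000 ceiling are the figures cited above; the ordinary $750 to $30,000 range comes from 17 U.S.C. § 504(c)(1). The work count is a hypothetical input, and a court sets the actual award per work, so the totals are bounds rather than predictions.

```python
# Per-work statutory damages bounds under 17 U.S.C. § 504(c), in dollars.
# Each entry is (minimum, maximum) a court may award per infringed work.
RANGES = {
    "innocent": (200, 30_000),   # floor may be reduced to $200 under § 504(c)(2)
    "ordinary": (750, 30_000),   # default range under § 504(c)(1)
    "willful": (750, 150_000),   # ceiling may be raised to $150,000 under § 504(c)(2)
}


def exposure(works: int, finding: str) -> tuple[int, int]:
    """Return (low, high) total statutory damages for `works` infringed works."""
    low, high = RANGES[finding]
    return works * low, works * high


if __name__ == "__main__":
    works = 1_000  # hypothetical number of works at issue
    for finding in RANGES:
        low, high = exposure(works, finding)
        print(f"{finding:>8}: ${low:,} to ${high:,}")
```

The same set of works spans very different totals depending only on which finding applies, which is why the knowledge question carries so much weight.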

Why It Matters for AI

AI companies have built their defense around this doctrine:

  1. "We scraped the open web": content was publicly accessible
  2. "We processed billions of documents": scale makes specific knowledge impossible
  3. "We didn't know it was copyrighted": no clear ownership signals
  4. "We didn't know it was yours": content was mixed with millions of other sources

This defense is particularly powerful because it's often technically true. When you're processing petabytes of data from Common Crawl, you genuinely don't know which specific articles came from which publishers.

How the Defense Plays Out in Court

󠇟󠇠󠇡󠇢󠇉󠄼󠅟󠄕󠆂󠄴󠄱󠇚󠆫︇︄󠆢󠆐󠄂󠅴󠅙󠆄󠇀󠅚󠅕󠇙󠄞󠆟󠄙󠅉󠆘󠄕󠇪󠄪󠄚󠅤󠆓󠆷󠇞󠅠︀󠇥󠄣︅󠅾The New York Times vs. OpenAI

In the landmark NYT vs. OpenAI case, discovery exposed a critical problem. According to a court filing by the Times's lawyers, OpenAI engineers accidentally erased search data that the Times's legal and expert team had spent more than 150 hours compiling, data that could have shown how Times articles were used in training. Without this kind of evidence, proving that OpenAI specifically knew it was using NYT content, and knew that use was infringing, becomes extraordinarily difficult.

Thomson Reuters vs. ROSS Intelligence

The February 2025 Thomson Reuters decision was notable because the court rejected fair use, but the case involved a system that returned existing content rather than generating new material. The evidence chain was clearer. For generative AI, where training data is transformed into model weights, the evidentiary challenge is much harder.

The Pattern

Across AI copyright litigation, the pattern is consistent:

  1. Publisher: "You used our content to train your model"
  2. AI Company: "Prove it. Show us exactly which content, when it was ingested, and how it was used"
  3. Publisher: [Faces expensive discovery with uncertain outcomes]
  4. AI Company: "Even if you prove we had your content, we didn't know it was yours among billions of documents"

󠇟󠇠󠇡󠇢󠆨󠄐󠅁󠄖󠇐󠅷󠄴󠄃󠅲󠄎󠆌󠇞︍󠇄󠆃󠄎󠆝󠅏󠄙󠇍󠄿󠄙󠆆󠅠󠆺󠄩󠄶󠅯󠇥󠄅󠇐󠄙󠇔󠄻󠇀󠆚󠇦󠄹󠆧󠇍Why Traditional Evidence Fails

No Technical Proof of Ownership

Text on the open web carries no inherent proof of ownership. 󠇟󠇠󠇡󠇢󠇨󠅶󠅀󠄾︌󠄿󠄾󠄐󠆪󠄓󠆃󠄅󠆠󠄝󠇈󠇤󠄒󠇭︍󠅃󠅐󠄖󠄐︆󠇅󠄪󠆖󠄩󠅺󠅩󠄶󠄄󠇙󠆽󠇯󠅨󠅡󠄒󠇘󠆻A copyright notice at the bottom of a page:

  • Gets stripped during data processing
  • Isn't linked to specific content
  • Doesn't travel with the text when copied

Metadata Doesn't Survive

HTML metadata, author tags, and publication information are routinely removed when content is:

  • Scraped into datasets
  • Processed through cleaning pipelines
  • Converted to training format
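A minimal sketch of why this happens, using only Python's standard `html.parser`. The cleaning rules here (drop `<head>`, `<script>`, `<style>`, and `<footer>`) and the sample page are assumptions chosen for illustration; real pipelines differ, but any extractor that keeps only body text behaves similarly.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect visible body text, skipping head, script, style, and footer."""

    SKIP = {"head", "script", "style", "footer"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many skipped elements we are currently inside
        self.chunks = []    # text fragments that survive cleaning

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return only the text a typical cleaning step would keep."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: author and copyright live in metadata and the footer.
PAGE = """<html><head>
<meta name="author" content="Jane Reporter">
<meta name="copyright" content="Example Gazette">
<title>Council vote</title></head>
<body><article><p>The council voted 5-2 on Tuesday.</p></article>
<footer>Copyright Example Gazette. All rights reserved.</footer>
</body></html>"""

if __name__ == "__main__":
    print(extract_text(PAGE))
```

The article sentence survives, while the author tag, the copyright tag, and the footer notice never reach the output.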

Scale Creates Plausible Deniability

When your content is one article among billions, the "we didn't know" defense is credible:

  • No human reviewed individual documents
  • Automated systems don't parse copyright notices
  • Specific knowledge is genuinely impossible at scale

󠇟󠇠󠇡󠇢︈󠇥󠅋󠅪󠄬󠆂󠄻󠇘󠅱󠇃󠆊󠄾󠄷󠅹󠆹󠆀󠅆󠄤︉󠆧󠄦󠄡󠆶󠅩󠄏󠇮󠅈︄󠆧󠄦󠅟󠄿󠆇󠆍󠆀󠅻󠆅󠅱󠇃󠅞The Notification Gap

The key weakness in the "we didn't know" defense is notification. If a publisher formally notifies an AI company that specific content is copyrighted and shouldn't be used, continued use becomes much harder to defend as "innocent."

󠇟󠇠󠇡󠇢󠆗󠇗󠄢󠆡󠄕󠄗󠄿󠇛󠅻󠆬󠇠󠅊󠄗󠄺󠅘󠇮󠇦󠆭󠄊︂󠆺󠄺󠇩󠄎󠄢󠇀󠅒󠅠󠆠󠇇󠆊󠆸󠇏󠇎󠄗󠅪󠄗󠄾󠅘󠆟But traditional notification has problems:

Notification Without Proof

Sending a letter saying "our content is copyrighted" doesn't prove:

  • Which specific content is yours
  • That the AI company actually has it
  • That they can identify it in their systems

󠇟󠇠󠇡󠇢󠇅󠇒󠆦󠆻󠅒󠇚󠄲󠇌󠅿󠆉󠅎󠅝󠇡󠅾󠇤󠅮󠆘󠇢󠇚󠆷󠆨󠄶󠄌󠆱󠇆︅󠄣󠅂󠇁󠄰󠅥󠄁󠆓󠅔󠄺󠇎󠅎󠇮󠇪󠆆No Technical Mechanism

There's no standard way to:

  • Mark content as belonging to a specific publisher
  • Enable AI companies to detect marked content
  • Verify that notification was received and actionable

󠇟󠇠󠇡󠇢󠇮󠅃󠆨󠆄󠆚󠆢󠄿󠄽󠆠󠅛󠆺󠅔󠇕󠄔󠆝󠅐󠇎󠄄󠇁󠆥󠄜󠇐︌󠅚󠅖󠅡󠆊󠅱󠆆󠆫󠆥󠇯󠅚󠄸󠄘󠇆󠆴󠄯󠇁󠅫The Response

AI companies can respond to traditional notification with:

"Thank you for your letter. We have no way to identify your specific content among our training data. Please provide the exact URLs, timestamps, and content that you believe we have. Even then, we cannot 'unlearn' content that has already been used in training."

The Provenance Solution

Cryptographic content provenance changes this equation entirely.

How It Works

  1. Publisher embeds a cryptographic signature in content at publication
  2. Signature travels with the content through any distribution or scraping
  3. Publisher formally notifies the AI company: "Our content carries these signatures. You can verify ownership via our public API."
  4. AI company can now detect marked content in its pipeline
  5. Continued unauthorized use is no longer "innocent"
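The steps above can be sketched in a few lines of Python. This is one possible illustration, not a description of any particular product: it assumes the marker is carried as invisible Unicode variation selectors appended to the text, and it uses HMAC-SHA256 from the standard library as a stand-in for a real signature. HMAC is a keyed hash, so only the key holder can verify; a production system would more likely use a public-key signature. The function names and the demo key are hypothetical.

```python
import hashlib
import hmac


def _byte_to_vs(b: int) -> str:
    """Map a byte to one of the 256 Unicode variation selectors."""
    return chr(0xFE00 + b) if b < 16 else chr(0xE0100 + b - 16)


def _vs_to_byte(ch: str):
    """Inverse of _byte_to_vs; returns None for ordinary characters."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None


def _tag(text: str, key: bytes) -> bytes:
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).digest()


def sign(text: str, key: bytes) -> str:
    """Append an invisible 32-character marker encoding the tag of `text`."""
    return text + "".join(_byte_to_vs(b) for b in _tag(text, key))


def split(marked: str) -> tuple[str, bytes]:
    """Separate visible text from marker bytes.

    Limitation: text that already contains variation selectors
    (for example some emoji sequences) would be misread as marker data.
    """
    visible, tag = [], bytearray()
    for ch in marked:
        b = _vs_to_byte(ch)
        if b is None:
            visible.append(ch)
        else:
            tag.append(b)
    return "".join(visible), bytes(tag)


def verify(marked: str, key: bytes) -> bool:
    """True only if the embedded tag matches the visible text."""
    text, tag = split(marked)
    return hmac.compare_digest(tag, _tag(text, key))


if __name__ == "__main__":
    key = b"demo-key"  # hypothetical secret held by the publisher
    marked = sign("The council voted 5-2 on Tuesday.", key)
    print(verify(marked, key))                         # intact copy
    print(verify(marked.replace("5-2", "6-1"), key))   # altered copy
```

A marker like this rides along with copy-paste because it is part of the character stream, but a cleaning step that drops non-printing code points would remove it, which is why the survival requirement matters when evaluating any scheme.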

The Legal Transformation

Before notification + provenance:

  • Publisher: "You used our content"
  • AI Company: "We didn't know it was yours"
  • Result: 󠇟󠇠󠇡󠇢󠆮󠅀󠆮︄󠄰󠅤󠄷󠇊󠆬󠄍󠄛󠄪󠆍󠆈󠇊󠆤󠄳󠄱󠅿󠄋󠄦󠆝󠄮󠆑󠄻󠇜󠆯󠇏󠇀󠄀󠇝󠆰󠇂󠇐󠅞󠅼󠅽󠆲󠇛󠇚Innocent infringement defense viable

After notification + provenance:

  • Publisher: "You used our content. It carries our cryptographic signature. We notified you. You can verify ownership."
  • AI Company: [No viable "we didn't know" defense]
  • Result: Willful infringement territory

󠇟󠇠󠇡󠇢󠄲󠇫󠅆󠄢󠆶󠅁󠄴󠇫󠆘󠅻󠄂︊󠇤󠅰󠆔󠄮︅󠅍󠅵󠄯󠆴󠄚󠆓︂󠆠︊󠆪󠅔󠆍󠆭󠅝󠅋󠇥󠇑󠆸󠄑󠇌󠅬󠇀󠅢Why This Changes Everything

The shift from "we didn't know" to "you ignored our notice" transforms:

| Aspect | Before | After |
| --- | --- | --- |
| Burden of proof | Publisher must prove AI company knew | AI company must explain why they ignored notice |
| Damages | Reduced (innocent infringer) | Enhanced (willful infringement) |
| Discovery | Expensive, uncertain | Cryptographic evidence ready |
| Settlement leverage | Weak | Strong |

󠇟󠇠󠇡󠇢󠅡󠅮󠅧󠄴󠇅󠅘󠄺󠇞󠆡󠆌󠅼󠄽󠇘󠇡󠆲󠅳󠇛󠅁󠇑󠄥󠄠󠄵󠅏󠆴󠄬󠆳󠄩󠇤󠄮︋󠄺󠅘︅︅󠄑󠅝󠅰󠅅󠇇󠅪The Three-Step Framework

For publishers looking to eliminate the "we didn't know" defense:

Step 1: Implement Provenance

Embed cryptographic proof of origin into your content. This creates the technical foundation for everything else.

Requirements:

  • Signatures must be cryptographically verifiable
  • Signatures must survive copy-paste and distribution
  • Verification must be publicly accessible

Step 2: Serve Formal Notice

Notify AI companies that your content is marked and provide verification mechanisms. 󠇟󠇠󠇡󠇢󠄿󠆖󠆭󠇂󠄲󠄲󠄱󠇆󠆘︀󠄁󠅆󠇂󠄫󠇍︉󠄰󠇑󠅉󠄋󠅵󠄟󠇦󠅼󠇧󠇝󠄃󠇂󠆔󠆆󠄜󠇓󠄳󠄭󠅒󠄠󠄮󠇕󠇧󠇆The notice should include:

  • Statement that your content carries cryptographic signatures
  • URL or API endpoint for verification
  • Clear statement that unauthorized use after notification is not innocent
  • Request for confirmation of receipt

Step 3: Document Everything

Maintain records of:

  • When content was marked
  • When notices were sent
  • Any responses received
  • Evidence of continued unauthorized use
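One way to keep such records tamper-evident is a hash chain, where each entry commits to the one before it. The sketch below is an assumption about how a publisher might do this, not an established practice; the field names, event kinds, and timestamps are hypothetical. A chain shows that entries were not edited after the fact, but it does not by itself prove when they were written, so evidentiary weight remains a question for counsel.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry
FIELDS = ("timestamp", "kind", "detail", "prev")


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_event(log: list, timestamp: str, kind: str, detail: str) -> dict:
    """Append an entry whose hash covers its content and the previous hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"timestamp": timestamp, "kind": kind, "detail": detail, "prev": prev}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def chain_is_intact(log: list) -> bool:
    """Recompute every hash and check each entry links to its predecessor."""
    prev = GENESIS
    for entry in log:
        body = {k: entry[k] for k in FIELDS}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    # Hypothetical events covering the four record types listed above.
    append_event(log, "2025-01-10T09:00:00Z", "content_marked", "article 1234 signed")
    append_event(log, "2025-02-01T14:30:00Z", "notice_sent", "formal notice to vendor")
    append_event(log, "2025-02-15T08:00:00Z", "response", "receipt acknowledged")
    append_event(log, "2025-03-20T11:45:00Z", "continued_use", "marked text observed")
    print(chain_is_intact(log))
```

Editing any earlier entry changes its hash and breaks every link after it, so a later alteration is detectable by rerunning `chain_is_intact`.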

The Industry Response

AI companies are adapting to this new reality:

OpenAI Joins C2PA

OpenAI joined the Coalition for Content Provenance and Authenticity in 2024, signaling recognition that provenance infrastructure is coming.

Google's SynthID

Google is implementing watermarking for AI-generated content and exploring provenance for training data.

Licensing Deals

Major AI companies are signing licensing deals with publishers (AP, News Corp, Axel Springer), partly to avoid the legal exposure that comes with provenance-enabled enforcement.

What Publishers Should Do Now

Immediate Actions

  1. Evaluate provenance solutions that can mark your content
  2. Consult legal counsel on notification strategies
  3. Document your content with timestamps and archives

Medium-Term Strategy

  1. Implement cryptographic provenance across your content library
  2. Develop formal notification templates for AI companies
  3. Build verification infrastructure that AI companies can access

Long-Term Position

  1. 󠇟󠇠󠇡󠇢󠅢󠄈󠇎󠆝󠆄󠇃󠄼󠅄󠆬󠅰󠄰󠆶󠇕󠄍󠆛󠄨󠄏󠅔󠄑󠅜󠆈󠆭󠅀󠆷󠄫󠅨󠇁󠅙󠆋󠅨󠅫󠆕󠄉󠅭󠄿󠅉󠆢󠇁󠇫󠄯Establish licensing frameworks for AI training use
  2. 󠇟󠇠󠇡󠇢󠅸󠇃󠆠󠆋󠆐󠄌󠄰󠆖󠆋󠇯󠅭󠆱󠅾󠇍󠅵󠇍󠅕󠄪󠇞󠄴󠄴󠆢󠅄󠅝󠆜󠅸󠄸󠆟󠅭󠇘󠇎󠄉︍︂󠆝󠄦︆󠅣󠄺󠇩Join industry coalitions working on content attribution
  3. Prepare for litigation with provenance-backed evidence

The Endgame

The "we didn't know" defense has been effective because it's been true. AI companies genuinely couldn't identify specific content among billions of documents.

Cryptographic provenance makes that defense obsolete. When content carries proof of origin that survives any transformation, and publishers formally notify AI companies of that proof, "we didn't know" becomes "you chose to ignore."

That's not innocent infringement. That's willful.

And willful infringement changes everything, from damages calculations to settlement negotiations to the fundamental economics of AI training.

The publishers who implement provenance first will define how this new landscape works. The ones who wait will accept terms others negotiated.

Learn more about eliminating the "we didn't know" defense: encypherai.com/publisher-demo

#Copyright #AILitigation #FairUse #LegalStrategy #ContentProvenance󠇟󠇠󠇡󠇢󠄵󠆶󠄣󠇝󠄊󠄎󠄵󠅭󠆎󠆦󠄭󠆳󠄦︅󠄉󠇫󠄽󠅜󠆮󠇧󠇬󠄯󠄞󠇢󠇜󠄠󠅞󠇧󠅢󠆐󠇃󠅗󠄕󠅯󠇖󠇠󠆏󠄪󠆘󠅣