
The Machine-Readable Opt-Out Standard Publishers Are Missing
Machine-readable opt-outs are now the only valid AI training reservation. A German appellate ruling and EU enforcement confirm plain-text copyright notices protect nothing.
A photographer named Robert Kneschke posted his images online with clear copyright notices in his Terms of Service. LAION, the nonprofit that built one of the largest open AI training datasets, scraped those images anyway. Kneschke sued. In December 2025, the Hamburg Higher Regional Court ruled that his plain-text copyright reservation - the kind that virtually every publisher on the internet relies on - did not meet the legal standard for a valid opt-out under EU copyright law. The court held that a reservation of rights must be machine-readable to be enforceable. The European Commission is now building its AI Act enforcement framework on the same principle - and every day a publisher delays implementing machine-readable signals is a day in which more of their content becomes permanently available for AI training.
This post discusses legal developments for informational purposes only and does not constitute legal advice. Encypher is a technology company, not a law firm. Consult qualified legal counsel for advice specific to your situation.
What the Hamburg Court Actually Ruled
The case - OLG Hamburg, 5 U 104/24, decided 10 December 2025 - turned on Article 4(3) of the EU Digital Single Market Directive, which allows rightsholders to reserve their rights against text-and-data mining. The question was what form that reservation must take. The court ruled that natural-language copyright notices - Terms of Service, website footers, text-based statements - are insufficient. The reservation must be expressed in a form that automated systems can detect and act on without human interpretation.
The ruling went further than many observers expected. Legal analysis from Bird & Bird confirmed that the court did not merely prefer machine-readable signals. It treated them as the only valid form of opt-out under EU law. A publisher whose sole protection is a copyright notice in prose - which describes the vast majority of publishers online today - has no enforceable reservation against AI training.
The Kluwer Copyright Blog analysis identified a second requirement that raises the bar further. The court's reasoning implies that opt-outs must be not only machine-readable but actionable - meaning an automated process must be able to use the signal to block text-and-data mining operations. A machine-readable tag that no scraping tool actually checks would satisfy the letter of the standard but not its purpose. The dual standard - readable and actionable - means publishers need signals that AI training pipelines will recognize and respect.
The ruling also contains a temporal dimension that most coverage has underemphasized. The court applied a time-of-use analysis: content scraped before a publisher implements machine-readable opt-outs remains lawfully available for AI training even if the publisher adds proper signals later. Major AI training datasets compiled between 2019 and 2023 - Common Crawl, LAION-5B, The Pile - ingested content protected only by text-based copyright notices. Under the Hamburg court's reasoning, that content is permanently unprotected. The retroactivity problem cannot be fixed after the fact. The only question is how much more content enters that unprotected window before publishers act.
The EU Regulatory Machine Is Codifying This Standard
The Hamburg ruling is a German appellate decision, not binding across the EU. A further appeal to the German Federal Court of Justice has been allowed. But the ruling does not stand alone. The European Commission is independently converging on the same standard through its AI Act enforcement machinery.
Article 53(1)(c) of the EU AI Act requires providers of general-purpose AI models to put in place a copyright policy "to identify and comply with, including through state-of-the-art technologies, a reservation of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790." Enforcement by the AI Office begins 2 August 2026. The operative language - "state-of-the-art technologies" - mirrors the Hamburg court's insistence on machine-readability. Text-based notices are not state-of-the-art technology for expressing rights reservations.
In December 2025, the Commission launched a stakeholder consultation to identify machine-readable opt-out protocols that are "technically implementable and widely adopted." The consultation closed in January 2026. The resulting list will define what the AI Office considers compliant when it begins enforcing Article 53 in August. The GPAI Code of Practice will reference this list. The window to influence what protocols appear on it has already closed.
The regulatory and judicial systems are converging on the same principle. Whether the Commission's final protocol list adopts existing standards like C2PA content credentials or produces new ones, the requirement itself is settled: opt-out signals must be machine-readable and technically implementable.
The Counterargument and Where It Falls Short
The strongest objection to urgency is jurisdictional. The Hamburg ruling is one appellate court in one member state. Other courts may interpret Article 4(3) differently. The appeal to the German Federal Court of Justice could narrow or reverse the holding. Publishers in France, the Netherlands, or Spain are not bound by a Hamburg court's interpretation of the DSM Directive.
This objection is technically correct but does not change the practical calculus. The Commission's consultation is not contingent on the Hamburg ruling. Article 53's machine-readability requirement exists in the enacted statute. The judicial and regulatory tracks are arriving at the same destination independently. A publisher who waits for pan-European judicial consensus will wait years - and every month of waiting is a month in which more content enters training datasets without protection.
The second objection is that robots.txt already provides a sufficient machine-readable opt-out for web content. The GPAI Code of Practice does include robots.txt compliance as a commitment from signatory AI providers. But robots.txt has three structural limitations that prevent it from serving as a general opt-out mechanism. First, it only covers web crawling. Content distributed through RSS feeds, APIs, syndication partnerships, email newsletters, or any non-web channel is invisible to robots.txt. Second, robots.txt carries no licensing terms. It can say "do not crawl" but it cannot say "you may train on this content if you provide attribution and pay the specified rate." Third, robots.txt is a server-level directive that does not travel with content. Once an article is syndicated, copied, or cached, the robots.txt signal from the original server no longer applies. A machine-readable signal that protects content only on the publisher's own domain and only through one distribution channel is not adequate infrastructure for a multi-channel content economy.
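The scope limitation is easy to see in practice. The sketch below uses Python's standard urllib.robotparser with an illustrative robots.txt and a hypothetical crawler name ("ExampleAIBot"): the only question the protocol can answer is whether a given agent may fetch a given URL on this server. There is no field for licensing terms, and the rules do not follow the content to another domain.

```python
from urllib.robotparser import RobotFileParser

# An illustrative robots.txt blocking a hypothetical AI training crawler.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The only question robots.txt can answer: may this agent fetch this URL?
print(parser.can_fetch("ExampleAIBot", "https://publisher.example/article"))  # False
print(parser.can_fetch("SearchBot", "https://publisher.example/article"))     # True

# There is no vocabulary for terms like "training permitted with attribution
# at a specified rate" - and once the article is syndicated or cached on
# another domain, this file's rules no longer apply to the copy.
```

The protocol's entire expressive range is allow/disallow per path per agent, which is why it cannot carry the licensing terms or cross-channel persistence the post describes.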
The third objection concerns retroactivity. Future AI models will be retrained, and publishers can protect their content in those future training runs. This is true, but the protection arrives more slowly than it appears. Major model retraining cycles run months to years. More importantly, the Hamburg ruling assesses legality at the time of scraping, not at the time of model deployment. Content that was unprotected when scraped remains available for training regardless of later opt-outs. The retraining cycle does not undo the retroactivity problem. It limits it to one model generation.
Content Provenance as the Technical Answer
The legal and regulatory picture converges on a specific technical requirement: rights signals that are machine-readable, that express granular licensing terms, and that travel with the content itself across distribution channels. This is what content provenance infrastructure provides.
A C2PA authentication manifest embedded at publication time creates a cryptographically signed, tamper-evident record of authorship, licensing terms, and permitted downstream uses. That record is bound to the document. It persists through crawling, indexing, syndication, and storage. When a C2PA-signed document enters an AI training pipeline, the manifest is present at the point where the system decides whether to include or exclude the content. The signal does not depend on the training operator checking a separate database or respecting a server-level directive - it travels with the content itself.
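The core properties - a rights record cryptographically bound to the content, with any tampering detectable - can be sketched in a few lines. This is a conceptual illustration only: real C2PA manifests use CBOR/JUMBF containers and X.509 certificate signatures, not JSON and a shared-secret HMAC, and the field names here are invented for the demo.

```python
import hashlib
import hmac
import json

# Demo key only; a real signer would use certificate-based signatures.
SIGNING_KEY = b"publisher-demo-key"

def sign_manifest(content: str, license_terms: dict) -> dict:
    """Bind a rights record to the content via its hash, then sign the record."""
    manifest = {
        "content_sha256": hashlib.sha256(content.encode()).hexdigest(),
        "license": license_terms,
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return manifest

def verify(content: str, manifest: dict) -> bool:
    """Reject if either the content or the rights record was altered."""
    claimed = dict(manifest)
    sig = claimed.pop("signature")
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # rights record was modified
    return claimed["content_sha256"] == hashlib.sha256(content.encode()).hexdigest()

article = "Original article text."
m = sign_manifest(article, {"tdm_reservation": True, "training": "prohibited"})
print(verify(article, m))         # True
print(verify("Edited text.", m))  # False: tampering is detectable
```

Because the manifest references the content by hash and is itself signed, an ingestion pipeline can check the reservation wherever the document ends up, without consulting the origin server.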
We co-authored Section A.7 of the C2PA specification, published January 8, 2026, which defines the authentication mechanism for unstructured text - the first open standard for embedding provenance manifests into text documents. The standard was developed with review from Google, OpenAI, Adobe, Microsoft, the New York Times, BBC, and AP through the C2PA consortium. A C2PA manifest operates at the document level: one signed credential per document, covering authorship, licensing, and permitted uses. Encypher's proprietary technology extends this further by enabling sentence-level attribution - binding provenance markers to individual text segments so that rights information survives even when content is excerpted, quoted, or partially reproduced. A paragraph copied from a signed article into a research report carries its provenance with it. A sentence retrieved by a RAG system retains its rights signal at the point of retrieval.
The Hamburg court required signals that are not only machine-readable but actionable - and sentence-level attribution operates at the granularity where AI systems actually consume content. Training pipelines and retrieval indexes do not process whole documents. They process chunks, passages, and extracted segments. A document-level manifest satisfies the machine-readability requirement. Sentence-level attribution satisfies the actionability requirement at the point where content is actually ingested.
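The difference between document-level and segment-level signals shows up as soon as text is chunked. The sketch below is illustrative, not Encypher's actual encoding mechanism: a naive splitter stands in for a retrieval pipeline's chunker, and the point is simply that metadata on the wrapper object does not follow the raw text unless it is bound to each segment.

```python
# Illustrative only: a stand-in for how a retrieval index chunks text.
document = {
    "text": "First finding. Second finding. Third finding.",
    "rights": {"tdm_reservation": True},  # document-level metadata
}

# Ingestion pipelines typically consume raw text, not the wrapper object,
# so the document-level rights field is lost at this step.
chunks = [s.strip() + "." for s in document["text"].split(".") if s.strip()]
print(chunks[1])  # "Second finding." - no rights metadata attached

# Sentence-level attribution: bind the rights record to each segment,
# so any excerpt still carries its reservation at the point of retrieval.
tagged = [{"text": c, "rights": document["rights"]} for c in chunks]
excerpt = tagged[1]
print(excerpt["rights"])  # the reservation travels with the excerpt
```

In this toy form the binding is just a dictionary field; the argument in the text is that a production system needs the equivalent binding to survive copy-paste and excerpting, which is what embedding markers in the text itself provides.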
What This Means for Publishers
The August 2026 enforcement date for Article 53 is widely understood as a deadline for AI companies. It is equally a deadline for publishers. AI providers will be required to identify and comply with machine-readable rights reservations. But that obligation is only meaningful if the content carries such reservations. A publisher whose content contains no machine-readable rights signal has no reservation for AI providers to comply with. The enforcement mechanism protects publishers who have implemented the infrastructure. It does nothing for those who have not.
The practical steps are concrete. Publishers should audit their current opt-out mechanisms and identify how much of their content relies solely on text-based copyright notices. They should implement machine-readable rights signals - C2PA content credentials are purpose-built for this - on new content immediately, so that at minimum all content published from this point forward carries enforceable reservations. They should evaluate whether their distribution channels preserve or strip these signals. And they should track the Commission's forthcoming protocol list, because the standards it endorses will define what "state-of-the-art" means under Article 53 for years to come.
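The audit step can be as simple as classifying each content item by the strongest rights signal it carries. The sketch below is a hedged illustration - the field names and categories are invented for the example, not a standard schema - but it captures the triage the post describes: text-only notices sort into the unprotected bucket.

```python
# Hypothetical audit triage; field names are illustrative, not a standard.
def classify(item: dict) -> str:
    """Return the strongest rights signal a content item carries."""
    if item.get("c2pa_manifest"):
        return "machine-readable, travels with content"
    if item.get("robots_txt_blocked"):
        return "machine-readable, server-level only"
    if item.get("tos_notice"):
        return "text-only: no enforceable reservation"
    return "no reservation at all"

inventory = [
    {"url": "/archive/2021/article", "tos_notice": True},
    {"url": "/news/today", "robots_txt_blocked": True, "tos_notice": True},
    {"url": "/signed/feature", "c2pa_manifest": True},
]
for item in inventory:
    print(item["url"], "->", classify(item))
```

Running this kind of triage over a content inventory makes the gap concrete: everything that lands in the bottom two buckets is, under the Hamburg court's standard, effectively unreserved.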
The Hamburg court has drawn the line. The Commission is building enforcement around the same principle. The open question is how much content enters AI training datasets, permanently and irrevocably, before publishers implement the signals that would have protected it.