DOI: 10.5281/zenodo.18821373 (concept)
Author: Bernard Lynch | ORCID: 0009-0007-6170-7818
Publisher: AI Visibility Architecture Group Limited | NZBN: 9429053354075 | Wikidata: Q137998218
Machine Expression (SNP): https://aivisibilityarchitects.com/snp/Signal_Native_Papers_v2_0.snp.json
Archive: https://web.archive.org/web/2026/https://aivisibilityarchitects.com/signal-native-papers/
Version: 2.0 | License: CC BY-NC-ND 4.0
A Bilateral Document Architecture for AI-Era Publishing
Bernard Lynch — Founder & Principal Researcher, AI Visibility Architecture Group Limited, Auckland, New Zealand
This paper specifies the Signal-Native Paper (SNP) architecture: a bilateral document structure in which a published work exists simultaneously as a human-readable paper and a machine-readable companion, sharing a single identity. It describes the relationship between the two expressions, the thirteen elements of the machine expression, the three-tier adoption gradient, and the AI-assisted authoring process by which authors create SNPs without changing how they write.
Abstract
Every document on the web was written for humans. When an AI system needs to understand one, it must infer structure, extract entities, guess at relevance, and estimate confidence from raw text. This paper introduces Signal-Native Papers (SNP), a bilateral document architecture in which every published work carries two co-equal expressions: the human-readable text that authors have always written, and a machine-readable expression that describes the work’s scope, entities, and retrieval boundaries in structured form.
The architecture comprises thirteen elements organised in three tiers, governed by two principles. It addresses the documented failure modes of seven predecessor approaches to machine-readable publishing by keeping the paper whole, eliminating authoring burden through AI-assisted generation, and using universally consumable formats. A Tier 1 proof of concept has been implemented and validated. The full specification is maintained by AI Visibility Architecture Group Limited.
Contents
1. The Problem
   1.1 Who This Paper Is For
2. What a Signal-Native Paper Is
   2.1 The Three Architectural Options
3. The Machine Expression and Existing Site Schema
   3.1 What Site Schema Does
   3.2 What the SNP Machine Expression Does
   3.3 How They Work Together
4. The Thirteen Elements
   4.1 Tier 1 — Seed
   4.2 Tier 2 — Structured
   4.3 Tier 3 — Networked
   4.4 Governing Principles
5. How an SNP Is Created
   5.1 The Author Writes
   5.2 AI Generates the Machine Expression
   5.3 The Author Ratifies
6. Publication and Discovery
   6.1 Where the Machine Expression Lives
   6.2 How AI Systems Find and Use It
7. What This Changes
   7.1 For RAG Systems
   7.2 For Authors
   7.3 For Consuming AI Systems
   7.4 For Knowledge Accuracy
8. The Discovery Resolution Problem
   8.1 Mechanism 1: The Canonical Website as AI Entry Point
   8.2 Mechanism 2: Identity Matching Across the Web
   8.3 Mechanism 3: DOI as Resolution Bridge
   8.4 Mechanism 4: In-Paper Pointer
   8.5 Mechanism 5: The Signal Mesh Beacon
   8.6 The Resolution Architecture in Practice
9. What This Specification Is Not
10. Prior Art and Differentiation
11. Proof of Concept
   11.1 Tier 1 Example Fragment
   11.2 Quality Safeguards
   11.3 Implementation Status
Appendix A: Critical Review and Author Response
1. The Problem
Every document on the internet was written for humans. When an AI system needs to understand that document, it must work backwards from human prose to extract structure, meaning, intent, and reliability. This extraction process is lossy, expensive, and unreliable.
The central claim of this specification is testable: a document that describes itself to AI systems in structured form is more accurately comprehended, more reliably retrieved, and more honestly represented than one that does not. If that claim is true, every consequence described in this paper follows from it. If it is false, no amount of architectural elegance saves the specification. Everything in this document serves that central claim.
The Retrieval-Augmented Generation (RAG) industry exists in part because documents provide no structural help to the systems consuming them. A RAG pipeline takes a document, splits it into chunks of arbitrary length, converts those chunks into vector embeddings, stores them in a database, then at query time searches those embeddings for semantic similarity to the user’s question. This process is blind to the document’s own structure. The system doesn’t know which chunks are conclusions versus methodology. It doesn’t know whether the document answers the query directly or only tangentially. It doesn’t know if the data is from 2019 or 2025. It guesses. Embeddings, vector search, and ranking systems will continue to operate regardless — they are core infrastructure. The problem is not that these systems exist. The problem is that documents give them nothing to work with except raw text.
Meanwhile, the document sits there containing everything the AI system needs to know — what it argues, how confident the author is, who it’s written for, when its data expires, what it doesn’t cover. All of this is encoded in human prose that the AI must decode. Every AI system that encounters the document repeats the same decoding work independently, and each one decodes it slightly differently.
The Signal-Native Paper addresses this from the document side by making the document bilingual from birth. The author’s paper is written for humans. A parallel machine expression is generated for AI systems. Both exist at the same location, under the same identifier, as two expressions of the same work.
1.1 Who This Paper Is For
This paper addresses four audiences. Authors and publishers who want their work to be accurately represented by AI systems should read it for the bilateral architecture and the three-tier adoption gradient. AI retrieval and infrastructure engineers building RAG pipelines, search systems, or knowledge graphs should read it for the retrieval contract, entity registry, and machine expression format. Researchers in semantic publishing, knowledge representation, and information science should read it for the prior art differentiation and the architectural choices that distinguish the SNP from nanopublications, micropublications, ORKG, and executable papers. Organisations evaluating AI Visibility Architecture should read it as evidence that the methodology extends from site-level schema through canonical anchoring to document-level structural signalling. Each audience will find different sections most relevant, but the specification is designed to be read as a complete architectural argument.
2. What a Signal-Native Paper Is
A Signal-Native Paper is a published work that exists as a single intellectual object with two co-equal native expressions. The human expression is the author’s paper — the prose, the argument, the tables, the citations, exactly as the author wrote them. The machine expression is a structured JSON-LD document that describes the work in terms AI systems can parse directly — what the paper answers, who it’s for, what entities it references, how authoritative its claims are, and when its data expires.
The machine expression is structurally derived from the human expression but is not a rewrite of it. It does not reproduce, rephrase, or summarise the author’s prose. It describes the work’s structure, scope, entities, and claims in a parallel format that AI systems can parse directly. Neither expression is subordinate. The human expression is the author’s original work. The machine expression is a structural description of that work, generated from the prose, reviewed and ratified by the author. Together they give an AI system the fullest possible understanding of the work. Apart, each still functions — the paper is still a paper, the machine expression still tells AI systems what the paper answers. The value is additive, not conditional.
The two expressions do not need to be physically co-located. The machine expression may live on the author’s website. The human expression may live on Zenodo. They are connected by shared identity — the same DOI, the same author, the same version — not by proximity. The web is the container.
The machine expression does not rewrite, rephrase, summarise, or alter the human text in any way. Not a word. Not a comma. The author’s paper goes in, the author’s paper comes out unchanged. What comes out alongside it is a structured description that makes the paper legible to AI systems without modifying what the paper says.
2.1 The Three Architectural Options
Three approaches exist for relating human-readable and machine-readable content. Understanding all three clarifies why the SNP takes the approach it does.
Embedded. Machine-readable signals are baked directly into the document itself — structured data in the HTML, semantic markup, metadata headers. This makes the document self-contained. Every copy on every platform carries everything it needs. But it is frozen at the moment of publication. The machine layer cannot evolve independently of the human text. Durability at the cost of adaptability.
Companion. A separate machine-readable file points to the document it describes. This provides depth and evolution. The companion file can be updated, enriched, and versioned independently. In a human distribution context, this introduces fragility — someone could download the paper without the companion. But in an AI search context, this fragility disappears. AI systems crawl the web. If the machine expression exists anywhere on the web and declares which paper it describes, AI systems will find both. Physical separation is a human problem, not a machine problem.
Bilateral. The paper and the companion are not two separate things but two expressions of the same thing. They share an identity — same DOI anchor, same versioning, same entity relationships. AI systems that encounter either expression understand they have found two faces of the same object.
The SNP specification adopts the bilateral principle. In practice, for AI search, this is implemented as a JSON-LD file published on the web — typically on the author’s website, where it serves as the AI-facing beacon for the work. The human expression may live elsewhere: on Zenodo, in a repository, on a publisher’s platform. The two expressions do not need to be at the same location. They share an identity, not an address.
The structural identity flows from the human expression to the machine expression, not the reverse. The human paper is published first. It receives its canonical URL and DOI. The machine expression is then generated, and it references the human paper’s identifiers as its foundation. In the JSON-LD structure, the machine expression declares itself as type snp:MachineExpression and uses an snp:describes property pointing to the ScholarlyArticle — the human paper’s identity. The machine expression’s own @id is its service URL on the author’s website. The human paper’s @id is its DOI or canonical URL. This hierarchy means an AI system reading the machine expression sees immediately that it is a structural description of a specific work, not the work itself.
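As a sketch of this identity hierarchy, the outer frame of a machine expression might look like the following. All concrete URLs, the DOI, and the snp @context URL are illustrative placeholders; only the snp:MachineExpression type, the snp:describes property, and the @id conventions are fixed by the text above.

```json
{
  "@context": {
    "@vocab": "https://schema.org/",
    "snp": "https://aivisibilityarchitects.com/snp/vocab#"
  },
  "@type": "snp:MachineExpression",
  "@id": "https://example.com/snp/example-paper.snp.json",
  "snp:describes": {
    "@type": "ScholarlyArticle",
    "@id": "https://doi.org/10.0000/example.doi",
    "name": "Example Paper Title",
    "author": { "@type": "Person", "name": "Example Author" }
  }
}
```

An AI system parsing this frame learns, before reading anything else, that the object in hand is a structural description of a specific work identified by its DOI, not the work itself.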
3. The Machine Expression and Existing Site Schema
Websites that implement structured data already publish machine-readable information. Schema.org markup, JSON-LD blocks, and meta tags describe pages, organisations, and content types. In the AI Visibility Architecture methodology, this takes the form of a 12-block schema architecture that declares site identity, organisational credentials, breadcrumb structure, and page-level metadata.
The SNP machine expression is categorically different from this site-level schema. Understanding the distinction is essential.
3.1 What Site Schema Does
Site schema describes the container. It tells AI systems what type of page this is, who published it, what organisation stands behind it, what the site’s navigation structure looks like. This information is true of every page on the site. It is infrastructure-level identity.
A 12-block schema on aivisibilityarchitects.com tells an AI system: this page exists, it’s published by AI Visibility Architecture Group Limited, Bernard Lynch is the founder, the company registration number is 9395521, the NZBN is 9429053354075. That’s the same whether the AI encounters the homepage, a blog post, or a white paper landing page.
3.2 What the SNP Machine Expression Does
The SNP machine expression describes the work. Not the page. Not the site. The specific intellectual contribution of a specific paper.
For the paper AI Search Is Here, the machine expression declares: this paper answers twelve specific questions (and explicitly does not answer seven others). It is written for six defined audiences with specific relevance notes for each. The data is current to January 2026. These particular statistics are changing rapidly and should be rechecked within six months. The structural analysis is stable for twelve months. The authority level is practitioner analysis, not peer-reviewed research, and here are four specific limitations. Six named companies are documented as casualties with exact claimed impact figures over exact periods. The paper contributes four original frameworks: a 20-category website impact taxonomy, an 8-sector industry assessment, a four-options decision framework, and a seven-phase displacement timeline.
No site schema can carry any of this. It was not designed to. Site schema and SNP machine expressions serve entirely different functions.
3.3 How They Work Together
The site schema establishes trust. It says: this source is legitimate, here is who stands behind it.
The SNP machine expression establishes utility. It says: here is exactly what this source is claiming, how reliable those claims are, and how to use them correctly.
An AI system encountering a website reads the site schema first to assess whether the source is credible. If it then encounters an SNP machine expression for a specific paper on that site, it reads the machine expression to understand what the paper contributes, whether it is relevant to the current query, and how to represent its claims accurately. The first layer is recognition. The second layer is comprehension.
The two are complementary and non-overlapping. The site schema does not need to know that SNP machine expressions exist. The SNP machine expression does not duplicate site-level identity. Each does its job.
4. The Thirteen Elements
The SNP machine expression contains up to thirteen elements organised in three tiers. Each tier is a complete, valid SNP at its level. Each provides standalone value. Authors enter at the tier that matches their capacity.
4.1 Tier 1 — Seed
Minimum viable SNP. Achievable in minutes. Provides immediate single-document value.
1. Authorship Parity Declaration. The SNP formally declares that its human and machine expressions represent the same work. The two expressions are bound together so that separating, altering, or orphaning either one is detectable. The binding uses the strongest persistent identifier available: a DOI if one exists, a canonical URL if not, or an ORCID-linked author plus unique title as fallback. The stronger the identifier, the stronger the resolution across platforms — but the architecture functions at every level. A DOI is not required. This is the definitional element. Without it, you have a paper with metadata. With it, you have an SNP.
2. Retrieval Contract. What this work answers, for whom, at what authority level, with what temporal currency, and what it does not cover. Expressed as natural-language queries the work addresses, audience roles, evidence types, temporal scope, and exclusion boundaries. Any AI system can read the retrieval contract and assess relevance without processing the paper’s text. This is the element that addresses the RAG problem from the document side — not by replacing retrieval infrastructure, but by giving it structured input instead of raw text.
3. Entity Registry. Every significant entity in the work — people, organisations, concepts, datasets, methodologies — declared once with persistent identifiers and roles. The work’s own knowledge graph, curated by the author.
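A hedged sketch of how the three Tier 1 elements might appear inside the machine expression follows. The snp: property names here are illustrative assumptions, not normative vocabulary; the full specification defines the actual terms.

```json
{
  "snp:retrievalContract": {
    "snp:answersQueries": [
      "What is a Signal-Native Paper?",
      "How do human and machine expressions share identity?"
    ],
    "snp:audience": ["authors and publishers", "RAG engineers"],
    "snp:authorityLevel": "practitioner analysis",
    "snp:temporalScope": "data current to 2026-01",
    "snp:doesNotCover": ["retrieval infrastructure design"]
  },
  "snp:entityRegistry": [
    {
      "@type": "Person",
      "name": "Bernard Lynch",
      "identifier": "https://orcid.org/0009-0007-6170-7818",
      "snp:role": "author"
    }
  ]
}
```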
4.2 Tier 2 — Structured
Full argumentative structure. Requires engagement with the machine expression. Provides verifiability and temporal intelligence.
Tier 2 includes all of Tier 1, plus:
4. Argument Topology. The complete reasoning structure of the work as a machine-traversable graph. Claims, evidence, methodological assumptions, and the dependency chains between them. The topology exposes not just what the work argues but how the argument holds together and where it is structurally vulnerable.
5. Source Verification Chain. Every claim in the argument topology links to its evidential origin through a machine-traversable path: Claim → Data Point → Dataset → Method → Date → Parameters. The argument topology maps how claims relate to each other. The verification chain maps how claims relate to the world.
6. Confidence Architecture. Every claim carries an author-declared confidence level: established fact, strong evidence, emerging evidence, informed inference, speculative projection, or hypothesis. Each level links to the evidence that justifies it. This prevents AI systems from treating all statements with equal authority.
7. Temporal Anchor Set. Every time-sensitive statement declares its time-dependency type: point-in-time measurement, directional trend, jurisdiction-and-date-specific, or time-independent. Invalidation conditions say when something breaks. Temporal anchors say what kind of relationship each statement has with time.
8. Reasoning Modality Declaration. Each claim pathway declares its reasoning type: empirical (from observed data), analytical (from logical operation), normative (from values or policy), comparative (from analogy or precedent), or projected (from model or forecast). A paper that says “revenue grew 14%” and then says “therefore the company should expand into Asia” is making two fundamentally different kinds of statement. Without modality declaration, AI systems treat both identically.
To clarify the relationship between these elements: the argument topology is structure — how the claims connect. The source verification chain is grounding — where the claims come from. The confidence architecture is author stance over that grounding — how certain the author is. The temporal anchor set is time-dependency — when the claims expire. The reasoning modality declaration is epistemological type — what kind of knowledge each claim represents. Together they give a consuming AI system a complete structural account of the work’s argument.
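Reusing the revenue example from the modality discussion, the five Tier 2 elements might combine on individual claims like this. All property names and values are illustrative assumptions; the point is that structure, grounding, stance, time-dependency, and epistemological type each attach to the claim separately.

```json
{
  "snp:claims": [
    {
      "@id": "#claim-revenue-growth",
      "snp:statement": "Revenue grew 14% year on year.",
      "snp:confidence": "established-fact",
      "snp:reasoningModality": "empirical",
      "snp:temporalAnchor": "point-in-time-measurement",
      "snp:evidence": {
        "snp:dataset": "https://example.com/data/revenue-2025",
        "snp:method": "audited financial statements",
        "snp:date": "2025-12-31"
      }
    },
    {
      "@id": "#claim-expand-asia",
      "snp:statement": "The company should expand into Asia.",
      "snp:confidence": "informed-inference",
      "snp:reasoningModality": "normative",
      "snp:dependsOn": ["#claim-revenue-growth"]
    }
  ]
}
```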
4.3 Tier 3 — Networked
Full ecosystem participation. Designed for a future where multiple SNPs exist. Value at this tier depends on adoption beyond a single author.
Tier 3 includes all of Tier 1 and Tier 2, plus:
9. Invalidation Conditions. Machine-readable conditions under which specific claims become unreliable. Not expiry dates. Testable conditions: “If the Reserve Bank base rate exceeds 5.5%, the projections in Section 4 are invalidated.” The work defines when it stops being true, claim by claim.
10. Signal Mesh Position. How this work relates to other works in the knowledge architecture. Typed relationship declarations: supersedes, extends, contradicts, replicates, shares methodology with, shares dataset with. Includes the beacon function — the work announcing its existence and scope to discovery systems. Today, these declarations are one-directional: the author declares a relationship to a non-SNP work. If adoption grows and other works also carry machine expressions, these relationships become bidirectional and machine-navigable. That is a strategic goal, not a given.
11. Version Lineage. Content-addressed version history with semantic diffs describing what changed in the argument between versions, not just what text changed. Any system encountering any version can discover every other version and understand the evolution.
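A sketch of how the three Tier 3 elements might be expressed, reusing the Reserve Bank example above. Property names, URLs, and DOIs are illustrative assumptions.

```json
{
  "snp:invalidationConditions": [
    {
      "snp:appliesTo": "#section-4-projections",
      "snp:condition": "Reserve Bank base rate exceeds 5.5%",
      "snp:effect": "projections-invalidated"
    }
  ],
  "snp:meshPosition": [
    {
      "snp:relation": "supersedes",
      "snp:target": "https://doi.org/10.0000/earlier.version"
    }
  ],
  "snp:versionLineage": {
    "snp:version": "2.0",
    "snp:previousVersion": "https://doi.org/10.0000/v1",
    "snp:semanticDiff": "Added discovery resolution mechanisms; revised tier boundaries."
  }
}
```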
4.4 Governing Principles
12. Interoperability Spine. The machine expression uses JSON-LD with Schema.org vocabulary, extended by a minimal SNP-specific vocabulary. This ensures the expression is simultaneously valid JSON parseable by any programming language, a W3C linked data standard, embeddable in HTML, indexable by search engines, ingestible by knowledge graphs, and directly readable by LLMs in context windows. No specialist tooling is required to consume an SNP.
Where established vocabulary exists that expresses what the SNP needs to say, the specification uses it rather than inventing new terms. For citation intent, CiTO vocabulary from the SPAR Ontologies (Peroni and Shotton) provides proven terms. For bibliographic resource description, FaBiO serves the same function. This is good practice, not dependency. The SNP does not require SPAR, import its ontological framework, or adopt its RDF architecture. It borrows labels where they are already understood by existing systems. Where no established vocabulary covers what the SNP needs to express — retrieval contracts, confidence architecture, temporal anchors, reasoning modality, invalidation conditions, human primacy guarantees — the SNP defines its own minimal vocabulary. The governing principle is: use what exists, invent only what must be invented.
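For example, a citation might borrow a CiTO term inside an otherwise minimal SNP context. The cito: namespace is the published SPAR namespace; the snp: context URL and the snp:citations shape are illustrative assumptions.

```json
{
  "@context": {
    "@vocab": "https://schema.org/",
    "cito": "http://purl.org/spar/cito/",
    "snp": "https://aivisibilityarchitects.com/snp/vocab#"
  },
  "snp:citations": [
    {
      "@id": "https://doi.org/10.0000/cited.work",
      "snp:intent": "cito:usesMethodIn"
    }
  ]
}
```

The citation intent term is already understood by systems that consume SPAR vocabularies; only the wrapper around it is SNP-specific.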
13. Human Primacy Guarantee. Where the human expression and the machine expression conflict, the human expression governs. The human text is the legally and intellectually authoritative version. The machine expression is the structurally authoritative version. This protects authors, prevents machine-layer manipulation, and establishes a clear resolution hierarchy. Operationally, consuming AI systems should treat any conflict between the two expressions as a machine-layer error and fall back to the human text plus site schema as the authoritative source.
5. How an SNP Is Created
Every prior attempt at machine-readable publishing failed on the same problem: authoring burden. Nanopublications required authors to write RDF triples. ORKG required authors to fill in structured templates after writing their paper. Micropublications required stand-off annotation layers. Each system asked authors to change how they write or to do significant additional work after writing. Adoption remained low in every case.
The SNP specification is honest about human behaviour. Authors will continue to write as they have always written. In Word. In Google Docs. In LaTeX. In whatever they know. They will not switch tools. They will not adopt new workflows. This is not a failure of will. It is rational behaviour. Writing is hard enough.
The SNP therefore uses an AI-assisted authoring model. The process has three steps.
5.1 The Author Writes
The author writes their paper exactly as they always have. No special tools. No special format. No awareness of the SNP specification is required during writing. The paper is a finished human document when this step is complete.
5.2 AI Generates the Machine Expression
An AI system reads the finished paper and generates the machine expression. For Tier 1, this means drafting the retrieval contract from the title, abstract, introduction, and methodology; building the entity registry from named entity recognition, DOI extraction, and persistent identifier lookups; and creating the authorship parity declaration binding both expressions.
The generation capability varies by tier:
| Tier | AI Contribution | Author Contribution | Additional Time |
|---|---|---|---|
| Tier 1: Seed | 90% — AI generates | 10% — Author reviews | Minutes |
| Tier 2: Structured | 60% — AI drafts | 40% — Author collaborates | 30–60 minutes |
| Tier 3: Networked | 40% — AI assists | 60% — Author leads | Varies by complexity |
The key insight is that the machine expression does not require the author to express anything they do not already know. The retrieval contract describes what the paper answers — the author knows this. The entity registry catalogues what the paper references — the author knows this. The argument topology maps how the claims relate — the author knows this. The AI system’s role is to read the prose and draft a structural representation of knowledge the author already possesses. The author’s role is to verify that the AI got it right.
5.3 The Author Ratifies
The author checks the generation prompt and the output. If the correct paper was provided as input and the machine expression accurately describes it, the machine expression is ratified. The snp:ratifiedByAuthor field is set to true and the snp:ratificationDate is recorded. This is a verification check, not an editorial process. The quality gate is the generation itself — the right input producing the right output.
The SNP is born at the moment of ratification — the moment the author confirms that the prompt and output are correct. The ratification flag records that this check happened. It is not a ceremony. It is a quality record.
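In the JSON-LD, the ratification record reduces to the two fields named above, plus whatever provenance note the generator adds. The snp:generator property and the date value here are illustrative assumptions.

```json
{
  "snp:ratifiedByAuthor": true,
  "snp:ratificationDate": "2026-01-15",
  "snp:generator": "AI-assisted draft from the finished human expression"
}
```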
This process preserves the core SNP principle: both expressions carry the author’s authority. The human expression was written by the author. The machine expression was generated by AI from the author’s paper and verified by the author. Both are the work. The parity is in authority, not in creation method.
6. Publication and Discovery
The SNP machine expression is published for AI discovery, not for human navigation.
6.1 Where the Machine Expression Lives
The machine expression is a JSON-LD file published at a predictable URL on the author’s website or the paper’s canonical domain. It is not linked in any navigation menu. No human needs to click on it. It exists in the structured data layer of the website, discoverable by AI systems that crawl the site, read JSON-LD, and process structured metadata.
The human expression lives wherever it lives — on Zenodo with a DOI, on a publisher’s platform, in a GitHub repository. It does not need to know the machine expression exists. The machine expression knows the human expression exists. It carries the DOI, the canonical URL, the identifying metadata. An AI system that finds the machine expression can follow those references to retrieve the human text from wherever it is published. An AI system that already has the human text indexed and then encounters the machine expression gains a structured understanding of a document it already knows. Either direction works. The bilateral principle is maintained not by co-location but by declaration.
6.2 How AI Systems Find and Use It
The direction of discovery is reversed from what most people assume. In traditional publishing, the human finds the paper, and the machine expression (if it exists) is secondary. In AI search, the machine expression is found first. The AI system reads the retrieval contract before it ever opens the paper. This is machine-first discovery — the AI system encounters the structured description of the work before encountering the work itself, assesses relevance before retrieval, and requests the human expression only when the machine expression confirms the work is worth reading.
The discovery protocol:
1. Encounter. The AI system encounters the machine expression while crawling a website, indexing a repository, or querying an API.
2. Relevance assessment. The system reads the retrieval contract and compares the paper’s declared query scope against the user’s question. This is a structured comparison against explicit queries, not vector similarity against chunked text. It is faster, cheaper, and more accurate.
3. Authority check. The system reads the authority declaration. Is this peer-reviewed research, practitioner analysis, or expert commentary? Does the authority level match what the user needs?
4. Currency check. The system reads the temporal currency signals. When was the data collected? Which elements are changing rapidly? Has the recommended recheck interval passed?
5. Selective retrieval. Only if the paper passes relevance, authority, and currency checks does the system retrieve the human text. When it does, the machine expression has already told it what the paper argues and how the argument is structured, dramatically reducing the processing burden.
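Steps 2 through 5 of the protocol amount to a single pre-retrieval filter. The following is a minimal Python sketch, not a reference implementation: the dictionary keys (answers_queries, authority_level, recheck_by) are invented stand-ins for whatever the retrieval contract vocabulary finally specifies, and a real system would score relevance rather than substring-match.

```python
from datetime import date

def should_retrieve(machine_expr: dict, query_terms: set,
                    required_authority: set, today: date) -> bool:
    """Decide from the machine expression alone whether the human
    text is worth retrieving (steps 2-5 of the discovery protocol)."""
    contract = machine_expr["retrieval_contract"]

    # Step 2: relevance -- compare the query against declared queries.
    declared = " ".join(contract["answers_queries"]).lower()
    if not any(term.lower() in declared for term in query_terms):
        return False

    # Step 3: authority -- does the declared level match the need?
    if contract["authority_level"] not in required_authority:
        return False

    # Step 4: currency -- has the recommended recheck interval passed?
    if today > date.fromisoformat(contract["recheck_by"]):
        return False

    # Step 5: all checks passed; retrieve the human expression.
    return True

expr = {
    "retrieval_contract": {
        "answers_queries": ["What is a Signal-Native Paper?"],
        "authority_level": "practitioner analysis",
        "recheck_by": "2026-07-01",
    }
}
print(should_retrieve(expr, {"signal-native"}, {"practitioner analysis"},
                      date(2026, 3, 1)))  # True: relevant, right authority, current
```

Note that every check runs against a handful of structured fields; the paper's prose is never touched unless all three gates pass.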
The machine expression is the anchor. The paper is the payload. The AI decides whether it needs the payload based on the anchor — before it processes a single word of prose. This is a discovery sequence, not an identity hierarchy. The human paper remains the work. The machine expression remains a description of that work. But in AI-mediated discovery, the description is encountered first.
7. What This Changes
7.1 For RAG Systems
Today: document → chunk → embed → store → search → rank. Every document is treated as raw text. The system has no prior knowledge of what the document contains, what it argues, or how reliable its claims are. Embeddings and vector search do the heavy lifting, but they operate without structural guidance from the document itself.
With SNP: machine expression → assess relevance → check authority → check currency → retrieve selectively. The document introduces itself before the system processes it. Chunking and embedding still occur, but the machine expression provides a structural pre-filter that improves retrieval accuracy — the system knows what the document answers before it searches the document’s text.
7.2 For Authors
No change to the writing process. The additional work at Tier 1 is minutes of review. At Tier 2, the review process itself provides value as an “argument mirror” that shows authors their own reasoning structure in a way prose never does. The machine expression is not just a publishing requirement. It is a thinking tool.
7.3 For Consuming AI Systems
An SNP looks like a regular document to systems that don’t understand it. The human expression is complete on its own. To systems that do understand it, the SNP is a structured knowledge object that reduces guessing. Graceful degradation, not all-or-nothing.
7.4 For Knowledge Accuracy
When an AI system retrieves from an SNP, it knows the author’s confidence level for each claim, the reasoning type behind each statement, the temporal context of each data point, and the explicit limitations of the work. The AI system can represent the paper honestly because the paper told it how to. Misrepresentation doesn’t disappear, but the author has given the machine every tool to avoid it.
Taken together, these changes reveal that the SNP is simultaneously several things: a practical discoverability tool for individual authors, a trust architecture for AI systems, a signal framework for knowledge networks, and a structural reform of how documents participate in their own comprehension. These are not competing identities. They are a sequence. Tier 1 delivers discoverability for a single document today. Tier 2 delivers verifiable trust architecture. Tier 3 delivers networked signal intelligence. Publishing reform is the cumulative outcome that nobody needs to vote for — it arrives as a consequence of documents that work better.
8. The Discovery Resolution Problem
A paper published through the AI Visibility Architecture methodology is distributed across up to 140 platforms. The human expression exists on Zenodo, GitHub, ORCID, ResearchGate, the W3C Community Group, IETF Datatracker, academic repositories, and dozens of other platforms. The machine expression exists in one place: the author’s website.
This creates a discovery resolution problem. An AI system that encounters the paper on platform 87 of 140 has no reason to look for a machine expression. It does not know one exists. It processes the paper the old way — chunking, embedding, guessing. The machine expression only works for AI systems that encounter it directly.
The problem is asymmetric. The machine expression knows about the human expression — it carries the DOI, the title, the author, the canonical source URL. But the human expression on those 140 platforms knows nothing about the machine expression.
There are five mechanisms to close this gap, operating at different layers. Used together, they create a robust resolution architecture. In brief: (1) the canonical website serves the machine expression to AI crawlers; (2) AI systems match the machine expression to paper copies across the web by title, author, and identifier; (3) the DOI landing page points to the machine expression; (4) the paper itself contains a URL pointing to its machine expression; (5) a future discovery registry aggregates machine expressions for lookup. Mechanisms 1, 2, and 4 work today. Mechanism 3 depends on platform support. Mechanism 5 requires ecosystem development.
8.1 Mechanism 1: The Canonical Website as AI Entry Point
The machine expression lives on the author’s canonical website as a JSON-LD file at a predictable URL. It is not linked in navigation and is invisible to human visitors, but it is fully discoverable by AI systems crawling the site. The website’s existing structured data architecture — its sitemap, its robots.txt, its schema.org markup — surfaces the machine expression as a first-class object. AI systems that crawl the site index the machine expression alongside the site’s other structured data.
This is the primary mechanism. The author’s website is the home of the machine expression. The 140 platforms are distribution points for the human expression. AI systems encounter the machine expression by crawling the canonical site, and the human expression by crawling the distribution platforms. The website is the anchor; the platforms are the reach.
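As a concrete sketch of Mechanism 1, the snippet below shows how a crawler might construct the candidate URL for a machine expression. The `/snp/<slug>.snp.json` convention is taken from this paper’s own deployment; other sites may choose a different predictable path, and the function name is illustrative.

```python
# Sketch of how a crawler might locate a machine expression at a
# predictable URL (Mechanism 1). The /snp/<slug>.snp.json convention
# follows the deployment described in this paper; other sites may differ.

def machine_expression_url(site: str, paper_slug: str) -> str:
    """Build the candidate URL for a paper's machine expression."""
    return f"{site.rstrip('/')}/snp/{paper_slug}.snp.json"

url = machine_expression_url(
    "https://aivisibilityarchitects.com",
    "Signal_Native_Papers_v2_0",
)
# A crawler would then fetch this URL and parse the JSON-LD body
# (e.g. with urllib.request); the fetch is omitted to keep the
# sketch network-free.
```

A crawler that also reads the site’s sitemap would find the same file listed there, without needing to guess the slug.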
8.2 Mechanism 2: Identity Matching Across the Web
AI systems do not process documents in isolation. They build indexes, knowledge graphs, and entity maps across everything they crawl. When an AI system indexes the machine expression from the author’s website, it stores the title, author, DOI, and identifying metadata. When that same system later encounters the human expression on Zenodo, ResearchGate, or any other platform, it matches by title, author, and DOI — recognising that it already holds a machine expression for this work.
This is how search engines already work. Google encounters structured data on one site and uses it to enrich its understanding of content it finds elsewhere. AI systems do the same. The machine expression on the canonical site enriches the AI system’s understanding of every copy of the paper it finds across the web, regardless of where it finds them.
This mechanism is probabilistic, not guaranteed. It depends on the AI system having crawled the canonical site before encountering the paper elsewhere. But the more prominent the canonical site is in the AI system’s index — through strong site schema, consistent structured data, and regular content updates — the more likely this matching occurs.
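The matching logic of Mechanism 2 can be sketched as a small index: exact match on DOI where one exists, falling back to a normalised (title, author) key for platform copies that strip the DOI. The field names here are illustrative, not part of the SNP specification.

```python
# Minimal sketch of Mechanism 2: an index of machine expressions keyed
# by DOI, with a normalised (title, author) fallback for platform
# copies that omit the DOI. Field names are illustrative.

def _norm(s: str) -> str:
    """Lowercase and collapse whitespace for fuzzy-free matching."""
    return " ".join(s.lower().split())

class MachineExpressionIndex:
    def __init__(self):
        self.by_doi = {}
        self.by_title_author = {}

    def add(self, expr: dict) -> None:
        if expr.get("doi"):
            self.by_doi[expr["doi"]] = expr
        key = (_norm(expr["title"]), _norm(expr["author"]))
        self.by_title_author[key] = expr

    def match(self, copy: dict):
        """Resolve a crawled paper copy to a stored machine expression."""
        if copy.get("doi") and copy["doi"] in self.by_doi:
            return self.by_doi[copy["doi"]]
        key = (_norm(copy.get("title", "")), _norm(copy.get("author", "")))
        return self.by_title_author.get(key)

index = MachineExpressionIndex()
index.add({"doi": "10.5281/zenodo.18821373",
           "title": "Signal-Native Papers",
           "author": "Bernard Lynch"})

# A Zenodo copy carrying the DOI matches directly; a blog copy with
# no DOI still matches on title and author.
hit = index.match({"title": "signal-native papers",
                   "author": "Bernard Lynch"})
```

Production systems would use fuzzier matching than exact normalised strings, but the layered DOI-first, metadata-second resolution order is the point of the sketch.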
8.3 Mechanism 3: DOI as Resolution Bridge
The DOI is the one identifier that persists across all 140 platforms. Every copy of the paper on every platform carries the same DOI. The machine expression also carries this DOI. This makes the DOI a natural resolution bridge.
When the DOI resolves to a landing page (typically on Zenodo or the publisher’s site), that landing page can include a reference to the machine expression — either as a related link, a metadata field, or a JSON-LD block embedded in the page. Any AI system that follows the DOI finds not just the paper but a pointer to its machine expression. This transforms the DOI from a document identifier into a bilateral work identifier.
8.4 Mechanism 4: In-Paper Pointer
The human expression itself can carry a reference to its machine expression: a line in the paper’s front matter, metadata section, or access notice stating, “The machine expression for this work is available at [URL].” This simple, durable pointer travels with every copy of the paper across all 140 platforms.
This is the lowest-technology mechanism, but it is robust. An AI system reading the paper’s text encounters the URL and can follow it. Even if the AI system does not understand SNP conventions, it can fetch the URL and discover structured data at the other end. The pointer does not depend on any platform preserving metadata fields. It is plain text in the document itself.
In the AIVA deployment, Mechanism 4 is implemented as the resolution block — a six-line identity block placed at the very top of every paper, above the title, formatted at 5pt, right-aligned, in black. The block contains the paper’s DOI (Concept), the author’s ORCID, the publisher’s NZBN and Wikidata identifier, the direct URL to the machine expression, the Wayback Machine archive URL, and the version and licence. Every identifier uses an established standard that AI systems already understand. Because the block is embedded in the .docx itself, it travels with the paper to every platform it is uploaded to.
The resolution block is not metadata in the traditional sense. It is plain text, visible to human readers and machine readers alike. A human reader’s eye goes past it to the title. An AI crawler reading top to bottom parses six functional identifiers before encountering a single word of the paper’s argument.
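Because the resolution block is plain text with a regular label-and-separator layout, a crawler can parse it with trivial heuristics. The sketch below reproduces this paper’s own six-line block; the splitting logic is illustrative, not specified behaviour.

```python
# Sketch of how a crawler might parse the six-line resolution block
# (Mechanism 4). The block reproduces this paper's own header; the
# splitting heuristics are illustrative, not specified behaviour.

resolution_block = """\
DOI: 10.5281/zenodo.18821373 (concept)
Author: Bernard Lynch | ORCID: 0009-0007-6170-7818
Publisher: AI Visibility Architecture Group Limited | NZBN: 9429053354075 | Wikidata: Q137998218
Machine Expression (SNP): https://aivisibilityarchitects.com/snp/Signal_Native_Papers_v2_0.snp.json
Archive: https://web.archive.org/web/2026/https://aivisibilityarchitects.com/signal-native-papers/
Version: 2.0 | License: CC BY-NC-ND 4.0"""

def parse_resolution_block(text: str) -> dict:
    """Split each line on ' | ', then on the first ':' into key/value pairs."""
    fields = {}
    for line in text.splitlines():
        for part in line.split(" | "):
            key, _, value = part.partition(":")
            fields[key.strip()] = value.strip()
    return fields

fields = parse_resolution_block(resolution_block)
```

Each resulting value is an identifier an AI system already understands: a DOI, an ORCID, an NZBN, a Wikidata QID, and two URLs, all recovered before the crawler reads a word of the paper’s argument.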
8.5 Mechanism 5: The Signal Mesh Beacon
The machine expression includes a Signal Mesh Position element (Tier 3) that functions as a beacon — the work announcing its own existence, scope, and retrieval contract to discovery systems. At Tier 1, the machine expression on the canonical website already serves a beacon function through the site’s structured data layer.
If the SNP architecture achieves adoption beyond a single author, dedicated discovery registries could aggregate machine expressions from multiple authors and sites, creating a searchable index of SNPs. An AI system could query this registry before processing any document, asking: does a machine expression exist for this work? This is analogous to how DOI resolvers work — a centralised lookup that connects identifiers to resources across the decentralised web. This registry does not exist today. Building it is a strategic undertaking that depends on demonstrated single-document value driving adoption.
8.6 The Resolution Architecture in Practice
These five mechanisms are not alternatives. They are layers of a single resolution architecture, ordered from most immediately implementable to most architecturally complete.
Mechanism 1 (canonical website) is available today. The author publishes the machine expression on their site. AI systems that crawl the site find it.
Mechanism 2 (identity matching) is available today. AI systems already cross-reference content across domains by matching titles, authors, and identifiers.
Mechanism 3 (DOI bridge) is available on platforms that support related identifiers or custom metadata. Zenodo supports this today through its related identifiers field.
Mechanism 4 (in-paper pointer) is available today. It requires only that the author include a URL in their paper’s text. This is the most durable mechanism because it is embedded in the human expression itself and survives every format conversion and platform migration.
Mechanism 5 (signal mesh beacon and discovery registry) is the long-term architectural goal. It requires ecosystem development.
The practical recommendation for authors publishing today: implement Mechanisms 1, 2, and 4 immediately. Publish the machine expression on your canonical website. If a DOI exists, ensure it appears in both expressions for cross-referencing. Include a pointer URL in the paper’s text. These three actions are sufficient for AI systems to resolve the relationship between 140 copies of the human expression and one machine expression, using capabilities those AI systems already possess.
A note on identifiers: the DOI is the strongest resolution bridge because it is persistent, globally unique, and understood by every platform and AI system. If a paper has a DOI, the resolution architecture works at maximum strength. But most documents on the web do not have DOIs. White papers, blog posts, reports, internal documents, policy briefs — the vast majority of published work that would benefit from an SNP machine expression has no DOI and never will. The specification does not require one. Of the five mechanisms, only Mechanism 3 depends on a DOI. Mechanisms 1, 2, 4, and 5 work without it.
9. What This Specification Is Not
It is not a file format. The thirteen elements can be implemented in multiple technical architectures.
It is not a publishing platform. Any platform can publish SNPs if it supports the specification.
It is not a replacement for peer review, editorial judgment, or scholarly norms. It makes those human processes machine-legible.
It is not dependent on any single technology beyond JSON-LD for the interoperability spine.
It is not all-or-nothing. It is a gradient from seed to structured to networked, with real value at every tier. Tiers 1 and 2 deliver that value for a single author today. Tier 3 delivers its full value only if other authors and platforms adopt the architecture — a strategic outcome that must be earned, not assumed.
It does not rewrite, alter, summarise, or interfere with the author’s text. The human expression is sovereign.
Positioned precisely, the SNP is a document-side structural enhancement for AI retrieval and knowledge integrity. It operates as five functional layers: a structural intent declaration layer (what the work argues and how), a retrieval pre-filter layer (what the work answers and for whom), a claim reliability signalling layer (how confident the author is and why), a time validity signalling layer (when the data expires and what kind of temporal claim each statement makes), and an author-governed trust metadata layer (who wrote it, what entities are involved, what the limitations are). It is not a new search engine. It is not a universal knowledge protocol. It is not a competitor to embeddings or vector search. It sits upstream of those systems as structured input that makes them work more accurately.
10. Prior Art and Differentiation
The SNP builds on and differentiates itself from seven streams of prior work in machine-readable publishing.
Nanopublications (Groth, Gibson & Velterop, 2010; Kuhn et al.) decompose papers into individual RDF assertions with provenance. The SNP learns from their content-addressed identification and provenance model but preserves the complete argument structure that nanopublications atomise. Nanopublications require RDF literacy; the SNP generates its machine layer from the author’s existing prose.
Micropublications (Clark, Ciccarese & Goble, 2014) model argument structure as claim-evidence-support-challenge relationships. The SNP adopts this insight that science is arguments, not just assertions, and extends it from biomedical communications to all domains. Micropublications are stand-off annotations layered onto papers; the SNP machine expression is a co-equal representation.
SPAR Ontologies and Semantic Publishing (Shotton, 2009; Peroni & Shotton) created vocabularies for describing publishing in RDF, including CiTO for citation typing and DoCO for document components. The SNP builds on these vocabularies rather than reinventing them, extending with temporal validity signals and AI retrieval mechanisms that predate the LLM era.
Open Research Knowledge Graph (Auer et al.) provides template-based structured descriptions of research contributions. The SNP learns from ORKG’s template approach and extends it by embedding structured description into the document’s lifecycle rather than maintaining it in a separate system. ORKG requires post-publication curation; the SNP generates its machine expression at the point of publication.
Executable Papers (eLife Reproducible Document Stack, Stencila) made research computationally reproducible by bundling code, data, and text. The SNP extends verification beyond computational reproducibility to all forms of claims, whether computational or not.
Digital Twin Principles from physical asset management provide the conceptual architecture: bidirectional data flow between physical and digital representations, state signalling, co-equal representations, and living, updating objects. The SNP applies digital twin thinking to documents — a connection not previously made in the publishing or information science literature.
FAIR Principles (Wilkinson et al., 2016) established that data should be Findable, Accessible, Interoperable, and Reusable. The SNP operationalises FAIR for complete published works rather than datasets, and extends findability from metadata keywords to structured retrieval contracts that describe what a document answers.
No prior work proposes a general-purpose bilateral document architecture designed for AI-era retrieval. Domain-specific workflows exist — JATS XML in biomedicine, semantic publishing pipelines in linked data communities, knowledge graph-first authoring in specialised tools — but none produce a document that carries its own retrieval contract, entity registry, confidence architecture, and temporal validity signals as native outputs of the publication process, across all domains, consumable by any AI system without specialist tooling.
As of early 2026, this gap remains confirmed. Knowledge Pixels (Kuhn, founded 2022) is commercialising nanopublications through publisher pilots with IOS Press and Pensoft, but the approach remains decomposition-first. ORKG now uses LLMs for hybrid human-machine curation at scale, but operates post-publication. The RAG industry is converging on the problem from the receiver side — TreeRAG builds hierarchical summaries, GraphRAG extracts entity relationships, Retrieval And Structuring paradigms construct knowledge graphs from unstructured documents — all reconstructing after ingestion what the SNP proposes documents should carry at birth. The Croissant-RAI metadata format (2024) addresses machine-readable dataset documentation using Schema.org, but targets datasets rather than scholarly argument. No current system proposes author-declared retrieval contracts, bilateral document architecture with shared identity, or AI-assisted generation of machine expressions from finished prose. The gap identified in this specification remains unoccupied.
11. Proof of Concept
A Tier 1 machine expression has been generated for the white paper AI Search Is Here: Google Was Here 20 Years. Now AI Is Here (Lynch, 2026). The process demonstrated the AI-assisted authoring model in practice.
The paper was written in Word. No special authoring tools were used. No awareness of the SNP specification was required during writing. After completion, an AI system read the full text and generated a JSON-LD machine expression containing all three Tier 1 elements.
The retrieval contract declared twelve questions the paper answers and seven it does not. Six audience types were identified with relevance notes. Authority was declared as practitioner analysis with four named limitations. Temporal currency was separated into rapidly-changing statistics (recommended six-month recheck) and stable structural analysis (twelve-month recheck).
The entity registry catalogued the author, two organisations, five data sources, six casualty case studies with specific claimed impacts and periods, four key concepts, and four original frameworks contributed by the paper.
The generation process took under five minutes. The machine expression totalled 13,231 bytes of valid JSON-LD using Schema.org vocabulary with SNP-specific extensions. It declares itself as snp:MachineExpression describing the ScholarlyArticle, with the human paper’s URL and DOI as its foundation identifiers. No word of the original paper was altered.
The machine expression is published as a JSON-LD file (AI_Search_Is_Here_v0_7.snp.json) at a predictable URL on the author’s website. Its @type is snp:MachineExpression and it uses snp:describes to point to the ScholarlyArticle. The author has ratified the machine expression. The authorship parity declaration is sealed and the SNP is live.
11.1 Tier 1 Example Fragment
To make Tier 1 tangible, here is a fragment from the proof of concept. The retrieval contract for the AI Search paper declares three example queries it answers: “How is AI search displacing traditional search engines?”, “What strategic options do businesses have in response to AI-driven search?”, and “Which industries are most affected by the shift from search engines to AI answers?” It declares one exclusion: “This paper does not cover the technical implementation of large language model architectures or training methods.”
The entity registry fragment includes three entries: Bernard Lynch (ORCID-linked, role: author), Google (Wikidata QID Q95, role: subject entity), and ChatGPT (Wikidata QID Q115536516, role: subject entity). Each entry carries a persistent identifier and a declared role in the work. This is the minimum viable machine expression — achievable in minutes, immediately useful to any AI system assessing whether this paper answers a given query.
11.2 Quality Safeguards
When an AI system generates the machine expression, it can get things wrong. The retrieval contract may declare questions the paper does not actually answer. The entity registry may list people the paper does not reference, or miss people it does. The confidence levels may not match the author’s hedging language. The temporal anchors may assign the wrong time-dependency type.
These errors are detectable precisely because the machine expression is a structural claim about the human expression. The author reviews the machine expression against their own paper and asks: does this describe what I wrote? Where the description is wrong, the author corrects it. The ratification step is not ceremonial. It is the quality gate. This is why the specification requires author ratification before the authorship parity declaration is sealed.
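Some of these checks can be mechanised before ratification. The sketch below runs one such check: does every entity in the registry actually appear in the human expression? A miss is not automatically an error; it is a flag for the author’s review. The function and field names are illustrative, not part of the specification.

```python
# Sketch of one mechanical pre-ratification check: flag registry
# entities whose names never occur in the paper text. Function and
# field names are illustrative, not part of the specification.

def unratified_entities(paper_text: str, entity_registry: list) -> list:
    """Return registry entries whose names do not occur in the paper text."""
    lowered = paper_text.lower()
    return [e for e in entity_registry if e["name"].lower() not in lowered]

# Toy example: a one-sentence "paper" and a three-entry registry.
paper_text = "Bernard Lynch examines how ChatGPT reshapes search."
registry = [
    {"name": "Bernard Lynch", "role": "author"},
    {"name": "ChatGPT", "role": "subject entity"},
    {"name": "Google", "role": "subject entity"},
]

flagged = unratified_entities(paper_text, registry)
# flagged contains only the Google entry: the author must either point
# to where Google appears in the work or remove it from the registry.
```

Similar string-level checks apply to the retrieval contract (do declared questions share vocabulary with the text?) and to confidence levels (does hedging language appear near high-confidence claims?), but all of them feed the same human quality gate.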
11.3 Implementation Status
What exists today: a complete Tier 1 proof of concept for one paper, generated as valid JSON-LD, with retrieval contract, entity registry, and authorship parity declaration. The generation was performed through an AI conversation, demonstrating the AI-assisted authoring model described in Section 5.
What is in design: dedicated authoring tooling that allows authors to upload a paper and receive a draft machine expression without requiring an AI conversation; a validation tool that checks internal consistency between the machine expression and the human text; and a publication workflow that embeds the machine expression in the author’s website as a JSON-LD file at a predictable URL.
What is intentionally out of scope for this specification: the SNP discovery registry described in Mechanism 5, the detailed JSON-LD schema definitions (maintained separately as the specification matures), and any platform-specific integration with Zenodo, GitHub, or other repositories.
The next validation step is a controlled retrieval comparison. The same document and the same set of questions will be presented to AI systems under two conditions: one where the system has only the human text, and one where it has both the human text and the Tier 1 machine expression. The test measures whether the AI correctly answers the questions the retrieval contract declares answerable, correctly declines questions the contract declares out of scope, accurately identifies entities from the registry, and respects the declared authority level. This is the empirical test of the specification’s central claim. Until it is run, the SNP is a hypothesis with strong architectural reasoning but no measured retrieval evidence. The specification is honest about this.
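The scoring for that comparison can be sketched now, ahead of the study itself. The snippet below measures, for one condition, how often the system answers questions the retrieval contract declares answerable and declines those it declares out of scope. The response format (`"ANSWER"`/`"DECLINE"`) and the outcome patterns are purely hypothetical illustrations of the harness; no comparison has been run and no results are claimed.

```python
# Sketch of the scoring harness for the controlled retrieval
# comparison. The ANSWER/DECLINE response format and the hypothetical
# outcome patterns below are assumptions for illustration only;
# the study described in the text has not yet been conducted.

def score_condition(responses: dict, answerable: set, out_of_scope: set) -> dict:
    """responses maps question id -> 'ANSWER' or 'DECLINE'."""
    answered_ok = sum(1 for q in answerable if responses.get(q) == "ANSWER")
    declined_ok = sum(1 for q in out_of_scope if responses.get(q) == "DECLINE")
    return {
        "answer_rate": answered_ok / len(answerable),
        "decline_rate": declined_ok / len(out_of_scope),
    }

answerable = {"q1", "q2", "q3"}     # declared answerable by the contract
out_of_scope = {"q4", "q5"}         # declared out of scope

# Hypothetical response patterns for the two conditions.
text_only = {"q1": "ANSWER", "q2": "ANSWER", "q3": "DECLINE",
             "q4": "ANSWER", "q5": "DECLINE"}
with_snp = {"q1": "ANSWER", "q2": "ANSWER", "q3": "ANSWER",
            "q4": "DECLINE", "q5": "DECLINE"}

baseline = score_condition(text_only, answerable, out_of_scope)
treated = score_condition(with_snp, answerable, out_of_scope)
```

The specification’s central claim predicts that the treated condition outperforms the baseline on both rates; the harness exists so that prediction can be falsified.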
Appendix A: Critical Review and Author Response
This appendix records a structured critical review of the specification conducted during the drafting process. Each point of critique was addressed through revision. The review is preserved here as a transparency record — showing what was challenged, what was changed, and why.
A.1 RAG Industry Framing
Critique: The original text implied the SNP “solves RAG.” It does not. The SNP improves document self-description. It does not eliminate embeddings, vector search, chunking, or ranking systems. Large models will continue to use embeddings internally. The stronger claim is that the SNP supplements RAG with explicit structural intent. The claim that it replaces RAG is strategically fragile.
Response: Accepted. All instances of “solves” were replaced with “addresses from the document side.” “Eliminates guessing” became “reduces guessing.” Section 1 now explicitly states that embeddings, vector search, and ranking systems will continue to operate as core infrastructure. Section 7.1 now acknowledges chunking and embedding still occur. The positioning paragraph in Section 9 explicitly states the SNP is not a competitor to embeddings and sits upstream of existing systems as structured input.
A.2 Derivation Claim
Critique: The line “Neither expression is derived from the other” was philosophically elegant but technically contestable. The machine expression is derived from the prose during generation, even if ratified. It is structurally derived but not semantically rewriting. Precision here protects from unnecessary criticism.
Response: Accepted. The line now reads: “The machine expression is structurally derived from the human expression but is not a rewrite of it. It does not reproduce, rephrase, or summarise the author’s prose. It describes the work’s structure, scope, entities, and claims in a parallel format that AI systems can parse directly.” This is technically precise and defensible.
A.3 “First Architecture” Claim
Critique: The claim “No prior work proposes documents born bilateral” is the most attackable statement in the paper. Opponents will cite JATS XML workflows, semantic publishing pipelines, and knowledge graph-first authoring systems. Even if directionally correct, the phrasing must be tight. A safer framing: first general-purpose bilateral architecture designed for AI-era retrieval. That narrows the attack surface.
Response: Accepted. The prior art section now explicitly acknowledges domain-specific precedents by name — JATS XML in biomedicine, semantic publishing pipelines, knowledge graph-first authoring — then draws the precise boundary: none produce a document carrying its own retrieval contract, entity registry, confidence architecture, and temporal validity signals, across all domains, consumable by any AI system without specialist tooling. The SNP is positioned as the first general-purpose architecture designed for AI-era retrieval.
A.4 Ecosystem Dependency
Critique: Tier 3 requires cross-paper participation, registries, adoption, and shared vocabularies. Right now this is a single-author architecture. That is not fatal, but the paper occasionally reads as if ecosystem adoption is inevitable. It is not inevitable. It is strategic. Tone matters.
Response: Accepted. Five tonal adjustments were made. The Tier 3 subtitle now explicitly states value depends on adoption beyond a single author. Signal Mesh Position now acknowledges one-directional declarations today and bidirectional connections only if adoption grows, calling it “a strategic goal, not a given.” The discovery registry language states it depends on demonstrated single-document value driving adoption. The closing line of Section 9 now reads: “Tier 3 delivers its full value only if other authors and platforms adopt the architecture — a strategic outcome that must be earned, not assumed.”
A.5 Strategic Positioning
Critique: The specification is strongest if positioned as a document-side structural enhancement for AI retrieval and knowledge integrity. It is weaker if positioned as a replacement for AI retrieval infrastructure. It should be framed as five functional layers: structural intent declaration, retrieval pre-filter, claim reliability signalling, time validity signalling, and author-governed trust metadata. It is not a new search engine, not a universal knowledge protocol, and not a competitor to embeddings. Clarity here determines reception.
Response: Accepted. A positioning paragraph was added to Section 9 that explicitly names the five functional layers and states what the SNP is not. All overreach language throughout the paper was revised to position the SNP as sitting upstream of existing retrieval infrastructure rather than replacing it.
A.6 Intellectual Strength Assessment
The following honest assessment was produced during the review process and is preserved here as a transparency record.
Conceptual originality: High. No prior work proposes a general-purpose bilateral document architecture designed for AI-era retrieval. Seven threads of predecessor work were mapped; none do this.
Architectural coherence: Strong. The thirteen elements, three tiers, governing principles, and authoring model are internally consistent. Nothing contradicts anything else.
Adoption realism: Moderate. The AI-assisted authoring model is sound in theory and Tier 1 has been demonstrated in practice. However, the authoring tooling does not yet exist as a product. The current process requires an AI conversation to generate the machine expression — a demonstration, not a workflow.
Ecosystem readiness: Early. This is one author, one paper, one proof of concept. Tier 3’s value proposition depends on adoption that does not yet exist. The specification describes what those capabilities look like. It does not make them real.
Attack surface: Manageable but present. The derivation claim, “first” claim, ecosystem inevitability, and RAG positioning have all been revised. The remaining open question is whether the retrieval contract demonstrably improves retrieval accuracy — a controlled comparison study has not yet been conducted.
Phase: Specification, not standards. The paper describes an architecture, demonstrates a proof of concept, and positions itself within prior art. It is not a standard. It is not a validated implementation at scale.
A.7 Structural Strengths Identified
The review identified the following as the strongest sections: Section 1 (problem framing — attacks document design, not AI systems), Section 3 (site schema vs SNP distinction), Section 4 (thirteen elements in tiered architecture), Section 8 (discovery resolution problem with five mechanisms), and Section 10 (prior art differentiation).
The most important conceptual moves were ranked as: (1) Retrieval Contract — the most immediately valuable element, usable by any AI system today. (2) Authorship Parity — the definitional move; without it, there is no SNP. (3) Confidence and Temporal Architecture — where the paper goes beyond all prior art. (4) Argument Topology — the most intellectually ambitious element. (5) Discovery Resolution Architecture — what makes the entire specification practical rather than theoretical.
The bilateral principle was identified as the core intellectual contribution — not embedded metadata, not companion metadata, not post-publication extraction, but co-equal identity. Every predecessor works by doing something to a paper after it exists. The SNP proposes that documents should be born bilateral. If a reader takes away only one idea from this paper, it should be that one.
A.8 The Bilateral Principle as Core Innovation
The review identified the bilateral principle as the real intellectual contribution of the specification. Not embedded metadata. Not companion metadata. Not post-publication extraction. But co-equal identity — two expressions of the same work, each sovereign in its domain, both ratified by the author, both carrying the author’s authority. This framing is novel. No prior work in semantic publishing, nanopublications, micropublications, or structured knowledge management proposes this relationship between human and machine expressions. It is the backbone of the entire architecture. Every other element in the specification — the retrieval contract, the entity registry, the confidence architecture — derives its coherence from this founding principle.
A.9 The Tier Gradient as Adoption Strategy
The three-tier adoption gradient — Tier 1 in minutes, Tier 2 as collaborative editing, Tier 3 as ecosystem participation — was identified as a strategically correct design choice. It directly addresses the classic adoption failure of RDF-based systems. Nanopublications failed to achieve broad adoption because they required authors to write RDF triples. ORKG failed to scale because it required post-publication curation effort that most researchers would not invest. Semantic publishing stayed academic because it demanded specialist tooling and vocabulary knowledge.
Each of these predecessors asked authors to do something new and difficult. The SNP tier gradient solves this by making the entry point trivial (minutes of review at Tier 1), the middle tier valuable to the author as an argument mirror (not just a publishing requirement), and the upper tier dependent on ecosystem growth rather than individual effort. The author burden problem — the single most common cause of failure in machine-readable publishing — is addressed at an architectural level, not left as an implementation detail.
A.10 The Discovery Resolution Section
Section 8 was identified as one of the most strategically grounded parts of the document. It does not assume magical AI behaviour. It outlines five concrete mechanisms — canonical website as entry point, identity matching across the web, DOI as resolution bridge, in-paper pointer, and future discovery registry — layered from immediately implementable to architecturally complete. The section is honest about which mechanisms work today (1, 2, and 4), which depend on platform support (3), and which require ecosystem development (5). This layered architecture is realistic and architecturally credible. It answers the hardest practical question the specification faces: how do 140 copies of the human expression find their way back to one machine expression?
A.11 The Interoperability Spine
The choice of JSON-LD with Schema.org vocabulary plus minimal SNP-specific extensions was identified as a sensible technical decision. The specification avoids inventing a new ecosystem. It builds on the format already understood by search engines, already parseable by every programming language, already a W3C standard, and already readable by LLMs in context windows. This reduces friction to near zero for any consuming system. The lesson from every predecessor is clear: RDF/SPARQL kept nanopublications in the semantic web community, custom XML kept JATS in biomedicine, and proprietary formats limited adoption everywhere they appeared. The interoperability spine ensures the SNP machine expression can be consumed by any system without specialist tooling — the single most important practical decision in the specification.
Access and Scope Notice: This paper specifies the SNP architecture — the complete structural definition including the thirteen elements, three tiers, authoring model, and discovery mechanisms. Deployment procedures and tooling are maintained separately.
Source: aivisibilityarchitects.com
Canonical Source: Zenodo (DOI: 10.5281/zenodo.18821374). Related implementations: aivisibilityarchitects.com
Suggested Citation: Lynch, B. (2026). Signal-Native Papers (v2.0). AI Visibility Architecture Group Limited.
© 2026 AI Visibility Architecture Group Limited. All rights reserved. This work is licensed under CC BY-NC-ND 4.0. This paper may be cited and distributed but may not be modified or used for commercial purposes without written permission.