When it comes to AI models, one of the hardest questions to answer is deceptively simple: where did this model actually come from?
We solved part of this problem with the Model Provenance Kit, an open-source tool that fingerprints models at the weight level (the parameters that define what a model knows and how it behaves) to verify their provenance. However, a fingerprinting tool needs a clear standard to measure against, one that defines exactly what qualifies as a derivation relationship between two models. Here, the industry does not yet have a consistent answer.
Definitions vary across licensors, standards bodies, research groups, and AI labs. The same pair of models may be labeled “related” by one reviewer and “independent” by another, both citing defensible reasoning. This inconsistency creates real challenges for license enforcement, vulnerability triage, and compliance.
We created the Model Provenance Constitution as an attempt to fix this. Comprising a taxonomy, definitions, and boundary specifications, it is a normative reference: a constitution that specifies what a model provenance relationship is and is not at the level of weight derivation. This post looks at its structure, its rationale, and how it connects to frameworks already in use by governance programs. The Model Provenance Constitution builds on forthcoming work from Cisco AI Defense that describes the methodology in full, including empirical evidence for why such an approach is critical for both the provenance and detection pipelines. You can find the Constitution in the docs folder of the Model Provenance Kit.
Why defining model provenance matters
Foundation models do not come into the enterprise as isolated artifacts. They are fine-tuned, distilled, quantized, merged, and repackaged, and each step creates a new checkpoint whose relationship to its parent is poorly documented. When a security team needs to know whether a deployed model inherits a known vulnerability, or when compliance needs to determine whether a third-party checkpoint triggers a licensing obligation, the question is always the same: is this model a derivative of that model?
Without a shared and accurate answer, an organization may face additional risks:
- Supply chain attacks are already exploiting this gap
- Regulatory requirements assume a clarity of provenance that does not yet exist
- Incident response depends on traceable lineage
Provenance is about model weights
The Model Provenance Constitution grounds provenance in a single concept: the verifiable derivation history of a model’s trained weights. Two models share provenance if and only if they are connected by a causal chain of weight derivation, either directly, indirectly through distillation, or mechanically through a non-training transformation such as quantization.
Shared architecture, shared training data, a shared tokenizer, and similar benchmark performance do not count. The exclusion is intentional. A broader definition that treated any architectural or behavioral similarity as derivation would cause license enforcement to apply to every model in an architecture family, flag convergent designs as genuine vulnerability links, and flood governance audits with false positives. Strict causality produces labels that are stable across reviewers, robust to metadata manipulation, and consistent with how derivation actually occurs in practice.
How the Model Provenance Constitution is structured
The Constitution answers three questions: When are two models related? How does this relationship occur? And what looks like a relationship but is not? It organizes these answers as explicit enumerations rather than definitions followed by examples, so that each pair of models encountered in practice maps to a clear category.
Five conditions specify when a provenance relationship exists:
- Direct descent: training initialized from a trained checkpoint
- Indirect descent: distillation from a teacher model
- Mechanical transformation: quantization, pruning, merging, or format conversion
- Identity: byte-equivalent copy
- Transitivity: any composition of the above
A pair is provenance-related if at least one condition holds.
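The rule above can be illustrated with a small sketch. It assumes that derivation events are recorded as (parent, child, condition) edges and treats transitivity as reachability along those edges. The model names, the `Condition` enum, and both helper functions are hypothetical and are not part of the Model Provenance Kit. The sketch only checks ancestor and descendant chains; how sibling models are treated is left to the Constitution's text and is not modeled here.

```python
from collections import deque
from enum import Enum


class Condition(Enum):
    """The four base conditions; transitivity is handled by graph traversal."""
    DIRECT_DESCENT = "direct descent"
    INDIRECT_DESCENT = "indirect descent"
    MECHANICAL_TRANSFORMATION = "mechanical transformation"
    IDENTITY = "identity"


# Hypothetical derivation records: (parent, child, condition).
EDGES = [
    ("base-7b", "base-7b-chat", Condition.DIRECT_DESCENT),
    ("base-7b-chat", "base-7b-chat-int4", Condition.MECHANICAL_TRANSFORMATION),
    ("base-7b-chat", "student-1b", Condition.INDIRECT_DESCENT),
]


def descends_from(child, ancestor, edges):
    """Return True if a chain of derivation edges leads from ancestor to child."""
    children = {}
    for parent, kid, _condition in edges:
        children.setdefault(parent, []).append(kid)
    queue, seen = deque([ancestor]), {ancestor}
    while queue:
        node = queue.popleft()
        if node == child:
            return True
        for nxt in children.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def provenance_related(a, b, edges):
    """A pair is related if either model descends from the other."""
    return descends_from(a, b, edges) or descends_from(b, a, edges)
```

In this toy graph, `provenance_related("base-7b-chat-int4", "base-7b", EDGES)` holds because a fine-tune followed by quantization composes into one chain, while a model with no recorded edge to `base-7b` is not related to it.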
Nine mechanisms list specific derivation paths observed in practice:
- Identity and reformatting
- Fine-tuning
- Continued pre-training
- Vocabulary derivation
- Knowledge distillation
- Structural modification with weight inheritance
- Quantization and compression
- Adapter-based derivation (LoRA, QLoRA, prefix tuning)
- Model merging
Eight exclusions list cases that may appear provenance-related but are provenance-independent. Each exclusion is a pattern of apparent similarity that ultimately carries no chain of weight derivation:
- Independent reproduction (e.g., Llama-2 vs. OpenLLaMA, which share the same architecture and tokenizer but are trained from scratch)
- Same family, different size (e.g., Llama-2-7B vs. Llama-2-13B)
- Same family, different training corpora (e.g., T5 vs. mT5, which share a name root but were trained separately from scratch)
- Independent runs under a shared seed (i.e., a shared seed does not imply shared weights)
- Architectural convergence (different teams independently arriving at similar model designs)
- Dimensional matching under different mechanisms (models that happen to share the same size or shape without one being built from the other)
- Shared vocabulary without weight transfer (tokenizer is a tool, not a weight)
- Shared training objective (sharing an objective does not link weights)
A strict provenance standard must name these explicitly, because mistaking any of them for true derivation corrupts downstream licensing decisions, vulnerability assessments, and compliance determinations.
Setting the evidence standard
A taxonomy is only as useful as the standard of proof attached to it. The Model Provenance Constitution accepts three sources of evidence for establishing provenance (architectural similarity and naming conventions are expressly insufficient):
- Official documentation: from the releasing organization, explicitly naming the parent model and derivation method
- Checkpoint verification: via hash comparison, layer comparison or reproducible derivation scripts
- Authoritative third-party analysis: which has been peer-reviewed or widely cited
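As one illustration of checkpoint verification by hash comparison, the following minimal sketch checks whether two checkpoint files are byte-equivalent. It uses only the Python standard library; the function names are hypothetical and this is not the Model Provenance Kit's implementation.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_byte_equivalent(path_a, path_b):
    """True only when both files have identical contents (the identity condition)."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

A matching digest supports only the identity condition. A mismatch says nothing about the other conditions, since fine-tuning, quantization, or format conversion all change the bytes of a derived checkpoint.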
When the evidence is ambiguous, the Model Provenance Constitution labels a pair as provenance-independent by default. This conservatism is intentional. A false positive has immediate consequences: licensing claims, intellectual property claims, and supply chain incident notifications. False negatives are caught by defense in depth through manual review, license audits, and forensic analysis. Specificity wins when rigor is required.
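The default can be expressed as a simple decision rule. The sketch below is an illustration only: the `Evidence` enum, the label strings, and `label_pair` are hypothetical names, and the rule is a simplified reading of the evidence standard described above.

```python
from enum import Enum


class Evidence(Enum):
    OFFICIAL_DOCUMENTATION = "official documentation"
    CHECKPOINT_VERIFICATION = "checkpoint verification"
    THIRD_PARTY_ANALYSIS = "authoritative third-party analysis"
    ARCHITECTURAL_SIMILARITY = "architectural similarity"
    NAMING_CONVENTION = "naming convention"


# Only these three sources are treated as sufficient to establish provenance.
ACCEPTED = frozenset({
    Evidence.OFFICIAL_DOCUMENTATION,
    Evidence.CHECKPOINT_VERIFICATION,
    Evidence.THIRD_PARTY_ANALYSIS,
})


def label_pair(evidence):
    """Label a model pair, defaulting to independent when no accepted evidence exists."""
    if any(item in ACCEPTED for item in evidence):
        return "provenance-related"
    return "provenance-independent"
```

With this rule, a pair supported only by a similar architecture and a similar name stays `"provenance-independent"`, which mirrors the preference for specificity over recall.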
Alignment with AI threat frameworks and standards
Model provenance verification can be thought of as a supply chain check, and the Model Provenance Constitution serves as a definition layer that allows you to audit model dependencies. It specifies what it means for a deployed model to inherit from an upstream source, which is a prerequisite for any meaningful question about inherited vulnerabilities, licensing obligations, or unattributed redistribution.
Existing guidance on AI supply chain risk already calls out “weak model provenance” and notes that there are “no guarantees of model origin.” The MITRE ATLAS framework documents Supply Chain Compromise (AML.T0010) as a primary initial access technique. The Cisco AI Security and Safety Framework classifies third-party model components under OB-009 Supply Chain Compromise, with direct applicability through AITech-9.3 (Dependency/Plugin Compromise): actors inject malicious code, backdoors, or vulnerabilities into third-party dependencies used by AI models, agents, or applications, creating supply chain attacks that affect all systems using the compromised component. Base model checkpoints reused as initializations for downstream models are exactly such dependencies.
The Constitution also recognizes the adversarial dimension through AITech-9.2 (Detection Evasion): intentional obfuscation of a derivation relationship, such as metadata rewrites, tokenizer substitutions, and chained modifications intended to obscure the parent. The Constitution’s commitment to weight-level evidence, rather than metadata-level evidence, is a direct response to this adversary model.
The Model Provenance Constitution draws on existing frameworks that AI supply chain programs already rely on. These frameworks identify requirements or considerations that the Constitution helps to fulfill. A formal definition of provenance is a prerequisite for producing this documentation consistently across an organization and between suppliers.

Table 1. Frameworks, regulations, and standards on which the Model Provenance Constitution draws
A living document
New model-building methods emerge faster than any fixed taxonomy can accommodate. Model merging, which combines separately trained specialized models, has become a dominant technique in recent years. Beyond merging, the ecosystem now includes Mixture-of-Experts architectures with independently trained components, federated training across organizations, and synthetic data pipelines that blur the line between knowledge transfer and independent training. The Model Provenance Constitution acknowledges these open boundaries and commits to revision as the landscape evolves.
Get started
The full Model Provenance Constitution is available alongside this post: https://github.com/cisco-ai-defense/model-provenance-kit/tree/main/docs/constitution
For teams ready to put these definitions into practice, the Model Provenance Kit provides the tooling. The entire pipeline runs on CPU, architectural matches are resolved in milliseconds, and extracted features are cached for reuse. Check out the Model Provenance Kit on GitHub: https://github.com/cisco-ai-defense/model-provenance-kit
Access the Model Provenance Kit base model dataset on Hugging Face: https://huggingface.co/datasets/cisco-ai/model-provenance-kit