Vendor Says It Does Not Train on Your Data. What Evidence Should You Ask For?
A no-training claim may be true. It may be useful. It is still not evidence-complete.
Positioning
Vendor claim is not evidence.
“We do not train on your data” is one of the most common AI vendor statements. It is also one of the easiest to over-credit.
The claim may be true. It may be helpful. It may even appear in a contract.
But by itself, it is not evidence-complete.
Training is only one possible use of customer data. The real review question is:
After customer data enters the AI product, what happens to it?
That means looking past the slogan and asking about retention, logging, review workflows, support access, metadata, subprocessors, model providers, deletion, audit logs, tenant settings, and contract scope.
Claim
We do not train on your data.
Or:
Customer data is not used to train our models.
Or:
Your prompts and outputs are not used for model training.
These statements sound clear. They are not all the same.
One may cover customer content. Another may cover prompts and outputs only. Another may cover training, but say nothing about logs, telemetry, evaluation, support access, or product improvement.
The exact wording matters.
So does the source.
A public FAQ is not the same as a customer contract. A product page is not the same as a data processing addendum (DPA). A trust center statement is not the same as tenant-level evidence.
Why it sounds sufficient
It answers the question most buyers have been trained to ask:
Will the vendor use my data to train its model?
That is a real concern. If proprietary or sensitive data can be used for model training, the buyer has a serious problem.
So when the vendor says no, it feels like the main issue is closed.
That is where reviews go off track.
The claim addresses one use. It does not explain the operational data path. A vendor may not train on customer data and still retain prompts, keep logs, allow support review, or send data through model providers and subprocessors.
What the claim can support
A no-training statement can support a narrow point:
The vendor says customer data is not used to train models.
That is useful. It is just not enough.
The strength depends on where it appears. A contract or product-specific term is stronger than a marketing page. A signed response is stronger than a generic FAQ.
It is also only as good as its scope. If the statement does not define the covered data, product, plan, exceptions, and document source, it remains incomplete.
At minimum, the buyer should be able to answer:
What data is covered?
Which product and plan are covered?
Which uses are excluded?
Where is the commitment written?
Without that, the claim is reassuring but thin.
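One way to make those four questions hard to skip is to record the claim as a small structure in which an unanswered question stays visibly empty. The following is a minimal Python sketch. The class name `NoTrainingClaim`, its field names, and the example values are illustrative assumptions, not a standard schema or any vendor's format.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class NoTrainingClaim:
    """Scope of a vendor no-training statement (hypothetical schema)."""
    data_covered: Optional[str] = None      # What data is covered?
    product_and_plan: Optional[str] = None  # Which product and plan are covered?
    excluded_uses: Optional[str] = None     # Which uses are excluded?
    written_source: Optional[str] = None    # Where is the commitment written?

    def unanswered(self) -> List[str]:
        """Return the names of the scope questions that have no answer yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative example: a claim taken from a public FAQ that names only
# prompts and outputs and says nothing about product, plan, or exclusions.
claim = NoTrainingClaim(
    data_covered="prompts and outputs",
    written_source="public FAQ",
)
print(claim.unanswered())  # ['product_and_plan', 'excluded_uses']
```

A claim with any unanswered field is the "reassuring but thin" case: it can be recorded, but not relied on.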
What it does not prove
A no-training claim does not automatically prove:
prompt retention period
logging and telemetry scope
human review workflows
metadata handling
support access boundary
subprocessor data path
model provider data path
deletion controls
audit log availability
tenant setting defaults
customer-specific contract coverage
That is the gap. The vendor answered training use. The buyer still does not know the full data path.
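The list above can double as a working checklist. The sketch below, in Python, tracks which topics have a documented evidence source and reports the rest as open gaps. The topic strings paraphrase the list, and the function name `evidence_gaps` and the sample evidence entries are hypothetical placeholders rather than real vendor documents.

```python
from typing import Dict, List

# Topics a no-training claim leaves open, paraphrased from the list above.
OPEN_TOPICS = [
    "prompt retention period",
    "logging and telemetry scope",
    "review workflows",
    "metadata handling",
    "support access boundary",
    "subprocessor data path",
    "model provider data path",
    "deletion controls",
    "audit log availability",
    "tenant setting defaults",
    "customer-specific contract coverage",
]


def evidence_gaps(evidence: Dict[str, str]) -> List[str]:
    """Return topics with no documented evidence source, in checklist order."""
    return [t for t in OPEN_TOPICS if not evidence.get(t, "").strip()]


# Hypothetical review state: only two topics have a documented source so far.
evidence = {
    "prompt retention period": "DPA retention clause (placeholder reference)",
    "deletion controls": "admin documentation (placeholder reference)",
}
gaps = evidence_gaps(evidence)
print(len(gaps))  # 9
```

Note that a no-training statement never appears as a value here: it answers a different question, so it closes none of these topics.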
Weak-answer pattern
The weak-answer pattern is a narrow promise presented as if it closed the review.
The vendor says:
We do not train on customer data.
But the answer does not address the rest of the path:
logging
retention
review workflows
support access
metadata
subprocessors
model providers
deletion
The statement may be true. It is still too narrow. A strong answer maps the claim to data categories, retention, third parties, customer controls, and contract scope.
Evidence request
Do not ask the vendor to restate the slogan. Ask for evidence that maps the claim to actual handling:
Please provide the product-specific data flow for prompts, outputs, files, logs, metadata, diagnostic data, support access, subprocessors, and model providers.
For each data category, identify retention period, access roles, third-party processing, customer controls, deletion options, and the contractual or administrative source for the commitment.
That is a much better question than “do you train on our data?”
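The request is effectively a matrix: one row per data category, one column per handling attribute. A minimal Python sketch of that matrix follows, assuming the categories and attributes named in the request. The names `blank_matrix` and `missing_cells` are illustrative, and the single filled cell is an example value only.

```python
from typing import Dict, List, Tuple

DATA_CATEGORIES = ["prompts", "outputs", "files", "logs", "metadata", "diagnostic data"]
ATTRIBUTES = [
    "retention period",
    "access roles",
    "third-party processing",
    "customer controls",
    "deletion options",
    "commitment source",
]


def blank_matrix() -> Dict[str, Dict[str, str]]:
    """Build one row per data category with one empty cell per attribute."""
    return {c: {a: "" for a in ATTRIBUTES} for c in DATA_CATEGORIES}


def missing_cells(matrix: Dict[str, Dict[str, str]]) -> List[Tuple[str, str]]:
    """List every (category, attribute) pair the vendor has not answered."""
    return [
        (category, attribute)
        for category, row in matrix.items()
        for attribute, value in row.items()
        if not value.strip()
    ]


matrix = blank_matrix()
# Example value only: a single answered cell, as it might be copied from a
# vendor's written response.
matrix["prompts"]["retention period"] = "stated in vendor response (placeholder)"

print(len(missing_cells(matrix)))  # 35
```

Each remaining pair is a concrete follow-up question, for example the access roles for logs or the commitment source for metadata handling.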
Review note
Usable review language:
The vendor states that customer data is not used for model training. This is useful but not evidence-complete. The statement addresses training use only. It does not by itself establish retention, logging, review workflows, support access, subprocessor handling, model provider routing, deletion controls, audit-log coverage, tenant settings, or customer-specific contract scope.
Additional evidence is needed before relying on this claim for sensitive data use.
That note records the evidence gap without pretending the claim proves more than it does.
Usage boundary
Until the missing evidence is resolved, keep the usage boundary narrow: low-sensitivity workflows only.
Do not use the product for customer data, regulated data, confidential source code, or workflows that require prompt-level auditability.
That is not approval or rejection. It is a review boundary.
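If workflows are tagged by the data they touch, the interim boundary can be checked mechanically. The Python sketch below assumes such tagging exists. The tag strings mirror the exclusions above, and the function name `within_interim_boundary` is hypothetical. It models the review boundary only and makes no approval decision.

```python
from typing import Set

# Data types and requirements kept out of bounds while evidence is missing.
OUT_OF_BOUNDS = {
    "customer data",
    "regulated data",
    "confidential source code",
    "prompt-level auditability",
}


def within_interim_boundary(workflow_tags: Set[str]) -> bool:
    """Return True if a workflow touches none of the out-of-bounds tags.

    This models the interim review boundary only; it is not an approval.
    """
    return OUT_OF_BOUNDS.isdisjoint(workflow_tags)


print(within_interim_boundary({"public marketing copy"}))           # True
print(within_interim_boundary({"customer data", "summarization"}))  # False
```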
Bottom line
“We do not train on your data” may be true. It may be useful.
It is still not the review.
The buyer still needs to know what enters the product, where it goes, how long it stays, who can access it, which third parties touch it, what can be deleted, what can be audited, and which commitments actually apply to the buyer’s product, plan, tenant, and contract.
Vendor claim is not evidence.
The work is turning a narrow public claim into a mapped evidence request, a usable review note, and a conservative usage boundary.
This is part of the AI Vendor Evidence Gap Pack series: vendor claim → evidence source → evidence gap → buyer question → usage boundary.
Boundary
This article is for evidence structuring and review preparation.
It does not provide legal, regulatory, audit, procurement, certification, or implementation advice.
Examples are illustrative unless separately validated for a specific organization and use case.


