I do not need the model to stop writing. I need the system to stop trusting every sentence it writes.
TL;DR
For player biographies, the LLM is allowed to write prose. That is the part it is good at.
But if the prose includes a factual stat claim, the system extracts the claim and checks it against structured data before returning the answer. The Lahman database, queried through DuckDB, handles the primary verification path. Retrosheet can add secondary evidence when it applies.
The model writes. The database checks.
Why biographies are tricky
A biography is not like a leaderboard query.
If the user asks for the RBI leader in 1962, the system can run SQL and return a structured answer. The path is clear.
If the user asks, “Who was Hank Aaron?”, the answer needs prose. It needs context, transitions, and some judgment about what matters. That is where a model helps.
The problem is that prose invites unsupported facts.
A model might say a player had a certain number of home runs, played a specific season for a team, or led a league in a category. Some of those claims may be right. Some may be wrong. All of them sound the same when they are wrapped in confident language.
That is not good enough.
The design
The biography path became a two-step process:
1. Generate readable biography prose.
2. Extract stat claims and verify them against structured sources.
The system does not treat every sentence the same. General background may remain prose. Extractable stat claims get checked.
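As a rough illustration of the two steps, here is a minimal sketch rather than the actual pipeline. It assumes a single hypothetical claim shape ("<Player> hit <N> career home runs") and a plain dictionary standing in for the structured lookup. The names `StatClaim`, `extract_claims`, and `check_claim` are invented for this example.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class StatClaim:
    player: str
    stat: str
    value: int
    sentence: str


# Hypothetical pattern: it recognizes one claim shape only.
# A real extractor would need to cover many more phrasings.
CAREER_HR = re.compile(
    r"(?P<player>[A-Z][a-z]+(?: [A-Z][a-z]+)+) hit (?P<value>\d+) career home runs"
)


def extract_claims(prose: str) -> List[StatClaim]:
    """Step 2a: pull checkable stat claims out of generated prose."""
    claims = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", prose.strip()):
        match = CAREER_HR.search(sentence)
        if match:
            claims.append(
                StatClaim(match["player"], "career_hr", int(match["value"]), sentence)
            )
    return claims


def check_claim(
    claim: StatClaim, reference: Dict[Tuple[str, str], int]
) -> Optional[bool]:
    """Step 2b: compare a claim with structured data.

    Returns True or False when the reference has a value, and None when
    it has nothing to say about the claim.
    """
    actual = reference.get((claim.player, claim.stat))
    if actual is None:
        return None
    return actual == claim.value


if __name__ == "__main__":
    # The dictionary is a stand-in for a query against structured data.
    reference = {("Hank Aaron", "career_hr"): 755}
    prose = (
        "Hank Aaron was a Hall of Fame outfielder. "
        "Hank Aaron hit 755 career home runs."
    )
    for claim in extract_claims(prose):
        print(claim.sentence, "->", check_claim(claim, reference))
```

The first sentence passes through untouched because it carries no extractable stat. Only the second sentence reaches the check.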
A claim can land in states like:
verified_by_all
verified_primary_only
contradicted_by_all
conflict
Those states are more useful than a single true/false label.
A claim verified by the primary source is different from a claim verified by every available source. A conflict is different from a contradiction. A missing secondary source is different from a source disagreeing.
The point is not to pretend the system knows everything. The point is to make the system honest about what it knows.
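One way to picture how these states could be derived is sketched below. It is an assumption-heavy sketch, not the system's actual logic. It takes one primary and one optional secondary result, each `True` (agrees), `False` (disagrees), or `None` (source has nothing to say). It reads "conflict" as the sources disagreeing with each other, and it adds a hypothetical `UNVERIFIED` state for claims the primary source cannot check.

```python
from enum import Enum
from typing import Optional


class VerificationState(Enum):
    VERIFIED_BY_ALL = "verified_by_all"
    VERIFIED_PRIMARY_ONLY = "verified_primary_only"
    CONTRADICTED_BY_ALL = "contradicted_by_all"
    CONFLICT = "conflict"
    # Hypothetical extra state, not in the list above.
    UNVERIFIED = "unverified"


def classify(primary: Optional[bool], secondary: Optional[bool]) -> VerificationState:
    """Combine per-source results into one state.

    True means the source agrees with the claim, False means it disagrees,
    None means the source is missing or does not cover the claim.
    """
    if primary is None:
        # Assumption: without a primary answer the claim stays unverified,
        # whatever the secondary source says.
        return VerificationState.UNVERIFIED
    if secondary is None:
        # A missing secondary source is not the same as a disagreeing one.
        if primary:
            return VerificationState.VERIFIED_PRIMARY_ONLY
        # Assumption: every source that could check the claim disagreed.
        return VerificationState.CONTRADICTED_BY_ALL
    if primary and secondary:
        return VerificationState.VERIFIED_BY_ALL
    if not primary and not secondary:
        return VerificationState.CONTRADICTED_BY_ALL
    # The two sources disagree with each other.
    return VerificationState.CONFLICT
```

Keeping `None` separate from `False` is what lets the function tell "no secondary evidence" apart from "secondary evidence disagrees".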
Why not make the model self-check?
Because that would put the same kind of component on both sides of the judgment.
A model can help extract claims. It can help phrase a caveat. It can explain why a verification state matters.
But the final check should come from structured evidence when structured evidence exists.
If the claim is about a career total, a season stat, or a team relationship, that belongs in the data path. The model should not grade its own homework.
What this changes for the reader
A normal generated bio asks the reader to trust the paragraph.
A checked bio can show more texture:
This claim was verified against the primary source.
This claim had secondary support.
This claim could not be verified.
This claim conflicted with another source.
That turns the answer from a polished blob into something a reader can reason about.
The output may be less sleek. Good. Sleek is overrated when facts are at stake.
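To show what that texture might look like in practice, here is a small sketch that prefixes each checked sentence with a reader-facing note. The function name `annotate`, the note wording, and the bracket format are all assumptions made for illustration. Any state the mapping does not recognize falls back to "could not be verified".

```python
from typing import Dict, List, Tuple

# Hypothetical wording for each state name used in this post.
READER_NOTES: Dict[str, str] = {
    "verified_by_all": "verified against the primary source, with secondary support",
    "verified_primary_only": "verified against the primary source",
    "contradicted_by_all": "contradicted by the available sources",
    "conflict": "conflicted with another source",
}

FALLBACK_NOTE = "could not be verified"


def annotate(checked: List[Tuple[str, str]]) -> str:
    """Render (sentence, state) pairs as one annotated line per claim."""
    lines = []
    for sentence, state in checked:
        note = READER_NOTES.get(state, FALLBACK_NOTE)
        lines.append(f"[{note}] {sentence}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(annotate([
        ("Hank Aaron hit 755 career home runs.", "verified_primary_only"),
    ]))
```

The annotation sits next to the sentence it describes, so a reader can judge each claim without leaving the paragraph.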
The accessibility angle
People should not need Baseball Reference-level expertise to know whether an AI answer is safe to reuse.
Verification states lower the burden. They tell the user which parts of the generated prose are grounded and which parts need caution.
That matters for anyone using the system with limited time, limited domain knowledge, or assistive workflows where manually cross-checking every claim is expensive.
The system should carry more of that work.
Takeaway
The goal is not to ban the LLM from writing.
The goal is to give it a narrow, useful job and then wrap that job with checks.
Let the model make the biography readable. Make the database keep the biography honest.