by Stewart

ChromaDB was extra state, not extra evidence

Why I removed ChromaDB from Baseball RAG after realizing it did nothing the model and the database could not already do better.

Chroma gave me another place to look. It did not give me another source of truth.

I had ChromaDB in Baseball RAG for a while because it seemed like the obvious thing to do.

It is a RAG project. RAG projects usually have a vector database. Baseball has biographies, player descriptions, old names, weird history, and fuzzy questions. So a vector store sounded useful.

But after working through the actual system, I do not think Chroma owned a real job.

That was the problem.

The system I actually needed

Baseball RAG is not supposed to be a chatbot that sounds good about baseball.

I wanted natural language in and grounded evidence out.

That means different parts of the system need clear jobs:

  • DuckDB answers structured stat questions.
  • Lahman is the primary factual authority.
  • Retrosheet can add optional secondary evidence for some biography claims.
  • Local stat-definition Markdown handles supported glossary questions.
  • The local LLM writes prose, classifies intent, and explains grounded results.
  • The eval gate catches drift when those boundaries break.

That is already a lot.

So when I looked at Chroma honestly, the question became simple:

What does this component own?

And I did not like the answer.

Chroma did not own facts

If I needed a stat, DuckDB was better.

It had the tables. It had the rows. It had SQL. It had the dataset manifest. It had checksums and source metadata. If the user asked who had the most RBIs in 1962, I did not want a nearest-neighbor search over baseball text. I wanted the database to answer it.
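Here is a minimal sketch of what "let the database answer it" can look like. It assumes a Lahman-style Batting table with playerID, yearID, stint, and RBI columns. The rows are made-up toy values, not real statistics, the function name rbi_leader is hypothetical, and Python's built-in sqlite3 stands in for DuckDB so the example runs without extra installs.

```python
import sqlite3


def rbi_leader(con, year):
    """Return (playerID, total RBI) for the season leader, or None if no rows."""
    sql = """
        SELECT playerID, SUM(RBI) AS total_rbi
        FROM Batting
        WHERE yearID = ?
        GROUP BY playerID              -- a player can have several stints per season
        ORDER BY total_rbi DESC, playerID
        LIMIT 1
    """
    # The year is bound as a parameter, never formatted into the SQL string.
    return con.execute(sql, (year,)).fetchone()


# Toy data only: hypothetical player IDs and invented numbers.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Batting (playerID TEXT, yearID INTEGER, stint INTEGER, RBI INTEGER)"
)
con.executemany(
    "INSERT INTO Batting VALUES (?, ?, ?, ?)",
    [
        ("player_a", 1962, 1, 60),
        ("player_a", 1962, 2, 50),   # second stint, same season
        ("player_b", 1962, 1, 100),
        ("player_b", 1963, 1, 130),
    ],
)

print(rbi_leader(con, 1962))
```

The answer is a row that can be traced back to a query and a table. A nearest-neighbor hit cannot offer that.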

A vector result can feel like evidence because it returns something that looks relevant.

But similarity is not authority.

If Chroma retrieved a paragraph saying a player hit 714 home runs, I still needed DuckDB or another structured source to verify it. If the model generated that same sentence on its own, I still needed the same verification step.

Chroma did not remove the need for the database.

It added another path that could be stale, missing, or rebuilt differently.

Chroma did not own prose

The other argument for Chroma was biographies.

That made sense at first. Biographies are fuzzier than leaderboard queries. They need context. They need readable language. They do not always map cleanly to one SQL query.

But that is exactly the part the model can already do.

If the goal is readable prose, the LLM is the prose engine. Chroma was not writing better biographies by itself. It was retrieving text that the model still had to turn into an answer.

And if that retrieved text was generated, stale, or only loosely related, then I had not improved the system. I had just given the model a different pile of words to lean on.

That is not enough.

For biographies, the better design was:

1. Resolve the player identity through DuckDB/Lahman.
2. Let the model write readable prose.
3. Extract the stat claims the checker supports.
4. Verify those claims against structured sources.
5. Show warnings when something is unsupported or conflicting.
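Steps 3 through 5 can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual checker: the regex patterns, the ClaimCheck record, the state names, and the career_totals dictionary (standing in for a DuckDB lookup) are all hypothetical, and the numbers are toy values.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClaimCheck:
    stat: str
    claimed: int
    actual: Optional[int]
    state: str  # "verified", "conflicting", or "unsupported"


# Hypothetical patterns: they only catch phrasing like "250 home runs".
CLAIM_PATTERNS = {
    "HR": re.compile(r"(\d[\d,]*)\s+home runs", re.IGNORECASE),
    "SB": re.compile(r"(\d[\d,]*)\s+stolen bases", re.IGNORECASE),
}


def check_claims(prose, career_totals):
    """Compare each number found in the prose with the structured total."""
    checks = []
    for stat, pattern in CLAIM_PATTERNS.items():
        for match in pattern.finditer(prose):
            claimed = int(match.group(1).replace(",", ""))
            actual = career_totals.get(stat)
            if actual is None:
                state = "unsupported"   # no structured value to check against
            elif actual == claimed:
                state = "verified"
            else:
                state = "conflicting"   # prose and database disagree
            checks.append(ClaimCheck(stat, claimed, actual, state))
    return checks


prose = "He hit 250 home runs and finished with 40 stolen bases."
career_totals = {"HR": 245}  # toy value standing in for a database lookup

for check in check_claims(prose, career_totals):
    print(check)
```

Anything that does not come back as verified would become a warning next to the biography instead of being silently trusted.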

That gives the model a useful job without pretending the model is the source of truth.

Chroma did not make that boundary clearer. It made it fuzzier.

Chroma did not own verification

This was the part that bothered me most.

The system already needed claim verification.

If a biography says a player hit a certain number of home runs, played for a team in a specific season, or led a league in a category, I need that checked against structured data when structured data exists.

Chroma could not replace that.

A retrieved paragraph can support a claim in a loose human sense, but it is not the same as a checked row from a known source. It does not tell me enough about authority, freshness, or whether the number came from generated text that got embedded earlier.

So the system still needed:

  • source authority
  • structured rows
  • SQL or equivalent evidence
  • warnings
  • verification states
  • eval coverage
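One way to picture the difference between a retrieved paragraph and a checked row is the record that travels with the answer. The sketch below is hypothetical: the Evidence and Answer classes, their field names, and the review-state labels are assumptions made for illustration, not the project's real types.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    source: str            # which authority produced the row, e.g. "lahman"
    sql: str               # the exact query that was run
    params: tuple          # the bound parameters
    row: tuple             # the row that came back
    dataset_checksum: str  # ties the row to a specific dataset build


@dataclass
class Answer:
    text: str
    evidence: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

    @property
    def review_state(self):
        """Hypothetical states: warnings win, then missing evidence."""
        if self.warnings:
            return "needs_review"
        if not self.evidence:
            return "ungrounded"
        return "grounded"


answer = Answer(text="player_a led the league with 110 RBI.")  # toy sentence
print(answer.review_state)  # no evidence attached yet

answer.evidence.append(
    Evidence(
        source="lahman",
        sql="SELECT playerID, SUM(RBI) FROM Batting WHERE yearID = ? GROUP BY playerID",
        params=(1962,),
        row=("player_a", 110),
        dataset_checksum="placeholder-checksum",  # not a real digest
    )
)
print(answer.review_state)
```

A paragraph pulled from a vector index has no natural place to carry the query, the parameters, or the dataset build it depends on.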

Once I accepted that, Chroma looked less like retrieval and more like extra state.

The debugging problem

The worst architecture is the one that makes a future bug harder to explain.

With Chroma in the middle, a wrong answer could come from too many places:

  • Did DuckDB return the wrong row?
  • Did the model invent something?
  • Did Chroma retrieve stale text?
  • Was the vector index missing?
  • Was the local index rebuilt differently?
  • Did generated text get embedded and treated like evidence later?

I do not want users debugging that.

I do not want future me debugging that either.

For a portfolio project, that matters. The point is not just that the system answers baseball questions. The point is that the system shows taste about where facts come from.

Adding another component is easy. Removing a component with unclear authority is the harder call.

The better boundary

The simpler architecture made more sense:

Question
  |
  v
Router
  |-- stat query -----------> DuckDB
  |-- database question ----> typed spec -> parameterized SQL -> DuckDB
  |-- player biography -----> DuckDB identity -> LLM prose -> stat-claim checks
  |-- stat explanation -----> local definitions first, then LLM if needed
  |
  v
Answer with sources, warnings, metadata, and review state

That is less magical.

Good.

The database owns structured facts. The model owns language. The verifier owns supported factual claims. Unsupported or ambiguous questions fail closed.

No hidden vector memory in the middle.
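The routing and the fail-closed rule can be sketched in a few lines. This is a rough illustration, not the real router: the intent labels, the confidence threshold, and the stage names in PIPELINES are assumptions that only mirror the diagram above, and the intent label is assumed to come from the model's classification step.

```python
from enum import Enum


class Route(Enum):
    STAT_QUERY = "stat_query"
    DATABASE_QUESTION = "database_question"
    PLAYER_BIOGRAPHY = "player_biography"
    STAT_EXPLANATION = "stat_explanation"
    REFUSE = "refuse"


# Stage names are illustrative labels, not real function names.
PIPELINES = {
    Route.STAT_QUERY: ["duckdb"],
    Route.DATABASE_QUESTION: ["typed_spec", "parameterized_sql", "duckdb"],
    Route.PLAYER_BIOGRAPHY: ["duckdb_identity", "llm_prose", "stat_claim_checks"],
    Route.STAT_EXPLANATION: ["local_definitions", "llm_if_needed"],
    Route.REFUSE: [],  # nothing runs for a refused question
}


def choose_route(intent_label, confidence, threshold=0.8):
    """Map a classifier label to a route; unknown or shaky labels are refused."""
    if confidence < threshold:
        return Route.REFUSE      # ambiguous question: fail closed
    try:
        return Route(intent_label)
    except ValueError:
        return Route.REFUSE      # unsupported intent: fail closed


for label, confidence in [
    ("stat_query", 0.95),
    ("player_biography", 0.40),
    ("trivia", 0.99),
]:
    chosen = choose_route(label, confidence)
    print(label, "->", chosen.name, PIPELINES[chosen])
```

Every path either ends in a named pipeline or in a refusal. There is no fallback branch that quietly searches something else.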

The lesson

Chroma was not bad technology. It was bad ownership.

It did not own facts. DuckDB owned facts.

It did not own prose. The model owned prose.

It did not own verification. The claim checker owned verification.

So it became extra state with unclear authority.

That is exactly the kind of component I want to remove from an AI system.

The useful lesson for me was not “never use vector databases.”

The lesson was: do not add retrieval just because the project is called RAG.

If retrieval does not make the answer more inspectable, more grounded, or easier to verify, it may not be helping. It may just be giving the system one more place to hide the truth.