by Stewart

Fidelity over backward compatibility

Why I accepted breaking API changes in Propontis when the old schema hid important evidence boundaries.

Backward compatibility is valuable until it makes the system lie about what happened.

TL;DR

In Propontis, I accepted breaking schema changes because the old interface blurred evidence types that needed to stay separate.

A source conversation, executable prior context, and an executed PyRIT conversation are not the same thing. Treating them as interchangeable made the API easier to keep stable and harder to trust.

The better choice was fidelity.

The problem

Propontis deals with attack runs, evidence, model behavior, and reports. That kind of system needs a clean audit trail.

The old shape made some things too convenient. It let different kinds of conversation evidence collapse into one mental bucket:

  • the full source conversation
  • prior context that can be executed or replayed
  • the actual conversation produced by a PyRIT run

Those are related, but they are not interchangeable.

If a report mixes them together, a reviewer has to guess what they are looking at. Is this the input? The replay context? The executed attack trace? A transformed version of something else?

That guessing is where trust goes to die.

The uncomfortable decision

The easy move would have been to preserve the old schema and keep adding fields around it.

That would have avoided churn. It also would have preserved the confusion.

So I accepted a breaking change.

The schema needed names that matched the evidence. The output needed to make boundaries explicit. Validation placeholders had to become real facts. Expected text-mode failures had to stop dumping tracebacks as if every known failure were a surprise. A vague config_status field needed to become something closer to the truth, like model_readiness.

None of that sounds glamorous. It is the kind of work that makes demos slightly less smooth and operations much less weird.

Why fidelity mattered more

A red-team or evaluation system does not earn trust by keeping old field names alive.

It earns trust by making review possible.

If the system says an attack used a certain context, that context should be inspectable. If a report includes a conversation, the report should be clear about whether it is source material or executed evidence. If a model was not ready, the output should say that plainly instead of hiding behind a generic config status.

Backward compatibility can protect users from unnecessary churn. But it can also protect bad abstractions from being corrected.

In this case, the old abstraction was the problem.

The accessibility angle

Auditability is accessibility for reviewers.

A reviewer should not need to reconstruct internal architecture from vague output names. They should be able to read the artifact and understand:

  • what was provided to the system
  • what was executed
  • what failed
  • what was expected
  • what evidence supports the report

When a schema hides those distinctions, it shifts work onto the reviewer. That is especially bad in security and evaluation work, where the whole point is to make failure easier to inspect.

Breaking the schema reduced that burden.

The rule I would use again

I would not break an API just because I found a prettier name.

But I would break one when the old contract makes users less informed.

A stable lie is still a lie. A stable ambiguity is still ambiguity.

If the interface hides truth, compatibility is not the highest virtue anymore.

Takeaway

Compatibility matters. Fidelity matters more when the system is supposed to prove what happened.

The public contract should preserve the evidence boundary, not the convenience of the first version.