A versioned, reproducible suite of tests run against an AI system on every change. Covers accuracy, hallucination, bias, PII leakage and task-specific criteria. The single highest-leverage investment for production AI.
This definition is maintained by Moweb partners and used in live client engagements. For how Evaluation harness applies to your estate, or to challenge a working definition, speak to a partner.