We have built a lot of data pipelines. Many of them are still running. Few of them are loved.
A pipeline is a one-way commitment: from a source you don't control, to a target that doesn't ask for permission. A data product is the opposite: a published contract, owned by someone, with a documented consumer set and an SLA.
Why the shift matters now
Generative AI raises the consequences of bad data dramatically. A pipeline that quietly broke for three days used to result in a delayed dashboard. The same pipeline feeding a retrieval-augmented generation system now silently fabricates plausible-sounding wrong answers for three days.
Data product thinking - contracts, consumers, SLAs, ownership - is the architectural compensation for the new failure mode. Investments here compound.
