Data that is artificially generated rather than obtained from direct measurement of real-world events, designed to preserve selected statistical properties of a source dataset while removing direct identifiers and reducing re-identification risk.
In practice
Synthetic data is produced by statistical models, simulators, or generative neural networks trained on a source dataset. It is used to augment scarce training samples, to share data with third parties under tighter privacy constraints, to test pipelines without exposing production records, and to stress-test models against rare scenarios. It is not automatically privacy-safe: a generator overfit to its training set can leak records, and utility for downstream tasks must be empirically demonstrated rather than assumed.
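One simple way to look for leaked records is to measure how close each synthetic record sits to its nearest source record. The sketch below is illustrative only: the function names, the two-column record layout, and the threshold are assumptions made for this example, not part of any particular tool.

```python
import math

def nearest_distance(record, source):
    """Euclidean distance from one synthetic record to its closest source record."""
    return min(math.dist(record, s) for s in source)

def leak_rate(synthetic, source, threshold=1e-9):
    """Fraction of synthetic records lying within `threshold` of some source record.

    The threshold is an assumption; a real check would pick it per feature scale.
    """
    if not synthetic:
        return 0.0
    hits = sum(1 for r in synthetic if nearest_distance(r, source) <= threshold)
    return hits / len(synthetic)

# Hypothetical records: (amount, is_card_present)
source = [(10.0, 1.0), (250.0, 0.0), (42.5, 1.0)]
synthetic = [(11.2, 1.0), (250.0, 0.0), (40.1, 0.0), (97.3, 1.0)]

rate = leak_rate(synthetic, source)
print(f"share of synthetic records copying a source record: {rate:.2f}")
```

A low rate on a check like this is evidence, not proof: it catches verbatim or near-verbatim copies but says nothing about subtler re-identification routes.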
A bank generates a synthetic transactions dataset from its production ledger to share with a fintech vendor during a proof of concept, after confirming that a fraud-detection model trained on the synthetic data loses less than two percentage points of accuracy relative to one trained on the real data.
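The kind of measurement described in this example is often run as "train on synthetic, test on real". The following is a minimal sketch of that comparison under loud assumptions: the data is simulated, the generator is a toy per-class Gaussian, and the classifier is a nearest-centroid rule standing in for a real fraud model. All names are hypothetical.

```python
import random
import statistics

def make_real(n, rng):
    """Simulated stand-in for real data: (amount, is_fraud) pairs."""
    rows = []
    for _ in range(n):
        fraud = rng.random() < 0.2
        amount = rng.gauss(500.0, 50.0) if fraud else rng.gauss(50.0, 10.0)
        rows.append((amount, int(fraud)))
    return rows

def fit_generator(rows):
    """Toy generator: per-class mean, standard deviation, and class frequency."""
    params = {}
    for label in (0, 1):
        amounts = [a for a, y in rows if y == label]
        params[label] = (statistics.mean(amounts),
                         statistics.stdev(amounts),
                         len(amounts) / len(rows))
    return params

def sample_synthetic(params, n, rng):
    """Draw n synthetic rows from the fitted per-class Gaussians."""
    p_fraud = params[1][2]
    rows = []
    for _ in range(n):
        label = 1 if rng.random() < p_fraud else 0
        mean, stdev, _ = params[label]
        rows.append((rng.gauss(mean, stdev), label))
    return rows

def train_centroids(rows):
    """Nearest-centroid 'model': the mean amount of each class."""
    return {label: statistics.mean([a for a, y in rows if y == label])
            for label in (0, 1)}

def accuracy(centroids, rows):
    """Share of rows whose nearest class centroid matches the true label."""
    correct = sum(
        1 for a, y in rows
        if min(centroids, key=lambda c: abs(a - centroids[c])) == y
    )
    return correct / len(rows)

rng = random.Random(0)
real_train = make_real(1000, rng)
real_test = make_real(500, rng)  # held-out real data used for both evaluations
synthetic_train = sample_synthetic(fit_generator(real_train), 1000, rng)

acc_real = accuracy(train_centroids(real_train), real_test)
acc_syn = accuracy(train_centroids(synthetic_train), real_test)
gap_points = (acc_real - acc_syn) * 100  # accuracy drop in percentage points

print(f"real: {acc_real:.3f}  synthetic: {acc_syn:.3f}  drop: {gap_points:.2f} pp")
```

Both models are scored on the same held-out real data, so the difference reflects only what was lost by training on synthetic records. For imbalanced problems such as fraud detection, accuracy alone can mislead, so a fuller evaluation would likely compare recall or precision as well.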
This definition is maintained by Moweb partners and used in live client engagements. For how synthetic data applies to your estate, or to challenge a working definition, speak to a partner.