Output Statistics

Under the hoods of the matching process

Zingg Enterprise Feature

If you’ve ever asked “how deterministic rules are performing?” or “did my latest incremental run improve cluster quality?”, Output Statistics is your answer. The Output Statistics surface information about the linkages Zingg found among records within a cluster. While running Zingg incrementally, Output Statistics expose how cluster numbers change as records get inserted and updated into the identity graph. Match Statistics surfaces those insights by writing structured metrics for every match or incremental run, so you can:

See how dense or sparse your clusters are
Understand how much of a cluster is explained by deterministic rules vs. probabilistic links
Identify highly central records (connectors) and outliers
Track how clusters change across runs (growth, splits, merges, reassignments)

If the number of clusters changes disproportionately to the number of records updated or added, an alert could be triggered.

What gets written

Zingg writes statistics to the stats directory whenever you run phases like match or incremental. The output comprises of three types:

SUMMARY: High-level run summary
CLUSTER: One row per cluster with cluster level matching metrics
RECORD: One row per record with its matching metrics within its cluster

PreviousWhich records matched with each other in a cluster NextConfiguring Output Statistics

Last updated 20 days ago

Was this helpful?

hashtagWhat gets written

What gets written