Match Statistics
Reassign the ZINGG IDs for clusters from the original production model
Currently, the Zingg output consists of the original data along with the Zingg ID and the match probabilities. If we could surface information about the linkages we found in the records within the cluster, we could help users with matching internals and anomalies. When our users are dealing with millions of records, finding the needle in the haystack is critical.
This information about changing clusters would be a good first step to observe the entity resolution pipeline. While running Zingg incrementally, the Match Statistics would expose how the clusters numbers change as records get inserted and updated into the identity graph. If the number of clusters changes disproportionately to the number of records updated and added, an alert could be triggered.
Whenever we run phases like match or incremental, the statistics about the changes in clusters will be written.
The output will be the statistics being written which are of three types - SUMMARY, CLUSTER AND RECORD
The output is written in the directory -
zinggDir/modelId/stats/type
Last updated
Was this helpful?