# Zingg Models

Zingg learns two models from the data: a blocking model and a similarity model.

## 1. Blocking Model

One fundamental problem with scaling data mastering is that the number of comparisons increases quadratically as the number of input records increases.

![Data Mastering At Scale](https://1010246109-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fa7sgpR3odgfck5L8KMcN%2Fuploads%2Fgit-blob-d84ed8fee8655cece1f5f5e1d7aa6b81e6d7715f%2Ffuzzymatchingcomparisons.jpg?alt=media)
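
To make the quadratic growth concrete, here is a minimal sketch that counts the unordered pairs among `n` records, which is `n * (n - 1) / 2`. The helper name `pair_count` is illustrative only and is not part of Zingg.

```python
def pair_count(n: int) -> int:
    """Number of unordered record pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Ten times more records means roughly a hundred times more comparisons.
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,} records -> {pair_count(n):>15,} comparisons")
```

Without blocking, a million records already implies about half a trillion pairwise comparisons.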

Zingg learns a clustering/blocking model that indexes near-similar records together, so it does not need to compare every record with every other record. Typically, Zingg performs only 0.05-1% of the comparisons in the full problem space.
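
The idea of blocking can be sketched in a few lines. This is not Zingg's implementation: Zingg learns its blocking function from training data, whereas the `blocking_key` below is a hand-written, hypothetical stand-in (first letter of the name plus the first three letters of the city), and the sample records are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Made-up sample records for illustration only.
records = [
    {"id": 1, "name": "Thomas Smith", "city": "Sydney"},
    {"id": 2, "name": "Tomas Smith", "city": "Sydney"},
    {"id": 3, "name": "Tina Brown", "city": "Perth"},
    {"id": 4, "name": "Tina Browne", "city": "Perth"},
    {"id": 5, "name": "Alice Jones", "city": "Hobart"},
]


def blocking_key(record):
    # Hypothetical hand-written key; Zingg learns its blocking function instead.
    return (record["name"][0].lower(), record["city"][:3].lower())


def candidate_pairs(records):
    """Group records by blocking key and pair up ids only within each block."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record["id"])
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


pairs = candidate_pairs(records)
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}: {pairs}")
```

Only records that share a key are ever compared, which is what keeps the number of comparisons far below the full quadratic count.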

## 2. Similarity Model

The similarity model predicts which record pairs match. It is a classifier that scores records that are not exactly the same but could belong together. To scale to larger datasets, it is run only on records within the same block/cluster.

![Fuzzy matching comparisons](https://1010246109-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fa7sgpR3odgfck5L8KMcN%2Fuploads%2Fgit-blob-9c6606491acd13c97cc5c9560919ddebe50c23ba%2FdataMatching.jpg?alt=media)
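
As a rough sketch of pairwise match prediction, the snippet below turns a record pair into per-field similarity features and combines them into a match decision. It is an assumption-laden illustration, not Zingg's model: the features come from Python's `difflib.SequenceMatcher`, and the fixed `weights` and `threshold` are hypothetical placeholders for what a trained classifier would learn from labeled pairs.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def features(r1: dict, r2: dict, fields) -> list:
    """One similarity feature per compared field."""
    return [field_similarity(r1[f], r2[f]) for f in fields]


def predict_match(r1, r2, fields=("name", "city"),
                  weights=(0.7, 0.3), threshold=0.85):
    # Hypothetical fixed weights and threshold; a trained classifier
    # would learn this decision from labeled record pairs.
    score = sum(w * x for w, x in zip(weights, features(r1, r2, fields)))
    return score >= threshold, score


a = {"name": "Thomas Smith", "city": "Sydney"}
b = {"name": "Tomas Smith", "city": "Sydney"}
c = {"name": "Tina Brown", "city": "Perth"}

is_match_ab, score_ab = predict_match(a, b)
is_match_ac, score_ac = predict_match(a, c)
print(f"a vs b: match={is_match_ab}, score={score_ab:.2f}")
print(f"a vs c: match={is_match_ac}, score={score_ac:.2f}")
```

In practice such a function would be applied only to the candidate pairs that share a block, not to every pair in the dataset.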

To build these models, training data is needed. Zingg comes with an interactive learner to rapidly build training sets.

![Shows records and asks the user to mark yes, no, or can't say on the CLI.](https://1010246109-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fa7sgpR3odgfck5L8KMcN%2Fuploads%2Fgit-blob-ba27eaba44c1dd5ff8d1760b87107346292ae65d%2Flabel2.gif?alt=media)
