Model Difference

Comparing the outputs of two different models

Consider an existing model in which some fields are marked as fuzzy. We build that model and look at its match output. We then train another model in which some of these attributes are marked as exact, more match types are added, or some field types are changed. The primary key remains the same in both models.

We want to understand whether those changes make the model better or worse, and what other changes could bring the model to the accuracy we are looking for.

Comparing the two outputs therefore becomes important for understanding which model works better for us.

The model difference phase is run as follows:

./scripts/zingg.sh --phase diff --conf <path to new model conf> --compareTo <path to original conf>

The output will be written to the following location:

zingg_modelDiff_originalModelId_newModelId in case of Snowflake

zinggDir/newModelId/modeldiff/originalModelId_newModelId in case of Spark

The output will contain the records whose clusters changed as a result of the newly trained model.
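To make the idea of an "impacted" record concrete, here is a minimal Python sketch. It is not Zingg's implementation; it only illustrates one plausible way to compare two match outputs that share a primary key. The function names, the dictionary layout (primary key mapped to cluster id), and the sample data are all hypothetical. Since cluster ids may not be comparable across two separately trained models, the sketch compares cluster membership rather than the ids themselves.

```python
from collections import defaultdict


def cluster_mates(assignments):
    """Map each primary key to the frozenset of keys that share its cluster."""
    members = defaultdict(set)
    for key, cluster_id in assignments.items():
        members[cluster_id].add(key)
    return {key: frozenset(members[cid]) for key, cid in assignments.items()}


def impacted_records(original, new):
    """Return the sorted primary keys whose cluster membership differs."""
    before = cluster_mates(original)
    after = cluster_mates(new)
    all_keys = before.keys() | after.keys()
    return sorted(k for k in all_keys if before.get(k) != after.get(k))


# Hypothetical outputs: primary key -> cluster id assigned by each model.
original = {"r1": "A", "r2": "A", "r3": "B", "r4": "C"}
new = {"r1": "X", "r2": "X", "r3": "X", "r4": "Y"}

print(impacted_records(original, new))  # ['r1', 'r2', 'r3']
```

In this example r3 joins the cluster of r1 and r2 under the new model, so all three records are reported, while r4 stays alone in both outputs and is not.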
