Finding Records For Training Set Creation

Finding pairs of records that could be similar, to train Zingg

The findTrainingData phase prompts Zingg to search the data for edge cases that the user can label and that Zingg can learn from. During this phase, Zingg combs through samples of the data and judiciously selects a limited number of representative pairs for the user to mark. Zingg is frugal about training data, so user effort is minimized and models can be built and deployed quickly.

The findTrainingData job writes the edge cases to the folder configured through zinggDir/modelId in the config.

./ --phase findTrainingData --conf config.json

The findTrainingData phase is run first, followed by the label phase. This cycle is repeated so that the Zingg models get smarter from user feedback.
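The repeat cycle can be sketched as a small bash helper. This is only an illustration under stated assumptions: `run_rounds` is a hypothetical function name, the launcher is passed in as an argument because its exact path depends on your installation, and the label phase is assumed to accept the same `--phase` and `--conf` flags as the findTrainingData command shown above.

```shell
#!/usr/bin/env bash
# Sketch: alternate the findTrainingData and label phases for a fixed
# number of rounds. run_rounds is a hypothetical helper, not part of Zingg.
#
# Arguments:
#   $1  launcher  path to your Zingg launcher script (assumption: set by you)
#   $2  rounds    how many findTrainingData/label cycles to run
#   $3  conf      path to the Zingg config file, e.g. config.json
run_rounds() {
  local launcher="$1" rounds="$2" conf="$3" i
  for ((i = 1; i <= rounds; i++)); do
    # Ask Zingg to pick candidate pairs; stop if the phase fails.
    "$launcher" --phase findTrainingData --conf "$conf" || return 1
    # Label the pairs found in this round; stop if the phase fails.
    "$launcher" --phase label --conf "$conf" || return 1
  done
}

# Example call (the launcher path is a placeholder):
# run_rounds /path/to/zingg-launcher 3 config.json
```

A fixed round count is just one possible stopping rule; in practice you might stop once enough pairs have been labeled to train a model you are satisfied with.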

