Zingg-0.3.3

label


Last updated 2 years ago

Providing user feedback on the training pairs

This phase opens an interactive learner where the user can mark the pairs found by the findTrainingData phase as matches or non-matches. The findTrainingData phase generates edge cases for labelling, and the label phase helps the user mark them.

./zingg.sh --phase label --conf config.json

Keep running findTrainingData followed by label until you have at least 30-40 positives, or until the predictions by Zingg converge with the output you want. At each stage, the user will see different variations of attributes across the records. Zingg performs well even with a small number of training samples, as the samples to be labelled are chosen by the algorithm itself.
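The alternation described above can be scripted. The following is only a minimal sketch, assuming bash and assuming that findTrainingData is invoked with the same --phase and --conf flags as the label command shown earlier. The names label_rounds, ZINGG_CMD and ZINGG_CONF are hypothetical helpers introduced for illustration and are not part of Zingg.

```shell
#!/usr/bin/env bash
# Sketch: run findTrainingData and label back to back for a fixed number of rounds.
# ZINGG_CMD and ZINGG_CONF are hypothetical overrides, not Zingg settings.

label_rounds() {
  local rounds="${1:-1}"                    # number of rounds to run
  local zingg="${ZINGG_CMD:-./zingg.sh}"    # path to the Zingg launcher
  local conf="${ZINGG_CONF:-config.json}"   # same config file for both phases
  local i
  for ((i = 1; i <= rounds; i++)); do
    echo "Round ${i} of ${rounds}"
    # Assumed invocation: same flag form as the label command.
    "$zingg" --phase findTrainingData --conf "$conf" || return 1
    # Interactive step: the user marks each pair on the CLI.
    "$zingg" --phase label --conf "$conf" || return 1
  done
}
```

After sourcing the script, calling label_rounds 3 would run three rounds. The round count is fixed up front, so stopping once 30-40 positives are reached remains a manual decision.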

The label phase shows records on the CLI and asks the user to mark each pair as yes, no, or can't say.