Zingg-0.3.3

Using preexisting training data


Last updated 2 years ago

Supplementing Zingg with existing training data

If you already have some training data that you want to start with, you can use it with Zingg. Add a trainingSamples attribute to the config and define the training pairs.

The training data supplied to Zingg should have a z_cluster column, which groups the records together. It also needs a z_isMatch column, which is 1 if the pair matches and 0 if it does not.
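As a rough illustration of the column layout described above, the following Python sketch builds a tiny labeled sample and runs basic checks on it. The field names fname and lname, the cluster ids, and the helper names validate and to_csv are hypothetical. The check that every z_cluster holds at least two records with one shared z_isMatch value is an assumption about how pairs are grouped, not something Zingg is confirmed to enforce.

```python
import csv
import io
from collections import defaultdict

# Hypothetical labeled records: z_cluster groups the records of a pair,
# z_isMatch is 1 for a match and 0 for a non-match.
ROWS = [
    {"z_cluster": "c1", "z_isMatch": 1, "fname": "thomas", "lname": "george"},
    {"z_cluster": "c1", "z_isMatch": 1, "fname": "tomas", "lname": "george"},
    {"z_cluster": "c2", "z_isMatch": 0, "fname": "jackson", "lname": "eglinton"},
    {"z_cluster": "c2", "z_isMatch": 0, "fname": "lachlan", "lname": "berry"},
]


def validate(rows):
    """Return a list of problems found in the training rows (empty if none)."""
    problems = []
    labels = defaultdict(set)
    sizes = defaultdict(int)
    for row in rows:
        flag = int(row["z_isMatch"])
        if flag not in (0, 1):
            problems.append(f"{row['z_cluster']}: z_isMatch must be 0 or 1")
        labels[row["z_cluster"]].add(flag)
        sizes[row["z_cluster"]] += 1
    for cluster, seen in labels.items():
        # Assumption: all records in one cluster carry the same label.
        if len(seen) > 1:
            problems.append(f"{cluster}: mixed z_isMatch values")
        # Assumption: a cluster describes a pair, so it needs two or more rows.
        if sizes[cluster] < 2:
            problems.append(f"{cluster}: fewer than two records")
    return problems


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(validate(ROWS))
    print(to_csv(ROWS))
```

Running a check like this before handing the file to Zingg can catch a mislabeled or incomplete pair early. The column order and whether a header line is expected depend on the schema declared in your own configuration.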

An example is provided in the GitHub training data.

The above training data can be specified using the trainingSamples attribute in the configuration.

In addition, labelled data of one model can also be exported and used as training data for another model. For details, check out exporting labelled data.

Please note: It is advisable to still run findTrainingData and label for a few rounds to tune Zingg with the supplied training data as well as patterns it needs to learn independently.
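To make the trainingSamples attribute more concrete, here is a minimal Python sketch that adds such an entry to a configuration and prints it as JSON. It assumes the entry has the general shape of a Zingg pipe (name, format, props). The file path, the option values, the modelId value, and the helper add_training_samples are placeholders for illustration, so check the exact keys against the configuration reference for your Zingg version.

```python
import json

# Assumption: the keys follow the general shape of a Zingg pipe
# (name, format, props); all values here are illustrative placeholders.
training_samples = [
    {
        "name": "trainingSamples",
        "format": "csv",
        "props": {
            "location": "path/to/training.csv",  # hypothetical path
            "delimiter": ",",
            "header": True,
        },
    }
]


def add_training_samples(config, samples):
    """Return a copy of config with the trainingSamples attribute set."""
    updated = dict(config)
    updated["trainingSamples"] = samples
    return updated


# A stand-in for the rest of the configuration.
config = add_training_samples({"modelId": 100}, training_samples)

if __name__ == "__main__":
    print(json.dumps(config, indent=2))
```

The printed JSON shows where the attribute would sit alongside the other top-level settings. A real configuration would also declare the schema of the training file, including the z_cluster and z_isMatch columns.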