Zingg-0.3.3

Step By Step Guide


Last updated 2 years ago

Step 1: Install

Installation instructions for Docker as well as the GitHub release are in the Installation section. If you need to build from the sources or compile for a different flavor of Spark, check the compiling instructions.

Step 2: Plan for Hardware

Choose your hardware based on the performance numbers in the Hardware Sizing section.

Step 3: Build the config for your data

Zingg needs a configuration file that defines the data and the kind of matching needed. You can create the configuration file by following the instructions in the Configuration section.
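As a rough illustration of what such a file contains, the sketch below assembles a configuration as a Python dict and serializes it to JSON. The key names (fieldDefinition, fieldName, fields, dataType, matchType, data, output, modelId, zinggDir), the match type values, and the file paths are assumptions modeled on typical Zingg example configs, and the exact spelling of values can differ between releases. Treat the Configuration section as the authority.

```python
import json


def build_config(input_csv, output_dir, fields, model_id=100, zingg_dir="models"):
    """Return a dict shaped like a Zingg config.

    All key names here are assumptions for illustration; verify them
    against the Configuration section for your Zingg release.
    `fields` is a list of (column_name, match_type) pairs.
    """
    return {
        # One entry per input column: which column, its type, how to match it.
        "fieldDefinition": [
            {
                "fieldName": name,
                "fields": name,
                "dataType": "string",  # assumed spelling of the type value
                "matchType": match_type,
            }
            for name, match_type in fields
        ],
        # Where the input records come from.
        "data": [
            {
                "name": "input",
                "format": "csv",
                "props": {"location": input_csv, "delimiter": ",", "header": True},
            }
        ],
        # Where the results are written.
        "output": [
            {
                "name": "output",
                "format": "csv",
                "props": {"location": output_dir, "delimiter": ",", "header": True},
            }
        ],
        "modelId": model_id,
        "zinggDir": zingg_dir,
    }


# Hypothetical columns and paths, used only for illustration.
config = build_config(
    "customers.csv",
    "/tmp/zinggOutput",
    [("fname", "fuzzy"), ("lname", "fuzzy"), ("dob", "exact")],
)
config_json = json.dumps(config, indent=2)
print(config_json)
```

Writing config_json to a file such as config.json gives you something to pass to the later phases.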

Step 4: Create the training data

Zingg builds a new set of models (blocking and similarity) for every new schema definition (columns and match types). This means running the findTrainingData and label phases multiple times to build the training dataset from which Zingg will learn. You can read more in the Creating training data section.
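The repeated findTrainingData and label rounds can be sketched as a list of commands to run in order. This is a minimal sketch: the script path ./scripts/zingg.sh and the --phase and --conf flags are assumptions about a typical Zingg install, and zingg_command and training_round_commands are hypothetical helper names. The code only builds and prints the commands; it does not run them, since labeling is interactive.

```python
def zingg_command(phase, conf_path, script="./scripts/zingg.sh"):
    """Build the argument list for one Zingg phase.

    The script path and flag names are assumptions; adjust them to
    match your installation.
    """
    return [script, "--phase", phase, "--conf", conf_path]


def training_round_commands(conf_path, rounds):
    """Alternate findTrainingData and label for the given number of rounds."""
    commands = []
    for _ in range(rounds):
        commands.append(zingg_command("findTrainingData", conf_path))
        commands.append(zingg_command("label", conf_path))
    return commands


# Two rounds against a hypothetical config.json.
commands = training_round_commands("config.json", rounds=2)
for command in commands:
    print(" ".join(command))
```

How many rounds you need depends on your data; the sketch leaves that choice to the caller.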

Step 5: Build and save the model

The training data from Step 4 is used to train Zingg and to build and save the models. This is done by running the train phase. Read more in the Building and saving the model section.

Step 6: Voila, let's match!

As long as your input columns and field types do not change, the same model should work and you do not need to build a new one. If you change the match type, you can continue to use the training data and add more labelled pairs on top of it.
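One way to read the rule above is as a small decision function that compares the old and new field definitions. This is an interpretation of the paragraph, not Zingg's own logic: next_step is a hypothetical helper, and representing each field as a (data type, match type) pair is an assumption made for the example.

```python
def next_step(old_fields, new_fields):
    """Decide what to do after a config change.

    Both arguments map a column name to a (data_type, match_type) pair.
    """
    # Compare columns and data types only, ignoring match types.
    old_schema = {name: spec[0] for name, spec in old_fields.items()}
    new_schema = {name: spec[0] for name, spec in new_fields.items()}
    if old_schema != new_schema:
        return "build a new model"
    # Same columns and types, so any remaining difference is a match type.
    if old_fields != new_fields:
        return "retrain, reusing existing labelled pairs"
    return "reuse the existing model"


# Hypothetical field definitions, used only for illustration.
current = {"fname": ("string", "fuzzy"), "dob": ("string", "exact")}
match_type_changed = {"fname": ("string", "exact"), "dob": ("string", "exact")}
column_added = dict(current, city=("string", "fuzzy"))

print(next_step(current, current))
print(next_step(current, match_type_changed))
print(next_step(current, column_added))
```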

It's now time to apply the model to our data. This is done by running the match or link phase, depending on whether you are matching within a single source or linking multiple sources. You can read more in the Finding the matches and Linking across datasets sections.
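The choice between the two phases can be sketched as follows. As before, the script path and the --phase and --conf flags are assumptions about a typical install, and apply_model_command is a hypothetical helper that only builds the command rather than running it.

```python
def apply_model_command(conf_path, num_sources, script="./scripts/zingg.sh"):
    """Pick match for a single source, link for multiple sources.

    The script path and flag names are assumptions; adjust as needed.
    """
    if num_sources < 1:
        raise ValueError("at least one data source is required")
    phase = "match" if num_sources == 1 else "link"
    return [script, "--phase", phase, "--conf", conf_path]


# Hypothetical config file names, used only for illustration.
print(" ".join(apply_model_command("config.json", num_sources=1)))
print(" ".join(apply_model_command("configLink.json", num_sources=2)))
```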
