Zingg-0.3.4

Field Definitions

Defining which fields should appear in the output and whether and how they need to be used in matching

fieldDefinition

This is a JSON array representing the fields from the source data to be used for matching, and the kind of matching they need.

Each field denotes a column from the input. Fields have the following JSON attributes:

fieldName

The name of the field from the input data schema

fields

Reserved for future use. For now, please set this to the same value as fieldName.

dataType

The data type of the column: string, integer, double, etc.

matchType

The way to match the given field. Multiple match types, separated by commas, can also be used. Here are the different types supported.

| Match Type | Description | Can be applied to |
| --- | --- | --- |
| FUZZY | Broad matches with typos, abbreviations, and other variations. | string, integer, double, date |
| EXACT | No tolerance for variations. Preferable for country codes, pin codes, and other categorical variables where you expect no variations. | string |
| DONT_USE | Appears in the output, but no computation is done on these fields. Helpful for fields like ids that are required in the output. DONT_USE fields are not shown to the user while labeling if showConcise is set to true. | any |
| EMAIL | Matches only the id part of the email before the @ character. | any |
| PINCODE | Matches pin codes like xxxxx-xxxx with xxxxx. | string |
| NULL_OR_BLANK | By default Zingg marks matches as | string |
| TEXT | Compares word overlap between two strings. | string |
| NUMERIC | Extracts numbers from strings and compares how many of them are the same across both strings. | string |
| NUMERIC_WITH_UNITS | Extracts product codes or numbers with units (for example, 16gb) from strings and compares how many are the same across both strings. | string |
| ONLY_ALPHABETS_EXACT | Looks only at the alphabetical characters and compares whether they are exactly the same. | string |
| ONLY_ALPHABETS_FUZZY | Ignores any numbers in the strings and then does a fuzzy comparison. | string |
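To make the attributes above concrete, here is a minimal Python sketch that builds a fieldDefinition array and checks each entry against the match type table. It is an illustration under stated assumptions, not part of Zingg: the column names (id, fname, email, zip, age, product), the helper check_field, and the APPLICABLE_TYPES mapping are all hypothetical, and the exact spelling of values (for example upper or lower case match types) may differ in your Zingg release, so compare with the example configs shipped with it.

```python
import json

# Match types from the table above, mapped to the data types each can be
# applied to. None stands for "any". NUMERIC is assumed to apply to string,
# since it is described as extracting numbers from strings.
APPLICABLE_TYPES = {
    "FUZZY": {"string", "integer", "double", "date"},
    "EXACT": {"string"},
    "DONT_USE": None,
    "EMAIL": None,
    "PINCODE": {"string"},
    "NULL_OR_BLANK": {"string"},
    "TEXT": {"string"},
    "NUMERIC": {"string"},
    "NUMERIC_WITH_UNITS": {"string"},
    "ONLY_ALPHABETS_EXACT": {"string"},
    "ONLY_ALPHABETS_FUZZY": {"string"},
}

REQUIRED_ATTRIBUTES = ("fieldName", "fields", "dataType", "matchType")


def check_field(field):
    """Return a list of problems found in one field definition (empty if none)."""
    problems = [f"missing attribute: {key}"
                for key in REQUIRED_ATTRIBUTES if key not in field]
    if problems:
        return problems
    if field["fields"] != field["fieldName"]:
        problems.append("fields should be the same as fieldName for now")
    # Multiple match types may be given, separated by commas.
    for match_type in (part.strip() for part in field["matchType"].split(",")):
        if match_type not in APPLICABLE_TYPES:
            problems.append(f"unknown matchType: {match_type}")
            continue
        allowed = APPLICABLE_TYPES[match_type]
        if allowed is not None and field["dataType"] not in allowed:
            problems.append(
                f"{match_type} cannot be applied to {field['dataType']}")
    return problems


# Hypothetical columns; replace them with names from your own input schema.
field_definition = [
    {"fieldName": "id", "fields": "id",
     "dataType": "string", "matchType": "DONT_USE"},
    {"fieldName": "fname", "fields": "fname",
     "dataType": "string", "matchType": "FUZZY"},
    {"fieldName": "email", "fields": "email",
     "dataType": "string", "matchType": "EMAIL"},
    {"fieldName": "zip", "fields": "zip",
     "dataType": "string", "matchType": "PINCODE"},
    {"fieldName": "age", "fields": "age",
     "dataType": "integer", "matchType": "FUZZY"},
    # Two match types on one field, separated by a comma.
    {"fieldName": "product", "fields": "product",
     "dataType": "string", "matchType": "TEXT,NUMERIC_WITH_UNITS"},
]

for field in field_definition:
    assert check_field(field) == [], check_field(field)

# Serialize the array under the fieldDefinition key as a JSON fragment.
print(json.dumps({"fieldDefinition": field_definition}, indent=2))
```

The printed fragment can be used as a starting point for the fieldDefinition part of a configuration file. Running the check before a job is a cheap way to catch a match type applied to an unsupported data type, such as EXACT on an integer column.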