
@2021 Zingg Labs, Inc.

Deterministic Matching

Ensuring higher matching accuracy and performance




Zingg Enterprise lets you plug rule-based matching in alongside Zingg AI's probabilistic matching. If the data contains reliable identifiers such as emails, SSNs, or passport IDs, these attributes can be used to resolve records.

The deterministic matching flow is woven into Zingg's flow so that each record that has a match finds one, whether probabilistically, deterministically, or both. If the data has known identifiers, Zingg Enterprise's Deterministic Matching can substantially improve both matching accuracy and performance.

Example For Configuring In JSON:

    "deterministicMatching":[  
        {  
           "matchCondition":[{"fieldName":"fname"},{"fieldName":"stNo"},{"fieldName":"add1"}]  
        },  
        {  
           "matchCondition":[{"fieldName":"fname"},{"fieldName":"dob"},{"fieldName":"ssn"}]  
        },   
        {  
           "matchCondition":[{"fieldName":"fname"},{"fieldName":"email"}]  
        }  
    ]  
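When the rule list is maintained in code, a fragment of this shape can be generated rather than written by hand. The following is a minimal sketch in plain Python; the helper name `deterministic_matching_fragment` is hypothetical and not part of Zingg. It only mirrors the JSON structure shown above.

```python
import json

# Hypothetical helper (not part of Zingg): builds the "deterministicMatching"
# fragment from lists of field names, one list per rule.
def deterministic_matching_fragment(*rules):
    return [
        {"matchCondition": [{"fieldName": name} for name in rule]}
        for rule in rules
    ]

fragment = deterministic_matching_fragment(
    ["fname", "stNo", "add1"],
    ["fname", "dob", "ssn"],
    ["fname", "email"],
)

# Print the fragment so it can be pasted into the Zingg config file.
print(json.dumps({"deterministicMatching": fragment}, indent=4))
```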

Python Code Example:

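    # Each DeterministicMatching lists the fields that must all match exactly.
    detMatchNameAdd = DeterministicMatching('fname','stNo','add1')
    detMatchNameDobSsn = DeterministicMatching('fname','dob','ssn')
    detMatchNameEmail = DeterministicMatching('fname','email')
    # args is assumed to be the Zingg arguments object created earlier.
    # Register all conditions; a pair matching any one of them is a match.
    args.setDeterministicMatchingCondition(detMatchNameAdd,detMatchNameDobSsn,detMatchNameEmail)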

How Will It Work:

The above conditions would translate into the following:

  1. Rows with exactly the same fname, stNo and add1 => exact match with the maximum score of 1, OR

  2. Rows with exactly the same fname, dob and ssn => exact match with the maximum score of 1, OR

  3. Rows with exactly the same fname and email => exact match with the maximum score of 1
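The OR-of-ANDs logic above can be illustrated in plain Python. This is a sketch for intuition only, not Zingg's implementation: the function name `is_deterministic_match`, the sample rows, and the choice to treat missing values as non-matching are all assumptions made for the example.

```python
# Illustrative only: not Zingg's implementation.
# Each rule is a tuple of fields that must ALL be equal; rules are ORed.
RULES = [
    ("fname", "stNo", "add1"),
    ("fname", "dob", "ssn"),
    ("fname", "email"),
]

def is_deterministic_match(a, b, rules=RULES):
    """Return True if any rule has all its fields present and equal in both rows.

    Assumption: a missing (None) value never satisfies a rule.
    """
    return any(
        all(a.get(f) is not None and a.get(f) == b.get(f) for f in rule)
        for rule in rules
    )

# Hypothetical sample rows.
r1 = {"fname": "jane", "email": "jane@example.com", "dob": "1990-01-01", "ssn": None}
r2 = {"fname": "jane", "email": "jane@example.com", "dob": "1985-05-05", "ssn": None}
r3 = {"fname": "jane", "email": "j.doe@example.com", "dob": "1990-01-01", "ssn": None}

print(is_deterministic_match(r1, r2))  # True: fname and email agree (third rule)
print(is_deterministic_match(r1, r3))  # False: no rule has all of its fields equal
```

A pair needs to satisfy only one rule, so adding a rule can only increase the number of deterministic matches.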
