
@2021 Zingg Labs, Inc.


Combining Different Match Models

When a single match model is not sufficient

In many cases, we want to build the identity graph from a combination of different datasets, schemas and matching logic. For example, one source system may contain only user IDs and emails, another may hold user names and phone numbers, and several others may contain person information with addresses. In another example, some systems capture spousal information, while other records need to be matched on last name and address.

In such cases, Zingg can build the entire graph and relate the different models to each other. In the following configuration, the results of a query that matches records exactly on family ID are combined with a match model (household) that matches on address and last name.

```json
{
    "vertices": [
        {
            "name": "spouse",
            "vertexType": "zingg_pipe",
            "data": [
                {
                    "name": "spouse",
                    "format": "snowflake",
                    "props": {
                        "query": "select a.id as id, a.FNAME, a.LNAME, a.STNO, a.ADD1, a.CITY, a.STATE, a.ZINGG_ID_PERSON, b.id as z_id, b.fname as Z_FNAME,b.lname as Z_LNAME,b.stno as Z_STNO,b.add1 as Z_ADD1, b.city as Z_CITY,b.state as Z_STATE, b.ZINGG_ID_PERSON as Z_ZINGG_ID_PERSON from CUSTOMER_RELATE_PARTIAL a, CUSTOMER_RELATE_PARTIAL b where a.familyId = b.familyId"
                    }
                }
            ],
            "edges": {
                "edgeType": "same_edge",
                "edges": [
                    {
                        "dataColumn": "zingg_personId",
                        "column": "zingg_personId",
                        "name": "zingg_personId1"
                    },
                    {
                        "dataColumn": "zingg_personId",
                        "column": "z_zingg_personId",
                        "name": "zingg_personId2"
                    }
                ]
            }
        },
        {
            "name": "household",
            "config": "$ZINGG_ENTERPRISE_HOME$/zinggEnterprise/configHousehold.json",
            "strategy": {
                "vDataStrategy": "unique_edge",
                "props": {
                    "column": "zingg_personId",
                    "edge": "zingg_personId,z_zingg_personId"
                }
            },
            "vertexType": "zingg_match",
            "edges": {
                "edgeType": "same_edge",
                "edges": [
                    {
                        "dataColumn": "zingg_personId",
                        "column": "zingg_personId",
                        "name": "zingg_personId1"
                    },
                    {
                        "dataColumn": "zingg_personId",
                        "column": "z_zingg_personId",
                        "name": "zingg_personId2"
                    }
                ]
            }
        }
    ],
    "output": [
        {
            "name": "relatedCustomers",
            "format": "snowflake",
            "props": {
                "table": "RELATED_CUSTOMERS_PARTIAL"
            }
        }
    ],
    "strategy": "pairs_and_vertices"
}
```
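One way to picture what relating these models achieves: each vertex can be read as contributing pairs of person IDs (the `zingg_personId` / `z_zingg_personId` edge columns above). Combining the vertices merges those pairs into a single graph whose connected components are groups of related customers. The sketch below illustrates that idea with a union-find over pairs from two sources. It assumes the combined result behaves like connected components over the pairs. It is a plain-Python illustration, not Zingg Enterprise's implementation, and the function names and sample IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


class UnionFind:
    """Disjoint-set forest: IDs joined by any edge end up with the same root."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path straight at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def combine_edges(*edge_sets: Iterable[Pair]) -> List[List[str]]:
    """Merge ID pairs from several sources and return connected components.

    Hypothetical helper for illustration only; not part of Zingg.
    """
    uf = UnionFind()
    for edges in edge_sets:
        for a, b in edges:
            uf.union(a, b)
    groups: Dict[str, Set[str]] = defaultdict(set)
    for node in list(uf.parent):
        groups[uf.find(node)].add(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical sample IDs: one pair from the exact familyId query (spouses),
# two pairs from the household match model.
spouse_pairs = [("p1", "p2")]
household_pairs = [("p2", "p3"), ("p4", "p5")]
print(combine_edges(spouse_pairs, household_pairs))
# [['p1', 'p2', 'p3'], ['p4', 'p5']]
```

Because merging is transitive, `p1` and `p3` land in the same group even though neither source links them directly. This is the kind of cross-model relationship that running each model in isolation would miss.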
Last updated 3 months ago