# ZinggEC Python API

## Zingg Enterprise Entity Resolution Python Package

Zingg Enterprise Python APIs for entity resolution, identity resolution, record linkage, data mastering and deduplication using ML (<https://www.zingg.ai>)

**NOTE**

Requires Python 3.6+ and Spark 3.5.0. Otherwise, `zinggES.enterprise.spark.ESparkClient()` cannot be executed.
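
A quick way to confirm these requirements before running anything is a small version check. The sketch below is illustrative only: `check_environment` and `installed_spark_version` are hypothetical helpers, not part of Zingg, and treating the Spark requirement as an exact match on `3.5.0` is an assumption based on the note above.

```python
import sys


def check_environment(python_version, spark_version):
    """Return a list of problems; an empty list means the versions look fine."""
    problems = []
    major_minor = tuple(python_version[:2])
    if major_minor < (3, 6):
        problems.append("Python 3.6+ is required, found %d.%d" % major_minor)
    if spark_version is None:
        problems.append("pyspark is not installed")
    elif spark_version != "3.5.0":
        # Assumption: the note is read as requiring exactly Spark 3.5.0.
        problems.append("Spark 3.5.0 is expected, found %s" % spark_version)
    return problems


def installed_spark_version():
    """Return the installed pyspark version string, or None if it is missing."""
    try:
        import pyspark
    except ImportError:
        return None
    return pyspark.__version__


if __name__ == "__main__":
    for problem in check_environment(sys.version_info, installed_spark_version()):
        print("WARNING:", problem)
```

Running the script prints one warning per mismatch and nothing when both versions match.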

* [Zingg Enterprise Common Package](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/zinggEC.md)
  * [ApproverArguments](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/ApproverArguments.md)
    * [zinggEC.enterprise.common.ApproverArguments](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/ApproverArguments.md#zinggec-enterprise-common-approverarguments)
    * [`ApproverArguments`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/ApproverArguments.md#zinggEC.enterprise.common.ApproverArguments.ApproverArguments)
      * [`ApproverArguments.getApprovalQuery()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/ApproverArguments.md#zinggEC.enterprise.common.ApproverArguments.ApproverArguments.getApprovalQuery)
      * [`ApproverArguments.getArgs()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/ApproverArguments.md#zinggEC.enterprise.common.ApproverArguments.ApproverArguments.getArgs)
      * [`ApproverArguments.getDestination()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/ApproverArguments.md#zinggEC.enterprise.common.ApproverArguments.ApproverArguments.getDestination)
      * [`ApproverArguments.getParentArgs()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/ApproverArguments.md#zinggEC.enterprise.common.ApproverArguments.ApproverArguments.getParentArgs)
      * [`ApproverArguments.setApprovalQuery()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/ApproverArguments.md#zinggEC.enterprise.common.ApproverArguments.ApproverArguments.setApprovalQuery)
      * [`ApproverArguments.setArgs()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/ApproverArguments.md#zinggEC.enterprise.common.ApproverArguments.ApproverArguments.setArgs)
      * [`ApproverArguments.setDestination()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/ApproverArguments.md#zinggEC.enterprise.common.ApproverArguments.ApproverArguments.setDestination)
      * [`ApproverArguments.setParentArgs()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/ApproverArguments.md#zinggEC.enterprise.common.ApproverArguments.ApproverArguments.setParentArgs)
  * [IncrementalArguments](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/IncrementalArguments.md)
    * [zinggEC.enterprise.common.IncrementalArguments](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/IncrementalArguments.md#zinggec-enterprise-common-incrementalarguments)
    * [`IncrementalArguments`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/IncrementalArguments.md#zinggEC.enterprise.common.IncrementalArguments.IncrementalArguments)
      * [`IncrementalArguments.getArgs()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/IncrementalArguments.md#zinggEC.enterprise.common.IncrementalArguments.IncrementalArguments.getArgs)
      * [`IncrementalArguments.getDeleteAction()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/IncrementalArguments.md#zinggEC.enterprise.common.IncrementalArguments.IncrementalArguments.getDeleteAction)
      * [`IncrementalArguments.getDeletedData()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/IncrementalArguments.md#zinggEC.enterprise.common.IncrementalArguments.IncrementalArguments.getDeletedData)
      * [`IncrementalArguments.getIncrementalData()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/IncrementalArguments.md#zinggEC.enterprise.common.IncrementalArguments.IncrementalArguments.getIncrementalData)
      * [`IncrementalArguments.getOutputTmp()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/IncrementalArguments.md#zinggEC.enterprise.common.IncrementalArguments.IncrementalArguments.getOutputTmp)
      * [`IncrementalArguments.getParentArgs()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/IncrementalArguments.md#zinggEC.enterprise.common.IncrementalArguments.IncrementalArguments.getParentArgs)
      * [`IncrementalArguments.setArgs()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/IncrementalArguments.md#zinggEC.enterprise.common.IncrementalArguments.IncrementalArguments.setArgs)
      * [`IncrementalArguments.setDeleteAction()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/IncrementalArguments.md#zinggEC.enterprise.common.IncrementalArguments.IncrementalArguments.setDeleteAction)
      * [`IncrementalArguments.setDeletedData()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/IncrementalArguments.md#zinggEC.enterprise.common.IncrementalArguments.IncrementalArguments.setDeletedData)
      * [`IncrementalArguments.setIncrementalData()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/IncrementalArguments.md#zinggEC.enterprise.common.IncrementalArguments.IncrementalArguments.setIncrementalData)
      * [`IncrementalArguments.setOutputTmp()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/IncrementalArguments.md#zinggEC.enterprise.common.IncrementalArguments.IncrementalArguments.setOutputTmp)
      * [`IncrementalArguments.setParentArgs()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/IncrementalArguments.md#zinggEC.enterprise.common.IncrementalArguments.IncrementalArguments.setParentArgs)
  * [MappingMatchType](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/MappingMatchType.md)
    * [zinggEC.enterprise.common.MappingMatchType](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/MappingMatchType.md#zinggec-enterprise-common-mappingmatchtype)
    * [`MappingMatchType`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/MappingMatchType.md#zinggEC.enterprise.common.MappingMatchType.MappingMatchType)
      * [`MappingMatchType.getMappingMatchType()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/MappingMatchType.md#zinggEC.enterprise.common.MappingMatchType.MappingMatchType.getMappingMatchType)
  * [epipes](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/epipes.md)
    * [zinggEC.enterprise.common.epipes](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/epipes.md#zinggec-enterprise-common-epipes)
    * [`ECsvPipe`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/epipes.md#zinggEC.enterprise.common.epipes.ECsvPipe)
      * [`ECsvPipe.setDelimiter()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/epipes.md#zinggEC.enterprise.common.epipes.ECsvPipe.setDelimiter)
      * [`ECsvPipe.setHeader()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/epipes.md#zinggEC.enterprise.common.epipes.ECsvPipe.setHeader)
      * [`ECsvPipe.setLocation()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/epipes.md#zinggEC.enterprise.common.epipes.ECsvPipe.setLocation)
    * [`EPipe`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/epipes.md#zinggEC.enterprise.common.epipes.EPipe)
      * [`EPipe.getPassthroughExpr()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/epipes.md#zinggEC.enterprise.common.epipes.EPipe.getPassthroughExpr)
      * [`EPipe.getPassthruData()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/epipes.md#zinggEC.enterprise.common.epipes.EPipe.getPassthruData)
      * [`EPipe.getUsableData()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/epipes.md#zinggEC.enterprise.common.epipes.EPipe.getUsableData)
      * [`EPipe.hasPassThru()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/epipes.md#zinggEC.enterprise.common.epipes.EPipe.hasPassThru)
      * [`EPipe.setPassthroughExpr()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/epipes.md#zinggEC.enterprise.common.epipes.EPipe.setPassthroughExpr)
    * [`InMemoryPipe`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/epipes.md#zinggEC.enterprise.common.epipes.InMemoryPipe)
      * [`InMemoryPipe.getDataset()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/epipes.md#zinggEC.enterprise.common.epipes.InMemoryPipe.getDataset)
      * [`InMemoryPipe.setDataset()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/epipes.md#zinggEC.enterprise.common.epipes.InMemoryPipe.setDataset)
    * [`UCPipe`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/epipes.md#zinggEC.enterprise.common.epipes.UCPipe)
      * [`UCPipe.setTable()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/epipes.md#zinggEC.enterprise.common.epipes.UCPipe.setTable)
  * [EArguments](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md)
    * [zinggEC.enterprise.common.EArguments](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md#zinggec-enterprise-common-earguments)
    * [`DeterministicMatching`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md#zinggEC.enterprise.common.EArguments.DeterministicMatching)
      * [`DeterministicMatching.getDeterministicMatching()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md#zinggEC.enterprise.common.EArguments.DeterministicMatching.getDeterministicMatching)
    * [`EArguments`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md#zinggEC.enterprise.common.EArguments.EArguments)
      * [`EArguments.getArgs()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md#zinggEC.enterprise.common.EArguments.EArguments.getArgs)
      * [`EArguments.getData()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md#zinggEC.enterprise.common.EArguments.EArguments.getData)
      * [`EArguments.getDeterministicMatching()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md#zinggEC.enterprise.common.EArguments.EArguments.getDeterministicMatching)
      * [`EArguments.getFieldDefinition()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md#zinggEC.enterprise.common.EArguments.EArguments.getFieldDefinition)
      * [`EArguments.getOutputStats()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md#zinggEC.enterprise.common.EArguments.EArguments.getOutputStats)
      * [`EArguments.getPassthroughExpr()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md#zinggEC.enterprise.common.EArguments.EArguments.getPassthroughExpr)
      * [`EArguments.getPrimaryKey()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md#zinggEC.enterprise.common.EArguments.EArguments.getPrimaryKey)
      * [`EArguments.setArgs()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md#zinggEC.enterprise.common.EArguments.EArguments.setArgs)
      * [`EArguments.setBlockingModel()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md#zinggEC.enterprise.common.EArguments.EArguments.setBlockingModel)
      * [`EArguments.setData()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md#zinggEC.enterprise.common.EArguments.EArguments.setData)
      * [`EArguments.setDeterministicMatchingCondition()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md#zinggEC.enterprise.common.EArguments.EArguments.setDeterministicMatchingCondition)
      * [`EArguments.setFieldDefinition()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md#zinggEC.enterprise.common.EArguments.EArguments.setFieldDefinition)
      * [`EArguments.setOutputStats()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md#zinggEC.enterprise.common.EArguments.EArguments.setOutputStats)
      * [`EArguments.setPassthroughExpr()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EArguments.md#zinggEC.enterprise.common.EArguments.EArguments.setPassthroughExpr)
  * [EFieldDefinition](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EFieldDefinition.md)
    * [zinggEC.enterprise.common.EFieldDefinition](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EFieldDefinition.md#zinggec-enterprise-common-efielddefinition)
    * [`EFieldDefinition`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EFieldDefinition.md#zinggEC.enterprise.common.EFieldDefinition.EFieldDefinition)
      * [`EFieldDefinition.getMatchTypeArray()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EFieldDefinition.md#zinggEC.enterprise.common.EFieldDefinition.EFieldDefinition.getMatchTypeArray)
      * [`EFieldDefinition.getPrimaryKey()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EFieldDefinition.md#zinggEC.enterprise.common.EFieldDefinition.EFieldDefinition.getPrimaryKey)
      * [`EFieldDefinition.setPrimaryKey()`](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/EFieldDefinition.md#zinggEC.enterprise.common.EFieldDefinition.EFieldDefinition.setPrimaryKey)

## API Reference

* [Module Index](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/py-modindex.md)
* [Index](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/genindex.md)
* [Search Page](https://github.com/zinggAI/zingg/blob/main/docs/pythonEC/markdown/search.md)

## Example API Usage

```python
from zingg.client import *
from zingg.pipes import *
from zinggEC.enterprise.common.ApproverArguments import *
from zinggEC.enterprise.common.IncrementalArguments import *
from zinggEC.enterprise.common.MappingMatchType import *
from zinggEC.enterprise.common.epipes import *
from zinggEC.enterprise.common.EArguments import *
from zinggEC.enterprise.common.EFieldDefinition import EFieldDefinition
from zinggES.enterprise.spark.ESparkClient import *

# Build the arguments for Zingg
args = EArguments()
# Set the field definitions
recId = EFieldDefinition("recId", "string", MatchType.DONT_USE)
recId.setPrimaryKey(True)
fname = EFieldDefinition("fname", "string", MatchType.FUZZY)
# To use a mapping match type for fname instead, define it as:
#fname = EFieldDefinition("fname", "string", MatchType.FUZZY, MappingMatchType("MAPPING", "NICKNAMES_TEST"))
lname = EFieldDefinition("lname", "string", MatchType.FUZZY)
stNo = EFieldDefinition("stNo", "string", MatchType.FUZZY)
add1 = EFieldDefinition("add1", "string", MatchType.FUZZY)
add2 = EFieldDefinition("add2", "string", MatchType.FUZZY)
city = EFieldDefinition("city", "string", MatchType.FUZZY)
areacode = EFieldDefinition("areacode", "string", MatchType.FUZZY)
state = EFieldDefinition("state", "string", MatchType.FUZZY)
dob = EFieldDefinition("dob", "string", MatchType.FUZZY)
ssn = EFieldDefinition("ssn", "string", MatchType.FUZZY)

fieldDefs = [recId, fname, lname, stNo, add1, add2, city, areacode, state, dob, ssn]
args.setFieldDefinition(fieldDefs)
# Set the model ID and the Zingg directory
args.setModelId("100")
args.setZinggDir("./models")
args.setNumPartitions(4)
args.setLabelDataSampleSize(0.5)

# Set the blocking strategy for the Zingg model to either DEFAULT or WIDER.
# If you do not set anything, the model follows the DEFAULT strategy.
args.setBlockingModel("DEFAULT")

# Set the passthrough expression
args.setPassthroughExpr("fname = 'matilda'")

# Set the deterministic matching conditions
dm1 = DeterministicMatching('fname','stNo','add1')
dm2 = DeterministicMatching('ssn')
dm3 = DeterministicMatching('fname','stNo','lname')
args.setDeterministicMatchingCondition(dm1,dm2,dm3)

# Read the dataset into inputPipe and set it on args.
# The schema line below should not be required if you are reading from an in-memory dataset;
# in that case, use your input DataFrame (df) instead.
schema = "recId string, fname string, lname string, stNo string, add1 string, add2 string, city string, areacode string, state string, dob string, ssn string"
inputPipe = ECsvPipe("testFebrl", "examples/febrl/test.csv", schema)
args.setData(inputPipe)

outputPipe = ECsvPipe("resultFebrl", "/tmp/febrlOutput")
outputPipe.setHeader("true")
args.setOutput(outputPipe)

# Run Zingg for the given phase.
# To run the findAndLabel phase instead, use:
# options = ClientOptions([ClientOptions.PHASE,"findAndLabel"])

options = ClientOptions([ClientOptions.PHASE,"trainMatch"])
zingg = EZingg(args, options)
zingg.initAndExecute()

# Run the incremental phase on new records, using the arguments above as the parent arguments
incrArgs = IncrementalArguments()
incrArgs.setParentArgs(args)
incrPipe = ECsvPipe("testFebrlIncr", "examples/febrl/test-incr.csv", schema)
incrArgs.setIncrementalData(incrPipe)

incrOptions = ClientOptions([ClientOptions.PHASE,"runIncremental"])
zinggIncr = EZingg(incrArgs, incrOptions)
zinggIncr.initAndExecute()
```
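
The schema string in the example repeats the same column names as the field definitions. When every column is a string, as it is here, one way to avoid keeping the two in sync by hand is to derive the schema from a single list of names. This is a minimal sketch: `build_schema` is a hypothetical helper and not part of the Zingg API.

```python
def build_schema(field_names, spark_type="string"):
    """Build a schema string such as 'recId string, fname string'."""
    return ", ".join("%s %s" % (name, spark_type) for name in field_names)


# Column names taken from the example above.
field_names = ["recId", "fname", "lname", "stNo", "add1", "add2",
               "city", "areacode", "state", "dob", "ssn"]
schema = build_schema(field_names)
```

The resulting `schema` value can be passed to `ECsvPipe` in place of the hand-written string.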
