Zingg-0.3.4

Working With Python

A whole new way to work with Zingg!

Last updated 2 years ago

Instead of configuring Zingg through JSON, we can now use Python to build and run Zingg entity and identity resolution programs. This is handy when you want to run Zingg on an existing Spark cluster. To run on a local machine, install the Zingg release before running Zingg Python programs.

The Zingg Python package can be installed by invoking:

python -m pip install zingg
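
After installing, it can be useful to confirm that the package is visible to the Python interpreter you intend to use. The following is a minimal sketch using only the Python standard library; the helper name `package_status` is hypothetical and is not part of the Zingg API. It only checks that a package named `zingg` can be located and does not verify that Spark or the Zingg release is set up.

```python
import importlib.util
from importlib import metadata


def package_status(name="zingg"):
    """Return (installed, version) for a top-level package name.

    Hypothetical helper for illustration; not part of Zingg.
    version is None when no distribution metadata is found.
    """
    # find_spec returns None when the top-level package cannot be located
    if importlib.util.find_spec(name) is None:
        return False, None
    try:
        # Assumes the distribution name matches the import name
        return True, metadata.version(name)
    except metadata.PackageNotFoundError:
        return True, None


if __name__ == "__main__":
    installed, version = package_status("zingg")
    if installed:
        print("zingg is importable, version:", version)
    else:
        print("zingg not found; try: python -m pip install zingg")
```

If the check reports that the package is missing, make sure pip was run with the same interpreter (hence `python -m pip` rather than a bare `pip`).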

Detailed documentation of the Python API is available at https://readthedocs.org/projects/zingg/

Example programs for Python are available under examples. Please see examples/febrl/FebrlExample.py to get started.

Please refer to the command line guide for running Python programs. Note that Zingg Python programs are PySpark programs and therefore need the Zingg CLI to execute.