Setting Up Zingg Development Environment


The following steps will help you set up the Zingg development environment. While the steps are broadly the same across operating systems, the detailed instructions below are for Ubuntu and were verified on Ubuntu 22.04.2 LTS.

Make sure to update your Ubuntu installation:

sudo apt update

Step 0: Install Ubuntu On WSL2 (Windows Only)

  • Install WSL: type the following command in Windows PowerShell.

wsl --install
  • Download Ubuntu 20.04 LTS from the Microsoft Store.

  • Configure Ubuntu with a username and password.

  • Open Ubuntu 20.04 LTS and start working.

sudo apt update
  • Refer to Microsoft's WSL documentation for more information.

Step 1: Clone The Zingg Repository

  • Install Git: sudo apt install git

  • Verify: git --version

  • Set up Git by following the first-time Git setup instructions in the official Git documentation.

  • Clone the Zingg repository: git clone https://github.com/zinggAI/zingg.git

Note: It is suggested to fork the repository to your account and then clone your fork.
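The fork-then-clone flow suggested in the note above can be sketched as follows. The /tmp paths are stand-ins for the GitHub URLs of your fork and the upstream repository, which is the only assumption here:

```shell
# Sketch of the fork-then-clone flow. Local bare repos stand in for
# github.com/<your-user>/zingg (origin) and github.com/zinggAI/zingg (upstream).
git init --bare -q /tmp/my-fork.git
git clone -q /tmp/my-fork.git /tmp/zingg
cd /tmp/zingg
# In practice: git remote add upstream https://github.com/zinggAI/zingg.git
git remote add upstream /tmp/my-fork.git
git remote -v
```

Keeping the upstream remote lets you pull new Zingg commits into your fork while pushing your own branches to origin.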

Step 2: Install JDK (Java Development Kit)

  • For example, for JDK 11:

sudo apt install openjdk-11-jdk openjdk-11-jre
javac -version
java -version

Step 3: Install Apache Spark

  • For example, for Spark 3.5.0:

wget https://archive.apache.org/dist/spark/spark-3.5.0/spark-3.5.0-bin-hadoop3.tgz
tar -xvf spark-3.5.0-bin-hadoop3.tgz
rm spark-3.5.0-bin-hadoop3.tgz
sudo mv spark-3.5.0-bin-hadoop3 /opt/spark

Make sure that the Spark version you install is compatible with the Java version you installed, and that both are supported by Zingg.

Note: Zingg supports Spark 3.5 and the corresponding Java versions.
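As a quick sanity check on the compatibility note above, a small helper can extract the Java major version from a version string (Spark 3.5 documents support for Java 8, 11, and 17). The `java_major` name is purely illustrative:

```shell
# Hedged sketch: extract the Java major version from strings such as
# "1.8.0_392" (legacy scheme) or "11.0.23" (modern scheme).
# Spark 3.5 documents support for Java 8, 11, and 17.
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;   # legacy scheme: 1.8.0_392 -> 8
    *)   echo "$1" | cut -d. -f1 ;;   # modern scheme: 11.0.23  -> 11
  esac
}
java_major 1.8.0_392   # -> 8
java_major 11.0.23     # -> 11
# On a configured machine you could feed it the live version, e.g.:
# java_major "$(java -version 2>&1 | sed -n 's/.*"\(.*\)".*/\1p/;q')"
```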

Step 4: Install Apache Maven

  • Install the latest Maven package.

  • For example, for 3.8.8:

wget https://dlcdn.apache.org/maven/maven-3/3.8.8/binaries/apache-maven-3.8.8-bin.tar.gz
tar -xvf apache-maven-3.8.8-bin.tar.gz
rm apache-maven-3.8.8-bin.tar.gz
cd apache-maven-3.8.8/bin
./mvn --version

Make sure that mvn --version also displays the correct Java version (Java 11), for example:

Apache Maven 3.8.7
Maven home: /usr/share/maven
Java version: 11.0.23, vendor: Ubuntu, runtime: /usr/lib/jvm/java-11-openjdk-amd64
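To check the Java line programmatically, the relevant field can be filtered out of the `mvn --version` output. `mvn_java_version` is an illustrative helper, shown here against a canned sample of that output:

```shell
# Hedged sketch: pull the Java version out of `mvn --version` output.
mvn_java_version() {
  grep '^Java version' | cut -d' ' -f3 | tr -d ','
}
# Canned sample of the output shown above; on a real machine, pipe in `mvn --version`.
printf 'Apache Maven 3.8.7\nJava version: 11.0.23, vendor: Ubuntu\n' | mvn_java_version
# -> 11.0.23
```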

Step 5: Update Environment Variables

Open ~/.bashrc and add the environment variables at the end of the file.

vim ~/.bashrc
export SPARK_HOME=/opt/spark
export SPARK_MASTER=local[*]
export MAVEN_HOME=/home/ubuntu/apache-maven-3.8.8
export ZINGG_HOME=<path_to_zingg>/assembly/target
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin:$JAVA_HOME/bin:$MAVEN_HOME/bin

<path_to_zingg> is the directory where you cloned the Zingg repository. Similarly, if you installed Spark in a different directory, set SPARK_HOME accordingly.

Note: Skip exporting MAVEN_HOME if you do not need multiple Maven versions.

  • Save and exit, then run source ~/.bashrc so the changes take effect:

source ~/.bashrc
  • Verify:

echo $PATH
mvn --version

Note: If you already set JAVA_HOME and SPARK_HOME in the steps above, you don't need to set them again.
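Before moving on to compilation, the variables from Step 5 can be sanity-checked with a small loop. `check_env` is an illustrative name, and the directory test assumes the paths exist on your machine:

```shell
# Hedged sketch: verify that each variable from Step 5 is set and points
# at an existing directory.
check_env() {
  for v in SPARK_HOME JAVA_HOME ZINGG_HOME; do
    val=$(eval "printf '%s' \"\${$v}\"")
    if [ -z "$val" ]; then
      echo "MISSING: $v"
    elif [ ! -d "$val" ]; then
      echo "NOT A DIRECTORY: $v=$val"
    else
      echo "OK: $v=$val"
    fi
  done
}
check_env
```

A MISSING line usually means the export was not added to ~/.bashrc or the file was not re-sourced in the current shell.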

Step 6: Compile The Zingg Repository

  • Check that you are on the intended branch:

git branch
  • Run the following to compile the Zingg repository:

mvn initialize
mvn clean compile package -Dspark=sparkVer
  • Run the following to compile while skipping tests:

mvn initialize
mvn clean compile package -Dspark=sparkVer -Dmaven.test.skip=true

Note: Replace sparkVer with the version of Spark you installed, for example -Dspark=3.5. If you still face an error, include -Dmaven.test.skip=true with the above command.
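The note above passes the major.minor form (3.5) rather than the full 3.5.0 you downloaded. A tiny helper can derive that form; `spark_majmin` is an illustrative name, not part of the Zingg build:

```shell
# Hedged sketch: reduce a full Spark version to the major.minor form
# used with -Dspark in the commands above.
spark_majmin() {
  echo "$1" | cut -d. -f1,2
}
spark_majmin 3.5.0   # -> 3.5
# e.g.: mvn clean compile package -Dspark=$(spark_majmin 3.5.0)
```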

Step 7: If You Have Any Issue With 'SPARK_LOCAL_IP'

  • Install net-tools using sudo apt-get install -y net-tools

  • Run ifconfig in the terminal, note your machine's IP address, and add a line of the form <IP address> <your-hostname> to /etc/hosts
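An alternative that often sidesteps hostname resolution problems is to set SPARK_LOCAL_IP directly; Spark reads this environment variable to choose its bind address. The 127.0.0.1 value below is an example suited to purely local runs:

```shell
# Common alternative workaround (verify it suits your setup): bind Spark to a
# fixed address via SPARK_LOCAL_IP instead of editing /etc/hosts.
export SPARK_LOCAL_IP=127.0.0.1
echo "SPARK_LOCAL_IP=$SPARK_LOCAL_IP"
```

Add the export to ~/.bashrc if you want it to persist across shells.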

Step 8: Run Zingg To Find Training Data

  • Run this script from a terminal opened in the cloned Zingg directory: ./scripts/zingg.sh --phase findTrainingData --conf examples/febrl/config.json

If everything is set up correctly, the Zingg banner will be displayed.
