
Setting Zingg Development Environment

Instructions on how to set up the Zingg Development Environment


The following steps will help you set up the Zingg Development Environment. While the steps remain largely the same across operating systems, we have provided detailed instructions for Ubuntu.

Step 1 : Clone the Zingg Repository

  • Install Git: sudo apt install git

  • Set up Git by following a setup tutorial of your choice; a minimal identity configuration is shown after this list.

  • Clone the Zingg Repository: git clone https://github.com/zinggAI/zingg.git
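
If you haven't configured Git on this machine before, a minimal identity setup (the name and email below are placeholders) looks like:

# Placeholder identity values -- replace with your own name and email
git config --global user.name "Your Name"
git config --global user.email "you@example.com"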

Step 2 : Install JDK 1.8 (Java Development Kit)

  • Follow a JDK installation tutorial to install JDK 1.8 (Java 8) on Ubuntu; a typical apt-based install is shown below.
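
On Ubuntu, one common way to do this (assuming the openjdk-8-jdk package is available in your apt repositories) is:

# Install OpenJDK 8 and confirm the installed version
sudo apt install openjdk-8-jdk
java -version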

Step 3 : Install Apache Spark - version spark-3.1.2-bin-hadoop3.2

  • Download Apache Spark - version spark-3.1.2-bin-hadoop3.2 from the Apache Spark official website.

  • Install the downloaded spark-3.1.2-bin-hadoop3.2 on your Ubuntu system by following an installation tutorial; a typical download-and-extract sequence is shown after this list.
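
For example, a typical download-and-extract sequence (assuming the Apache archive URL below is still valid, and extracting into your home directory to match SPARK_HOME in Step 5) is:

# Download Spark 3.1.2 built for Hadoop 3.2 and extract it into the home directory
wget https://archive.apache.org/dist/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
tar -xzf spark-3.1.2-bin-hadoop3.2.tgz -C ~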

Step 4 : Install Apache Maven

  • Install the latest Maven package using the following Linux command:

sudo apt install maven
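
You can verify that Maven was installed correctly with:

# Print the installed Maven and Java versions
mvn -version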

Step 5 : Set JAVA_HOME to JDK base directory

  • Go to the /etc directory on your Ubuntu system (cd /etc) and open the ‘profile’ file using gedit: sudo gedit profile

  • Paste the following lines into the ‘profile’ file.

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export SPARK_HOME=~/spark-3.1.2-bin-hadoop3.2
export SPARK_MASTER=local[*]
export ZINGG_HOME=<path_to_zingg>/assembly/target

where <path_to_zingg> is the directory into which you cloned the Zingg repository. Similarly, if you installed Spark in a different directory, set SPARK_HOME accordingly.

Note: If you have already set JAVA_HOME and SPARK_HOME in the earlier steps, you do not need to set them again.
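
After saving the file, you can load the new variables into your current shell and confirm they are set, for example:

# Reload /etc/profile in the current shell and check the variables
source /etc/profile
echo $JAVA_HOME
echo $SPARK_HOME
echo $ZINGG_HOME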

Step 6 : Compile the Zingg Repository

  • Run the following to compile the Zingg repository, replacing sparkVer with your Spark version - mvn clean compile package -Dspark=sparkVer
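
After a successful build, the packaged artifacts should appear under assembly/target, the directory that ZINGG_HOME points to in Step 5. A quick way to confirm is:

# Replace <path_to_zingg> with the directory where you cloned Zingg
ls <path_to_zingg>/assembly/target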

Step 7 : If you face any issue with 'SPARK_LOCAL_IP'

  • Install net-tools using sudo apt-get install -y net-tools

  • Run ifconfig in the terminal, note your machine's IP address, and add an entry to /etc/hosts mapping that IP address to your machine's hostname (an example entry is shown below).
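
The hosts entry is simply the IP address followed by your machine's hostname; the values below are placeholders:

# Example /etc/hosts entry -- replace with the IP address reported by ifconfig and your machine's hostname
192.168.1.10    my-pc-name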

Step 8 : Run Zingg to Find Training Data

  • Run the following script from a terminal opened in the cloned Zingg directory - ./scripts/zingg.sh --phase findTrainingData --conf examples/febrl/config.json

If everything is set up correctly, the Zingg banner should be displayed in the console.
