Ubuntu/WSL2 Setup Guide

The following steps will help you set up the Zingg Development Environment on Ubuntu/WSL2.

Step 0: Initial OS Setup (Ubuntu/WSL2)

Refresh the package index on your Ubuntu installation:

sudo apt update

Step 0 (WSL2 only): Install Ubuntu on WSL2 on Windows

  • Install WSL: Run the following command in Windows PowerShell.

wsl --install

  • Download Ubuntu 20.04 LTS from the Microsoft Store.

  • Configure Ubuntu with a username and password.

  • Open Ubuntu 20.04 LTS and refresh the package index:

sudo apt update

Step 1: Clone The Zingg Repository (Ubuntu)

  • Install Git: sudo apt install git

  • Verify: git --version

  • Set up Git by following the tutorial.

  • Clone the Zingg Repository: git clone https://github.com/zinggAI/zingg.git

Note: We suggest forking the repository to your own account and then cloning your fork.
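The fork-then-clone workflow suggested in the note can be sketched as follows. This is an illustrative outline, not part of the official guide: the username `your-username`, the helper names `fork_url` and `clone_fork`, and the remote name `upstream` are all assumptions. The clone is wrapped in a function so nothing is downloaded until you call it.

```shell
# Hypothetical GitHub username; replace it with your own account name.
GITHUB_USER="your-username"

# Build the clone URL for a fork of the Zingg repository.
fork_url() {
  printf 'https://github.com/%s/zingg.git\n' "$1"
}

# Clone the fork, then register the original repository as "upstream"
# so later changes can be fetched from it.
clone_fork() {
  git clone "$(fork_url "$1")" zingg &&
    git -C zingg remote add upstream https://github.com/zinggAI/zingg.git
}

# Show the URL that would be cloned; run `clone_fork "$GITHUB_USER"` to clone.
fork_url "$GITHUB_USER"
```

After cloning, `git -C zingg remote -v` should list both `origin` (your fork) and `upstream`.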

Step 2: Install JDK 11 (Ubuntu)
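The commands for this step are not shown here, so the following is only a sketch of one common way to install JDK 11 on Ubuntu. It assumes the `openjdk-11-jdk` package from the standard Ubuntu repositories; the helper names `install_jdk11` and `java_major` are hypothetical. The install is wrapped in a function so it only runs when you call it.

```shell
# Install OpenJDK 11 from the Ubuntu repositories
# (assumption: the openjdk-11-jdk package suits your setup).
install_jdk11() {
  sudo apt update && sudo apt install -y openjdk-11-jdk
}

# Extract the major version from the first line of `java -version` output,
# e.g. 'openjdk version "11.0.21" 2023-10-17'.
java_major() {
  printf '%s\n' "$1" | sed -n 's/.*version "\([0-9][0-9]*\)\..*/\1/p'
}

# Example with a sample version string.
java_major 'openjdk version "11.0.21" 2023-10-17'
```

To check a real installation, note that `java -version` writes to standard error: `java_major "$(java -version 2>&1 | head -n 1)"`.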

Step 3: Install Apache Spark

Common Steps

Original Ubuntu Instructions (Manual Wget)

Make sure the Spark version you install is compatible with your installed Java version, and that Zingg supports both.

Note: Zingg supports Spark 3.5 and the corresponding Java version.
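A manual wget-based install might look like the sketch below. The exact release (`3.5.0`), the `hadoop3` build, the Apache archive URL layout, and the `/opt/spark` target directory are assumptions chosen for illustration; substitute the 3.5.x release you actually want. The helper names are hypothetical, and the download is wrapped in a function so it only runs when called.

```shell
# Assumed values: choose the exact 3.5.x release and install location yourself.
SPARK_VERSION="3.5.0"
SPARK_DIR="/opt/spark"

# Name of the release tarball for a given Spark version.
spark_tarball() {
  printf 'spark-%s-bin-hadoop3.tgz\n' "$1"
}

# Download URL on the Apache archive for a given Spark version.
spark_url() {
  printf 'https://archive.apache.org/dist/spark/spark-%s/%s\n' "$1" "$(spark_tarball "$1")"
}

# Download, unpack, and move Spark into place.
install_spark() {
  wget "$(spark_url "$SPARK_VERSION")" &&
    tar -xzf "$(spark_tarball "$SPARK_VERSION")" &&
    sudo mv "spark-$SPARK_VERSION-bin-hadoop3" "$SPARK_DIR"
}

# Show the URL that would be downloaded; run `install_spark` to install.
spark_url "$SPARK_VERSION"
```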

Step 4: Install Apache Maven (Ubuntu)

  • Install the latest Maven package.

  • For example, for 3.8.8:
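As a hedged sketch of a manual Maven 3.8.8 install: the Apache archive URL layout and the choice to unpack into `$HOME` are assumptions, and `maven_url` and `install_maven` are hypothetical helper names. The download only runs when you call the function.

```shell
MAVEN_VERSION="3.8.8"

# Download URL on the Apache archive for a given Maven 3.x version.
maven_url() {
  printf 'https://archive.apache.org/dist/maven/maven-3/%s/binaries/apache-maven-%s-bin.tar.gz\n' "$1" "$1"
}

# Download the binary tarball and unpack it into $HOME
# (this creates $HOME/apache-maven-<version>).
install_maven() {
  wget "$(maven_url "$MAVEN_VERSION")" &&
    tar -xzf "apache-maven-$MAVEN_VERSION-bin.tar.gz" -C "$HOME"
}

# Show the URL that would be downloaded; run `install_maven` to install.
maven_url "$MAVEN_VERSION"
```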

Step 5: Update Environment Variables (Ubuntu - ~/.bashrc)

Open ~/.bashrc and add the environment variables at the end of the file.

<path_to_zingg> is the directory where you cloned the Zingg repository. Similarly, if you installed Spark in a different directory, set SPARK_HOME accordingly.

Note: Skip exporting MAVEN_HOME if you do not need multiple Maven versions.
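The exact lines to add are not shown here, so the block below is only an example of what the end of ~/.bashrc might contain. Every path is an assumption: `/usr/lib/jvm/java-11-openjdk-amd64` is the usual location for the Ubuntu OpenJDK 11 package on amd64, `/opt/spark` and `$HOME/apache-maven-3.8.8` are illustrative install locations, and `$HOME/zingg` stands in for <path_to_zingg>. The variable name `ZINGG_HOME` and its `assembly/target` suffix are also assumptions, so confirm them against the Zingg documentation.

```shell
# Assumed locations; adjust each path to match your machine.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export SPARK_HOME=/opt/spark
# Optional: only needed when several Maven versions are installed.
export MAVEN_HOME="$HOME/apache-maven-3.8.8"
# Hypothetical: $HOME/zingg stands in for <path_to_zingg>.
export ZINGG_HOME="$HOME/zingg/assembly/target"
# Make the java, spark, and mvn executables available on the PATH.
export PATH="$PATH:$JAVA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$MAVEN_HOME/bin"
```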

  • Save and exit, then run source ~/.bashrc so that the changes take effect.

  • Verify:

Note: If you already set JAVA_HOME and SPARK_HOME in earlier steps, you don't need to do this again.
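One possible way to verify the variables is sketched below. The `check_env` helper is hypothetical, not part of Zingg; it prints each exported variable it is given and reports any that are missing.

```shell
# Print each named environment variable, flagging any that are not exported.
# Returns a non-zero status when at least one variable is missing.
check_env() {
  missing=0
  for name in "$@"; do
    if value=$(printenv "$name"); then
      printf '%s=%s\n' "$name" "$value"
    else
      printf '%s is not set\n' "$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Check the variables this guide relies on.
check_env JAVA_HOME SPARK_HOME || echo "Some variables are missing; re-check ~/.bashrc"
```

Running `java -version` and `mvn --version` in the same terminal is a further sanity check that the PATH entries resolve.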

Step 6: Compile The Zingg Repository

  • Make sure you run the following commands in the same terminal window where you sourced .bashrc.

  • Run the following to compile the Zingg repository.

  • Run the following to compile while skipping tests.

Note: Replace sparkVer with the version of Spark you installed, for example -Dspark=3.5. If you still face an error, include -Dmaven.test.skip=true with the above command.
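The build commands themselves are not shown here. The sketch below assembles a plausible Maven invocation from the two flags the note mentions, `-Dspark` and `-Dmaven.test.skip=true`. The goals `clean compile package` are an assumption, so check the repository's own build instructions, and `zingg_build_cmd` is a hypothetical helper that only prints the command.

```shell
# Print a Maven build command for the given Spark version.
# Pass "true" as the second argument to skip tests.
# Assumption: the goals "clean compile package" match the project's build.
zingg_build_cmd() {
  spark_ver="$1"
  skip_tests="${2:-false}"
  cmd="mvn clean compile package -Dspark=$spark_ver"
  if [ "$skip_tests" = "true" ]; then
    cmd="$cmd -Dmaven.test.skip=true"
  fi
  printf '%s\n' "$cmd"
}

# Regular build, then the variant that skips tests.
zingg_build_cmd 3.5
zingg_build_cmd 3.5 true
```

Copy the printed command into the terminal opened in the cloned Zingg directory to run the build.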

Step 7: If you have any issue with 'SPARK_LOCAL_IP' (Ubuntu)

  • Install net-tools: sudo apt-get install -y net-tools

  • Run ifconfig in the terminal, find your IP address, and add it to /etc/hosts next to your PC name (hostname).
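A hosts-file entry is a single line containing an IP address followed by a machine name; on Ubuntu the hosts file is normally /etc/hosts. The sketch below only formats such a line. The helper name `hosts_entry`, the address `192.168.1.10`, and the name `my-pc` are illustrative assumptions.

```shell
# Format one hosts-file line that maps an IP address to a machine name.
hosts_entry() {
  printf '%s\t%s\n' "$1" "$2"
}

# Hypothetical values; use the address reported by ifconfig and your own hostname.
hosts_entry 192.168.1.10 my-pc
```

To append a real entry, you could pipe the output through `sudo tee -a /etc/hosts`, using the output of `hostname` as the machine name.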

Step 8: Run Zingg To Find Training Data

  • Run this script in a terminal opened in the cloned Zingg directory: ./scripts/zingg.sh --phase findTrainingData --conf examples/febrl/config.json

If everything is set up correctly, the Zingg banner should appear.

Step 9: Run Zingg To Label Data

  • Run this script in a terminal opened in the cloned Zingg directory: ./scripts/zingg.sh --phase label --conf examples/febrl/config.json --properties-file config/zingg.conf

Step 10: Run Zingg To Train The Model Based On Labelling

  • Run this script in a terminal opened in the cloned Zingg directory: ./scripts/zingg.sh --phase train --conf examples/febrl/config.json --properties-file config/zingg.conf

Step 11: Run Zingg To Prepare Final Output Data

  • Run this script in a terminal opened in the cloned Zingg directory: ./scripts/zingg.sh --phase match --conf examples/febrl/config.json --properties-file config/zingg.conf

  • Change directory with cd /tmp/zinggOutput (the path provided in the config file) to see the output files.
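The four phases in Steps 8 to 11 follow one pattern, summarised in the sketch below. `zingg_cmd` is a hypothetical helper that only prints each command in order. It mirrors the guide in passing `--properties-file` to every phase except findTrainingData.

```shell
# Paths used throughout the guide, relative to the cloned Zingg directory.
CONF="examples/febrl/config.json"
PROPS="config/zingg.conf"

# Print the zingg.sh command for one phase.
zingg_cmd() {
  phase="$1"
  cmd="./scripts/zingg.sh --phase $phase --conf $CONF"
  # The guide passes --properties-file for every phase except findTrainingData.
  if [ "$phase" != "findTrainingData" ]; then
    cmd="$cmd --properties-file $PROPS"
  fi
  printf '%s\n' "$cmd"
}

# List the commands in the order the guide runs them.
for phase in findTrainingData label train match; do
  zingg_cmd "$phase"
done
```

Run the printed commands one at a time rather than unattended, so you can confirm each phase completes before starting the next.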
