Ubuntu/WSL2 Setup Guide
The following steps will help you set up the Zingg Development Environment on Ubuntu/WSL2.
Step 0: Initial OS Setup (Ubuntu/WSL2)
Make sure to update your Ubuntu installation:
sudo apt update
Step 0: Install Ubuntu on WSL2 on Windows
Install WSL: run the following command in Windows PowerShell.
wsl --install
Download Ubuntu 20.04 LTS from the Microsoft Store.
Configure Ubuntu with a username and password
Open Ubuntu 20.04 LTS and start working
sudo apt update
Follow this tutorial for more information.
Step 1: Clone The Zingg Repository (Ubuntu)
Install and set up Git: sudo apt install git
Verify: git --version
Set up Git by following the tutorial.
Clone the Zingg Repository: git clone https://github.com/zinggAI/zingg.git
Note: It is suggested to fork the repository to your account and then clone the repository.
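If you fork first, the clone and remote setup might look like the sketch below. The account name `your-username` and the helper name `clone_fork` are placeholders introduced here for illustration; only the upstream URL comes from this guide.

```shell
# Placeholder account name; replace with your own GitHub username.
GITHUB_USER="your-username"
FORK_URL="https://github.com/${GITHUB_USER}/zingg.git"
UPSTREAM_URL="https://github.com/zinggAI/zingg.git"

# Clone the fork, then register the original repository as "upstream"
# so that changes can be pulled from it later.
clone_fork() {
  git clone "$FORK_URL" zingg &&
    cd zingg &&
    git remote add upstream "$UPSTREAM_URL"
}

# Run it with: clone_fork
```

Keeping the original repository as a second remote lets you sync your fork with `git fetch upstream` later on.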
Step 2: Install JDK 11 (Ubuntu)
Follow this tutorial to install Java 11 (JDK 11) on Ubuntu.
For example:
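A minimal sketch, assuming the OpenJDK 11 build from Ubuntu's default repositories is acceptable. `openjdk-11-jdk` is the usual Ubuntu package name; the helper name `install_jdk11` is made up for this example.

```shell
# Ubuntu package that provides the OpenJDK 11 compiler and runtime.
JDK_PACKAGE="openjdk-11-jdk"

# Install the JDK, then print the installed versions (both should report 11).
install_jdk11() {
  sudo apt update &&
    sudo apt install -y "$JDK_PACKAGE" &&
    java -version &&
    javac -version
}

# Run it with: install_jdk11
```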
Step 3: Install Apache Spark
Common Steps
Download Apache Spark from the official Apache Spark website.
For example for 3.5.0:
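One way to fetch the release from the command line is sketched below. It assumes the package pre-built for Hadoop 3 and uses the Apache archive URL; the download page may point you to a faster mirror. The helper name `download_spark` is hypothetical.

```shell
SPARK_VERSION="3.5.0"
# Package pre-built for Hadoop 3 (an assumption; pick the package type you need).
SPARK_TGZ="spark-${SPARK_VERSION}-bin-hadoop3.tgz"
# The Apache archive keeps every published release.
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_TGZ}"

# Download the tarball into the current directory.
download_spark() {
  wget "$SPARK_URL"
}

echo "$SPARK_URL"
# Run the download with: download_spark
```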
Original Ubuntu Instructions (Manual Wget)
Install the downloaded Apache Spark on Ubuntu by following this tutorial.
For example for 3.5.0:
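A sketch of the manual install, assuming the Hadoop 3 tarball sits in the current directory and that `/opt/spark` is the install location. Both the location and the helper name `install_spark` are assumptions; adjust them to your setup.

```shell
SPARK_VERSION="3.5.0"
# Directory created when the tarball is extracted.
SPARK_DIR="spark-${SPARK_VERSION}-bin-hadoop3"
# Assumed install location; SPARK_HOME should later point here.
SPARK_TARGET="/opt/spark"

# Extract the tarball, move it into place, and print the Spark version.
install_spark() {
  tar -xzf "${SPARK_DIR}.tgz" &&
    sudo mv "$SPARK_DIR" "$SPARK_TARGET" &&
    "$SPARK_TARGET/bin/spark-submit" --version
}

# Run it with: install_spark
```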
Make sure the Spark version you installed is compatible with your Java version, and that Zingg supports both.
Note: Zingg supports Spark 3.5 and the corresponding Java version.
Step 4: Install Apache Maven (Ubuntu)
Install the latest Maven package.
For example for 3.8.8:
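A sketch of a manual Maven install, assuming the binary tarball from the Apache archive is unpacked into the current directory. The helper name `install_maven` is hypothetical.

```shell
MAVEN_VERSION="3.8.8"
MAVEN_TGZ="apache-maven-${MAVEN_VERSION}-bin.tar.gz"
MAVEN_URL="https://archive.apache.org/dist/maven/maven-3/${MAVEN_VERSION}/binaries/${MAVEN_TGZ}"

# Download and extract Maven, then print its version to confirm it runs.
install_maven() {
  wget "$MAVEN_URL" &&
    tar -xzf "$MAVEN_TGZ" &&
    "./apache-maven-${MAVEN_VERSION}/bin/mvn" --version
}

echo "$MAVEN_URL"
# Run it with: install_maven
```

The extracted `apache-maven-3.8.8` directory is what MAVEN_HOME would point to if you choose to export it.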
Step 5: Update Environment Variables (Ubuntu - ~/.bashrc)
Open ~/.bashrc and add the environment variables at the end of the file.
<path_to_zingg> is the directory where you cloned the Zingg repository. Similarly, if you installed Spark in a different directory, set SPARK_HOME accordingly.
Note: Skip exporting MAVEN_HOME if you do not need multiple Maven versions.
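The lines added to ~/.bashrc might look like the sketch below. Every path is an assumption: `JAVA_HOME` is the typical location of Ubuntu's OpenJDK 11 package on amd64, `/opt/spark` and `$HOME/apache-maven-3.8.8` are example install locations, and `ZINGG_REPO` stands in for <path_to_zingg>. The `ZINGG_HOME` variable and its `assembly/target` suffix are also assumptions; check them against the repository you cloned.

```shell
# Example values only; adjust every path to match your machine.
export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"
export SPARK_HOME="/opt/spark"
# Optional: only needed when several Maven versions are installed.
export MAVEN_HOME="$HOME/apache-maven-3.8.8"

# Stand-in for <path_to_zingg>, the directory of the cloned repository.
ZINGG_REPO="$HOME/zingg"
export ZINGG_HOME="$ZINGG_REPO/assembly/target"

# Make the Java, Spark, and Maven executables available on the PATH.
export PATH="$PATH:$JAVA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$MAVEN_HOME/bin"
```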
Save and exit, then run source ~/.bashrc so that the changes take effect.
Verify:
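A small check along these lines can confirm that the tools are reachable from the current shell. The helper name `check_tool` is made up for this sketch; it only reports whether each command is on the PATH.

```shell
# Report whether a command can be found on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Check the three tools this guide installs; keep going even if one is missing.
for tool in java mvn spark-submit; do
  check_tool "$tool" || true
done

# Show the variables set in ~/.bashrc (prints <unset> when a variable is empty).
echo "JAVA_HOME=${JAVA_HOME:-<unset>} SPARK_HOME=${SPARK_HOME:-<unset>}"
```

If a tool is reported missing, re-open the terminal or re-run source ~/.bashrc and check the PATH entries.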
Note: If you already set JAVA_HOME and SPARK_HOME in the earlier steps, you don't need to set them again.
Step 6: Compile The Zingg Repository
Make sure you run the following commands in the same terminal window where you sourced .bashrc.
Run the following to compile the Zingg repository
Run the following to compile while skipping tests
Note: Replace sparkVer with the version of Spark you installed.
For example, -Dspark=3.5. If you still face an error, include -Dmaven.test.skip=true with the above command.
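A sketch of the build, to be run from the root of the cloned repository. The `-Dspark` and `-Dmaven.test.skip=true` flags come from this guide; the Maven goals (`initialize`, then `clean compile package`) are an assumption about the project's build and the helper name `build_zingg` is hypothetical.

```shell
# Replace with the Spark version you installed.
SPARK_VER="3.5"

# Build Zingg; any extra arguments are passed through to Maven.
build_zingg() {
  mvn initialize &&
    mvn clean compile package "-Dspark=${SPARK_VER}" "$@"
}

# build_zingg                          # compile and run the tests
# build_zingg -Dmaven.test.skip=true   # compile while skipping tests
```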
Step 7: If you have any issue with 'SPARK_LOCAL_IP' (Ubuntu)
Install net-tools using sudo apt-get install -y net-tools
Run ifconfig in the terminal, find your IP address, and add it to /etc/hosts next to your PC name (hostname).
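The entry can be assembled as sketched below, assuming the hosts file in question is the system one at /etc/hosts. The address `192.168.1.10` is a made-up example; use the one reported by ifconfig.

```shell
# Name of this machine as the system reports it.
HOST_NAME="$(uname -n)"
# Hypothetical address; replace with the one shown by ifconfig.
HOST_IP="192.168.1.10"

# A hosts-file entry has the form "<ip> <hostname>".
HOSTS_ENTRY="${HOST_IP} ${HOST_NAME}"
echo "$HOSTS_ENTRY"

# To apply it, append the entry to the hosts file (needs root):
#   echo "$HOSTS_ENTRY" | sudo tee -a /etc/hosts
# Alternatively, Spark reads its bind address from SPARK_LOCAL_IP:
#   export SPARK_LOCAL_IP="$HOST_IP"
```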
Step 8: Run Zingg To Find Training Data
Run this script in a terminal opened in the cloned Zingg directory
./scripts/zingg.sh --phase findTrainingData --conf examples/febrl/config.json
If everything is set up correctly, it should show the Zingg banner.
Step 9: Run Zingg To label Data
Run this script in a terminal opened in the cloned Zingg directory
./scripts/zingg.sh --phase label --conf examples/febrl/config.json --properties-file config/zingg.conf
Step 10: Run Zingg To train model based on labelling
Run this script in a terminal opened in the cloned Zingg directory
./scripts/zingg.sh --phase train --conf examples/febrl/config.json --properties-file config/zingg.conf
Step 11: Run Zingg To prepare final output data
Run this script in a terminal opened in the cloned Zingg directory
./scripts/zingg.sh --phase match --conf examples/febrl/config.json --properties-file config/zingg.conf
Change directory to see the output files (the path is provided in the config file):
cd /tmp/zinggOutput