# AWS S3

Zingg can use AWS S3 as a source and sink

## Steps to run zingg on S3

* Set a bucket e.g. zingg28032023 and a folder inside it e.g. zingg
* Create aws access key and export via env vars (ensure that the user with below keys has read/write access to above) export AWS\_ACCESS\_KEY\_ID= export AWS\_SECRET\_ACCESS\_KEY= (if mfa is enabled AWS\_SESSION\_TOKEN env var would also be needed )
* Download hadoop-aws-3.1.0.jar and aws-java-sdk-bundle-1.11.271.jar via maven
* Set above in zingg.conf spark.jars=//hadoop-aws-3.1.0.jar,//aws-java-sdk-bundle-1.11.271.jar
* Run using below commands

```bash
 ./scripts/zingg.sh --phase findTrainingData --properties-file config/zingg.conf  --conf examples/febrl/config.json --zinggDir  s3a://zingg28032023/zingg
 ./scripts/zingg.sh --phase label --properties-file config/zingg.conf  --conf examples/febrl/config.json --zinggDir  s3a://zingg28032023/zingg
 ./scripts/zingg.sh --phase train --properties-file config/zingg.conf  --conf examples/febrl/config.json --zinggDir  s3a://zingg28032023/zingg
 ./scripts/zingg.sh --phase match --properties-file config/zingg.conf  --conf examples/febrl/config.json --zinggDir  s3a://zingg28032023/zingg
```

## Model location

```
Models etc. would get saved in 
Amazon S3 > Buckets > zingg28032023 >zingg > 100
```
