Hardware Sizing

Zingg has been built to scale. Performance depends on factors such as the number of records, the number of fields being matched, and the hardware available.

Here are some performance numbers you can use to determine the appropriate hardware for your data.

120k records of examples/febrl120k/test.csv take 5 minutes to run on a 4-core, 10 GB RAM local Spark cluster.
5 million records of North Carolina Voters take ~4 hours on a 4-core, 10 GB RAM local Spark cluster.
9 million records with 3 fields (first name, last name, email) take 45 minutes to run on an AWS m5.24xlarge instance with 96 cores and 384 GB RAM.
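
To compare these runs on a common footing, the figures can be turned into rough throughput numbers. The following is a minimal sketch, not part of Zingg: the function names are hypothetical, and "~4 hours" is assumed to mean exactly 240 minutes.

```python
# Benchmark figures copied from the list above: (label, records, minutes, cores).
# Assumption: "~4 hours" is treated as exactly 240 minutes.
BENCHMARKS = [
    ("febrl120k, local, 4 cores", 120_000, 5, 4),
    ("NC voters, local, 4 cores", 5_000_000, 4 * 60, 4),
    ("3 fields, m5.24xlarge, 96 cores", 9_000_000, 45, 96),
]


def records_per_minute(records, minutes):
    """Overall throughput of a run."""
    return records / minutes


def records_per_core_minute(records, minutes, cores):
    """Throughput normalised by the number of cores used."""
    return records / (minutes * cores)


def summarize(benchmarks=BENCHMARKS):
    """Return (label, records/min, records/core-min), rounded to whole records."""
    return [
        (
            label,
            round(records_per_minute(records, minutes)),
            round(records_per_core_minute(records, minutes, cores)),
        )
        for label, records, minutes, cores in benchmarks
    ]


if __name__ == "__main__":
    for label, per_min, per_core_min in summarize():
        print(f"{label}: {per_min} records/min, {per_core_min} records/core-min")
```

Per-core throughput differs between the three runs, so these numbers are best read as rough reference points rather than a scaling formula. The datasets differ in content and field count, which also affects run time.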

If you have up to a few million records, it may be easier to run Zingg on a single machine in Spark local mode.
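
As an illustration of that rule of thumb, the sketch below picks a Spark master URL from a record count. The helper name and the 5,000,000-record cutoff are assumptions made for this example ("a few million" is not defined more precisely here); only the `local[N]` master URL syntax comes from Spark itself.

```python
def suggest_spark_master(record_count, cores, local_limit=5_000_000):
    """Suggest a Spark master URL for a dataset of the given size.

    local_limit is a hypothetical cutoff standing in for "a few million
    records"; adjust it to your own data and hardware.
    Returns "local[N]" for small datasets, or None to signal that a
    multi-node cluster is probably the better choice.
    """
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if record_count <= local_limit:
        # local[N] runs Spark on this machine with N worker threads.
        return f"local[{cores}]"
    return None


if __name__ == "__main__":
    print(suggest_spark_master(120_000, 4))
    print(suggest_spark_master(9_000_000, 96))
```

The returned value would be supplied as the Spark master setting, for example through the `spark.master` property. A result of `None` only means the dataset is above the assumed cutoff, not that local mode cannot work.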
