# Hardware Sizing

Zingg is built to scale. Performance depends on:

* The **number of records** to be matched.
* The **number of fields** to be compared against each other.
* The **actual number** of duplicates.

Here are some performance numbers you can use to determine the appropriate hardware for your data.

* 120k records of **examples/febrl120k/test.csv** take 5 minutes to run on a 4-core, 10 GB RAM local Spark cluster.
* 5m records of [North Carolina Voters](https://github.com/zinggAI/zingg/tree/main/examples/ncVoters5M) take \~4 hours to run on a 4-core, 10 GB RAM local Spark cluster.
* 9m records with 3 fields (first name, last name, email) take 45 minutes to run on an AWS m5.24xlarge instance with 96 cores and 384 GB RAM.
* 80m records with 8-10 fields took less than 2 hours on 1 driver (128 GB RAM, 32 cores) and 8 workers (224 GB RAM, 64 cores). This is a user-reported figure, obtained without any optimization.
* ![image](https://github.com/user-attachments/assets/4dadeb56-9d66-4ed2-be6e-e0f0b9ab68c6)

If you have up to a few million records, it may be easier to run Zingg on a single machine in Spark local mode.
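
As a rough aid for reading the figures above, the sketch below converts the three single-machine benchmarks into records per minute and finds the benchmark closest in size to your own dataset. It is illustrative arithmetic only, not part of Zingg: the `Benchmark` class and the `closest_benchmark` helper are hypothetical names. The "\~4 hours" run is taken as 240 minutes. Throughput should not be expected to scale linearly, since the number of fields and the actual number of duplicates also affect run time.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    """One benchmark from this page (hypothetical helper, not a Zingg API)."""
    name: str
    records: int
    minutes: float
    cores: int
    ram_gb: int

    @property
    def records_per_minute(self) -> float:
        return self.records / self.minutes

    @property
    def records_per_core_minute(self) -> float:
        return self.records_per_minute / self.cores


# Figures copied from the list above; "~4 hours" is assumed to be 240 minutes.
BENCHMARKS = [
    Benchmark("febrl120k", 120_000, 5, 4, 10),
    Benchmark("ncVoters5M", 5_000_000, 240, 4, 10),
    Benchmark("9m records, 3 fields", 9_000_000, 45, 96, 384),
]


def closest_benchmark(records: int) -> Benchmark:
    """Return the benchmark whose record count is nearest on a log scale."""
    return min(
        BENCHMARKS,
        key=lambda b: abs(math.log(b.records) - math.log(records)),
    )


if __name__ == "__main__":
    for b in BENCHMARKS:
        print(
            f"{b.name}: {b.records_per_minute:,.0f} records/min, "
            f"{b.records_per_core_minute:,.0f} records/core/min"
        )
    ref = closest_benchmark(3_000_000)
    print(f"Closest reference for 3,000,000 records: {ref.name} "
          f"({ref.cores} cores, {ref.ram_gb} GB RAM)")
```

Treat the closest benchmark as a starting point for hardware selection and validate it with a trial run on a sample of your own data.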

