# Defining Domain Specific Blocking And Similarity Functions

You can add your own [blocking functions](https://github.com/zinggAI/zingg/tree/main/common/core/src/main/java/zingg/common/core/hash), which Zingg evaluates when building the [blocking tree](/latest/zmodels.md).

The blocking tree is built from the matched records that the user provides as part of training. At every node, it selects the hash function, and the field to apply it to, that eliminates the fewest matching pairs. A pair is eliminated when the hash function produces different outputs for its two records, which would place them in different blocks.

Suppose we have the following matched pairs:

|  Pair 1  | firstname | lastname |
| :------: | :-------: | :------: |
| Record A |    john   |    doe   |
| Record B |   johnh   |   d oe   |

***

|  Pair 2  | firstname | lastname |
| :------: | :-------: | :------: |
| Record A |    mary   |    ann   |
| Record B |   marry   |          |

Let us assume we have a hash function **first1char** and we want to check whether it is a good function to apply to **firstname**:

| Pair |  Record  | Output |
| :--: | :------: | ------ |
|   1  | Record A | j      |
|   1  | Record B | j      |
|   2  | Record A | m      |
|   2  | Record B | m      |

Both records in each pair produce the same output, so no pair is eliminated. Hence, **first1char** is a good function for **firstname**.

Now let us try **last1char** on **firstname**:

| Pair |  Record  | Output |
| :--: | :------: | ------ |
|   1  | Record A | n      |
|   1  | Record B | h      |
|   2  | Record A | y      |
|   2  | Record B | y      |

Pair 1 is eliminated because its two records produce different outputs (n and h). Hence, **last1char** is not a good function for **firstname**.

So **first1char**(**firstname**) will be chosen. This brings near-similar records together. In effect, it clusters them so that the Cartesian join of every record with every other record can be avoided.
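
The selection logic in this walkthrough can be sketched in a few lines of Python. This is only an illustration of the idea, not Zingg's implementation: the names `first1char`, `last1char`, and `eliminated_pairs` are hypothetical helpers written for this example, and the data is the two pairs from the tables above.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Pair = Tuple[Record, Record]


def first1char(value: str) -> str:
    """Illustrative hash function: the first character of the value."""
    return value[:1]


def last1char(value: str) -> str:
    """Illustrative hash function: the last character of the value."""
    return value[-1:]


def eliminated_pairs(hash_fn: Callable[[str], str], field: str, pairs: List[Pair]) -> int:
    """Count matched pairs whose two records get different hash outputs."""
    return sum(1 for a, b in pairs if hash_fn(a[field]) != hash_fn(b[field]))


# The two matched pairs from the tables above.
pairs: List[Pair] = [
    ({"firstname": "john", "lastname": "doe"}, {"firstname": "johnh", "lastname": "d oe"}),
    ({"firstname": "mary", "lastname": "ann"}, {"firstname": "marry", "lastname": ""}),
]

candidates = {"first1char": first1char, "last1char": last1char}

# Score each candidate on firstname and keep the one that eliminates the fewest pairs.
scores = {name: eliminated_pairs(fn, "firstname", pairs) for name, fn in candidates.items()}
best = min(scores, key=scores.get)

print(scores)  # {'first1char': 0, 'last1char': 1}
print(best)    # first1char
```

The sketch scores a single field at a single node. A real blocking tree would repeat this choice at every node and across all candidate fields.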

These business-specific blocking functions go into [Hash Functions](https://github.com/zinggAI/zingg/tree/main/common/core/src/main/java/zingg/common/core/hash) and must be added to [HashFunctionRegistry](https://github.com/zinggAI/zingg/blob/main/common/core/src/main/java/zingg/common/core/hash/HashFunctionRegistry.java) and [hash functions config](https://github.com/zinggAI/zingg/blob/main/common/core/src/main/resources/hashFunctions.json).
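
To give a feel for what a business-specific blocking function might compute, here is a minimal Python sketch. It is an assumption-laden illustration only: `email_domain` is a hypothetical function, and the code does not use Zingg's Java hash function classes, registry, or configuration file. A real function would need to be written in Java against the linked sources.

```python
from typing import Optional


def email_domain(value: Optional[str]) -> str:
    """Hypothetical blocking key: the lower-cased domain part of an email address."""
    if not value or "@" not in value:
        # Missing or malformed values all share the empty key in this sketch.
        return ""
    # Take everything after the last "@" and normalise case and whitespace.
    return value.rsplit("@", 1)[1].strip().lower()


# Two spellings of the same person's address land in the same block.
print(email_domain("John.Doe@Example.com"))  # example.com
print(email_domain("jdoe@example.com"))      # example.com
```

Whether sending all missing values to a single key is desirable depends on the data. It is a design choice in this sketch, not a recommendation.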

You can also define your own similarity measures. Each **dataType** has predefined features. For example, the [String](https://github.com/zinggAI/zingg/blob/main/common/core/src/main/java/zingg/common/core/feature/StringFeature.java) fuzzy type is configured for Affine and Jaro.

You can define your own [comparisons](https://github.com/zinggAI/zingg/tree/main/common/core/src/main/java/zingg/common/core/similarity/function) and use them.
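
As a conceptual example of a custom comparison, the Python sketch below scores two strings by the overlap of their word tokens (Jaccard similarity). It is not one of Zingg's built-in measures and does not follow the interface of the Java classes in the linked package. The name `token_jaccard` and the handling of empty values are assumptions made for illustration.

```python
def token_jaccard(left: str, right: str) -> float:
    """Hypothetical similarity: shared word tokens divided by all word tokens, in [0, 1]."""
    left_tokens = set(left.lower().split())
    right_tokens = set(right.lower().split())
    if not left_tokens or not right_tokens:
        # Assumption for this sketch: a missing value gives no evidence of similarity.
        return 0.0
    shared = left_tokens & right_tokens
    combined = left_tokens | right_tokens
    return len(shared) / len(combined)


print(token_jaccard("john doe", "Doe John"))  # 1.0 (same tokens, different order and case)
print(token_jaccard("", "ann"))               # 0.0 (one side is empty)
```

A measure like this ignores word order, which can help with fields such as full names where the parts are sometimes swapped.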

