Standardise Postprocessor
Overview
The Standardise Postprocessor standardises field values in matched output using predefined mappings. It addresses inconsistent values that commonly appear across systems (for example, job titles such as "VP of Ops", "Vice President Operations", or "V.P. Operations").
Key points:
The postprocessor runs after matching and before writing output.
It does not affect matching behavior; it transforms output values only.
Matching for mapping lookup is case-insensitive by default.
Primary key fields are never postprocessed.
Industry usage and benefits:
Human resources: standardise job titles and roles across multiple systems to improve headcount and reporting accuracy.
Sales and CRM: normalize company names for consistent account consolidation and analytics.
Product and catalog management: unify legacy and new product codes for inventory and billing reconciliation.
Compliance and reporting: enforce canonical values to simplify regulatory reporting and downstream processing.
Mapping file format and rules
Create a JSON file where each element is an array of equivalent values. The first value in each array is the canonical value and is used as the replacement for all other values in that row.
Important rules:
First value is canonical: e.g.
["Chief Executive Officer", "CEO", "Executive Director"]— all variants map toChief Executive Officer.Lookup is case-insensitive:
CEO,ceo,Ceowill match.Rows must be disjoint: a string must not appear in two different canonical groups.
The processor does not normalize extra whitespace, punctuation, or partial matches. Those are not handled automatically.
Example: jobtitles.json
Place mapping files from where you are executing. Use the base filename (without .json) when referencing the mapping.
Configure postprocessing
Here is how to configure the StandardisePostprocessor for a field in Python:
Python example
JSON configuration example
Use STANDARDISE_<basename> to reference a mapping file named <basename>.json.
Last updated
Was this helpful?