Input Data
data
An array of input data. Each array entry here refers to a Zingg Pipe.
In the Zingg Community Version, it is better to list the most important fields first so that the blocking model can be learnt more effectively. The Zingg Enterprise Version has a wider search space, so field ordering is less critical.
If the data is self-describing, for example Avro or Parquet, there is no need to define the schema. Otherwise, field definitions with names and types must be provided.
For example, for the CSV file under examples/febrl/test.csv:

"data" : [ {
    "name" : "test",
    "format" : "csv",
    "props" : {
        "delimiter" : ",",
        "header" : "true",
        "location" : "examples/febrl/test.csv"
    },
    "schema" : "id string, fname string, lname string, dob integer"
} ]
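
The schema rule above can be sketched in a few lines of Python that generate the "data" array. This is only an illustration: `make_pipe` and `SELF_DESCRIBING` are hypothetical names and not part of Zingg, the Parquet location is a made-up placeholder, and the sketch assumes a Parquet pipe needs nothing beyond a `location` property.

```python
import json

# Formats that carry their own schema, per the text above (Avro, Parquet).
SELF_DESCRIBING = {"avro", "parquet"}


def make_pipe(name, fmt, props, schema=None):
    """Build one entry of the "data" array (hypothetical helper, not a Zingg API)."""
    pipe = {"name": name, "format": fmt, "props": props}
    if schema is not None:
        # An explicit schema is always written out when supplied.
        pipe["schema"] = schema
    elif fmt not in SELF_DESCRIBING:
        # Formats such as CSV do not describe themselves, so a schema is required.
        raise ValueError(f"format {fmt!r} needs an explicit schema")
    return pipe


# The CSV pipe from the example above: field names and types are spelled out.
csv_pipe = make_pipe(
    "test",
    "csv",
    {"delimiter": ",", "header": "true", "location": "examples/febrl/test.csv"},
    schema="id string, fname string, lname string, dob integer",
)

# A Parquet pipe with no schema; the name and path are placeholders for illustration.
parquet_pipe = make_pipe("testParquet", "parquet", {"location": "path/to/test.parquet"})

config_fragment = {"data": [csv_pipe, parquet_pipe]}
print(json.dumps(config_fragment, indent=2))
```

Generating the fragment this way catches a missing schema before the configuration is handed to Zingg, and it guarantees that the array brackets and braces are balanced.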
Read more about Zingg Pipes for datastore connections here.