
Zingg Pipes


Zingg Pipes are an abstraction for a data source from which Zingg fetches data for matching or to which Zingg writes its output. This lets users connect to literally any datastore that has a Spark connector.

A pipe is an easy way to specify the format and properties of the Spark connector for the relevant datasource. Zingg pipes are configured in the config passed to the program, where the datastore connection properties are outlined.

Pipes can be configured for the data or the output attributes of the Zingg configuration.

Each pipe has the following attributes:

  • name: a unique name to identify the data store
  • format: one of the Spark-supported connector formats, such as jdbc, avro, or parquet
  • options: properties to be passed to spark.read and spark.write
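
For example, a pipe reading a CSV file could look like the sketch below. This is illustrative only: the pipe name and option values are placeholders, and the exact option keys accepted (including whether the file path is given as location or through another key) depend on the underlying connector and the Zingg version in use.

  {
      "name": "customersCsv",
      "format": "csv",
      "options": {
          "location": "examples/customers.csv",
          "delimiter": ",",
          "header": "true"
      }
  }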

Let us look at some common data sources and their configurations.

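As a sketch of how pipes plug into the data and output attributes of the configuration, the snippet below defines a JDBC source and a Parquet output. The connection values are hypothetical placeholders; the JDBC keys (url, dbtable, driver, user, password) are standard Spark JDBC options.

  {
      "data": [{
          "name": "customersDb",
          "format": "jdbc",
          "options": {
              "url": "jdbc:postgresql://localhost:5432/crm",
              "dbtable": "public.customers",
              "driver": "org.postgresql.Driver",
              "user": "zingg",
              "password": "zingg"
          }
      }],
      "output": [{
          "name": "matchOutput",
          "format": "parquet",
          "options": {
              "location": "/tmp/zinggOutput"
          }
      }]
  }

The following sections cover connector-specific configurations for Snowflake, Cassandra, MongoDB, Neo4j, and Parquet.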