
BigQuery

Zingg works with Google BigQuery as both a data source and a data sink. The properties that must be set are described below.

Two driver jars, spark-bigquery-with-dependencies_2.12-0.24.2.jar and gcs-connector-hadoop2-latest.jar, are required to work with BigQuery. To put these drivers on the classpath, list them in the runtime properties:

spark.jars=./spark-bigquery-with-dependencies_2.12-0.24.2.jar,./gcs-connector-hadoop2-latest.jar

In addition, the following property must be set:

spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem

If Zingg runs outside Google Cloud, it must authenticate. Set the following environment variable to the location of the file containing the service account key. A service account key can be created and downloaded in JSON format from the Google Cloud console.

export GOOGLE_APPLICATION_CREDENTIALS="<path to Google service account key file>"

Connection properties for BigQuery as a data source and as a data sink are given below. For more detail on how Spark connects to BigQuery, see the Spark BigQuery connector documentation.

Properties for reading data from BigQuery:

The credentialsFile property should point to the location of the Google service account key file. This is the same path used to set the GOOGLE_APPLICATION_CREDENTIALS variable. The table property should point to the BigQuery table that contains the source data. The viewsEnabled property must be set to true.

    "data" : [{
        "name": "test",
        "format": "bigquery",
        "props": {
            "credentialsFile": "/home/work/product/final/zingg-1/mynotification-46566-905cbfd2723f.json",
            "table": "mynotification-46566.zinggdataset.zinggtest",
            "viewsEnabled": true
        }
    }],

Properties for writing data to BigQuery:

To write to BigQuery, create a Google Cloud Storage bucket and assign its name to the temporaryGcsBucket property.
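As a rough illustration, an output section might look like the sketch below. It assumes the output section takes the same name/format/props shape as the data section shown for reading. The output name, the table name zinggOutput and the bucket name zingg-test are hypothetical placeholders; substitute your own table and an existing bucket.

```json
    "output" : [{
        "name": "output",
        "format": "bigquery",
        "props": {
            "credentialsFile": "/home/work/product/final/zingg-1/mynotification-46566-905cbfd2723f.json",
            "table": "mynotification-46566.zinggdataset.zinggOutput",
            "temporaryGcsBucket": "zingg-test"
        }
    }],
```

The bucket is only a staging area for the write, so it needs to be writable by the same service account that credentialsFile points to.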

Notes:

  • The library "gcs-connector-hadoop2-latest.jar" can be downloaded from Google, and the library "spark-bigquery-with-dependencies_2.12-0.24.2.jar" from the Maven repository.

  • A typical service account key file is a JSON document.
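For orientation, a service account key file downloaded from the Google Cloud console generally has the shape sketched below. Every value in angle brackets is a placeholder, not real data, and the exact set of fields may differ slightly depending on when the key was generated.

```json
{
  "type": "service_account",
  "project_id": "<project-id>",
  "private_key_id": "<private-key-id>",
  "private_key": "-----BEGIN PRIVATE KEY-----\n<key material>\n-----END PRIVATE KEY-----\n",
  "client_email": "<service-account-name>@<project-id>.iam.gserviceaccount.com",
  "client_id": "<client-id>",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "<certificate URL for the service account>"
}
```

This file contains a private key, so it should be kept out of version control and readable only by the user that runs Zingg.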
