BigQuery
Zingg can seamlessly work with Google BigQuery. Please find below details about the properties that must be set.
The two driver jars, namely spark-bigquery-with-dependencies_2.12-0.24.2.jar and gcs-connector-hadoop2-latest.jar, are required to work with BigQuery. To include these BigQuery drivers in the classpath, please configure the Spark runtime properties as shown below.
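For example, in spark-defaults style runtime properties (the jar locations below are illustrative; point them at wherever the downloaded jars are placed):

```
spark.driver.extraClassPath ./bigquery/spark-bigquery-with-dependencies_2.12-0.24.2.jar:./bigquery/gcs-connector-hadoop2-latest.jar
spark.executor.extraClassPath ./bigquery/spark-bigquery-with-dependencies_2.12-0.24.2.jar:./bigquery/gcs-connector-hadoop2-latest.jar
```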
In addition, the following property needs to be set:
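This maps the gs:// scheme to the GCS connector's Hadoop filesystem implementation:

```
spark.hadoop.fs.gs.impl com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
```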
If Zingg is run from outside Google Cloud, it requires authentication. Please set the following environment variable to the location of the file containing the service account key; a service account key can be created and downloaded in JSON format from the Google Cloud console.
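For example (the key file path is a placeholder):

```bash
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
```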
Connection properties for BigQuery as a data source and data sink are given below. If you are curious to know more about how Spark connects to BigQuery, you may look at the Spark BigQuery connector documentation.
Properties for reading data from BigQuery:
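A sketch of the data section of the Zingg JSON configuration is given below; the pipe name, key file path, and table name are illustrative:

```json
"data" : [{
    "name": "customers",
    "format": "bigquery",
    "props": {
        "credentialsFile": "/path/to/service-account-key.json",
        "table": "mydataset.customers",
        "viewsEnabled": "true"
    }
}]
```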
The property credentialsFile should point to the location of the Google service account key file; this is the same path used to set the GOOGLE_APPLICATION_CREDENTIALS variable. The table property should point to the BigQuery table that contains the source data. The property viewsEnabled must be set to true.
Properties for writing data to BigQuery:
To write to BigQuery, a bucket needs to be created and assigned to the temporaryGcsBucket property, as shown in the sketch below.
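A sketch of the output section, assuming the same Zingg JSON configuration layout (the pipe name, key file path, table, and bucket name are illustrative):

```json
"output" : [{
    "name": "unifiedCustomers",
    "format": "bigquery",
    "props": {
        "credentialsFile": "/path/to/service-account-key.json",
        "table": "mydataset.unifiedCustomers",
        "temporaryGcsBucket": "my-zingg-temp-bucket"
    }
}]
```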
Notes:
The library "gcs-connector-hadoop2-latest.jar" can be downloaded from Google, and the library "spark-bigquery-with-dependencies_2.12-0.24.2.jar" from the Maven repository.
A typical service account key file looks like below (JSON).
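All values in this sketch are placeholders; the actual file is generated and downloaded from the Google Cloud console:

```json
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "zingg-service-account@your-project-id.iam.gserviceaccount.com",
  "client_id": "123456789012345678901",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/zingg-service-account%40your-project-id.iam.gserviceaccount.com"
}
```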