Integration of Fivetran and dbt (Data Build Tool) with Google Cloud’s BigQuery

SADA
The SADA Engineering Blog
8 min readAug 2, 2023

--

Authors: Avenash Ramesh, Senior Data Engineer, SADA & Pradeep Gurbani, Data Engineer, SADA

Intro

In today’s data-driven world, organizations face the challenge of efficiently managing and processing vast amounts of data to derive valuable insights. Data integration and transformation are critical steps in this process, enabling businesses to consolidate data from multiple sources, optimize its structure, and make it readily accessible for analysis. The integration of Fivetran, dbt (Data Build Tool), and BigQuery presents a powerful solution that simplifies and enhances the data pipeline, facilitating seamless data integration, transformation, and analysis. This blog explores the integration of these three tools and highlights the benefits it offers for organizations seeking to unlock the full potential of their data.

ETL vs ELT: Understanding the Key Differences

ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) pipelines are used in different scenarios based on specific requirements for building data pipelines. Here’s a short description of when to choose each pipeline:

It’s important to note that the choice between ETL and ELT depends on factors such as data volume, complexity, transformation requirements, performance considerations, and the capabilities of the data platforms being used. Both approaches have their strengths and can be effective depending on the specific use case and business requirements. It’s important to evaluate your specific use case and select the pipeline that best aligns with your data integration and transformation goals.

How Fivetran(EL) and dbt(T) as EL-T tools can be incorporated into the GCP ecosystem?

  • Fivetran: Fivetran, a leading cloud-based data integration platform, streamlines the process of extracting data from various sources. With its vast array of pre-built connectors of diverse data sources, it eliminates the need for manual coding and reduces the complexity associated with data ingestion, ensuring a consistent, reliable, and up-to-date data flow.
  • dbt: dbt (Data Build Tool) is an open-source transformation framework that empowers SQL-based transformations, and allows for data cleansing, structuring, and creating reusable data models. The tool’s code-based approach and modular structure facilitate version control, collaboration, and documentation, ensuring the maintainability and scalability of data transformations.
  • BigQuery: BigQuery, Google Cloud’s serverless, highly scalable data warehouse, provides a powerful foundation for storing and querying large volumes of data. With its distributed architecture and columnar storage, BigQuery delivers exceptional query performance, allowing organizations to explore vast datasets in near real-time.

The integration of Fivetran and dbt with BigQuery offers a comprehensive data analytics ecosystem, where data from diverse sources like Google Analytics API, Jira, etc. can be seamlessly integrated, transformed, and loaded into BigQuery for advanced analytics, machine learning, and business intelligence.

Setup info

Fivetran’s effectiveness relies on three key components that together form the foundation of a strong and efficient data integration solution: a destination, connectors, and a transformation setup. These elements are essential for maximizing the potential of Fivetran and ensuring smooth and effective data integration processes.

Destinations

Fivetran destinations are platforms where data can be stored and analyzed. They include data warehouses, cloud storage, analytics tools, operational databases, collaboration tools, application-specific data stores, and big data platforms. Fivetran seamlessly integrates with these destinations, enabling organizations to consolidate and leverage their data for informed decision-making.

Following are the steps to be followed to set up BigQuery as the destination in the FiveTran environment:

Step 1:

  • Log in to the Fivetran account and go to the Fivetran Dashboard.
  • Click on “Destinations” in the navigation menu and further click on the “Add destination” button to begin setting up the destination. Enter destination name and search for “BigQuery” in the list of available connectors. Click on it to configure.

Step 2:

  • There will be a prompt to provide the necessary details to connect to your BigQuery account. This typically includes providing your BigQuery project ID, dataset name, and credentials for authentication.
  • Once the required information is entered, click on the “Save and test” button to validate the connection.

That’s it! BigQuery as a destination in Fivetran has been successfully configured. Now it can be leveraged as a sink for the future connectors being set up.

Connectors

Fivetran connectors are pre-built integrations that simplify data integration by enabling organizations to connect and extract data from various sources. They include database connectors, cloud storage connectors, application connectors, marketing connectors, collaboration connectors, event streaming connectors, and custom connectors.

These connectors automate the extraction, transformation, and loading of data into a consolidated destination, streamlining the data integration process and enabling efficient data analysis.

Different connectors require different configurations. Outlined below is the configuration for the connector to Google Sheets.

Step 1:

  • Log in to your Fivetran account and go to the Fivetran Dashboard.
  • Navigate to “Connectors” under the navigation menu, click on the “Add Connector” button and further select the configured destination where data from the connector will be dumped.

Step 2:

  • In the Connectors page, search for “Google Sheets” in the list of available connectors and click on it.

Step 3:

  • To configure Google Sheets, authorize Fivetran to access Google Sheets data via the service account method as shown below or using OAuth, and enter the NameRange to select the range for data ingestion.
  • Once the required information is entered, click on the “Save and test” button to validate the connection.

Step 4:

  • Showcases a sample sheet with a dummy Named range which will be used to ingest the data to the staging table of BigQuery.

Step 5:

  • After configuring the sync settings and initial sync, the image shows the ingested data in BigQuery.

That’s it! You have successfully added the Google Sheets connector in Fivetran. Fivetran will now automatically sync data from your Google Sheets files, making it available for further analysis and consolidation with other data sources in your chosen destination.

Transformations

Fivetran transformations are features that allow organizations to manipulate and shape their data during the integration process. Following are the two types of transformations available under the “add transformation” option in FiveTran:

  • dbt transformation: Fivetran offers dbt Core*-compatible data models to transform your destination data into analytics-ready datasets based on custom logics.
  • Quickstart transformation: Fivetran’s Quickstart data models allow you to use the pre-built data models without building out your own dbt project.

For this use case, we will be using dbt Core for transforming the data. dbt transformations can be written in a variety of languages, including SQL, Python, and Scala. They are stored in a central git repository, making them easy to manage and share.

dbt transformations can be a powerful tool for improving the quality and consistency of your data. By following these steps, you can easily add dbt transformations to your Fivetran account.

Step 1:

  • Log in to your Fivetran account and go to the Fivetran Dashboard.
  • Navigate to “Transformations” under the navigation menu, click on the “Add transformation” button which it will provide the 2 above-mentioned options for transforming the data.

Step 2:

  • Once dbt transformation is chosen, we need to create a dbt project in a git repo similar to what is shown in the image below. Once the dbt project is initialized, we need to add our SQL models under the model's directory as shown.
  • Here we have configured transformed_sheet.sql model to use the incremental materialized option of dbt, allowing it to select only the incremental data from the source BigQuery table, i.e. poc_sheets in this case. Also, a filter of salary is being used to filter out the data with salary below 9500.
  • Once setup is completed, configured transformations are shown in the UI.

Step 3:

  • After transformations are added, we have multiple options for integrations (i.e. fully, partially, and independent).
  • Fully integrated: Fivetran will automatically run the transformation after each upstream connector is synced.
  • Partially integrated: Fivetran runs the transformation according to the schedule set, except when its schedule overlaps with the associated connector’s schedule. In that case, it will wait to run the transformation until the connector finishes running.
  • Independent: Fivetran will run the transformation based on a schedule with no regard for upstream connector schedules.

Step 4:

  • The image showcases the final transformed data where First name and Last name are concatenated salary > 9500 are stored to a separate table.

Once you have scheduled your DBT project to run, it will automatically transform your data on a regularly scheduled basis. This can help you to keep your data up-to-date and accurate.

Why would you consider using Fivetran and dbt alongside GCP tools?

While Google Cloud Platform (GCP) provides a range of powerful services for data integration, transformation, and analytics, tools like Fivetran and dbt offer additional benefits and capabilities that complement GCP services. Following are some of the points that support the same.

Conclusion

While GCP services like BigQuery, Dataflow, and others offer powerful data integration and transformation capabilities, tools like Fivetran and dbt provide specialized features and ease of use in specific areas. They can enhance your data pipeline workflows, simplify integration, and provide more robust code-driven transformation capabilities. These tools can help you to improve the quality, reliability, scalability, and cost-effectiveness of your data.

Moreover, this integrated ecosystem enhances and democratizes collaboration between data engineers, analysts, and data scientists. With the availability of version-controlled data models and transformations, teams can easily collaborate and iterate on data analytics projects. The integration represents a game-changer in the realm of data analytics, offering an end-to-end solution that unleashes organizations’ true value of their data, making informed decisions that drive success and growth.

Ultimately, the choice of using these tools alongside GCP depends on your specific requirements, project complexity, and desired workflows.

References

--

--

Global business and cloud consulting firm | Helping CIOs and #IT leaders transform in the #cloud| 3-time #GoogleCloud Partner of the Year.