Skip to main content

Transform Data

Transformations in Meltano are implemented using dbt. All Meltano generated projects have a transform/ directory, which is the default location for your dbt configuration, models, packages, etc in order to run transformations. After installing a dbt plugin you can run an initialize command to automatically populate the contents of that directory.

If you already have an existing dbt project that you'd like to migrate to Meltano, check out the existing dbt project guide for more details.

Adapter-Specific dbt Transformation

In alignment with the dbt documentation, we support adapter-specific installations of dbt. See MeltanoHub for a list of all the supported adapters (e.g. Snowflake, Postgres, Redshift, BigQuery, DuckDB, etc.)

If you are interested in another adapter, please consider contributing its definition to MeltanoHub.

Install dbt

To install a dbt utility to your project, run:

# install adapter-specific dbt, e.g. for snowflake
meltano add utility dbt-snowflake

After dbt is installed you can configure it using config CLI commands, Meltano environments or environment variables:

# list available settings
meltano config dbt-snowflake list

# configure the `dev` environment interactively
meltano --environment=dev config dbt-snowflake set --interactive

# configure the `prod` environment interactively
meltano --environment=prod config dbt-snowflake set --interactive

More details on configuring plugins, including with environment variables.

Running dbt in Meltano

There are two ways to run dbt utility plugins using Meltano; in a pipeline using the run command or standalone with arguments using the invoke command.

Running dbt as part of a Pipeline

Once you have created your models in dbt, run them as part of a pipeline:

# run a complete ELT pipeline using the `dev` environment config
meltano --environment=dev run tap-gitlab target-snowflake dbt-snowflake:run

To run a subset of your dbt project, define a plugin command with your desired dbt selection filters:

# meltano.yml
plugins:
utilities:
- name: dbt-snowflake
commands:
my_models:
args: run --select +my_model_name
description: Run dbt, selecting model `my_model_name` and all upstream models. Read more about the dbt node selection syntax at https://docs.getdbt.com/reference/node-selection/syntax

This can then be executed as follows:

meltano --environment=dev run tap-gitlab target-snowflake dbt-snowflake:my_models

Invoking dbt

dbt can also be run directly, via the invoke command:

# run your entire dbt project
meltano invoke dbt-snowflake run

# run with node selection criteria
meltano invoke dbt-snowflake run --select +my_model_name

# run with a command specified in meltano.yml
meltano invoke dbt-snowflake:my_models

dbt Installation and Configuration (Transformer Plugin Type)

These instructions are the classic way of installing and running dbt as a transformer plugin type.

Users can still install dbt in this manner but we are prioritizing dbt utility plugin types for new and existing users.

To learn more about the dbt transformer plugins, please see the transformers plugin documentation on Meltano Hub. The current recommendation to use utility dbt plugins is not supported by the elt command so continue using a transformer if you prefer using elt over run.

To install the dbt transformer to your project run:

meltano add transformer dbt-<adapter-name>

# For example:

meltano add transformer dbt-snowflake

For more details on configuring a dbt transformer see the Meltano Hub documentation.

dbt Installation and Configuration (Classic)

These instructions are the classic way of installing and running dbt as a transformer plugin type.

Users can still install dbt in this manner but we are prioritizing dbt utility plugin types for new and existing users.

To learn more about the dbt Transformer package, please see the dbt plugin documentation on Meltano Hub.

To install the dbt transformer to your project run:

meltano add transformer dbt

After dbt is installed you can change the default configurations using environment variables or config CLI commands like the following:

meltano config dbt set target <target>

# For example:
meltano config dbt set target postgres

For more details, pipeline environment variables and dbt transform settings.

Working with Transform Plugins

WARNING: Transform plugins are currently de-prioritized by the Meltano project due to the difficulty of maintaining them at scale.

Users can still install and maintain them as they please but many have grown outdated and unmaintained.

Some users chose to install the existing transform plugins as a starting point then customize them for their own transformations.

Transform plugins are dbt packages that reside in their own repositories.

When a transform is added to a project, it is added as a dbt package in transform/packages.yml, enabled in transform/dbt_project.yml, and loaded for usage the next time dbt runs.

Note: You do not have to use transform plugin packages in order to use dbt. Many teams instead choose to create their own custom transformations.

For more information on how to build your own dbt models or to customize your project directly, see the dbt docs.

Configuring Transform Plugins

Transform plugins may have additional configuration options in meltano.yml. For example, the tap-gitlab dbt package requires three variables, which are used for finding the tables where raw data has been loaded during the Extract-Load phase:

{% raw %}
transforms:
- name: tap-gitlab
pip_url: https://gitlab.com/meltano/dbt-tap-gitlab.git
vars:
entry_table: "{{ env_var('PG_SCHEMA') }}.entry"
generationmix_table: "{{ env_var('PG_SCHEMA') }}.generationmix"
region_table: "{{ env_var('PG_SCHEMA') }}.region"
{% endraw %}

As an alternative to providing values from environment variables, you can also set values directly in meltano.yml:

transforms:
- name: tap-gitlab
pip_url: https://gitlab.com/meltano/dbt-tap-gitlab.git
vars:
entry_table: "my_raw_schema.entry"
generationmix_table: "my_raw_schema.generationmix"
region_table: "my_raw_schema.region"

Whenever Meltano runs a new transformation, transform/dbt_project.yml is updated using the values provided in meltano.yml.

Running a Transform in Meltano

The two main ways to run your dbt transforms using Meltano are by calling them inline with your ELT pipeline using --transform run or decoupled from your pipeline using invoke dbt:run.

Transform in your ELT pipeline

When melatno elt runs with the --transform run option, Meltano uses the convention that the transform has the same namespace as the extractor in its pipeline, except with snake_case (tap-gitlab -> tap_gitlab). As an example, assume that the following command runs:

meltano elt <tap> <target> --transform run

# For example:
meltano elt tap-gitlab target-postgres --transform run

After the Extract and Load steps are successfully completed meaning data has been extracted from the GitLab API and loaded to a Postgres DB, the dbt transform in the /transform/models/tap_gitlab/ directory is run.

Under the hood this --transform run option is telling Meltano to run multiple dbt commands. First it installs any required dbt package dependencies using dbt deps then it runs your models using dbt run --models <models>. The <models> argument is populated using the Meltano transform models setting documented here.

Using this method for executing transforms allows Meltano to make some assumptions about the appropriate configurations for running dbt. Based on the target loader you are using, Meltano is able to default your dbt transform target config setting to the correct SQL dialect (e.g. Snowflake, Postgres, etc.).

Starting with Meltano v3, the default source_schema value of $MELTANO_LOAD__TARGET_SCHEMA will stop working since the target extra was removed. To fix this, you can set the source_schema value to the appropriate environment variable for your target (e.g. $MELTANO_LOAD__DEFAULT_TARGET_SCHEMA for Postgres).

Transform directly

Just like other Meltano plugins, dbt transforms can be executed directly using invoke. Using this method decouples dbt transformations from ELT pipelines which could be preferred for certain users depending on their dbt project.

Users might choose this approach if they want to replicate data from many sources before running a set of dbt models that blend all of them together or maybe multiple models reference the same source data but are refreshed on different cadences (i.e. one is updated right when data arrives while another is only refreshed once a week).

For example, to run the same transforms as the tap-gitlab --transform=run example above, the following command can be run:

meltano invoke dbt:<command>

# For example:
meltano invoke dbt:run --models tap_gitlab.*

Again, this runs all dbt models in the /transform/models/tap_gitlab/ directory.

The downside of running directly vs in a pipeline is that Meltano can't infer anything about how dbt should run so more settings might need to be explictly set by the user. This includes target dialet DBT_TARGET, target schema DBT_TARGET_SCHEMA, and models DBT_MODELS.

See the transformer docs from other dbt commands.

Adding a Transform to your Meltano Project

Once the dbt transformer has been installed in your Meltano project you will see the /transform directory populated with dbt artifacts. If you chose to use the --transform run option in an ELT pipeline, its important to note that Meltano uses the convention that the transform has the same namespace as the extractor in its pipeline, except with snake_case (tap-gitlab -> tap_gitlab). For instance, all you need to do is start writing your dbt models in the appropriate /transform/models/<tap_name>/ directory.

See the dbt documentation for more details on writing models.

Another common option is to install your dbt project as a package from a separate git repository. See dbt package management. To do this you just add a /transform/packages.yml file to your project with your dbt project referenced. For instance your yaml file might look like this:

packages:
- git: https://gitlab.com/your_repo/your-dbt-project.git
revision: 1.0.0

If you plan to call dbt directly using invoke then you have to first run meltano invoke dbt:deps to install your package dependencies. Using the --transform=run option in your pipeline takes care of this step for you automatically.