Projects
At the core of the Meltano experience is your Meltano project, which represents the single source of truth regarding your ELT pipelines: how data should be integrated and transformed, how the pipelines should be orchestrated, and how the various plugins that make up your pipelines should be configured.
Since a Meltano project is just a directory on your filesystem containing text-based files, you can treat it like any other software development project and benefit from DataOps best practices such as version control, code review, and continuous integration and deployment (CI/CD).
You can initialize a new Meltano project using meltano init.
meltano.yml project file
At a minimum, a Meltano project must contain a project file named meltano.yml, which contains your project configuration and tells Meltano that a particular directory is a Meltano project.
The only required property is version, which currently always holds the value 1. You can find a formal JSON Schema for the specification on SchemaStore.org or directly in the main Meltano repository, which can be useful for code generation with tools like datamodel-code-generator or swagger-codegen.
Configuration
At the root of meltano.yml, and usually at the top of the file, you will find project-specific configuration.
In a newly initialized project, a few environments will be populated to get you started.
To learn which settings are available, refer to the Settings reference.
Plugins
Your project's plugins, typically added to your project using meltano add, are defined under the plugins property, inside an array named after the plugin type (e.g. extractors, loaders).
Every plugin in your project needs to have:
- a name that's unique among plugins of the same type,
- a base plugin description describing the package in terms Meltano can understand, and
- configuration that can be defined across various layers, including the definition's config property.
A base plugin description consists of the pip_url, executable, capabilities, and settings properties, but not every plugin definition will specify these explicitly:
- An inheriting plugin definition has an inherit_from property and inherits its base plugin description from another plugin in your project or a discoverable plugin identified by name.
- A custom plugin definition has a namespace property instead and explicitly defines its base plugin description.
- A shadowing plugin definition has neither property and implicitly inherits its base plugin description from the discoverable plugin with the same name.
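The three cases above can be told apart by checking which properties a definition carries. The following is a minimal Python sketch of that decision, not Meltano's own implementation. The function name classify_plugin_definition is hypothetical, and the definition is assumed to be an already parsed entry from the plugins property.

```python
def classify_plugin_definition(definition: dict) -> str:
    """Classify a parsed plugin definition by the properties it carries.

    Hypothetical helper for illustration only; not part of Meltano's API.
    """
    if "inherit_from" in definition:
        # Base plugin description comes from another plugin, found by name.
        return "inheriting"
    if "namespace" in definition:
        # Base plugin description is spelled out in the definition itself.
        return "custom"
    # Neither property: falls back to the discoverable plugin with the same name.
    return "shadowing"
```

For example, a definition containing only a name would be classified as shadowing.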
When inheriting a base plugin description, the plugin definition does not need to explicitly specify a pip_url (the package's pip install argument), but you may want to override the inherited value and set the property explicitly to point at a (custom) fork or to pin a package to a specific version.
When a plugin is added using meltano add, the pip_url is automatically repeated in the plugin definition for convenience.
To support version-specific pip constraint files, the pip_url value can optionally be parameterized using the ${MELTANO__PYTHON_VERSION} variable. This special variable is populated by Meltano with the version of Python used to install the plugin, in major.minor form (e.g. 3.8 or 3.9).
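To make the substitution concrete, here is a small Python sketch of how such a parameterized pip_url could be expanded. It only mimics the behavior described above and is not Meltano's actual code. The helper name, the package name, and the example.com constraint URL are placeholders invented for illustration.

```python
import sys


def expand_python_version(pip_url, version_info=None):
    """Replace ${MELTANO__PYTHON_VERSION} with a major.minor version string.

    Hypothetical helper; defaults to the interpreter running this sketch.
    """
    if version_info is None:
        version_info = sys.version_info
    major_minor = f"{version_info[0]}.{version_info[1]}"
    return pip_url.replace("${MELTANO__PYTHON_VERSION}", major_minor)


# Hypothetical package name and constraint URL, for illustration only.
PIP_URL = "some-package --constraint https://example.com/constraints-${MELTANO__PYTHON_VERSION}.txt"
```

Calling expand_python_version(PIP_URL, (3, 9)) would point the constraint file at constraints-3.9.txt.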
Inheriting plugin definitions
A plugin defined with an inherit_from property inherits its base plugin description from another plugin identified by name. To find the matching plugin, other plugins in your project are considered first, followed by discoverable plugins:
plugins:
  extractors:
  - name: tap-postgres # Shadows discoverable `tap-postgres` (see below)
  - name: tap-postgres--billing
    inherit_from: tap-postgres # Inherits from project's `tap-postgres`
  - name: tap-bigquery--events
    inherit_from: tap-bigquery # Inherits from discoverable `tap-bigquery`
When inheriting from another plugin in your project, its configuration is also inherited as if the values were defaults, which can then be overridden as appropriate:
plugins:
  extractors:
  - name: tap-google-analytics
    variant: meltano
    config:
      key_file_location: client_secrets.json
      start_date: '2020-10-01T00:00:00Z'
  - name: tap-ga--view-foo
    inherit_from: tap-google-analytics
    config:
      # `key_file_location` and `start_date` are inherited
      view_id: 123456
  - name: tap-ga--view-bar
    inherit_from: tap-google-analytics
    config:
      # `key_file_location` is inherited
      start_date: '2020-12-01T00:00:00Z' # `start_date` is overridden
      view_id: 789012
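The "inherited as if the values were defaults" behavior can be pictured as a dictionary merge in which the child's values win. The Python sketch below uses the values from the example above. The function name effective_config is hypothetical, and the real resolution in Meltano involves more configuration layers than this simplification shows.

```python
def effective_config(parent_config: dict, child_config: dict) -> dict:
    """Treat the parent's config as defaults and let the child override them."""
    return {**parent_config, **child_config}


# Values copied from the example above.
TAP_GOOGLE_ANALYTICS = {
    "key_file_location": "client_secrets.json",
    "start_date": "2020-10-01T00:00:00Z",
}
VIEW_FOO = {"view_id": 123456}
VIEW_BAR = {"start_date": "2020-12-01T00:00:00Z", "view_id": 789012}
```

Under this sketch, tap-ga--view-foo keeps both inherited values, while tap-ga--view-bar keeps key_file_location and replaces start_date.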
Note that the presence of a variant property causes only discoverable plugins to be considered (even if there is also a matching plugin in the project), since only these can have multiple variants:
plugins:
  loaders:
  - name: target-snowflake # Shadows discoverable `target-snowflake` (see below)
    variant: datamill-co # using variant `datamill-co`
  - name: target-snowflake--derived
    inherit_from: target-snowflake # Inherits from project's `target-snowflake`
  - name: target-snowflake--transferwise
    inherit_from: target-snowflake # Inherits from discoverable `target-snowflake`
    variant: transferwise # using variant `transferwise`
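The lookup order described in this section (project plugins first, then discoverable plugins, with a variant restricting the search to discoverable plugins) can be sketched in Python as follows. This is a deliberate simplification under stated assumptions: plugins are represented only by their names, variant availability is not checked, and the function name find_base_plugin_source is hypothetical rather than part of Meltano.

```python
def find_base_plugin_source(inherit_from, variant, project_plugin_names, discoverable_plugin_names):
    """Return where the base plugin description would be looked up.

    Hypothetical helper: a `variant` skips project plugins entirely, because
    only discoverable plugins can have multiple variants.
    """
    if variant is None and inherit_from in project_plugin_names:
        return "project"
    if inherit_from in discoverable_plugin_names:
        return "discoverable"
    raise LookupError(f"no plugin named {inherit_from!r} found")
```

With target-snowflake present both in the project and among discoverable plugins, a definition without a variant would resolve to the project's plugin, and one with variant transferwise would resolve to the discoverable plugin.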
To learn how to add an inheriting plugin to your project, refer to the Plugin Management guide.
Custom plugin definitions
A plugin defined with a namespace property (but no inherit_from property) is a custom plugin that explicitly defines its base plugin description:
plugins:
  extractors:
  - name: tap-covid-19
    namespace: tap_covid_19
    pip_url: tap-covid-19
    executable: tap-covid-19
    capabilities:
    - catalog
    - discover
    - state
    settings:
    - name: api_token
    - name: user_agent
    - name: start_date
To learn how to add a custom plugin to your project, refer to the Plugin Management guide.
Shadowing plugin definitions
A plugin defined without an inherit_from or namespace property implicitly inherits its base plugin description from the discoverable plugin with the same name, as a form of shadowing:
plugins:
  extractors:
  - name: tap-gitlab
To learn how to add a discoverable plugin to your project, refer to the Plugin Management guide.
Variants
If multiple variants of a discoverable plugin are available, the variant property can be used to choose a specific one:
plugins:
  extractors:
  - name: tap-gitlab
    variant: meltano
If no variant is specified, the original variant supported by Meltano is used.
Note that this is not necessarily the default variant that is recommended to new users and would be used if the plugin were newly added to the project.
Plugin configuration
A plugin's configuration is stored under a config property.
Values for plugin extras are stored among the plugin's other properties, outside of the config object:
extractors:
- name: tap-example
  config:
    # Configuration goes here!
    example_setting: value
  # Extras go here!
  example_extra: value
Plugin commands
Plugin commands are defined by the commands property. The keys are the names of the commands and the values are the arguments to be passed to the plugin executable. These can contain dynamic references to configuration using the environment variable form of the configuration option.
utilities:
- name: dbt-snowflake
  variant: dbt-labs
  commands:
    my_models:
      args: run --select +my_model_name
      description: Run dbt, selecting model `my_model_name` and all upstream models. Read more about the dbt node selection syntax at https://docs.getdbt.com/reference/node-selection/syntax
Commands can optionally specify a description that is displayed when listing commands. They can also optionally specify an executable other than the plugin's default one.
- name: dagster
  variant: quantile-development
  commands:
    start:
      args: -f $REPOSITORY_DIR/repository.py
      description: Start Dagster.
      executable: dagit_invoker
Containerized commands
Commands can specify a container_spec for containerized execution. To execute containerized commands where possible, use the --containers flag.
See the full YAML reference for the container spec for more information.
Jobs
Your project's predefined pipelines, typically created using meltano job, are defined under the jobs property.
A job definition must have a name and one or more tasks:
jobs:
- name: tap-foo-to-target-bar-dbt
  tasks:
  - tap-foo target-bar dbt:run
- name: tap-foo-to-targets-bar-and-baz
  tasks:
  - tap-foo target-bar
  - tap-foo target-baz
You can learn more about how tasks are defined and run in the meltano job documentation.
Schedules
Your project's pipeline schedules, typically created using meltano schedule, are defined under the schedules property.
A scheduled job must have a name, job and interval:
schedules:
- name: foo-to-bar
  job: tap-foo-to-target-bar-dbt
  interval: "@hourly"
The value for job must be the name of an existing job within the project.
Alternatively, you can provide a name, extractor, loader, transform, and interval in place of a job:
- name: foo-to-bar-elt
  extractor: tap-foo
  loader: target-bar
  transform: skip
  interval: "@hourly"
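Because a schedule's job value must name an existing job, a quick consistency check can be useful when editing meltano.yml by hand. The Python sketch below assumes the file has already been parsed into a dictionary with a YAML parser of your choice. The function name schedules_with_unknown_jobs is hypothetical and not part of Meltano.

```python
def schedules_with_unknown_jobs(config: dict) -> list:
    """Return names of schedules whose `job` does not match any defined job.

    Schedules that use extractor/loader/transform instead of `job` are skipped.
    """
    job_names = {job["name"] for job in config.get("jobs") or []}
    return [
        schedule["name"]
        for schedule in config.get("schedules") or []
        if "job" in schedule and schedule["job"] not in job_names
    ]
```

An empty result would mean every job-based schedule points at a job defined in the project.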
Pipeline-specific configuration can be specified using environment variables in an env dictionary:
schedules:
- name: foo-to-bar
  job: tap-foo-to-target-bar-dbt
  interval: "@hourly"
  env:
    TAP_FOO_BAR: bar
    TAP_FOO_BAZ: baz
To learn more about pipeline schedules and orchestration, refer to the Orchestration guide.
Multiple YAML Files
As your project grows, and your meltano.yml with it, you may wish to break your config into multiple .yml files and to store those subfiles in various places in your project folder hierarchy.
This can be done by creating new .yml files and adding them (directly or via a glob pattern) to the include_paths key of your meltano.yml:
include_paths:
  - "./subconfig_[0-9].yml"
  - "./*/subconfig_[0-9].yml"
  - "./*/**/subconfig_[0-9].yml"
Meltano will use these paths or patterns to collect the config from them for use in your project. Although the creation of subfiles is manual, once created any elements within each subfile can be updated using the meltano config CLI. Adding new config elements places them in meltano.yml. We are working on ways to direct new config into specific subfiles (#2985).
Currently supported elements in subfiles are plugins, schedules and environments.
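To get a feel for which files the include_paths patterns above would pick up, the following Python sketch expands them with the standard glob module. It assumes the patterns are resolved relative to the project root and that ** is treated recursively. That matches the patterns shown but is an assumption about Meltano's behavior, not a description of its implementation. The function names and the sample file layout are invented for illustration.

```python
import glob
import os
import tempfile


def collect_subfiles(project_root, include_paths):
    """Expand include_paths patterns relative to project_root (sketch only)."""
    found = set()
    for pattern in include_paths:
        full_pattern = os.path.join(project_root, pattern)
        for match in glob.glob(full_pattern, recursive=True):
            relative = os.path.relpath(match, project_root)
            found.add(relative.replace(os.sep, "/"))
    return sorted(found)


def demo():
    """Build a throwaway project layout and expand the patterns from the text."""
    patterns = [
        "./subconfig_[0-9].yml",
        "./*/subconfig_[0-9].yml",
        "./*/**/subconfig_[0-9].yml",
    ]
    sample_files = ["subconfig_1.yml", "a/subconfig_2.yml", "a/b/subconfig_3.yml", "a/other.yml"]
    with tempfile.TemporaryDirectory() as root:
        for rel in sample_files:
            path = os.path.join(root, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        return collect_subfiles(root, patterns)
```

In this sketch a file matched by more than one pattern is reported only once, and files such as a/other.yml that match no pattern are ignored.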
Annotations
To better integrate with software other than the core Meltano library and CLI, meltano.yml supports "annotations": a dictionary that maps tool/vendor names to arbitrary dictionaries containing whatever that tool/vendor wants to annotate the Meltano config with.
annotations:
  arbitrary-third-party-tool: {
    # Configuration for the third party tool
  }
  # etc.
The core Meltano library and CLI never access the annotations field. To access it, one must read meltano.yml. Nothing within an annotations field should be thought of as part of Meltano's own configuration - it is merely extra data that Meltano permits within its configuration files.
Annotations are supported in the following locations within meltano.yml:
- At the top level
- In a job definition
- In a schedule definition
- In an environment definition
- In a plugin definition
- In an environment plugin definition
- In a plugin setting definition
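Since Meltano itself never reads annotations, a third-party tool has to pull them out of the parsed file on its own. The Python sketch below shows one way this might look, assuming meltano.yml (or any of the definitions listed above) has already been loaded into a dictionary by a YAML parser. The function name annotations_for is hypothetical.

```python
def annotations_for(definition: dict, tool_name: str) -> dict:
    """Return the annotation dictionary for one tool, or {} if none is present.

    `definition` may be the top-level config or any nested definition that
    supports annotations (job, schedule, environment, plugin, and so on).
    """
    annotations = definition.get("annotations") or {}
    return annotations.get(tool_name) or {}
```

Returning an empty dictionary when nothing is annotated lets the calling tool fall back to its own defaults without special-casing missing keys.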
.gitignore
A newly initialized project comes with a .gitignore file to ensure that environment-specific and potentially sensitive configuration stored inside the .meltano directory and .env file is not leaked accidentally.
All other files are recommended to be checked into the repository and shared between all users and environments that may use the project.
.env
Optionally, your project can contain a .env file specifying environment variables used to configure Meltano and its plugins.
Typically, this file is used to store configuration that is environment-specific or sensitive, and should not be stored in meltano.yml and checked into version control.
meltano config <plugin> set will automatically store configuration in meltano.yml or .env as appropriate.
In a newly initialized project, this file will be included in .gitignore by default.
.meltano directory
Meltano stores various files for internal use inside a .meltano directory inside your project.
Note: $MELTANO_SYS_DIR_ROOT can be used in place of the $MELTANO_PROJECT_ROOT/.meltano directory.
These files are specific to the environment Meltano is running in, and should not be checked into version control.
In a newly initialized project, this directory will be included in .gitignore by default.
While you would usually not want to modify files in this directory directly, knowing what's in there can aid in debugging:
- .meltano/meltano.db: The default SQLite system database.
- .meltano/logs/elt/<state_id>/<run_id>/elt.log, e.g. .meltano/logs/elt/gitlab-to-postgres/<UUID>/elt.log: meltano el, meltano elt and meltano run output logs for the specified pipeline run.
- .meltano/run/bin: Symlink to the meltano executable most recently used in this project.
- .meltano/run/elt/<state_id>/<run_id>/, e.g. .meltano/run/elt/gitlab-to-postgres/<UUID>/: Directory used by meltano el, meltano elt and meltano run to store pipeline-specific generated plugin config files, like an extractor's tap.config.json, tap.properties.json, and state.json.
- .meltano/run/<plugin name>/, e.g. .meltano/run/tap-gitlab/: Directory used by meltano invoke to store generated plugin config files.
- .meltano/<plugin type>/<plugin name>/venv/, e.g. .meltano/extractors/tap-gitlab/venv/: Python virtual environment directory that a plugin's pip package was installed into by meltano add or meltano install.
If $MELTANO_SYS_DIR_ROOT is set, all the above mentioned paths .meltano/* will point to $MELTANO_SYS_DIR_ROOT/*.
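The effect of $MELTANO_SYS_DIR_ROOT on the paths above can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not Meltano's own resolution code, and the function name resolve_sys_dir is hypothetical.

```python
import os
from pathlib import Path


def resolve_sys_dir(project_root, env=None):
    """Return the directory that stands in for .meltano (illustrative sketch).

    If MELTANO_SYS_DIR_ROOT is set, it replaces <project_root>/.meltano.
    """
    env = os.environ if env is None else env
    override = env.get("MELTANO_SYS_DIR_ROOT")
    if override:
        return Path(override)
    return Path(project_root) / ".meltano"
```

Under this sketch, the system database path would be resolve_sys_dir(project_root) / "meltano.db" in either case.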
System database
Meltano stores various types of metadata in a project-specific system database, which by default takes the form of a meltano.db SQLite database stored inside the .meltano directory. Other database backends are supported as well.
Like all files stored in the .meltano directory, the system database is also environment-specific.
You can choose to use a different system database backend or configuration using the database_uri setting.
While you would usually not want to modify the system database directly, knowing what's in there can aid in debugging.
Meltano's CLI utilizes the following tables:
- runs table: One row for each meltano el, meltano elt or meltano run pipeline run, holding started/ended timestamps and incremental replication state.
- plugin_settings table: Plugin configuration set using meltano config <plugin> set or the UI when the project is deployed as read-only.
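When debugging with the default SQLite backend, the standard sqlite3 module is enough to check which tables exist. The Python sketch below lists table names only, because the column layout of these tables is not described here and is not assumed. The function name list_tables is hypothetical.

```python
import sqlite3


def list_tables(connection):
    """Return the names of all tables in an open SQLite connection, sorted."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

A typical use would be to open the database with sqlite3.connect(".meltano/meltano.db") from the project root and pass the connection to list_tables. Note that sqlite3.connect creates an empty database file if the path does not exist, so check the path first, and avoid writing to the system database directly.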
Support for other database types
Meltano currently supports the following databases as backends for state and configuration:
- SQLite (supported out of the box)
- PostgreSQL (requires the postgres or psycopg2 Python extra)
- MS SQL Server (requires the mssql Python extra)
Support for other databases is planned. If you would like to see support for a specific database, please open an issue.