Skip to main content

Part 2 - Store data

Let’s learn by example.

In part 1, we extracted data from GitHub and are now ready to load the data into a PostgreSQL database.

If you're having trouble throughout this tutorial, you can always head over to the Slack channel to get help.

Getting your target ready

We're going to load our data into a dockerized PostgreSQL database running on your laptop. View the docker docs if you don't yet have docker installed. To launch a local PostgreSQL container, you just need to run:

docker run --name meltano_postgres -p 5432:5432 -e POSTGRES_USER=meltano -e POSTGRES_PASSWORD=password -d postgres

$ docker run -p 5432:5432 -e POSTGRES_USER=meltano -e POSTGRES_PASSWORD=password --name meltano_postgres -d postgres

504e2b416874dd6a5db3fe6dd3ff63f1d42095bbc4e87314f1f708f69c8188de
$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
504e2b416874 postgres "docker-entrypoint.s…" 3 seconds ago Up 3 seconds 0.0.0.0:5432->5432/tcp kind_rosalind

The container will need a few seconds to initialize. You can test the connection with your favorite SQL tool using connection data:

  • host: localhost
  • port: 5432
  • database: postgres
  • user: meltano
  • password: password

Add the postgres loader

Add the postgres loader using the meltano add loader target-postgres --variant=meltanolabs command.

$ meltano add loader target-postgres --variant=meltanolabs
Added loader 'target-postgres' to your Meltano project
Variant: meltanolabs (default)
Repository: https://github.com/MeltanoLabs/target-postgres
Documentation: https://hub.meltano.com/loaders/target-postgres--meltanolabs

2024-01-01T00:25:40.604941Z [info ] Installing loader 'target-postgres'
---> 100%

2024-01-01T00:25:53.152127Z [info ] Installed loader 'target-postgres'

To learn more about loader 'target-postgres', visit https://hub.meltano.com/loaders/target-postgres--meltanolabs

Use the meltano invoke target-postgres --help command to test that the installation worked.

$ meltano invoke target-postgres --help
Usage: target-postgres [OPTIONS]

Execute the Singer target.

Options:
--input FILENAME A path to read messages from instead of from
standard in.
--config TEXT Configuration file location or 'ENV' to use
environment variables.
--format [json|markdown] Specify output style for --about
--about Display package metadata and settings.
--version Display the package version.
--help Show this message and exit.

Configure the target-postgres loader

To configure the plugin, look at the options by running meltano config target-postgres list:

$ meltano config target-postgres list

[...]
host [env: TARGET_POSTGRES_HOST] current value: 'postgres' (from `meltano.yml`)
  Host: Hostname for postgres instance. Note if sqlalchemy_url is set this will be ignored.
port [env: TARGET_POSTGRES_PORT] current value: None (default)
  Port: The port on which postgres is awaiting connection. Note if sqlalchemy_url is set this will be ignored. Defaults to 5432
user [env: TARGET_POSTGRES_USER] current value: 'meltano' (from `meltano.yml`)
  User: User name used to authenticate. Note if sqlalchemy_url is set this will be ignored.
password [env: TARGET_POSTGRES_PASSWORD] current value: None (default)
  Password: Password used to authenticate. Note if sqlalchemy_url is set this will be ignored.
database [env: TARGET_POSTGRES_DATABASE] current value: 'postgres' (from `meltano.yml`)
  Database: Database name. Note if sqlalchemy_url is set this will be ignored.
[...]
add_record_metadata [env: TARGET_POSTGRES_ADD_RECORD_METADATA] current value: None (default)
  Add Record Metadata: Note that this must be enabled for activate_version to work!This adds _sdc_extracted_at, _sdc_batched_at, and more to every table. See https://sdk.meltano.com/en/latest/implementation/record_metadata.html for more information.
[...]

To learn more about loader 'target-postgres' and its settings, visit https://hub.meltano.com/loaders/target-postgres--meltanolabs

Fill in the details for these four attributes, and set add_record_metadata to True by using the meltano config target-postgres set ATTRIBUTE VALUE command:

$ meltano config target-postgres set user meltano
  Loader 'target-postgres' setting 'user' was set in `meltano.yml`: 'meltano'
$ meltano config target-postgres set password password
  Loader 'target-postgres' setting 'password' was set in `.env`: (redacted)
$ meltano config target-postgres set database postgres
  Loader 'target-postgres' setting 'database' was set in `meltano.yml`: 'postgres'
$ meltano config target-postgres set add_record_metadata True
  Loader 'target-postgres' setting 'add_record_metadata' was set in `meltano.yml`: True
$ meltano config target-postgres set host localhost
  Loader 'target-postgres' setting 'host' was set in `meltano.yml`: localhost

This will add the non-sensitive configuration to your meltano.yml project file:

plugins:
loaders:
- name: target-postgres
variant: meltanolabs
pip_url: git+https://github.com/MeltanoLabs/target-postgres.git
config:
user: meltano
database: postgres
add_record_metadata: true
host: localhost

Sensitive configuration information (such as password) will instead be stored in your project's .env file so that it will not be checked into version control.

You can use meltano config target-postgres to check the configuration, including the default settings not visible in the project file.

$ meltano config target-postgres
{
    "add_record_metadata": true,
    "database": "postgres",
    "dialect+driver": "postgresql+psycopg2",
    "host": "localhost",
    "password": "password",
    "port": 5432,
    "user": "meltano"
}

By default, the database schema is named using the tap : tap_github. You change the target schema by setting default_target_schema in the target_postgres configuration.

Run your data integration (EL) pipeline

Now that your Meltano project, extractor, and loader are all set up, it's time to run your first data integration (EL) pipeline!

Run your newly added extractor and loader in a pipeline using meltano run:

$ meltano run tap-github target-postgres
2024-09-20T13:16:13.885045Z [warning ] No state was found, complete import.
2024-09-20T13:16:15.441183Z [info ] INFO Starting sync of repository: [...]
2024-09-20T13:16:15.901789Z [info ] INFO METRIC: {"type": "timer", "metric": "http_request_duration",[...]
---> 100%
2024-09-20T13:16:15.933874Z [info ] INFO METRIC: {"type": "counter", "metric": "record_count", "value": 21,[...]
2024-09-20T13:16:16.435885Z [info ] [...] message=Schema 'tap_github' does not exist. Creating... ...
2024-09-20T13:16:16.632945Z [info ] ... message=Table '"commits"' does not exist. Creating...
2024-09-20T13:16:16.729076Z [info ] ...message=Loading 21 rows into 'tap_github."commits"' ...
---> 100%
2024-09-20T13:16:16.864812Z [info ] ...Loading into tap_github."commits": {"inserts": 21, "updates": 0, "size_bytes": 4641} ...
2024-09-20T13:16:16.885846Z [info ] Incremental state has been updated at 2024-09-20 13:16:16.885259.
2024-09-20T13:16:16.960093Z [info ] Block run completed. ....

If everything was configured correctly, you should now see your data flow from your source into your destination!

The postgres database should now have a schema tap_github with the table commits containing your data.

Next Steps

Next, head over to Part 3 to add data processing to your pipeline.