Create a Custom Extractor

As much as we’d like to support all the data sources out there, we’ll need your help to get there. If you find a data source that Meltano doesn’t support right now, it might be time to get your hands dirty.

How to Create an Extractor

Meltano’s SDK for Taps makes it easier than ever to create new extractors for your own custom data sources.

What is Singer?

Singer taps and targets are the mechanism Meltano uses to extract and load data. For more details about the Singer specification, please visit our Singer Spec documentation.
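To make the Singer idea concrete, here is a minimal, stdlib-only Python sketch of the messages a tap writes to stdout: a SCHEMA message describing a stream, one RECORD per row, and a STATE checkpoint. This is not SDK code (the SDK generates these messages for you). The `users` stream, its schema, and the bookmark layout are hypothetical.

```python
import json
import sys
from typing import Dict, Iterable, List

# Hypothetical stream name and JSON Schema, for illustration only.
STREAM = "users"
SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "email": {"type": "string"},
    },
}


def build_messages(rows: Iterable[Dict]) -> List[Dict]:
    """Return the SCHEMA, RECORD and STATE messages a tap would emit for `rows`."""
    # 1. Describe the stream's shape before any records are sent.
    messages = [{"type": "SCHEMA", "stream": STREAM, "schema": SCHEMA, "key_properties": ["id"]}]
    last_id = None
    # 2. Send each row as a RECORD message.
    for row in rows:
        messages.append({"type": "RECORD", "stream": STREAM, "record": row})
        last_id = row["id"]
    # 3. Checkpoint progress; the bookmark layout inside "value" is up to the tap.
    messages.append({"type": "STATE", "value": {"bookmarks": {STREAM: {"last_id": last_id}}}})
    return messages


if __name__ == "__main__":
    # A tap writes one JSON message per line to stdout; a target reads them from stdin.
    rows = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}]
    for message in build_messages(rows):
        sys.stdout.write(json.dumps(message) + "\n")
```

Because each message is a single JSON line, any tap can be piped into any target that follows the same specification.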

Create the Plugin’s Package

  1. As a first step, follow the instructions in the SDK documentation to create a new project from the provided cookiecutter template.
  2. As you are developing, consult the SDK Dev Guide for developer documentation and the Code Samples page to find reusable sample code.
  3. For more in-depth information about the available features of the SDK, consult the Python API Reference documentation.

Cookiecutter

Cookiecutter is a Python tool for quickly scaffolding projects from an existing template.

Add the Plugin to Your Meltano Project

Meltano stores each plugin's configuration in its plugin definition, located in the meltano.yml project file.

To test the plugin as part of your Meltano project, you will need to add your plugin's definition to your project's meltano.yml file.

In your existing meltano.yml:

# ...
plugins:
  extractors:
    # Insert a new entry:
    - name: tap-my-custom-source
      namespace: tap_my_custom_source
      # Installs the plugin from a local path
      # in 'editable' mode (https://pip.pypa.io/en/stable/topics/local-project-installs/#editable-installs).
      # Can point to '.' if it's in the same directory as `meltano.yml`
      pip_url: -e /path/to/tap-my-custom-source
      # Name of custom tap that will be invoked.
      # Can be found in the pyproject.toml of your custom tap, under the CLI scripts declaration
      executable: tap-my-custom-source
      capabilities:
        # For a reference of plugin capabilities, see:
        # https://docs.meltano.com/reference/plugin-definition-syntax#capabilities
        - state
        - catalog
        - discover
      config:
        # Configured values:
        username: me@example.com
        start_date: '2021-01-01'
      settings:
        - name: username
        - name: password
          kind: password
        - name: start_date
          # Default value for the plugin:
          value: '2010-01-01T00:00:00Z'
  loaders:
    # your loaders here:
    - name: target-jsonl
      variant: andyh1203
      pip_url: target-jsonl
    # ...
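The `executable` value must match the console-script name that your tap's package installs. As a rough sketch, assuming Python 3.11+ (for `tomllib`; the third-party `tomli` package offers the same API on older versions), you can list the names declared in a pyproject.toml. The excerpt below is hypothetical, including the `TapMyCustomSource.cli` entry point; Poetry-based projects declare scripts under `[tool.poetry.scripts]`, while PEP 621 projects use `[project.scripts]`, so the sketch checks both.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # older Pythons: `pip install tomli`, same API
    import tomli as tomllib
from typing import List

# Hypothetical excerpt of a tap's pyproject.toml.
PYPROJECT = """
[tool.poetry]
name = "tap-my-custom-source"

[tool.poetry.scripts]
tap-my-custom-source = "tap_my_custom_source.tap:TapMyCustomSource.cli"
"""


def executable_names(pyproject_text: str) -> List[str]:
    """List the console-script names declared in a pyproject.toml."""
    data = tomllib.loads(pyproject_text)
    # Poetry-style scripts table.
    scripts = dict(data.get("tool", {}).get("poetry", {}).get("scripts", {}))
    # PEP 621-style scripts table.
    scripts.update(data.get("project", {}).get("scripts", {}))
    return sorted(scripts)


print(executable_names(PYPROJECT))  # ['tap-my-custom-source']
```

Whichever name this reports is the value to use for `executable` in meltano.yml.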

You can further customize the appearance of your custom extractor in Meltano UI by adding the following options to its definition:

  • `label`
  • `logo_url`
  • `description`

Any time you manually add new plugins to meltano.yml, you will need to rerun the install command:

meltano install

Plugin Settings

When creating a new plugin, you’ll often have to expose some settings to the user so that Meltano can generate the correct configuration to run your plugin.

To properly expose and configure your settings, define each one with the following fields:

  • name: Identifier of this setting in the configuration. The name is the most important field of a setting, as it defines how the value will be passed down to the underlying component. Nesting can be represented using the . separator.

    • foo represents the { foo: VALUE } in the output configuration.
    • foo.a represents the { foo: { a: VALUE } } in the output configuration.
  • kind: The type of value this setting holds (e.g. password for sensitive values or date_iso8601 for dates).
  • value (optional): Define a default value for the plugin’s setting.
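The nesting rule for `name` can be illustrated with a small Python sketch that turns dotted setting names into the nested configuration a plugin receives. It mirrors the behavior described above and is not Meltano's own code; it also does not handle a name that is used both as a value and as a parent (for example `foo` and `foo.a` together).

```python
from typing import Any, Dict


def nest_settings(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn dotted setting names into nested dicts, e.g. "foo.a" -> {"foo": {"a": ...}}."""
    nested: Dict[str, Any] = {}
    for name, value in flat.items():
        *parents, leaf = name.split(".")
        node = nested
        # Walk (and create as needed) one dictionary level per dotted segment.
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested


print(nest_settings({"username": "me@example.com", "foo.a": 1, "foo.b": 2}))
# {'username': 'me@example.com', 'foo': {'a': 1, 'b': 2}}
```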

Passing sensitive setting values

It is best practice not to store sensitive values directly in meltano.yml.

Note that in our example above, we provided values directly for username and start_date but not for password. This avoids storing sensitive credentials in clear text within our source code. Instead, make sure the setting's kind is set to password and then run meltano config <plugin> set password <value>. You can also set the matching environment variable for this setting by running export TAP_MY_CUSTOM_SOURCE_PASSWORD=<value>.
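The environment variable name in the example is the plugin name and the setting name joined and upper-cased. The Python sketch below reproduces that pattern under an assumption: every run of non-alphanumeric characters (such as `-` or the `.` of a nested setting) becomes `_`. This is an illustration, not Meltano's implementation, so check `meltano config <plugin> list` for the names Meltano actually reports for your plugin.

```python
import re


def setting_env_var(plugin_name: str, setting_name: str) -> str:
    """Build an env var name such as TAP_MY_CUSTOM_SOURCE_PASSWORD.

    Assumption: runs of non-alphanumeric characters become "_" and the
    result is upper-cased.
    """
    raw = f"{plugin_name}_{setting_name}"
    return re.sub(r"[^A-Za-z0-9]+", "_", raw).upper()


print(setting_env_var("tap-my-custom-source", "password"))  # TAP_MY_CUSTOM_SOURCE_PASSWORD
```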

You may use any of the following to configure setting values (listed from highest to lowest precedence):

  • Environment variables
  • The config section of the plugin definition in meltano.yml
  • Meltano UI
  • The value (default) in the setting’s definition
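The precedence order amounts to a first-match-wins lookup across layers. The Python sketch below models that idea only and is a simplification of Meltano's behavior. The layer labels, the `s3cret` password, and the assumption that the environment layer has already been mapped from `TAP_MY_CUSTOM_SOURCE_PASSWORD` to the setting name `password` are all hypothetical. The other values mirror the example meltano.yml.

```python
from typing import Any, Mapping, Optional, Sequence, Tuple

_MISSING = object()  # sentinel so that falsy values such as "" still count as set


def resolve_setting(
    name: str, layers: Sequence[Tuple[str, Mapping[str, Any]]]
) -> Tuple[Optional[str], Any]:
    """Return (layer_label, value) from the first layer that defines `name`."""
    for label, values in layers:
        value = values.get(name, _MISSING)
        if value is not _MISSING:
            return label, value
    return None, None


# Hypothetical layers, highest precedence first.
layers = [
    ("environment", {"password": "s3cret"}),  # already mapped from TAP_MY_CUSTOM_SOURCE_PASSWORD
    ("config", {"username": "me@example.com", "start_date": "2021-01-01"}),
    ("ui", {}),
    ("default", {"start_date": "2010-01-01T00:00:00Z"}),
]

print(resolve_setting("start_date", layers))  # ('config', '2021-01-01')
print(resolve_setting("password", layers))    # ('environment', 's3cret')
```

Here the `config` value of start_date shadows the setting's default, which is why the example meltano.yml can declare both.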

Interacting with your new plugin

Now that your plugin is installed and configured, you are ready to interact with it using Meltano.

Use meltano invoke to run your plugin in isolation:

meltano invoke tap-my-custom-source

You can also use the --discover flag to see details about the supported streams:

meltano invoke tap-my-custom-source --discover

Use meltano select to parse your catalog and list all available entities and attributes:

meltano select tap-my-custom-source --list --all
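The catalog printed by --discover is JSON in the Singer catalog format: a `streams` list in which each entry carries a `tap_stream_id` and a JSON Schema. The Python sketch below walks such a catalog and lists `entity.attribute` pairs, roughly the kind of listing meltano select produces. It is not how Meltano implements the command: it reads only top-level properties and ignores selection metadata. The `users` catalog is hypothetical.

```python
import json
from typing import List

# Hypothetical, trimmed-down catalog in the Singer format.
CATALOG_JSON = """
{
  "streams": [
    {
      "tap_stream_id": "users",
      "stream": "users",
      "schema": {
        "type": "object",
        "properties": {"id": {"type": "integer"}, "email": {"type": "string"}}
      },
      "metadata": []
    }
  ]
}
"""


def list_attributes(catalog_json: str) -> List[str]:
    """Return "entity.attribute" pairs for every top-level property in the catalog."""
    catalog = json.loads(catalog_json)
    pairs = []
    for stream in catalog["streams"]:
        properties = stream["schema"].get("properties", {})
        for attribute in properties:
            pairs.append(f"{stream['tap_stream_id']}.{attribute}")
    return pairs


for pair in list_attributes(CATALOG_JSON):
    print(pair)
# users.id
# users.email
```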

Now, run an ELT pipeline using your new tap and the target-jsonl loader defined in the example meltano.yml:

meltano elt tap-my-custom-source target-jsonl
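During the pipeline, the tap's stdout is fed to the loader, which consumes the Singer messages. The Python sketch below shows what a simple JSONL loader, in the spirit of the target-jsonl loader from the example meltano.yml, might do with them. It is not target-jsonl's actual code. The one-file-per-stream naming, the choice to ignore SCHEMA and STATE messages, and the demo messages are all assumptions.

```python
import json
import os
import tempfile
from typing import Dict, Iterable


def load_jsonl(lines: Iterable[str], output_dir: str) -> Dict[str, int]:
    """Append each RECORD payload to <output_dir>/<stream>.jsonl; return per-stream counts."""
    os.makedirs(output_dir, exist_ok=True)
    counts: Dict[str, int] = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        message = json.loads(line)
        # SCHEMA and STATE messages are ignored in this simplified sketch.
        if message.get("type") != "RECORD":
            continue
        stream = message["stream"]
        path = os.path.join(output_dir, f"{stream}.jsonl")
        with open(path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(message["record"]) + "\n")
        counts[stream] = counts.get(stream, 0) + 1
    return counts


# Demo with hypothetical tap output, written to a throwaway directory.
demo_lines = [
    '{"type": "SCHEMA", "stream": "users", "schema": {}, "key_properties": ["id"]}',
    '{"type": "RECORD", "stream": "users", "record": {"id": 1}}',
    '{"type": "RECORD", "stream": "users", "record": {"id": 2}}',
    '{"type": "STATE", "value": {}}',
]
with tempfile.TemporaryDirectory() as tmp:
    demo_counts = load_jsonl(demo_lines, tmp)
    with open(os.path.join(tmp, "users.jsonl"), encoding="utf-8") as handle:
        demo_output = handle.read().splitlines()

print(demo_counts)  # {'users': 2}
print(demo_output)  # ['{"id": 1}', '{"id": 2}']
```

To read real tap output instead, pass `sys.stdin` as `lines` and pipe the tap into the script.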

Publishing to the world

Once you’ve built your tap and it is providing you the data you need, we hope you will consider sharing it with the world! We often find that community members who benefit from your tap may also contribute their own improvements back in the form of pull requests.

Publish to PyPI

If you’ve built your tap using the SDK, you can take advantage of the streamlined poetry publish command to publish your tap directly to PyPI.

  1. Create an account with PyPI.
  2. Create a PyPI API token for use in automated publishing. (Optional but recommended.)
  3. Run poetry publish --build from within your repo to build and push your latest version to the PyPI servers.

Test a pip install

We recommend using pipx to avoid dependency conflicts:

pip3 install pipx
python3 -m pipx ensurepath
python3 -m pipx install tap-my-custom-source

After restarting your terminal, this should also work without the python3 -m prefix:

pipx install tap-my-custom-source

Or if you don’t want to use pipx:

pip3 install tap-my-custom-source
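To confirm that the install actually put the tap's executable on your PATH, you could use a small Python check like the one below. It is a sketch: the default name `tap-my-custom-source` matches this guide's example, and running `--help` assumes the tap's CLI accepts that flag, as SDK-built taps should.

```python
import shutil
import subprocess
from typing import Optional


def check_executable(name: str = "tap-my-custom-source") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


path = check_executable()
if path is None:
    print("Not on PATH yet: restart your terminal or re-run the pipx ensurepath step.")
else:
    # Only checks that the CLI starts; raises CalledProcessError on a non-zero exit.
    subprocess.run([path, "--help"], check=True)
```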

If you have gotten this far… Congrats! You are now a proud Singer tap developer!

Make it discoverable

Once you have your tap published to PyPI, consider making it discoverable for other users of Meltano.

Updates for production use

Once your repo is installable with pip, you can reference it in your meltano.yml file with three quick steps:

  1. Update the pip_url property of your extractor definition to point at the published package instead of your local path, for example pip_url: tap-my-custom-source.
    • Alternatively, you can install the latest version from your git repo directly using this syntax: pip_url: git+https://github.com/myusername/tap-my-custom-source@main
  2. If the executable still points at a local path or wrapper script (for example /path/to/tap-my-custom-source.sh), replace it with just the executable name: tap-my-custom-source.
  3. Rerun meltano install to use the version from pip in place of the local test version.

References