SQL


SQL (Structured Query Language) is a language for querying and manipulating relational databases.

What is it?

The SQL plugin allows you to write your own SQL queries and use them in the Pipeline stack.

Installation

Before installing the SQL plugin, verify that the PDO extension is installed in your environment.

composer require php-etl/sql-plugin:'*'

If you want to use an engine such as PostgreSQL, install ext-php_postgres on the machine. Add these lines to your pipeline to get an explicit error message if the extension for your SQL engine is not installed:

sql-to-csv:
  label: 'SQLite to CSV simple'
  composer:
    require:
      - "ext-php_sqlite"

Usage

Database connection

The SQL plugin uses the PDO extension and relies on its interface to access databases using the dsn, username and password parameters.

The connection is required in every case, whether you are defining an extractor, a loader or a lookup.

composer:
  require:
    - "ext-php_mysql"
pipeline:
  steps:
    sql:
      connection:
        dsn: 'mysql:host=127.0.0.1;port=3306;dbname=kiboko'
        username: username
        password: password
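To make the structure of the dsn value more concrete, here is a small Python sketch that splits a PDO-style DSN into its driver prefix and its key=value parameters. The function name parse_pdo_dsn is hypothetical and is not part of the plugin; the sketch only handles DSNs written as driver:key=value;key=value, like the MySQL one above.

```python
def parse_pdo_dsn(dsn: str) -> tuple[str, dict[str, str]]:
    """Split a PDO-style DSN into its driver prefix and key=value parameters.

    Hypothetical helper for illustration only; it is not part of the plugin.
    """
    driver, _, rest = dsn.partition(":")
    params = {}
    for part in rest.split(";"):
        if not part:
            continue  # tolerate a trailing semicolon
        key, _, value = part.partition("=")
        params[key.strip()] = value.strip()
    return driver, params


driver, params = parse_pdo_dsn("mysql:host=127.0.0.1;port=3306;dbname=kiboko")
print(driver)  # mysql
print(params)  # {'host': '127.0.0.1', 'port': '3306', 'dbname': 'kiboko'}
```

The driver prefix (mysql here) is what determines which PDO driver extension must be installed.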

Options

Persistent

You can pass connection options under the options key. Currently, the only supported option is persistent, which specifies whether the database connection should be persistent.

sql:
  connection:
    # ...
    options:
      persistent: true

Shared

In some cases, you may need to pool connections to your database to avoid opening and closing a new connection for every operation. To do so, set shared to true.

sql:
  connection:
    # ...
    shared: true

Building an extractor

In the configuration of your extractor, write your query in the query option.

sql:
  extractor:
    query: 'SELECT * FROM table1'
  connection:
    dsn: 'mysql:host=127.0.0.1;port=3306;dbname=kiboko'
    username: username
    password: password
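Conceptually, an extractor runs the query once and feeds each result row into the pipeline. The following Python sketch shows that idea with the standard sqlite3 module; the extract function and the sample table contents are assumptions made for illustration, not the plugin's code.

```python
import sqlite3
from typing import Iterator


def extract(conn: sqlite3.Connection, query: str) -> Iterator[dict]:
    """Run the query and yield each row as a column-name -> value mapping."""
    cursor = conn.execute(query)
    columns = [column[0] for column in cursor.description]
    for record in cursor:
        yield dict(zip(columns, record))


# Sample data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "first"), (2, "second")])

extracted = list(extract(conn, "SELECT * FROM table1"))
print(extracted)  # [{'id': 1, 'name': 'first'}, {'id': 2, 'name': 'second'}]
```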

Building a lookup

In some cases, you will need to enrich your input data by joining its columns to columns of a reference dataset; this is called a lookup.

In the configuration of your lookup, write your query in the query option.

The merge option adds the data returned by the query to your dataset, in effect merging your current dataset with the new data.

The map option comes from the FastMap plugin; see its documentation to understand how to use it.

sql:
  lookup:
    query: 'SELECT * FROM table2 WHERE bar = foo'
    merge:
      map:
        - field: '[options]'
          expression: 'lookup["name"]'
  connection:
    dsn: 'mysql:host=127.0.0.1;port=3306;dbname=kiboko'
    username: username
    password: password
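The effect of a lookup with merge can be pictured with the Python sketch below: for each input row, a query fetches a reference row, and its name column is copied into the options field of the input row. The table contents, the lookup_and_merge function and the behaviour when no reference row is found (the row is left unchanged) are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # access result columns by name
conn.execute("CREATE TABLE table2 (bar INTEGER, name TEXT)")
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, "red"), (2, "blue")])


def lookup_and_merge(row: dict) -> dict:
    """Fetch the reference row matching this input row and merge its name in."""
    found = conn.execute(
        "SELECT * FROM table2 WHERE bar = :bar", {"bar": row["bar"]}
    ).fetchone()
    merged = dict(row)
    if found is not None:
        # Mirrors the mapping: field '[options]' <- lookup["name"]
        merged["options"] = found["name"]
    return merged


result = [lookup_and_merge(row) for row in [{"bar": 1}, {"bar": 3}]]
print(result)  # [{'bar': 1, 'options': 'red'}, {'bar': 3}]
```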

Building a ConditionalLookup

The conditional lookup is a lookup that takes conditions into account. Each lookup is executed only when its condition is met.

Its configuration has the same options as the classic lookup, plus an additional condition option.

sql:
  lookup:
    conditional:
      - condition: '@=input["id"] > 2'
        query: 'SELECT * FROM foo WHERE value IS NOT NULL AND id <= :identifier'
        parameters:
          identifier:
            value: '@=3'
        merge:
          map:
            - field: '[options]'
              expression: 'lookup["name"]'
  # ...
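As a rough mental model, a conditional lookup pairs each query with a predicate on the input row and skips the query when the predicate is false. The Python sketch below illustrates this with the condition and query from the example above; the sample data, the conditional_lookup function and the choice to evaluate every alternative in order are assumptions, not a description of the plugin's internals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE foo (id INTEGER NOT NULL, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)", [(1, "alpha", "x"), (5, "beta", "y")]
)

# Each alternative: (condition on the input row, query, bound parameters).
alternatives = [
    (
        lambda row: row["id"] > 2,
        "SELECT * FROM foo WHERE value IS NOT NULL AND id <= :identifier",
        {"identifier": 3},
    ),
]


def conditional_lookup(row: dict) -> dict:
    """Run only the lookups whose condition holds for this row."""
    merged = dict(row)
    for condition, query, parameters in alternatives:
        if not condition(row):
            continue  # condition not met: this lookup is skipped
        found = conn.execute(query, parameters).fetchone()
        if found is not None:
            merged["options"] = found["name"]
    return merged


skipped = conditional_lookup({"id": 1})
enriched = conditional_lookup({"id": 4})
print(skipped)   # {'id': 1}
print(enriched)  # {'id': 4, 'options': 'alpha'}
```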

Building a loader

In the configuration of your loader, write your query in the query option.

sql:
  loader:
    query: 'INSERT INTO table1 VALUES (bar, foo, barfoo)'
  connection:
    dsn: 'mysql:host=127.0.0.1;port=3306;dbname=kiboko'
    username: username
    password: password

Building a ConditionalLoader

The conditional loader is a loader that takes conditions into account. Each loader is executed only when its condition is met.

Its configuration has the same options as the classic loader, plus an additional condition option.

sql:
  loader:
    conditional:
      - condition: '@=input["id"] > 2'
        query: 'SELECT * FROM foo WHERE value IS NOT NULL AND id <= :identifier'
        parameters:
          identifier:
            value: '@=3'
  # ...

Advanced Usage

Using params in your queries

The SQL plugin lets you write your queries with parameters.

If you write a prepared statement using named parameters (:param), your parameter’s key in the configuration is the name of the parameter without the leading colon:

sql:
  loader:
    query: 'INSERT INTO table1 VALUES (:value1, :value2, :value3)'
    parameters:
      value1:
        value: '@=input["value1"]'
      value2:
        value: '@=input["value2"]'
      value3:
        value: '@=input["value3"]'
    # ... 

If you write a prepared statement using question-mark placeholders (?), your parameter’s key in the configuration is its position (starting from 1):

sql:
  loader:
    query: 'INSERT INTO table1 VALUES (?, ?, ?)'
    parameters:
      1:
        value: '@=input["value1"]'
      2:
        value: '@=input["value2"]'
      3:
        value: '@=input["value3"]'
  # ... 
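The difference between the two placeholder styles can be tried out with any database driver that supports prepared statements. The Python sketch below uses the standard sqlite3 module, which accepts both :name and ? placeholders; the table and the input row are invented for the illustration, and the binding is done by sqlite3 rather than by the plugin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a TEXT, b TEXT, c TEXT)")

# Stand-in for one pipeline input row.
row = {"value1": "x", "value2": "y", "value3": "z"}

# Named placeholders: values are matched by name, without the leading colon.
conn.execute("INSERT INTO table1 VALUES (:value1, :value2, :value3)", row)

# Positional placeholders: values are matched by their order in the query.
# Note that a Python tuple is 0-indexed, whereas the plugin configuration
# numbers positions from 1.
conn.execute(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    (row["value1"], row["value2"], row["value3"]),
)

stored = conn.execute("SELECT a, b, c FROM table1").fetchall()
print(stored)  # [('x', 'y', 'z'), ('x', 'y', 'z')]
```

In both cases the values travel separately from the SQL text, so they are never interpolated into the query string.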

Using an unknown number of parameters

In some cases, you may not know in advance how many parameters you will need to bind, for example if you are searching using an IN clause with many values.

Using from instead of value binds as many parameters as there are values in the path.

Then use the expression inSql(path, parameter_name) to generate the matching placeholders in the query.

sql:
  loader:
    query: '@="SELECT * FROM category WHERE id " ~ inSql(input["codes_list"], "identifier")'
    parameters:
      identifier:
        from: '@=input["codes_list"]'
  # ...

If at runtime there are 4 values under [codes_list], this would be equivalent to writing:

sql:
  loader:
    query: 'SELECT * FROM category WHERE id IN (:identifier_0, :identifier_1, :identifier_2, :identifier_3)'
    parameters:
      identifier_0:
        value: '@=input["codes_list"][0]'
      identifier_1:
        value: '@=input["codes_list"][1]'
      identifier_2:
        value: '@=input["codes_list"][2]'
      identifier_3:
        value: '@=input["codes_list"][3]'
  # ...
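The expansion described above can be reproduced in a few lines of Python. The in_sql function below is a hypothetical re-implementation written for illustration, based only on the equivalence shown above; it is not the plugin's inSql code, and its handling of an empty list (raising an error) is an assumption.

```python
import sqlite3


def in_sql(values: list, name: str) -> tuple[str, dict]:
    """Build an IN (...) fragment with one named placeholder per value.

    Returns the SQL fragment and the parameters to bind with it.
    """
    if not values:
        raise ValueError("an IN clause needs at least one value")
    fragment = "IN (" + ", ".join(f":{name}_{i}" for i in range(len(values))) + ")"
    parameters = {f"{name}_{i}": value for i, value in enumerate(values)}
    return fragment, parameters


codes = [10, 20, 30, 40]
fragment, parameters = in_sql(codes, "identifier")
query = "SELECT * FROM category WHERE id " + fragment
print(query)
# SELECT * FROM category WHERE id IN (:identifier_0, :identifier_1, :identifier_2, :identifier_3)

# Check the generated query against sample data invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (id INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO category VALUES (?, ?)", [(10, "a"), (25, "b"), (40, "c")]
)
matches = conn.execute(query + " ORDER BY id", parameters).fetchall()
print(matches)  # [(10, 'a'), (40, 'c')]
```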

Creating before and after queries

In some cases, you may need to run extra queries to prepare the database before your pipeline runs, or to clean it up afterwards.

Before queries

Before queries will be executed before performing the query written in the configuration. Often, these are queries that set up the database.

sql:
  before:
    queries:
      - 'CREATE TABLE foo (id INTEGER NOT NULL, value VARCHAR(255) NOT NULL)'
      - 'INSERT INTO foo (id, value) VALUES (1, "Lorem ipsum dolor")'
      - 'INSERT INTO foo (id, value) VALUES (2, "Sit amet consecutir")'
  # ...

After queries

After queries will be executed after performing the query written in the configuration. Often, these are queries that clean up the database.

sql:
  after:
    queries:
      - 'DROP TABLE foo'
  # ...
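The order of execution can be summarised with the Python sketch below: the before queries run first, then the main query, then the after queries. It uses the standard sqlite3 module and the queries from the examples above (with single-quoted string literals, as SQLite expects); the run_with_hooks function is hypothetical, and how the plugin behaves when the main query fails is not covered here.

```python
import sqlite3


def run_with_hooks(conn, before: list, query: str, after: list) -> list:
    """Run the before queries, then the main query, then the after queries."""
    for statement in before:
        conn.execute(statement)
    rows = conn.execute(query).fetchall()
    for statement in after:
        conn.execute(statement)
    return rows


conn = sqlite3.connect(":memory:")
rows = run_with_hooks(
    conn,
    before=[
        "CREATE TABLE foo (id INTEGER NOT NULL, value VARCHAR(255) NOT NULL)",
        "INSERT INTO foo (id, value) VALUES (1, 'Lorem ipsum dolor')",
        "INSERT INTO foo (id, value) VALUES (2, 'Sit amet consecutir')",
    ],
    query="SELECT id, value FROM foo ORDER BY id",
    after=["DROP TABLE foo"],
)
print(rows)  # [(1, 'Lorem ipsum dolor'), (2, 'Sit amet consecutir')]

# The after query removed the table again.
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'foo'"
).fetchall()
print(remaining)  # []
```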