Spreadsheet

FEATURE STATE: Gyroscops 0.1 [alpha]

XLSX is the file format used by Microsoft Excel spreadsheets.

ODS is the OpenDocument spreadsheet file format used by Libreoffice and OpenOffice.

What is it for?

The Spreadsheet plugin aims at integrating spreadsheet file formats (XLSX, ODS and CSV) into the Pipeline stack.

Installation

In a Satellite project, add the spreadsheet plugin this way:

composer require php-etl/spreadsheet-plugin

Usage

Building an extractor

To build an extractor, you need to specify the path of your file, the type of extractor to build, and the name of the sheet to read.

The different types of extractor supported are:

  • excel
  • open_document
  • csv



spreadsheet:
  extractor:
    file_path: 'input.xlsx'
    excel: 
      sheet: 'sheet2'



spreadsheet:
  extractor:
    file_path: 'input.ods'
    open_document: 
      sheet: 'sheet2'



spreadsheet:
  extractor:
    file_path: 'input.csv'
    csv: 
      delimiter: ','
      enclosure: '"'

Building a loader

To build a loader, you must specify the path of your file, the type of loader to build and the name of the sheet to write data into.

The different types of loader supported are:

  • excel
  • open_document
  • csv



spreadsheet:
  loader:
    file_path: 'output.xlsx'
    excel:
      sheet: 'sheet2'



spreadsheet:
  loader:
    file_path: 'output.ods'
    open_document: 
      sheet: 'sheet2'



spreadsheet:
  loader:
    file_path: 'output.csv'
    csv: 
      delimiter: ','
      enclosure: '"'

Advanced usage

Skip one or more lines

Your file may have a header of one or more lines, you must use the skip_lines option to get your pipeline to start extracting from the right row.



spreadsheet:
  extractor: 
    excel:
      # ...
      skip_lines: 2




spreadsheet:
  extractor: 
    open_document:
      # ...
      skip_lines: 2



spreadsheet:
  extractor: 
    csv:
      # ...
      skip_lines: 2

Note: This option is only available when building extractors

Splitting into several files

To limit the number of lines you will write into the output files, you can specify the max_lines option.



spreadsheet:
  loader:
    excel:
      # ...
      max_lines: 20



spreadsheet:
  loader:
    open_document:
      # ...
      max_lines: 20



spreadsheet:
  loader:
    csv:
      # ...
      max_lines: 20

Note: this option is only available for loaders