===============================================
SSC (SAMPO Session Configuration) Specification
===============================================

.. contents:: Contents
    :local:

Overview
========
A SAMPO session configuration (SSC) describes a session which contains multiple run configurations.

**Example**::

    learn_1:
        type: learn
        spd: example.spd
        data_sources:
            dl1:
                path: learn.csv
                attr_schema: learn.asd

    predict_1:
        type: predict
        data_sources:
            dl1:
                path: predict.csv
                attr_schema: learn.asd
        model_process: learn_1

|

Format
======
An SSC can be prepared only as a text file (usable in SAMPO Command).

SSC File
--------
The SSC file must fit the following constraints:

+----------------+------------------------------------------------------+
| Property       | Constraint                                           |
+================+======================================================+
| File name      | *ASCII characters*.ssc                               |
+----------------+------------------------------------------------------+
| Character code | | Python 3: UTF-8 (ASCII + Japanese Characters)      |
|                | | Python 2: ASCII                                    |
+----------------+------------------------------------------------------+
| Newline code   | CRLF (Recommended), LF (Not Recommended)             |
+----------------+------------------------------------------------------+

|

SSC Syntax
==========
The basic syntax of the SSC file is YAML.

* YAML Version 1.2
    https://yaml.org/spec/1.2/spec.html

A simple example::

    learn_1:
        type: learn
        spd: example.spd
        data_sources:
            dl1:
                path: learn.csv
                attr_schema: learn.asd

    predict_1:
        type: predict
        data_sources:
            dl1:
                path: predict.csv
                attr_schema: learn.asd
        model_process: learn_1

SSC supports the Jinja2 template syntax.
You can define variables and use control structures such as **if...else** and **for**.

* Jinja2
    http://jinja.pocoo.org

With Jinja2 template::

    {%- set num = 3 %}
    {%- for i in range(num) %}

    ex_learn_{{ i }}:
        type: learn
        spd: ex_{{ i }}.spd
        data_sources:
            dl1:
                path: learn.csv
                attr_schema: learn.asd

    ex_predict_{{ i }}:
        type: predict
        data_sources:
            dl1:
                path: predict.csv
                attr_schema: learn.asd
        model_process: ex_learn_{{ i }}

    {%- endfor %}
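
When rendered, the template above expands to plain YAML.
For example, the first iteration (``i = 0``) produces::

    ex_learn_0:
        type: learn
        spd: ex_0.spd
        data_sources:
            dl1:
                path: learn.csv
                attr_schema: learn.asd

    ex_predict_0:
        type: predict
        data_sources:
            dl1:
                path: predict.csv
                attr_schema: learn.asd
        model_process: ex_learn_0

The remaining iterations generate ``ex_learn_1``/``ex_predict_1`` and ``ex_learn_2``/``ex_predict_2`` in the same way.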

SSC Parameters
==============
The parameter combinations in an SSC differ depending on the type of data source used.

For CSV file data_sources::

    <process_name>:
        type: <learn|predict>
        data_sources:
            <cid>:
                path: <csv_file_path>
                attr_schema: <asd_file_path>
                filters:
                    - <filter_name>
                    - ...

            ...
        spd: <spd_file>
        model_process: <process_name>

|

For database data_sources::

    <process_name>:
        type: <learn|predict>
        data_sources:
            <cid>:
                sql: <sql_query_or_database_table_or_view_name>
                connection_uri: <connection_uri>
                attr_schema: <asd_file_path>
                filters:
                    - <filter_name>
                    - ...

            ...
        spd: <spd_file>
        model_process: <process_name>
    ...

|

For ARFF file data_sources::

    <process_name>:
        type: <learn|predict>
        data_sources:
            <cid>:
                <path|data_source>: <arff_file_path>
                filters:
                    - <filter_name>
                    - ...

            ...
        model_process: <process_name>

.. warning::

   The ARFF file data_sources format is deprecated. Use the CSV file data_sources format instead.

Parameters common to all data source patterns
---------------------------------------------

* <process_name>
    * Only alphanumeric characters and underscores can be used.
    * The first character must be an alphabetic character.

* type
    * Specifies the process type: **learn** or **predict**.

* spd (learning process only)
    * Specifies the SPD file used for the learning process.

* model_process (prediction process only)
    * Specifies the model to use for prediction, referencing the **process_name** of the **learn** process that produced it.
    * Required only when the process type is **predict**.
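
The process name constraint above can be checked with a short regular expression.
A minimal sketch in Python (the ``PROCESS_NAME`` pattern merely restates the rules above; it is not SAMPO's actual validation code):

```python
import re

# First character alphabetic; remaining characters alphanumeric or "_",
# per the <process_name> constraint above.
PROCESS_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")

for name in ("learn_1", "ex_predict_2", "1_learn", "learn-1"):
    print(name, "valid" if PROCESS_NAME.match(name) else "invalid")
# learn_1 valid / ex_predict_2 valid / 1_learn invalid / learn-1 invalid
```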

Parameters specific to each data source
---------------------------------------

* data_sources
    Defines the data source for each data loader component.

    * CSV File
        * path: Specifies the input file path.
        * attr_schema: Specifies an ASD file path.

    * SQL Query, database table, or view
        * sql: Specifies the input SELECT SQL query, database table name, or view name.
          Table or column names that contain spaces must be enclosed in double quotation marks in SQL queries.
          When working with time-series data, it is recommended to use ORDER BY in your query.

        * table_name: Same as sql. If specified together with sql, this parameter's value is ignored.

        .. warning::

           table_name is deprecated. sql should be used instead of table_name.

        * connection_uri: Specifies a database connection URI in the following format:

          ::

            scheme://[user[:password]@][host][:port][/database]

          * scheme: postgresql is supported.

          * PostgreSQL's password file (.pgpass) can be used to hold parts of
            the database connection URI.
            An example URI that relies on .pgpass::

               postgresql://aapfuser@dbhost:5432/testdb

        * attr_schema: Specifies an ASD file path.

    * ARFF File
        * path: Specifies the input file path.
        * data_source: Same as path. If specified together with path, this parameter's value is ignored.

        .. warning::

           * The ARFF file format is deprecated. Use the CSV file data_sources format instead.
           * data_source is deprecated. Use path instead.
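
Because the connection URI follows the generic ``scheme://user:password@host:port/database`` layout, its components can be inspected with Python's standard ``urllib.parse``. This is shown only to illustrate the format; it is not how SAMPO itself handles the URI:

```python
from urllib.parse import urlsplit

# Decompose a connection URI of the form described above.
uri = urlsplit("postgresql://aapfuser:aapfpass@localhost:5432/testdb")

print(uri.scheme)            # postgresql
print(uri.username)          # aapfuser
print(uri.password)          # aapfpass
print(uri.hostname)          # localhost
print(uri.port)              # 5432
print(uri.path.lstrip("/"))  # testdb  (the database name)
```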

Filters
-------
**data_sources** supports the following filters, which can select samples from the data:

.. method:: slice(start=0, stop, step=1)
    :noindex:

Slices the data from ``start`` to ``stop`` at intervals of ``step``.
If only one argument is given, it is treated as ``stop`` and the remaining parameters take their default values.
If two arguments are given, they are treated as ``start`` and ``stop``, and ``step`` takes its default value.
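
The argument rules mirror Python's own slice semantics. A minimal sketch in Python (``ssc_slice`` is a hypothetical helper used only to illustrate the filter's behavior, not a SAMPO API):

```python
def ssc_slice(rows, *args):
    """Sketch of the SSC slice filter: slice(start=0, stop, step=1).

    One argument    -> stop (start=0, step=1).
    Two arguments   -> start, stop (step=1).
    Three arguments -> start, stop, step.
    """
    if len(args) == 1:
        start, stop, step = 0, args[0], 1
    elif len(args) == 2:
        (start, stop), step = args, 1
    else:
        start, stop, step = args
    return rows[start:stop:step]

samples = list(range(10))
print(ssc_slice(samples, 4))         # -> [0, 1, 2, 3]
print(ssc_slice(samples, 2, 8))      # -> [2, 3, 4, 5, 6, 7]
print(ssc_slice(samples, 0, 10, 3))  # -> [0, 3, 6, 9]
```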

|

.. method:: k_split(k, pos=0, complementary=False)
    :noindex:

Splits the data into ``k`` parts and returns the ``pos``-th part.
If ``complementary`` is **True**, the complement of the specified part is
returned instead of the part itself.
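
A minimal sketch of the semantics in Python, assuming the data is split into contiguous, near-equal parts (``k_split`` below is illustrative only; the exact partitioning SAMPO uses is not specified here):

```python
def k_split(rows, k, pos=0, complementary=False):
    """Sketch of the SSC k_split filter.

    Splits rows into k contiguous parts and returns the pos-th part,
    or the complement of that part when complementary is True.
    """
    n = len(rows)
    bounds = [n * i // k for i in range(k + 1)]  # part boundaries
    if complementary:
        return rows[:bounds[pos]] + rows[bounds[pos + 1]:]
    return rows[bounds[pos]:bounds[pos + 1]]

data = list(range(9))
print(k_split(data, 3, 1))        # middle part     -> [3, 4, 5]
print(k_split(data, 3, 1, True))  # its complement  -> [0, 1, 2, 6, 7, 8]
```

This pairing is what the cross-validation example below relies on: the learn process takes the complement of part ``i`` while the predict process takes part ``i`` itself.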

|

Examples
========
Three different learn/predict pairs::

    {%- for i in range(3) %}

    ex_learn_{{ i }}:
        type: learn
        spd: ex_{{ i }}.spd
        data_sources:
            dl1:
                path: learn.csv
                attr_schema: data.asd

    ex_predict_{{ i }}:
        type: predict
        data_sources:
            dl1:
                path: predict.csv
                attr_schema: data.asd
        model_process: ex_learn_{{ i }}

    {%- endfor %}

|

Three different learn/predict pairs with Database table::

    {%- for i in range(3) %}

    ex_learn_{{ i }}:
        type: learn
        spd: ex_{{ i }}.spd
        data_sources:
            dl1:
                sql: SELECT * FROM table_learn ORDER BY _datetime ASC
                connection_uri: postgresql://aapfuser:aapfpass@localhost:5432/testdb
                attr_schema: table_learn.asd

    ex_predict_{{ i }}:
        type: predict
        data_sources:
            dl1:
                sql: SELECT * FROM table_predict ORDER BY _datetime ASC
                connection_uri: postgresql://aapfuser:aapfpass@localhost:5432/testdb
                attr_schema: table_predict.asd
        model_process: ex_learn_{{ i }}

    {%- endfor %}

|

Cross validation::

    {%- set num = 3 %}
    {%- for i in range(num) %}

    cv1_learn_{{ i }}:
        type: learn
        spd: cv.spd
        data_sources:
            dl1:
                path: cv.csv
                attr_schema: cv.asd
                filters:
                    - k_split({{ num }}, {{ i }}, True)

    cv1_predict_{{ i }}:
        type: predict
        data_sources:
            dl1:
                path: cv.csv
                attr_schema: cv.asd
                filters:
                    - k_split({{ num }}, {{ i }}, False)
        model_process: cv1_learn_{{ i }}

    {%- endfor %}
