SSC (SAMPO Session Configuration) Specification

Overview

A SAMPO session configuration (SSC) describes a session which contains multiple run configurations.

Example:

learn_1:
    type: learn
    spd: example.spd
    data_sources:
        dl1:
            path: learn.csv
            attr_schema: learn.asd

predict_1:
    type: predict
    data_sources:
        dl1:
            path: predict.csv
            attr_schema: learn.asd
    model_process: learn_1

Format

An SSC can only be provided as a text file, for use with SAMPO Command.

SSC File

The SSC file must fit the following constraints:

Property          Constraint

File name         <ASCII characters>.ssc

Character code    Python 3: UTF-8 (ASCII + Japanese characters)
                  Python 2: ASCII

Newline code      CRLF (recommended), LF (not recommended)
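
The file name and character code constraints can be checked before a file is passed to SAMPO. The following is a minimal Python 3 sketch and is not part of SAMPO: both function names are hypothetical, and the file name rule is read here as "a non-empty ASCII name followed by the .ssc extension", which is an assumption.

```python
def is_valid_ssc_filename(name):
    """Return True if name is ASCII-only and ends with the .ssc extension."""
    try:
        name.encode("ascii")
    except UnicodeEncodeError:
        return False
    stem, dot, ext = name.rpartition(".")
    # Assumption: something must precede the extension, so ".ssc" alone is rejected.
    return bool(stem) and dot == "." and ext == "ssc"


def is_valid_ssc_content(data, python_major=3):
    """Return True if the raw bytes decode with the character code for that Python version."""
    encoding = "utf-8" if python_major >= 3 else "ascii"
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True
```

Japanese text in a comment or value passes the content check for Python 3 but not for Python 2, which matches the character code row above.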


SSC Syntax

The basic syntax of the SSC file is YAML.

A simple example:

learn_1:
    type: learn
    spd: example.spd
    data_sources:
        dl1:
            path: learn.csv
            attr_schema: learn.asd

predict_1:
    type: predict
    data_sources:
        dl1:
            path: predict.csv
            attr_schema: learn.asd
    model_process: learn_1

SSC supports Jinja2 template syntax, so you can define variables and use control structures such as if…else and for.

With Jinja2 template:

{%- set num = 3 %}
{%- for i in range(num) %}

ex_learn_{{ i }}:
    type: learn
    spd: ex_{{ i }}.spd
    data_sources:
        dl1:
            path: learn.csv
            attr_schema: learn.asd

ex_predict_{{ i }}:
    type: predict
    data_sources:
        dl1:
            path: predict.csv
            attr_schema: learn.asd
    model_process: ex_learn_{{ i }}

{%- endfor %}
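
For readers less familiar with Jinja2, the sketch below builds in plain Python the mapping that the template above is expected to yield once it has been rendered and parsed as YAML. It only illustrates the expansion; expand_pairs is a hypothetical helper, not a SAMPO or Jinja2 function.

```python
def expand_pairs(num):
    """Build the learn/predict pairs that the loop produces for i = 0 .. num - 1."""
    session = {}
    for i in range(num):
        learn_name = "ex_learn_%d" % i
        session[learn_name] = {
            "type": "learn",
            "spd": "ex_%d.spd" % i,
            "data_sources": {
                "dl1": {"path": "learn.csv", "attr_schema": "learn.asd"},
            },
        }
        session["ex_predict_%d" % i] = {
            "type": "predict",
            "data_sources": {
                "dl1": {"path": "predict.csv", "attr_schema": "learn.asd"},
            },
            # Each predict process refers to the learn process of the same iteration.
            "model_process": learn_name,
        }
    return session


session = expand_pairs(3)
print(sorted(session))
```

With num set to 3, the session contains six processes: ex_learn_0 to ex_learn_2 and ex_predict_0 to ex_predict_2.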

SSC Parameters

The combination of parameters in an SSC depends on the type of data source used.

For CSV file data_sources:

<process_name>:
    type: <learn|predict>
    data_sources:
        <cid>:
            path: <csv_file_path>
            attr_schema: <asd_file_path>
            filters:
                - <filter_name>
                - ...

        ...
    spd: <spd_file>
    model_process: <process_name>

For database data_sources:

<process_name>:
    type: <learn|predict>
    data_sources:
        <cid>:
            sql: <sql_query_or_database_table_or_view_name>
            connection_uri: <connection_uri>
            attr_schema: <asd_file_path>
            filters:
                - <filter_name>
                - ...

        ...
    spd: <spd_file>
    model_process: <process_name>
...

For ARFF file data_sources:

<process_name>:
    type: <learn|predict>
    data_sources:
        <cid>:
            <path|data_source>: <arff_file_path>
            filters:
                - <filter_name>
                - ...

        ...
    model_process: <process_name>

Warning

The ARFF file data_sources format is deprecated. Use the CSV file data_sources format instead.

Parameters common to all data source patterns

  • <process_name>
    • Only alphanumeric characters and underscores can be used.

    • The first character must be an alphabetic character.

  • type
    • Specifies the process type: learn or predict

  • spd (learning process only)
    • Specifies the SPD file for the learning process.

  • model_process (prediction process only)
    • Specifies the model to use for prediction, which must have been learned in a learn process.

    • The value is the process name of that learn process.

    • Specify this only when the process type is predict.
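
The naming and type rules above can be expressed as a small check on an already parsed process entry. This is a minimal Python 3 sketch under stated assumptions, not SAMPO's own validation: check_process is a hypothetical helper, "alphanumeric" is taken to mean ASCII letters and digits, and model_process is treated as mandatory for predict processes. No rule is enforced for spd, because the ARFF parameter layout omits it.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
PROCESS_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_process(name, config):
    """Return a list of problems found in one parsed process entry."""
    problems = []
    if not PROCESS_NAME_RE.fullmatch(name):
        problems.append("invalid process name: %r" % name)
    process_type = config.get("type")
    if process_type not in ("learn", "predict"):
        problems.append("type must be learn or predict")
    # Assumption: a predict process cannot run without a model_process.
    if process_type == "predict" and "model_process" not in config:
        problems.append("predict process needs model_process")
    return problems


print(check_process("learn_1", {"type": "learn", "spd": "example.spd"}))
print(check_process("1st_run", {"type": "predict"}))
```

The first call reports no problems; the second reports both the name that starts with a digit and the missing model_process.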

Parameters specific to each data source

  • data_sources

    Defines the data source for each data loader component.

    • CSV File
      • path: Specifies the input file path.

      • attr_schema: Specifies an ASD file path.

    • SQL Query, database table, or view
      • sql: Specifies the input SELECT query, or the name of a database table or view. Table or column names that contain spaces must be enclosed in double quotation marks in SQL queries. For time-series data, using ORDER BY in the query is recommended so that the row order is defined.

      • table_name: Same as sql. If specified together with sql, this parameter’s value is ignored.

      Warning

      table_name is deprecated. sql should be used instead of table_name.

      • connection_uri: Specifies a database connection URI in the following format:

        schema://[user[:password]@][host][:port][/database]
        
        • schema: The URI scheme. Only postgresql is supported.

        • The PostgreSQL password file (.pgpass) can hold parts of the connection information, such as the password, so that they can be left out of the URI. An example that relies on .pgpass:

          postgresql://aapfuser@dbhost:5432/testdb
          
      • attr_schema: Specifies an ASD file path.

    • ARFF File
      • path: Specifies the input file path.

      • data_source: Same as path. If specified together with path, this parameter’s value is ignored.

      Warning

      • The ARFF file format is deprecated. Use CSV file data_sources instead.

      • data_source is deprecated. Use path instead.
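
To see how the parts of a connection_uri map onto the format schema://[user[:password]@][host][:port][/database], the URI can be split with Python's standard urllib.parse module. This sketch only illustrates the URI layout; it does not show how SAMPO itself parses the value, and describe_connection_uri is a hypothetical helper.

```python
from urllib.parse import urlsplit


def describe_connection_uri(uri):
    """Split a connection URI into the components named in the format above."""
    parts = urlsplit(uri)
    return {
        "schema": parts.scheme,
        "user": parts.username,
        "password": parts.password,  # None when the password is left to .pgpass
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/") or None,
    }


info = describe_connection_uri("postgresql://aapfuser@dbhost:5432/testdb")
print(info)
```

For the .pgpass example above, password comes back as None because the URI does not carry it.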

Filters

data_sources supports the following filters, which can select samples from the data:

slice(start=0, stop, step=1)

Slices the data from start to stop at intervals of step. If only one argument is given, it is treated as stop and the other parameters take their default values. If two arguments are given, they are treated as start and stop, and step takes its default value.
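
The argument handling described above follows the same convention as Python's built-in slice. The sketch below applies that built-in to a list of row numbers to show the three call forms. That the filter selects rows exactly as Python slicing does, including stop being exclusive, is an assumption; apply_slice is a hypothetical helper.

```python
def apply_slice(rows, *args):
    """Select rows the way Python's built-in slice(*args) would."""
    return rows[slice(*args)]


rows = list(range(10))

print(apply_slice(rows, 3))         # one argument: stop -> [0, 1, 2]
print(apply_slice(rows, 2, 5))      # two arguments: start, stop -> [2, 3, 4]
print(apply_slice(rows, 0, 10, 3))  # three arguments: start, stop, step -> [0, 3, 6, 9]
```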


k_split(k, pos=0, complementary=False)

Splits the data into k parts and returns the pos-th part, where pos is zero-based (0 to k-1). If complementary is True, returns the complementary set of the specified part (all the other parts) instead of the part itself.
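
The relationship between a part and its complementary set can be sketched in Python as follows. This specification does not state how rows are assigned to parts, so the contiguous, near-equal blocks used here are an assumption made only for illustration, and this k_split function is a stand-in, not SAMPO's implementation.

```python
def k_split(rows, k, pos=0, complementary=False):
    """Return the pos-th of k parts of rows, or every row outside that part."""
    n = len(rows)
    # Assumption: parts are contiguous blocks whose sizes differ by at most one row.
    bounds = [n * i // k for i in range(k + 1)]
    start, stop = bounds[pos], bounds[pos + 1]
    if complementary:
        return rows[:start] + rows[stop:]
    return rows[start:stop]


rows = list(range(10))
for pos in range(3):
    held_out = k_split(rows, 3, pos)
    training = k_split(rows, 3, pos, True)
    # A part and its complementary set never overlap and together cover all rows.
    print(pos, held_out, training)
```

Whatever the actual partitioning rule is, using complementary=True for learning and complementary=False for prediction with the same k and pos keeps the two data sets disjoint.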


Examples

Three different learn/predict pairs:

{%- for i in range(3) %}

ex_learn_{{ i }}:
    type: learn
    spd: ex_{{ i }}.spd
    data_sources:
        dl1:
            path: learn.csv
            attr_schema: data.asd

ex_predict_{{ i }}:
    type: predict
    data_sources:
        dl1:
            path: predict.csv
            attr_schema: data.asd
    model_process: ex_learn_{{ i }}

{%- endfor %}

Three different learn/predict pairs using database tables:

{%- for i in range(3) %}

ex_learn_{{ i }}:
    type: learn
    spd: ex_{{ i }}.spd
    data_sources:
        dl1:
            sql: SELECT * FROM table_learn ORDER BY _datetime ASC
            connection_uri: postgresql://aapfuser:aapfpass@localhost:5432/testdb
            attr_schema: table_learn.asd

ex_predict_{{ i }}:
    type: predict
    data_sources:
        dl1:
            sql: SELECT * FROM table_predict ORDER BY _datetime ASC
            connection_uri: postgresql://aapfuser:aapfpass@localhost:5432/testdb
            attr_schema: table_predict.asd
    model_process: ex_learn_{{ i }}

{%- endfor %}

Cross validation:

{%- set num = 3 %}
{%- for i in range(num) %}

cv1_learn_{{ i }}:
    type: learn
    spd: cv.spd
    data_sources:
        dl1:
            path: cv.csv
            attr_schema: cv.asd
            filters:
                - k_split({{ num }}, {{ i }}, True)

cv1_predict_{{ i }}:
    type: predict
    data_sources:
        dl1:
            path: cv.csv
            attr_schema: cv.asd
            filters:
                - k_split({{ num }}, {{ i }}, False)
    model_process: cv1_learn_{{ i }}

{%- endfor %}