SSC (SAMPO Session Configuration) Specification¶
Overview¶
A SAMPO session configuration (SSC) describes a session which contains multiple run configurations.
Example:
learn_1:
  type: learn
  spd: example.spd
  data_sources:
    dl1:
      path: learn.csv
      attr_schema: learn.asd
predict_1:
  type: predict
  data_sources:
    dl1:
      path: predict.csv
      attr_schema: learn.asd
  model_process: learn_1
SSC Syntax¶
The basic syntax of the SSC file is YAML.
- YAML Version 1.2
A simple example:
learn_1:
  type: learn
  spd: example.spd
  data_sources:
    dl1:
      path: learn.csv
      attr_schema: learn.asd
predict_1:
  type: predict
  data_sources:
    dl1:
      path: predict.csv
      attr_schema: learn.asd
  model_process: learn_1
SSC supports Jinja2 template syntax. You can define variables and use control structures such as if…else and for.
- Jinja2
With Jinja2 template:
{%- set num = 3 %}
{%- for i in range(num) %}
ex_learn_{{ i }}:
  type: learn
  spd: ex_{{ i }}.spd
  data_sources:
    dl1:
      path: learn.csv
      attr_schema: learn.asd
ex_predict_{{ i }}:
  type: predict
  data_sources:
    dl1:
      path: predict.csv
      attr_schema: learn.asd
  model_process: ex_learn_{{ i }}
{%- endfor %}
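For reference, rendering the template above with num = 3 produces plain YAML; the i = 0 iteration expands to:

```yaml
ex_learn_0:
  type: learn
  spd: ex_0.spd
  data_sources:
    dl1:
      path: learn.csv
      attr_schema: learn.asd
ex_predict_0:
  type: predict
  data_sources:
    dl1:
      path: predict.csv
      attr_schema: learn.asd
  model_process: ex_learn_0
```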
SSC Parameters¶
The required parameter combination in an SSC differs depending on the type of data source being used.
For CSV file data_sources:
<process_name>:
  type: <learn|predict>
  data_sources:
    <cid>:
      path: <csv_file_path>
      attr_schema: <asd_file_path>
      filters:
        - <filter_name>
        - ...
    ...
  spd: <spd_file>
  model_process: <process_name>
For database data_sources:
<process_name>:
  type: <learn|predict>
  data_sources:
    <cid>:
      sql: <sql_query_or_database_table_or_view_name>
      connection_uri: <connection_uri>
      attr_schema: <asd_file_path>
      filters:
        - <filter_name>
        - ...
    ...
  spd: <spd_file>
  model_process: <process_name>
...
For ARFF file data_sources:
<process_name>:
  type: <learn|predict>
  data_sources:
    <cid>:
      <path|data_source>: <arff_file_path>
      filters:
        - <filter_name>
        - ...
    ...
  model_process: <process_name>
Warning
The ARFF file data_sources format is deprecated. Use the CSV file data_sources format instead.
Parameters common to all data source patterns¶
- <process_name>
Only alphanumeric characters and underscores can be used.
The first character must be an alphabetic character.
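This naming rule amounts to a simple regular expression; a minimal sketch in Python (is_valid_process_name is a hypothetical helper, not part of SAMPO):

```python
import re

# A process name must start with a letter and may contain only
# letters, digits, and underscores (hypothetical helper).
def is_valid_process_name(name: str) -> bool:
    return re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name) is not None

print(is_valid_process_name("learn_1"))   # True
print(is_valid_process_name("1_learn"))   # False: starts with a digit
```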
- type
Specifies the process type: learn or predict.
- spd (learning process only)
Specifies the SPD file used by the learning process.
- model_process (prediction process only)
Specifies the model to use for prediction by naming the learn process that produced it.
Required only when the process type is predict.
Parameters specific to each data source¶
- data_sources
Defines the data source for each data loader component.
- CSV File
path: Specifies the input file path.
attr_schema: Specifies an ASD file path.
- SQL Query, database table, or view
sql: Specifies the input SELECT SQL query, database table, or view name. Table or column names containing spaces in SQL queries must be enclosed in double quotation marks. When working with time-series data, it is recommended to use an ORDER BY clause in your query.
table_name: Same as sql. If specified together with sql, this parameter’s value is ignored.
Warning
table_name is deprecated. sql should be used instead of table_name.
connection_uri: Specifies a database connection URI in the following format:
scheme://[user[:password]@][host][:port][/database]
scheme: postgresql is supported.
PostgreSQL's password file (.pgpass) can be used to hold parts of the connection information, such as the password. A sample URI that relies on .pgpass:
postgresql://aapfuser@dbhost:5432/testdb
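Since the URI follows the generic scheme://user@host:port/database shape, its components can be inspected with Python's standard urllib.parse (shown purely as a sketch of the format, not SAMPO code):

```python
from urllib.parse import urlsplit

# The sample URI from this section: no password is given, so
# PostgreSQL falls back to the .pgpass file for authentication.
parts = urlsplit("postgresql://aapfuser@dbhost:5432/testdb")
print(parts.scheme)    # postgresql
print(parts.username)  # aapfuser
print(parts.password)  # None -> supplied by .pgpass
print(parts.hostname)  # dbhost
print(parts.port)      # 5432
print(parts.path)      # /testdb (database name with leading slash)
```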
attr_schema: Specifies an ASD file path.
- ARFF File
path: Specifies the input file path.
data_source: Same as path. If specified together with path, this parameter’s value is ignored.
Warning
The ARFF file format is deprecated. Use the CSV file data_sources format instead.
data_source is deprecated. path should be used instead.
Filters¶
data_sources supports the following filters, which select a subset of samples from the data:
-
slice(start=0, stop, step=1)
Slices the data from start to stop at intervals of step.
If only one argument is given, it is treated as stop, and the remaining parameters take their defaults.
If two arguments are given, they are treated as start and stop, and step takes its default.
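Assuming the filter follows Python slice semantics, its argument handling can be sketched as follows (slice_filter is a hypothetical reimplementation for illustration, not SAMPO's actual code):

```python
# Hypothetical reimplementation of the slice filter's argument
# handling, assuming Python slice semantics.
def slice_filter(rows, *args):
    if len(args) == 1:        # slice(stop)
        start, stop, step = 0, args[0], 1
    elif len(args) == 2:      # slice(start, stop)
        start, stop, step = args[0], args[1], 1
    else:                     # slice(start, stop, step)
        start, stop, step = args
    return rows[start:stop:step]

data = list(range(10))
print(slice_filter(data, 4))        # first four rows: [0, 1, 2, 3]
print(slice_filter(data, 2, 8, 2))  # every second row from 2 to 8: [2, 4, 6]
```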
-
k_split(k, pos=0, complementary=False)
Splits the data into k parts and returns the pos-th part.
If complementary is True, returns the complement of the specified part instead of the part itself.
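The behaviour can be sketched with a contiguous partition (whether SAMPO splits contiguously or by interleaving is an assumption here; k_split below is a hypothetical reimplementation):

```python
# Hypothetical reimplementation of k_split, assuming contiguous parts.
def k_split(rows, k, pos=0, complementary=False):
    n = len(rows)
    # Boundary indices of the k (near-)equal parts.
    bounds = [round(i * n / k) for i in range(k + 1)]
    if complementary:
        # Everything except the pos-th part.
        return rows[:bounds[pos]] + rows[bounds[pos + 1]:]
    return rows[bounds[pos]:bounds[pos + 1]]

rows = list(range(9))
print(k_split(rows, 3, 1))                      # middle third: [3, 4, 5]
print(k_split(rows, 3, 1, complementary=True))  # the rest: [0, 1, 2, 6, 7, 8]
```

This pairing is what the cross-validation example later in this document uses: each learn process takes the complement (complementary=True) as training data, while the matching predict process takes the held-out part (complementary=False).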
Examples¶
Three different learn/predict pairs:
{%- for i in range(3) %}
ex_learn_{{ i }}:
  type: learn
  spd: ex_{{ i }}.spd
  data_sources:
    dl1:
      path: learn.csv
      attr_schema: data.asd
ex_predict_{{ i }}:
  type: predict
  data_sources:
    dl1:
      path: predict.csv
      attr_schema: data.asd
  model_process: ex_learn_{{ i }}
{%- endfor %}
Three different learn/predict pairs with Database table:
{%- for i in range(3) %}
ex_learn_{{ i }}:
  type: learn
  spd: ex_{{ i }}.spd
  data_sources:
    dl1:
      sql: SELECT * FROM table_learn ORDER BY _datetime ASC
      connection_uri: postgresql://aapfuser:aapfpass@localhost:5432/testdb
      attr_schema: table_learn.asd
ex_predict_{{ i }}:
  type: predict
  data_sources:
    dl1:
      sql: SELECT * FROM table_predict ORDER BY _datetime ASC
      connection_uri: postgresql://aapfuser:aapfpass@localhost:5432/testdb
      attr_schema: table_predict.asd
  model_process: ex_learn_{{ i }}
{%- endfor %}
Cross validation:
{%- set num = 3 %}
{%- for i in range(num) %}
cv1_learn_{{ i }}:
  type: learn
  spd: cv.spd
  data_sources:
    dl1:
      path: cv.csv
      attr_schema: cv.asd
      filters:
        - k_split({{ num }}, {{ i }}, True)
cv1_predict_{{ i }}:
  type: predict
  data_sources:
    dl1:
      path: cv.csv
      attr_schema: cv.asd
      filters:
        - k_split({{ num }}, {{ i }}, False)
  model_process: cv1_learn_{{ i }}
{%- endfor %}