SSC (SAMPO Session Configuration) Specification¶
Overview¶
A SAMPO session configuration (SSC) describes a session that contains multiple run configurations.
Example:
learn_1:
  type: learn
  spd: example.spd
  data_sources:
    dl1:
      path: learn.csv
      attr_schema: learn.asd
predict_1:
  type: predict
  data_sources:
    dl1:
      path: predict.csv
      attr_schema: learn.asd
  model_process: learn_1
SSC Syntax¶
The basic syntax of the SSC file is YAML.
- YAML Version 1.2
A simple example:
learn_1:
  type: learn
  spd: example.spd
  data_sources:
    dl1:
      path: learn.csv
      attr_schema: learn.asd
predict_1:
  type: predict
  data_sources:
    dl1:
      path: predict.csv
      attr_schema: learn.asd
  model_process: learn_1
SSC supports Jinja2 template syntax, so you can define variables and use control structures such as if…else and for.
- Jinja2
With Jinja2 template:
{%- set num = 3 %}
{%- for i in range(num) %}
ex_learn_{{ i }}:
  type: learn
  spd: ex_{{ i }}.spd
  data_sources:
    dl1:
      path: learn.csv
      attr_schema: learn.asd
ex_predict_{{ i }}:
  type: predict
  data_sources:
    dl1:
      path: predict.csv
      attr_schema: learn.asd
  model_process: ex_learn_{{ i }}
{%- endfor %}
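After Jinja2 rendering, the template above expands into plain YAML before the SSC is interpreted. The i = 0 iteration of the loop, for example, produces:

ex_learn_0:
  type: learn
  spd: ex_0.spd
  data_sources:
    dl1:
      path: learn.csv
      attr_schema: learn.asd
ex_predict_0:
  type: predict
  data_sources:
    dl1:
      path: predict.csv
      attr_schema: learn.asd
  model_process: ex_learn_0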
SSC Parameters¶
The parameter combinations in an SSC differ depending on the type of data source used.
For CSV file data_sources:
<process_name>:
  type: <learn|predict>
  data_sources:
    <cid>:
      path: <csv_file_path>
      attr_schema: <asd_file_path>
      filters:
        - <filter_name>
        - ...
    ...
  spd: <spd_file>
  model_process: <process_name>
For database data_sources:
<process_name>:
  type: <learn|predict>
  data_sources:
    <cid>:
      sql: <sql_query_or_database_table_or_view_name>
      connection_uri: <connection_uri>
      attr_schema: <asd_file_path>
      filters:
        - <filter_name>
        - ...
    ...
  spd: <spd_file>
  model_process: <process_name>
...
For ARFF file data_sources:
<process_name>:
  type: <learn|predict>
  data_sources:
    <cid>:
      <path|data_source>: <arff_file_path>
      filters:
        - <filter_name>
        - ...
    ...
  model_process: <process_name>
Warning
The ARFF file data_sources format is deprecated. Use the CSV file data_sources format instead.
Parameters common to all data source patterns¶
- <process_name>
Only alphanumeric characters and underscores can be used.
The first character must be an alphabetic character.
- type
Specifies the process type: learn or predict
- spd (learning process only)
Specifies the SPD file used by the learning process.
- model_process (prediction process only)
Specifies the model to use for prediction, referenced by the name of the learn process that produced it.
This parameter is required only when the process type is predict.
Parameters specific to each data source¶
- data_sources
Defines the data source for each data loader component.
- CSV File
path: Specifies the input file path.
attr_schema: Specifies an ASD file path.
- SQL Query, database table, or view
sql: Specifies the input SELECT SQL query, database table, or view name. Table or column names containing spaces in SQL queries must be enclosed in double quotation marks. When working with time-series data, it is recommended to use ORDER BY in the query.
table_name: Same as sql. If specified together with sql, this parameter’s value is ignored.
Warning
table_name is deprecated. sql should be used instead of table_name.
connection_uri: Specifies a database connection URI in the following format:
schema://[user[:password]@][host][:port][/database]
schema: postgresql is supported.
The PostgreSQL password file (.pgpass) can hold parts of the connection information, allowing them to be omitted from the URI. A sample URI that relies on .pgpass for the password:
postgresql://aapfuser@dbhost:5432/testdb
attr_schema: Specifies an ASD file path.
- ARFF File
path: Specifies the input file path.
data_source: Same as path. If specified together with path, this parameter’s value is ignored.
Warning
The ARFF file format is deprecated. Use the CSV file data_sources format instead.
data_source is deprecated. path should be used instead of data_source.
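Putting the database data-source parameters together, a minimal database-backed learn process might look like the following (the process name, table, and file names are illustrative, and the password is assumed to come from .pgpass):

db_learn_1:
  type: learn
  spd: example.spd
  data_sources:
    dl1:
      sql: SELECT * FROM learn_table ORDER BY _datetime ASC
      connection_uri: postgresql://aapfuser@dbhost:5432/testdb
      attr_schema: learn_table.asd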
Filters¶
data_sources supports the following filters, which can select samples from the data:
- slice(start=0, stop, step=1)
Slices the data from start to stop at intervals of step.
If only one argument is given, it is treated as stop, and the remaining parameters take their defaults.
If two arguments are given, they are treated as start and stop, and step takes its default.
- k_split(k, pos=0, complementary=False)
Splits the data into k parts and returns the pos-th part.
If complementary is True, returns the complementary set of the specified part instead of the part itself.
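For instance, to restrict a learn process to the first 800 samples of a file, the one-argument form of slice (interpreted as stop) can be used as follows (the process and file names here are illustrative):

slice_learn_1:
  type: learn
  spd: example.spd
  data_sources:
    dl1:
      path: data.csv
      attr_schema: data.asd
      filters:
        - slice(800)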
Examples¶
Three different learn/predict pairs:
{%- for i in range(3) %}
ex_learn_{{ i }}:
  type: learn
  spd: ex_{{ i }}.spd
  data_sources:
    dl1:
      path: learn.csv
      attr_schema: data.asd
ex_predict_{{ i }}:
  type: predict
  data_sources:
    dl1:
      path: predict.csv
      attr_schema: data.asd
  model_process: ex_learn_{{ i }}
{%- endfor %}
Three different learn/predict pairs with Database table:
{%- for i in range(3) %}
ex_learn_{{ i }}:
  type: learn
  spd: ex_{{ i }}.spd
  data_sources:
    dl1:
      sql: SELECT * FROM table_learn ORDER BY _datetime ASC
      connection_uri: postgresql://aapfuser:aapfpass@localhost:5432/testdb
      attr_schema: table_learn.asd
ex_predict_{{ i }}:
  type: predict
  data_sources:
    dl1:
      sql: SELECT * FROM table_predict ORDER BY _datetime ASC
      connection_uri: postgresql://aapfuser:aapfpass@localhost:5432/testdb
      attr_schema: table_predict.asd
  model_process: ex_learn_{{ i }}
{%- endfor %}
Cross validation:
{%- set num = 3 %}
{%- for i in range(num) %}
cv1_learn_{{ i }}:
  type: learn
  spd: cv.spd
  data_sources:
    dl1:
      path: cv.csv
      attr_schema: cv.asd
      filters:
        - k_split({{ num }}, {{ i }}, True)
cv1_predict_{{ i }}:
  type: predict
  data_sources:
    dl1:
      path: cv.csv
      attr_schema: cv.asd
      filters:
        - k_split({{ num }}, {{ i }}, False)
  model_process: cv1_learn_{{ i }}
{%- endfor %}