SPD (SAMPO Process Description) Specification¶
Overview¶
A SAMPO Process Description (SPD) describes directed relationships of components and the parameters of each component.
The SPD can be prepared in two formats: Python object or file.
Each SPD has two main sections, separated by a line of three or more '-' characters (---), and written in the following order:
- Data Flow Section:
Describing connections among components.
- Parameters Section:
Describing parameters of each component and global settings.
Example:
dl1 -> std1 -> rg1
---
components:
  dl1:
    component: DataLoader
  std1:
    component: StandardizeFDComponent
    features: scale == 'real' or scale == 'integer'
  rg1:
    component: FABHMEBernGateLinearRgComponent
    features: scale == 'real' or scale == 'integer'
    target: name == 'target'
global_settings:
  keep_attributes:
    - a
  feature_exclude:
    - c
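The two-section layout can be illustrated with a minimal sketch that splits SPD text at the separator line. The function name `split_spd` is hypothetical and not part of SAMPO; tolerating trailing spaces or a carriage return after the dashes is an assumption made here, not a documented rule.

```python
import re

# A separator is a line made only of three or more '-' characters.
# Allowing trailing spaces/tabs/CR is an assumption of this sketch.
SEPARATOR = re.compile(r"^-{3,}[ \t\r]*$", re.MULTILINE)

def split_spd(text):
    """Hypothetical helper: split SPD text into (data_flow, parameters)."""
    match = SEPARATOR.search(text)
    if match is None:
        raise ValueError("no '---' separator line found")
    # Everything before the separator is the data flow section,
    # everything after it is the parameters section.
    return text[:match.start()].strip(), text[match.end():].strip()

spd_text = """dl1 -> std1 -> rg1
---
components:
  dl1:
    component: DataLoader
"""

data_flow, parameters = split_spd(spd_text)
print(data_flow)
print(parameters)
```

This only separates the sections; it does not validate either of them.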
Format¶
The SPD can be prepared either as:
- a Python object (usable in SAMPO API)
- a text file (usable in SAMPO API and SAMPO Command)
SPD Object¶
The SPD object is an instance of SAMPO API ProcessDescription and can only be used via SAMPO API.
Generating ProcessDescription objects requires one of the following:
- SPD file
- String object that follows the SPD base syntax.
- Text file or string object that follows the SPD template syntax, with a parameter dictionary for rendering.
SPD File¶
The SPD file is a text file that follows only the SPD base syntax. SAMPO Command supports only SPD files.
An SPD file must satisfy the following constraints:
| Property | Constraint |
|---|---|
| File name | ASCII characters followed by .spd |
| Character code | Python 3: UTF-8 (ASCII + Japanese Characters); Python 2: ASCII |
| Newline code | CRLF (Recommended), LF (Not Recommended) |
SPD Syntax¶
SPDs follow either of the two possible syntaxes:
- Base Syntax
Provides complete SPD information.
- Template Syntax
Provides a renderable template that allows dynamic value changes in SAMPO API.
Base Syntax¶
The base syntax contains complete SPD information that allows it to be directly used in SAMPO. SPD Files strictly follow the base syntax for SAMPO Command.
In SAMPO API gen_spd(), the params parameter is ignored when loading an SPD written in base syntax.
Base Syntax Example:
dl1 -> fd1 -> rg1
    -> fd2 -> rg1
---
components:
  dl1:
    component: DataLoader
  fd1:
    component: StandardizeFDComponent
    features: scale == 'real' or scale == 'integer'
  fd2:
    component: BinaryExpandFDComponent
    features: scale == 'nominal'
    disable_feature_exclude: True
  rg1:
    component: FABHMEBernGateLinearRgComponent
    features: scale == 'real' and re_match('^a', name)
    target: name == 'target'
global_settings:
  keep_attributes:
    - a
    - b
  feature_exclude:
    - c
Template Syntax¶
SAMPO API gen_spd() accepts templates for creating SPD objects. Templates supplied as a file and templates supplied as a string object follow the same syntax. Templates must be Jinja2 compliant and must be rendered before they are used in SAMPO.
- Jinja2
Template Example:
dl1 -> fd1 -> rg1
---
components:
  dl1:
    component: DataLoader
  fd1:
    component: StandardizeFDComponent
    features: scale == 'real' or scale == 'integer'
  rg1:
    component: FABHMEBernGateLinearRgComponent
    features: scale == 'real' and re_match('^a', name)
    target: name == {{ target_attr }}
    tree_depth: {{ tree_depth }}
global_settings:
  keep_attributes:
    - a
    - b
  feature_exclude:
    - c
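To show what rendering does to the two placeholders above, here is a minimal sketch that uses the Jinja2 library directly. How gen_spd() performs the rendering internally is not described in this specification, so this is only an illustration. The parameter values are assumptions; in particular, the value of `target_attr` carries its own quotes because the template has none around the placeholder.

```python
from jinja2 import Template

# Two lines taken from the template example above.
template_text = (
    "target: name == {{ target_attr }}\n"
    "tree_depth: {{ tree_depth }}\n"
)

# Hypothetical parameter dictionary. The quotes inside "'target'" are an
# assumption: without them the rendered line would read `name == target`.
params = {"target_attr": "'target'", "tree_depth": 3}

rendered = Template(template_text).render(**params)
print(rendered)
```

After rendering, the text no longer contains any `{{ ... }}` placeholders and has the shape of the base syntax.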
Data Flow Section¶
In the data flow section, connections between each component are described with component IDs and arrows (->):
dl1 -> fd1 -> rg1
Component ID¶
You can use only alphanumeric characters and underscores for the component ID. The first character must be an alphabetic character:
aaa_2 -> bbb_2 # OK
_a -> b # NG: the first character is an underscore
3a -> b3 # NG: the first character is a number
あいう -> かきく # NG: non-ASCII characters
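The naming rule can be expressed as a regular expression. The sketch below is an illustration of the rule as stated, not SAMPO's own validation code; `is_valid_component_id` is a hypothetical name.

```python
import re

# First character: ASCII letter. Remaining characters: ASCII letters,
# digits, or underscores.
COMPONENT_ID = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def is_valid_component_id(cid):
    """Hypothetical helper: True if cid satisfies the component ID rule."""
    return COMPONENT_ID.fullmatch(cid) is not None

for cid in ["aaa_2", "bbb_2", "_a", "3a", "あいう"]:
    print(cid, "OK" if is_valid_component_id(cid) else "NG")
```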
Spaces¶
There can be zero or more spaces between a component and an arrow. Tab characters are not allowed:
a->b->c # OK
a -> b->c # OK
a□->b # NG: □ is a tab
Comments¶
In the data flow section, only single-line comments are available. Single-line comments begin with the hash character # and are terminated by the end of line:
a -> b -> c # this is a comment
#a -> b -> c # this line is commented out
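The spacing and comment rules can be combined in a small sketch that reads one data flow line. `parse_flow_line` is a hypothetical helper, not part of SAMPO. It assumes that the tab restriction applies to the part of the line before the comment, and it does not handle branch lines that start with an arrow.

```python
def parse_flow_line(line):
    """Hypothetical helper: return the component IDs on one data flow line.

    Returns [] for blank or comment-only lines.
    """
    # Everything from the first '#' to the end of the line is a comment.
    code = line.split("#", 1)[0]
    # Assumption: tabs are rejected only outside the comment.
    if "\t" in code:
        raise ValueError("tab characters are not allowed")
    if not code.strip():
        return []
    # Zero or more spaces may surround each arrow.
    return [part.strip() for part in code.split("->")]

print(parse_flow_line("a->b->c"))
print(parse_flow_line("a -> b->c # this is a comment"))
print(parse_flow_line("#a -> b -> c"))
```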
Branching¶
When a line begins with spaces followed by an arrow that is aligned with an arrow on the previous line, it represents a branch from the component that precedes that arrow on the previous line:
a -> b -> c
  -> d -> c
       -> e -> c
The above configuration is the same as the following:
a -> b -> c
a -> d -> c
d -> e -> c
The above configuration creates the following relationship:
[Figure: left-to-right directed graph with edges a -> b, a -> d, b -> c, d -> c, d -> e, e -> c]
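The alignment rule can be sketched as a small routine that expands branch lines into explicit edges. This is an illustration of the rule as described in this section, not SAMPO's parser; `expand_branches` is a hypothetical name, and the sketch assumes every leading arrow is aligned with an arrow on the previous line (a misaligned arrow raises KeyError).

```python
import re

# Tokens: an arrow or a component ID.
TOKEN = re.compile(r"->|[A-Za-z][A-Za-z0-9_]*")

def expand_branches(text):
    """Hypothetical helper: return the list of (parent, child) edges."""
    edges = []
    prev_sources = {}  # arrow column -> component preceding that arrow
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()
        if not line.strip():
            continue
        sources = {}
        current = None   # component most recently seen on this line
        pending = None   # parent waiting for its child
        for m in TOKEN.finditer(line):
            tok = m.group()
            if tok == "->":
                if current is None:
                    # Leading arrow: branch from the component that
                    # precedes the aligned arrow on the previous line.
                    current = prev_sources[m.start()]
                sources[m.start()] = current
                pending = current
            else:
                if pending is not None:
                    edges.append((pending, tok))
                    pending = None
                current = tok
        prev_sources = sources
    return edges

flow = "\n".join([
    "a -> b -> c",
    "  -> d -> c",
    "       -> e -> c",
])
edges = expand_branches(flow)
print(edges)
```

For the branching example above, the result contains the same six edges as the graph.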
Duplicate and Join¶
- Duplicate
If multiple components have the same parent (preceding component), every child component receives the same data.
Below is an example.
Data flow:
dl1 -> fd1
dl1 -> fd2
In the above data flow, both fd1 and fd2 receive the same data from dl1.
- Join
If a component has multiple parents, the component receives the output of all parents joined by the sample indices (_sid).
Below is an example.
Data flow:
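dl1 -> rg1
dl2 -> rg1
Output of dl1:
| _sid | dl1[0] |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
Output of dl2:
| _sid | dl2[0] |
|---|---|
| 3 | 1 |
| 4 | 2 |
| 5 | 3 |
| 6 | 4 |
Input to rg1:
| _sid | dl1[0] | dl2[0] |
|---|---|---|
| 0 | 1 | NaN |
| 1 | 2 | NaN |
| 2 | 3 | NaN |
| 3 | 4 | 1 |
| 4 | NaN | 2 |
| 5 | NaN | 3 |
| 6 | NaN | 4 |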
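The join in this example behaves like an outer join on _sid: every sample index from either parent appears once, and missing values become NaN. The sketch below reproduces the table with plain Python dictionaries. It illustrates the documented result only; `join_by_sid` is a hypothetical name and says nothing about how SAMPO implements the join.

```python
import math

def join_by_sid(outputs):
    """Hypothetical helper: outer-join columns keyed by _sid.

    outputs maps a column name to {_sid: value}.
    Returns {_sid: {column name: value or NaN}}.
    """
    # Union of all sample indices that appear in any parent output.
    sids = sorted(set().union(*(col.keys() for col in outputs.values())))
    return {
        sid: {name: col.get(sid, math.nan) for name, col in outputs.items()}
        for sid in sids
    }

dl1 = {0: 1, 1: 2, 2: 3, 3: 4}   # output of dl1
dl2 = {3: 1, 4: 2, 5: 3, 6: 4}   # output of dl2

joined = join_by_sid({"dl1[0]": dl1, "dl2[0]": dl2})
for sid, row in joined.items():
    print(sid, row["dl1[0]"], row["dl2[0]"])
```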
Parameters Section¶
The parameters section has the following two sub-sections:
- Components sub-section:
Describing parameters of each component.
- Global settings sub-section:
Describing process-common parameters.
The basic syntax of the parameters section is YAML.
- YAML Version 1.2
Format:
components:
  <cid>:
    component: <component_class_name>
    <parameter_key>: <parameter_value>
  ...
global_settings:
  keep_attributes:
    - <attr_name>
    - ...
  feature_exclude:
    - <attr_name>
    - ...
Components Sub-section¶
The components sub-section describes the parameters of each component, as shown below:
rg1:
  component: FABHMEBernGateLinearRgComponent
  features: scale == 'real' and re_match('^a', name)
  target: name == 'target'
features and target are special parameters that select feature and target attributes.
Component parameters other than features and target are described in the specification of each component.
Attribute Selection by features and target¶
Attribute selection by the features and target parameters can be written in Python syntax as shown below:
features: scale == 'real' and not name == 'velocity'
target: name == 'target'
The following operators, matching functions, and defined variables are available in attribute selection.
Matching Functions¶
- re_match(pattern, variable)
Matches all attributes whose variable matches the pattern (regular expression):
features: re_match('^a', name)
In the above example, attributes whose name begins with ‘a’ are selected as features.
- derived_from(attr_name)
Matches all attributes that are derived from the specified attribute name:
features: derived_from('a')
In the above example, attributes which are derived from the attribute ‘a’ are selected as features.
Warning
derived_from() is deprecated.
- generated_by(cid)
Matches all attributes that are generated by the specified component:
features: generated_by('fd1')
In the above example, attributes which are generated by the component ‘fd1’ are selected as features.
- all()
Matches all attributes:
features: all()
In the above example, all the attributes are selected as features.
- empty()
Matches no attributes:
features: empty()
In the above example, no attributes are selected as features.
Defined Variables¶
- name
Attribute name that is assigned by a component or a user.
- scale
Scale of the attribute, which is one of: integer, real, date, nominal.
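Because selection conditions are written in Python syntax, their effect can be illustrated by evaluating them against a list of attributes. The sketch below is not SAMPO's evaluator: `select`, the attribute records, and the use of `eval` are assumptions made for illustration, and `re_match` is modeled with `re.match`, which may differ from the real matching behavior. `derived_from` and `generated_by` are omitted because they need provenance information that is not described here.

```python
import re

def select(expression, attributes):
    """Hypothetical helper: names of attributes satisfying the expression."""
    selected = []
    for attr in attributes:
        # Defined variables and matching functions visible to the expression.
        namespace = {
            "__builtins__": {},
            "name": attr["name"],
            "scale": attr["scale"],
            # Assumption: re_match behaves like re.match (anchored at start).
            "re_match": lambda pattern, variable: re.match(pattern, variable) is not None,
            "all": lambda: True,
            "empty": lambda: False,
        }
        # eval is used only for illustration; do not use it on untrusted input.
        if eval(expression, namespace):
            selected.append(attr["name"])
    return selected

# Hypothetical attribute metadata.
attributes = [
    {"name": "a1", "scale": "real"},
    {"name": "a2", "scale": "nominal"},
    {"name": "velocity", "scale": "real"},
    {"name": "target", "scale": "integer"},
]

print(select("scale == 'real' and not name == 'velocity'", attributes))
print(select("re_match('^a', name)", attributes))
print(select("name == 'target'", attributes))
```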
disable_feature_exclude¶
When the global settings parameter feature_exclude is specified, the listed attributes are excluded from the selected features (see the Global Settings Sub-section).
However, when the component parameter disable_feature_exclude is True, feature exclusion is disabled for that component:
components:
  ...
  log1:
    ...
    features: re_match('a', name)
    disable_feature_exclude: True
  std1:
    ...
    features: re_match('a', name)
global_settings:
  ...
  feature_exclude:
    - a1
In the above example, the component log1 selects the attribute a1 as a feature, whereas the component std1 doesn’t because of feature_exclude.
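The interaction between the two parameters can be summarized in a few lines. This is a sketch of the behavior described above, with a hypothetical function name and a hypothetical attribute list (a1, a2); it is not taken from SAMPO.

```python
def apply_feature_exclude(selected, feature_exclude, disable_feature_exclude=False):
    """Hypothetical helper: apply the global feature_exclude list to one component."""
    if disable_feature_exclude:
        # The component opts out of the global exclusion.
        return list(selected)
    excluded = set(feature_exclude)
    return [name for name in selected if name not in excluded]

# Attributes matched by re_match('a', name) in both components (assumed).
matched = ["a1", "a2"]
feature_exclude = ["a1"]

log1_features = apply_feature_exclude(matched, feature_exclude, disable_feature_exclude=True)
std1_features = apply_feature_exclude(matched, feature_exclude)
print(log1_features)
print(std1_features)
```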
Global Settings Sub-section¶
The global settings sub-section describes the process-common parameters keep_attributes and feature_exclude:
global_settings:
  keep_attributes:
    - a
    - b
  feature_exclude:
    - c
keep_attributes¶
The keep_attributes parameter lists the names of attributes that are kept even after every component has run.
Attributes specified in this parameter pass through all components, can be used by any component within the process, and are carried over to the succeeding components. These attributes can be used even if they are not generated by a preceding component.
feature_exclude¶
The feature_exclude parameter lists the names of attributes that are not selected as features even if they match the feature selection condition.