StandardizeFD Component Specification¶
Overview¶
The StandardizeFD component is a feature descriptor. In the learning phase, it calculates the mean and standard deviation of each selected attribute. In the running phase, it transforms the data using that mean and standard deviation.
Example:
SPD:
dl1 -> std1

---

components:
    dl1:
        component: DataLoader

    std1:
        component: StandardizeFDComponent
        features: scale == 'real' or scale == 'integer'
Input of the component:
_sid | temperature | pressure | cloudage |
---|---|---|---|
0 | 22.3 | 1001 | NaN |
1 | 21.8 | 1002 | NaN |
2 | inf | NaN | NaN |
3 | 23.4 | 1002 | NaN |
4 | -inf | 1002 | NaN |
Output of the component:
_sid | std1_temperature | std1_pressure | cloudage |
---|---|---|---|
0 | -0.299253 | -1.732051 | 0.0 |
1 | -1.047385 | 0.577350 | 0.0 |
2 | inf | NaN | 0.0 |
3 | 1.346638 | 0.577350 | 0.0 |
4 | -inf | 0.577350 | 0.0 |
This component has no component-specific external formats.
See also
Component-common external format files in convert_process
Output Attributes¶
The StandardizeFD component generates the following attributes:
Attribute Name | Scale | Description |
---|---|---|
<component_id>_<original_attribute_name> | REAL | Standardized value of the original attribute. |
These attributes appear in the component output data. They can be loaded through the SAMPO API or saved as data.csv after executing convert_process.
See also
Obtaining process results via ProcessResultLoader.
Attribute Metadata¶
The metadata of the output attributes is created with the following rules.
Context Rule¶
Attribute Name | Context Name | Description |
---|---|---|
<component_id>_<original_attribute_name> | mean | Mean of the original attribute values for learning. |
<component_id>_<original_attribute_name> | std | Standard deviation of the original attribute values for learning. |
Derivation Rule¶
Each new attribute is derived from the corresponding attribute selected by the features parameter of the component.
Example¶
{
    "nodes": [
        {"aid": "_sid", "name": "_sid", ... },
        {"aid": "dl1[0]", "name": "temperature", ... },
        {"aid": "dl1[1]", "name": "pressure", ... },
        {"aid": "std1[0]", "name": "std1_temperature",
         "scale": "real", "is_excluded": false, "cid": "std1",
         "cindex": 0, "values": null, "is_kept": false,
         "context": {"std": 6.6833125519211312e-01, "mean": 2.2500000000000000e+01}},
        {"aid": "std1[1]", "name": "std1_pressure",
         "scale": "real", "is_excluded": false, "cid": "std1",
         "cindex": 1, "values": null, "is_kept": false,
         "context": {"std": 4.3301270189221930e-01, "mean": 1.0017500000000000e+03}}
    ],
    "links": [
        {"source": "dl1[0]", "target": "std1[0]"},
        {"source": "dl1[1]", "target": "std1[1]"}
    ]
}
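The derivation rule can be followed programmatically by walking the links. The sketch below is illustrative only: it parses a trimmed copy of the example above (the elided fields and the attributes not used here are left out, which is an assumption about what a reader needs, not the full file format) and pairs each source attribute with its derived attribute and learned context.

```python
import json

# Trimmed copy of the metadata example: only the fields used below are kept.
metadata_text = """
{
  "nodes": [
    {"aid": "dl1[0]", "name": "temperature"},
    {"aid": "dl1[1]", "name": "pressure"},
    {"aid": "std1[0]", "name": "std1_temperature", "cid": "std1",
     "context": {"std": 6.6833125519211312e-01, "mean": 2.2500000000000000e+01}},
    {"aid": "std1[1]", "name": "std1_pressure", "cid": "std1",
     "context": {"std": 4.3301270189221930e-01, "mean": 1.0017500000000000e+03}}
  ],
  "links": [
    {"source": "dl1[0]", "target": "std1[0]"},
    {"source": "dl1[1]", "target": "std1[1]"}
  ]
}
"""

metadata = json.loads(metadata_text)

# Index the nodes by attribute ID so links can be resolved to names.
name_by_aid = {node["aid"]: node["name"] for node in metadata["nodes"]}
context_by_aid = {node["aid"]: node.get("context") for node in metadata["nodes"]}

# One (source name, derived name, context) triple per derivation link.
derivations = [
    (name_by_aid[link["source"]],
     name_by_aid[link["target"]],
     context_by_aid[link["target"]])
    for link in metadata["links"]
]

for source, target, context in derivations:
    print(f"{source} -> {target}: mean={context['mean']}, std={context['std']}")
```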
See also
Attribute metadata file format in Attribute Metadata File Specification
Model¶
The model of this component can be described by its fd_params.
fd_params | Type | Description |
---|---|---|
source_attr_names | list of string | A list of the attribute names from which the output attribute is derived. |
params | dict | The keys of this dictionary are the same as the context names in this component’s Attribute Metadata. |
When loaded in the SAMPO API, the model is represented as a dict of its fd_params.
See also
Obtaining process results via ProcessResultLoader.
{'fd_params':
[{'source_attr_names': ['temperature'], 'params': {'std': 6.6833125519211312e-01, 'mean': 2.2500000000000000e+01}},
{'source_attr_names': ['pressure'], 'params': {'std': 4.3301270189221930e-01, 'mean': 1.0017500000000000e+03}}]}
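As a rough illustration of how this dict might be consumed, the sketch below builds a lookup from output attribute name to learned parameters. It starts from the literal dict shown above rather than from any loader call, and it assumes the component ID is std1 (as in the Overview example) and that each entry lists exactly one source attribute; neither assumption is guaranteed by this specification.

```python
# The model dict, copied from the example above.
model = {
    'fd_params': [
        {'source_attr_names': ['temperature'],
         'params': {'std': 6.6833125519211312e-01, 'mean': 2.2500000000000000e+01}},
        {'source_attr_names': ['pressure'],
         'params': {'std': 4.3301270189221930e-01, 'mean': 1.0017500000000000e+03}},
    ]
}

component_id = 'std1'  # assumed: the component ID used in the Overview example

params_by_output = {}
for fd in model['fd_params']:
    # Assumption: exactly one source attribute per entry, as in the example.
    (source_name,) = fd['source_attr_names']
    # Output attributes are named <component_id>_<original_attribute_name>.
    params_by_output[f"{component_id}_{source_name}"] = fd['params']

for name, params in params_by_output.items():
    print(name, params['mean'], params['std'])
```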
Details¶
In the learning phase, this component calculates the mean and standard deviation with the following rules:
The attribute scale must be INTEGER or REAL.
Missing values and +/- Inf values are skipped.
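The standard deviation is not the unbiased estimate but the maximum likelihood estimate under a normal distribution (the squared deviations are averaged over the number of values, not that number minus one), as shown below:
\(\sqrt{\mbox{mean}((x - \bar{x})^2)}\)
where \(x\) is the learning data and \(\bar{x}\) is the mean of \(x\).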
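The learning rules above can be sketched in a few lines of plain Python. This is not the component's implementation; learn_mean_std is a hypothetical helper name, and raising an error for an attribute with no finite values is an assumption, since this specification does not describe that case. The inputs are the temperature and pressure columns from the Overview example.

```python
import math


def learn_mean_std(values):
    """Return (mean, std) over the finite values, dividing by n (not n - 1)."""
    # Missing values (NaN) and +/- Inf are skipped.
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        # Assumption: this case is not covered by the specification.
        raise ValueError("no finite values to learn from")
    n = len(finite)
    mean = sum(finite) / n
    variance = sum((v - mean) ** 2 for v in finite) / n
    return mean, math.sqrt(variance)


temperature = [22.3, 21.8, math.inf, 23.4, -math.inf]
pressure = [1001.0, 1002.0, math.nan, 1002.0, 1002.0]

t_mean, t_std = learn_mean_std(temperature)
p_mean, p_std = learn_mean_std(pressure)
print(t_mean, t_std)
print(p_mean, p_std)
```

Up to floating-point rounding, the results agree with the mean and std values in the attribute metadata example.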
In the running phase, the component returns transformed data, using the mean (mean) and standard deviation (std) from the learning phase, as shown below:
\((z - \mbox{mean}) / \mbox{std}\)
where \(z\) is the data; missing and +/- Inf values in \(z\) are left unchanged.
Note
If std = 0, the standardized data is a zero vector.
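A minimal sketch of the running-phase transform, including the std = 0 case, might look as follows. The function name standardize is hypothetical, and returning zeros for every element when std = 0 (including missing or infinite inputs) is a literal reading of the note rather than documented behavior.

```python
import math


def standardize(values, mean, std):
    """Apply (z - mean) / std, leaving NaN and +/- Inf unchanged."""
    if std == 0:
        # Assumption: a literal reading of the note, every element becomes 0.
        return [0.0] * len(values)
    return [(z - mean) / std if math.isfinite(z) else z for z in values]


# Temperature column and learned parameters from the examples in this page.
temperature = [22.3, 21.8, math.inf, 23.4, -math.inf]
std1_temperature = standardize(temperature, 22.5, 0.66833125519211312)

# A constant attribute has std = 0 and is mapped to a zero vector.
constant = standardize([5.0, 5.0, 5.0], 5.0, 0.0)

print(std1_temperature)
print(constant)
```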