StandardizeFD Component Specification

Overview

The StandardizeFD component is a feature descriptor. In the learning phase, it calculates the mean and standard deviation of each selected attribute. In the running phase, it transforms the data using the learned mean and standard deviation.

Example:

  • SPD:

    dl1 -> std1
    
    ---
    
    components:
        dl1:
            component: DataLoader
    
        std1:
            component: StandardizeFDComponent
            features: scale == 'real' or scale == 'integer'
    
  • Input of the component:

    _sid  temperature  pressure  cloudage
    0     22.3         1001      NaN
    1     21.8         1002      NaN
    2     inf          NaN       NaN
    3     23.4         1002      NaN
    4     -inf         1002      NaN

  • Output of the component:

    _sid  std1_temperature  std1_pressure  cloudage
    0     -0.299253         -1.732051      0.0
    1     -1.047385         0.577350       0.0
    2     inf               NaN            0.0
    3     1.346638          0.577350       0.0
    4     -inf              0.577350       0.0

This component has no component-specific external formats.

See also

Component-common external format files in convert_process


Parameters

There are no component-specific parameters.


Utilizable Sample Metadata

There are no component-specific sample metadata available.


Output Attributes

The StandardizeFD component generates the following attributes:

Attribute Name                            Scale  Description
<component_id>_<original_attribute_name>  REAL   Standardized value of the original attribute.

These attributes are included in the component output data. They can be loaded through the SAMPO API or saved as data.csv after executing convert_process.

See also

Obtaining process results via ProcessResultLoader.


Attribute Metadata

The metadata of the output attributes is created with the following rules.

Context Rule

Attribute Name                            Context Name  Description
<component_id>_<original_attribute_name>  mean          Mean of the original attribute values for learning.
<component_id>_<original_attribute_name>  std           Standard deviation of the original attribute values for learning.

Derivation Rule

Each new attribute is derived from the corresponding attribute selected by the features parameter of the component.

Example

{
    "nodes": [
        {"aid": "_sid", "name": "_sid", ... },
        {"aid": "dl1[0]", "name": "temperature", ... },
        {"aid": "dl1[1]", "name": "pressure", ... },
        {"aid": "std1[0]", "name": "std1_temperature",
         "scale": "real", "is_excluded": false, "cid": "std1",
         "cindex": 0, "values": null, "is_kept": false,
         "context": {"std": 6.6833125519211312e-01, "mean": 2.2500000000000000e+01}},
        {"aid": "std1[1]", "name": "std1_pressure",
         "scale": "real", "is_excluded": false, "cid": "std1",
         "cindex": 1, "values": null, "is_kept": false,
         "context": {"std": 4.3301270189221930e-01, "mean": 1.0017500000000000e+03}}
    ],
    "links": [
        {"source": "dl1[0]", "target": "std1[0]"},
        {"source": "dl1[1]", "target": "std1[1]"}
    ]
}
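
The context and derivation rules can be read back from such a file with ordinary dictionary lookups. The following is a minimal sketch, not part of SAMPO: the `metadata` dict is a trimmed copy of the example above that keeps only the fields used here (the fields elided with `...` are simply left out), and the names `names`, `contexts`, and `derivations` are illustrative.

```python
# Trimmed copy of the example metadata; elided fields are omitted, not guessed.
metadata = {
    "nodes": [
        {"aid": "_sid", "name": "_sid"},
        {"aid": "dl1[0]", "name": "temperature"},
        {"aid": "dl1[1]", "name": "pressure"},
        {"aid": "std1[0]", "name": "std1_temperature", "cid": "std1",
         "context": {"std": 6.6833125519211312e-01, "mean": 2.2500000000000000e+01}},
        {"aid": "std1[1]", "name": "std1_pressure", "cid": "std1",
         "context": {"std": 4.3301270189221930e-01, "mean": 1.0017500000000000e+03}},
    ],
    "links": [
        {"source": "dl1[0]", "target": "std1[0]"},
        {"source": "dl1[1]", "target": "std1[1]"},
    ],
}

# Index nodes by attribute ID so links can be resolved to names.
names = {node["aid"]: node["name"] for node in metadata["nodes"]}
contexts = {node["aid"]: node.get("context", {}) for node in metadata["nodes"]}

# For each derived attribute: its source attribute plus the learned context.
derivations = {
    names[link["target"]]: {"source": names[link["source"]], **contexts[link["target"]]}
    for link in metadata["links"]
}

for name, info in derivations.items():
    print(name, "<-", info["source"], "mean:", info["mean"], "std:", info["std"])
```

Each link connects one source attribute to one standardized attribute, so the resulting mapping has one entry per output attribute.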

See also

Attribute metadata file format in Attribute Metadata File Specification


Model

The model of this component can be described by its fd_params.

fd_params          Type            Description
source_attr_names  list of string  A list of the attribute names from which the output attribute is derived.
params             dict            The keys of this dictionary are the same as the context names in this component’s Attribute Metadata.

When loaded in the SAMPO API, the model is represented as a dict of its fd_params.

See also

Obtaining process results via ProcessResultLoader.

{'fd_params':
    [{'source_attr_names': ['temperature'], 'params': {'std': 6.6833125519211312e-01, 'mean': 2.2500000000000000e+01}},
     {'source_attr_names': ['pressure'], 'params': {'std': 4.3301270189221930e-01, 'mean': 1.0017500000000000e+03}}]}
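
As an illustration of how this dict could be consumed, the sketch below applies the learned parameters to a single new sample. It assumes only the dict structure shown above; `standardize_row` is a hypothetical helper, not a SAMPO function, and it keys its result by the source attribute name because the model dict itself does not carry the component ID.

```python
import math

# Model dict copied from the example above.
model = {'fd_params': [
    {'source_attr_names': ['temperature'],
     'params': {'std': 6.6833125519211312e-01, 'mean': 2.2500000000000000e+01}},
    {'source_attr_names': ['pressure'],
     'params': {'std': 4.3301270189221930e-01, 'mean': 1.0017500000000000e+03}},
]}


def standardize_row(row, model):
    """Hypothetical helper: standardize one sample (a dict) with the model."""
    out = {}
    for entry in model['fd_params']:
        name = entry['source_attr_names'][0]
        mean = entry['params']['mean']
        std = entry['params']['std']
        z = row[name]
        if not math.isfinite(z):
            out[name] = z            # NaN and +/- Inf pass through unchanged
        elif std == 0:
            out[name] = 0.0          # constant attribute maps to zero
        else:
            out[name] = (z - mean) / std
    return out


result = standardize_row({'temperature': 23.4, 'pressure': 1001}, model)
print(result)
```

With the example parameters, the temperature 23.4 and the pressure 1001 map to roughly 1.3466 and -1.7321, matching the corresponding cells of the output table in the Overview.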

Details

  • In the learning phase, this component calculates the mean and standard deviation with the following rules:

    • The attribute scale must be INTEGER or REAL.

    • Missing values and +/- Inf values are skipped.

  • The standard deviation is the maximum-likelihood estimate under a normal distribution (divisor \(n\)), not the unbiased estimate (divisor \(n - 1\)):

    \(\sqrt{\mbox{mean}((x - \bar{x})^2)}\)

    where \(x\) is the learning data and \(\bar{x}\) is the mean of \(x\).

  • In the running phase, the component returns the transformed data, using the learned mean (mean) and standard deviation (std):

    \((z - \mbox{mean}) / \mbox{std}\)

    where \(z\) is the input data. Missing values and +/- Inf values in \(z\) are left unchanged.

    Note

    If std = 0, the standardized data is a zero vector.
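
The rules above can be sketched in plain Python as follows. This is an illustrative reimplementation under stated assumptions, not the component's actual code: the function names `learn` and `run` are made up for this sketch, raising an error when an attribute has no finite values is an assumption (the specification does not cover that case), and so is the choice to leave NaN and +/- Inf unchanged even when std = 0.

```python
import math


def learn(values):
    """Learning phase: mean and maximum-likelihood std over finite values."""
    finite = [v for v in values if math.isfinite(v)]  # skip NaN and +/- Inf
    if not finite:
        # Assumption: behavior for an all-missing attribute is not specified.
        raise ValueError("no finite values to learn from")
    mean = sum(finite) / len(finite)
    # Divide by n (not n - 1): maximum-likelihood estimate.
    std = math.sqrt(sum((v - mean) ** 2 for v in finite) / len(finite))
    return {"mean": mean, "std": std}


def run(values, params):
    """Running phase: (z - mean) / std, with NaN and +/- Inf left unchanged."""
    mean, std = params["mean"], params["std"]
    out = []
    for z in values:
        if not math.isfinite(z):
            out.append(z)       # missing and infinite values are invariant
        elif std == 0:
            out.append(0.0)     # std = 0: standardized data is zero
        else:
            out.append((z - mean) / std)
    return out


# Columns from the example in the Overview.
temperature = [22.3, 21.8, math.inf, 23.4, -math.inf]
pressure = [1001, 1002, math.nan, 1002, 1002]

temp_params = learn(temperature)
pres_params = learn(pressure)
std1_temperature = run(temperature, temp_params)
std1_pressure = run(pressure, pres_params)

print(temp_params, std1_temperature)
print(pres_params, std1_pressure)
```

For the temperature column only 22.3, 21.8, and 23.4 enter the statistics, giving a mean of 22.5 and a std of about 0.6683, which agrees with the context values in the Attribute Metadata example.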