======================================
StandardizeFD Component Specification
======================================

.. contents:: Contents
    :local:

Overview
========
**StandardizeFD component** is a feature descriptor that standardizes numeric attributes.
In the learning phase, this component calculates the mean and standard deviation of each selected attribute.
In the running phase, it transforms the data using the learned mean and standard deviation.

**Example**:

* SPD:

  .. code-block:: text

     dl1 -> std1

     ---

     components:
         dl1:
             component: DataLoader
  
         std1:
             component: StandardizeFDComponent
             features: scale == 'real' or scale == 'integer'

* Input of the component:

 +--------+---------------+------------+----------+
 |   _sid |   temperature |   pressure | cloudage |
 +========+===============+============+==========+
 | 0      | 22.3          | 1001       | NaN      |
 +--------+---------------+------------+----------+
 | 1      | 21.8          | 1002       | NaN      |
 +--------+---------------+------------+----------+
 | 2      | inf           | NaN        | NaN      |
 +--------+---------------+------------+----------+
 | 3      | 23.4          | 1002       | NaN      |
 +--------+---------------+------------+----------+
 | 4      | -inf          | 1002       | NaN      |
 +--------+---------------+------------+----------+

* Output of the component:

 +--------+--------------------+-----------------+----------+
 |   _sid |   std1_temperature |   std1_pressure | cloudage |
 +========+====================+=================+==========+
 | 0      | -0.299253          | -1.732051       | 0.0      |
 +--------+--------------------+-----------------+----------+
 | 1      | -1.047385          | 0.577350        | 0.0      |
 +--------+--------------------+-----------------+----------+
 | 2      | inf                | NaN             | 0.0      |
 +--------+--------------------+-----------------+----------+
 | 3      | 1.346638           | 0.577350        | 0.0      |
 +--------+--------------------+-----------------+----------+
 | 4      | -inf               | 0.577350        | 0.0      |
 +--------+--------------------+-----------------+----------+

This component has no component-specific external formats.

.. seealso::

    Component-common external format files in :ref:`convert_process`

|

Parameters
==========
There are no component-specific parameters.

|

Utilizable Sample Metadata
==========================
No component-specific sample metadata is available.

|

Output Attributes
=================
**StandardizeFD component** generates the following attributes:

.. list-table::
  :header-rows: 1
  :widths: 3,1,3

  * - Attribute Name
    - Scale
    - Description
  * - *<component_id>*\ _\ *<original_attribute_name>*
    - REAL
    - Standardized value of the original attribute.

These attributes are included in the component output data. They can be loaded
with the SAMPO API or saved as data.csv after executing :ref:`convert_process`.

.. seealso::

    Obtaining process results via `ProcessResultLoader <../../api/process_result_loader.html>`_.

|

Attribute Metadata
==================
The metadata of the output attributes is created with the following rules.

Context Rule
------------
.. list-table::
  :header-rows: 1
  :widths: 3,1,3

  * - Attribute Name 
    - Context Name
    - Description
  * - *<component_id>*\ _\ *<original_attribute_name>*
    - mean
    - Mean of the original attribute values for learning.
  * - *<component_id>*\ _\ *<original_attribute_name>*
    - std
    - Standard deviation of the original attribute values for learning.

Derivation Rule
---------------
Each new attribute is derived from the corresponding attribute selected by the ``features`` parameter of the component.

Example
-------
.. code-block:: javascript

    {
        "nodes": [
            {"aid": "_sid", "name": "_sid", ... },
            {"aid": "dl1[0]", "name": "temperature", ... },
            {"aid": "dl1[1]", "name": "pressure", ... },
            {"aid": "std1[0]", "name": "std1_temperature",
             "scale": "real", "is_excluded": false, "cid": "std1", 
             "cindex": 0, "values": null, "is_kept": false, 
             "context": {"std": 6.6833125519211312e-01, "mean": 2.2500000000000000e+01}}, 
            {"aid": "std1[1]", "name": "std1_pressure",
             "scale": "real", "is_excluded": false, "cid": "std1", 
             "cindex": 1, "values": null, "is_kept": false, 
             "context": {"std": 4.3301270189221930e-01, "mean": 1.0017500000000000e+03}}
        ], 
        "links": [
            {"source": "dl1[0]", "target": "std1[0]"}, 
            {"source": "dl1[1]", "target": "std1[1]"}
        ]
    }
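
As an illustration of the context and derivation rules, the following Python sketch reads the ``mean`` and ``std`` context values and the source attribute of each output attribute. It works on a trimmed-down copy of the example above, with the fields elided by ``...`` omitted. The function name ``describe_outputs`` is hypothetical and is not part of the SAMPO API.

.. code-block:: python

    import json

    # Trimmed-down copy of the example metadata (elided fields omitted).
    metadata = json.loads("""
    {
      "nodes": [
        {"aid": "dl1[0]", "name": "temperature"},
        {"aid": "dl1[1]", "name": "pressure"},
        {"aid": "std1[0]", "name": "std1_temperature", "cid": "std1",
         "context": {"std": 6.6833125519211312e-01, "mean": 2.2500000000000000e+01}},
        {"aid": "std1[1]", "name": "std1_pressure", "cid": "std1",
         "context": {"std": 4.3301270189221930e-01, "mean": 1.0017500000000000e+03}}
      ],
      "links": [
        {"source": "dl1[0]", "target": "std1[0]"},
        {"source": "dl1[1]", "target": "std1[1]"}
      ]
    }
    """)


    def describe_outputs(metadata, component_id):
        """Map each output attribute name to its source attribute and context."""
        names = {node["aid"]: node["name"] for node in metadata["nodes"]}
        sources = {link["target"]: link["source"] for link in metadata["links"]}
        described = {}
        for node in metadata["nodes"]:
            # Keep only the attributes generated by the given component.
            if node.get("cid") != component_id:
                continue
            described[node["name"]] = {
                "derived_from": names[sources[node["aid"]]],
                "mean": node["context"]["mean"],
                "std": node["context"]["std"],
            }
        return described


    outputs = describe_outputs(metadata, "std1")
    print(outputs["std1_pressure"]["derived_from"])

Here each link is followed from the output attribute back to its source, so ``std1_pressure`` is reported as derived from ``pressure``.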

.. seealso::
    
    Attribute metadata file format in :ref:`Attribute Metadata File Specification <attribute-metadata>`

|

Model
=====
The model of this component can be described by its fd_params.

.. list-table::
  :header-rows: 1
  :widths: 3,1,3

  * - fd_params
    - Type
    - Description
  * - source_attr_names
    - list of string
    - List of the attribute names from which the output attribute is derived.
  * - params
    - dict
    - The keys of this dictionary are the same as the context of this component's Attribute Metadata.

When loaded in the SAMPO API, the model is represented as a dict of its fd_params.

.. seealso::

    Obtaining process results via `ProcessResultLoader <../../api/process_result_loader.html>`_.

Example of such a dict::

    {'fd_params':
        [{'source_attr_names': ['temperature'], 'params': {'std': 6.6833125519211312e-01, 'mean': 2.2500000000000000e+01}},
         {'source_attr_names': ['pressure'], 'params': {'std': 4.3301270189221930e-01, 'mean': 1.0017500000000000e+03}}]}
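
The following Python sketch shows one way such a dict could be applied to a single record, using the transformation described in the Details section. It is a sketch under stated assumptions rather than the component's implementation: the helper ``apply_model`` is hypothetical, each entry is assumed to have exactly one source attribute (as in the example), and missing values are assumed to be represented as NaN.

.. code-block:: python

    import math

    # Model dict copied from the example above.
    model = {
        "fd_params": [
            {"source_attr_names": ["temperature"],
             "params": {"std": 6.6833125519211312e-01, "mean": 2.2500000000000000e+01}},
            {"source_attr_names": ["pressure"],
             "params": {"std": 4.3301270189221930e-01, "mean": 1.0017500000000000e+03}},
        ]
    }


    def apply_model(model, record, component_id):
        """Return the standardized attributes for one record (name -> value)."""
        output = {}
        for entry in model["fd_params"]:
            # Assumption: exactly one source attribute per entry.
            (source,) = entry["source_attr_names"]
            mean = entry["params"]["mean"]
            std = entry["params"]["std"]
            z = record[source]
            if not math.isfinite(z):
                value = z  # missing and +/- Inf values are left unchanged
            elif std == 0:
                value = 0.0  # std = 0 gives zeros
            else:
                value = (z - mean) / std
            # Output attributes are named <component_id>_<original_attribute_name>.
            output[f"{component_id}_{source}"] = value
        return output


    row = apply_model(model, {"temperature": 22.3, "pressure": 1001.0}, "std1")
    print(row)

For the first sample of the Overview example, this gives values that agree with the ``std1_temperature`` and ``std1_pressure`` columns of the output table.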

Details
=======
* In the learning phase, this component calculates the mean and standard deviation with the following rules:

  * The attribute scale must be INTEGER or REAL.
  * Missing values and +/- Inf values are skipped.

* The standard deviation is not the unbiased estimate but the maximum likelihood estimate under a normal distribution (it divides by the number of values rather than by that number minus one), as shown below:

  :math:`\sqrt{\mbox{mean}((x - \bar{x})^2)}`

  where :math:`x` is the learning data and :math:`\bar{x}` is the mean of :math:`x`.

* In the running phase, the component returns the transformed data, using the mean ``mean`` and standard deviation ``std``, as shown below:

  :math:`(z - \mathtt{mean}) / \mathtt{std}`

  where :math:`z` is the data. Missing values and +/- Inf values in :math:`z` are left unchanged.

  .. note::
     If ``std`` is 0, the standardized data is a zero vector.
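
The rules above can be sketched in plain Python as follows. This is a minimal illustration, not the component's implementation: the function names ``learn`` and ``transform`` are hypothetical, missing values are assumed to be represented as NaN, and two behaviors that this specification does not state are chosen arbitrarily (an error is raised when there are no finite learning values, and non-finite values stay unchanged when ``std`` is 0).

.. code-block:: python

    import math


    def learn(values):
        """Return (mean, std) of the finite values; NaN and +/- Inf are skipped."""
        finite = [v for v in values if math.isfinite(v)]
        if not finite:
            # Assumption: behavior without any finite value is not specified.
            raise ValueError("no finite values to learn from")
        mean = sum(finite) / len(finite)
        # Maximum likelihood estimate: divide by N, not by N - 1.
        std = math.sqrt(sum((v - mean) ** 2 for v in finite) / len(finite))
        return mean, std


    def transform(values, mean, std):
        """Standardize finite values; NaN and +/- Inf pass through unchanged."""
        result = []
        for z in values:
            if not math.isfinite(z):
                result.append(z)
            elif std == 0:
                result.append(0.0)  # std = 0 gives zeros
            else:
                result.append((z - mean) / std)
        return result


    # The temperature column from the Overview example.
    temperature = [22.3, 21.8, math.inf, 23.4, -math.inf]
    mean, std = learn(temperature)
    standardized = transform(temperature, mean, std)
    print(mean, std, standardized)

Applied to the ``temperature`` column, the learned values agree with the ``mean`` and ``std`` context in the Attribute Metadata example, and the transformed values agree with the ``std1_temperature`` column of the output table.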
