===============================================
FABHMELogitGateLinearRg Component Specification
===============================================

.. contents:: Contents
    :local:

Overview
========
The FABHMELogitGateLinearRg component is a linear regression component that uses the FAB/HME algorithm.
It learns a tree-structured model in which each sample is assigned to a component by logistic gating functions.

.. note::

    The FAB engine uses the word 'component' with a different meaning from that of SAMPO.
    Each component in FAB/HME is a prediction formula, and each data sample is assigned to a specific component for prediction.

**Example**:

* SPD:

  .. code-block:: yaml

    # fabhmerg.spd
    dl1 -> std1 -> fab1

    ---

    components:
        dl1:
            component: DataLoader
        std1:
            component: StandardizeFDComponent
            features: scale == 'real' or scale == 'integer'
        fab1:
            component: FABHMELogitGateLinearRgComponent
            features: name != 'Concrete_compressive_strength_MPa'
            standardize_target: True
            target: name == 'Concrete_compressive_strength_MPa'
            tree_depth: 5
            shrink_threshold: 1.0%

    global_settings:
        keep_attributes:
            - Concrete_compressive_strength_MPa
        feature_exclude:
            - Concrete_compressive_strength_MPa

* Input of the component:


 +--------+-------------------------+-------------------------+------------------------+
 |   _sid | \std1_Superplasticizer_ | \std1_Coarse_Aggregate_ | \Concrete_compressive_ |
 |        | kg_in_a_m3_mixture      | kg_in_a_m3_mixture      | strength_MPa           |
 +========+=========================+=========================+========================+
 | 0      | 0.729738484             | 0.705292074             | 9                      |
 +--------+-------------------------+-------------------------+------------------------+
 | 1      | 1.413868312             | 1.434904563             | 9                      |
 +--------+-------------------------+-------------------------+------------------------+
 | 2      | -1.387806224            | -1.321409287            | 4                      |
 +--------+-------------------------+-------------------------+------------------------+
 | ...    | ...                     | ...                     | ...                    |
 +--------+-------------------------+-------------------------+------------------------+
 | 8      | -0.019546567            | 0.016213611             | 8                      |
 +--------+-------------------------+-------------------------+------------------------+
 | 9      | -0.736254006            | -0.835000961            | 6                      |
 +--------+-------------------------+-------------------------+------------------------+

 |

* Output of the component:

 +----------+-----------+---------------+--------------+---------------+------------------+
 |   _sid   | \fab1_    |  \fab1_       |   \fab1_     |   \fab1_      | \fab1_           |
 |          | actual    |  std_actual   |   predict    |   std_predict | assigned_comp_id |
 +==========+===========+===============+==============+===============+==================+
 | 0        | 9         | -4.686873e-01 | 1.193986e+01 | 3.613565e-01  | 2                |
 +----------+-----------+---------------+--------------+---------------+------------------+
 | 1        | 9         | -4.686873e-01 | 1.487464e+01 | 1.189968e+00  | 2                |
 +----------+-----------+---------------+--------------+---------------+------------------+
 | 2        | 4         | -1.880396e+00 | 5.744445e+00 | -1.387866e+00 | 0                |
 +----------+-----------+---------------+--------------+---------------+------------------+
 | ...      | ...       | ...           | ...          | ...           | ...              |
 +----------+-----------+---------------+--------------+---------------+------------------+
 | 8        | 8         | -7.510290e-01 | 9.614527e+00 | -2.951807e-01 | 0                |
 +----------+-----------+---------------+--------------+---------------+------------------+
 | 9        | 6         | -1.315712e+00 | 7.587341e+00 | -8.675398e-01 | 0                |
 +----------+-----------+---------------+--------------+---------------+------------------+

 |

 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
 |   _sid   |   \fab1_      |   \fab1_         |   \fab1_      |   \fab1_         |   \fab1_      |   \fab1_         |
 |          |   predict_c0  |   std_predict_c0 |   predict_c1  |   std_predict_c1 |   predict_c2  |   std_predict_c2 |
 +==========+===============+==================+===============+==================+===============+==================+
 | 0        |  1.173386e+01 |  3.031946e-01    |  1.285257e+01 |  6.190547e-01    |  1.193986e+01 |  3.613565e-01    |
 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
 | 1        |  1.366890e+01 |  8.495373e-01    |  1.503635e+01 |  1.235628e+00    |  1.487464e+01 |  1.189968e+00    |
 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
 | 2        |  5.744445e+00 | -1.387866e+00    |  6.786511e+00 | -1.093648e+00    |  3.787681e+00 | -1.940342e+00    |
 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
 | ...      | ...           | ...              | ...           | ...              | ...           | ...              |
 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
 | 8        |  9.614527e+00 | -2.951807e-01    |  1.079011e+01 |  3.673590e-02    |  9.168116e+00 | -4.212211e-01    |
 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
 | 9        |  7.587341e+00 | -8.675398e-01    |  8.242365e+00 | -6.825991e-01    |  5.744203e+00 | -1.387935e+00    |
 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
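
The two output tables are related: for every sample listed above, ``fab1_predict`` equals the
``fab1_predict_c<k>`` column selected by ``fab1_assigned_comp_id``. The following minimal sketch checks
this on the listed rows. It uses plain Python dictionaries instead of the SAMPO API, and the helper
name ``select_assigned_prediction`` is hypothetical.

.. code-block:: python

    # Rows copied from the output tables above (_sid -> values).
    rows = {
        0: {"predict": 1.193986e+01, "assigned_comp_id": 2,
            "per_comp": [1.173386e+01, 1.285257e+01, 1.193986e+01]},
        1: {"predict": 1.487464e+01, "assigned_comp_id": 2,
            "per_comp": [1.366890e+01, 1.503635e+01, 1.487464e+01]},
        2: {"predict": 5.744445e+00, "assigned_comp_id": 0,
            "per_comp": [5.744445e+00, 6.786511e+00, 3.787681e+00]},
        8: {"predict": 9.614527e+00, "assigned_comp_id": 0,
            "per_comp": [9.614527e+00, 1.079011e+01, 9.168116e+00]},
        9: {"predict": 7.587341e+00, "assigned_comp_id": 0,
            "per_comp": [7.587341e+00, 8.242365e+00, 5.744203e+00]},
    }


    def select_assigned_prediction(row):
        """Hypothetical helper: return the prediction of the assigned component."""
        return row["per_comp"][row["assigned_comp_id"]]


    # Collect the _sid values for which the two tables disagree.
    mismatches = [sid for sid, row in rows.items()
                  if abs(select_assigned_prediction(row) - row["predict"]) > 1e-9]
    print(mismatches)  # [] for the rows listed above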

This component has component-specific external formats for the model and for the prediction result evaluation.

.. seealso::

    Component-common external format files in :ref:`convert_process`

|

Parameters
==========
This component has the following component-specific parameters.

SPD
---

The following parameters are for the "components" section of the SPD.

.. list-table::
  :header-rows: 1
  :widths: 10, 5, 15, 10, 50

  * - Parameter Name
    - Type
    - Domain
    - Default Value
    - Description
  * - standardize_target
    - bool
    - True / False
    - False
    - If this parameter is True, the target attribute is standardized.
  * - max_fab_iterations
    - int
    - [1, inf)
    - 100
    - Maximum number of FAB-iterations.
  * - start_from_mstep [2]_ [3]_
    - bool
    - True / False
    - False
    - If True, the first iteration starts with the M-step; otherwise, it starts with the E-step.
  * - num_acceleration_steps
    - int
    - [0, inf)
    - 0
    - Number of acceleration-algorithm steps in each FAB-iteration.
      If 0, the acceleration algorithm is disabled.
  * - repeat_until_convergence
    - bool
    - True / False
    - False
    - If False, the FAB-iterations and the post-processing are executed only
      once, even if the FAB-iterations stopped because ``max_fab_iterations``
      was reached rather than because the convergence condition was met.
  * - projection_estep
    - bool
    - True / False
    - False
    - Whether the projection E-step algorithm is enabled.
  * - shrink_threshold
    - float or str
    - [1, inf) or (0%, 100%)
    - 1.0
    - Threshold value for shrinkage. If a percentage value (e.g. ``'1.0%'``)
      is specified, the threshold is the relative value
      :math:`N_{\rm scaled\_sample} \times t_{\rm shrink}`, where
      :math:`t_{\rm shrink}` is the threshold value and :math:`N_{\rm scaled\_sample}`
      is the number of scaled expected samples.
  * - fab_stop_threshold
    - float or str
    - (0, inf) or (0%, inf%)
    - 0.001
    - Threshold value for FAB-iterations: if the increase in the FIC value
      is less than the threshold, the FAB-iterations are considered to
      have converged. If a percentage value (e.g. ``'1.0%'``) is specified,
      the convergence check uses the relative increase,
      :math:`(FIC^{(t)} - FIC^{(t-1)}) / | FIC^{(t-1)} |`.
  * - gate_features
    - str
    - Query format
    - all()
    - Features which are applied to gate parameter optimizations.
      If not specified, all features are used.
  * - comp_features
    - str
    - Query format
    - all()
    - Features which are applied to component parameter optimizations.
      If not specified, all features are used.
      If empty, the model is learned as a decision tree.
  * - comp_mandatory_features
    - str
    - Query format
    - See Description
    - Features that are exempt from L0-regularization.
      The specified features are always relevant in all components.
      If not specified, no feature is exempt from L0-regularization,
      so all relevant features are selected by the FoBa algorithm.
  * - comp_positive_features
    - str
    - Query format
    - See Description
    - Features whose weight values for all components are constrained to positive values.
      If not specified, all features are optimized with no constraints.
  * - comp_negative_features
    - str
    - Query format
    - See Description
    - Features whose weight values for all components are constrained to negative values.
      If not specified, all features are optimized with no constraints.
  * - tree_depth [2]_ [3]_
    - int
    - [0, inf)
    - 5
    - Initial depth of the gate-tree structure of the latent variable prior.
      The initial number of components is :math:`2^d`, where :math:`d` is
      the tree depth. If 0, the optimization is executed with only one
      component.
  * - comp_weights_min_scale [2]_ [3]_
    - float
    - (-inf, inf)
    - -0.5
    - Scale value for the initialization of weight values of components.
  * - comp_weights_max_scale [2]_ [3]_
    - float
    - (-inf, inf)
    - 0.5
    - Scale value for the initialization of weight values of components.
  * - comp_bias_min_scale [2]_ [3]_
    - float
    - (-inf, inf)
    - 0.25
    - Scale value for the initialization of bias values of components.
  * - comp_bias_max_scale [2]_ [3]_
    - float
    - (-inf, inf)
    - 0.75
    - Scale value for the initialization of bias values of components.
  * - comp_variance_min_scale [2]_ [3]_
    - float
    - (0, inf)
    - 0.1
    - Scale value for the initialization of variance values of components.
  * - comp_variance_max_scale [2]_ [3]_
    - float
    - (0, inf)
    - 0.25
    - Scale value for the initialization of variance values of components.
  * - gate_l2_regularize
    - float
    - [0, inf)
    - 0.0
    - L2-regularization hyper-parameter for gate-parameter optimization.
      The larger the specified value, the stronger the regularization effect is.
      If 0.0, L2-regularization is disabled.
  * - with_gate_scaled_l0_regularize
    - bool
    - True / False
    - True
    - Whether to use scaled L0-regularization, which is based on a tighter
      lower bound of FIC, for gate parameter optimization; the approximation
      of det(F) is refined, where F is a Fisher matrix.
  * - max_gate_relevant_features
    - int
    - [1, inf)
    - 3
    - Maximum number of the relevant features for each gate.
  * - comp_foba_skip
    - str
    - {'power_of_two', 'quarter_square', 'none'}
    - 'power_of_two'
    - Rule that decides at which FAB-iteration steps the FoBa algorithm is
      skipped. If 'none', FoBa is executed at every FAB-iteration step. FoBa is skipped when
      :math:`{\rm log}_{2}t \ne {\rm ceil}({\rm log}_{2}t)` if 'power_of_two', or when
      :math:`t \bmod {\rm ceil}(\sqrt{t}) \ne 0` if 'quarter_square', where
      :math:`t` is the FAB-iteration step index (:math:`t` starts from 1).
  * - comp_foba_skip_max_interval
    - int
    - [2, inf)
    - 25
    - The maximum interval for the FoBa algorithm skipping. If comp_foba_skip is 'none',
      this value is ignored.
  * - comp_two_stage_opt
    - bool
    - True / False
    - False
    - Whether the two-stage optimization is enabled.
      If True, the first stage optimizes the parameters of the
      user-specified mandatory features (``comp_mandatory_features``), and
      the second stage optimizes the parameters of only the relevant
      non-mandatory features against the residual of the first stage.
  * - comp_backward_step
    - bool
    - True / False
    - False
    - Whether the backward-steps of the FoBa algorithm are enabled. In the
      post-processing, backward-steps are carried out regardless of this
      parameter value.
  * - comp_l2_regularize
    - float
    - [0, inf)
    - 0.0
    - L2-regularization hyper-parameter for component parameter optimization.
      The larger the specified value, the stronger the regularization effect
      is. If 0.0, L2-regularization is disabled.
  * - with_comp_scaled_l0_regularize
    - bool
    - True / False
    - True
    - Whether to use scaled L0-regularization, which is based on a tighter
      lower bound of FIC, for component parameter optimization; the
      approximation of det(F) is refined, where F is a Fisher matrix.
  * - max_comp_relevant_features
    - int
    - [1, inf)
    - 100
    - Maximum number of the relevant features for each component.
  * - max_comp_foba_iterations
    - int
    - [1, inf)
    - 100
    - Maximum number of the FoBa-iterations for each component.
  * - num_threads_gates
    - int
    - [1, inf)
    - 1
    - Maximum number of OpenMP threads for gate parameter optimization;
      the tasks for all gates are divided among these threads.
  * - num_threads_comps
    - int
    - [1, inf)
    - 1
    - Maximum number of OpenMP threads for component parameter optimization.
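
``shrink_threshold`` and ``fab_stop_threshold`` accept either a number or a percentage string. The
following sketch shows one way to read the two forms according to the descriptions in the table. The
function names are hypothetical, and the interpretation of ``'1.0%'`` as the fraction 0.01 is an
assumption; this is not the FAB engine's implementation.

.. code-block:: python

    def parse_threshold(value):
        """Return (number, is_relative) for a float or a string such as '1.0%'."""
        if isinstance(value, str):
            text = value.strip()
            if text.endswith("%"):
                # ASSUMPTION: a percentage is interpreted as a fraction (1.0% -> 0.01).
                return float(text[:-1]) / 100.0, True
            return float(text), False
        return float(value), False


    def effective_shrink_threshold(shrink_threshold, n_scaled_samples):
        """Absolute threshold, or N_scaled_sample * t_shrink for a percentage."""
        t_shrink, is_relative = parse_threshold(shrink_threshold)
        return n_scaled_samples * t_shrink if is_relative else t_shrink


    def fab_converged(fic_prev, fic_curr, fab_stop_threshold):
        """True if the absolute or relative FIC increase is below the threshold."""
        threshold, is_relative = parse_threshold(fab_stop_threshold)
        increase = fic_curr - fic_prev
        if is_relative:
            # Relative increase: (FIC(t) - FIC(t-1)) / |FIC(t-1)|; fic_prev must be non-zero.
            increase /= abs(fic_prev)
        return increase < threshold


    print(effective_shrink_threshold("1.0%", 1000))  # relative: 1000 * 0.01
    print(effective_shrink_threshold(5, 1000))       # absolute: 5.0
    print(fab_converged(-130.0, -129.95, 0.001))     # absolute increase 0.05 -> False
    print(fab_converged(-130.0, -129.95, "0.1%"))    # relative increase ~0.00038 -> True

With the same pair of FIC values, the absolute check does not report convergence while the relative
check does, because the relative form divides the increase by the magnitude of the previous FIC value.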

.. [2] This parameter is ignored in posterior hot-start.
.. [3] This parameter is ignored in model hot-start.
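
The skipping rules of ``comp_foba_skip`` can be illustrated with a short sketch that lists the
FAB-iteration steps at which FoBa is executed. The function name ``foba_is_executed`` is hypothetical,
and the effect of ``comp_foba_skip_max_interval`` is not modeled because its exact interaction with
the rules is not described here.

.. code-block:: python

    import math


    def foba_is_executed(t, comp_foba_skip="power_of_two"):
        """Return True if FoBa runs at FAB-iteration step t (t starts from 1)."""
        if t < 1:
            raise ValueError("t starts from 1")
        if comp_foba_skip == "none":
            return True
        if comp_foba_skip == "power_of_two":
            # log2(t) == ceil(log2(t)) holds exactly when t is a power of two.
            return (t & (t - 1)) == 0
        if comp_foba_skip == "quarter_square":
            # Integer form of ceil(sqrt(t)) for t >= 1, avoiding float rounding.
            ceil_sqrt = math.isqrt(t - 1) + 1
            return t % ceil_sqrt == 0
        raise ValueError(f"unknown comp_foba_skip: {comp_foba_skip!r}")


    steps = range(1, 21)
    print([t for t in steps if foba_is_executed(t, "power_of_two")])
    # [1, 2, 4, 8, 16]
    print([t for t in steps if foba_is_executed(t, "quarter_square")])
    # [1, 2, 4, 6, 9, 12, 16, 20]

In both rules the interval between FoBa executions grows with the step index, more slowly for
'quarter_square' than for 'power_of_two'.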

SRC
---

The following parameter is for the "hotstart" section of the SRC.

.. list-table::
  :header-rows: 1
  :widths: 10, 5, 15, 10, 50

  * - Parameter Name
    - Type
    - Domain
    - Default Value
    - Description
  * - type
    - str
    - {'posterior', 'mh_refit_comp', 'mh_opt_comp', 'mh_refit_gate_and_refit_comp', 'mh_refit_gate_and_opt_comp', 'mh_opt_gate_and_opt_comp'}
    - 
    - The hot-start type. If 'posterior', FAB learns with posterior hot-start, which uses an
      initial model whose tree structure is generated from the base model and the data; the gate and
      component parameters are initialized randomly. 'mh_XXX' means that FAB learns with model
      hot-start, which uses the base model as the initial model. 'refit_{gate, comp}' means refitting the
      gate functions or prediction formulas with the current data. 'opt_{gate, comp}' means optimizing
      (feature selection and fitting) the gate functions or prediction formulas with the current data.
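
The naming scheme of the hot-start types can be read mechanically. The following sketch splits a type
name into its hot-start mode and the action applied to the gates and to the components. The function
name ``describe_hotstart`` and the returned dictionary layout are hypothetical; ``None`` only means that
the type name does not mention that part.

.. code-block:: python

    HOTSTART_TYPES = (
        "posterior",
        "mh_refit_comp",
        "mh_opt_comp",
        "mh_refit_gate_and_refit_comp",
        "mh_refit_gate_and_opt_comp",
        "mh_opt_gate_and_opt_comp",
    )


    def describe_hotstart(hotstart_type):
        """Split a hot-start type into its mode and per-part actions."""
        if hotstart_type not in HOTSTART_TYPES:
            raise ValueError(f"unsupported hot-start type: {hotstart_type!r}")
        if hotstart_type == "posterior":
            return {"mode": "posterior", "gate": None, "comp": None}
        actions = {"gate": None, "comp": None}
        # Strip the 'mh_' prefix, then read each '<action>_<part>' token.
        for token in hotstart_type[len("mh_"):].split("_and_"):
            action, part = token.split("_")
            actions[part] = action
        return {"mode": "model", **actions}


    print(describe_hotstart("mh_refit_gate_and_opt_comp"))
    # {'mode': 'model', 'gate': 'refit', 'comp': 'opt'}
    print(describe_hotstart("mh_opt_comp"))
    # {'mode': 'model', 'gate': None, 'comp': 'opt'}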


|

Utilizable Sample Metadata
==========================
.. warning::

   _fabhme_assigned_comp_id is deprecated. Use the hotstart section of the SRC instead of the _fabhme_assigned_comp_id data column.

This component can use the _fabhme_assigned_comp_id attribute of the sample metadata to hot-start with a posterior.
When the _fabhme_assigned_comp_id attribute is specified in the input data,
this component starts the FAB/HME algorithm with the _fabhme_assigned_comp_id attribute values as its initial posterior.

To create the _fabhme_assigned_comp_id attribute, see the specification of the ``sampo_ps_fabhme export_assigned_comp_id`` command.

|

Output Attributes
=================

.. include:: ./fabhme/linear_rg_output_attributes.rst

These attributes are in the component output data and can be loaded via the SAMPO API.

.. seealso::

    Obtaining process results via `ProcessResultLoader <../../api/process_result_loader.html>`_.

When :ref:`convert_process` is executed,
the component output data will be saved in *<component_id>*\_predict_result.csv.

.. include:: ./fabhme/rg_predict_result.rst

|

Attribute Metadata
==================

.. include:: ./fabhme/linear_rg_attr_metadata.rst

|

Model
=====

.. include:: ./fabhme/linear_rg_model_params.rst
.. include:: ./fabhme/logit_gate_tree_keys.rst

When the model is loaded in the SAMPO API, the model parameters will be output as a single dictionary.

.. seealso::

    Obtaining process results via `ProcessResultLoader <../../api/process_result_loader.html>`_

::

    {'fic': -51.39944300304459,
     'num_active_comps': 3,
     'num_initial_comps': 4,
     'standardize_mean': 1.1303215277777777e+04,
     'standardize_std': 5.7343353765366674e+03,
     'gate_tree':
         {'gate_type': 'logit',
          'hard_gate': True,
          'nodes': [
              {'node_type': 'gate',
               'node_id': 1,
               'gate_func':
                   {'bias': 5.111062768521956,
                    'weights': [
                        {'aid': 'std1[4]', 'attr_name': 'std1_Superplasticizer_kg_in_a_m3_mixture', 'weight': -4.383147346299667},
                        {'aid': 'std1[6]', 'attr_name': 'std1_Fine_Aggregate_kg_in_a_m3_mixture', 'weight': 10.213844035316507}]}},
              {'node_type': 'gate',
               'node_id': 0,
               'gate_func':
                   {'bias': -2.5521447932552697,
                    'weights': [
                        {'aid': 'std1[0]', 'attr_name': 'std1_Cement_kg_in_a_m3_mixture', 'weight': -11.13428640672036},
                        {'aid': 'std1[1]', 'attr_name': 'std1_Blast_Furnace_Slag_kg_in_a_m3_mixture', 'weight': -8.404401460418903}]}},
              {'comp_id': 1, 'node_type': 'component', 'node_id': 3},
              {'comp_id': 0, 'node_type': 'component', 'node_id': 2},
              {'comp_id': 3, 'node_type': 'component', 'node_id': 4}],
          'edges': [
              {'source': 1, 'target': 3, 'is_left': False},
              {'source': 1, 'target': 2, 'is_left': True},
              {'source': 0, 'target': 1, 'is_left': True},
              {'source': 0, 'target': 4, 'is_left': False}]},
     'prediction_formulas':
                                                      prediction_formula_0  prediction_formula_1  prediction_formula_3
         attr_name
         std1_Cement_kg_in_a_m3_mixture                           0.000000              0.000000              0.000000
         std1_Blast_Furnace_Slag_kg_in_a_m3_mixture              -0.255175              0.000000             -0.373861
         std1_Fly_Ash_kg_in_a_m3_mixture                          0.000000              0.000000              0.000000
         std1_Water_kg_in_a_m3_mixture                            0.000000              0.000000              0.000000
         std1_Superplasticizer_kg_in_a_m3_mixture                 0.000000              0.000000              0.000000
         std1_Coarse_Aggregate_kg_in_a_m3_mixture                 0.000000              0.000000              0.000000
         std1_Fine_Aggregate_kg_in_a_m3_mixture                   0.236025              0.000000              0.000000
         bias                                                     0.594811             -1.915292             -0.149098
         variance                                                 0.011392              0.614541              0.492770}
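
The ``gate_tree`` entry can be traversed to see which component a sample reaches. The following sketch
copies the nodes and edges from the example above (the ``aid`` keys are omitted) and walks from the root
gate to a component node. The routing rule is an assumption made for illustration: a sample is sent to
the ``is_left`` child when the logistic gate output is at least 0.5, that is, when the linear score is
non-negative. The actual convention of the FAB engine may differ, and ``assign_component`` is a
hypothetical helper.

.. code-block:: python

    gate_tree = {
        "nodes": [
            {"node_type": "gate", "node_id": 1,
             "gate_func": {"bias": 5.111062768521956, "weights": [
                 {"attr_name": "std1_Superplasticizer_kg_in_a_m3_mixture",
                  "weight": -4.383147346299667},
                 {"attr_name": "std1_Fine_Aggregate_kg_in_a_m3_mixture",
                  "weight": 10.213844035316507}]}},
            {"node_type": "gate", "node_id": 0,
             "gate_func": {"bias": -2.5521447932552697, "weights": [
                 {"attr_name": "std1_Cement_kg_in_a_m3_mixture",
                  "weight": -11.13428640672036},
                 {"attr_name": "std1_Blast_Furnace_Slag_kg_in_a_m3_mixture",
                  "weight": -8.404401460418903}]}},
            {"comp_id": 1, "node_type": "component", "node_id": 3},
            {"comp_id": 0, "node_type": "component", "node_id": 2},
            {"comp_id": 3, "node_type": "component", "node_id": 4}],
        "edges": [
            {"source": 1, "target": 3, "is_left": False},
            {"source": 1, "target": 2, "is_left": True},
            {"source": 0, "target": 1, "is_left": True},
            {"source": 0, "target": 4, "is_left": False}],
    }


    def assign_component(tree, sample):
        """Walk the gate tree from the root and return the reached comp_id."""
        nodes = {n["node_id"]: n for n in tree["nodes"]}
        children = {(e["source"], e["is_left"]): e["target"] for e in tree["edges"]}
        targets = {e["target"] for e in tree["edges"]}
        # The root is the only node that is not the target of any edge.
        node = next(n for n in tree["nodes"] if n["node_id"] not in targets)
        while node["node_type"] == "gate":
            func = node["gate_func"]
            score = func["bias"] + sum(
                w["weight"] * sample.get(w["attr_name"], 0.0) for w in func["weights"])
            # ASSUMPTION: sigmoid(score) >= 0.5 (score >= 0) routes to the left child.
            go_left = score >= 0.0
            node = nodes[children[(node["node_id"], go_left)]]
        return node["comp_id"]


    # Attributes missing from a sample are treated as 0.0 (the standardized mean).
    print(assign_component(gate_tree, {}))  # 3
    print(assign_component(gate_tree, {"std1_Cement_kg_in_a_m3_mixture": -1.0}))  # 0

Under this assumption, the sample of all zeros fails the root gate and reaches component 3, while a
sample with a low standardized cement value passes both gates and reaches component 0.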


External Format
---------------
When :ref:`convert_process` is executed,
the model parameters are saved into different files and are grouped as: general information,
gating function, and prediction formula.

General Information
```````````````````
This file records the :math:`FIC` value after learning the model, the initial number of components, and the final number of active components.

::

    fic,num_initial_comps,num_active_comps
    -1.294308e+02,8,3
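
As an illustration, the two-line content above can be read with the Python standard library. The
sketch parses the text from an in-memory string because the file name depends on the conversion
output; the variable names are hypothetical.

.. code-block:: python

    import csv
    import io

    # Content copied from the example above.
    general_info_csv = "fic,num_initial_comps,num_active_comps\n-1.294308e+02,8,3\n"

    # The file holds a header line and a single data row.
    row = next(csv.DictReader(io.StringIO(general_info_csv)))
    info = {
        "fic": float(row["fic"]),
        "num_initial_comps": int(row["num_initial_comps"]),
        "num_active_comps": int(row["num_active_comps"]),
    }
    print(info)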

Gate Tree
`````````

.. include:: ./fabhme/model_logit_gate_tree.rst

Prediction Formulas
```````````````````

.. include:: ./fabhme/model_linear_rg_prediction_formulas.rst

|

Prediction Result Evaluation
============================

.. include:: ./fabhme/rg_predict_result_evaluation_indices.rst

When these evaluation results are obtained via the SAMPO API, they are loaded as a pandas.DataFrame
whose columns are the evaluation indices.

.. seealso::

    Obtaining process results via `ProcessResultLoader <../../api/process_result_loader.html>`_

External Format
---------------
When :ref:`convert_process` is executed, the evaluation results
are saved as a CSV file with the evaluation indices as the header of the CSV.

.. include:: ./fabhme/rg_predict_result_evaluation.rst

|

Details
=======
If a data set has samples with missing or +/-Inf values, this component ignores those samples.
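
The following sketch shows which samples such a rule would keep. It assumes that a missing value is
represented as ``None`` or NaN and that every value of a sample is checked; the component's actual
check is not specified here, and ``usable_samples`` is a hypothetical helper.

.. code-block:: python

    import math


    def usable_samples(samples):
        """Keep only the samples whose values are all finite numbers."""
        def is_finite(value):
            return value is not None and math.isfinite(value)
        return [s for s in samples if all(is_finite(v) for v in s.values())]


    samples = [
        {"x1": 0.73, "x2": 0.71, "y": 9.0},
        {"x1": float("nan"), "x2": 1.43, "y": 9.0},    # missing value (NaN)
        {"x1": -1.39, "x2": float("-inf"), "y": 4.0},  # -Inf value
        {"x1": -0.02, "x2": None, "y": 8.0},           # missing value (None)
    ]
    print(len(usable_samples(samples)))  # 1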
