==============================================
FABHMEBernGateLinearRg Component Specification
==============================================

.. contents:: Contents
    :local:

Overview
========
The FABHMEBernGateLinearRg component is a linear regression component based on the FAB/HME algorithm.
It learns a tree-structured model in which each sample is assigned to a component according to Bernoulli gating functions.

.. note::

    The FAB engine uses the word 'component' with a different meaning from that of SAMPO.
    In FAB/HME, each component is a prediction formula, and each sample is assigned to a specific component for prediction.

**Example**:

* SPD:

  .. code-block:: yaml

    # fabhmerg.spd
    dl1 -> std1 -> fab1

    ---

    components:
        dl1:
            component: DataLoader
        std1:
            component: StandardizeFDComponent
            features: scale == 'real' or scale == 'integer'
        fab1:
            component: FABHMEBernGateLinearRgComponent
            features: name != 'Concrete_compressive_strength_MPa'
            standardize_target: True
            target: name == 'Concrete_compressive_strength_MPa'
            tree_depth: 5
            shrink_threshold: 1.0%

    global_settings:
        keep_attributes:
            - Concrete_compressive_strength_MPa
        feature_exclude:
            - Concrete_compressive_strength_MPa

* Input of the component:


 +--------+-------------------------+-------------------------+------------------------+
 |   _sid | \std1_Superplasticizer_ | \std1_Coarse_Aggregate_ | \Concrete_compressive_ |
 |        | kg_in_a_m3_mixture      | kg_in_a_m3_mixture      | strength_MPa           |
 +========+=========================+=========================+========================+
 | 0      | 0.729738484             | 0.705292074             | 9                      |
 +--------+-------------------------+-------------------------+------------------------+
 | 1      | 1.413868312             | 1.434904563             | 9                      |
 +--------+-------------------------+-------------------------+------------------------+
 | 2      | -1.387806224            | -1.321409287            | 4                      |
 +--------+-------------------------+-------------------------+------------------------+
 | ...    | ...                     | ...                     | ...                    |
 +--------+-------------------------+-------------------------+------------------------+
 | 8      | -0.019546567            | 0.016213611             | 8                      |
 +--------+-------------------------+-------------------------+------------------------+
 | 9      | -0.736254006            | -0.835000961            | 6                      |
 +--------+-------------------------+-------------------------+------------------------+

 |

* Output of the component:

 +----------+-----------+---------------+--------------+---------------+------------------+
 |   _sid   | \fab1_    |  \fab1_       |   \fab1_     |   \fab1_      | \fab1_           |
 |          | actual    |  std_actual   |   predict    |   std_predict | assigned_comp_id |
 +==========+===========+===============+==============+===============+==================+
 | 0        | 9         | -4.686873e-01 | 1.193986e+01 | 3.613565e-01  | 2                |
 +----------+-----------+---------------+--------------+---------------+------------------+
 | 1        | 9         | -4.686873e-01 | 1.487464e+01 | 1.189968e+00  | 2                |
 +----------+-----------+---------------+--------------+---------------+------------------+
 | 2        | 4         | -1.880396e+00 | 5.744445e+00 | -1.387866e+00 | 0                |
 +----------+-----------+---------------+--------------+---------------+------------------+
 | ...      | ...       | ...           | ...          | ...           | ...              |
 +----------+-----------+---------------+--------------+---------------+------------------+
 | 8        | 8         | -7.510290e-01 | 9.614527e+00 | -2.951807e-01 | 0                |
 +----------+-----------+---------------+--------------+---------------+------------------+
 | 9        | 6         | -1.315712e+00 | 7.587341e+00 | -8.675398e-01 | 0                |
 +----------+-----------+---------------+--------------+---------------+------------------+

 |

 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
 |   _sid   |   \fab1_      |   \fab1_         |   \fab1_      |   \fab1_         |   \fab1_      |   \fab1_         |
 |          |   predict_c0  |   std_predict_c0 |   predict_c1  |   std_predict_c1 |   predict_c2  |   std_predict_c2 |
 +==========+===============+==================+===============+==================+===============+==================+
 | 0        |  1.173386e+01 |  3.031946e-01    |  1.285257e+01 |  6.190547e-01    |  1.193986e+01 |  3.613565e-01    |
 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
 | 1        |  1.366890e+01 |  8.495373e-01    |  1.503635e+01 |  1.235628e+00    |  1.487464e+01 |  1.189968e+00    |
 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
 | 2        |  5.744445e+00 | -1.387866e+00    |  6.786511e+00 | -1.093648e+00    |  3.787681e+00 | -1.940342e+00    |
 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
 | ...      | ...           | ...              | ...           | ...              | ...           | ...              |
 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
 | 8        |  9.614527e+00 | -2.951807e-01    |  1.079011e+01 |  3.673590e-02    |  9.168116e+00 | -4.212211e-01    |
 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
 | 9        |  7.587341e+00 | -8.675398e-01    |  8.242365e+00 | -6.825991e-01    |  5.744203e+00 | -1.387935e+00    |
 +----------+---------------+------------------+---------------+------------------+---------------+------------------+

This component has component-specific external formats for model and prediction result evaluation.

.. seealso::

    Component-common external format files in :ref:`convert_process`

|

Parameters
==========
This component has the following component-specific parameters.

SPD
---

The following parameters are for the "components" section of the SPD.

.. list-table::
  :header-rows: 1
  :widths: 10, 5, 15, 10, 50

  * - Parameter Name
    - Type
    - Domain
    - Default Value
    - Description
  * - standardize_target
    - bool
    - True / False
    - False
    - If this parameter is True, the target attribute is standardized.
  * - max_fab_iterations
    - int
    - [1, inf)
    - 100
    - Maximum number of FAB-iterations.
  * - start_from_mstep [2]_ [3]_
    - bool
    - True / False
    - False
    - If True, the first iteration starts with M-step; otherwise, E-step.
  * - num_acceleration_steps
    - int
    - [0, inf)
    - 0
    - The number of steps of acceleration algorithm for each FAB-iteration.
      If 0, the acceleration algorithm is disabled.
  * - repeat_until_convergence
    - bool
    - True / False
    - False
    - If False, the FAB-iterations and the post-processing are executed only once,
      even if the FAB-iterations stopped because ``max_fab_iterations`` was
      reached rather than because the convergence condition was met.
  * - projection_estep
    - bool
    - True / False
    - False
    - Whether the projection E-step algorithm is enabled.
  * - shrink_threshold
    - float or str
    - [1, inf) or (0%, 100%)
    - 1.0
    - Threshold value for shrinkage. If a percentage value (e.g. ``'1.0%'``)
      is specified, shrinkage is executed according to the relative value
      :math:`N_{\rm scaled\_sample} \times t_{\rm shrink}`, where
      :math:`t_{\rm shrink}` is the threshold value and :math:`N_{\rm scaled\_sample}`
      is the number of scaled expected samples.
  * - fab_stop_threshold
    - float or str
    - (0, inf) or (0%, inf%)
    - 0.001
    - Threshold value for the FAB-iterations: if the increase in the FIC value
      is less than the threshold, the FAB-iterations are considered to
      have converged. If a percentage value (e.g. ``'1.0%'``) is specified,
      the convergence check uses the relative value
      :math:`(FIC^{(t)} - FIC^{(t-1)}) / | FIC^{(t-1)} |`.
  * - gate_features
    - str
    - Query format
    - all()
    - Features used for gate parameter optimization.
      If not specified, all features are used.
  * - comp_features
    - str
    - Query format
    - all()
    - Features used for component parameter optimization.
      If not specified, all features are used.
      If empty, the model is learned as a decision tree.
  * - comp_mandatory_features
    - str
    - Query format
    - See Description
    - Features to which the non-L0-regularization constraint is applied,
      which means that the specified features are always relevant in all components.
      If not specified, no feature is exempt from L0-regularization,
      so all relevant features are selected by the FoBa algorithm.
  * - comp_positive_features
    - str
    - Query format
    - See Description
    - Features whose weight values for all components are constrained to positive values.
      If not specified, all features are optimized with no constraints.
  * - comp_negative_features
    - str
    - Query format
    - See Description
    - Features whose weight values for all components are constrained to negative values.
      If not specified, all features are optimized with no constraints.
  * - tree_depth [2]_ [3]_
    - int
    - [0, inf)
    - 5
    - Initial depth of the gate-tree structure of the latent variable prior.
      The initial number of components is :math:`2^d`, where :math:`d` is the
      tree depth. If 0, the optimization is executed with a single
      component.
  * - comp_weights_min_scale [2]_ [3]_
    - float
    - (-inf, inf)
    - -0.5
    - Scale value for the initialization of weight values of components.
  * - comp_weights_max_scale [2]_ [3]_
    - float
    - (-inf, inf)
    - 0.5
    - Scale value for the initialization of weight values of components.
  * - comp_bias_min_scale [2]_ [3]_
    - float
    - (-inf, inf)
    - 0.25
    - Scale value for the initialization of bias values of components.
  * - comp_bias_max_scale [2]_ [3]_
    - float
    - (-inf, inf)
    - 0.75
    - Scale value for the initialization of bias values of components.
  * - comp_variance_min_scale [2]_ [3]_
    - float
    - (0, inf)
    - 0.1
    - Scale value for the initialization of variance values of components.
  * - comp_variance_max_scale [2]_ [3]_
    - float
    - (0, inf)
    - 0.25
    - Scale value for the initialization of variance values of components.
  * - gate_max_bins
    - int
    - [1, inf)
    - See Description
    - Maximum number of bins for each feature, used for gate
      parameter optimization. If not specified, all unique values of each feature
      are used; otherwise, the equal-width binning algorithm is adopted.
  * - comp_foba_skip
    - str
    - {'power_of_two', 'quarter_square', 'none'}
    - 'power_of_two'
    - The type of judging function that decides when the FoBa algorithm is skipped.
      If 'none', FoBa is executed at every FAB-iteration step. If 'power_of_two',
      FoBa is skipped when :math:`{\rm log}_{2}t \ne {\rm ceil}({\rm log}_{2}t)`;
      if 'quarter_square', it is skipped when :math:`t \bmod {\rm ceil}(\sqrt{t}) \ne 0`.
      Here :math:`t` is the FAB-iteration step index, starting from 1.
  * - comp_foba_skip_max_interval
    - int
    - [2, inf)
    - 25
    - The maximum interval for the FoBa algorithm skipping. If comp_foba_skip
      is 'none', this value is ignored.
  * - comp_two_stage_opt
    - bool
    - True / False
    - False
    - Whether the two-stage optimization is enabled.
      If True, the first stage optimizes the parameters of the
      user-specified mandatory features (``comp_mandatory_features``), and
      the second stage optimizes the parameters of only the relevant
      non-mandatory features against the residual of the first stage.
  * - comp_backward_step
    - bool
    - True / False
    - False
    - Whether the backward steps of the FoBa algorithm are enabled. In the
      post-processing, backward steps are carried out regardless of this
      parameter's value.
  * - comp_l2_regularize
    - float
    - [0, inf)
    - 0.0
    - L2-regularization hyper-parameter for component parameter optimization.
      The larger the specified value, the stronger the regularization effect
      is. If 0.0, L2-regularization is disabled.
  * - with_comp_scaled_l0_regularize
    - bool
    - True / False
    - True
    - Whether to use scaled L0-regularization, which applies a tighter lower
      bound of FIC to component parameter optimization by refining the
      approximation of det(F), where F is the Fisher matrix.
  * - max_comp_relevant_features
    - int
    - [1, inf)
    - 100
    - Maximum number of the relevant features for each component.
  * - max_comp_foba_iterations
    - int
    - [1, inf)
    - 100
    - Maximum number of the FoBa-iterations for each component.
  * - num_threads_gates
    - int
    - [1, inf)
    - 1
    - Maximum number of OpenMP threads for gate parameter optimization,
      among which the tasks for all gates are divided.
  * - num_threads_gate_features
    - int
    - [1, inf)
    - 1
    - Maximum number of OpenMP threads for gate parameter optimization,
      among which the tasks for all features are divided.
  * - num_threads_comps
    - int
    - [1, inf)
    - 1
    - Maximum number of OpenMP threads of component parameter optimization.
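
Both ``shrink_threshold`` and ``fab_stop_threshold`` accept either a float or a percentage string. The sketch below illustrates how the two forms could be interpreted, following the formulas in the table. The function names are hypothetical, and reading ``'1.0%'`` as the factor 0.01 is an assumption; this is not the component's actual implementation.

```python
def parse_threshold(value):
    """Return (number, is_relative) for a float or a percentage string."""
    if isinstance(value, str):
        text = value.strip()
        if not text.endswith("%"):
            raise ValueError("string thresholds must end with '%'")
        # Assumption: '1.0%' is read as the factor 0.01.
        return float(text[:-1]) / 100.0, True
    return float(value), False


def shrink_limit(n_scaled_sample, shrink_threshold):
    """Absolute shrinkage threshold implied by shrink_threshold."""
    t_shrink, is_relative = parse_threshold(shrink_threshold)
    # Relative form: N_scaled_sample * t_shrink; absolute form: as given.
    return n_scaled_sample * t_shrink if is_relative else t_shrink


def fab_has_converged(fic_prev, fic_curr, fab_stop_threshold):
    """Convergence test on the FIC increase between two iterations."""
    threshold, is_relative = parse_threshold(fab_stop_threshold)
    increase = fic_curr - fic_prev
    if is_relative:
        # Relative form: (FIC(t) - FIC(t-1)) / |FIC(t-1)|
        increase /= abs(fic_prev)
    return increase < threshold


print(shrink_limit(1000, "1.0%"))                   # relative threshold
print(shrink_limit(1000, 1.0))                      # absolute threshold
print(fab_has_converged(-130.0, -129.0, 0.001))     # absolute FIC increase
print(fab_has_converged(-130.0, -129.95, "0.1%"))   # relative FIC increase
```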

.. [2] This parameter is ignored in posterior hot-start.
.. [3] This parameter is ignored in model hot-start.
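
The skipping conditions of ``comp_foba_skip`` can be made concrete with a short sketch. The function name is hypothetical, and the sketch covers only the two conditions stated in the table; it does not model ``comp_foba_skip_max_interval``, whose exact interaction with the schedule is not specified here.

```python
import math


def foba_is_skipped(t, comp_foba_skip="power_of_two"):
    """Return True if FoBa is skipped at FAB-iteration step t (t >= 1)."""
    if t < 1:
        raise ValueError("t starts from 1")
    if comp_foba_skip == "none":
        return False
    if comp_foba_skip == "power_of_two":
        # log2(t) equals ceil(log2(t)) exactly when t is a power of two.
        return (t & (t - 1)) != 0
    if comp_foba_skip == "quarter_square":
        # Integer ceil(sqrt(t)) without floating-point rounding.
        root = math.isqrt(t)
        ceil_sqrt = root if root * root == t else root + 1
        return t % ceil_sqrt != 0
    raise ValueError(f"unknown comp_foba_skip: {comp_foba_skip!r}")


for mode in ("power_of_two", "quarter_square"):
    executed = [t for t in range(1, 21) if not foba_is_skipped(t, mode)]
    print(mode, executed)
```

Under these conditions, for steps 1 to 20 FoBa runs at steps 1, 2, 4, 8, 16 with 'power_of_two' and at steps 1, 2, 4, 6, 9, 12, 16, 20 with 'quarter_square', so both schedules run FoBa less often as the iterations proceed.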

SRC
---

The following parameter is for the "hotstart" section of the SRC.

.. list-table::
  :header-rows: 1
  :widths: 10, 5, 15, 10, 50

  * - Parameter Name
    - Type
    - Domain
    - Default Value
    - Description
  * - type
    - str
    - {'posterior', 'mh_refit_comp', 'mh_opt_comp', 'mh_refit_gate_and_refit_comp', 'mh_refit_gate_and_opt_comp', 'mh_opt_gate_and_opt_comp'}
    - 
    - The hot-start type. If 'posterior', FAB learns with posterior hot-start, which uses an
      initial model whose tree structure is generated from the base model and the data; the gate and
      component parameters are initialized randomly. 'mh_XXX' means that FAB learns with model
      hot-start, which uses the base model as the initial model. 'refit_{gate, comp}' means refitting the
      gate functions or prediction formulas with the current data. 'opt_{gate, comp}' means optimizing
      (feature selection and fitting) the gate functions or prediction formulas with the current data.

|

Utilizable Sample Metadata
==========================
.. warning::

   _fabhme_assigned_comp_id is deprecated. Use the hotstart section of the SRC instead of the _fabhme_assigned_comp_id data column.

This component can use the _fabhme_assigned_comp_id attribute of the sample metadata for posterior hot-start.
When the _fabhme_assigned_comp_id attribute is specified in the input data,
this component starts the FAB/HME algorithm with the attribute values as its initial posterior.

To create the _fabhme_assigned_comp_id attribute, see the specification of the command sampo_ps_fabhme export_assigned_comp_id.

|

Output Attributes
=================

.. include:: ./fabhme/linear_rg_output_attributes.rst

These attributes are included in the component output data and can be loaded via the SAMPO API.

.. seealso::

    Obtaining process results via `ProcessResultLoader <../../api/process_result_loader.html>`_.

When :ref:`convert_process` is executed,
the component output data will be saved in *<component_id>*\_predict_result.csv.

.. include:: ./fabhme/rg_predict_result.rst

|

Attribute Metadata
==================

.. include:: ./fabhme/linear_rg_attr_metadata.rst

|

Model
=====

.. include:: ./fabhme/linear_rg_model_params.rst
.. include:: ./fabhme/bern_gate_tree_keys.rst

When the model is loaded in the SAMPO API, the model parameters will be output as a single dictionary.

.. seealso::

    Obtaining process results via `ProcessResultLoader <../../api/process_result_loader.html>`_

::

    {'fic': -113.32992684082878,
     'num_initial_comps': 8,
     'num_active_comps': 7,
     'standardize_mean': 1.1303215277777777e+04,
     'standardize_std': 5.7343353765366674e+03,
     'gate_tree':
         {'gate_type': 'bern',
          'hard_gate': True,
          'nodes': [
              {'node_type': 'gate',
               'node_id': 0,
               'gate_func':
                   {'threshold': 1.0926798109856788,
                    'aid': 'std1[6]',
                    'attr_name': 'std1_Fine_Aggregate_kg_in_a_m3_mixture',
                    'prob_left_smaller_than_threshold': 1.0}},
              {'node_type': 'gate',
               'node_id': 1,
               'gate_func':
                   {'threshold': -0.3327413962726527,
                    'aid': 'std1[6]',
                    'attr_name': 'std1_Fine_Aggregate_kg_in_a_m3_mixture',
                    'prob_left_smaller_than_threshold': 0.0}},
              {'node_type': 'gate',
               'node_id': 2,
               'gate_func':
                   {'threshold': -0.9288885581274996,
                    'aid': 'std1[0]',
                    'attr_name': 'std1_Cement_kg_in_a_m3_mixture',
                    'prob_left_smaller_than_threshold': 1.0}},
              {'node_type': 'gate',
               'node_id': 5,
               'gate_func':
                   {'threshold': -1.8014738709189047,
                    'aid': 'std1[0]',
                    'attr_name': 'std1_Cement_kg_in_a_m3_mixture',
                    'prob_left_smaller_than_threshold': 0.0}},
              {'node_type': 'gate',
               'node_id': 8,
               'gate_func':
                   {'threshold': -0.995668911902564,
                    'aid': 'std1[5]',
                    'attr_name': 'std1_Coarse_Aggregate_kg_in_a_m3_mixture',
                    'prob_left_smaller_than_threshold': 1.0}},
              {'node_type': 'gate',
               'node_id': 10,
               'gate_func':
                   {'threshold': 0.03525594329899589,
                    'aid': 'std1[1]',
                    'attr_name': 'std1_Blast_Furnace_Slag_kg_in_a_m3_mixture',
                    'prob_left_smaller_than_threshold': 1.0}},
              {'comp_id': 0, 'node_type': 'component', 'node_id': 3},
              {'comp_id': 1, 'node_type': 'component', 'node_id': 4},
              {'comp_id': 2, 'node_type': 'component', 'node_id': 6},
              {'comp_id': 3, 'node_type': 'component', 'node_id': 7},
              {'comp_id': 5, 'node_type': 'component', 'node_id': 9},
              {'comp_id': 6, 'node_type': 'component', 'node_id': 11},
              {'comp_id': 7, 'node_type': 'component', 'node_id': 12}],
          'edges': [
              {'source': 10, 'target': 11, 'is_left': True},
              {'source': 10, 'target': 12, 'is_left': False},
              {'source': 1, 'target': 2, 'is_left': True},
              {'source': 1, 'target': 5, 'is_left': False},
              {'source': 0, 'target': 1, 'is_left': True},
              {'source': 0, 'target': 8, 'is_left': False},
              {'source': 2, 'target': 3, 'is_left': True},
              {'source': 2, 'target': 4, 'is_left': False},
              {'source': 5, 'target': 7, 'is_left': False},
              {'source': 5, 'target': 6, 'is_left': True},
              {'source': 8, 'target': 9, 'is_left': True},
              {'source': 8, 'target': 10, 'is_left': False}]},
     'prediction_formulas':
                                                      prediction_formula_0  prediction_formula_1  prediction_formula_2  prediction_formula_3  prediction_formula_5  prediction_formula_6  prediction_formula_7
         attr_name
         std1_Cement_kg_in_a_m3_mixture                           0.000000             -0.842738              0.890938              0.000000              0.000000              0.000000              0.000000
         std1_Blast_Furnace_Slag_kg_in_a_m3_mixture               0.000000             -0.925857              0.000000              0.000000              0.000000              0.000000              0.000000
         std1_Fly_Ash_kg_in_a_m3_mixture                          0.000000              0.000000              0.000000              0.000000              0.000000              0.000000              0.000000
         std1_Water_kg_in_a_m3_mixture                            0.000000              0.000000              0.000000              0.000000              0.000000              0.000000              0.000000
         std1_Superplasticizer_kg_in_a_m3_mixture                 0.000000              0.000000              0.000000              0.000000              0.000000              0.000000              0.000000
         std1_Coarse_Aggregate_kg_in_a_m3_mixture                 0.000000              0.000000              0.000000              0.000000              0.000000              0.000000              0.000000
         std1_Fine_Aggregate_kg_in_a_m3_mixture                   0.000000              0.000000              0.000000              0.000000              0.000000              0.000000              0.000000
         bias                                                     0.141556              0.004476             -0.729938             -1.092049              0.045834              1.169998              0.889939
         variance                                                 0.413225              0.004247              0.505680              0.000000              0.061580              0.019360              0.000000}
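
To illustrate how the ``gate_tree`` dictionary can be read, the following sketch walks the example tree from its root to a component node. The key names are taken from the example above. The reading of ``prob_left_smaller_than_threshold`` (probability of taking the left edge when the attribute value is below the threshold, and one minus that probability otherwise) is an assumption, and the helper names are hypothetical.

```python
def gate(node_id, attr_name, threshold, prob_left):
    """Build a gate node with the keys used in the example model."""
    return {"node_type": "gate", "node_id": node_id,
            "gate_func": {"attr_name": attr_name, "threshold": threshold,
                          "prob_left_smaller_than_threshold": prob_left}}


FINE = "std1_Fine_Aggregate_kg_in_a_m3_mixture"
CEMENT = "std1_Cement_kg_in_a_m3_mixture"
COARSE = "std1_Coarse_Aggregate_kg_in_a_m3_mixture"
SLAG = "std1_Blast_Furnace_Slag_kg_in_a_m3_mixture"

# The example gate tree above, with thresholds rounded to four decimals.
gate_tree = {
    "nodes": [
        gate(0, FINE, 1.0927, 1.0), gate(1, FINE, -0.3327, 0.0),
        gate(2, CEMENT, -0.9289, 1.0), gate(5, CEMENT, -1.8015, 0.0),
        gate(8, COARSE, -0.9957, 1.0), gate(10, SLAG, 0.0353, 1.0),
    ] + [
        {"node_type": "component", "node_id": n, "comp_id": c}
        for n, c in [(3, 0), (4, 1), (6, 2), (7, 3), (9, 5), (11, 6), (12, 7)]
    ],
    "edges": [
        {"source": s, "target": t, "is_left": left}
        for s, t, left in [
            (0, 1, True), (0, 8, False), (1, 2, True), (1, 5, False),
            (2, 3, True), (2, 4, False), (5, 6, True), (5, 7, False),
            (8, 9, True), (8, 10, False), (10, 11, True), (10, 12, False),
        ]
    ],
}


def assign_component(tree, sample):
    """Follow the hard gates from the root and return the comp_id reached."""
    nodes = {n["node_id"]: n for n in tree["nodes"]}
    child = {(e["source"], e["is_left"]): e["target"] for e in tree["edges"]}
    targets = {e["target"] for e in tree["edges"]}
    # The root is the only node that no edge points to.
    (root_id,) = set(nodes) - targets
    node = nodes[root_id]
    while node["node_type"] == "gate":
        func = node["gate_func"]
        # Assumed semantics: the probability of going left is p below the
        # threshold and 1 - p otherwise.
        p_left = func["prob_left_smaller_than_threshold"]
        if sample[func["attr_name"]] >= func["threshold"]:
            p_left = 1.0 - p_left
        node = nodes[child[(node["node_id"], p_left >= 0.5)]]
    return node["comp_id"]


sample = {FINE: 0.0, CEMENT: 0.0, COARSE: 0.0, SLAG: 0.0}
print(assign_component(gate_tree, sample))
```

Because the example tree uses hard gates, every probability is 0.0 or 1.0 and the walk is deterministic; with soft gates the same probabilities would instead weight both branches.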


External Format
---------------
When :ref:`convert_process` is executed,
the model parameters are saved to separate files, grouped into general information,
the gate tree, and the prediction formulas.

General Information
```````````````````
This file describes the :math:`FIC` value after learning the model, the initial number of components, and the final number of active components.

::

    fic,num_initial_comps,num_active_comps
    -1.294308e+02,8,3

Gate Tree
`````````

.. include:: ./fabhme/model_bern_gate_tree.rst

Prediction Formulas
```````````````````

.. include:: ./fabhme/model_linear_rg_prediction_formulas.rst
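
As a rough illustration, a prediction formula can be evaluated as a weighted sum of feature values plus the bias. The sketch below assumes this linear form and uses two columns of the ``prediction_formulas`` table from the example model in the Model section. The nested dictionary layout and the function name are hypothetical, and the ``variance`` row is not used for the point prediction here.

```python
# Two columns copied from the 'prediction_formulas' table of the example
# model; zero weights are omitted.
prediction_formulas = {
    1: {"weights": {"std1_Cement_kg_in_a_m3_mixture": -0.842738,
                    "std1_Blast_Furnace_Slag_kg_in_a_m3_mixture": -0.925857},
        "bias": 0.004476},
    2: {"weights": {"std1_Cement_kg_in_a_m3_mixture": 0.890938},
        "bias": -0.729938},
}


def predict_standardized(comp_id, sample):
    """Weighted sum of the sample's feature values plus the bias."""
    formula = prediction_formulas[comp_id]
    total = formula["bias"]
    for attr_name, weight in formula["weights"].items():
        total += weight * sample[attr_name]
    return total


# Hypothetical standardized feature values for one sample.
sample = {"std1_Cement_kg_in_a_m3_mixture": 1.0,
          "std1_Blast_Furnace_Slag_kg_in_a_m3_mixture": 0.5}
for comp_id in prediction_formulas:
    print(comp_id, predict_standardized(comp_id, sample))
```

When ``standardize_target`` is True, the result is on the standardized target scale and still has to be mapped back with ``standardize_mean`` and ``standardize_std``.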

|

Prediction Result Evaluation
============================

.. include:: ./fabhme/rg_predict_result_evaluation_indices.rst

When these evaluation results are obtained via the SAMPO API, they are loaded as a pandas.DataFrame
whose columns are the evaluation indices.

.. seealso::

    Obtaining process results via `ProcessResultLoader <../../api/process_result_loader.html>`_

External Format
---------------
When :ref:`convert_process` is executed, the evaluation
results are saved as a CSV file with the evaluation indices as the header of the CSV.

.. include:: ./fabhme/rg_predict_result_evaluation.rst

|

Details
=======
If a data set has samples with missing or +/-Inf values, this component ignores those samples.
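
The effect is comparable to filtering the data before learning. The following sketch shows such a filter in plain Python; it illustrates the behavior described above, not the component's implementation. Representing a missing value as ``None`` and treating NaN as missing are assumptions made for this example.

```python
import math


def usable_samples(samples):
    """Keep only samples whose values are all present and finite."""
    kept = []
    for sample in samples:
        values = sample.values()
        # None stands for a missing value here; NaN is treated the same way.
        if all(v is not None and math.isfinite(v) for v in values):
            kept.append(sample)
    return kept


data = [
    {"_sid": 0, "x": 0.73, "y": 9.0},
    {"_sid": 1, "x": None, "y": 9.0},             # missing value
    {"_sid": 2, "x": float("inf"), "y": 4.0},     # +Inf
    {"_sid": 3, "x": -0.74, "y": float("nan")},   # NaN
    {"_sid": 4, "x": -1.39, "y": 6.0},
]
print([s["_sid"] for s in usable_samples(data)])
```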
