================================================
FABHMELogitGateBSplineRg Component Specification
================================================

.. contents:: Contents
    :local:

Overview
========
The FABHMELogitGateBSplineRg component is a B-spline non-linear regression component that uses the FAB/HME algorithm.
This component learns a tree-structured model in which each sample is assigned to a component according to logistic gating functions.

.. note::

    The FAB engine uses the word 'component' with a different meaning from that of SAMPO.
    Each component in FAB/HME is a prediction formula, and each data sample is assigned to a specific component for prediction.

**Example**:

* SPD:

  .. code-block:: yaml

    # fabhmerg.spd
    dl1 -> std1 -> fab1

    ---

    components:
        dl1:
            component: DataLoader
        std1:
            component: StandardizeFDComponent
            features: scale == 'real' or scale == 'integer'
        fab1:
            component: FABHMELogitGateBSplineRgComponent
            features: name != 'Concrete_compressive_strength_MPa'
            standardize_target: True
            target: name == 'Concrete_compressive_strength_MPa'
            tree_depth: 5
            shrink_threshold: 1.0%

    global_settings:
        keep_attributes:
            - Concrete_compressive_strength_MPa
        feature_exclude:
            - Concrete_compressive_strength_MPa

* Input of the component:


 +--------+-------------------------+-------------------------+------------------------+
 |   _sid | \std1_Superplasticizer_ | \std1_Coarse_Aggregate_ | \Concrete_compressive_ |
 |        | kg_in_a_m3_mixture      | kg_in_a_m3_mixture      | strength_MPa           |
 +========+=========================+=========================+========================+
 | 0      | 0.729738484             | 0.705292074             | 9                      |
 +--------+-------------------------+-------------------------+------------------------+
 | 1      | 1.413868312             | 1.434904563             | 9                      |
 +--------+-------------------------+-------------------------+------------------------+
 | 2      | -1.387806224            | -1.321409287            | 4                      |
 +--------+-------------------------+-------------------------+------------------------+
 | ...    | ...                     | ...                     | ...                    |
 +--------+-------------------------+-------------------------+------------------------+
 | 8      | -0.019546567            | 0.016213611             | 8                      |
 +--------+-------------------------+-------------------------+------------------------+
 | 9      | -0.736254006            | -0.835000961            | 6                      |
 +--------+-------------------------+-------------------------+------------------------+

 |

* Output of the component:

 +----------+-----------+---------------+--------------+---------------+------------------+
 |   _sid   | \fab1_    |  \fab1_       |   \fab1_     |   \fab1_      | \fab1_           |
 |          | actual    |  std_actual   |   predict    |   std_predict | assigned_comp_id |
 +==========+===========+===============+==============+===============+==================+
 | 0        | 9         | -4.686873e-01 | 1.193986e+01 | 3.613565e-01  | 2                |
 +----------+-----------+---------------+--------------+---------------+------------------+
 | 1        | 9         | -4.686873e-01 | 1.487464e+01 | 1.189968e+00  | 2                |
 +----------+-----------+---------------+--------------+---------------+------------------+
 | 2        | 4         | -1.880396e+00 | 5.744445e+00 | -1.387866e+00 | 0                |
 +----------+-----------+---------------+--------------+---------------+------------------+
 | ...      | ...       | ...           | ...          | ...           | ...              |
 +----------+-----------+---------------+--------------+---------------+------------------+
 | 8        | 8         | -7.510290e-01 | 9.614527e+00 | -2.951807e-01 | 0                |
 +----------+-----------+---------------+--------------+---------------+------------------+
 | 9        | 6         | -1.315712e+00 | 7.587341e+00 | -8.675398e-01 | 0                |
 +----------+-----------+---------------+--------------+---------------+------------------+

 |

 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
 |   _sid   |   \fab1_      |   \fab1_         |   \fab1_      |   \fab1_         |   \fab1_      |   \fab1_         |
 |          |   predict_c0  |   std_predict_c0 |   predict_c1  |   std_predict_c1 |   predict_c2  |   std_predict_c2 |
 +==========+===============+==================+===============+==================+===============+==================+
 | 0        |  1.173386e+01 |  3.031946e-01    |  1.285257e+01 |  6.190547e-01    |  1.193986e+01 |  3.613565e-01    |
 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
 | 1        |  1.366890e+01 |  8.495373e-01    |  1.503635e+01 |  1.235628e+00    |  1.487464e+01 |  1.189968e+00    |
 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
 | 2        |  5.744445e+00 | -1.387866e+00    |  6.786511e+00 | -1.093648e+00    |  3.787681e+00 | -1.940342e+00    |
 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
 | ...      | ...           | ...              | ...           | ...              | ...           | ...              |
 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
 | 8        |  9.614527e+00 | -2.951807e-01    |  1.079011e+01 |  3.673590e-02    |  9.168116e+00 | -4.212211e-01    |
 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
 | 9        |  7.587341e+00 | -8.675398e-01    |  8.242365e+00 | -6.825991e-01    |  5.744203e+00 | -1.387935e+00    |
 +----------+---------------+------------------+---------------+------------------+---------------+------------------+
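
The relationships between the output columns can be checked with a short script.
The following is a minimal sketch, not part of SAMPO: it copies rows 0 and 2 of the
tables above and assumes that the ``std_`` columns are obtained as
``(value - mean) / std`` of the target, which the example values are consistent with.

.. code-block:: python

    # Values copied from rows 0 and 2 of the example output tables above.
    rows = [
        {"actual": 9.0, "std_actual": -4.686873e-01,
         "predict": 1.193986e+01, "std_predict": 3.613565e-01,
         "assigned_comp_id": 2,
         "predict_c": [1.173386e+01, 1.285257e+01, 1.193986e+01]},
        {"actual": 4.0, "std_actual": -1.880396e+00,
         "predict": 5.744445e+00, "std_predict": -1.387866e+00,
         "assigned_comp_id": 0,
         "predict_c": [5.744445e+00, 6.786511e+00, 3.787681e+00]},
    ]

    # ASSUMPTION: std_actual = (actual - mean) / std, so two rows are enough
    # to recover the mean and standard deviation of the target.
    first, second = rows
    std = (first["actual"] - second["actual"]) / (
        first["std_actual"] - second["std_actual"])
    mean = first["actual"] - first["std_actual"] * std

    restored = []
    for row in rows:
        # The final prediction is the prediction of the assigned component.
        assert row["predict"] == row["predict_c"][row["assigned_comp_id"]]
        # De-standardizing std_predict reproduces predict up to table rounding.
        restored.append(row["std_predict"] * std + mean)

In this example, ``fab1_predict`` equals ``fab1_predict_c<k>`` where ``k`` is
``fab1_assigned_comp_id``, and ``fab1_predict`` is the de-standardized ``fab1_std_predict``.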

This component has component-specific external formats for model and prediction result evaluation.

.. seealso::

    Component-common external format files in :ref:`convert_process`

|

Parameters
==========
This component has the following component-specific parameters.

SPD
---

The following parameters are for the "components" section of the SPD.

.. list-table::
  :header-rows: 1
  :widths: 10, 5, 15, 10, 50

  * - Parameter Name
    - Type
    - Domain
    - Default Value
    - Description
  * - standardize_target
    - bool
    - True / False
    - False
    - If this parameter is True, the target attribute is standardized.
  * - max_fab_iterations
    - int
    - [1, inf)
    - 100
    - Maximum number of FAB-iterations.
  * - start_from_mstep [2]_ [3]_
    - bool
    - True / False
    - False
    - If True, the first iteration starts with M-step; otherwise, E-step.
  * - num_acceleration_steps
    - int
    - [0, inf)
    - 0
    - Number of acceleration-algorithm steps in each FAB-iteration.
      If 0, the acceleration algorithm is disabled.
  * - repeat_until_convergence
    - bool
    - True / False
    - False
    - If False, the FAB-iterations and the post-processing are executed only once,
      even if the FAB-iterations stopped because ``max_fab_iterations`` was reached
      rather than because the convergence condition was met.
  * - projection_estep
    - bool
    - True / False
    - False
    - Whether the projection E-step algorithm is enabled.
  * - shrink_threshold
    - float or str
    - [1, inf) or (0%, 100%)
    - 1.0
    - Threshold value for shrinkage. If a percentage value (e.g. ``'1.0%'``)
      is specified, shrinkage is executed according to the relative value
      :math:`N_{\rm scaled\_sample} \times t_{\rm shrink}`, where
      :math:`t_{\rm shrink}` is the threshold value and :math:`N_{\rm scaled\_sample}`
      is the number of scaled expected samples.
  * - fab_stop_threshold
    - float or str
    - (0, inf) or (0%, inf%)
    - 0.001
    - Threshold value for FAB-iterations: if the increase in the FIC value
      is less than the threshold, the FAB-iterations are considered to
      have converged. If a percentage value (e.g. ``'1.0%'``) is specified,
      the convergence check uses the relative value
      :math:`(FIC^{(t)} - FIC^{(t-1)}) / | FIC^{(t-1)} |`.
  * - gate_features
    - str
    - Query format
    - all()
    - Features used for gate parameter optimization.
      If not specified, all features are used.
  * - comp_features
    - str
    - Query format
    - all()
    - Features used for component parameter optimization.
      If not specified, all features are used.
      If empty, the model is learned as a decision tree.
  * - comp_mandatory_features
    - str
    - Query format
    - See Description
    - Features exempt from L0-regularization.
      The specified features are always relevant in every component.
      If not specified, no features are exempt,
      so all relevant features are selected by the FoBa algorithm.
  * - tree_depth [2]_ [3]_
    - int
    - [0, inf)
    - 5
    - Initial depth of the gate-tree structure of the latent variable prior.
      The initial number of components is :math:`2^d`, where :math:`d` is
      the tree depth. If 0, the optimization is executed with only one
      component.
  * - comp_bspline_degree [3]_
    - int
    - [0, inf)
    - 3
    - Degree of B-spline function.
  * - comp_bspline_basis_dim [3]_
    - int
    - [4, inf)
    - 10
    - The number of B-spline basis functions to be generated for each feature.
  * - comp_weights_min_scale [2]_ [3]_
    - float
    - (-inf, inf)
    - -0.5
    - Minimum scale value for the initialization of weight values of components.
  * - comp_weights_max_scale [2]_ [3]_
    - float
    - (-inf, inf)
    - 0.5
    - Maximum scale value for the initialization of weight values of components.
  * - comp_bias_min_scale [2]_ [3]_
    - float
    - (-inf, inf)
    - 0.25
    - Minimum scale value for the initialization of bias values of components.
  * - comp_bias_max_scale [2]_ [3]_
    - float
    - (-inf, inf)
    - 0.75
    - Maximum scale value for the initialization of bias values of components.
  * - comp_variance_min_scale [2]_ [3]_
    - float
    - (0, inf)
    - 0.1
    - Minimum scale value for the initialization of variance values of components.
  * - comp_variance_max_scale [2]_ [3]_
    - float
    - (0, inf)
    - 0.25
    - Maximum scale value for the initialization of variance values of components.
  * - gate_l2_regularize
    - float
    - [0, inf)
    - 0.0
    - L2-regularization hyper-parameter for gate-parameter optimization.
      The larger the specified value, the stronger the regularization effect is.
      If 0.0, L2-regularization is disabled.
  * - with_gate_scaled_l0_regularize
    - bool
    - True / False
    - True
    - Whether to use scaled L0-regularization, which uses a tighter lower bound
      of FIC, for gate parameter optimization; the approximation of det(F) is
      refined, where F is the Fisher information matrix.
  * - max_gate_relevant_features
    - int
    - [1, inf)
    - 3
    - Maximum number of the relevant features for each gate.
  * - comp_l2_regularize
    - float
    - [0, inf)
    - 0.0
    - L2-regularization hyper-parameter for component parameter optimization.
      The larger the specified value, the stronger the regularization effect
      is. If 0.0, L2-regularization is disabled.
  * - comp_pspline
    - float
    - [0, inf)
    - 1.0
    - L2-regularization coefficient value for penalized B-spline function
      (P-spline).
  * - with_comp_scaled_l0_regularize
    - bool
    - True / False
    - True
    - Whether to use scaled L0-regularization, which uses a tighter lower bound
      of FIC, for component parameter optimization; the approximation of det(F)
      is refined, where F is the Fisher information matrix.
  * - max_comp_relevant_features
    - int
    - [1, inf)
    - 100
    - Maximum number of the relevant features for each component.
  * - num_threads_gates
    - int
    - [1, inf)
    - 1
    - Maximum number of OpenMP threads for gate parameter optimization;
      the tasks for all gates are divided among these threads.
  * - num_threads_comps
    - int
    - [1, inf)
    - 1
    - Maximum number of OpenMP threads for component parameter optimization.

.. [2] This parameter is ignored in posterior hot-start.
.. [3] This parameter is ignored in model hot-start.
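
The threshold and tree-depth rules in the table above can be written out as a short sketch.
The helper names below (``parse_threshold``, ``effective_shrink_threshold``,
``fab_has_converged``, ``num_initial_components``) are hypothetical and are not part of
SAMPO; they only restate the formulas given in the parameter descriptions, and the number
of scaled expected samples is simply taken as an input.

.. code-block:: python

    def parse_threshold(value):
        """Return (number, is_relative) for a float or a string such as '1.0%'."""
        if isinstance(value, str) and value.strip().endswith("%"):
            return float(value.strip()[:-1]) / 100.0, True
        return float(value), False


    def effective_shrink_threshold(shrink_threshold, num_scaled_samples):
        """Absolute threshold; a percentage is scaled by N_scaled_sample."""
        t_shrink, is_relative = parse_threshold(shrink_threshold)
        return num_scaled_samples * t_shrink if is_relative else t_shrink


    def fab_has_converged(fic_prev, fic_curr, fab_stop_threshold):
        """True if the (absolute or relative) FIC increase is below the threshold."""
        threshold, is_relative = parse_threshold(fab_stop_threshold)
        increase = fic_curr - fic_prev
        if is_relative:
            # Relative check: (FIC(t) - FIC(t-1)) / |FIC(t-1)|
            increase /= abs(fic_prev)
        return increase < threshold


    def num_initial_components(tree_depth):
        """Initial number of components, 2**d for tree depth d."""
        return 2 ** tree_depth

For example, with ``shrink_threshold: 1.0%`` and 1000 scaled expected samples, the sketch
gives an absolute threshold of 10, and ``tree_depth: 5`` gives 32 initial components.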

SRC
---

The following parameter is for the "hotstart" section of the SRC.

.. list-table::
  :header-rows: 1
  :widths: 10, 5, 15, 10, 50

  * - Parameter Name
    - Type
    - Domain
    - Default Value
    - Description
  * - type
    - str
    - {'posterior', 'mh_refit_comp', 'mh_opt_comp', 'mh_refit_gate_and_refit_comp', 'mh_refit_gate_and_opt_comp', 'mh_opt_gate_and_opt_comp'}
    - 
    - The hot-start type. If 'posterior', FAB learns with posterior hot-start, which uses an
      initial model whose tree structure is generated from the base model and the data; all gate
      and component parameters are initialized randomly. 'mh_XXX' means that FAB learns with model
      hot-start, which uses the base model as the initial model. 'refit_{gate, comp}' means refitting
      the gate functions or prediction formulas with the current data. 'opt_{gate, comp}' means optimizing
      (feature selection and fitting) the gate functions or prediction formulas with the current data.
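
The naming scheme of the hot-start types can be decoded mechanically. The sketch below
uses a hypothetical helper, ``describe_hotstart``, that is not part of SAMPO; it only
splits the type string as described above. A value of ``None`` means that the type name
does not mention that part.

.. code-block:: python

    HOTSTART_TYPES = {
        "posterior",
        "mh_refit_comp",
        "mh_opt_comp",
        "mh_refit_gate_and_refit_comp",
        "mh_refit_gate_and_opt_comp",
        "mh_opt_gate_and_opt_comp",
    }


    def describe_hotstart(type_name):
        """Split a hot-start type into its mode and per-part actions."""
        if type_name not in HOTSTART_TYPES:
            raise ValueError(f"unknown hot-start type: {type_name!r}")
        if type_name == "posterior":
            return {"mode": "posterior", "gate": None, "comp": None}
        actions = {"mode": "model", "gate": None, "comp": None}
        # Strip the 'mh_' prefix, then read each '<action>_<part>' token.
        for token in type_name[len("mh_"):].split("_and_"):
            action, part = token.split("_")
            actions[part] = action  # action is 'refit' or 'opt'
        return actions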

|

Utilizable Sample Metadata
==========================
.. warning::

   ``_fabhme_assigned_comp_id`` is deprecated. Use the hotstart section of SRC instead of the ``_fabhme_assigned_comp_id`` data column.

This component can use the ``_fabhme_assigned_comp_id`` attribute of the sample metadata for posterior hot-start.
When the ``_fabhme_assigned_comp_id`` attribute is specified in the input data,
this component starts the FAB/HME algorithm with the attribute values as its initial posterior.

To create the ``_fabhme_assigned_comp_id`` attribute, see the specification of the command ``sampo_ps_fabhme export_assigned_comp_id``.

|

Output Attributes
=================

.. include:: ./fabhme/bspline_rg_output_attributes.rst

These attributes are included in the component output data and can be loaded via the SAMPO API.

.. seealso::

    Obtaining process results via `ProcessResultLoader <../../api/process_result_loader.html>`_.

When :ref:`convert_process` is executed,
the component output data will be saved in two separate files:

#. All non-basis function value attributes will be saved as *<component_id>*\_predict_result.csv.

    .. include:: ./fabhme/rg_predict_result.rst

#. Basis function value attributes will be saved as basis_func_values.csv.

    .. include:: ./fabhme/bspline_basis_func_values.rst

|

Attribute Metadata
==================

.. include:: ./fabhme/bspline_rg_attr_metadata.rst

|

Model
=====

.. include:: ./fabhme/bspline_rg_model_params.rst
.. include:: ./fabhme/logit_gate_tree_keys.rst

When the model is loaded via the SAMPO API, the model parameters are returned as a single dictionary.

.. seealso::

    Obtaining process results via `ProcessResultLoader <../../api/process_result_loader.html>`_

::

    {'fic': -38.60868914098668,
     'num_initial_comps': 8,
     'num_active_comps': 2,
     'standardize_mean': 1.1303215277777777e+04,
     'standardize_std': 5.7343353765366674e+03,
     'gate_tree': {'gate_tree':
         {'gate_type': 'logit',
          'hard_gate': True,
          'edges': [
              {'source': 0, 'target': 1, 'is_left': True},
              {'source': 0, 'target': 2, 'is_left': False}],
          'nodes': [
              {'comp_id': 4, 'node_type': 'component', 'node_id': 1},
              {'node_type': 'gate',
               'node_id': 0,
               'gate_func':
                   {'bias': 4.335421263569645,
                    'weights': [
                        {'aid': 'std1[0]', 'attr_name': 'std1_Cement_kg_in_a_m3_mixture', 'weight': -7.395785808097062},
                        {'aid': 'std1[1]', 'attr_name': 'std1_Blast_Furnace_Slag_kg_in_a_m3_mixture', 'weight': -1.1522525130914525}]}},
              {'comp_id': 5, 'node_type': 'component', 'node_id': 2}]}},
     'prediction_formulas':
                                                                          prediction_formula_4  prediction_formula_5
         attr_name                                  basis_function_index
         std1_Cement_kg_in_a_m3_mixture             0                                -0.017232              0.000000
                                                    1                                -0.375864              0.000000
                                                    2                                -0.513790              0.000000
                                                    3                                -0.185817              0.000000
                                                    4                                -0.114696              0.000000
                                                    5                                 0.128587              0.000000
                                                    6                                 0.156415              0.000000
                                                    7                                 0.139183              0.000000
                                                    8                                 0.121951              0.000000
                                                    9                                 0.104719              0.000000
         std1_Blast_Furnace_Slag_kg_in_a_m3_mixture 0                                 0.000000              0.000000
                                                    1                                 0.000000              0.000000
                                                    2                                 0.000000              0.000000
                                                    3                                 0.000000              0.000000
                                                    4                                 0.000000              0.000000
                                                    5                                 0.000000              0.000000
                                                    6                                 0.000000              0.000000
                                                    7                                 0.000000              0.000000
                                                    8                                 0.000000              0.000000
                                                    9                                 0.000000              0.000000
         std1_Fly_Ash_kg_in_a_m3_mixture            0                                 0.000000              0.000000
                                                    1                                 0.000000              0.000000
                                                    2                                 0.000000              0.000000
                                                    3                                 0.000000              0.000000
                                                    4                                 0.000000              0.000000
                                                    5                                 0.000000              0.000000
                                                    6                                 0.000000              0.000000
                                                    7                                 0.000000              0.000000
                                                    8                                 0.000000              0.000000
                                                    9                                 0.000000              0.000000
         ...                                                                               ...                   ...
         std1_Superplasticizer_kg_in_a_m3_mixture   2                                 0.212593             -0.799757
                                                    3                                -0.271219             -0.901149
                                                    4                                -0.563205             -0.619404
                                                    5                                -0.133307             -0.322681
                                                    6                                 0.119509             -0.025957
                                                    7                                 0.050894              0.270767
                                                    8                                -0.042559              0.458322
                                                    9                                -0.090570              0.452728
         std1_Coarse_Aggregate_kg_in_a_m3_mixture   0                                -0.084069              0.000000
                                                    1                                 0.431623              0.000000
                                                    2                                 0.868421              0.000000
                                                    3                                 0.671552              0.000000
                                                    4                                 0.143755              0.000000
                                                    5                                -0.603680              0.000000
                                                    6                                -0.729790              0.000000
                                                    7                                -0.365156              0.000000
                                                    8                                 0.059238              0.000000
                                                    9                                 0.229400              0.000000
         std1_Fine_Aggregate_kg_in_a_m3_mixture     0                                -0.039457              0.000000
                                                    1                                 0.585133              0.000000
                                                    2                                 0.835219              0.000000
                                                    3                                 0.752909              0.000000
                                                    4                                 0.625209              0.000000
                                                    5                                 0.537185              0.000000
                                                    6                                 0.476522              0.000000
                                                    7                                 0.478702              0.000000
                                                    8                                 0.493686              0.000000
                                                    9                                 0.481449              0.000000
                                                    bias                             -0.542800              1.162506
                                                    variance                          0.064958              0.127672
         [72 rows x 2 columns],
     'bspline_knot_vecs':
                                                       knot_value_0  knot_value_1  knot_value_2  knot_value_3  knot_value_4  knot_value_5  knot_value_6  knot_value_7  knot_value_8  knot_value_9  knot_value_10  knot_value_11  knot_value_12
           attr_name
           std1_Cement_kg_in_a_m3_mixture                 -2.545730     -2.545730     -2.054368     -1.563006     -1.071643     -0.580281     -0.088919      0.402444      0.893806      1.385169       1.876531       2.367893       2.367893
           std1_Blast_Furnace_Slag_kg_in_a_m3_mixture     -1.917254     -1.917254     -1.485959     -1.054663     -0.623367     -0.192071      0.239225      0.670521      1.101817      1.533113       1.964409       2.395704       2.395704
           std1_Fly_Ash_kg_in_a_m3_mixture                -2.548438     -2.548438     -2.175280     -1.802123     -1.428965     -1.055808     -0.682650     -0.309493      0.063665      0.436822       0.809980       1.183138       1.183138
           std1_Water_kg_in_a_m3_mixture                  -1.083007     -1.083007     -0.598550     -0.114093      0.370364      0.854821      1.339278      1.823735      2.308191      2.792648       3.277105       3.761562       3.761562
           std1_Superplasticizer_kg_in_a_m3_mixture       -2.049753     -2.049753     -1.460760     -0.871767     -0.282774      0.306219      0.895212      1.484206      2.073199      2.662192       3.251185       3.840178       3.840178
           std1_Coarse_Aggregate_kg_in_a_m3_mixture       -1.513754     -1.513754     -1.047173     -0.580592     -0.114011      0.352569      0.819150      1.285731      1.752312      2.218893       2.685473       3.152054       3.152054
           std1_Fine_Aggregate_kg_in_a_m3_mixture         -1.202162     -1.202162     -0.813606     -0.425050     -0.036494      0.352062      0.740618      1.129174      1.517730      1.906286       2.294842       2.683398       2.683398,
     'bspline_params':
            degree  basis_dim
         0       3         10}
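
As an illustration of how the ``gate_tree`` part of this dictionary can be read, the
following sketch walks from the root gate to a component node. The helper
``assign_component`` is hypothetical and not part of SAMPO. It also rests on an
assumption that this section does not confirm: that a hard logit gate sends a sample to
the left child when ``bias + sum(weight * value)`` is non-negative (a logistic output of
at least 0.5). Check the gate-tree key descriptions above before relying on the direction.

.. code-block:: python

    # Gate-tree fragment copied from the example model dictionary above.
    gate_tree = {
        "gate_type": "logit",
        "hard_gate": True,
        "edges": [
            {"source": 0, "target": 1, "is_left": True},
            {"source": 0, "target": 2, "is_left": False},
        ],
        "nodes": [
            {"comp_id": 4, "node_type": "component", "node_id": 1},
            {"node_type": "gate", "node_id": 0, "gate_func": {
                "bias": 4.335421263569645,
                "weights": [
                    {"aid": "std1[0]",
                     "attr_name": "std1_Cement_kg_in_a_m3_mixture",
                     "weight": -7.395785808097062},
                    {"aid": "std1[1]",
                     "attr_name": "std1_Blast_Furnace_Slag_kg_in_a_m3_mixture",
                     "weight": -1.1522525130914525},
                ]}},
            {"comp_id": 5, "node_type": "component", "node_id": 2},
        ],
    }


    def assign_component(tree, sample):
        """Return the comp_id reached by following the hard gates (sketch)."""
        nodes = {node["node_id"]: node for node in tree["nodes"]}
        child = {(e["source"], e["is_left"]): e["target"] for e in tree["edges"]}
        # The root is the only node that no edge points to.
        (root_id,) = set(nodes) - {e["target"] for e in tree["edges"]}
        node = nodes[root_id]
        while node["node_type"] == "gate":
            func = node["gate_func"]
            score = func["bias"] + sum(
                w["weight"] * sample[w["attr_name"]] for w in func["weights"])
            # ASSUMPTION: left child when the logistic output is >= 0.5,
            # which is the same condition as score >= 0.
            go_left = score >= 0.0
            node = nodes[child[(node["node_id"], go_left)]]
        return node["comp_id"]

Under this assumption, a sample whose two standardized gate features are both 0 is
assigned to component 4, because the score equals the positive bias.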


External Format
---------------
When :ref:`convert_process` is executed,
the model parameters are saved into different files and are grouped as: general information,
gating function, prediction formula, B-spline parameters, and B-spline knot vectors.

General Information
```````````````````
This file contains the :math:`FIC` value after learning the model, the initial number of components, and the final number of active components.

::

    fic,num_initial_comps,num_active_comps
    -1.294308e+02,8,3
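
A minimal sketch of reading this file with the Python standard library is shown below.
The file content is inlined as a string because the file name is not stated in this
section; in practice it would be read from the converted output.

.. code-block:: python

    import csv
    import io

    # Content of the example file above, inlined for illustration.
    general_info_csv = (
        "fic,num_initial_comps,num_active_comps\n"
        "-1.294308e+02,8,3\n"
    )

    row = next(csv.DictReader(io.StringIO(general_info_csv)))
    general_info = {
        "fic": float(row["fic"]),
        "num_initial_comps": int(row["num_initial_comps"]),
        "num_active_comps": int(row["num_active_comps"]),
    }
    # Components that were present initially but are no longer active.
    num_inactive = (
        general_info["num_initial_comps"] - general_info["num_active_comps"])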

Gate Tree
`````````

.. include:: ./fabhme/model_logit_gate_tree.rst

Prediction Formulas
```````````````````

.. include:: ./fabhme/model_bspline_rg_prediction_formulas.rst

B-spline Parameters
```````````````````

.. include:: ./fabhme/model_bspline_params.rst

B-spline Knot Vectors
`````````````````````

.. include:: ./fabhme/model_bspline_knot_vecs.rst

|

Prediction Result Evaluation
============================

.. include:: ./fabhme/rg_predict_result_evaluation_indices.rst

When these evaluation results are obtained via the SAMPO API, they are loaded as a
pandas.DataFrame whose columns are the evaluation indices.

.. seealso::

    Obtaining process results via `ProcessResultLoader <../../api/process_result_loader.html>`_

External Format
---------------
When :ref:`convert_process` is executed, the evaluation results
are saved as a CSV file with the evaluation indices as the header of the CSV.

.. include:: ./fabhme/rg_predict_result_evaluation.rst

|

Details
=======
If a data set has samples with missing or +/-Inf values, this component ignores those samples.
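
The effect of this rule can be previewed before running the component. The sketch below
is not SAMPO code: ``usable_samples`` is a hypothetical helper, and representing a
missing value as ``None`` or NaN is an assumption made for the example.

.. code-block:: python

    import math


    def usable_samples(samples):
        """Keep only samples whose values are all present and finite."""
        kept = []
        for sample in samples:
            if all(v is not None and math.isfinite(v) for v in sample.values()):
                kept.append(sample)
        return kept


    samples = [
        {"x": 0.73, "y": 9.0},
        {"x": float("nan"), "y": 9.0},   # missing value, represented as NaN
        {"x": float("-inf"), "y": 4.0},  # -Inf value
        {"x": -0.02, "y": None},         # missing value, represented as None
    ]
    kept = usable_samples(samples)

Only the first sample is kept; the other three would be ignored by the component.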
