================================================
FABHMELogitGateBSplineCl Component Specification
================================================

.. contents:: Contents
    :local:

Overview
========
The FABHMELogitGateBSplineCl component is a B-spline non-linear binary classification component that uses the FAB/HME algorithm.
It learns a tree-structured model in which each sample is assigned to a component according to logistic gating functions.

.. note::

    The FAB engine uses the word 'component' with a different meaning from that of SAMPO.
    Each component in FAB/HME is a prediction formula, and each data sample is assigned to a specific component for prediction.

**Example**:

* SPD:

  .. code-block:: yaml

    # fabhmecl.spd
    dl1 -> fab1

    ---
    components:
        dl1:
            component: DataLoader
        fab1:
            component: FABHMELogitGateBSplineClComponent
            features: name != 'class'
            tree_depth: 3
            target: name == 'class'
            positive_label: 'Iris-setosa'

    global_settings:
        keep_attributes:
            - class
        feature_exclude:
            - class


* Input of the component:

 +--------+----------------+---------------+----------------+
 |   _sid | \sepal_length_ | \sepal_width_ |   class        |
 |        | in_cm          | in_cm         |                |
 +========+================+===============+================+
 | 0      | 4.9            | 2.5           | Iris-versicolor|
 +--------+----------------+---------------+----------------+
 | 1      | 6.2            | 2.8           | Iris-versicolor|
 +--------+----------------+---------------+----------------+
 | 2      | 7.2            | 3.6           | Iris-versicolor|
 +--------+----------------+---------------+----------------+
 | ...    | ...            | ...           | ...            |
 +--------+----------------+---------------+----------------+
 | 28     | 6.2            | 2.9           | Iris-setosa    |
 +--------+----------------+---------------+----------------+
 | 29     | 6.7            | 3.1           | Iris-setosa    |
 +--------+----------------+---------------+----------------+

 |

* Output of the component:

 +----------+-----------+------------+---------------+-----------------------+
 |   _sid   | \fab1_    | \fab1_     | \fab1_score   |   \fab1_              |
 |          | actual    | predict    |               |   assigned_comp_id    |
 +==========+===========+============+===============+=======================+
 | 0        | -1        | 1          | 2.657069e+00  | 2                     |
 +----------+-----------+------------+---------------+-----------------------+
 | 1        | -1        | 1          | 6.524541e-01  | 2                     |
 +----------+-----------+------------+---------------+-----------------------+
 | 2        | -1        | -1         | -1.600153e+00 | 0                     |
 +----------+-----------+------------+---------------+-----------------------+
 | ...      | ...       | ...        | ...           | ...                   |
 +----------+-----------+------------+---------------+-----------------------+
 | 28       |  1        | 1          | 6.524541e-01  | 2                     |
 +----------+-----------+------------+---------------+-----------------------+
 | 29       |  1        | -1         | -1.080094e+00 | 0                     |
 +----------+-----------+------------+---------------+-----------------------+

 |

 +----------+---------------+---------------+---------------+---------------+---------------+---------------+
 |   _sid   |   \fab1_      |   \fab1_      |   \fab1_      |   \fab1_      |   \fab1_      |   \fab1_      |
 |          |   predict_c0  |   score_c0    |   predict_c1  |   score_c1    |   predict_c2  |   score_c2    |
 +==========+===============+===============+===============+===============+===============+===============+
 | 0        |  1            |  7.921206e-01 | -1            | -1.028756e+00 | 1             | 2.657069e+00  |
 +----------+---------------+---------------+---------------+---------------+---------------+---------------+
 | 1        | -1            | -5.600341e-01 | -1            | -2.346818e+01 | 1             | 6.524541e-01  |
 +----------+---------------+---------------+---------------+---------------+---------------+---------------+
 | 2        | -1            | -1.600153e+00 | -1            | -1.082974e+01 | -1            | -8.895575e-01 |
 +----------+---------------+---------------+---------------+---------------+---------------+---------------+
 | ...      | ...           | ...           | ...           | ...           | ...           | ...           |
 +----------+---------------+---------------+---------------+---------------+---------------+---------------+
 | 28       | -1            | -5.600341e-01 | -1            | -1.821556e+01 | 1             | 6.524541e-01  |
 +----------+---------------+---------------+---------------+---------------+---------------+---------------+
 | 29       | -1            | -1.080094e+00 | -1            | -2.240158e+01 | -1            | -1.185517e-01 |
 +----------+---------------+---------------+---------------+---------------+---------------+---------------+
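
The two output tables above are consistent with reading ``fab1_score`` as the score of the component to which the sample was assigned, and ``fab1_predict`` as the sign of that score.
The following sketch checks this against the rows shown. The dictionary layout and the helper name are illustrative and not part of the SAMPO API, and the sign rule is inferred from the sample rows rather than stated by this specification.

.. code-block:: python

    # Rows copied from the output tables above (only the columns needed here).
    # "score_c" holds fab1_score_c0, fab1_score_c1, fab1_score_c2 in order.
    rows = [
        {"_sid": 0, "predict": 1, "score": 2.657069e+00, "assigned_comp_id": 2,
         "score_c": [7.921206e-01, -1.028756e+00, 2.657069e+00]},
        {"_sid": 1, "predict": 1, "score": 6.524541e-01, "assigned_comp_id": 2,
         "score_c": [-5.600341e-01, -2.346818e+01, 6.524541e-01]},
        {"_sid": 2, "predict": -1, "score": -1.600153e+00, "assigned_comp_id": 0,
         "score_c": [-1.600153e+00, -1.082974e+01, -8.895575e-01]},
        {"_sid": 28, "predict": 1, "score": 6.524541e-01, "assigned_comp_id": 2,
         "score_c": [-5.600341e-01, -1.821556e+01, 6.524541e-01]},
        {"_sid": 29, "predict": -1, "score": -1.080094e+00, "assigned_comp_id": 0,
         "score_c": [-1.080094e+00, -2.240158e+01, -1.185517e-01]},
    ]


    def summarize(row):
        """Return (score, label) implied by the assigned component (illustrative helper)."""
        score = row["score_c"][row["assigned_comp_id"]]
        label = 1 if score >= 0 else -1  # assumption: the prediction is the sign of the score
        return score, label


    results = [summarize(row) for row in rows]
    for row, (score, label) in zip(rows, results):
        assert score == row["score"]
        assert label == row["predict"]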

This component has component-specific external formats for model and prediction result evaluation.

.. seealso::

    Component-common external format files in :ref:`convert_process`

|

Parameters
==========
This component has the following component-specific parameters.

SPD
---

The following parameters are for "components" section of SPD.

.. list-table::
  :header-rows: 1
  :widths: 10, 5, 15, 10, 50

  * - Parameter Name
    - Type
    - Domain
    - Default Value
    - Description
  * - positive_label [1]_
    - str
    - See Description
    - --
    - The value of the target attribute to be treated as the positive label.
      The domain of this parameter corresponds to that of the target attribute.
  * - max_fab_iterations
    - int
    - [1, inf)
    - 100
    - Maximum number of FAB-iterations.
  * - start_from_mstep [2]_ [3]_
    - bool
    - True / False
    - False
    - If True, the first iteration starts with M-step; otherwise, E-step.
  * - num_acceleration_steps
    - int
    - [0, inf)
    - 0
    - The number of steps of acceleration algorithm for each FAB-iteration.
      If 0, the acceleration algorithm is disabled.
  * - repeat_until_convergence
    - bool
    - True / False
    - False
    - If False, the FAB-iterations and the post-processing are executed only
      once, even if the FAB-iterations stop because ``max_fab_iterations``
      was reached rather than because the convergence condition was met.
  * - projection_estep
    - bool
    - True / False
    - False
    - Whether the projection E-step algorithm is enabled.
  * - shrink_threshold
    - float or str
    - [1, inf) or (0%, 100%)
    - 1.0
    - Threshold value for shrinkage. If a percentage value (e.g. ``'1.0%'``)
      is specified, shrinkage is executed according to relative value,
      :math:`N_{\rm scaled\_sample} \times t_{\rm shrink}` where
      :math:`t_{\rm shrink}` is the threshold value and :math:`N_{\rm scaled\_sample}`
      is the number of scaled expected samples.
  * - fab_stop_threshold
    - float or str
    - (0, inf) or (0%, inf%)
    - 0.001
    - Threshold value for FAB-iterations: if the increase in the FIC value
      is less than the threshold, the FAB-iterations are considered to
      have converged. If a percentage value (e.g. ``'1.0%'``) is specified,
      the convergence check uses the relative value
      :math:`(FIC^{(t)} - FIC^{(t-1)}) / | FIC^{(t-1)} |`.
  * - gate_features
    - str
    - Query format
    - all()
    - Features used for gate parameter optimization.
      If not specified, all features are used.
  * - comp_features
    - str
    - Query format
    - all()
    - Features used for component parameter optimization.
      If not specified, all features are used.
      If empty, the model is learned as a decision tree.
  * - comp_mandatory_features
    - str
    - Query format
    - See Description
    - Features to which non-L0-regularization constraints are applied,
      meaning that the specified features are always relevant for all components.
      If not specified, no features are subject to non-L0-regularization,
      which implies that all relevant features are selected by the FoBa algorithm.
  * - tree_depth [2]_ [3]_
    - int
    - [0, inf)
    - 5
    - Initial depth of the gate-tree structure of the latent variable prior.
      The initial number of components is :math:`2^d` where :math:`d` is the
      tree depth. If 0, the optimization is executed with only one
      component.
  * - comp_bspline_degree [3]_
    - int
    - [0, inf)
    - 3
    - Degree of B-spline function.
  * - comp_bspline_basis_dim [3]_
    - int
    - [4, inf)
    - 10
    - The number of B-spline basis functions to be generated for each feature.
  * - comp_weights_min_scale [2]_ [3]_
    - float
    - (-inf, inf)
    - -0.5
    - Scale value for the initialization of weight values of components.
  * - comp_weights_max_scale [2]_ [3]_
    - float
    - (-inf, inf)
    - 0.5
    - Scale value for the initialization of weight values of components.
  * - comp_bias_min_scale [2]_ [3]_
    - float
    - (-inf, inf)
    - 0.25
    - Scale value for the initialization of bias values of components.
  * - comp_bias_max_scale [2]_ [3]_
    - float
    - (-inf, inf)
    - 0.75
    - Scale value for the initialization of bias values of components.
  * - gate_l2_regularize
    - float
    - [0, inf)
    - 0.0
    - L2-regularization hyper-parameter for gate-parameter optimization.
      The larger the specified value, the stronger the regularization effect is.
      If 0.0, L2-regularization is disabled.
  * - with_gate_scaled_l0_regularize
    - bool
    - True / False
    - True
    - Whether to use scaled L0-regularization, which applies a tighter lower
      bound of FIC for gate parameter optimization; the approximation of
      det(F) is refined, where F is a Fisher matrix.
  * - max_gate_relevant_features
    - int
    - [1, inf)
    - 3
    - Maximum number of the relevant features for each gate.
  * - comp_l2_regularize
    - float
    - [0, inf)
    - 0.0
    - L2-regularization hyper-parameter for component parameter optimization.
      The larger the specified value, the stronger the regularization effect
      is. If 0.0, L2-regularization is disabled.
  * - comp_pspline
    - float
    - [0, inf)
    - 1.0
    - L2-regularization coefficient value for penalized B-spline function
      (P-spline).
  * - with_comp_scaled_l0_regularize
    - bool
    - True / False
    - True
    - Whether to use scaled L0-regularization, which applies a tighter lower
      bound of FIC for component parameter optimization; the approximation of
      det(F) is refined, where F is a Fisher matrix.
  * - max_comp_relevant_features
    - int
    - [1, inf)
    - 100
    - Maximum number of the relevant features for each component.
  * - num_threads_gates
    - int
    - [1, inf)
    - 1
    - Maximum number of OpenMP threads for gate parameter optimization;
      the tasks for all gates are divided among these threads.
  * - num_threads_comps
    - int
    - [1, inf)
    - 1
    - Maximum number of OpenMP threads for component parameter optimization.

.. [1] Required parameter
.. [2] Parameter ignored in posterior hot-start
.. [3] Parameter ignored in model hot-start
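
As a rough illustration of the ``shrink_threshold``, ``fab_stop_threshold``, and ``tree_depth`` descriptions above, the sketch below evaluates the absolute and relative forms of both thresholds and the initial component count :math:`2^d`.
The function names and the numeric inputs are made up for this example; the engine's actual implementation is not shown in this specification.

.. code-block:: python

    def effective_shrink_threshold(threshold, n_scaled_samples):
        """Return the shrinkage threshold as an absolute value (illustrative).

        A percentage string such as '1.0%' is interpreted relative to the
        number of scaled expected samples: N_scaled_sample * t_shrink.
        """
        if isinstance(threshold, str) and threshold.endswith("%"):
            return n_scaled_samples * float(threshold[:-1]) / 100.0
        return float(threshold)


    def has_converged(fic_prev, fic_curr, threshold):
        """Illustrative convergence check for FAB-iterations.

        `threshold` is a float (absolute FIC increase) or a percentage
        string such as '1.0%' (FIC increase relative to |fic_prev|).
        The relative form assumes fic_prev is non-zero.
        """
        increase = fic_curr - fic_prev
        if isinstance(threshold, str) and threshold.endswith("%"):
            return increase / abs(fic_prev) < float(threshold[:-1]) / 100.0
        return increase < float(threshold)


    def num_initial_components(tree_depth):
        """Initial number of components, 2^d for tree depth d."""
        return 2 ** tree_depth


    print(effective_shrink_threshold("1.0%", 150))  # relative threshold
    print(has_converged(-130.0, -129.0, "1.0%"))    # 1/130 is below 1%
    print(num_initial_components(5))                # default tree_depth

With the default ``tree_depth`` of 5 the helper returns 32, and with ``tree_depth: 3`` it returns 8.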

SRC
---

The following parameter is for "hotstart" section of SRC.

.. list-table::
  :header-rows: 1
  :widths: 10, 5, 15, 10, 50

  * - Parameter Name
    - Type
    - Domain
    - Default Value
    - Description
  * - type
    - str
    - {'posterior', 'mh_refit_comp', 'mh_opt_comp', 'mh_refit_gate_and_refit_comp', 'mh_refit_gate_and_opt_comp', 'mh_opt_gate_and_opt_comp'}
    - --
    - The hot-start type. If 'posterior', FAB learns with posterior hot-start, which uses an
      initial model whose tree structure is generated from the base model and the data; all gate
      and component parameters are initialized randomly. 'mh_XXX' means that FAB learns with model
      hot-start, which uses the base model as the initial model. 'refit_{gate, comp}' means refitting
      the gate functions or prediction formulas with the current data. 'opt_{gate, comp}' means
      optimizing (feature selection and fitting) the gate functions or prediction formulas with
      the current data.
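
The naming scheme of the hot-start types can be decoded mechanically.
The helper below is a hypothetical sketch (it is not part of SAMPO) that maps each documented ``type`` value to the action named for the gate functions and for the prediction formulas; ``None`` means only that the type name does not mention that part.

.. code-block:: python

    HOTSTART_TYPES = {
        "posterior",
        "mh_refit_comp",
        "mh_opt_comp",
        "mh_refit_gate_and_refit_comp",
        "mh_refit_gate_and_opt_comp",
        "mh_opt_gate_and_opt_comp",
    }


    def describe_hotstart(type_name):
        """Decode a hot-start type into its mode and per-part actions (illustrative)."""
        if type_name not in HOTSTART_TYPES:
            raise ValueError(f"unknown hot-start type: {type_name!r}")
        if type_name == "posterior":
            return {"mode": "posterior", "gate": None, "comp": None}
        actions = {"mode": "model", "gate": None, "comp": None}
        # Strip the 'mh_' prefix, then split e.g. 'refit_gate_and_opt_comp' into parts.
        for part in type_name[len("mh_"):].split("_and_"):
            action, target = part.split("_")  # e.g. ('refit', 'gate')
            actions[target] = action
        return actions


    print(describe_hotstart("mh_refit_gate_and_opt_comp"))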

|

Utilizable Sample Metadata
==========================
.. warning::

   _fabhme_assigned_comp_id is deprecated. Use hotstart section of SRC instead of _fabhme_assigned_comp_id data column.

This component can use the _fabhme_assigned_comp_id attribute of the sample metadata to hot-start with a posterior.
When the _fabhme_assigned_comp_id attribute is specified in the input data,
this component starts the FAB/HME algorithm with the _fabhme_assigned_comp_id attribute values as its initial posterior.

To create the attribute _fabhme_assigned_comp_id, see the specification of the command sampo_ps_fabhme export_assigned_comp_id.

|

Output Attributes
=================

.. include:: ./fabhme/bspline_cl_output_attributes.rst

These attributes are in the component output data and can be loaded via the SAMPO API.

.. seealso::

    Obtaining process results via `ProcessResultLoader <../../api/process_result_loader.html>`_.

When :ref:`convert_process` is executed,
the component output data will be saved in two separate files:

#. All non-basis function value attributes will be saved as *<component_id>*\_predict_result.csv.

    .. include:: ./fabhme/cl_predict_result.rst

#. Basis function value attributes will be saved as basis_func_values.csv.

    .. include:: ./fabhme/bspline_basis_func_values.rst

|

Attribute Metadata
==================

.. include:: ./fabhme/bspline_cl_attr_metadata.rst

|

Model
=====

.. include:: ./fabhme/bspline_cl_model_params.rst
.. include:: ./fabhme/logit_gate_tree_keys.rst

When the model is loaded in the SAMPO API, the model parameters will be output as a single dictionary.

.. seealso::

    Obtaining process results via `ProcessResultLoader <../../api/process_result_loader.html>`_

::

    {'fic': -23.832958802449035,
     'num_initial_comps': 32,
     'num_active_comps': 2,
     'gate_tree':
         {'gate_type': 'logit',
          'hard_gate': True,
          'nodes': [
              {'comp_id': 20, 'node_type': 'component', 'node_id': 1},
              {'node_type': 'gate',
               'node_id': 0,
               'gate_func':
                   {'bias': -14.594158450398055,
                    'weights': [
                        {'aid': 'dl[0]', 'attr_name': 'sepal_length_in_cm', 'weight': 10.426327199487217},
                        {'aid': 'dl[1]', 'attr_name': 'petal_length_in_cm', 'weight': -13.460106074504926}]}},
              {'comp_id': 30, 'node_type': 'component', 'node_id': 2}],
          'edges': [
              {'source': 0, 'target': 1, 'is_left': True},
              {'source': 0, 'target': 2, 'is_left': False}]}},
     'prediction_formulas':
                                                  prediction_formula_20  prediction_formula_30
         attr_name          basis_function_index
         sepal_length_in_cm 0                                         0                      0
                            1                                         0                      0
                            2                                         0                      0
                            3                                         0                      0
                            4                                         0                      0
                            5                                         0                      0
                            6                                         0                      0
                            7                                         0                      0
                            8                                         0                      0
                            9                                         0                      0
         petal_length_in_cm 0                                         0                      0
                            1                                         0                      0
                            2                                         0                      0
                            3                                         0                      0
                            4                                         0                      0
                            5                                         0                      0
                            6                                         0                      0
                            7                                         0                      0
                            8                                         0                      0
                            9                                         0                      0
                            bias                                     -1                      1,
     'bspline_params':    degree  basis_dim
         0       3         10,
     'bspline_knot_vecs':
                             knot_value_0  knot_value_1  knot_value_2  knot_value_3  knot_value_4  knot_value_5  knot_value_6  knot_value_7  knot_value_8  knot_value_9  knot_value_10  knot_value_11  knot_value_12
         attr_name
         sepal_length_in_cm        3.9625        3.9625           4.3        4.6375         4.975        5.3125          5.65        5.9875         6.325        6.6625            7.0         7.3375         7.3375
         petal_length_in_cm        1.7000        1.7000           2.0        2.3000         2.600        2.9000          3.20        3.5000         3.800        4.1000            4.4         4.7000         4.7000
    }
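
To illustrate how the ``gate_tree`` dictionary can be read, the sketch below walks the tree from the root gate to a component for one sample.
It assumes that a gate evaluates a logistic function of ``bias`` plus the weighted sum of the listed attributes, and that a value of 0.5 or more selects the edge whose ``is_left`` is ``True``. This left/right convention is an assumption made for the example and should be checked against the gate tree key description. The function name and the sample values are likewise illustrative.

.. code-block:: python

    # Gate tree copied from the model dictionary above ('aid' keys omitted).
    gate_tree = {
        "gate_type": "logit",
        "hard_gate": True,
        "nodes": [
            {"comp_id": 20, "node_type": "component", "node_id": 1},
            {"node_type": "gate", "node_id": 0,
             "gate_func": {
                 "bias": -14.594158450398055,
                 "weights": [
                     {"attr_name": "sepal_length_in_cm", "weight": 10.426327199487217},
                     {"attr_name": "petal_length_in_cm", "weight": -13.460106074504926},
                 ]}},
            {"comp_id": 30, "node_type": "component", "node_id": 2},
        ],
        "edges": [
            {"source": 0, "target": 1, "is_left": True},
            {"source": 0, "target": 2, "is_left": False},
        ],
    }


    def assign_component(tree, sample):
        """Follow hard gates from the root to a component and return its comp_id."""
        nodes = {node["node_id"]: node for node in tree["nodes"]}
        targets = {edge["target"] for edge in tree["edges"]}
        # The root is the only node that no edge points to.
        node_id = next(i for i in nodes if i not in targets)
        while nodes[node_id]["node_type"] == "gate":
            func = nodes[node_id]["gate_func"]
            z = func["bias"] + sum(w["weight"] * sample[w["attr_name"]]
                                   for w in func["weights"])
            # sigmoid(z) >= 0.5 is equivalent to z >= 0; going left is an assumption.
            go_left = z >= 0.0
            node_id = next(edge["target"] for edge in tree["edges"]
                           if edge["source"] == node_id and edge["is_left"] == go_left)
        return nodes[node_id]["comp_id"]


    print(assign_component(gate_tree, {"sepal_length_in_cm": 5.0,
                                       "petal_length_in_cm": 1.4}))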


External Format
---------------
When :ref:`convert_process` is executed,
the model parameters are saved into different files and are grouped as: general information,
gating function, prediction formula, B-spline parameters, and B-spline knot vectors.

General Information
```````````````````
This file contains the :math:`FIC` value after learning the model, the initial number of components, and the final number of active components.

::

    fic,num_initial_comps,num_active_comps
    -1.294308e+02,8,3
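
Because the general information file is a one-row CSV, it can be read with the Python standard library.
In the sketch below the file contents are embedded as a string so that the example is self-contained; in practice the text would come from the file written by the convert process, whose name is not given here.

.. code-block:: python

    import csv
    import io

    # Contents of the general information file shown above.
    text = "fic,num_initial_comps,num_active_comps\n-1.294308e+02,8,3\n"

    row = next(csv.DictReader(io.StringIO(text)))
    general_info = {
        "fic": float(row["fic"]),
        "num_initial_comps": int(row["num_initial_comps"]),
        "num_active_comps": int(row["num_active_comps"]),
    }
    print(general_info)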

Gate Tree
`````````

.. include:: ./fabhme/model_logit_gate_tree.rst

Prediction Formulas
```````````````````

.. include:: ./fabhme/model_bspline_cl_prediction_formulas.rst

B-spline Parameters
```````````````````

.. include:: ./fabhme/model_bspline_params.rst

B-spline Knot Vectors
`````````````````````

.. include:: ./fabhme/model_bspline_knot_vecs.rst

|

Prediction Result Evaluation
============================

.. include:: ./fabhme/cl_predict_result_evaluation_indices.rst

When these evaluation results are obtained in the SAMPO API, a pandas.DataFrame is loaded
with the evaluation indices as the columns of the DataFrame.

.. seealso::

    Obtaining process results via `ProcessResultLoader <../../api/process_result_loader.html>`_

External Format
---------------
When :ref:`convert_process` is executed, the evaluation results
are saved as a CSV file with the evaluation indices as the header of the CSV.

.. include:: ./fabhme/cl_predict_result_evaluation.rst

|

Details
=======
If a data set has samples with missing or +/-Inf values, this component ignores those samples.
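
The effect of this rule can be pictured as a filter applied to the samples before learning.
The sketch below is only an illustration with made-up rows and a hypothetical helper name; it treats ``None`` and NaN as missing, which is an assumption about how missing values are represented.

.. code-block:: python

    import math


    def is_usable(sample):
        """True if every value is present and finite (no None, NaN, or +/-Inf)."""
        return all(v is not None and math.isfinite(v) for v in sample.values())


    samples = [
        {"sepal_length_in_cm": 4.9, "sepal_width_in_cm": 2.5},
        {"sepal_length_in_cm": float("inf"), "sepal_width_in_cm": 2.8},
        {"sepal_length_in_cm": 7.2, "sepal_width_in_cm": None},
        {"sepal_length_in_cm": float("nan"), "sepal_width_in_cm": 3.1},
    ]

    # Only the first sample has no missing or infinite value.
    usable = [s for s in samples if is_usable(s)]
    print(len(usable))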
