=============================
FAB/HME Engine Specifications
=============================

This document describes the specifications of the FAB/HME engine.

.. contents:: Contents
    :local:

Overview
========

The FAB engine consists of two parts: *models* and *learners*.

A learner takes a set of learning data (feature and target data) and outputs
a model as the result of the learning process. A specific learner class
is defined for each combination of learning type (regression or
classification), gating function type, and component type.

Given feature data, a model predicts target data. Because FAB models
are represented by linear or simple equations and their parameters, users can
understand how a model predicts target values by interpreting the parameter
values.
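
As a toy illustration of why such models are easy to interpret, the sketch
below routes each sample through a single hard threshold gate and then applies
one of two linear components. It is not the engine's model class: the function
name ``predict_toy_model``, its arguments, and the threshold form of the gate
are assumptions made only for this example.

.. code-block:: python

    import numpy as np

    def predict_toy_model(X, gate_feature, gate_threshold, left, right):
        """Route each sample through one hard gate, then apply a linear component."""
        X = np.asarray(X, dtype=float)
        # Samples whose gate feature is below the threshold go to the left component.
        go_left = X[:, gate_feature] < gate_threshold
        w_left, b_left = left
        w_right, b_right = right
        y_left = X @ np.asarray(w_left, dtype=float) + b_left
        y_right = X @ np.asarray(w_right, dtype=float) + b_right
        return np.where(go_left, y_left, y_right)

    X = np.array([[0.2, 1.0], [0.8, 1.0]])
    pred = predict_toy_model(
        X, gate_feature=0, gate_threshold=0.5,
        left=([2.0, 0.0], 1.0),    # (weights, bias), hypothetical values
        right=([-1.0, 3.0], 0.0),  # (weights, bias), hypothetical values
    )

The parameters can be read off directly: samples with feature 0 below 0.5 are
predicted by ``2.0 * x0 + 1.0``, and all other samples by
``-1.0 * x0 + 3.0 * x1``.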


Learner
=======

Learner Classes
---------------

There are ten learner classes for FAB/HME learning:

+----------------+--------+-----------+----------------+-----------------------------------------------+
| learning type  | target | gate type | component type | learner class                                 |
+================+========+===========+================+===============================================+
| Regression     | Single | Bernoulli | Linear         | HMEBernGateLeastSquaresRgLearner (Bern/Rg)    |
|                |        |           +----------------+-----------------------------------------------+
|                |        |           | Non-linear     | HMEBernGateBSplineRgLearner (Bern/NLRg)       |
|                |        +-----------+----------------+-----------------------------------------------+
|                |        | Logistic  | Linear         | HMELogitGateLeastSquaresRgLearner (Logit/Rg)  |
|                |        |           +----------------+-----------------------------------------------+
|                |        |           | Non-linear     | HMELogitGateBSplineRgLearner (Logit/NLRg)     |
+----------------+        +-----------+----------------+-----------------------------------------------+
| Classification |        | Bernoulli | Linear         | HMEBernGateLogisticRgLearner (Bern/Cl)        |
|                |        |           +----------------+-----------------------------------------------+
|                |        |           | Non-linear     | HMEBernGateBSplineClLearner (Bern/NLCl)       |
|                |        +-----------+----------------+-----------------------------------------------+
|                |        | Logistic  | Linear         | HMELogitGateLogisticRgLearner (Logit/Cl)      |
|                |        |           +----------------+-----------------------------------------------+
|                |        |           | Non-linear     | HMELogitGateBSplineClLearner (Logit/NLCl)     |
|                +--------+-----------+----------------+-----------------------------------------------+
|                | Multi  | Bernoulli | Linear         | HMEBernGateSoftmaxClLearner (Bern/MCl)        |
|                |        +-----------+----------------+-----------------------------------------------+
|                |        | Logistic  | Linear         | HMELogitGateSoftmaxClLearner (Logit/MCl)      |
+----------------+--------+-----------+----------------+-----------------------------------------------+


Learn Methods
-------------

To execute a learning process, call the **learn(X, Y)** method of a learner
instance:

+-----------+---------------+-----------------------------+--------------------------------+
| parameter | type          | size                        | description                    |
+===========+===============+=============================+================================+
| X         | numpy.ndarray | (num_samples, num_features) | Feature data.                  |
+-----------+---------------+-----------------------------+--------------------------------+
| Y         | numpy.ndarray | (num_samples,)              | Target data for single target. |
+           +---------------+-----------------------------+--------------------------------+
|           | numpy.ndarray | (num_samples, num_targets)  | Target data for multi targets. |
+-----------+---------------+-----------------------------+--------------------------------+

.. note::
   For single target classification problems, each value in Y must be either -1.0 or 1.0.

.. note::
   For multi target classification problems, exactly one element of Y for
   each sample must be 1.0, and all others must be 0.0.
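
The two target encodings described in the notes above can be prepared with
plain NumPy. The following is a minimal sketch; the helper names
``to_binary_targets`` and ``to_one_hot_targets`` are hypothetical and are not
part of the FAB engine.

.. code-block:: python

    import numpy as np

    def to_binary_targets(labels, positive_label):
        """Map labels to 1.0 for positive_label and -1.0 for anything else."""
        labels = np.asarray(labels)
        return np.where(labels == positive_label, 1.0, -1.0)

    def to_one_hot_targets(class_ids, num_targets):
        """Build a (num_samples, num_targets) matrix with a single 1.0 per row."""
        class_ids = np.asarray(class_ids, dtype=int)
        Y = np.zeros((class_ids.shape[0], num_targets), dtype=float)
        Y[np.arange(class_ids.shape[0]), class_ids] = 1.0
        return Y

    # Single target classification: shape (num_samples,), values in {-1.0, 1.0}.
    y_single = to_binary_targets(["spam", "ham", "spam"], positive_label="spam")

    # Multi target classification: shape (num_samples, num_targets), one-hot rows.
    Y_multi = to_one_hot_targets([0, 2, 1], num_targets=3)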

The learn() method returns a tuple of (model, vposterior, context).
The objects in the tuple are as follows:

+------------+-------------------------+------------------------------------------------+
| object     | type                    | description                                    |
+============+=========================+================================================+
| model      | HMESupervisedModel      | A model object as a result of the learning.    |
+------------+-------------------------+------------------------------------------------+
| vposterior | HMEBinaryTreeVPosterior | Variational posterior used in the learning.    |
+------------+-------------------------+------------------------------------------------+
| context    | HMELearningContext      | A context object such as histories of FIC and  |
|            |                         | the number of components in the learning.      |
+------------+-------------------------+------------------------------------------------+
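
The call pattern can be sketched as a small wrapper that checks the documented
input shapes and unpacks the returned tuple. The wrapper ``run_learning`` is a
hypothetical helper, not part of the engine, and ``learner`` is assumed to be
an instance of one of the learner classes listed above.

.. code-block:: python

    import numpy as np

    def run_learning(learner, X, Y):
        """Check the documented input shapes, then call learner.learn(X, Y)."""
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        if X.ndim != 2:
            raise ValueError("X must have shape (num_samples, num_features)")
        if Y.ndim not in (1, 2) or Y.shape[0] != X.shape[0]:
            raise ValueError("Y must have one row per sample in X")
        # learn() returns the tuple (model, vposterior, context).
        model, vposterior, context = learner.learn(X, Y)
        return model, vposterior, context

Checking shapes before the call is a design choice of this sketch; it reports
a mismatched number of samples before a learning run starts.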


Initialization
--------------

Initialization Types
^^^^^^^^^^^^^^^^^^^^

Each learner class provides three class methods for creating an instance.
The default initialization method, __init__(), is not available to FAB-engine users.

  **init_random()**
      It creates a learner for *random start*.
  **init_with_posterior()**
      It creates a learner for *posterior hot-start*.
  **init_with_model_dict()**
      It creates a learner for *model hot-start*.


List of Argument Parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following table shows which parameters each learner accepts and for which
initialization methods. The meaning and other details of each parameter are
described in the following sections.

+-----------------------------------+-----------------------------------------------+-------------------------------------------------------------------------+
|                                   | Regression                                    | Classification                                                          |
|                                   +-----------------------------------------------+-----------------------------------------------+-------------------------+
| parameter                         | Single target                                                                                 | Multi target            |
|                                   +-----------------------+-----------------------+-----------------------+-----------------------+------------+------------+
|                                   | Bern                  | Logit                 | Bern                  | Logit                 | Bern       | Logit      |
|                                   +-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
|                                   | Rg        | NLRg      | Rg        | NLRg      | Cl        | NLCl      | Cl        | NLCl      | MCl        | MCl        |
+===================================+===========+===========+===========+===========+===========+===========+===========+===========+============+============+
| max_fab_iterations                | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| start_from_mstep                  | R         | R         | R         | R         | R         | R         | R         | R         | R          | R          |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| num_acceleration_steps            | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| repeat_until_convergence          | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| projection_estep                  | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| shrink_threshold                  | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| fab_stop_threshold                | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| hard_gate                         | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| gate_feature_ids                  | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_feature_ids                  | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_mandatory_feature_ids        | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_positive_feature_ids         | R / P / M |           | R / P / M |           | R / P / M |           | R / P / M |           |            |            |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_negative_feature_ids         | R / P / M |           | R / P / M |           | R / P / M |           | R / P / M |           |            |            |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| tree_depth                        | R         | R         | R         | R         | R         | R         | R         | R         | R          | R          |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_bspline_degree               |           | R / P     |           | R / P     |           | R / P     |           | R / P     |            |            |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_bspline_basis_dim            |           | R / P     |           | R / P     |           | R / P     |           | R / P     |            |            |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_weights_min_scale            | R         | R         | R         | R         | R         | R         | R         | R         | R          | R          |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_weights_max_scale            | R         | R         | R         | R         | R         | R         | R         | R         | R          | R          |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_bias_min_scale               | R         | R         | R         | R         | R         | R         | R         | R         | R          | R          |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_bias_max_scale               | R         | R         | R         | R         | R         | R         | R         | R         | R          | R          |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_variance_min_scale           | R         | R         | R         | R         |           |           |           |           |            |            |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_variance_max_scale           | R         | R         | R         | R         |           |           |           |           |            |            |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| gate_opt_mode                     | M         | M         | M         | M         | M         | M         | M         | M         | M          | M          |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| gate_max_bins                     | R / P / M | R / P / M |           |           | R / P / M | R / P / M |           |           | R / P / M  |            |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| gate_opt_type                     |           |           | R / P / M | R / P / M |           |           | R / P / M | R / P / M |            | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| gate_l2_regularize                |           |           | R / P / M | R / P / M |           |           | R / P / M | R / P / M |            | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| with_gate_scaled_l0_regularize    |           |           | R / P / M | R / P / M |           |           | R / P / M | R / P / M |            | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| max_gate_relevant_features        |           |           | R / P / M | R / P / M |           |           | R / P / M | R / P / M |            | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| gate_svd_threshold                |           |           | R / P / M | R / P / M |           |           | R / P / M | R / P / M |            | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_foba_skip                    | R / P / M |           | R / P / M |           | R / P / M |           | R / P / M |           | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_foba_skip_max_interval       | R / P / M |           | R / P / M |           | R / P / M |           | R / P / M |           | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_opt_mode                     | M         | M         | M         | M         | M         | M         | M         | M         | M          | M          |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_two_stage_opt                | R / P / M |           | R / P / M |           |           |           |           |           |            |            |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_backward_step                | R / P / M |           | R / P / M |           | R / P / M |           | R / P / M |           | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_opt_type                     | R / P / M |           | R / P / M |           | R / P / M |           | R / P / M |           |            |            |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| post_comp_opt_type                |           |           |           |           | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_l2_regularize                | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_pspline                      |           | R / P / M |           | R / P / M |           | R / P / M |           | R / P / M |            |            |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| with_comp_scaled_l0_regularize    | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| max_comp_relevant_features        | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| max_comp_foba_iterations          | R / P / M |           | R / P / M |           | R / P / M |           | R / P / M |           | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_svd_threshold                | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| num_threads_gates                 | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| num_threads_gate_features         | R / P / M | R / P / M |           |           | R / P / M | R / P / M |           |           | R / P / M  |            |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| num_threads_comps                 | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M | R / P / M  | R / P / M  |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| posterior_prob                    | P         | P         | P         | P         | P         | P         | P         | P         | P          | P          |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| comp_ids                          | P         | P         | P         | P         | P         | P         | P         | P         | P          | P          |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+
| model_dict                        | M         | M         | M         | M         | M         | M         | M         | M         | M          | M          |
+-----------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+

  * R : random start by init_random()
  * P : posterior hot-start by init_with_posterior()
  * M : model hot-start by init_with_model_dict()
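
Several rows of the table are marked with a single letter for every learner
that accepts them, so those parameters belong to exactly one initialization
method. The sketch below encodes only those rows and reports arguments that
the table reserves for a different method. The names ``SINGLE_METHOD_PARAMS``
and ``misplaced_params`` are hypothetical; the engine is not known to provide
such a check.

.. code-block:: python

    # Rows of the table marked with exactly one of R, P, or M.
    SINGLE_METHOD_PARAMS = {
        "init_random": {
            "start_from_mstep", "tree_depth",
            "comp_weights_min_scale", "comp_weights_max_scale",
            "comp_bias_min_scale", "comp_bias_max_scale",
            "comp_variance_min_scale", "comp_variance_max_scale",
        },
        "init_with_posterior": {"posterior_prob", "comp_ids"},
        "init_with_model_dict": {"gate_opt_mode", "comp_opt_mode", "model_dict"},
    }

    def misplaced_params(init_method, kwargs):
        """Return the names in kwargs that the table reserves for another method."""
        if init_method not in SINGLE_METHOD_PARAMS:
            raise ValueError(f"unknown initialization method: {init_method}")
        reserved_elsewhere = set()
        for method, names in SINGLE_METHOD_PARAMS.items():
            if method != init_method:
                reserved_elsewhere |= names
        return sorted(name for name in kwargs if name in reserved_elsewhere)

    # tree_depth is a random-start parameter; model_dict belongs to model hot-start.
    print(misplaced_params("init_random", {"tree_depth": 3, "model_dict": {}}))
    # prints ['model_dict']

Parameters marked for two or three methods, and parameters that depend on the
learner class, are deliberately left out of this sketch.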


Parameter Descriptions
^^^^^^^^^^^^^^^^^^^^^^

The input parameters have the following meanings:

.. list-table:: Argument parameters for initialization methods of the learners.
   :header-rows: 1
   :widths: 2, 1, 1, 1, 5

   * - parameter
     - type
     - domain
     - default
     - description
   * - max_fab_iterations
     - int
     - [1, inf)
     - 100
     - Maximum number of FAB-iterations.
   * - start_from_mstep
     - bool
     - True / False
     - False
     - If True, the first iteration starts with the M-step; otherwise, it
       starts with the E-step.
   * - num_acceleration_steps
     - int
     - [0, inf)
     - 0
     - The number of steps of the acceleration algorithm for each FAB-iteration.
       If 0, the acceleration algorithm is disabled.
   * - repeat_until_convergence
     - bool
     - True / False
     - False
     - If False, the FAB-iterations and the post-processing are executed only
       once, even if the FAB-iterations stop because `max_fab_iterations` was
       reached rather than because the convergence condition was met.
   * - projection_estep
     - bool
     - True / False
     - False
     - Whether the projection E-step algorithm is enabled.
   * - shrink_threshold
     - float or str
     - [1, inf) or (0%, 100%)
     - 1.0
     - Threshold value for shrinkage. If a percentage value (e.g. ``'1.0%'``)
       is specified, shrinkage is executed against the relative value
       :math:`N_{\rm scaled\_sample} \times t_{\rm shrink}`, where
       :math:`t_{\rm shrink}` is the threshold value and :math:`N_{\rm scaled\_sample}`
       is the number of scaled expected samples.
   * - fab_stop_threshold
     - float or str
     - (0, inf) or (0%, inf%)
     - 0.001
     - Threshold value for FAB-iterations: if the increase in the FIC value
       is less than the threshold, the FAB-iterations are considered to
       have converged. If a percentage value (e.g. ``'1.0%'``) is specified,
       the convergence check uses the relative value
       :math:`(FIC^{(t)} - FIC^{(t-1)}) / | FIC^{(t-1)} |`.
   * - hard_gate
     - bool
     - True / False
     - True
     - If True, hard-gate post-processing is enabled.
   * - gate_feature_ids
     - None or list[int]
     - Length: [1, inf);
       Element: [0, inf)
     - None
     - List of feature IDs that are used in the parameter optimizations.
       If None, all features are used.
   * - comp_feature_ids
     - None or list[int]
     - Length: [0, inf);
       Element: [0, inf)
     - None
     - List of feature IDs that are used in the parameter
       optimizations. If None, all features are used. If an empty list is
       given, the model is learned as a decision tree.
   * - comp_mandatory_feature_ids
     - None or list[int]
     - Length: [1, inf);
       Element: [0, inf)
     - None
     - List of feature IDs that are exempt from L0-regularization, which
       means the specified features are always relevant for all
       components. If None, no features are exempt, which implies that all
       relevant features are selected by the FoBa algorithm.
   * - comp_positive_feature_ids
     - None or list[int]
     - Length: [1, inf);
       Element: [0, inf)
     - None
     - List of feature IDs whose weight values for all components are
       constrained to positive values. If None, all features are optimized with
       no constraints.
   * - comp_negative_feature_ids
     - None or list[int]
     - Length: [1, inf);
       Element: [0, inf)
     - None
     - List of feature IDs whose weight values for all components are
       constrained to negative values. If None, all features are optimized
       with no constraints.
   * - tree_depth
     - int
     - [0, inf)
     - 5
     - Initial depth of the gate-tree structure of the latent variable prior.
       The initial number of components is :math:`2^d`, where :math:`d` is
       the tree depth. If 0, the optimization is executed with only one
       component.
   * - comp_bspline_degree
     - int
     - [0, inf)
     - 3
     - Degree of B-spline function.
   * - comp_bspline_basis_dim
     - int
     - [4, inf)
     - 10
     - The number of B-spline basis functions to be generated for each feature.
   * - comp_weights_min_scale
     - float
     - (-inf, inf)
     - -0.5
     - Scale value for the initialization of weight values of components.
   * - comp_weights_max_scale
     - float
     - (-inf, inf)
     - 0.5
     - Scale value for the initialization of weight values of components.
   * - comp_bias_min_scale
     - float
     - (-inf, inf)
     - 0.25
     - Scale value for the initialization of bias values of components.
   * - comp_bias_max_scale
     - float
     - (-inf, inf)
     - 0.75
     - Scale value for the initialization of bias values of components.
   * - comp_variance_min_scale
     - float
     - (0, inf)
     - 0.1
     - Scale value for the initialization of variance values of components.
   * - comp_variance_max_scale
     - float
     - (0, inf)
     - 0.25
     - Scale value for the initialization of variance values of components.
   * - gate_opt_mode
     - str
     - {'opt', 'refit', 'keep'}
     - 'opt'
     - Mode of the parameter optimization. If 'opt', the parameters are
       optimized with all features; if 'refit', the parameters are
       fit with relevant features; if 'keep', the parameters are
       kept unchanged.
   * - gate_max_bins
     - None or int
     - [1, inf)
     - None
     - Maximum number of bins for each feature, which is used for
       the parameter optimization. If None, all unique samples for each feature
       are used; otherwise, the equal-width binning algorithm is adopted.
   * - gate_opt_type
     - str
     - See description
     - See description
     - Algorithm of the parameter optimization. The domain and default
       value depend on the learner type and are described in the following
       section.
   * - gate_l2_regularize
     - float
     - [0, inf)
     - 0.0
     - L2-regularization hyper-parameter for the parameter optimization.
       The larger the specified value, the stronger the regularization
       effect is. If 0.0, L2-regularization is disabled.
   * - with_gate_scaled_l0_regularize
     - bool
     - True / False
     - True
     - Whether to use scaled L0-regularization, which uses a tighter lower
       bound of FIC for the parameter optimization; the approximation of
       det(F) is refined, where F is a Fisher matrix.
   * - max_gate_relevant_features
     - int
     - [1, inf)
     - 3
     - Maximum number of the relevant features for each gate.
   * - gate_svd_threshold
     - float
     - [0, inf)
     - 0.00001
     - Threshold value for singular value decomposition (SVD) in
       the parameter optimization.
   * - comp_foba_skip
     - str
     - {'power_of_two', 'quarter_square', 'none'}
     - 'power_of_two'
     - The judging function type for skipping the FoBa algorithm. If 'none',
       FoBa is executed at every FAB-iteration step. If 'power_of_two', FoBa
       is skipped at steps where
       :math:`{\rm log}_{2}t \ne {\rm ceil}({\rm log}_{2}t)`; if
       'quarter_square', it is skipped at steps where
       :math:`t \bmod {\rm ceil}(\sqrt{t}) \ne 0`. Here :math:`t` is the
       FAB-iteration step index, starting from 1.
   * - comp_foba_skip_max_interval
     - int
     - [2, inf)
     - 25
     - The maximum interval for the FoBa algorithm skipping. If comp_foba_skip
       is 'none', this value is ignored.
   * - comp_opt_mode
     - str
     - {'opt', 'refit'}
     - 'opt'
     - Mode of the parameter optimization. If 'opt', the parameters
       are optimized with all features; if 'refit', the parameters are
       fit with relevant features.
   * - comp_two_stage_opt
     - bool
     - True / False
     - False
     - Whether the two-stage optimization is enabled.
       If True, the first stage performs the parameter optimization on
       user-specified mandatory features (`comp_mandatory_feature_ids`), and
       the second stage performs the parameter optimization on the residual of
       the first stage for only the relevant non-mandatory features.
   * - comp_backward_step
     - bool
     - True / False
     - False
     - Whether the backward-steps of the FoBa algorithm are enabled. In the
       post-processing, backward-steps are carried out regardless of this
       argument value.
   * - comp_opt_type
     - str
     - See description
     - See description
     - Algorithm of the parameter optimization. The domain and default
       value depend on the learner type and are described in the following
       section.
   * - post_comp_opt_type
     - str
     - See description
     - See description
     - Algorithm of the parameter optimization in the post-processing.
       The domain and default value depend on the learner type and are
       described in the following section.
   * - comp_l2_regularize
     - float
     - [0, inf)
     - 0.0
     - L2-regularization hyper-parameter for the parameter optimization.
       The larger the specified value, the stronger the regularization effect
       is. If 0.0, L2-regularization is disabled.
   * - comp_pspline
     - float
     - [0, inf)
     - 1.0
     - L2-regularization coefficient value for penalized B-spline function
       (P-spline).
   * - with_comp_scaled_l0_regularize
     - bool
     - True / False
     - True
     - Whether to apply scaled L0-regularization, which uses a tighter lower
       bound of FIC, in the parameter optimization; the approximation of
       det(F) is refined, where F is a Fisher matrix.
   * - max_comp_relevant_features
     - int
     - [1, inf)
     - 100
     - Maximum number of the relevant features for each component.
   * - max_comp_foba_iterations
     - int
     - [1, inf)
     - 100
     - Maximum number of the FoBa-iterations for each component.
   * - comp_svd_threshold
     - float
     - [0, inf)
     - 0.00001
     - Threshold value for singular value decomposition (SVD) in
       the parameter optimization.
   * - num_threads_gates
     - int
     - [1, inf)
     - 1
     - Maximum number of OpenMP threads among which the tasks for all gates
       are divided in the parameter optimization.
   * - num_threads_gate_features
     - int
     - [1, inf)
     - 1
     - Maximum number of OpenMP threads among which the tasks for all
       features are divided in the parameter optimization.
   * - num_threads_comps
     - int
     - [1, inf)
     - 1
     - Maximum number of OpenMP threads of the parameter optimization.
   * - posterior_prob
     - numpy.ndarray
     -
     -
     - Initial posterior distribution for posterior hot-start.
       The size of the posterior matrix is (num_samples, num_comps). The
       number of samples (rows) must match that of the input data given
       to learn().
   * - comp_ids
     - list[int]
     - Length: [1, inf);
       Element: [0, inf)
     -
     - List of component ID numbers for posterior hot-start. Each ID is
       the number of the corresponding location in a complete binary tree
       numbered from left to right (0 to :math:`2^d - 1`), where :math:`d`
       is the tree depth. The initial tree structure is determined from this
       parameter. Note that the length of ``comp_ids`` must equal the number
       of columns of ``posterior_prob``.
   * - model_dict
     - dict
     -
     -
     - Information on the FAB/HME supervised model. For the format of
       model_dict, refer to the example for the to_dict() method defined in
       HMESupervisedModel.

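The skip rule selected by ``comp_foba_skip`` can be sketched as follows. This
is only an illustration of the two conditions given in the table above, not
the engine's implementation: the function name ``foba_is_skipped`` is
hypothetical, and the effect of ``comp_foba_skip_max_interval`` is not modeled.

.. code-block:: python

    import math


    def foba_is_skipped(t, mode="power_of_two"):
        """Return True if FoBa would be skipped at FAB-iteration step t (t >= 1).

        Hypothetical helper: it mirrors only the two conditions in the table
        and ignores comp_foba_skip_max_interval.
        """
        if t < 1:
            raise ValueError("t starts from 1")
        if mode == "none":
            return False
        if mode == "power_of_two":
            # log2(t) != ceil(log2(t)) holds exactly when t is not a power of two.
            return (t & (t - 1)) != 0
        if mode == "quarter_square":
            # ceil(sqrt(t)) computed with integer arithmetic to avoid rounding.
            ceil_sqrt = math.isqrt(t - 1) + 1
            return t % ceil_sqrt != 0
        raise ValueError("unknown mode: %r" % (mode,))


    # Steps (out of the first 16) at which FoBa is executed under each rule.
    executed_p2 = [t for t in range(1, 17) if not foba_is_skipped(t, "power_of_two")]
    executed_qs = [t for t in range(1, 17) if not foba_is_skipped(t, "quarter_square")]

Under these conditions, FoBa runs at steps 1, 2, 4, 8 and 16 for
'power_of_two', and at steps 1, 2, 4, 6, 9, 12 and 16 for 'quarter_square',
so both rules execute FoBa less and less frequently as the iteration proceeds.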

Learner Specific Parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The domains and default values of the learner-type-specific parameters are as
follows:

+----------------+--------------------+-----------------+---------------------------------------------+
| gate type      | parameter          | value           |  description                                |
+================+====================+=================+=============================================+
| Logistic       | gate_opt_type      | 'quadratic'     | using quadratic upper bound approximation   |
|                |                    |                 | with matrix inversion lemma. [default]      |
|                |                    +-----------------+---------------------------------------------+
|                |                    | 'quadratic_svd' | using quadratic upper bound approximation   |
|                |                    |                 | with singular value decomposition.          |
+----------------+--------------------+-----------------+---------------------------------------------+

|

+----------------+--------------------+-----------------+---------------------------------------------+
| component type | parameter          | value           |  description                                |
+================+====================+=================+=============================================+
| Linear         | comp_opt_type      | 'svd'           | using singular value decomposition.         |
| regression     |                    +-----------------+---------------------------------------------+
|                |                    | 'mil'           | using matrix inversion lemma for efficient  |
|                |                    |                 | evaluation of inversion matrices. [default] |
+----------------+--------------------+-----------------+---------------------------------------------+
| Single target  | comp_opt_type      | 'standard'      | using liblinear-weights.                    |
| linear         |                    +-----------------+---------------------------------------------+
| classification |                    | 'quadratic'     | using quadratic upper bound approximation   |
|                |                    |                 | with matrix inversion lemma. [default]      |
|                |                    +-----------------+---------------------------------------------+
|                |                    | 'quadratic_svd' | using quadratic upper bound approximation   |
|                |                    |                 | with singular value decomposition.          |
|                +--------------------+-----------------+---------------------------------------------+
|                | post_comp_opt_type | 'standard'      | using liblinear-weights. [default]          |
|                |                    +-----------------+---------------------------------------------+
|                |                    | 'quadratic'     | using quadratic upper bound approximation.  |
+----------------+--------------------+-----------------+---------------------------------------------+
| Non-linear     | post_comp_opt_type | 'standard'      | repeating optimization 10 times by the      |
| classification |                    |                 | same algorithm as 'quadratic'. [default]    |
|                |                    +-----------------+---------------------------------------------+
|                |                    | 'quadratic'     | using quadratic upper bound approximation.  |
+----------------+--------------------+-----------------+---------------------------------------------+
| Multi target   | post_comp_opt_type | 'standard'      | repeating optimization 100 times by an      |
| linear         |                    |                 | algorithm similar to 'quadratic'. [default] |
| classification |                    +-----------------+---------------------------------------------+
|                |                    | 'quadratic'     | using quadratic upper bound approximation.  |
+----------------+--------------------+-----------------+---------------------------------------------+

Models
======

HMESupervisedModel is the common model class for supervised learning with FAB/HME.

.. list-table:: Attributes of HMESupervisedModel.
   :header-rows: 1
   :widths: 1, 2, 5

   * - attribute
     - type
     - description
   * - components
     - list[SupervisedComponent]
     - Component objects.
   * - lvprior
     - HMELVPrior
     - Latent variable prior object.
   * - num_features
     - int
     - The number of features.
   * - num_targets
     - int
     - The number of targets.
   * - gate_feature_ids
     - list[int]
     - Feature ID numbers applied to the gate parameter optimizations.
   * - comp_feature_ids
     - list[int]
     - Feature ID numbers applied to the component parameter optimizations.


Components
----------

HMESupervisedModel contains one or more components. The component type
is decided by the learning type: least-squares regression, logistic regression,
B-spline regression, or B-spline classification.

All types of components predict target values using the decision function
:math:`Z = X W + b`. The target value is determined as :math:`Y = Z` for
regressions, or :math:`Y = \{ 1 + \exp(-Z) \} ^{-1}` for classifications.

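The decision function above can be illustrated with NumPy. This is a minimal
sketch for a single-target linear component; the function name
``predict_component`` and the sample values are assumptions for illustration
and are not part of the engine.

.. code-block:: python

    import numpy as np


    def predict_component(X, W, b, classification=False):
        """Evaluate Z = X W + b and map it to the target value Y."""
        Z = np.asarray(X, dtype=float) @ W + b
        if classification:
            # Y = {1 + exp(-Z)}^{-1}
            return 1.0 / (1.0 + np.exp(-Z))
        # Y = Z for regressions.
        return Z


    X = np.array([[1.0, 2.0], [0.0, -1.0]])
    W = np.array([0.5, 0.0])  # the second feature is irrelevant (zero weight)
    b = -0.5
    y_reg = predict_component(X, W, b)
    y_cls = predict_component(X, W, b, classification=True)

Because the weight of an irrelevant feature is zero, only the first column of
``X`` influences the prediction in this example.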

Common attributes
^^^^^^^^^^^^^^^^^

.. list-table:: Common attributes of all component classes.
   :header-rows: 1
   :widths: 1, 1, 1, 2, 5

   * - attribute
     - symbol
     - type
     - domain
     - description
   * - comp_id
     -
     - int
     -
     - Component ID number.
   * - feature_ids
     -
     - list[int]
     -
     - Feature ID numbers for data applied to the parameter optimizations.
   * - weights
     - :math:`W`
     - numpy.array
     - (-inf, inf) or [-inf, inf]
     - Weight values. The size of :math:`W` is equal to the number of
       features applied to the parameter optimization. :math:`W[i]`
       is zero for irrelevant features.
       The size of :math:`W` is (num_features) for linear prediction components,
       (num_features, basis_dim) for B-spline prediction components or
       (num_features, num_targets) for softmax prediction components.
       The domain of each element of :math:`W` is (-inf, inf)
       for linear and softmax prediction components or
       [-inf, inf] for B-spline prediction components (the value can be nan).
   * - bias
     - :math:`b`
     - float or numpy.array
     - (-inf, inf) or [-inf, inf]
     - Bias value. The type of :math:`b` is float for linear and B-spline prediction components or
       numpy.array for softmax prediction components.
       The domain of :math:`b` is (-inf, inf) for linear and B-spline prediction components or
       [-inf, inf] for softmax prediction components.

Regression component specific attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table:: Specific attributes for regression components
   :header-rows: 1
   :widths: 1, 1, 1, 2, 5

   * - attribute
     - symbol
     - type
     - domain
     - description
   * - variance
     - :math:`\sigma^2`
     - float
     - [0, inf]
     - Variance value.

B-spline prediction component specific attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table:: Specific attributes for B-spline prediction components
   :header-rows: 1
   :widths: 1, 1, 1, 2, 5

   * - attribute
     - symbol
     - type
     - domain
     - description
   * - degree
     -
     - int
     - [0, inf)
     - Degree of the B-spline function.
   * - knot_vecs
     -
     - numpy.array
     - [-inf, inf]
     - Knot vectors for all features. The size of knot_vecs is
       (num_features, num_samples, num_knots). Each element of knot_vecs can be nan.

Basis functions of B-spline prediction components are generated for each
feature by the following algorithm. Let :math:`M` be the number of basis
functions for each feature, which is given by the parameter
``comp_bspline_basis_dim``.

* For :math:`p < M - 1`:
    .. math::
        g_p^{(k)}(x) = \frac{x-x_p}{x_{p+k}-x_p} g_p^{(k-1)}(x)
            + \frac{x_{p+k+1}-x}{x_{p+k+1}-x_{p+1}} g_{p+1}^{(k-1)}(x),

    where :math:`g_p^{(0)}(x) = 1` when :math:`x_p \leq x < x_{p+1}`,
    otherwise :math:`g_p^{(0)}(x) = 0`. If the two knot points are the same
    (:math:`x_p = x_{p+k}`), the term :math:`(x-x_p) / (x_{p+k}-x_p)` is
    defined as zero.

* For :math:`p = M - 1`:
    .. math::
        g_p^{(k)}(x) = x.

Here, :math:`x` is a feature value (a column of the input feature data
:math:`X`; e.g. ``X[j]`` for the :math:`j`-th feature), :math:`x_p` is a knot
point where :math:`p = 0, 1, ..., M - 1` (e.g. :math:`x_p` is
``knot_vecs[p]``), and :math:`k` is the degree of the B-spline functions.

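The recursion for :math:`p < M - 1` can be sketched in Python as follows.
This is an illustrative sketch, not the engine's implementation: the function
name ``bspline_basis`` and the uniform knot vector are hypothetical, the case
:math:`p = M - 1` is not covered, and treating the second term as zero when
its denominator vanishes is an assumption (the text states this rule only for
the first term).

.. code-block:: python

    import numpy as np


    def bspline_basis(x, knots, p, k):
        """Evaluate g_p^{(k)}(x) by the recursion given above."""
        if k == 0:
            # Indicator of the half-open knot interval [x_p, x_{p+1}).
            return 1.0 if knots[p] <= x < knots[p + 1] else 0.0
        left_den = knots[p + k] - knots[p]
        right_den = knots[p + k + 1] - knots[p + 1]
        left = 0.0
        if left_den != 0.0:
            left = (x - knots[p]) / left_den * bspline_basis(x, knots, p, k - 1)
        right = 0.0
        if right_den != 0.0:
            # Assumption: a zero denominator makes this term zero as well.
            coef = (knots[p + k + 1] - x) / right_den
            right = coef * bspline_basis(x, knots, p + 1, k - 1)
        return left + right


    knots = np.arange(6, dtype=float)  # hypothetical uniform knots 0, 1, ..., 5
    k = 2                              # degree of the B-spline functions
    values = [bspline_basis(2.5, knots, p, k) for p in range(len(knots) - k - 1)]

With these knots, the three quadratic basis functions evaluated at
:math:`x = 2.5` sum to one, which is the usual partition-of-unity property of
B-spline bases inside the interior knot span.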

Latent Variable Prior
---------------------

Prior classes
^^^^^^^^^^^^^

Two kinds of prior classes are defined in the FAB-engine, both of which are
sub-classes of HMEBinaryTreeLVPrior:

    * HMEBernGateLVPrior
    * HMELogitGateLVPrior

These classes differ in the type of gating function, as described later.

.. list-table:: Attributes of HMEBinaryTreeLVPrior
   :header-rows: 1
   :widths: 1, 1, 3

   * - attribute
     - type
     - description
   * - root_node
     - HMEBinaryTreeNode
     - Root node object of gating-tree in the prior.
   * - num_gates
     - int
     - The number of gating-nodes in the prior.
   * - num_comps
     - int
     - The number of component-nodes in the prior.

.. note::
   The method ``lvprior.traverse_depth_first(gates_only=True)`` yields
   the node objects while traversing the tree structure depth-first.
   Here, ``lvprior`` is an instance of a latent variable prior class. The
   argument ``gates_only`` specifies whether only gate nodes (not component
   nodes) are traversed; its default value is False.

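The behavior of such a traversal can be sketched with a stand-in tree. The
``Node`` class and the free function ``traverse_depth_first`` below are
hypothetical and are not the engine's classes; the visiting order (parent
first, then left before right) is also an assumption, since the order is not
specified here.

.. code-block:: python

    class Node:
        """Hypothetical stand-in for HMEBinaryTreeNode (not the engine's class)."""

        def __init__(self, name, left=None, right=None):
            self.name = name
            self.left_node = left
            self.right_node = right

        @property
        def is_gate(self):
            # In this sketch a gate node is any node that has a child.
            return self.left_node is not None or self.right_node is not None


    def traverse_depth_first(node, gates_only=False):
        """Yield nodes depth-first; skip component nodes if gates_only is True."""
        if node is None:
            return
        if node.is_gate or not gates_only:
            yield node
        yield from traverse_depth_first(node.left_node, gates_only)
        yield from traverse_depth_first(node.right_node, gates_only)


    root = Node("gate0", Node("gate1", Node("comp0"), Node("comp1")), Node("comp2"))
    all_names = [n.name for n in traverse_depth_first(root)]
    gate_names = [n.name for n in traverse_depth_first(root, gates_only=True)]

Here ``all_names`` lists every node of the stand-in tree, while ``gate_names``
contains only ``gate0`` and ``gate1``.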

Nodes of Gating-Tree
^^^^^^^^^^^^^^^^^^^^

A prior is composed of gating-nodes and component-nodes, defined as
BinaryTreeGateNode and BinaryTreeComponentNode respectively.
Both classes are sub-classes of HMEBinaryTreeNode. A model contains no
BinaryTreeGateNode only when it has just one component.

BinaryTreeGateNode is a class for gating-nodes.

.. list-table:: Attributes of BinaryTreeGateNode
   :header-rows: 1
   :widths: 1, 1, 2

   * - attribute
     - type
     - description
   * - gate_index
     - int
     - Gate index number.
   * - gate_func
     - BernGateFunction / LogitGateFunction
     - Gating function object.
   * - parent_node
     - HMEBinaryTreeNode
     - Parent node object of the node.
   * - left_node
     - HMEBinaryTreeNode
     - Left-child node object of the node.
   * - right_node
     - HMEBinaryTreeNode
     - Right-child node object of the node.

BinaryTreeComponentNode is a class for component-nodes.

.. list-table:: Attributes of BinaryTreeComponentNode
   :header-rows: 1
   :widths: 1, 1, 2

   * - attribute
     - type
     - description
   * - comp_index
     - int
     - Component index number.
   * - parent_node
     - HMEBinaryTreeNode
     - Parent node object of the node.

.. note::
   BinaryTreeComponentNode does not hold the parameters of the component
   (weights, bias, etc.); the component object described above holds
   that information. BinaryTreeComponentNode maps the component list
   (``comps``) in the model to the corresponding positions of
   component-nodes in the gating-tree.


Bernoulli-Gating Function
^^^^^^^^^^^^^^^^^^^^^^^^^

BernGateFunction is a class for Bernoulli-gating function.

.. list-table:: Attributes of BernGateFunction
   :header-rows: 1
   :widths: 1, 1, 1, 2, 5

   * - attribute
     - symbol
     - type
     - domain
     - description
   * - feature_ids
     -
     - list[int]
     -
     - Feature ID numbers of the data for optimizing the gate function.
   * - feature_id
     - :math:`\gamma`
     - int
     - [0, num_features)
     - Feature ID applied to the gate function. The ID number corresponds
       to the column index of user-specified X at learn().
   * - threshold
     - :math:`t`
     - float
     - (-inf, inf)
     - Threshold value for the Bernoulli-gating function.
   * - prob_left
     - :math:`g`
     - float
     - [0, 1]
     - Probability of left-down when :math:`x[\gamma] < t`, where :math:`x`
       is a sample in feature data.

.. note::
   For a Bernoulli-gating function, the probability of left-down is equal to
   :math:`(1 - g)` when :math:`x[\gamma] \ge t`.

.. note::
   A variable ``internal_feature_index`` is defined for internal use. It
   is the feature index within the feature data applied to the gate
   optimization (not within all the features).

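The Bernoulli-gating rule can be sketched with NumPy as follows. The function
name ``bern_gate_prob_left`` and the sample values are hypothetical; the
arguments correspond to the attributes ``feature_id`` (:math:`\gamma`),
``threshold`` (:math:`t`) and ``prob_left`` (:math:`g`) in the table above.

.. code-block:: python

    import numpy as np


    def bern_gate_prob_left(X, feature_id, threshold, prob_left):
        """Probability of going left-down for each sample (row) of X."""
        x = np.asarray(X, dtype=float)[:, feature_id]
        # g when x[gamma] < t, and (1 - g) when x[gamma] >= t.
        return np.where(x < threshold, prob_left, 1.0 - prob_left)


    X = np.array([[0.2, 5.0], [0.9, 1.0]])
    p_left = bern_gate_prob_left(X, feature_id=0, threshold=0.5, prob_left=0.8)

In this example the first sample lies below the threshold and goes left-down
with probability 0.8, whereas the second sample does so with probability 0.2.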

Logistic-Gating Function
^^^^^^^^^^^^^^^^^^^^^^^^

LogitGateFunction is a class for logistic-gating function.

.. list-table:: Attributes of LogitGateFunction
   :header-rows: 1
   :widths: 1, 1, 1, 2, 5

   * - attribute
     - symbol
     - type
     - domain
     - description
   * - feature_ids
     -
     - list[int]
     -
     - Feature ID numbers of the data for optimizing the gate function.
   * - weights
     - :math:`W`
     - numpy.array
     - (-inf, inf)
     - Weight values. The size of :math:`W` is equal to the number of
       features applied to the parameter optimization. :math:`W[i]`
       is zero for irrelevant features.
   * - bias
     - :math:`b`
     - float
     - (-inf, inf)
     - Bias value.
   * - hard_gate
     -
     - bool
     - True / False
     - Whether the gate is hard-gate (True) or soft-gate (False).

.. note::
   In the case of soft-gate, the probability of left-down is defined as
   :math:`p = 1 / \{ 1+\exp(-Z) \}` where :math:`Z = XW + b`. That is, a
   sample is more likely to go left-down if the decision function is
   positive (:math:`Z > 0`). In the case of hard-gate, the probability of
   left-down is 1.0 if the decision function is zero or positive
   (:math:`Z \geq 0`).

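The soft-gate and hard-gate rules can be compared with a short NumPy sketch.
The function name ``logit_gate_prob_left`` and the sample values are
hypothetical, and returning 0.0 for :math:`Z < 0` in the hard-gate case is an
assumption, since only the case :math:`Z \geq 0` is stated above.

.. code-block:: python

    import numpy as np


    def logit_gate_prob_left(X, W, b, hard_gate=False):
        """Probability of going left-down for each sample (row) of X."""
        Z = np.asarray(X, dtype=float) @ W + b
        if hard_gate:
            # 1.0 when Z >= 0; the value 0.0 for Z < 0 is an assumption.
            return (Z >= 0.0).astype(float)
        # Soft-gate: p = 1 / {1 + exp(-Z)}.
        return 1.0 / (1.0 + np.exp(-Z))


    X = np.array([[1.0, 0.0], [-2.0, 0.0], [0.0, 3.0]])
    W = np.array([1.0, 0.0])
    b = 0.0
    soft = logit_gate_prob_left(X, W, b)
    hard = logit_gate_prob_left(X, W, b, hard_gate=True)

For the third sample :math:`Z = 0`, so the soft-gate gives probability 0.5
while the hard-gate sends the sample left-down with probability 1.0.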

Others
======

Logging
-------

All logging messages are output through the Python ``logging`` library under
loggers named ``fab`` or its sub-namespaces such as ``fab.hme``. Four log-levels
are used in the FAB-engine: ERROR, WARN, INFO and DEBUG.

Errors that occur in the FAB-engine are reported as Python standard exception
objects with error messages. Applications using the FAB-engine should
handle these exception objects properly to display the error information to
users. Errors in C++ modules of the FAB-engine are converted into Python
standard exception objects inside the engine.
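
An application can control these messages with the standard ``logging`` API.
The following is a minimal sketch that assumes only the logger names ``fab``
and ``fab.hme`` mentioned above; the chosen level and message format are
arbitrary examples.

.. code-block:: python

    import logging

    # Loggers are hierarchical, so configuring "fab" also covers "fab.hme".
    fab_logger = logging.getLogger("fab")
    fab_logger.setLevel(logging.INFO)

    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    fab_logger.addHandler(handler)

    # A logger in a sub-namespace inherits the level set on "fab".
    hme_logger = logging.getLogger("fab.hme")
    effective_level = hme_logger.getEffectiveLevel()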

Random Seeds
------------

FAB learning processes theoretically depend on the initial state (variational
posterior distribution and model parameters such as weights, bias, etc.).
Since the FAB-engine uses the ``numpy.random`` library to generate random
values for them when initializing a learning process, users can specify a
random seed with ``numpy.random.seed(SEED_VALUE)``.

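The effect of fixing the seed can be checked with NumPy alone. In this
minimal sketch the value of ``SEED_VALUE`` is an arbitrary example, and the
calls to ``numpy.random.rand`` merely stand in for the random values drawn
during initialization.

.. code-block:: python

    import numpy as np

    SEED_VALUE = 12345  # arbitrary example value

    np.random.seed(SEED_VALUE)
    first = np.random.rand(3)

    # Re-seeding with the same value reproduces the same random stream.
    np.random.seed(SEED_VALUE)
    second = np.random.rand(3)

Setting the seed immediately before each learning run in this way makes the
randomly generated initial values identical across runs.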