FAB/HME Engine Specifications
This document describes the specifications of the FAB/HME-engine.
Overview
The FAB-engine consists of two parts: models and learners.
A learner runs the learning process on a set of learning data (feature and target data) and outputs a model. A dedicated learner class is defined for each learning type, i.e., each combination of task (regression or classification), gating function type, and component type.
A model predicts target data from input feature data. Since FAB models are expressed as linear or otherwise simple equations and their parameters, users can understand how a model predicts target values by interpreting its parameter values.
Learner
Learner Classes
There are ten learner classes for FAB/HME learning:

| learning type | target | gate type | component type | learner class |
|---|---|---|---|---|
| Regression | Single | Bernoulli | Linear | HMEBernGateLeastSquaresRgLearner (Bern/Rg) |
| | | | Non-linear | HMEBernGateBSplineRgLearner (Bern/NLRg) |
| | | Logistic | Linear | HMELogitGateLeastSquaresRgLearner (Logit/Rg) |
| | | | Non-linear | HMELogitGateBSplineRgLearner (Logit/NLRg) |
| Classification | Single | Bernoulli | Linear | HMEBernGateLogisticRgLearner (Bern/Cl) |
| | | | Non-linear | HMEBernGateBSplineClLearner (Bern/NLCl) |
| | | Logistic | Linear | HMELogitGateLogisticRgLearner (Logit/Cl) |
| | | | Non-linear | HMELogitGateBSplineClLearner (Logit/NLCl) |
| | Multi | Bernoulli | Linear | HMEBernGateSoftmaxClLearner (Bern/MCl) |
| | | Logistic | Linear | HMELogitGateSoftmaxClLearner (Logit/MCl) |

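As a quick reference, the learner table can be expressed as a lookup keyed by the four properties that make up a learning type. This is only a sketch. It returns class names as strings because this document does not give the module path for importing the learner classes. The helper learner_class_name is hypothetical.

```python
# Learner class names from the learner table, keyed by
# (learning type, target, gate type, component type).
LEARNER_CLASSES = {
    ("regression", "single", "bernoulli", "linear"): "HMEBernGateLeastSquaresRgLearner",
    ("regression", "single", "bernoulli", "non-linear"): "HMEBernGateBSplineRgLearner",
    ("regression", "single", "logistic", "linear"): "HMELogitGateLeastSquaresRgLearner",
    ("regression", "single", "logistic", "non-linear"): "HMELogitGateBSplineRgLearner",
    ("classification", "single", "bernoulli", "linear"): "HMEBernGateLogisticRgLearner",
    ("classification", "single", "bernoulli", "non-linear"): "HMEBernGateBSplineClLearner",
    ("classification", "single", "logistic", "linear"): "HMELogitGateLogisticRgLearner",
    ("classification", "single", "logistic", "non-linear"): "HMELogitGateBSplineClLearner",
    ("classification", "multi", "bernoulli", "linear"): "HMEBernGateSoftmaxClLearner",
    ("classification", "multi", "logistic", "linear"): "HMELogitGateSoftmaxClLearner",
}


def learner_class_name(learning_type, target, gate_type, component_type):
    """Return the learner class name for a learning type (hypothetical helper)."""
    key = tuple(s.lower() for s in (learning_type, target, gate_type, component_type))
    try:
        return LEARNER_CLASSES[key]
    except KeyError:
        # e.g. non-linear multi target classification has no learner class
        raise ValueError(f"no FAB/HME learner class for {key}") from None
```

For example, learner_class_name("Classification", "Multi", "Logistic", "Linear") returns "HMELogitGateSoftmaxClLearner".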
Learn Methods
To execute a learning process, call the learn(X, Y) method of a learner instance with the following arguments:

| parameter | type | size | description |
|---|---|---|---|
| X | numpy.ndarray | (num_samples, num_features) | Feature data. |
| Y | numpy.array | (num_samples) | Target data for a single target. |
| | numpy.ndarray | (num_samples, num_targets) | Target data for multiple targets. |

Note
For single target classification problems, each value in Y must be either -1.0 or 1.0.
Note
For multi target classification problems, exactly one element in each row (sample) of Y must be 1.0, and the others must be 0.0 (one-hot encoding).
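Raw class labels therefore have to be encoded before learn() is called. The following NumPy sketch shows one way to produce both formats. The helper names to_single_target and to_multi_target are hypothetical and are not part of the FAB-engine.

```python
import numpy as np


def to_single_target(labels, positive_label):
    """Encode binary labels as the -1.0 / 1.0 vector used for single target classification."""
    labels = np.asarray(labels)
    return np.where(labels == positive_label, 1.0, -1.0)


def to_multi_target(labels, classes):
    """One-hot encode labels: exactly one 1.0 per row, 0.0 elsewhere."""
    labels = np.asarray(labels)
    classes = np.asarray(classes)
    # Compare every label with every class -> (num_samples, num_targets) matrix
    Y = (labels[:, None] == classes[None, :]).astype(float)
    if not np.all(Y.sum(axis=1) == 1.0):
        raise ValueError("every label must match exactly one entry of classes")
    return Y
```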
learn() returns a tuple (model, vposterior, context), whose elements are as follows:

| object | type | description |
|---|---|---|
| model | HMESupervisedModel | Model object resulting from the learning. |
| vposterior | HMEBinaryTreeVPosterior | Variational posterior used in the learning. |
| context | HMELearningContext | Context object of the learning, holding information such as the history of FIC values and of the number of components. |

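The following sketch shows a typical call sequence under two assumptions. First, the learner class (for example, HMELogitGateLeastSquaresRgLearner) has already been imported from the FAB-engine. Second, init_random() accepts the parameters from the parameter tables as keyword arguments. The helper run_random_start is hypothetical. It only checks the shapes listed for learn() and unpacks the returned tuple.

```python
import numpy as np


def run_random_start(learner_cls, X, Y, **init_params):
    """Create a learner for random start, run learn(X, Y) and unpack the result.

    Hypothetical helper; init_params are assumed to be keyword arguments
    of learner_cls.init_random().
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.ndim != 2:
        raise ValueError("X must have shape (num_samples, num_features)")
    if Y.ndim not in (1, 2) or Y.shape[0] != X.shape[0]:
        raise ValueError("Y must have num_samples rows")
    learner = learner_cls.init_random(**init_params)
    model, vposterior, context = learner.learn(X, Y)
    return model, vposterior, context
```

With a real learner class, run_random_start(HMELogitGateLeastSquaresRgLearner, X, Y, tree_depth=3) would start from a gate tree of depth 3, that is, from \(2^3 = 8\) components.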
Initialization
Initialization Types
Each learner class provides three class methods for creating an instance. The default constructor, __init__(), is not available to FAB-engine users.
- init_random(): creates a learner for random start.
- init_with_posterior(): creates a learner for posterior hot-start.
- init_with_model_dict(): creates a learner for model hot-start.
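The choice among the three class methods can be sketched as a small dispatcher. This document does not confirm the following assumption: each class method takes its parameters from the parameter tables as keyword arguments, with posterior_prob and comp_ids for posterior hot-start and model_dict for model hot-start. make_learner itself is a hypothetical helper.

```python
def make_learner(learner_cls, posterior_prob=None, comp_ids=None, model_dict=None, **params):
    """Choose random start, posterior hot-start or model hot-start (hypothetical helper)."""
    if model_dict is not None:
        if posterior_prob is not None or comp_ids is not None:
            raise ValueError("use either model_dict or posterior_prob/comp_ids, not both")
        # model hot-start
        return learner_cls.init_with_model_dict(model_dict=model_dict, **params)
    if posterior_prob is not None or comp_ids is not None:
        if posterior_prob is None or comp_ids is None:
            raise ValueError("posterior hot-start needs both posterior_prob and comp_ids")
        # posterior hot-start
        return learner_cls.init_with_posterior(
            posterior_prob=posterior_prob, comp_ids=comp_ids, **params)
    # random start
    return learner_cls.init_random(**params)
```

For a model hot-start, the output of to_dict() from a previously learned HMESupervisedModel could be passed as model_dict, since model_dict follows that method's format.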
List of Argument Parameters
The following table shows which parameters are available for each learner and initialization method. The meaning and other details of each parameter are described in the following sections.
parameter |
Regression |
Classification |
||||||||
---|---|---|---|---|---|---|---|---|---|---|
Single target |
Multi target |
|||||||||
Bern |
Logit |
Bern |
Logit |
Bern |
Logit |
|||||
Rg |
NLRg |
Rg |
NLRg |
Cl |
NLCl |
Cl |
NLCl |
MCl |
MCl |
|
max_fab_iterations |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
start_from_mstep |
R |
R |
R |
R |
R |
R |
R |
R |
R |
R |
num_acceleration_steps |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
repeat_until_convergence |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
projection_estep |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
shrink_threshold |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
fab_stop_threshold |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
hard_gate |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
gate_feature_ids |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
comp_feature_ids |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
comp_mandatory_feature_ids |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
comp_positive_feature_ids |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
||||||
comp_negative_feature_ids |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
||||||
tree_depth |
R |
R |
R |
R |
R |
R |
R |
R |
R |
R |
comp_bspline_degree |
R / P |
R / P |
R / P |
R / P |
||||||
comp_bspline_basis_dim |
R / P |
R / P |
R / P |
R / P |
||||||
comp_weights_min_scale |
R |
R |
R |
R |
R |
R |
R |
R |
R |
R |
comp_weights_max_scale |
R |
R |
R |
R |
R |
R |
R |
R |
R |
R |
comp_bias_min_scale |
R |
R |
R |
R |
R |
R |
R |
R |
R |
R |
comp_bias_max_scale |
R |
R |
R |
R |
R |
R |
R |
R |
R |
R |
comp_variance_min_scale |
R |
R |
R |
R |
||||||
comp_variance_max_scale |
R |
R |
R |
R |
||||||
gate_opt_mode |
M |
M |
M |
M |
M |
M |
M |
M |
M |
M |
gate_max_bins |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
|||||
gate_opt_type |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
|||||
gate_l2_regularize |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
|||||
with_gate_scaled_l0_regularize |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
|||||
max_gate_relevant_features |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
|||||
gate_svd_threshold |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
|||||
comp_foba_skip |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
||||
comp_foba_skip_max_interval |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
||||
comp_opt_mode |
M |
M |
M |
M |
M |
M |
M |
M |
M |
M |
comp_two_stage_opt |
R / P / M |
R / P / M |
||||||||
comp_backward_step |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
||||
comp_opt_type |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
||||||
post_comp_opt_type |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
||||
comp_l2_regularize |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
comp_pspline |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
||||||
with_comp_scaled_l0_regularize |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
max_comp_relevant_features |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
max_comp_foba_iterations |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
||||
comp_svd_threshold |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
num_threads_gates |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
num_threads_gate_features |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
|||||
num_threads_comps |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
R / P / M |
posterior_prob |
P |
P |
P |
P |
P |
P |
P |
P |
P |
P |
comp_ids |
P |
P |
P |
P |
P |
P |
P |
P |
P |
P |
model_dict |
M |
M |
M |
M |
M |
M |
M |
M |
M |
M |
R : random start by init_random()
P : posterior hot-start by init_with_posterior()
M : model hot-start by init_with_model_dict()
Parameter Descriptions
The input parameters are described below:

| parameter | type | domain | default | description |
|---|---|---|---|---|
| max_fab_iterations | int | [1, inf) | 100 | Maximum number of FAB-iterations. |
| start_from_mstep | bool | True / False | False | If True, the first iteration starts with the M-step; otherwise, it starts with the E-step. |
| num_acceleration_steps | int | [0, inf) | 0 | Number of steps of the acceleration algorithm in each FAB-iteration. If 0, the acceleration algorithm is disabled. |
| repeat_until_convergence | bool | True / False | False | If False, the FAB-iterations and the post-processing are executed only once, even if the FAB-iterations were stopped by the max_fab_iterations condition rather than by the convergence condition. |
| projection_estep | bool | True / False | False | Whether the projection E-step algorithm is enabled. |
| shrink_threshold | float or str | [1, inf) or (0%, 100%) | 1.0 | Threshold value for shrinkage. If a percentage value (e.g. |
| fab_stop_threshold | float or str | (0, inf) or (0%, inf%) | 0.001 | Threshold value for the FAB-iterations: if the increase of the FIC value is less than the threshold, the FAB-iterations are considered to have converged. If a percentage value (e.g. |
| hard_gate | bool | True / False | True | If True, hard-gate post-processing is enabled. |
| gate_feature_ids | None or list[int] | Length: [1, inf); Element: [0, inf) | None | List of feature IDs used in the parameter optimization of the gates. If None, all features are used. |
| comp_feature_ids | None or list[int] | Length: [0, inf); Element: [0, inf) | None | List of feature IDs used in the parameter optimization of the components. If None, all features are used. If an empty list is given, the model is learned as a decision tree. |
| comp_mandatory_feature_ids | None or list[int] | Length: [1, inf); Element: [0, inf) | None | List of feature IDs to which L0-regularization is not applied, so the specified features are always relevant for all components. If None, no features are exempted from L0-regularization, and all relevant features are selected by the FoBa algorithm. |
| comp_positive_feature_ids | None or list[int] | Length: [1, inf); Element: [0, inf) | None | List of feature IDs whose weight values are constrained to be positive in all components. If None, all features are optimized without sign constraints. |
| comp_negative_feature_ids | None or list[int] | Length: [1, inf); Element: [0, inf) | None | List of feature IDs whose weight values are constrained to be negative in all components. If None, all features are optimized without sign constraints. |
| tree_depth | int | [0, inf) | 5 | Initial depth of the gate-tree structure of the latent variable prior. The initial number of components is \(2^d\), where \(d\) is the tree depth. If 0, the optimization is executed with only one component. |
| comp_bspline_degree | int | [0, inf) | 3 | Degree of the B-spline functions. |
| comp_bspline_basis_dim | int | [4, inf) | 10 | Number of B-spline basis functions generated for each feature. |
| comp_weights_min_scale | float | (-inf, inf) | -0.5 | Minimum scale value for initializing the weight values of components. |
| comp_weights_max_scale | float | (-inf, inf) | 0.5 | Maximum scale value for initializing the weight values of components. |
| comp_bias_min_scale | float | (-inf, inf) | 0.25 | Minimum scale value for initializing the bias values of components. |
| comp_bias_max_scale | float | (-inf, inf) | 0.75 | Maximum scale value for initializing the bias values of components. |
| comp_variance_min_scale | float | (0, inf) | 0.1 | Minimum scale value for initializing the variance values of components. |
| comp_variance_max_scale | float | (0, inf) | 0.25 | Maximum scale value for initializing the variance values of components. |
| gate_opt_mode | str | {‘opt’, ‘refit’, ‘keep’} | ‘opt’ | Mode of the gate parameter optimization. If ‘opt’, the parameters are optimized with all features; if ‘refit’, the parameters are fit with the relevant features; if ‘keep’, the parameters are kept unchanged. |
| gate_max_bins | None or int | [1, inf) | None | Maximum number of bins for each feature in the parameter optimization. If None, all unique values of each feature are used; otherwise, equal-width binning is applied. |
| gate_opt_type | str | See description | See description | Algorithm of the parameter optimization. The domain and default value depend on the learner type and are described in the following section. |
| gate_l2_regularize | float | [0, inf) | 0.0 | L2-regularization hyper-parameter for the parameter optimization. The larger the value, the stronger the regularization. If 0.0, L2-regularization is disabled. |
| with_gate_scaled_l0_regularize | bool | True / False | True | Whether to use scaled L0-regularization, based on a tighter lower bound of FIC, in the parameter optimization; this refines the approximation of det(F), where F is the Fisher information matrix. |
| max_gate_relevant_features | int | [1, inf) | 3 | Maximum number of relevant features for each gate. |
| gate_svd_threshold | float | [0, inf) | 0.00001 | Threshold value for the singular value decomposition (SVD) in the parameter optimization. |
| comp_foba_skip | str | {‘power_of_two’, ‘quarter_square’, ‘none’} | ‘power_of_two’ | Type of the function that decides when the FoBa algorithm is skipped. If ‘none’, FoBa is executed at every FAB-iteration step. FoBa is skipped when \({\rm log}_{2}t \ne {\rm ceil}({\rm log}_{2}t)\) for ‘power_of_two’, or when \(t \bmod {\rm ceil}(\sqrt{t}) \ne 0\) for ‘quarter_square’, where \(t\) is the FAB-iteration step index starting from 1. |
| comp_foba_skip_max_interval | int | [2, inf) | 25 | Maximum interval of the FoBa algorithm skipping. If comp_foba_skip is ‘none’, this value is ignored. |
| comp_opt_mode | str | {‘opt’, ‘refit’} | ‘opt’ | Mode of the component parameter optimization. If ‘opt’, the parameters are optimized with all features; if ‘refit’, the parameters are fit with the relevant features. |
| comp_two_stage_opt | bool | True / False | False | Whether two-stage optimization is enabled. If True, the first stage optimizes the parameters for the user-specified mandatory features (comp_mandatory_feature_ids), and the second stage optimizes the parameters for the relevant non-mandatory features only, fitting the residual of the first stage. |
| comp_backward_step | bool | True / False | False | Whether the backward steps of the FoBa algorithm are enabled. In the post-processing, backward steps are carried out regardless of this value. |
| comp_opt_type | str | See description | See description | Algorithm of the parameter optimization. The domain and default value depend on the learner type and are described in the following section. |
| post_comp_opt_type | str | See description | See description | Algorithm of the parameter optimization in the post-processing. The domain and default value depend on the learner type and are described in the following section. |
| comp_l2_regularize | float | [0, inf) | 0.0 | L2-regularization hyper-parameter for the parameter optimization. The larger the value, the stronger the regularization. If 0.0, L2-regularization is disabled. |
| comp_pspline | float | [0, inf) | 1.0 | L2-regularization coefficient for the penalized B-spline functions (P-splines). |
| with_comp_scaled_l0_regularize | bool | True / False | True | Whether to use scaled L0-regularization, based on a tighter lower bound of FIC, in the parameter optimization; this refines the approximation of det(F), where F is the Fisher information matrix. |
| max_comp_relevant_features | int | [1, inf) | 100 | Maximum number of relevant features for each component. |
| max_comp_foba_iterations | int | [1, inf) | 100 | Maximum number of FoBa-iterations for each component. |
| comp_svd_threshold | float | [0, inf) | 0.00001 | Threshold value for the singular value decomposition (SVD) in the parameter optimization. |
| num_threads_gates | int | [1, inf) | 1 | Maximum number of OpenMP threads for the parameter optimization, over which the tasks for all gates are divided. |
| num_threads_gate_features | int | [1, inf) | 1 | Maximum number of OpenMP threads for the parameter optimization, over which the tasks for all features are divided. |
| num_threads_comps | int | [1, inf) | 1 | Maximum number of OpenMP threads for the parameter optimization of the components. |
| posterior_prob | numpy.ndarray | | | Initial posterior distribution for posterior hot-start. The size of the posterior matrix is (num_samples, num_comps). The number of samples (rows) must be consistent with that of the input data given to learn(). |
| comp_ids | list[int] | Length: [1, inf); Element: [0, inf) | | List of component ID numbers for posterior hot-start. The IDs are assigned in the same way as the components at the corresponding locations in a complete binary tree numbered from left to right (0 to \(2^d - 1\)), where \(d\) is the tree depth. The initial tree structure is defined by this parameter. Note that the length of |
| model_dict | dict | | | Information on the FAB/HME supervised model. For the format of model_dict, refer to the example for the to_dict() method defined in HMESupervisedModel. |

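The two comp_foba_skip rules can be written out directly. The sketch below decides whether FoBa runs at FAB-iteration step t (starting from 1), using integer arithmetic instead of floating-point logarithms. This document does not specify how comp_foba_skip_max_interval caps the skip interval, so that parameter is deliberately left out. The name foba_runs is hypothetical.

```python
from math import isqrt


def foba_runs(t, comp_foba_skip="power_of_two"):
    """Return True if FoBa is executed (not skipped) at FAB-iteration step t >= 1."""
    if t < 1:
        raise ValueError("t starts from 1")
    if comp_foba_skip == "none":
        return True
    if comp_foba_skip == "power_of_two":
        # log2(t) == ceil(log2(t)) exactly when t is a power of two
        return (t & (t - 1)) == 0
    if comp_foba_skip == "quarter_square":
        ceil_sqrt = isqrt(t - 1) + 1  # equals ceil(sqrt(t)) for t >= 1
        return t % ceil_sqrt == 0
    raise ValueError(f"unknown comp_foba_skip: {comp_foba_skip!r}")
```

With these rules, FoBa runs at steps 1, 2, 4, 8, 16, ... for ‘power_of_two’. For ‘quarter_square’ it runs at the quarter squares 1, 2, 4, 6, 9, 12, 16, 20, ..., which matches the option's name.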
Learner Specific Parameters
The domains and default values of the learner-type-specific parameters are as follows:

| gate type | parameter | value | description |
|---|---|---|---|
| Logistic | gate_opt_type | ‘quadratic’ | Uses a quadratic upper bound approximation with the matrix inversion lemma. [default] |
| | | ‘quadratic_svd’ | Uses a quadratic upper bound approximation with singular value decomposition. |

| component type | parameter | value | description |
|---|---|---|---|
| Linear regression | comp_opt_type | ‘svd’ | Uses singular value decomposition. |
| | | ‘mil’ | Uses the matrix inversion lemma for efficient evaluation of inverse matrices. [default] |
| Single target linear classification | comp_opt_type | ‘standard’ | Uses liblinear-weights. |
| | | ‘quadratic’ | Uses a quadratic upper bound approximation with the matrix inversion lemma. [default] |
| | | ‘quadratic_svd’ | Uses a quadratic upper bound approximation with singular value decomposition. |
| | post_comp_opt_type | ‘standard’ | Uses liblinear-weights. [default] |
| | | ‘quadratic’ | Uses a quadratic upper bound approximation. |
| Non-linear classification | post_comp_opt_type | ‘standard’ | Repeats the optimization 10 times with the same algorithm as ‘quadratic’. [default] |
| | | ‘quadratic’ | Uses a quadratic upper bound approximation. |
| Multi target linear classification | post_comp_opt_type | ‘standard’ | Repeats the optimization 100 times with an algorithm similar to ‘quadratic’. [default] |
| | | ‘quadratic’ | Uses a quadratic upper bound approximation. |

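To illustrate the idea behind a "quadratic upper bound approximation" for logistic models, the sketch below performs majorization-minimization steps for L2-regularized logistic regression with labels in {-1, 1}. It uses Böhning's bound, under which the Hessian of the logistic loss never exceeds \(X^\top X / 4\). This is a generic textbook technique, not necessarily the exact bound used by the FAB-engine. It also solves the linear system directly, rather than using the matrix inversion lemma or SVD as the ‘quadratic’ and ‘quadratic_svd’ options do. The function names are hypothetical.

```python
import numpy as np


def logistic_loss(w, X, y, l2):
    """Sum of log(1 + exp(-y * Xw)) plus an L2 penalty; y takes values -1.0 / 1.0."""
    return np.sum(np.logaddexp(0.0, -y * (X @ w))) + 0.5 * l2 * (w @ w)


def quadratic_bound_step(w, X, y, l2):
    """One step minimizing a quadratic upper bound of logistic_loss around w."""
    z = X @ w
    s = np.exp(-np.logaddexp(0.0, y * z))  # sigmoid(-y * z), computed stably
    grad = -X.T @ (y * s) + l2 * w
    # The logistic-loss Hessian is bounded above by X^T X / 4, so this quadratic
    # majorizes the loss and minimizing it never increases the loss.
    bound = 0.25 * (X.T @ X) + l2 * np.eye(X.shape[1])
    return w - np.linalg.solve(bound, grad)
```

A bias can be handled by appending a column of ones to X. Because each step minimizes an upper bound that touches the loss at the current w, the loss decreases monotonically without any line search.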
Models
HMESupervisedModel is the common model class for FAB/HME supervised learning.

| attribute | type | description |
|---|---|---|
| components | list[SupervisedComponent] | Component objects. |
| lvprior | HMELVPrior | Latent variable prior object. |
| num_features | int | Number of features. |
| num_targets | int | Number of targets. |
| gate_feature_ids | list[int] | Feature ID numbers used in the parameter optimization of the gates. |
| comp_feature_ids | list[int] | Feature ID numbers used in the parameter optimization of the components. |

Components
HMESupervisedModel contains one or more components. The component type is determined by the learning type: least-squares regression, logistic regression, B-spline regression, or B-spline classification.
All types of components predict target values with the decision function \(Z = X W + b\). The target value is then \(Y = Z\) for regression, or \(Y = \{ 1 + \exp(-Z) \} ^{-1}\) for classification.
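For a linear component, the decision function and the two output rules take only a few lines of NumPy. This sketch assumes weights has shape (num_features,) and bias is a float, as for linear prediction components. component_predict is a hypothetical helper, not a method of the FAB-engine component classes.

```python
import numpy as np


def component_predict(X, weights, bias, classification=False):
    """Apply Z = X W + b, then Y = Z (regression) or Y = 1 / (1 + exp(-Z)) (classification)."""
    Z = np.asarray(X, dtype=float) @ np.asarray(weights, dtype=float) + bias
    if classification:
        return 1.0 / (1.0 + np.exp(-Z))
    return Z
```

Because the weights of irrelevant features are zero, those features do not contribute to \(Z\), which is what makes the learned components easy to interpret.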
Common attributes

| attribute | symbol | type | domain | description |
|---|---|---|---|---|
| comp_id | | int | | Component ID number. |
| feature_ids | | list[int] | | Feature ID numbers of the data used in the parameter optimization. |
| weights | \(W\) | numpy.array | (-inf, inf) or [-inf, inf] | Weight values. The size of \(W\) is equal to the number of features used in the parameter optimization, and \(W[i]\) is zero for irrelevant features. The shape of \(W\) is (num_features) for linear prediction components, (num_features, basis_dim) for B-spline prediction components, and (num_features, num_targets) for softmax prediction components. Each element of \(W\) lies in (-inf, inf) for linear and softmax prediction components, or in [-inf, inf] for B-spline prediction components (where the value can also be nan). |
| bias | \(b\) | float or numpy.array | (-inf, inf) or [-inf, inf] | Bias value. \(b\) is a float for linear and B-spline prediction components and a numpy.array for softmax prediction components. Its domain is (-inf, inf) for linear and B-spline prediction components and [-inf, inf] for softmax prediction components. |

Regression component specific attributes

| attribute | symbol | type | domain | description |
|---|---|---|---|---|
| variance | \(\sigma^2\) | float | [0, inf] | Variance value. |

B-spline prediction component specific attributes

| attribute | symbol | type | domain | description |
|---|---|---|---|---|
| degree | | int | [0, inf) | Degree of the B-spline functions. |
| knot_vecs | | numpy.array | [-inf, inf] | Knot vectors for all features. The size of knot_vecs is (num_features, num_samples, num_knots). Each element of knot_vecs can be nan. |

The basis functions of B-spline prediction components are generated for each feature by the following algorithm. Let \(M\) be the number of basis functions for each feature, given by the argument parameter comp_bspline_basis_dim.
- For \(p < M - 1\):
  \[g_p^{(k)}(x) = \frac{x-x_p}{x_{p+k}-x_p} g_p^{(k-1)}(x) + \frac{x_{p+k+1}-x}{x_{p+k+1}-x_{p+1}} g_{p+1}^{(k-1)}(x),\]
  where \(g_p^{(0)}(x) = 1\) when \(x_p \leq x < x_{p+1}\), and \(g_p^{(0)}(x) = 0\) otherwise. If the two knot points coincide (\(x_p = x_{p+k}\)), the term \((x-x_p) / (x_{p+k}-x_p)\) is defined as zero.
- For \(p = M - 1\):
  \[g_p^{(k)}(x) = x.\]
Here, \(x\) is a feature value (a column of the input feature data \(X\), e.g. X[:, j] for the \(j\)-th feature), \(x_p\) is a knot point with \(p = 0, 1, ..., M - 1\) (e.g. \(x_p\) is knot_vecs[p]), and \(k\) is the degree of the B-spline functions.
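The recursion above is the Cox-de Boor formula, and it can be transcribed directly for a single feature value, as in the following sketch. The sketch makes two assumptions. First, the knot vector passed in is long enough to index \(x_{p+k+1}\) for every \(p < M - 1\), which requires at least \(M + k\) knots. Second, a zero denominator in the second term is treated as zero too, as is usual for B-splines. The function names are hypothetical.

```python
import numpy as np


def bspline(x, knots, p, k):
    """g_p^{(k)}(x) by the Cox-de Boor recursion."""
    if k == 0:
        return 1.0 if knots[p] <= x < knots[p + 1] else 0.0
    left_den = knots[p + k] - knots[p]
    right_den = knots[p + k + 1] - knots[p + 1]
    # Terms whose knot points coincide (zero denominator) are defined as zero.
    left = 0.0 if left_den == 0 else (x - knots[p]) / left_den * bspline(x, knots, p, k - 1)
    right = 0.0 if right_den == 0 else (
        (knots[p + k + 1] - x) / right_den * bspline(x, knots, p + 1, k - 1))
    return left + right


def basis_values(x, knots, M, k):
    """All M basis values for one feature value: M - 1 B-splines, then x itself."""
    if len(knots) < M + k:
        raise ValueError("need at least M + k knots")
    return np.array([bspline(x, knots, p, k) for p in range(M - 1)] + [x])
```

On uniform knots 0, 1, ..., 9 with degree 3, the six cubic basis values sum to 1 inside [3, 6]. At x = 4.0, the basis function centered there takes the familiar value 2/3.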
Latent Variable Prior
Prior classes
Two prior classes are defined in the FAB-engine, both of which are subclasses of HMEBinaryTreeLVPrior:
- HMEBernGateLVPrior
- HMELogitGateLVPrior
These classes differ in the type of gating function, as described later.

| attribute | type | description |
|---|---|---|
| root_node | HMEBinaryTreeNode | Root node object of the gating-tree in the prior. |
| num_gates | int | Number of gating-nodes in the prior. |
| num_comps | int | Number of component-nodes in the prior. |

Note
The method lvprior.traverse_depth_first(gates_only=False) yields all node objects while traversing the tree structure depth-first, where lvprior is an instance of a latent variable prior class. The argument gates_only specifies whether only the gate nodes (not the component nodes) are traversed; its default value is False.
Nodes of Gating-Tree
A prior is composed of gating-nodes and component-nodes, defined as BinaryTreeGateNode and BinaryTreeComponentNode, respectively. Both classes are subclasses of HMEBinaryTreeNode. A model has no BinaryTreeGateNode only when it has just one component.
BinaryTreeGateNode is a class for gating-nodes.

| attribute | type | description |
|---|---|---|
| gate_index | int | Gate index number. |
| gate_func | BernGateFunction / LogitGateFunction | Gating function object. |
| parent_node | HMEBinaryTreeNode | Parent node object of the node. |
| left_node | HMEBinaryTreeNode | Left-child node object of the node. |
| right_node | HMEBinaryTreeNode | Right-child node object of the node. |

BinaryTreeComponentNode is a class for component-nodes.

| attribute | type | description |
|---|---|---|
| comp_index | int | Component index number. |
| parent_node | HMEBinaryTreeNode | Parent node object of the node. |

Note
BinaryTreeComponentNode does not hold the parameters of the component (weights, bias, etc.); these are held by the component objects described above. BinaryTreeComponentNode maps the component list (comps) in the model to the corresponding positions of component-nodes in the gating-tree.
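The tree-related definitions can be mimicked with plain Python stand-ins. They cover a complete binary tree of depth \(d\) whose \(2^d\) components are numbered from left to right, the node attributes, and depth-first traversal with gates_only. GateNode and ComponentNode below are hypothetical simplifications of BinaryTreeGateNode and BinaryTreeComponentNode. The pre-order visiting order and the gate numbering are assumptions, so the real traverse_depth_first() may differ.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional, Union


@dataclass
class ComponentNode:
    comp_index: int
    # excluded from repr/eq to avoid infinite recursion through parent links
    parent_node: Optional["GateNode"] = field(default=None, repr=False, compare=False)


@dataclass
class GateNode:
    gate_index: int
    left_node: Optional[Union["GateNode", ComponentNode]] = None
    right_node: Optional[Union["GateNode", ComponentNode]] = None
    parent_node: Optional["GateNode"] = field(default=None, repr=False, compare=False)


def build_complete_tree(depth):
    """Complete binary tree whose 2**depth leaves are numbered 0 .. 2**depth - 1 from left to right."""
    counters = {"gate": 0, "comp": 0}

    def build(level, parent):
        if level == depth:
            node = ComponentNode(counters["comp"], parent)
            counters["comp"] += 1
            return node
        node = GateNode(counters["gate"], parent_node=parent)
        counters["gate"] += 1
        node.left_node = build(level + 1, node)
        node.right_node = build(level + 1, node)
        return node

    return build(0, None)


def traverse_depth_first(node, gates_only=False) -> Iterator:
    """Yield nodes in pre-order; skip component nodes if gates_only is True."""
    if isinstance(node, GateNode):
        yield node
        yield from traverse_depth_first(node.left_node, gates_only)
        yield from traverse_depth_first(node.right_node, gates_only)
    elif not gates_only:
        yield node
```

A tree of depth 2 has \(2^2 - 1 = 3\) gates and 4 components. With depth 0, the single component is the root and no gate node exists, consistent with the one-component case described above.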
Bernoulli-Gating Function
BernGateFunction is the class for the Bernoulli-gating function.

| attribute | symbol | type | domain | description |
|---|---|---|---|---|
| feature_ids | | list[int] | | Feature ID numbers of the data used for optimizing the gate function. |
| feature_id | \(\gamma\) | int | [0, num_features) | Feature ID used by the gate function. The ID corresponds to the column index of the X given to learn(). |
| threshold | \(t\) | float | (-inf, inf) | Threshold value of the Bernoulli-gating function. |
| prob_left | \(g\) | float | [0, 1] | Probability of going down to the left child node when \(x[\gamma] < t\), where \(x\) is a sample of the feature data. |

Note
For a Bernoulli-gating function, the probability of going down to the left child node is \((1 - g)\) when \(x[\gamma] \ge t\).
Note
A variable internal_feature_index is defined for internal use; it is the index of the feature within the feature data used for the gate optimization (not within all features).
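Combining the two cases of the Bernoulli-gating function, the probability of going down to the left child can be computed for every sample at once. bern_prob_left is a hypothetical helper. Its arguments feature_id, threshold and prob_left correspond to \(\gamma\), \(t\) and \(g\) in the attribute table, and feature_id indexes the columns of the X given to learn().

```python
import numpy as np


def bern_prob_left(X, feature_id, threshold, prob_left):
    """g if x[gamma] < t, otherwise 1 - g, for each sample (row) of X."""
    x = np.asarray(X, dtype=float)[:, feature_id]
    return np.where(x < threshold, prob_left, 1.0 - prob_left)
```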
Logistic-Gating Function
LogitGateFunction is the class for the logistic-gating function.

| attribute | symbol | type | domain | description |
|---|---|---|---|---|
| feature_ids | | list[int] | | Feature ID numbers of the data used for optimizing the gate function. |
| weights | \(W\) | numpy.array | (-inf, inf) | Weight values. The size of \(W\) is equal to the number of features used in the parameter optimization, and \(W[i]\) is zero for irrelevant features. |
| bias | \(b\) | float | (-inf, inf) | Bias value. |
| hard_gate | | bool | True / False | Whether the gate is a hard gate (True) or a soft gate (False). |

Note
For a soft gate, the probability of going down to the left child node is \(p = 1 / \{ 1+\exp(-Z) \}\), where \(Z = XW + b\); that is, a sample is more likely to go left when the decision function is positive (\(Z > 0\)). For a hard gate, the probability of going left is 1.0 if the decision function is zero or positive (\(Z \geq 0\)).
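Both gate modes can be sketched in a few lines. For the hard gate, the sketch returns 0.0 when \(Z < 0\). This is an assumption that follows from the gate being deterministic; the note above states only the \(Z \geq 0\) case. logit_prob_left is a hypothetical helper.

```python
import numpy as np


def logit_prob_left(X, weights, bias, hard_gate):
    """Probability of going down to the left child for each sample of a logistic gate."""
    Z = np.asarray(X, dtype=float) @ np.asarray(weights, dtype=float) + bias
    if hard_gate:
        # hard gate: go left with probability 1.0 when Z >= 0, otherwise 0.0 (assumed)
        return np.where(Z >= 0.0, 1.0, 0.0)
    # soft gate: p = 1 / (1 + exp(-Z))
    return 1.0 / (1.0 + np.exp(-Z))
```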
Others
Logging
All log messages are emitted through the Python logging library under the logger name fab or one of its sub-namespaces, such as fab.hme. The FAB-engine uses four log levels: ERROR, WARN, INFO, and DEBUG.
Errors that occur in the FAB-engine are raised as Python standard exception objects with error messages; errors in the C++ modules of the FAB-engine are converted into Python standard exceptions inside the engine. Applications using the FAB-engine should handle these exceptions properly and display the error information to users.
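Because the engine logs through the standard logging module under the fab namespace, an application can route all FAB-engine messages by configuring the single parent logger. This includes messages from sub-loggers such as fab.hme. The sketch uses only standard-library calls. This document does not list the exact exception classes the engine raises, so the wrapper catches Exception, logs the message, and re-raises it. Both helper names are hypothetical.

```python
import logging


def configure_fab_logging(level=logging.INFO):
    """Attach a console handler to the "fab" logger; sub-loggers such as fab.hme propagate to it."""
    logger = logging.getLogger("fab")
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


def learn_or_report(learner, X, Y):
    """Run learner.learn(X, Y); on failure, log the error message for the user and re-raise."""
    try:
        return learner.learn(X, Y)
    except Exception as exc:
        logging.getLogger("app").error("FAB learning failed: %s", exc)
        raise
```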
Random Seeds
FAB learning processes depend on the initial state (the variational posterior distribution and model parameters such as weights and bias). Since the FAB-engine uses the numpy.random library to generate the random initial values, users can specify a random seed with numpy.random.seed(SEED_VALUE).
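Seeding acts on NumPy's legacy global random state, so a run can be reproduced by reseeding before each initialization. The sketch below only demonstrates this mechanism with numpy.random.uniform. It assumes, as the text implies, that the engine draws its initial values from that global state, and it does not reproduce the engine's actual initialization. draw_initial_values is a hypothetical helper. Generator objects created with numpy.random.default_rng() are not affected by numpy.random.seed().

```python
import numpy as np


def draw_initial_values(seed, size=3):
    """Reseed the global NumPy state, then draw values the way an initializer might."""
    np.random.seed(seed)
    return np.random.uniform(-0.5, 0.5, size=size)
```

Calling draw_initial_values(42) twice yields identical arrays, while different seeds give different draws.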