FABHMELogitGateBSplineCl Component Specification

Overview

The FABHMELogitGateBSplineCl component is a non-linear binary classification component that uses B-spline functions with the FAB/HME algorithm. It learns a tree-structured model in which each sample is assigned to a component by logistic gating functions.

Note

The FAB engine uses the word ‘component’ with a different meaning from that of SAMPO. Each component in FAB/HME is a prediction formula, and each data sample is assigned to a specific component for prediction.

Example:

  • SPD:

    # fabhmecl.spd
    dl1 -> fab1
    
    ---
    components:
        dl1:
            component: DataLoader
        fab1:
            component: FABHMELogitGateBSplineClComponent
            features: name != 'class'
            tree_depth: 3
            target: name == 'class'
            positive_label: 'Iris-setosa'
    
    global_settings:
        keep_attributes:
            - class
        feature_exclude:
            - class
    
  • Input of the component:

    _sid  sepal_length_in_cm  sepal_width_in_cm  class
    0     4.9                 2.5                Iris-versicolor
    1     6.2                 2.8                Iris-versicolor
    2     7.2                 3.6                Iris-versicolor
    ...
    28    6.2                 2.9                Iris-setosa
    29    6.7                 3.1                Iris-setosa


  • Output of the component:

    _sid  fab1_actual  fab1_predict  fab1_score     fab1_assigned_comp_id
    0     -1           1             2.657069e+00   2
    1     -1           1             6.524541e-01   2
    2     -1           -1            -1.600153e+00  0
    ...
    28    1            1             6.524541e-01   2
    29    1            -1            -1.080094e+00  0


    _sid  fab1_predict_c0  fab1_score_c0  fab1_predict_c1  fab1_score_c1  fab1_predict_c2  fab1_score_c2
    0     1                7.921206e-01   -1               -1.028756e+00  1                2.657069e+00
    1     -1               -5.600341e-01  -1               -2.346818e+01  1                6.524541e-01
    2     -1               -1.600153e+00  -1               -1.082974e+01  -1               -8.895575e-01
    ...
    28    -1               -5.600341e-01  -1               -1.821556e+01  1                6.524541e-01
    29    -1               -1.080094e+00  -1               -2.240158e+01  -1               -1.185517e-01

This component has component-specific external formats for model and prediction result evaluation.

See also

Component-common external format files in convert_process


Parameters

This component has the following component-specific parameters.

SPD

The following parameters are for the “components” section of SPD.

Parameter Name                  Type          Domain                  Default Value    Description
positive_label [1]              str           See Description         -                The value of the target attribute to be treated as the positive label. The domain of this parameter corresponds to that of the target attribute.
max_fab_iterations              int           [1, inf)                100              Maximum number of FAB-iterations.
start_from_mstep [2][3]         bool          True / False            False            If True, the first iteration starts with the M-step; otherwise, it starts with the E-step.
num_acceleration_steps          int           [0, inf)                0                Number of steps of the acceleration algorithm in each FAB-iteration. If 0, the acceleration algorithm is disabled.
repeat_until_convergence        bool          True / False            False            If False, the FAB-iterations and the post-processing are executed only once, even if the FAB-iterations were stopped by the max_fab_iterations condition rather than by the convergence condition.
projection_estep                bool          True / False            False            Whether the projection E-step algorithm is enabled.
shrink_threshold                float or str  [1, inf) or (0%, 100%)  1.0              Threshold value for shrinkage. If a percentage value (e.g. '1.0%') is specified, shrinkage is executed according to the relative value \(N_{\rm scaled\_sample} \times t_{\rm shrink}\), where \(t_{\rm shrink}\) is the threshold value and \(N_{\rm scaled\_sample}\) is the number of scaled expected samples.
fab_stop_threshold              float or str  (0, inf) or (0%, inf%)  0.001            Threshold value for FAB-iterations: if the increase of the FIC value is less than the threshold, the FAB-iterations are considered to have converged. If a percentage value (e.g. '1.0%') is specified, the convergence check uses the relative value \((FIC^{(t)} - FIC^{(t-1)}) / | FIC^{(t-1)} |\).
gate_features                   str           Query format            all()            Features used for gate parameter optimization. If not specified, all features are used.
comp_features                   str           Query format            all()            Features used for component parameter optimization. If not specified, all features are used. If empty, the model is learned as a decision tree.
comp_mandatory_features         str           Query format            See Description  Features to which non-L0-regularization constraints are applied; the specified features are always relevant in all components. If not specified, no features are exempted from L0-regularization, so all relevant features are selected by the FoBa algorithm.
tree_depth [2][3]               int           [0, inf)                5                Initial depth of the gate-tree structure of the latent variable prior. The initial number of components is \(2^d\), where \(d\) is the tree depth. If 0, the optimization is executed with only one component.
comp_bspline_degree [3]         int           [0, inf)                3                Degree of the B-spline function.
comp_bspline_basis_dim [3]      int           [4, inf)                10               Number of B-spline basis functions generated for each feature.
comp_weights_min_scale [2][3]   float         (-inf, inf)             -0.5             Scale value for the initialization of weight values of components.
comp_weights_max_scale [2][3]   float         (-inf, inf)             0.5              Scale value for the initialization of weight values of components.
comp_bias_min_scale [2][3]      float         (-inf, inf)             0.25             Scale value for the initialization of bias values of components.
comp_bias_max_scale [2][3]      float         (-inf, inf)             0.75             Scale value for the initialization of bias values of components.
gate_l2_regularize              float         [0, inf)                0.0              L2-regularization hyper-parameter for gate parameter optimization. The larger the value, the stronger the regularization. If 0.0, L2-regularization is disabled.
with_gate_scaled_l0_regularize  bool          True / False            True             Whether to use scaled L0-regularization, which uses a tighter lower bound of FIC, for gate parameter optimization; the approximation of det(F) is refined, where F is a Fisher matrix.
max_gate_relevant_features      int           [1, inf)                3                Maximum number of relevant features for each gate.
comp_l2_regularize              float         [0, inf)                0.0              L2-regularization hyper-parameter for component parameter optimization. The larger the value, the stronger the regularization. If 0.0, L2-regularization is disabled.
comp_pspline                    float         [0, inf)                1.0              L2-regularization coefficient for the penalized B-spline function (P-spline).
with_comp_scaled_l0_regularize  bool          True / False            True             Whether to use scaled L0-regularization, which uses a tighter lower bound of FIC, for component parameter optimization; the approximation of det(F) is refined, where F is a Fisher matrix.
max_comp_relevant_features      int           [1, inf)                100              Maximum number of relevant features for each component.
num_threads_gates               int           [1, inf)                1                Maximum number of OpenMP threads for gate parameter optimization; the tasks for all gates are divided among these threads.
num_threads_comps               int           [1, inf)                1                Maximum number of OpenMP threads for component parameter optimization.

[1] Required parameter.

[2] This parameter is ignored in posterior hot-start.

[3] This parameter is ignored in model hot-start.
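
The threshold and tree-depth rules described in the parameter table can be illustrated with a small sketch. This is not the FAB engine's implementation; the helper names (parse_threshold, initial_num_components, effective_shrink_threshold, fab_converged) are hypothetical and only restate the formulas given in the descriptions of shrink_threshold, fab_stop_threshold and tree_depth.

```python
def parse_threshold(value):
    """Split a threshold into (number, is_relative).

    A string such as '1.0%' is treated as a relative value;
    a plain number is treated as an absolute value.
    """
    if isinstance(value, str) and value.strip().endswith("%"):
        return float(value.strip()[:-1]) / 100.0, True
    return float(value), False


def initial_num_components(tree_depth):
    """Initial number of components: 2^d for tree depth d."""
    return 2 ** tree_depth


def effective_shrink_threshold(shrink_threshold, n_scaled_samples):
    """Relative thresholds are scaled as N_scaled_sample * t_shrink."""
    t_shrink, relative = parse_threshold(shrink_threshold)
    return n_scaled_samples * t_shrink if relative else t_shrink


def fab_converged(fic_prev, fic_curr, fab_stop_threshold=0.001):
    """True if the (absolute or relative) FIC increase is below the threshold."""
    threshold, relative = parse_threshold(fab_stop_threshold)
    increase = fic_curr - fic_prev
    if relative:
        # Relative check: (FIC(t) - FIC(t-1)) / |FIC(t-1)|
        increase /= abs(fic_prev)
    return increase < threshold


print(initial_num_components(3))                  # depth used in the SPD example
print(effective_shrink_threshold("1.0%", 200.0))  # relative shrink threshold
print(fab_converged(-24.0, -23.9995))             # absolute check, default threshold
print(fab_converged(-100.0, -99.5, "1.0%"))       # relative check
```

With the default tree_depth of 5, the same rule gives 32 initial components.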

SRC

The following parameter is for the “hotstart” section of SRC.

Parameter Name  Type  Domain                                                                                                                                   Default Value  Description
type            str   {‘posterior’, ‘mh_refit_comp’, ‘mh_opt_comp’, ‘mh_refit_gate_and_refit_comp’, ‘mh_refit_gate_and_opt_comp’, ‘mh_opt_gate_and_opt_comp’}  -              The hot-start type. If ‘posterior’, FAB learns with posterior hot-start, which uses an initial model whose tree structure is generated from the base model and the data; the gate and component parameters are initialized randomly. ‘mh_XXX’ means FAB learns with model hot-start, which uses the base model as the initial model. ‘refit_{gate, comp}’ means refitting the gate functions or prediction formulas with the current data. ‘opt_{gate, comp}’ means optimizing (feature selection and fitting) the gate functions or prediction formulas with the current data.
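
The naming scheme of the hot-start types can be decoded mechanically. The sketch below is only an illustration of that scheme; describe_hotstart is a hypothetical helper, not part of SAMPO, and a value of None means that the type name does not mention the gate functions or the prediction formulas.

```python
# The six values listed in the domain of the 'type' parameter.
HOTSTART_TYPES = (
    "posterior",
    "mh_refit_comp",
    "mh_opt_comp",
    "mh_refit_gate_and_refit_comp",
    "mh_refit_gate_and_opt_comp",
    "mh_opt_gate_and_opt_comp",
)


def describe_hotstart(hotstart_type):
    """Decode a hot-start type name into its mode and per-target action."""
    if hotstart_type not in HOTSTART_TYPES:
        raise ValueError("unknown hot-start type: %r" % (hotstart_type,))
    if hotstart_type == "posterior":
        return {"mode": "posterior", "gate": None, "comp": None}
    actions = {"gate": None, "comp": None}
    # Strip the 'mh_' prefix, then split 'refit_gate_and_opt_comp' into parts.
    for part in hotstart_type[len("mh_"):].split("_and_"):
        action, target = part.split("_")  # e.g. 'refit', 'gate'
        actions[target] = action
    return {"mode": "model", "gate": actions["gate"], "comp": actions["comp"]}


for name in HOTSTART_TYPES:
    print(name, describe_hotstart(name))
```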


Utilizable Sample Metadata

Warning

_fabhme_assigned_comp_id is deprecated. Use the hotstart section of SRC instead of the _fabhme_assigned_comp_id data column.

This component can utilize the _fabhme_assigned_comp_id attribute of the sample metadata to hot-start with posterior. When the _fabhme_assigned_comp_id attribute is specified in the input data, this component starts the FAB/HME algorithm with the _fabhme_assigned_comp_id attribute values as its initial posterior.

To create the attribute _fabhme_assigned_comp_id, see the specification of the command sampo_ps_fabhme export_assigned_comp_id.


Output Attributes

This component generates the following attributes.

Attribute Name                                                   Scale    Description
<component_id>_actual                                            INTEGER  Values of the target attribute.
<component_id>_predict                                           INTEGER  Predicted values.
<component_id>_score                                             REAL     Score values. A prediction result is obtained by classifying this value according to a boundary.
<component_id>_assigned_comp_id                                  INTEGER  IDs of the components (prediction formulas) assigned by the gating functions.
<component_id>_predict_c<hme_comp_id>                            INTEGER  Predicted values for the prediction formula of component ID <hme_comp_id>.
<component_id>_score_c<hme_comp_id>                              REAL     Score values for the prediction formula of component ID <hme_comp_id>.
<component_id>_basisfunc_<feature_attr_name>:<basis_func_index>  REAL     Basis function values.

These attributes are included in the component output data and can be loaded through the SAMPO API.

See also

Obtaining process results via ProcessResultLoader.

When convert_process is executed, the component output data will be saved in two separate files:

  1. All non-basis function value attributes will be saved as <component_id>_predict_result.csv.

    This file describes the prediction result of the component.

    _sid,fab1_actual,fab1_predict,fab1_score,fab1_assigned_comp_id,fab1_predict_c0,fab1_score_c0,fab1_predict_c1,fab1_score_c1,fab1_predict_c2,fab1_score_c2
    0,-1,1,2.657069e+00,2,1,7.921206e-01,-1,-1.028756e+00,1,2.657069e+00
    1,-1,1,6.524541e-01,2,-1,-5.600341e-01,-1,-2.346818e+01,1,6.524541e-01
    2,-1,-1,-1.600153e+00,0,-1,-1.600153e+00,-1,-1.082974e+01,-1,-8.895575e-01
    ...
    28,1,1,6.524541e-01,2,-1,-5.600341e-01,-1,-1.821556e+01,1,6.524541e-01
    29,1,-1,-1.080094e+00,0,-1,-1.080094e+00,-1,-2.240158e+01,-1,-1.185517e-01
    
  2. Basis function value attributes will be saved as basis_func_values.csv.

    This file describes the basis function values of B-spline functions.

    _sid,fab1_basisfunc_std1_CRIM:0,fab1_basisfunc_std1_CRIM:1,fab1_basisfunc_std1_CRIM:2,fab1_basisfunc_std1_CRIM:3,fab1_basisfunc_std1_CRIM:4,fab1_basisfunc_std1_CRIM:5,fab1_basisfunc_std1_CRIM:6,fab1_basisfunc_std1_CRIM:7,fab1_basisfunc_std1_CRIM:8,fab1_basisfunc_std1_CRIM:9,fab1_basisfunc_std1_ZN:0,fab1_basisfunc_std1_ZN:1,fab1_basisfunc_std1_ZN:2,fab1_basisfunc_std1_ZN:3,fab1_basisfunc_std1_ZN:4,fab1_basisfunc_std1_ZN:5,fab1_basisfunc_std1_ZN:6,fab1_basisfunc_std1_ZN:7,fab1_basisfunc_std1_ZN:8,fab1_basisfunc_std1_ZN:9,fab1_basisfunc_std1_NOX:0,fab1_basisfunc_std1_NOX:1,fab1_basisfunc_std1_NOX:2,fab1_basisfunc_std1_NOX:3,fab1_basisfunc_std1_NOX:4,fab1_basisfunc_std1_NOX:5,fab1_basisfunc_std1_NOX:6,fab1_basisfunc_std1_NOX:7,fab1_basisfunc_std1_NOX:8,fab1_basisfunc_std1_NOX:9,fab1_basisfunc_bin1(0)_CHAS:0,fab1_basisfunc_bin1(0)_CHAS:1,fab1_basisfunc_bin1(0)_CHAS:2,fab1_basisfunc_bin1(0)_CHAS:3,fab1_basisfunc_bin1(0)_CHAS:4,fab1_basisfunc_bin1(0)_CHAS:5,fab1_basisfunc_bin1(0)_CHAS:6,fab1_basisfunc_bin1(0)_CHAS:7,fab1_basisfunc_bin1(0)_CHAS:8,fab1_basisfunc_bin1(0)_CHAS:9,fab1_basisfunc_bin1(1)_RAD:0,fab1_basisfunc_bin1(1)_RAD:1,fab1_basisfunc_bin1(1)_RAD:2,fab1_basisfunc_bin1(1)_RAD:3,fab1_basisfunc_bin1(1)_RAD:4,fab1_basisfunc_bin1(1)_RAD:5,fab1_basisfunc_bin1(1)_RAD:6,fab1_basisfunc_bin1(1)_RAD:7,fab1_basisfunc_bin1(1)_RAD:8,fab1_basisfunc_bin1(1)_RAD:9,fab1_basisfunc_std1_LSTAT:0,fab1_basisfunc_std1_LSTAT:1,fab1_basisfunc_std1_LSTAT:2,fab1_basisfunc_std1_LSTAT:3,fab1_basisfunc_std1_LSTAT:4,fab1_basisfunc_std1_LSTAT:5,fab1_basisfunc_std1_LSTAT:6,fab1_basisfunc_std1_LSTAT:7,fab1_basisfunc_std1_LSTAT:8,fab1_basisfunc_std1_LSTAT:9,fab1_basisfunc_std1_TAX:0,fab1_basisfunc_std1_TAX:1,fab1_basisfunc_std1_TAX:2,fab1_basisfunc_std1_TAX:3,fab1_basisfunc_std1_TAX:4,fab1_basisfunc_std1_TAX:5,fab1_basisfunc_std1_TAX:6,fab1_basisfunc_std1_TAX:7,fab1_basisfunc_std1_TAX:8,fab1_basisfunc_std1_TAX:9,fab1_basisfunc_bin1(3)_RAD:0,fab1_basisfunc_bin1(3)_RAD:1,fab1_basisfunc_bin1(3)_RAD:2,fab1_basisfunc_bin1(3)_RAD:3,fab1_basisfunc_bin1(3)_RAD:4,fab1_basisfunc_bin1(3)_RAD:5,fab1_basisfunc_bin1(3)_RAD:6,fab1_basisfunc_bin1(3)_RAD:7,fab1_basisfunc_bin1(3)_RAD:8,fab1_basisfunc_bin1(3)_RAD:9,fab1_basisfunc_std1_DIS:0,fab1_basisfunc_std1_DIS:1,fab1_basisfunc_std1_DIS:2,fab1_basisfunc_std1_DIS:3,fab1_basisfunc_std1_DIS:4,fab1_basisfunc_std1_DIS:5,fab1_basisfunc_std1_DIS:6,fab1_basisfunc_std1_DIS:7,fab1_basisfunc_std1_DIS:8,fab1_basisfunc_std1_DIS:9,fab1_basisfunc_std1_PTRATIO:0,fab1_basisfunc_std1_PTRATIO:1,fab1_basisfunc_std1_PTRATIO:2,fab1_basisfunc_std1_PTRATIO:3,fab1_basisfunc_std1_PTRATIO:4,fab1_basisfunc_std1_PTRATIO:5,fab1_basisfunc_std1_PTRATIO:6,fab1_basisfunc_std1_PTRATIO:7,fab1_basisfunc_std1_PTRATIO:8,fab1_basisfunc_std1_PTRATIO:9,fab1_basisfunc_std1_B:0,fab1_basisfunc_std1_B:1,fab1_basisfunc_std1_B:2,fab1_basisfunc_std1_B:3,fab1_basisfunc_std1_B:4,fab1_basisfunc_std1_B:5,fab1_basisfunc_std1_B:6,fab1_basisfunc_std1_B:7,fab1_basisfunc_std1_B:8,fab1_basisfunc_std1_B:9,fab1_basisfunc_std1_INDUS:0,fab1_basisfunc_std1_INDUS:1,fab1_basisfunc_std1_INDUS:2,fab1_basisfunc_std1_INDUS:3,fab1_basisfunc_std1_INDUS:4,fab1_basisfunc_std1_INDUS:5,fab1_basisfunc_std1_INDUS:6,fab1_basisfunc_std1_INDUS:7,fab1_basisfunc_std1_INDUS:8,fab1_basisfunc_std1_INDUS:9,fab1_basisfunc_std1_RM:0,fab1_basisfunc_std1_RM:1,fab1_basisfunc_std1_RM:2,fab1_basisfunc_std1_RM:3,fab1_basisfunc_std1_RM:4,fab1_basisfunc_std1_RM:5,fab1_basisfunc_std1_RM:6,fab1_basisfunc_std1_RM:7,fab1_basisfunc_std1_RM:8,fab1_basisfunc_std1_RM:9,fab1_basisfunc_std1_AGE:0,fab1_basisfunc_std1_AGE:1,fab1_basisfunc_std1_AGE:2,fab1_basisfunc_std1_AGE:3,fab1_basisfunc_std1_AGE:4,fab1_basisfunc_std1_AGE:5,fab1_basisfunc_std1_AGE:6,fab1_basisfunc_std1_AGE:7,fab1_basisfunc_std1_AGE:8,fab1_basisfunc_std1_AGE:9
    0,6.666667e-01,3.333333e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-4.197819e-01,0.000000e+00,4.056000e-01,5.621333e-01,3.226667e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,2.848299e-01,3.540404e-01,5.969046e-01,4.905502e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-1.287909e+00,0.000000e+00,0.000000e+00,3.657979e-01,5.893919e-01,4.481024e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-1.442174e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,3.173806e-01,6.185440e-01,6.407538e-02,0.000000e+00,0.000000e+00,4.136719e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.810398e-01,4.160185e-01,2.941641e-03,0.000000e+00,-1.200134e-01,0.000000e+00,0.000000e+00,5.681834e-01,4.278832e-01,3.933446e-03,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,1.402136e-01,0.000000e+00,2.974283e-01,6.290620e-01,7.350970e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-6.666082e-01,0.000000e+00,0.000000e+00,4.828731e-01,5.023389e-01,1.478799e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-1.459000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.000000e-01,4.410519e-01,2.741603e-01,6.400531e-01,8.578652e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-1.075562e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.000000e-01,1.000000e+00,6.666667e-01,3.333333e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.000000e-01,1.000000e+00
    1,6.654090e-01,3.345904e-01,5.937007e-07,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-4.173393e-01,6.666667e-01,3.333333e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-4.877224e-01,0.000000e+00,1.878266e-01,6.654025e-01,1.467709e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-5.933810e-01,0.000000e+00,4.359346e-01,5.396535e-01,2.441193e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-7.402622e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,4.352526e-01,5.401738e-01,2.457365e-02,0.000000e+00,0.000000e+00,1.942745e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.036805e-01,4.849149e-01,1.140454e-02,3.671664e-01,0.000000e+00,0.000000e+00,2.433332e-01,6.522031e-01,1.044637e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.571599e-01,2.243847e-01,6.581007e-01,1.175145e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-9.873295e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,4.131583e-01,5.566621e-01,3.017957e-02,0.000000e+00,0.000000e+00,-3.030941e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.000000e-01,4.410519e-01,0.000000e+00,3.101911e-01,6.224435e-01,6.736547e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-4.924394e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.000000e-01,1.000000e+00,6.666667e-01,3.333333e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,6.666667e-01,3.333333e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00
    2,6.654102e-01,3.345892e-01,5.925699e-07,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-4.173416e-01,6.666667e-01,3.333333e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-4.877224e-01,0.000000e+00,1.878266e-01,6.654025e-01,1.467709e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-5.933810e-01,0.000000e+00,4.359346e-01,5.396535e-01,2.441193e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-7.402622e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,3.479622e-01,6.006842e-01,5.135363e-02,0.000000e+00,1.282714e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,2.419814e-01,6.526661e-01,1.053525e-01,0.000000e+00,0.000000e+00,-2.658118e-01,0.000000e+00,0.000000e+00,2.433332e-01,6.522031e-01,1.044637e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.571599e-01,2.243847e-01,6.581007e-01,1.175145e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-9.873295e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,4.131583e-01,5.566621e-01,3.017957e-02,0.000000e+00,0.000000e+00,-3.030941e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,1.951574e-01,5.942084e-01,3.964270e-01,3.711468e-01,5.858889e-01,4.296433e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-1.208727e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.000000e-01,1.000000e+00,6.666667e-01,3.333333e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,6.666667e-01,3.333333e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00
    3,6.651060e-01,3.348931e-01,9.144462e-07,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-4.167504e-01,6.666667e-01,3.333333e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-4.877224e-01,3.728038e-01,5.847932e-01,4.240303e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-1.306878e+00,0.000000e+00,5.390128e-01,4.542103e-01,6.776858e-03,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-8.352838e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,4.997154e-01,4.882744e-01,1.201021e-02,0.000000e+00,1.016303e+00,0.000000e+00,0.000000e+00,0.000000e+00,3.579481e-01,5.944367e-01,4.761513e-02,0.000000e+00,0.000000e+00,0.000000e+00,-8.098885e-01,0.000000e+00,0.000000e+00,0.000000e+00,3.321228e-01,6.101833e-01,5.769394e-02,0.000000e+00,0.000000e+00,0.000000e+00,1.077737e+00,3.580211e-01,5.943904e-01,4.758852e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-1.106115e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.451185e-01,4.487702e-01,6.111363e-03,0.000000e+00,1.130321e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,1.822800e-01,5.900916e-01,4.161628e-01,5.004857e-01,4.876232e-01,1.189113e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-1.361517e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.000000e-01,1.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.000000e-01,1.000000e+00,6.666667e-01,3.333333e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00
    
    ...
    
    504,6.604905e-01,3.394952e-01,1.437113e-05,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-4.077641e-01,6.666667e-01,3.333333e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-4.877224e-01,0.000000e+00,0.000000e+00,0.000000e+00,4.462810e-01,5.316804e-01,2.203857e-02,0.000000e+00,0.000000e+00,0.000000e+00,1.157384e-01,0.000000e+00,0.000000e+00,0.000000e+00,6.050596e-01,3.934473e-01,1.493110e-03,0.000000e+00,0.000000e+00,0.000000e+00,1.581241e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,1.817470e-01,6.660136e-01,1.522394e-01,0.000000e+00,0.000000e+00,7.256721e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.900481e-01,4.064453e-01,7.369964e-01,1.958019e-01,6.643210e-01,1.398771e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-6.684368e-01,0.000000e+00,4.743410e-01,5.093332e-01,1.632578e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-8.032117e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.710729e-01,4.233816e-01,1.176466e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,1.906723e-01,5.929144e-01,4.032249e-01,0.000000e+00,6.346830e-01,3.649239e-01,3.930952e-04,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-8.653016e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.000000e-01,1.000000e+00,6.666667e-01,3.333333e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.000000e-01,1.000000e+00
    505,6.642058e-01,3.357919e-01,2.275176e-06,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-4.150002e-01,6.666667e-01,3.333333e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-4.877224e-01,0.000000e+00,0.000000e+00,0.000000e+00,4.462810e-01,5.316804e-01,2.203857e-02,0.000000e+00,0.000000e+00,0.000000e+00,1.157384e-01,0.000000e+00,0.000000e+00,0.000000e+00,6.050596e-01,3.934473e-01,1.493110e-03,0.000000e+00,0.000000e+00,0.000000e+00,1.581241e-01,0.000000e+00,0.000000e+00,0.000000e+00,2.461861e-01,6.512057e-01,1.026082e-01,0.000000e+00,0.000000e+00,0.000000e+00,-3.627671e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,4.170544e-01,5.538074e-01,2.913818e-02,4.347315e-01,0.000000e+00,6.662848e-01,3.337151e-01,5.470025e-08,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-6.132465e-01,0.000000e+00,4.743410e-01,5.093332e-01,1.632578e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-8.032117e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.710729e-01,4.233816e-01,1.176466e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.000000e-01,4.410519e-01,0.000000e+00,4.495709e-01,5.291142e-01,2.131485e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,-6.690583e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.000000e-01,1.000000e+00,6.666667e-01,3.333333e-01,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.000000e-01,1.000000e+00
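
In the sample rows of <component_id>_predict_result.csv shown above, the overall score and prediction of each sample coincide with the score and prediction of its assigned component. The following sketch checks that relationship using only the Python standard library. It is an observation drawn from the sample rows, not a documented guarantee, and check_rows is a hypothetical helper.

```python
import csv
import io

# Sample rows copied from the predict_result.csv example in this section.
PREDICT_RESULT_CSV = """\
_sid,fab1_actual,fab1_predict,fab1_score,fab1_assigned_comp_id,fab1_predict_c0,fab1_score_c0,fab1_predict_c1,fab1_score_c1,fab1_predict_c2,fab1_score_c2
0,-1,1,2.657069e+00,2,1,7.921206e-01,-1,-1.028756e+00,1,2.657069e+00
1,-1,1,6.524541e-01,2,-1,-5.600341e-01,-1,-2.346818e+01,1,6.524541e-01
2,-1,-1,-1.600153e+00,0,-1,-1.600153e+00,-1,-1.082974e+01,-1,-8.895575e-01
28,1,1,6.524541e-01,2,-1,-5.600341e-01,-1,-1.821556e+01,1,6.524541e-01
29,1,-1,-1.080094e+00,0,-1,-1.080094e+00,-1,-2.240158e+01,-1,-1.185517e-01
"""


def check_rows(csv_text, component_id="fab1"):
    """Return the _sid values whose overall result differs from that of
    the assigned component."""
    mismatches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        comp = row[component_id + "_assigned_comp_id"]
        same_score = (float(row[component_id + "_score"])
                      == float(row[component_id + "_score_c" + comp]))
        same_predict = (row[component_id + "_predict"]
                        == row[component_id + "_predict_c" + comp])
        if not (same_score and same_predict):
            mismatches.append(row["_sid"])
    return mismatches


print(check_rows(PREDICT_RESULT_CSV))
```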
    

Attribute Metadata

The metadata of the output attributes is created with the following rules.

Context Rule

Attribute Name                                                                        Context Name     Description
All the output attributes of this component                                           field_path       List of the superordinate concepts of each output attribute, based on the hierarchical structure of the output attributes shown below this table.
<component_id>_actual, <component_id>_predict, <component_id>_predict_c<hme_comp_id>  positive_map     Mapping between a positive value and a positive label.
<component_id>_actual, <component_id>_predict, <component_id>_predict_c<hme_comp_id>  negative_map     Mapping between a negative value and a negative label.
<component_id>_assigned_comp_id                                                       active_comp_ids  List of component IDs corresponding to each prediction formula.

Hierarchical structure of the output attributes (used by field_path):

root
├── fabhmecl
│   ├── assigned_comp_id
│   └── component
│       ├── 0
│       │   ├── predict
│       │   └── score
│       ├── 1
│       │   ├── predict
│       │   └── score
│        .
│        .
│        .
│
└── binary_classification
    ├── actual
    ├── predict
    └── score

Derivation Rule

Attribute Name                                                       Derived From
<component_id>_actual                                                Derived from the target attribute.
<component_id>_predict                                               Derived from the attributes which have non-zero coefficients in any prediction formula.
<component_id>_score                                                 Derived from the attributes which have non-zero coefficients in any prediction formula.
<component_id>_assigned_comp_id                                      Derived from the attributes used in the gating functions.
<component_id>_predict_c<hme_comp_id>                                Derived from the attributes which have non-zero coefficients in the prediction formula of component ID <hme_comp_id>.
<component_id>_score_c<hme_comp_id>                                  Derived from the attributes which have non-zero coefficients in the prediction formula of component ID <hme_comp_id>.
<component_id>_basisfunc_<feature_attr_name>:<basis_function_index>  Derived from the attribute named <feature_attr_name>.

Example

{
    "nodes": [
        {
            "aid": "fab1[15]",
            "name": "fab1_basisfunc_sepal_width_in_cm:9",
            "scale": "real",
            "is_excluded": false,
            "cid": "fab1",
            "cindex": 15,
            "values": null,
            "is_kept": false,
            "context": null
        },
        {
            "aid": "_sid",
            "name": "_sid",
            "scale": "integer",
            "is_excluded": false,
            "cid": null,
            "cindex": 0,
            "values": null,
            "is_kept": false,
            "context": null
        },

        ...

    ],
    "links": [
        [
            "dl1[1]",
            "fab1[14]"
        ],
        [
            "dl1[1]",
            "fab1[5]"
        ],

        ...

    ]
}
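
As a reading aid, the sketch below looks up the source attributes of an output attribute in such a metadata structure. It assumes that each entry of "links" is a pair [source aid, derived aid]; that direction is inferred from the example and the derivation rules, not stated explicitly. The helper names are hypothetical, and the sample keeps only the entries shown above (the elided ones are omitted).

```python
import json

# Only the nodes and links visible in the example; elided entries are left out.
METADATA_JSON = """
{
    "nodes": [
        {"aid": "fab1[15]", "name": "fab1_basisfunc_sepal_width_in_cm:9",
         "scale": "real", "cid": "fab1", "cindex": 15},
        {"aid": "_sid", "name": "_sid",
         "scale": "integer", "cid": null, "cindex": 0}
    ],
    "links": [
        ["dl1[1]", "fab1[14]"],
        ["dl1[1]", "fab1[5]"]
    ]
}
"""


def names_by_aid(metadata):
    """Map each attribute ID (aid) to its attribute name."""
    return {node["aid"]: node["name"] for node in metadata["nodes"]}


def derived_from(metadata, aid):
    """Return the aids that the given attribute is derived from.

    ASSUMPTION: each link is [source_aid, derived_aid].
    """
    return [source for source, target in metadata["links"] if target == aid]


metadata = json.loads(METADATA_JSON)
print(names_by_aid(metadata)["fab1[15]"])
print(derived_from(metadata, "fab1[14]"))
```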

See also

Attribute metadata file format in Attribute Metadata File Specification


Model

The model of this component can be described by the following parameters.

Model Parameter      Type              Domain           Description
fic                  float             (-inf, inf)      Factorized Information Criterion. The asymptotic approximation value used by FAB/HME.
num_initial_comps    int               [0, inf)         The initial number of components before the iterations.
num_active_comps     int               [0, inf)         The final number of active components after the iterations.
gate_tree            dict              See Description  Dictionary form of the gating tree structure.
prediction_formulas  pandas.DataFrame  See Description  Component weights and bias for each prediction formula.
bspline_params       pandas.DataFrame  See Description  Degree and basis dimensionality of the B-spline function.
bspline_knot_vecs    pandas.DataFrame  See Description  Knot vectors of the B-spline functions for all features.

The gate_tree dictionary keys are described below:

Gate Tree Dictionary Key  Type          Domain           Description
gate_type                 str           ‘logit’          The type of gate.
hard_gate                 bool          true / false     Whether the gate is a hard gate.
nodes                     list of dict  See Description  List of node dictionaries.
edges                     list of dict  See Description  List of edge dictionaries.

The keys of each node dictionary in nodes are described below:

Node Dictionary Key  Type  Domain                 Description
node_id              int   [0, inf)               The node ID.
node_type            str   {‘gate’, ‘component’}  The node type.
gate_func            dict  See Description        Dictionary of the gate function parameters for the logit gate. Present only if node_type is “gate”.
comp_id              int   [0, inf)               The component ID. Present only if node_type is “component”.

The keys of each edge dictionary in edges are described below:

Edge Dictionary Key  Type  Domain        Description
source               int   [0, inf)      The node_id of the source node.
target               int   [0, inf)      The node_id of the target node.
is_left              bool  true / false  Whether the target node is the left child of the source node.

The keys of the gate_func dictionary are described below:

Gate Function Dictionary Key  Type          Domain           Description
bias                          float         (-inf, inf)      The gate function bias.
weights                       list of dict  See Description  List of weight dictionaries, each mapping an attribute to its corresponding weight.

When the model is loaded in the SAMPO API, the model parameters will be output as a single dictionary.

See also

Obtaining process results via ProcessResultLoader

{'fic': -23.832958802449035,
 'num_initial_comps': 32,
 'num_active_comps': 2,
 'gate_tree':
     {'gate_type': 'logit',
      'hard_gate': True,
      'nodes': [
          {'comp_id': 20, 'node_type': 'component', 'node_id': 1},
          {'node_type': 'gate',
           'node_id': 0,
           'gate_func':
               {'bias': -14.594158450398055,
                'weights': [
                    {'aid': 'dl[0]', 'attr_name': 'sepal_length_in_cm', 'weight': 10.426327199487217},
                    {'aid': 'dl[1]', 'attr_name': 'petal_length_in_cm', 'weight': -13.460106074504926}]}},
          {'comp_id': 30, 'node_type': 'component', 'node_id': 2}],
      'edges': [
          {'source': 0, 'target': 1, 'is_left': True},
          {'source': 0, 'target': 2, 'is_left': False}]},
 'prediction_formulas':
                                              prediction_formula_20  prediction_formula_30
     attr_name          basis_function_index
     sepal_length_in_cm 0                                         0                      0
                        1                                         0                      0
                        2                                         0                      0
                        3                                         0                      0
                        4                                         0                      0
                        5                                         0                      0
                        6                                         0                      0
                        7                                         0                      0
                        8                                         0                      0
                        9                                         0                      0
     petal_length_in_cm 0                                         0                      0
                        1                                         0                      0
                        2                                         0                      0
                        3                                         0                      0
                        4                                         0                      0
                        5                                         0                      0
                        6                                         0                      0
                        7                                         0                      0
                        8                                         0                      0
                        9                                         0                      0
                        bias                                     -1                      1,
 'bspline_params':    degree  basis_dim
     0       3         10,
 'bspline_knot_vecs':
                         knot_value_0  knot_value_1  knot_value_2  knot_value_3  knot_value_4  knot_value_5  knot_value_6  knot_value_7  knot_value_8  knot_value_9  knot_value_10  knot_value_11  knot_value_12
     attr_name
     sepal_length_in_cm        3.9625        3.9625           4.3        4.6375         4.975        5.3125          5.65        5.9875         6.325        6.6625            7.0         7.3375         7.3375
     petal_length_in_cm        1.7000        1.7000           2.0        2.3000         2.600        2.9000          3.20        3.5000         3.800        4.1000            4.4         4.7000         4.7000
}
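The gate_tree entry above can be walked from its root gate down to a component node to see which prediction formula a sample would use. The following is a minimal sketch, not the FAB engine's implementation. It assumes that a gate evaluates the logistic function of bias + sum(weight * value) and that a sample goes to the left child (is_left: True) when that value is at least 0.5; the source does not state this direction convention. The helper names and the two sample feature vectors are hypothetical.

```python
# Gate tree copied from the model dictionary above.
GATE_TREE = {
    "gate_type": "logit",
    "hard_gate": True,
    "nodes": [
        {"comp_id": 20, "node_type": "component", "node_id": 1},
        {"node_type": "gate", "node_id": 0, "gate_func": {
            "bias": -14.594158450398055,
            "weights": [
                {"aid": "dl[0]", "attr_name": "sepal_length_in_cm",
                 "weight": 10.426327199487217},
                {"aid": "dl[1]", "attr_name": "petal_length_in_cm",
                 "weight": -13.460106074504926}]}},
        {"comp_id": 30, "node_type": "component", "node_id": 2}],
    "edges": [
        {"source": 0, "target": 1, "is_left": True},
        {"source": 0, "target": 2, "is_left": False}],
}


def find_root(gate_tree):
    """Return the id of the only node that is never an edge target."""
    targets = {edge["target"] for edge in gate_tree["edges"]}
    roots = [n["node_id"] for n in gate_tree["nodes"] if n["node_id"] not in targets]
    if len(roots) != 1:
        raise ValueError("expected exactly one root node")
    return roots[0]


def assign_component(gate_tree, sample):
    """Follow hard gates from the root until a component node is reached."""
    nodes = {n["node_id"]: n for n in gate_tree["nodes"]}
    children = {}
    for edge in gate_tree["edges"]:
        children.setdefault(edge["source"], {})[edge["is_left"]] = edge["target"]

    node = nodes[find_root(gate_tree)]
    while node["node_type"] == "gate":
        func = node["gate_func"]
        z = func["bias"] + sum(
            w["weight"] * sample[w["attr_name"]] for w in func["weights"])
        # ASSUMPTION: sigmoid(z) >= 0.5 selects the left child.
        # sigmoid(z) >= 0.5 is equivalent to z >= 0, which avoids overflow.
        go_left = z >= 0.0
        node = nodes[children[node["node_id"]][go_left]]
    return node["comp_id"]


# Hypothetical samples, keyed by attr_name.
comp_a = assign_component(
    GATE_TREE, {"sepal_length_in_cm": 5.0, "petal_length_in_cm": 1.5})
comp_b = assign_component(
    GATE_TREE, {"sepal_length_in_cm": 5.0, "petal_length_in_cm": 4.5})
```

Under the stated assumption, the first sample reaches the component with comp_id 20 and the second reaches comp_id 30. If the engine uses the opposite direction convention, the two results swap.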

External Format

When convert_process is executed, the model parameters are saved to separate files, grouped as follows: general information, gate tree, prediction formulas, B-spline parameters, and B-spline knot vectors.

General Information

This file contains the \(FIC\) value after model learning, the initial number of components, and the final number of active components.

fic,num_initial_comps,num_active_comps
-1.294308e+02,8,3

Gate Tree

This file describes the structure and parameters of the gate-tree of the model.

{
    "gate_tree": {
        "gate_type": "logit",
        "hard_gate": true,
        "nodes": [
            {
                "node_id": 11,
                "node_type": "gate",
                "gate_func": {
                    "weights": [
                        {
                            "aid": "std1[4]",
                            "attr_name": "std1_RM",
                            "weight": -3.6682658685673992e+00
                        },
                        {
                            "aid": "std1[7]",
                            "attr_name": "std1_TAX",
                            "weight": -5.8122016705226542e+00
                        },
                        {
                            "aid": "std1[10]",
                            "attr_name": "std1_LSTAT",
                            "weight": 1.0537643144910271e+01
                        }
                    ],
                    "bias": 1.2740926133353371e+01
                }
            },
            {
                "node_id": 10,
                "node_type": "gate",
                "gate_func": {
                    "weights": [
                        {
                            "aid": "std1[2]",
                            "attr_name": "std1_INDUS",
                            "weight": 7.6493213521271874e-01
                        },
                        {
                            "aid": "std1[4]",
                            "attr_name": "std1_RM",
                            "weight": 2.5021103534594329e+00
                        },
                        {
                            "aid": "std1[10]",
                            "attr_name": "std1_LSTAT",
                            "weight": -2.8529074313420657e+00
                        }
                    ],
                    "bias": -2.3621841358789547e-01
                }
            },

            ...

            {
                "node_id": 13,
                "node_type": "component",
                "comp_id": 13
            },
            {
                "node_id": 12,
                "node_type": "component",
                "comp_id": 12
            },

            ...

        ],
        "edges": [
            {
                "source": 11,
                "target": 13,
                "is_left": false
            },
            {
                "source": 11,
                "target": 12,
                "is_left": true
            },

            ...

        ]
    }
}

Prediction Formulas

This file describes parameters of prediction formulas: weights and bias values.

aid,attr_name,basis_function_index,prediction_formula_0
dl1[0],sepal_length_in_cm,0,2.3589409814056678e-01
dl1[0],sepal_length_in_cm,1,3.2958604508257561e-01
dl1[0],sepal_length_in_cm,2,6.3181400239767593e-02

...

,bias,,5.4161895819219632e+00
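The prediction formula file mixes two kinds of rows: one row per (attribute, basis function) weight, and a final bias row whose aid and basis_function_index fields are empty. The sketch below shows one way to separate them using only the Python standard library. The function name load_prediction_formulas is hypothetical, and the sample text contains only the rows shown above, without the elided ones.

```python
import csv
import io

# Abbreviated sample: only the rows shown above, elided rows are not filled in.
SAMPLE = """aid,attr_name,basis_function_index,prediction_formula_0
dl1[0],sepal_length_in_cm,0,2.3589409814056678e-01
dl1[0],sepal_length_in_cm,1,3.2958604508257561e-01
dl1[0],sepal_length_in_cm,2,6.3181400239767593e-02
,bias,,5.4161895819219632e+00
"""


def load_prediction_formulas(text):
    """Split the CSV into per-formula weights and per-formula bias values.

    Returns (weights, biases), where weights maps a formula column name to
    {(attr_name, basis_function_index): weight} and biases maps a formula
    column name to its bias.
    """
    reader = csv.DictReader(io.StringIO(text))
    formula_cols = [
        name for name in reader.fieldnames
        if name.startswith("prediction_formula_")]
    weights = {col: {} for col in formula_cols}
    biases = {}
    for row in reader:
        # The bias row has no basis function index.
        is_bias = row["attr_name"] == "bias" and row["basis_function_index"] == ""
        for col in formula_cols:
            value = float(row[col])
            if is_bias:
                biases[col] = value
            else:
                key = (row["attr_name"], int(row["basis_function_index"]))
                weights[col][key] = value
    return weights, biases


weights, biases = load_prediction_formulas(SAMPLE)
```

Keying the weights by (attr_name, basis_function_index) mirrors the two-level index shown for prediction_formulas in the model dictionary.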

B-spline Parameters

This file describes the parameters of the B-spline prediction formulas: the degree and the number of basis functions for each feature.

degree,basis_dim
3,10

B-spline Knot Vectors

This file describes the knot vectors of the B-spline prediction formulas, one row per feature.

aid,attr_name,knot_value_0,knot_value_1,knot_value_2,knot_value_3,knot_value_4,knot_value_5,knot_value_6,knot_value_7,knot_value_8,knot_value_9,knot_value_10,knot_value_11,knot_value_12
std1[0],std1_CRIM,-1.2341486543554931e+00,-1.2341486543554931e+00,-8.2810505758699771e-01,-4.2206146081850215e-01,-1.6017864050006603e-02,3.9002573271848906e-01,7.9606932948698450e-01,1.2021129262554799e+00,1.6081565230239758e+00,2.0142001197924713e+00,2.4202437165609667e+00,2.8262873133294621e+00,2.8262873133294621e+00
std1[1],std1_ZN,-1.3831265020520607e+00,-1.3831265020520607e+00,-1.0478983552351340e+00,-7.1267020841820727e-01,-3.7744206160128058e-01,-4.2213914784353879e-02,2.9301423203257282e-01,6.2824237884949952e-01,9.6347052566642621e-01,1.2986986724833531e+00,1.6339268193002798e+00,1.9691549661172065e+00,1.9691549661172065e+00
std1[2],std1_INDUS,-1.9600846596365655e+00,-1.9600846596365655e+00,-1.6593508630904183e+00,-1.3586170665442712e+00,-1.0578832699981242e+00,-7.5714947345197714e-01,-4.5641567690583007e-01,-1.5568188035968311e-01,1.4505191618646407e-01,4.4578571273261125e-01,7.4651950927875799e-01,1.0472533058249049e+00,1.0472533058249049e+00

...

Prediction Result Evaluation

The indices used in evaluating prediction results of this component are described below.

| Evaluation Index | Type | Description |
|---|---|---|
| true_positive | int | Number of samples determined as positive correctly (TP). |
| false_positive | int | Number of samples determined as positive incorrectly (FP). |
| true_negative | int | Number of samples determined as negative correctly (TN). |
| false_negative | int | Number of samples determined as negative incorrectly (FN). |
| accuracy | float | Proportion of true results in the population: \(\frac{\mbox{TP} + \mbox{TN}}{\mbox{TP} + \mbox{FP} + \mbox{TN} + \mbox{FN}}\) |
| classification_error | float | Proportion of false results in the population: \(\frac{\mbox{FP} + \mbox{FN}}{\mbox{TP} + \mbox{FP} + \mbox{TN} + \mbox{FN}} = 1 - \mbox{accuracy}\) |
| precision | float | Proportion of the true_positive against all samples determined as positive: \(\frac{\mbox{TP}}{\mbox{TP} + \mbox{FP}}\) |
| recall | float | Proportion of the true_positive against all the actual positive samples: \(\frac{\mbox{TP}}{\mbox{TP} + \mbox{FN}}\) |
| specificity | float | Proportion of the true_negative against all the actual negative samples: \(\frac{\mbox{TN}}{\mbox{TN} + \mbox{FP}}\) |
| false_positive_rate | float | Proportion of the false_positive against all the actual negative samples: \(\frac{\mbox{FP}}{\mbox{TN} + \mbox{FP}} = 1 - \mbox{specificity}\) |
| false_negative_rate | float | Proportion of the false_negative against all the actual positive samples: \(\frac{\mbox{FN}}{\mbox{TP} + \mbox{FN}} = 1 - \mbox{recall}\) |
| f_measure | float | Harmonic mean of precision and recall: \(\frac{2 \times \mbox{precision} \times \mbox{recall}}{\mbox{precision} + \mbox{recall}}\) |
| auc | float | Area under ROC (Receiver Operating Characteristic) curve. |
| area_under_precision_recall | float | Area under PR (Precision-Recall) curve. |

When these evaluation results are obtained through the SAMPO API, they are loaded as a pandas.DataFrame whose columns are the evaluation indices.

See also

Obtaining process results via ProcessResultLoader

External Format

When convert_process is executed, the evaluation results are saved as a CSV file with the evaluation indices as the header of the CSV.

This file describes the evaluation of the prediction result of the component.

true_positive,false_positive,true_negative,false_negative,accuracy,classification_error,precision,recall,specificity,false_positive_rate,false_negative_rate,f_measure,auc,area_under_precision_recall
6,3,14,7,6.666667e-01,3.333333e-01,6.666667e-01,4.615385e-01,8.235294e-01,1.764706e-01,5.384615e-01,5.454545e-01,6.696833e-01,5.715832e-01
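As a worked check, the count-based indices in the sample row above can be recomputed from its four confusion-matrix counts (TP = 6, FP = 3, TN = 14, FN = 7) using the formulas in the evaluation index table. This is a minimal sketch: the function name evaluation_indices is hypothetical, zero denominators are not handled, and auc and area_under_precision_recall are omitted because they require the per-sample scores rather than the counts.

```python
def evaluation_indices(tp, fp, tn, fn):
    """Compute the count-based evaluation indices from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "classification_error": (fp + fn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (tn + fp),
        "false_negative_rate": fn / (tp + fn),
        "f_measure": 2 * precision * recall / (precision + recall),
    }


# Counts taken from the sample CSV row above.
indices = evaluation_indices(tp=6, fp=3, tn=14, fn=7)

# Format the values the same way as the CSV file (6 decimal places, exponent form).
formatted = {name: format(value, ".6e") for name, value in indices.items()}
```

Formatted this way, the values agree with the sample row, for example 6.666667e-01 for accuracy and 5.454545e-01 for f_measure.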

Details

If a data set has samples with missing or +/-Inf values, this component ignores those samples.
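To preview which samples would be affected by this rule, a data set can be filtered the same way before it is passed to the component. The sketch below only illustrates the rule and is not the component's implementation. It assumes that a missing value is represented as None or NaN; the helper name and the sample rows are made up for illustration.

```python
import math


def has_only_finite_values(sample):
    """Return True if no value is missing (None or NaN) or +/-Inf."""
    return all(
        value is not None and math.isfinite(value)
        for value in sample.values())


# Hypothetical samples: only the first one has finite values for all features.
samples = [
    {"sepal_length_in_cm": 4.9, "sepal_width_in_cm": 2.5},
    {"sepal_length_in_cm": float("nan"), "sepal_width_in_cm": 2.8},
    {"sepal_length_in_cm": 7.2, "sepal_width_in_cm": float("inf")},
    {"sepal_length_in_cm": 6.2, "sepal_width_in_cm": None},
]

kept = [s for s in samples if has_only_finite_values(s)]
ignored_count = len(samples) - len(kept)
```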