FABHMEBernGateLinearRg Component Specification

Overview

The FABHMEBernGateLinearRg component is a linear regression component that uses the FAB/HME algorithm. It learns a tree-structured model in which each sample is assigned to a component according to Bernoulli gating functions.

Note

The FAB engine uses the word ‘component’ with a different meaning from that of SAMPO. Each component in FAB/HME is a prediction formula, and each data sample is assigned to a specific component for prediction.

Example:

  • SPD:

    # fabhmerg.spd
    dl1 -> std1 -> fab1
    
    ---
    
    components:
        dl1:
            component: DataLoader
        std1:
            component: StandardizeFDComponent
            features: scale == 'real' or scale == 'integer'
        fab1:
            component: FABHMEBernGateLinearRgComponent
            features: name != 'Concrete_compressive_strength_MPa'
            standardize_target: True
            target: name == 'Concrete_compressive_strength_MPa'
            tree_depth: 5
            shrink_threshold: 1.0%
    
    global_settings:
        keep_attributes:
            - Concrete_compressive_strength_MPa
        feature_exclude:
            - Concrete_compressive_strength_MPa
    
  • Input of the component:

    _sid  std1_Superplasticizer_kg_in_a_m3_mixture  std1_Coarse_Aggregate_kg_in_a_m3_mixture  Concrete_compressive_strength_MPa
    0     0.729738484                               0.705292074                               9
    1     1.413868312                               1.434904563                               9
    2     -1.387806224                              -1.321409287                              4
    8     -0.019546567                              0.016213611                               8
    9     -0.736254006                              -0.835000961                              6


  • Output of the component:

    _sid  fab1_actual  fab1_std_actual  fab1_predict  fab1_std_predict  fab1_assigned_comp_id
    0     9            -4.686873e-01    1.193986e+01  3.613565e-01      2
    1     9            -4.686873e-01    1.487464e+01  1.189968e+00      2
    2     4            -1.880396e+00    5.744445e+00  -1.387866e+00     0
    8     8            -7.510290e-01    9.614527e+00  -2.951807e-01     0
    9     6            -1.315712e+00    7.587341e+00  -8.675398e-01     0

    _sid  fab1_predict_c0  fab1_std_predict_c0  fab1_predict_c1  fab1_std_predict_c1  fab1_predict_c2  fab1_std_predict_c2
    0     1.173386e+01     3.031946e-01         1.285257e+01     6.190547e-01         1.193986e+01     3.613565e-01
    1     1.366890e+01     8.495373e-01         1.503635e+01     1.235628e+00         1.487464e+01     1.189968e+00
    2     5.744445e+00     -1.387866e+00        6.786511e+00     -1.093648e+00        3.787681e+00     -1.940342e+00
    8     9.614527e+00     -2.951807e-01        1.079011e+01     3.673590e-02         9.168116e+00     -4.212211e-01
    9     7.587341e+00     -8.675398e-01        8.242365e+00     -6.825991e-01        5.744203e+00     -1.387935e+00

This component has component-specific external formats for model and prediction result evaluation.

See also

Component-common external format files in convert_process


Parameters

This component has the following component-specific parameters.

SPD

The following parameters are for the “components” section of SPD.

Parameter Name

Type

Domain

Default Value

Description

standardize_target

bool

True / False

False

If this parameter is True, the target attribute is standardized.

max_fab_iterations

int

[1, inf)

100

Maximum number of FAB-iterations.

start_from_mstep [2] [3]

bool

True / False

False

If True, the first iteration starts with M-step; otherwise, E-step.

num_acceleration_steps

int

[0, inf)

0

The number of steps of acceleration algorithm for each FAB-iteration. If 0, the acceleration algorithm is disabled.

repeat_until_convergence

bool

True / False

False

If False, the FAB-iterations and the post-processing are executed only once, even if the FAB-iterations stopped because max_fab_iterations was reached rather than because the convergence condition was met.

projection_estep

bool

True / False

False

Whether the projection E-step algorithm is enabled.

shrink_threshold

float or str

[1, inf) or (0%, 100%)

1.0

Threshold value for shrinkage. If a percentage value (e.g. '1.0%') is specified, shrinkage is executed according to relative value, \(N_{\rm scaled\_sample} \times t_{\rm shrink}\) where \(t_{\rm shrink}\) is the threshold value and \(N_{\rm scaled\_sample}\) is the number of scaled expected samples.

fab_stop_threshold

float or str

(0, inf) or (0%, inf%)

0.001

Threshold value for the FAB-iterations: if the increase in the FIC value is less than the threshold, the FAB-iterations are considered to have converged. If a percentage value (e.g. '1.0%') is specified, the convergence check uses the relative value \((FIC^{(t)} - FIC^{(t-1)}) / | FIC^{(t-1)} |\).

gate_features

str

Query format

all()

Features which are applied to gate parameter optimizations. If not specified, all features are used.

comp_features

str

Query format

all()

Features which are applied to component parameter optimizations. If not specified, all features are used. If empty, the model is learned as a decision tree.

comp_mandatory_features

str

Query format

See Description

Features to which non-L0-regularization constraints are applied. The specified features are always relevant for all components. If not specified, no features are exempt from L0-regularization, which implies that all relevant features are selected by the FoBa algorithm.

comp_positive_features

str

Query format

See Description

Features whose weight values for all components are constrained to positive values. If not specified, all features are optimized with no constraints.

comp_negative_features

str

Query format

See Description

Features whose weight values for all components are constrained to negative values. If not specified, all features are optimized with no constraints.

tree_depth [2] [3]

int

[0, inf)

5

Initial depth of the gate-tree structure of latent variable prior. The initial number of components is \(2^d\) where \(d\) is tree depth. If 0, the optimization with only one component will be executed.

comp_weights_min_scale [2] [3]

float

(-inf, inf)

-0.5

Scale value for the initialization of weight values of components.

comp_weights_max_scale [2] [3]

float

(-inf, inf)

0.5

Scale value for the initialization of weight values of components.

comp_bias_min_scale [2] [3]

float

(-inf, inf)

0.25

Scale value for the initialization of bias values of components.

comp_bias_max_scale [2] [3]

float

(-inf, inf)

0.75

Scale value for the initialization of bias values of components.

comp_variance_min_scale [2] [3]

float

(0, inf)

0.1

Scale value for the initialization of variance values of components.

comp_variance_max_scale [2] [3]

float

(0, inf)

0.25

Scale value for the initialization of variance values of components.

gate_max_bins

int

[1, inf)

See Description

Maximum number of bins for each feature, used for gate parameter optimization. If not specified, all unique sample values of each feature are used; otherwise, the equal-width binning algorithm is adopted.

comp_foba_skip

str

{‘power_of_two’, ‘quarter_square’, ‘none’}

‘power_of_two’

The type of rule that decides at which FAB-iteration steps the FoBa algorithm is skipped. If ‘none’, FoBa is executed at every FAB-iteration step. If ‘power_of_two’, FoBa is skipped at step \(t\) when \({\rm log}_{2}t \ne {\rm ceil}({\rm log}_{2}t)\); if ‘quarter_square’, it is skipped when \(t \bmod {\rm ceil}(\sqrt{t}) \ne 0\). Here \(t\) is the FAB-iteration step index, starting from 1.

comp_foba_skip_max_interval

int

[2, inf)

25

The maximum interval for the FoBa algorithm skipping. If comp_foba_skip is ‘none’, this value is ignored.

comp_two_stage_opt

bool

True / False

False

Whether the two-stage optimization is enabled. If True, the first stage optimizes the parameters of the user-specified mandatory features (comp_mandatory_features), and the second stage optimizes the parameters of only the relevant non-mandatory features against the residual of the first stage.

comp_backward_step

bool

True / False

False

Whether the backward-steps of FoBa algorithm are enabled. In the post-process, backward-steps are carried out regardless of this argument value.

comp_l2_regularize

float

[0, inf)

0.0

L2-regularization hyper-parameter for component parameter optimization. The larger the specified value, the stronger the regularization effect is. If 0.0, L2-regularization is disabled.

with_comp_scaled_l0_regularize

bool

True / False

True

Whether to use scaled L0-regularization, which relies on a tighter lower bound of FIC for component parameter optimization; the approximation of det(F) is refined, where F is a Fisher matrix.

max_comp_relevant_features

int

[1, inf)

100

Maximum number of the relevant features for each component.

max_comp_foba_iterations

int

[1, inf)

100

Maximum number of the FoBa-iterations for each component.

num_threads_gates

int

[1, inf)

1

Maximum number of OpenMP threads for gate parameter optimization, across which the tasks for all gates are divided.

num_threads_gate_features

int

[1, inf)

1

Maximum number of OpenMP threads for gate parameter optimization, across which the tasks for all features are divided.

num_threads_comps

int

[1, inf)

1

Maximum number of OpenMP threads of component parameter optimization.

[2] This parameter is ignored in posterior hot-start.

[3] This parameter is ignored in model hot-start.
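
The comp_foba_skip and fab_stop_threshold rules in the table above can be sketched in a few lines of Python. This is only an illustration of the formulas as written, not the FAB engine's code: the function names are hypothetical, the effect of comp_foba_skip_max_interval is not modeled because its exact rule is not stated here, and reading '1.0%' as the fraction 0.01 is an assumption.

```python
import math


def foba_runs_at(t, mode="power_of_two"):
    """Return True if FoBa is NOT skipped at FAB-iteration step t (1-based)."""
    if t < 1:
        raise ValueError("the step index t starts from 1")
    if mode == "none":
        return True
    if mode == "power_of_two":
        # log2(t) == ceil(log2(t)) holds exactly when t is a power of two.
        return t & (t - 1) == 0
    if mode == "quarter_square":
        root = math.isqrt(t)
        if root * root < t:
            root += 1  # ceil(sqrt(t)) in exact integer arithmetic
        return t % root == 0
    raise ValueError(f"unknown mode: {mode!r}")


def parse_threshold(value):
    """Return (number, is_relative). Assumption: '1.0%' means the fraction 0.01."""
    if isinstance(value, str):
        return float(value.rstrip("%")) / 100.0, True
    return float(value), False


def fab_converged(fic_prev, fic_curr, fab_stop_threshold=0.001):
    """Convergence check on the increase of FIC, absolute or relative."""
    threshold, is_relative = parse_threshold(fab_stop_threshold)
    increase = fic_curr - fic_prev
    if is_relative:
        increase /= abs(fic_prev)
    return increase < threshold


# Steps (out of the first 20) at which FoBa would run under each rule.
print([t for t in range(1, 21) if foba_runs_at(t, "power_of_two")])
print([t for t in range(1, 21) if foba_runs_at(t, "quarter_square")])
```

Under these rules FoBa runs less and less often as the iterations proceed, which is presumably why comp_foba_skip_max_interval exists to cap the gap between two runs.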

SRC

The following parameter is for the “hotstart” section of SRC.

Parameter Name

Type

Domain

Default Value

Description

type

str

{‘posterior’, ‘mh_refit_comp’, ‘mh_opt_comp’, ‘mh_refit_gate_and_refit_comp’, ‘mh_refit_gate_and_opt_comp’, ‘mh_opt_gate_and_opt_comp’}

The hot-start type. If ‘posterior’, FAB learns with posterior hot-start, which uses an initial model whose tree structure is generated from the base model and the data; the gate and component parameters are initialized randomly. ‘mh_XXX’ means that FAB learns with model hot-start, which uses the base model as the initial model. ‘refit_{gate, comp}’ means refitting the gate functions or prediction formulas with the current data. ‘opt_{gate, comp}’ means optimizing (feature selection and fitting) the gate functions or prediction formulas with the current data.


Utilizable Sample Metadata

Warning

_fabhme_assigned_comp_id is deprecated. Use the hotstart section of SRC instead of the _fabhme_assigned_comp_id data column.

This component can use the _fabhme_assigned_comp_id attribute of the sample metadata for posterior hot-start. When the _fabhme_assigned_comp_id attribute is specified in the input data, this component starts the FAB/HME algorithm with the _fabhme_assigned_comp_id attribute values as its initial posterior.

To create the attribute _fabhme_assigned_comp_id, see the specification of the command sampo_ps_fabhme export_assigned_comp_id.


Output Attributes

This component generates the following attributes.

Attribute Name

Scale

Description

<component_id>_actual

INTEGER/REAL (depends on the target attribute)

Values of target attribute.

<component_id>_std_actual

REAL

Standardized values of actual.

<component_id>_predict

REAL

Predicted values.

<component_id>_std_predict

REAL

Standardized values of predict.

<component_id>_assigned_comp_id

INTEGER

Component IDs assigned by gating functions.

<component_id>_predict_c<hme_comp_id>

REAL

Predicted values for the prediction formula of component id, <hme_comp_id>.

<component_id>_std_predict_c<hme_comp_id>

REAL

Standardized predicted values for the prediction formula of component id, <hme_comp_id>.

These attributes are part of the component output data and can be loaded through the SAMPO API.

See also

Obtaining process results via ProcessResultLoader.

When convert_process is executed, the component output data will be saved in <component_id>_predict_result.csv.

This file describes the prediction result of the component.

_sid,fab1_actual,fab1_std_actual,fab1_predict,fab1_std_predict,fab1_assigned_comp_id,fab1_predict_c0,fab1_std_predict_c0,fab1_predict_c1,fab1_std_predict_c1,fab1_predict_c2,fab1_std_predict_c2
0,9,-4.686873e-01,1.193986e+01,3.613565e-01,2,1.173386e+01,3.031946e-01,1.285257e+01,6.190547e-01,1.193986e+01,3.613565e-01
1,9,-4.686873e-01,1.487464e+01,1.189968e+00,2,1.366890e+01,8.495373e-01,1.503635e+01,1.235628e+00,1.487464e+01,1.189968e+00
2,4,-1.880396e+00,5.744445e+00,-1.387866e+00,0,5.744445e+00,-1.387866e+00,6.786511e+00,-1.093648e+00,3.787681e+00,-1.940342e+00
...
8,8,-7.510290e-01,9.614527e+00,-2.951807e-01,0,9.614527e+00,-2.951807e-01,1.079011e+01,3.673590e-02,9.168116e+00,-4.212211e-01
9,6,-1.315712e+00,7.587341e+00,-8.675398e-01,0,7.587341e+00,-8.675398e-01,8.242365e+00,-6.825991e-01,5.744203e+00,-1.387935e+00
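
In the rows shown above, fab1_predict always equals the fab1_predict_c<hme_comp_id> column selected by fab1_assigned_comp_id. The following sketch checks that relationship with Python's standard csv module. It embeds only the rows printed in this section (the elided "..." rows are left out), and the helper name selected_prediction is hypothetical; that the relationship holds for every sample is an observation about this example, not a guarantee stated by the specification.

```python
import csv
import io

# The rows printed above, without the elided "..." line.
CSV_TEXT = """\
_sid,fab1_actual,fab1_std_actual,fab1_predict,fab1_std_predict,fab1_assigned_comp_id,fab1_predict_c0,fab1_std_predict_c0,fab1_predict_c1,fab1_std_predict_c1,fab1_predict_c2,fab1_std_predict_c2
0,9,-4.686873e-01,1.193986e+01,3.613565e-01,2,1.173386e+01,3.031946e-01,1.285257e+01,6.190547e-01,1.193986e+01,3.613565e-01
1,9,-4.686873e-01,1.487464e+01,1.189968e+00,2,1.366890e+01,8.495373e-01,1.503635e+01,1.235628e+00,1.487464e+01,1.189968e+00
2,4,-1.880396e+00,5.744445e+00,-1.387866e+00,0,5.744445e+00,-1.387866e+00,6.786511e+00,-1.093648e+00,3.787681e+00,-1.940342e+00
8,8,-7.510290e-01,9.614527e+00,-2.951807e-01,0,9.614527e+00,-2.951807e-01,1.079011e+01,3.673590e-02,9.168116e+00,-4.212211e-01
9,6,-1.315712e+00,7.587341e+00,-8.675398e-01,0,7.587341e+00,-8.675398e-01,8.242365e+00,-6.825991e-01,5.744203e+00,-1.387935e+00
"""


def selected_prediction(row, component_id="fab1", standardized=False):
    """Pick the per-formula prediction named by <component_id>_assigned_comp_id."""
    comp = int(row[f"{component_id}_assigned_comp_id"])
    prefix = "std_predict" if standardized else "predict"
    return float(row[f"{component_id}_{prefix}_c{comp}"])


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
for row in rows:
    matches = float(row["fab1_predict"]) == selected_prediction(row)
    print(row["_sid"], row["fab1_assigned_comp_id"], matches)
```

To run the same check on a real <component_id>_predict_result.csv, replace io.StringIO(CSV_TEXT) with an open file handle.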

Attribute Metadata

The metadata of the output attributes is created with the following rules.

Context Rule

Attribute Name

Context Name

Description

All the output attributes of this component

field_path

List of the superordinate concepts of each output attribute based on the following hierarchical structure of the output attributes:

root
├── fabhmerg
│   ├── assigned_comp_id
│   └── component
│       ├── 0
│       │   ├── predict
│       │   └── std_predict
│       ├── 1
│       │   ├── predict
│       │   └── std_predict
│        .
│        .
│        .
│
└── regression
    ├── actual
    ├── std_actual
    ├── predict
    └── std_predict

<component_id>_std_actual, <component_id>_std_predict, <component_id>_std_predict_c<hme_comp_id>

mean

Mean of the target values for learning.

<component_id>_std_actual, <component_id>_std_predict, <component_id>_std_predict_c<hme_comp_id>

std

Standard deviation of the target values for learning.

<component_id>_assigned_comp_id

active_comp_ids

List of component IDs corresponding to each prediction formula.

Derivation Rule

Attribute Name

Derived From

<component_id>_actual, <component_id>_std_actual

Derived from the target attribute.

<component_id>_predict, <component_id>_std_predict

Derived from the attributes which have non-zero coefficients in any prediction formula.

<component_id>_assigned_comp_id

Derived from the attributes used in the gating functions.

<component_id>_predict_c<hme_comp_id>, <component_id>_std_predict_c<hme_comp_id>

Derived from the attributes which have non-zero coefficients in the prediction formula of component id, <hme_comp_id>.

Example

{
    "nodes": [
        {"aid": "_sid", "name": "_sid", ... },

        ...

        {"aid": "dl1[0]", "name": "Superplasticizer_kg_in_a_m3_mixture", ... },
        {"aid": "dl1[1]", "name": "Coarse_Aggregate_kg_in_a_m3_mixture", ... },

        ...

        {"aid": "std1[0]", "name": "std1_Superplasticizer_kg_in_a_m3_mixture", "scale": "real",
         "is_excluded": false, "cid": "std1", "cindex": 0, "values": null, "is_kept": false,
         "context": {
             "std": 4.8989794855663543e-01,
             "mean": 1.0000000000000001e-01
         }
        },
        {"aid": "std1[1]", "name": "std1_Coarse_Aggregate_kg_in_a_m3_mixture", "scale": "real",
         "is_excluded": false, "cid": "std1", "cindex": 1, "values": null, "is_kept": false,
        "context": {
            "std": 4.0463422692599771e+01,
            "mean": 9.5183199999999999e+02
         }
        },

        ...

        {"aid": "fab1[0]", "name": "fab1_actual", "scale": "real", "is_excluded": false,
         "cid": "fab1", "cindex": 0, "values": null, "is_kept": false,
         "context": {"field_path": ["regression", "actual"]}
        },
        {"aid": "fab1[1]", "name": "fab1_std_actual", "scale": "real", "is_excluded": false,
         "cid": "fab1", "cindex": 1, "values": null, "is_kept": false,
         "context": {"std": null, "field_path": ["regression", "std_actual"], "mean": null}
        },
        {"aid": "fab1[2]", "name": "fab1_predict", "scale": "real", "is_excluded": false,
         "cid": "fab1", "cindex": 2, "values": null, "is_kept": false,
         "context": {"field_path": ["regression", "predict"]}
        },
        {"aid": "fab1[3]", "name": "fab1_std_predict", "scale": "real", "is_excluded": false,
         "cid": "fab1", "cindex": 3, "values": null, "is_kept": false,
         "context": {"std": null, "field_path": ["regression", "std_predict"], "mean": null}
        },
        {"aid": "fab1[4]", "name": "fab1_assigned_comp_id", "scale": "integer", "is_excluded": false,
         "cid": "fab1", "cindex": 4, "values": null, "is_kept": false,
         "context": {"active_comp_ids": [0, 5, 9, 10, 13, 20, 24, 27], "field_path": ["fabhmerg", "assigned_comp_id"]}
        },
        {"aid": "fab1[5]", "name": "fab1_predict_c0", "scale": "real", "is_excluded": false,
         "cid": "fab1", "cindex": 5, "values": null, "is_kept": false,
         "context": {"field_path": ["fabhmerg", "component", 0, "predict"]}
        },
        {"aid": "fab1[6]", "name": "fab1_std_predict_c0", "scale": "real", "is_excluded": false,
         "cid": "fab1", "cindex": 6, "values": null, "is_kept": false,
         "context": {"std": null, "field_path": ["fabhmerg", "component", 0, "std_predict"], "mean": null}
        },

        ...

    ],
    "links": [
        {"source": "std1[0]", "target": "fab1[3]"},
        {"source": "std1[0]", "target": "fab1[2]"},
        {"source": "std1[0]", "target": "fab1[19]"},
        {"source": "std1[0]", "target": "fab1[20]"},
        {"source": "dl1[4]", "target": "fab1[1]"},
        {"source": "dl1[4]", "target": "fab1[0]"},
        {"source": "dl1[1]", "target": "std1[1]"},
        {"source": "std1[2]", "target": "fab1[11]"},
        {"source": "std1[2]", "target": "fab1[16]"},
        {"source": "std1[2]", "target": "fab1[4]"},
        {"source": "std1[2]", "target": "fab1[6]"},
        {"source": "std1[2]", "target": "fab1[2]"},
        {"source": "std1[2]", "target": "fab1[3]"},
        {"source": "std1[2]", "target": "fab1[5]"},
        {"source": "std1[2]", "target": "fab1[12]"},
        {"source": "std1[2]", "target": "fab1[15]"},
        {"source": "std1[1]", "target": "fab1[2]"},
        {"source": "std1[1]", "target": "fab1[4]"},
        {"source": "std1[1]", "target": "fab1[3]"},
        {"source": "std1[3]", "target": "fab1[10]"},
        {"source": "std1[3]", "target": "fab1[9]"},
        {"source": "std1[3]", "target": "fab1[16]"},
        {"source": "std1[3]", "target": "fab1[8]"},
        {"source": "std1[3]", "target": "fab1[18]"},
        {"source": "std1[3]", "target": "fab1[4]"},
        {"source": "std1[3]", "target": "fab1[6]"},
        {"source": "std1[3]", "target": "fab1[13]"},
        {"source": "std1[3]", "target": "fab1[7]"},
        {"source": "std1[3]", "target": "fab1[2]"},
        {"source": "std1[3]", "target": "fab1[3]"},
        {"source": "std1[3]", "target": "fab1[5]"},
        {"source": "std1[3]", "target": "fab1[17]"},
        {"source": "std1[3]", "target": "fab1[14]"},
        {"source": "std1[3]", "target": "fab1[15]"},
        {"source": "dl1[3]", "target": "std1[3]"},
        {"source": "dl1[0]", "target": "std1[0]"},
        {"source": "dl1[2]", "target": "std1[2]"}
    ]
}

See also

Attribute metadata file format in Attribute Metadata File Specification


Model

The model of this component can be described by the following parameters.

Model Parameter

Type

Domain

Description

fic

float

(-inf, inf)

Factorized Information Criterion. The asymptotic approximation value used by FAB/HME.

num_initial_comps

int

[0, inf)

The initial number of components before iterations.

num_active_comps

int

[0, inf)

The terminal number of active components after iterations.

standardize_mean

float

(-inf, inf)

Mean value used for standardizing the target attribute during learning.

standardize_std

float

(-inf, inf)

Standard deviation value used for standardizing the target attribute during learning.

gate_tree

dict

See Description

Dictionary form of the gating tree structure.

prediction_formulas

pandas.DataFrame

See Description

Component weights and bias for each prediction formula.
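
As a rough illustration of how standardize_mean and standardize_std relate the plain and standardized output attributes, the sketch below assumes the usual definition, (value - mean) / std. The specification does not spell this formula out; the assumption is merely consistent with the fab1_actual / fab1_std_actual pairs shown in the output example earlier in this document. The function names are hypothetical.

```python
def standardize(value, mean, std):
    """Assumed relation between <component_id>_actual and <component_id>_std_actual."""
    return (value - mean) / std


def destandardize(z, mean, std):
    """Inverse of standardize()."""
    return z * std + mean


# (fab1_actual, fab1_std_actual) pairs copied from the output example.
pairs = [(9, -4.686873e-01), (4, -1.880396e+00), (8, -7.510290e-01), (6, -1.315712e+00)]

# Two pairs with different actual values are enough to recover mean and std.
(a0, z0), (a1, z1) = pairs[0], pairs[1]
std = (a0 - a1) / (z0 - z1)
mean = a0 - z0 * std
print(mean, std)

# The remaining pairs should be reproduced up to the printed precision.
for actual, z in pairs[2:]:
    print(actual, z, standardize(actual, mean, std))
```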

The gate_tree dictionary keys are described below:

Gate Tree Dictionary Key

Type

Domain

Description

gate_type

str

‘bern’

The type of gate.

hard_gate

bool

true / false

Whether the gate is hard_gate or not.

nodes

list of dict

See Description

List of node dictionaries.

edges

list of dict

See Description

List of edge dictionaries.

The keys of each node dictionary in nodes are described below:

Node Dictionary Key

Type

Domain

Description

node_id

int

[0, inf)

The node ID.

node_type

str

{‘gate’, ‘component’}

The node type.

gate_func

dict

See Description

The gate_func dictionary contains the gate function parameters for the Bernoulli gate. Specifiable if node_type is “gate”.

comp_id

int

[0, inf)

The component ID. Specifiable if node_type is “component”.

The keys of each edge dictionary in edges are described below:

Edge Dictionary Key

Type

Domain

Description

source

int

[0, inf)

The node_id of the source node.

target

int

[0, inf)

The node_id of the target node.

is_left

bool

true / false

Whether the target node is the left-child of the source.

The keys of the gate_func dictionary are described below:

Gate Function Dictionary Key

Type

Domain

Description

attr_name

str

See Description

The attribute name.

aid

str

See Description

The attribute ID.

threshold

float

(-inf, inf)

Threshold value of the Bernoulli-gating function.

prob_left_smaller_than_threshold

float

[0.0, 1.0]

Probability that the value of left-child node is smaller than the threshold.

When the model is loaded in the SAMPO API, the model parameters will be output as a single dictionary.

See also

Obtaining process results via ProcessResultLoader

{'fic': -113.32992684082878,
 'num_initial_comps': 8,
 'num_active_comps': 7,
 'standardize_mean': 1.1303215277777777e+04,
 'standardize_std': 5.7343353765366674e+03,
 'gate_tree':
     {'gate_type': 'bern',
      'hard_gate': True,
      'nodes': [
          {'node_type': 'gate',
           'node_id': 0,
           'gate_func':
               {'threshold': 1.0926798109856788,
                'aid': 'std1[6]',
                'attr_name': 'std1_Fine_Aggregate_kg_in_a_m3_mixture',
                'prob_left_smaller_than_threshold': 1.0}},
          {'node_type': 'gate',
           'node_id': 1,
           'gate_func':
               {'threshold': -0.3327413962726527,
                'aid': 'std1[6]',
                'attr_name': 'std1_Fine_Aggregate_kg_in_a_m3_mixture',
                'prob_left_smaller_than_threshold': 0.0}},
          {'node_type': 'gate',
           'node_id': 2,
           'gate_func':
               {'threshold': -0.9288885581274996,
                'aid': 'std1[0]',
                'attr_name': 'std1_Cement_kg_in_a_m3_mixture',
                'prob_left_smaller_than_threshold': 1.0}},
          {'node_type': 'gate',
           'node_id': 5,
           'gate_func':
               {'threshold': -1.8014738709189047,
                'aid': 'std1[0]',
                'attr_name': 'std1_Cement_kg_in_a_m3_mixture',
                'prob_left_smaller_than_threshold': 0.0}},
          {'node_type': 'gate',
           'node_id': 8,
           'gate_func':
               {'threshold': -0.995668911902564,
                'aid': 'std1[5]',
                'attr_name': 'std1_Coarse_Aggregate_kg_in_a_m3_mixture',
                'prob_left_smaller_than_threshold': 1.0}},
          {'node_type': 'gate',
           'node_id': 10,
           'gate_func':
               {'threshold': 0.03525594329899589,
                'aid': 'std1[1]',
                'attr_name': 'std1_Blast_Furnace_Slag_kg_in_a_m3_mixture',
                'prob_left_smaller_than_threshold': 1.0}},
          {'comp_id': 0, 'node_type': 'component', 'node_id': 3},
          {'comp_id': 1, 'node_type': 'component', 'node_id': 4},
          {'comp_id': 2, 'node_type': 'component', 'node_id': 6},
          {'comp_id': 3, 'node_type': 'component', 'node_id': 7},
          {'comp_id': 5, 'node_type': 'component', 'node_id': 9},
          {'comp_id': 6, 'node_type': 'component', 'node_id': 11},
          {'comp_id': 7, 'node_type': 'component', 'node_id': 12}],
      'edges': [
          {'source': 10, 'target': 11, 'is_left': True},
          {'source': 10, 'target': 12, 'is_left': False},
          {'source': 1, 'target': 2, 'is_left': True},
          {'source': 1, 'target': 5, 'is_left': False},
          {'source': 0, 'target': 1, 'is_left': True},
          {'source': 0, 'target': 8, 'is_left': False},
          {'source': 2, 'target': 3, 'is_left': True},
          {'source': 2, 'target': 4, 'is_left': False},
          {'source': 5, 'target': 7, 'is_left': False},
          {'source': 5, 'target': 6, 'is_left': True},
          {'source': 8, 'target': 9, 'is_left': True},
          {'source': 8, 'target': 10, 'is_left': False}]},
 'prediction_formulas':
                                                  prediction_formula_0  prediction_formula_1  prediction_formula_2  prediction_formula_3  prediction_formula_5  prediction_formula_6  prediction_formula_7
     attr_name
     std1_Cement_kg_in_a_m3_mixture                           0.000000             -0.842738              0.890938              0.000000              0.000000              0.000000              0.000000
     std1_Blast_Furnace_Slag_kg_in_a_m3_mixture               0.000000             -0.925857              0.000000              0.000000              0.000000              0.000000              0.000000
     std1_Fly_Ash_kg_in_a_m3_mixture                          0.000000              0.000000              0.000000              0.000000              0.000000              0.000000              0.000000
     std1_Water_kg_in_a_m3_mixture                            0.000000              0.000000              0.000000              0.000000              0.000000              0.000000              0.000000
     std1_Superplasticizer_kg_in_a_m3_mixture                 0.000000              0.000000              0.000000              0.000000              0.000000              0.000000              0.000000
     std1_Coarse_Aggregate_kg_in_a_m3_mixture                 0.000000              0.000000              0.000000              0.000000              0.000000              0.000000              0.000000
     std1_Fine_Aggregate_kg_in_a_m3_mixture                   0.000000              0.000000              0.000000              0.000000              0.000000              0.000000              0.000000
     bias                                                     0.141556              0.004476             -0.729938             -1.092049              0.045834              1.169998              0.889939
     variance                                                 0.413225              0.004247              0.505680              0.000000              0.061580              0.019360              0.000000}
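
To show how the nodes and edges of gate_tree fit together, here is a minimal sketch that routes one sample from the root gate down to a component. Several things are assumptions rather than documented behavior: the tree below is a small hypothetical one that only reuses the documented keys, the root is taken to be the node that never appears as an edge target, the gate is treated as hard, and prob_left_smaller_than_threshold is read as the probability of going left when the attribute value is smaller than the threshold (and one minus that value otherwise). How a value exactly equal to the threshold is handled is not specified; the sketch treats it as "not smaller".

```python
# Hypothetical tree using the documented keys (values are made up).
GATE_TREE = {
    "gate_type": "bern",
    "hard_gate": True,
    "nodes": [
        {"node_id": 0, "node_type": "gate",
         "gate_func": {"aid": "std1[0]", "attr_name": "x1", "threshold": 1.0,
                       "prob_left_smaller_than_threshold": 1.0}},
        {"node_id": 1, "node_type": "component", "comp_id": 0},
        {"node_id": 2, "node_type": "gate",
         "gate_func": {"aid": "std1[1]", "attr_name": "x2", "threshold": 0.0,
                       "prob_left_smaller_than_threshold": 0.0}},
        {"node_id": 3, "node_type": "component", "comp_id": 1},
        {"node_id": 4, "node_type": "component", "comp_id": 2},
    ],
    "edges": [
        {"source": 0, "target": 1, "is_left": True},
        {"source": 0, "target": 2, "is_left": False},
        {"source": 2, "target": 3, "is_left": True},
        {"source": 2, "target": 4, "is_left": False},
    ],
}


def assign_component(gate_tree, sample):
    """Follow the gates from the root and return the comp_id that is reached."""
    nodes = {node["node_id"]: node for node in gate_tree["nodes"]}
    child = {(e["source"], e["is_left"]): e["target"] for e in gate_tree["edges"]}
    targets = {e["target"] for e in gate_tree["edges"]}
    # Assumption: the root is the only node that is never an edge target.
    (root_id,) = set(nodes) - targets
    node = nodes[root_id]
    while node["node_type"] == "gate":
        func = node["gate_func"]
        p_left = func["prob_left_smaller_than_threshold"]
        if not sample[func["attr_name"]] < func["threshold"]:
            p_left = 1.0 - p_left
        # Hard gate: take the more probable branch.
        node = nodes[child[(node["node_id"], p_left >= 0.5)]]
    return node["comp_id"]


print(assign_component(GATE_TREE, {"x1": 0.5, "x2": 0.0}))
```

The same function should accept the gate_tree value of a loaded model dictionary, provided the sample is a mapping from attr_name to the (standardized) attribute value.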

External Format

When convert_process is executed, the model parameters are saved into different files and are grouped as: general information, gating function, and prediction formula.

General Information

This file describes the \(FIC\) value after learning the model, the initial number of components, and the final number of active components.

fic,num_initial_comps,num_active_comps
-1.294308e+02,8,3

Gate Tree

This file describes the structure and parameters of the gate-tree of the model.

{
    "gate_tree": {
        "gate_type": "bern",
        "hard_gate": true,
        "nodes": [
            {
                "node_id": 1,
                "node_type": "gate",
                "gate_func": {
                    "aid": "dl1[1]",
                    "attr_name": "sepal_width_in_cm",
                    "threshold": 2.5499999999999998e+00,
                    "prob_left_smaller_than_threshold": 1.0000000000000000e+00
                }
            },
            {
                "node_id": 0,
                "node_type": "gate",
                "gate_func": {
                    "aid": "dl1[1]",
                    "attr_name": "sepal_width_in_cm",
                    "threshold": 3.7500000000000000e+00,
                    "prob_left_smaller_than_threshold": 1.0000000000000000e+00
                }
            },
            ...
            {
                "node_id": 2,
                "node_type": "component",
                "comp_id": 2
            },
            {
                "node_id": 5,
                "node_type": "component",
                "comp_id": 12
            },
            ...
        ],
        "edges": [
            {
                "source": 1,
                "target": 3,
                "is_left": false
            },
            {
                "source": 1,
                "target": 2,
                "is_left": true
            },
            ...
        ]
    }
}

Prediction Formulas

This file describes parameters of prediction formulas: weights, bias and variance values.

aid,attr_name,prediction_formula_0,prediction_formula_1,prediction_formula_2,prediction_formula_3,prediction_formula_4,prediction_formula_5,prediction_formula_7
std1[0],std1_Cement_kg_in_a_m3_mixture,8.2142735978929815e-01,0.0000000000000000e+00,0.0000000000000000e+00,0.0000000000000000e+00,0.0000000000000000e+00,0.0000000000000000e+00,0.0000000000000000e+00
std1[1],std1_Blast_Furnace_Slag_kg_in_a_m3_mixture,0.0000000000000000e+00,0.0000000000000000e+00,0.0000000000000000e+00,0.0000000000000000e+00,0.0000000000000000e+00,0.0000000000000000e+00,0.0000000000000000e+00

...

,bias,-6.7186992548478641e-01,-2.2356491265481820e-02,-1.3838061097770997e+00,5.3112994502587174e-01,-1.2190946387798474e+00,1.0330395448550976e-01,1.3254015778746325e-02
,variance,1.9539664915904803e-01,4.8649969215506857e-02,4.3107489306578556e-01,8.4528051141512206e-01,7.0848813094870444e-01,1.8867837210436236e-01,2.5513538042657519e-01

Prediction Result Evaluation

The indices used in evaluating prediction results of this component are described below.

Formula

Description

\(X_{\mbox{p}}\)

Array of predicted values.

\(X_{\mbox{a}}\)

Array of actual values.

\(\mbox{mean}(X)\)

Mean of \(X\).

\(\mbox{median}(X)\)

Median value in \(X\).

\(\mbox{max}(X)\)

Maximum value in \(X\).

\([\cdot]_+\)

A function which returns the argument directly if it is greater than \(0\), otherwise returns \(0\).
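
The notation above maps directly onto Python's standard library. The sketch below computes a handful of the evaluation indices defined in this section from their formulas (root_mean_squared_error, root_median_squared_error, mean_abs_error, median_abs_error, max_abs_error and mean_upside_err_mean_obs). It is an independent illustration, not the component's implementation, and the function names are hypothetical. The sample values are the fab1_predict and fab1_actual rows from the output example earlier in this document.

```python
import math
import statistics


def positive_part(value):
    """[.]_+ : the argument itself if it is greater than 0, otherwise 0."""
    return value if value > 0 else 0.0


def evaluate(x_p, x_a):
    """Compute a few evaluation indices from predicted (x_p) and actual (x_a) values."""
    errors = [p - a for p, a in zip(x_p, x_a)]
    squared = [e * e for e in errors]
    absolute = [abs(e) for e in errors]
    upside = [positive_part(e) for e in errors]
    return {
        "root_mean_squared_error": math.sqrt(statistics.fmean(squared)),
        "root_median_squared_error": math.sqrt(statistics.median(squared)),
        "mean_abs_error": statistics.fmean(absolute),
        "median_abs_error": statistics.median(absolute),
        "max_abs_error": max(absolute),
        "mean_upside_err_mean_obs": statistics.fmean(upside) / statistics.fmean(x_a),
    }


predicted = [11.93986, 14.87464, 5.744445, 9.614527, 7.587341]
actual = [9.0, 9.0, 4.0, 8.0, 6.0]
for name, value in evaluate(predicted, actual).items():
    print(f"{name}: {value:.6f}")
```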


Evaluation Index

Type

Description

root_mean_squared_error

float

RMSE (Root Mean Square Error), which is the square root of the mean squared error as shown below:

\(\sqrt{\mbox{mean}((X_{\mbox{p}} - X_{\mbox{a}})^2)}\)

root_median_squared_error

float

RMdSE (Root Median Square Error), which is the square root of the median squared error as shown below:

\(\sqrt{\mbox{median}((X_{\mbox{p}} - X_{\mbox{a}})^2)}\)

mean_abs_error

float

Mean of absolute error as shown below:

\(\mbox{mean}(|X_{\mbox{p}} - X_{\mbox{a}}|)\)

median_abs_error

float

Median of absolute error as shown below:

\(\mbox{median}(|X_{\mbox{p}} - X_{\mbox{a}}|)\)

max_abs_error

float

Maximum value of absolute error.

\(\mbox{max}(|X_{\mbox{p}} - X_{\mbox{a}}|)\)

relative_root_mean_squared_error

float

The square root of the mean squared relative error as shown below:

\(\sqrt{\mbox{mean}((\frac{{\large X}_{\mbox{p}} {\large - X}_{\mbox{a}}}{ {\large X}_{\mbox{a}}})^2)}\)

relative_root_median_squared_error

float

The square root of the median squared relative error as shown below:

\(\sqrt{\mbox{median}((\frac{{\large X}_{\mbox{p}} {\large - X}_{\mbox{a}}}{ {\large X}_{\mbox{a}}})^2)}\)

relative_mean_abs_error

float

The mean absolute relative error as shown below:

\(\mbox{mean}(|\frac{{\large X}_{\mbox{p}} {\large - X}_{\mbox{a}}}{ {\large X}_{\mbox{a}}}|)\)

relative_median_abs_error

float

The median absolute relative error as shown below:

\(\mbox{median}(|\frac{{\large X}_{\mbox{p}} {\large - X}_{\mbox{a}}}{ {\large X}_{\mbox{a}}}|)\)

relative_max_abs_error

float

The maximum absolute relative error as shown below:

\(\mbox{max}(|\frac{{\large X}_{\mbox{p}} {\large - X}_{\mbox{a}}}{ {\large X}_{\mbox{a}}}|)\)

positive_side_root_mean_squared_error

float

root_mean_squared_error for samples that satisfy the condition, \(X_{\mbox{p}} > X_{\mbox{a}}\).

positive_side_root_median_squared_error

float

root_median_squared_error for samples that satisfy the condition, \(X_{\mbox{p}} > X_{\mbox{a}}\).

positive_side_mean_abs_error

float

mean_abs_error for samples that satisfy the condition, \(X_{\mbox{p}} > X_{\mbox{a}}\).

positive_side_median_abs_error

float

median_abs_error for samples that satisfy the condition, \(X_{\mbox{p}} > X_{\mbox{a}}\).

positive_side_max_abs_error

float

max_abs_error for samples that satisfy the condition, \(X_{\mbox{p}} > X_{\mbox{a}}\).

negative_side_root_mean_squared_error

float

root_mean_squared_error for samples that satisfy the condition, \(X_{\mbox{a}} \geq X_{\mbox{p}}\).

negative_side_root_median_squared_error

float

root_median_squared_error for samples that satisfy the condition, \(X_{\mbox{a}} \geq X_{\mbox{p}}\).

negative_side_mean_abs_error

float

mean_abs_error for samples that satisfy the condition, \(X_{\mbox{a}} \geq X_{\mbox{p}}\).

negative_side_median_abs_error

float

median_abs_error for samples that satisfy the condition, \(X_{\mbox{a}} \geq X_{\mbox{p}}\).

negative_side_max_abs_error

float

max_abs_error for samples that satisfy the condition, \(X_{\mbox{a}} \geq X_{\mbox{p}}\).

max_upside_err_mean_obs

float

Ratio of the maximum error over the samples that satisfy \(X_{\mbox{p}} > X_{\mbox{a}}\) to the mean of the actual values, as shown below:

\(\frac{\mbox{max}({\large X}_{\mbox{p}} {\large - X}_{\mbox{a}})}{\mbox{mean}({\large X}_{\mbox{a}})}\)

mean_upside_err_mean_obs

float

Ratio of the mean upside error to the mean of the actual values, where the upside error of a sample is \(X_{\mbox{p}} - X_{\mbox{a}}\) if \(X_{\mbox{p}} > X_{\mbox{a}}\) and \(0\) otherwise, as shown below:

\(\frac{\mbox{mean}([{\large X}_{\mbox{p}} {\large - X}_{\mbox{a}}]_+)}{\mbox{mean}({\large X}_{\mbox{a}})}\)

max_downside_err_mean_obs

float

Ratio of the maximum error over the samples that satisfy \(X_{\mbox{a}} \geq X_{\mbox{p}}\) to the mean of the actual values, as shown below:

\(\frac{\mbox{max}({\large X}_{\mbox{a}} {\large - X}_{\mbox{p}})}{\mbox{mean}({\large X}_{\mbox{a}})}\)

mean_downside_err_mean_obs

float

Ratio of the mean downside error to the mean of the actual values, where the downside error of a sample is \(X_{\mbox{a}} - X_{\mbox{p}}\) if \(X_{\mbox{a}} \geq X_{\mbox{p}}\) and \(0\) otherwise, as shown below:

\(\frac{\mbox{mean}([{\large X}_{\mbox{a}} - {\large X}_{\mbox{p}}]_+)}{\mbox{mean}({\large X}_{\mbox{a}})}\)

negative_pred_num

int

The number of samples that satisfy the condition \(X_{\mbox{p}} < 0\).

std_root_mean_squared_error

float

root_mean_squared_error for standardized predicted/actual values.

std_root_median_squared_error

float

root_median_squared_error for standardized predicted/actual values.

std_mean_abs_error

float

mean_abs_error for standardized predicted/actual values.

std_median_abs_error

float

median_abs_error for standardized predicted/actual values.

std_max_abs_error

float

max_abs_error for standardized predicted/actual values.

std_relative_root_mean_squared_error

float

relative_root_mean_squared_error for standardized predicted/actual values.

std_relative_root_median_squared_error

float

relative_root_median_squared_error for standardized predicted/actual values.

std_relative_mean_abs_error

float

relative_mean_abs_error for standardized predicted/actual values.

std_relative_median_abs_error

float

relative_median_abs_error for standardized predicted/actual values.

std_relative_max_abs_error

float

relative_max_abs_error for standardized predicted/actual values.

std_positive_side_root_mean_squared_error

float

positive_side_root_mean_squared_error for standardized predicted/actual values.

std_positive_side_root_median_squared_error

float

positive_side_root_median_squared_error for standardized predicted/actual values.

std_positive_side_mean_abs_error

float

positive_side_mean_abs_error for standardized predicted/actual values.

std_positive_side_median_abs_error

float

positive_side_median_abs_error for standardized predicted/actual values.

std_positive_side_max_abs_error

float

positive_side_max_abs_error for standardized predicted/actual values.

std_negative_side_root_mean_squared_error

float

negative_side_root_mean_squared_error for standardized predicted/actual values.

std_negative_side_root_median_squared_error

float

negative_side_root_median_squared_error for standardized predicted/actual values.

std_negative_side_mean_abs_error

float

negative_side_mean_abs_error for standardized predicted/actual values.

std_negative_side_median_abs_error

float

negative_side_median_abs_error for standardized predicted/actual values.

std_negative_side_max_abs_error

float

negative_side_max_abs_error for standardized predicted/actual values.

std_max_upside_err_mean_obs

float

max_upside_err_mean_obs for standardized predicted/actual values.

std_mean_upside_err_mean_obs

float

mean_upside_err_mean_obs for standardized predicted/actual values.

std_max_downside_err_mean_obs

float

max_downside_err_mean_obs for standardized predicted/actual values.

std_mean_downside_err_mean_obs

float

mean_downside_err_mean_obs for standardized predicted/actual values.

std_negative_pred_num

int

negative_pred_num for standardized predicted/actual values.
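
The formulas in the table above can be sketched with NumPy as follows. This is a minimal illustration, not the component's implementation: it assumes \(X_{\mbox{p}}\) is the array of predicted values and \(X_{\mbox{a}}\) the array of actual values, the function name `evaluation_sketch` and the sample data are hypothetical, only a representative subset of the indices is computed, and both the upside and the downside groups are assumed to be non-empty.

```python
import numpy as np


def evaluation_sketch(x_p, x_a):
    """Compute a subset of the evaluation indices (hypothetical helper)."""
    x_p = np.asarray(x_p, dtype=float)  # predicted values, X_p
    x_a = np.asarray(x_a, dtype=float)  # actual values, X_a
    err = x_p - x_a                     # signed error
    rel = err / x_a                     # relative error (needs X_a != 0)
    up = x_p > x_a                      # positive-side samples
    down = ~up                          # negative-side samples (X_a >= X_p)
    mean_actual = np.mean(x_a)
    return {
        "root_mean_squared_error": float(np.sqrt(np.mean(err ** 2))),
        "root_median_squared_error": float(np.sqrt(np.median(err ** 2))),
        "mean_abs_error": float(np.mean(np.abs(err))),
        "median_abs_error": float(np.median(np.abs(err))),
        "max_abs_error": float(np.max(np.abs(err))),
        "relative_mean_abs_error": float(np.mean(np.abs(rel))),
        "positive_side_mean_abs_error": float(np.mean(np.abs(err[up]))),
        "negative_side_mean_abs_error": float(np.mean(np.abs(err[down]))),
        "max_upside_err_mean_obs": float(np.max(err[up]) / mean_actual),
        # [x]_+ is the positive part: x if x > 0, otherwise 0.
        "mean_upside_err_mean_obs": float(
            np.mean(np.clip(err, 0, None)) / mean_actual),
        "max_downside_err_mean_obs": float(np.max(-err[down]) / mean_actual),
        "mean_downside_err_mean_obs": float(
            np.mean(np.clip(-err, 0, None)) / mean_actual),
        "negative_pred_num": int(np.sum(x_p < 0)),
    }


# Hypothetical data: four samples with errors 2, -3, 0 and 4.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 17.0, 30.0, 44.0]
metrics = evaluation_sketch(predicted, actual)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Under the same assumptions, the std_ variants would be obtained by passing the standardized predicted and actual values to the same function.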

When these evaluation results are obtained through the SAMPO API, they are loaded into a pandas.DataFrame whose columns are the evaluation indices.

See also

Obtaining process results via ProcessResultLoader
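
As a minimal sketch of working with such a DataFrame, the example below places a few indices next to their std_ counterparts. The loader call itself is not shown: the DataFrame is built by hand as a stand-in for the loaded result, its values are taken from the CSV example in External Format, and the variable names are hypothetical.

```python
import pandas as pd

# Hand-built stand-in for the loaded result:
# one row, with the evaluation indices as columns.
evaluation = pd.DataFrame([{
    "root_mean_squared_error": 13.50699,
    "mean_abs_error": 10.36833,
    "negative_pred_num": 2,
    "std_root_mean_squared_error": 1.135126,
    "std_mean_abs_error": 0.8713529,
    "std_negative_pred_num": 74,
}])

# Indices without the std_ prefix are on the original scale of the target.
base_names = [c for c in evaluation.columns if not c.startswith("std_")]

# One row per index: original-scale value next to the standardized value.
comparison = pd.DataFrame({
    "original": [evaluation.loc[0, name] for name in base_names],
    "standardized": [evaluation.loc[0, "std_" + name] for name in base_names],
}, index=base_names)
print(comparison)
```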

External Format

When convert_process is executed, the evaluation results are saved as a CSV file whose header row lists the evaluation indices.

This file describes the evaluation of the prediction result of the component.

root_mean_squared_error,root_median_squared_error,mean_abs_error,median_abs_error,max_abs_error,relative_root_mean_squared_error,relative_root_median_squared_error,relative_mean_abs_error,relative_median_abs_error,relative_max_abs_error,positive_side_root_mean_squared_error,positive_side_root_median_squared_error,positive_side_mean_abs_error,positive_side_median_abs_error,positive_side_max_abs_error,negative_side_root_mean_squared_error,negative_side_root_median_squared_error,negative_side_mean_abs_error,negative_side_median_abs_error,negative_side_max_abs_error,max_upside_err_mean_obs,mean_upside_err_mean_obs,max_downside_err_mean_obs,mean_downside_err_mean_obs,negative_pred_num,std_root_mean_squared_error,std_root_median_squared_error,std_mean_abs_error,std_median_abs_error,std_max_abs_error,std_relative_root_mean_squared_error,std_relative_root_median_squared_error,std_relative_mean_abs_error,std_relative_median_abs_error,std_relative_max_abs_error,std_positive_side_root_mean_squared_error,std_positive_side_root_median_squared_error,std_positive_side_mean_abs_error,std_positive_side_median_abs_error,std_positive_side_max_abs_error,std_negative_side_root_mean_squared_error,std_negative_side_root_median_squared_error,std_negative_side_mean_abs_error,std_negative_side_median_abs_error,std_negative_side_max_abs_error,std_max_upside_err_mean_obs,std_mean_upside_err_mean_obs,std_max_downside_err_mean_obs,std_mean_downside_err_mean_obs,std_negative_pred_num
1.350699e+01,8.838062e+00,1.036833e+01,8.837361e+00,3.377979e+01,8.686816e-01,3.416881e-01,5.519667e-01,3.413176e-01,3.334933e+00,1.356254e+01,9.506539e+00,1.098259e+01,9.506539e+00,3.377979e+01,1.338252e+01,5.174490e+00,9.001098e+00,5.174490e+00,2.990477e+01,1.240240e+00,2.782290e-01,1.097967e+00,1.024486e-01,2,1.135126e+00,7.427497e-01,8.713529e-01,7.426909e-01,2.838850e+00,2.076069e+00,6.108897e-01,1.178233e+00,6.105078e-01,9.166376e+00,1.139795e+00,7.989285e-01,9.229754e-01,7.989285e-01,2.838850e+00,1.124665e+00,4.348636e-01,7.564513e-01,4.348636e-01,2.513193e+00,-2.609989e+00,-5.855117e-01,-2.310587e+00,-2.155952e-01,74
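
A file in this format can be read back with pandas. The sketch below is an illustration only: it parses the first five columns of the example above from an in-memory string instead of a file, because the name and location of the saved CSV are not given here.

```python
import io

import pandas as pd

# First five columns of the example above, kept short for readability.
csv_text = (
    "root_mean_squared_error,root_median_squared_error,mean_abs_error,"
    "median_abs_error,max_abs_error\n"
    "1.350699e+01,8.838062e+00,1.036833e+01,8.837361e+00,3.377979e+01\n"
)

# The header row becomes the column names; the single data row holds the values.
evaluation = pd.read_csv(io.StringIO(csv_text))

# Take the only row as a Series so each index can be looked up by name.
indices = evaluation.iloc[0]
print(indices)
```

To read an actual file, the `io.StringIO` object would be replaced by the path of the saved CSV.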

Details

If a data set has samples with missing or +/-Inf values, this component ignores those samples.
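
The effect of ignoring such samples can be sketched as below. How the component detects and drops them internally is not described here, so this is an assumption-laden illustration: the helper name `drop_non_finite_samples` is hypothetical, and a sample is assumed to be dropped when any feature value or its target value is missing (NaN) or infinite.

```python
import numpy as np


def drop_non_finite_samples(features, target):
    """Keep only samples whose features and target are all finite.

    Hypothetical helper; assumes a 2-D feature array (samples x features)
    and a 1-D target array of the same length.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    # np.isfinite is False for NaN, +Inf and -Inf.
    keep = np.isfinite(features).all(axis=1) & np.isfinite(target)
    return features[keep], target[keep]


# Hypothetical data: only the first sample is fully finite.
X = np.array([[1.0, 2.0], [np.nan, 0.5], [3.0, np.inf], [4.0, 5.0]])
y = np.array([9.0, 9.0, 4.0, -np.inf])
X_kept, y_kept = drop_non_finite_samples(X, y)
print(X_kept, y_kept)
```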