FABHMEBernGateLinearMultiCl Component Specification

Overview

The FABHMEBernGateLinearMultiCl component is a linear multiclass classification component based on the FAB/HME algorithm. It learns a tree-structured model in which each sample is assigned to a component according to Bernoulli gating functions.

Note

The FAB engine uses the word ‘component’ with a different meaning from that of SAMPO. Each component in FAB/HME is a prediction formula, and each data sample is assigned to a specific component for prediction.

Example:

  • SPD:

    # fabhmemcl.spd
    dl1 -> fab1
    
    ---
    components:
        dl1:
            component: DataLoader
        fab1:
            component: FABHMEBernGateLinearMultiClComponent
            features: name != 'class'
            tree_depth: 3
            target: name == 'class'
    
    global_settings:
        keep_attributes:
            - class
        feature_exclude:
            - class
    
  • Input of the component:

    _sid | sepal_length_in_cm | sepal_width_in_cm | class
    0 | 5.1 | 3.5 | Iris-setosa
    1 | 4.9 | 3.0 | Iris-setosa
    2 | 4.7 | 3.2 | Iris-versicolor
    28 | 6.7 | 2.5 | Iris-virginica
    29 | 7.2 | 3.6 | Iris-virginica


  • Output of the component:

    _sid | fab1_actual | fab1_predict
    0 | Iris-setosa | Iris-setosa
    1 | Iris-setosa | Iris-setosa
    2 | Iris-versicolor | Iris-versicolor
    28 | Iris-virginica | Iris-virginica
    29 | Iris-virginica | Iris-virginica


    _sid | fab1_score_Iris-setosa | fab1_score_Iris-versicolor | fab1_score_Iris-virginica | fab1_assigned_comp_id
    0 | 3.066959e+00 | -9.652613e-01 | -1.528725e+01 | 0
    1 | 2.729733e+00 | -5.539330e-01 | -1.273995e+01 | 0
    2 | 2.838539e+00 | 6.186438e+00 | 4.268897e+00 | 0
    28 | 2.357728e+00 | 9.662180e+00 | 1.384438e+01 | 0
    29 | 3.097255e+00 | 8.260996e+00 | 9.879197e+00 | 0


    _sid | fab1_predict_c0 | fab1_score_c0_Iris-setosa | fab1_score_c0_Iris-versicolor | fab1_score_c0_Iris-virginica
    0 | Iris-setosa | 3.066959e+00 | -9.652613e-01 | -1.528725e+01
    1 | Iris-setosa | 2.729733e+00 | -5.539330e-01 | -1.273995e+01
    2 | Iris-versicolor | 2.838539e+00 | 6.186438e+00 | 4.268897e+00
    28 | Iris-virginica | 2.357728e+00 | 9.662180e+00 | 1.384438e+01
    29 | Iris-virginica | 3.097255e+00 | 8.260996e+00 | 9.879197e+00

This component has component-specific external formats for model and prediction result evaluation.

See also

Component-common external format files in convert_process


Parameters

This component has the following component-specific parameters.

SPD

The following parameters are for the “components” section of SPD.

Parameter Name | Type | Domain | Default Value | Description
max_fab_iterations | int | [1, inf) | 100 | Maximum number of FAB-iterations.
start_from_mstep [2][3] | bool | True / False | False | If True, the first iteration starts with the M-step; otherwise, it starts with the E-step.
num_acceleration_steps | int | [0, inf) | 0 | Number of steps of the acceleration algorithm in each FAB-iteration. If 0, the acceleration algorithm is disabled.
repeat_until_convergence | bool | True / False | False | If False, the FAB-iterations and the post-processing are executed only once, even if the FAB-iterations were stopped by the max_fab_iterations condition rather than by the convergence condition.
projection_estep | bool | True / False | False | Whether the projection E-step algorithm is enabled.
shrink_threshold | float or str | [1, inf) or (0%, 100%) | 1.0 | Threshold value for shrinkage. If a percentage value (e.g. '1.0%') is specified, shrinkage is executed according to the relative value \(N_{\rm scaled\_sample} \times t_{\rm shrink}\), where \(t_{\rm shrink}\) is the threshold value and \(N_{\rm scaled\_sample}\) is the number of scaled expected samples.
fab_stop_threshold | float or str | (0, inf) or (0%, inf%) | 0.001 | Threshold value for the FAB-iterations: if the increase in the FIC value is less than the threshold, the FAB-iterations are considered to have converged. If a percentage value (e.g. '1.0%') is specified, the convergence check is executed according to the relative value \((FIC^{(t)} - FIC^{(t-1)}) / \lvert FIC^{(t-1)} \rvert\).
gate_features | str | Query format | all() | Features used for gate parameter optimization. If not specified, all features are used.
comp_features | str | Query format | all() | Features used for component parameter optimization. If not specified, all features are used. If empty, the model is learned as a decision tree.
comp_mandatory_features | str | Query format | See Description | Features to which the L0-regularization constraint is not applied, meaning that the specified features are always relevant in all components. If not specified, no feature is exempt from L0-regularization, so all relevant features are selected by the FoBa algorithm.
tree_depth [2][3] | int | [0, inf) | 5 | Initial depth of the gate-tree structure of the latent variable prior. The initial number of components is \(2^d\), where \(d\) is the tree depth. If 0, the optimization is executed with only one component.
comp_weights_min_scale [2][3] | float | (-inf, inf) | -0.5 | Minimum scale value for the initialization of the weight values of components.
comp_weights_max_scale [2][3] | float | (-inf, inf) | 0.5 | Maximum scale value for the initialization of the weight values of components.
comp_bias_min_scale [2][3] | float | (-inf, inf) | 0.25 | Minimum scale value for the initialization of the bias values of components.
comp_bias_max_scale [2][3] | float | (-inf, inf) | 0.75 | Maximum scale value for the initialization of the bias values of components.
gate_max_bins | int | [1, inf) | See Description | Maximum number of bins for each feature, used for gate parameter optimization. If not specified, all unique sample values of each feature are used; otherwise, the equal-width binning algorithm is adopted.
comp_foba_skip | str | {‘power_of_two’, ‘quarter_square’, ‘none’} | ‘power_of_two’ | Type of the judging function for skipping the FoBa algorithm. If ‘none’, FoBa is executed at every FAB-iteration step. FoBa is skipped when \({\rm log}_{2}t \ne {\rm ceil}({\rm log}_{2}t)\) if ‘power_of_two’, or when \(t \bmod {\rm ceil}(\sqrt{t}) \ne 0\) if ‘quarter_square’, where \(t\) is the FAB-iteration step index (\(t\) starts from 1).
comp_foba_skip_max_interval | int | [2, inf) | 25 | Maximum interval for skipping the FoBa algorithm. If comp_foba_skip is ‘none’, this value is ignored.
comp_backward_step | bool | True / False | False | Whether the backward-steps of the FoBa algorithm are enabled. In the post-process, backward-steps are carried out regardless of this value.
comp_l2_regularize | float | [0, inf) | 0.0 | L2-regularization hyper-parameter for component parameter optimization. The larger the value, the stronger the regularization effect. If 0.0, L2-regularization is disabled.
with_comp_scaled_l0_regularize | bool | True / False | True | Whether to use scaled L0-regularization with a tighter lower bound of FIC for component parameter optimization; the approximation of det(F) is refined, where F is a Fisher matrix.
max_comp_relevant_features | int | [1, inf) | 100 | Maximum number of relevant features for each component.
max_comp_foba_iterations | int | [1, inf) | 100 | Maximum number of FoBa-iterations for each component.
num_threads_gates | int | [1, inf) | 1 | Maximum number of OpenMP threads for gate parameter optimization, among which the tasks for all gates are divided.
num_threads_gate_features | int | [1, inf) | 1 | Maximum number of OpenMP threads for gate parameter optimization, among which the tasks for all features are divided.
num_threads_comps | int | [1, inf) | 1 | Maximum number of OpenMP threads for component parameter optimization.

[2] This parameter is ignored in posterior hot-start.

[3] This parameter is ignored in model hot-start.
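
The skip conditions of comp_foba_skip and the relative convergence check of fab_stop_threshold can be illustrated with a small sketch. This is not the FAB engine's implementation: the function names foba_is_skipped and fab_has_converged are hypothetical, the way a percentage string is parsed ('1.0%' read as 0.01) is an assumption, and the effect of comp_foba_skip_max_interval is not modelled because its exact rule is not given here.

```python
import math


def foba_is_skipped(t, mode="power_of_two"):
    """Return True if FoBa would be skipped at FAB-iteration step t (t >= 1).

    Hypothetical helper that only encodes the two formulas in the table.
    """
    if t < 1:
        raise ValueError("t starts from 1")
    if mode == "none":
        return False
    if mode == "power_of_two":
        # log2(t) != ceil(log2(t)) holds exactly when t is not a power of two.
        return (t & (t - 1)) != 0
    if mode == "quarter_square":
        # math.isqrt is floor(sqrt(t)); round up unless t is a perfect square.
        root = math.isqrt(t)
        ceil_root = root if root * root == t else root + 1
        return t % ceil_root != 0
    raise ValueError("unknown mode: " + mode)


def fab_has_converged(fic_now, fic_prev, threshold=0.001):
    """Check convergence from two consecutive FIC values.

    A float threshold is compared with the absolute increase; a string such
    as '1.0%' is compared with the relative increase (assumed parsing).
    """
    increase = fic_now - fic_prev
    if isinstance(threshold, str):
        ratio = float(threshold.rstrip("%")) / 100.0
        return increase / abs(fic_prev) < ratio
    return increase < threshold


# Steps at which FoBa is executed (not skipped) among the first 16 steps.
executed_p2 = [t for t in range(1, 17) if not foba_is_skipped(t, "power_of_two")]
executed_qs = [t for t in range(1, 17) if not foba_is_skipped(t, "quarter_square")]
print(executed_p2)
print(executed_qs)
```

Under these formulas, FoBa runs less and less often as the FAB-iterations proceed, which is presumably why comp_foba_skip_max_interval exists to bound the gap between two executions.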

SRC

The following parameter is for the “hotstart” section of SRC.

Parameter Name | Type | Domain | Default Value | Description
type | str | {‘posterior’, ‘mh_refit_comp’, ‘mh_opt_comp’, ‘mh_refit_gate_and_refit_comp’, ‘mh_refit_gate_and_opt_comp’, ‘mh_opt_gate_and_opt_comp’} | | The hot-start type. If ‘posterior’, FAB learns with posterior hot-start, which uses an initial model whose tree structure is generated from the base model and data; the gate and component parameters are initialized randomly. ‘mh_XXX’ means that FAB learns with model hot-start, which uses the base model as the initial model. ‘refit_{gate, comp}’ means refitting the gate functions or prediction formulas with the current data. ‘opt_{gate, comp}’ means optimizing (feature selection and fitting) the gate functions or prediction formulas with the current data.


Utilizable Sample Metadata

Warning

_fabhme_assigned_comp_id is deprecated. Use the hotstart section of SRC instead of the _fabhme_assigned_comp_id data column.

This component can utilize the _fabhme_assigned_comp_id attribute of the sample metadata to hot-start with a posterior. When the _fabhme_assigned_comp_id attribute is specified in the input data, this component starts the FAB/HME algorithm with the _fabhme_assigned_comp_id attribute values as its initial posterior.

To create the attribute _fabhme_assigned_comp_id, see the specification of the command sampo_ps_fabhme export_assigned_comp_id.


Output Attributes

This component generates the following attributes.

Attribute Name | Scale | Description
<component_id>_actual | NOMINAL | Values of the target attribute.
<component_id>_predict | NOMINAL | Predicted values.
<component_id>_score_<target_class> | REAL | Score values per target class. A prediction result is obtained by identifying the maximum score among all target classes.
<component_id>_assigned_comp_id | INTEGER | Component IDs assigned by gating functions.
<component_id>_predict_c<hme_comp_id> | NOMINAL | Predicted values for the prediction formula of component ID, <hme_comp_id>.
<component_id>_score_c<hme_comp_id>_<target_class> | REAL | Score values per target class for the prediction formula of component ID, <hme_comp_id>.

These attributes are included in the component output data and can be loaded via the SAMPO API.

See also

Obtaining process results via ProcessResultLoader.

When convert_process is executed, the component output data will be saved in <component_id>_predict_result.csv.

This file describes the prediction result of the component.

_sid,fab1_actual,fab1_predict,fab1_score_Iris-setosa,fab1_score_Iris-versicolor,fab1_score_Iris-virginica,fab1_assigned_comp_id,fab1_predict_c2,fab1_score_c2_Iris-setosa,fab1_score_c2_Iris-versicolor,fab1_score_c2_Iris-virginica
0,Iris-setosa,Iris-setosa,1.000000e+00,-1.932341e+00,-1.101594e+01,2,Iris-setosa,1.000000e+00,-1.932341e+00,-1.101594e+01
1,Iris-setosa,Iris-setosa,1.000000e+00,-1.932341e+00,-1.101594e+01,2,Iris-setosa,1.000000e+00,-1.932341e+00,-1.101594e+01
2,Iris-setosa,Iris-setosa,1.000000e+00,-2.135714e+00,-1.147150e+01,2,Iris-setosa,1.000000e+00,-2.135714e+00,-1.147150e+01
...
28,Iris-virginica,Iris-virginica,1.000000e+00,7.016043e+00,9.028858e+00,2,Iris-virginica,1.000000e+00,7.016043e+00,9.028858e+00
29,Iris-virginica,Iris-virginica,1.000000e+00,7.626160e+00,1.039555e+01,2,Iris-virginica,1.000000e+00,7.626160e+00,1.039555e+01
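
As a quick consistency check of the rule that the prediction is the class with the maximum score, the rows above can be re-evaluated with the Python standard library. This is only a sketch: the helper name argmax_class is hypothetical, the CSV text is copied from the example (the elided "..." row is left out), and in practice the data would be read from the <component_id>_predict_result.csv file.

```python
import csv
import io

# Rows copied from the example above; the elided "..." row is omitted.
PREDICT_RESULT_CSV = """\
_sid,fab1_actual,fab1_predict,fab1_score_Iris-setosa,fab1_score_Iris-versicolor,fab1_score_Iris-virginica,fab1_assigned_comp_id,fab1_predict_c2,fab1_score_c2_Iris-setosa,fab1_score_c2_Iris-versicolor,fab1_score_c2_Iris-virginica
0,Iris-setosa,Iris-setosa,1.000000e+00,-1.932341e+00,-1.101594e+01,2,Iris-setosa,1.000000e+00,-1.932341e+00,-1.101594e+01
1,Iris-setosa,Iris-setosa,1.000000e+00,-1.932341e+00,-1.101594e+01,2,Iris-setosa,1.000000e+00,-1.932341e+00,-1.101594e+01
2,Iris-setosa,Iris-setosa,1.000000e+00,-2.135714e+00,-1.147150e+01,2,Iris-setosa,1.000000e+00,-2.135714e+00,-1.147150e+01
28,Iris-virginica,Iris-virginica,1.000000e+00,7.016043e+00,9.028858e+00,2,Iris-virginica,1.000000e+00,7.016043e+00,9.028858e+00
29,Iris-virginica,Iris-virginica,1.000000e+00,7.626160e+00,1.039555e+01,2,Iris-virginica,1.000000e+00,7.626160e+00,1.039555e+01
"""


def argmax_class(row, component_id, classes):
    """Return the class whose <component_id>_score_<class> value is largest."""
    scores = {c: float(row[component_id + "_score_" + c]) for c in classes}
    return max(scores, key=scores.get)


classes = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
rows = list(csv.DictReader(io.StringIO(PREDICT_RESULT_CSV)))

# Recompute the prediction from the scores and compare with fab1_predict.
recomputed = [argmax_class(r, "fab1", classes) for r in rows]
matches = [p == r["fab1_predict"] for p, r in zip(recomputed, rows)]
print(matches)
```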

Attribute Metadata

The metadata of the output attributes is created with the following rules.

Context Rule

Attribute Name | Context Name | Description
All the output attributes of this component | field_path | List of the superordinate concepts of each output attribute, based on the hierarchical structure of the output attributes shown after this table.
<component_id>_assigned_comp_id | active_comp_ids | List of component IDs corresponding to each prediction formula.

Hierarchical structure of the output attributes used for field_path:

root
├── fabhmemcl
│   ├── assigned_comp_id
│   └── component
│       ├── 0
│       │   ├── predict
│       │   └── score
│       │       ├── *<target_class_0>*
│       │       ├── *<target_class_1>*
│       │        .
│       │        .
│       │        .
│       │
│       ├── 1
│       │   ├── predict
│       │   └── score
│       │       ├── *<target_class_0>*
│       │       ├── *<target_class_1>*
│       │        .
│       │        .
│       │        .
│        .
│        .
│        .
│
└── multiclass_classification
    ├── actual
    ├── predict
    └── score
        ├── *<target_class_0>*
        ├── *<target_class_1>*
         .
         .
         .

Derivation Rule

Attribute Name | Derived From
<component_id>_actual | Derived from the target attribute.
<component_id>_predict | Derived from the attributes which have non-zero coefficients in any prediction formula.
<component_id>_score_<target_class> | Derived from the attributes which have non-zero coefficients in any prediction formula.
<component_id>_assigned_comp_id | Derived from the attributes used in the gating functions.
<component_id>_predict_c<hme_comp_id> | Derived from the attributes which have non-zero coefficients in the prediction formula of component ID, <hme_comp_id>.
<component_id>_score_c<hme_comp_id>_<target_class> | Derived from the attributes which have non-zero coefficients in the prediction formula of component ID, <hme_comp_id>.

Example

{
    "nodes": [
        {"aid": "_sid", "name": "_sid", ... },
        {"aid": "dl1[0]", "name": "sepal_length_in_cm", ... },
        {"aid": "dl1[1]", "name": "sepal_width_in_cm", ... },
        {"aid": "dl1[2]", "name": "petal_length_in_cm", ... },
        {"aid": "dl1[3]", "name": "petal_width_in_cm", ... },
        {"aid": "dl1[4]", "name": "class", ... },
        {"aid": "fab1[0]", "name": "fab1_actual", "scale": "nominal", "is_excluded": false,
         "cid": "fab1", "cindex": 0, "is_kept": false,
         "values": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
         "context": {
             "field_path": ["multiclass_classification", "actual"]
         }
        },
        {"aid": "fab1[1]", "name": "fab1_predict", "scale": "nominal", "is_excluded": false,
         "cid": "fab1", "cindex": 1, "is_kept": false,
         "values": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
         "context": {
             "field_path": ["multiclass_classification", "predict"]
         }
        },
        {"aid": "fab1[2]", "name": "fab1_score_Iris-setosa", "scale": "real", "is_excluded": false,
         "cid": "fab1", "cindex": 2, "values": null, "is_kept": false,
         "context": {
             "field_path": ["multiclass_classification", "score", "Iris-setosa"]
         }
        },
        {"aid": "fab1[3]", "name": "fab1_score_Iris-versicolor", "scale": "real", "is_excluded": false,
         "cid": "fab1", "cindex": 3, "values": null, "is_kept": false,
         "context": {
             "field_path": ["multiclass_classification", "score", "Iris-versicolor"]
         }
        },
        {"aid": "fab1[4]", "name": "fab1_score_Iris-virginica", "scale": "real", "is_excluded": false,
         "cid": "fab1", "cindex": 4, "values": null, "is_kept": false,
         "context": {
             "field_path": ["multiclass_classification", "score", "Iris-virginica"]
         }
        },
        {"aid": "fab1[5]", "name": "fab1_assigned_comp_id", "scale": "integer",
         "is_excluded": false, "cid": "fab1", "cindex": 5, "values": null, "is_kept": false,
         "context": {
             "active_comp_ids": [7, 13, 17, 19, 22], "field_path": ["fabhmemcl", "assigned_comp_id"]
         }
        },
        {"aid": "fab1[6]", "name": "fab1_predict_c7", "scale": "nominal", "is_excluded": false,
         "cid": "fab1", "cindex": 6, "is_kept": false,
         "values": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
         "context": {
             "field_path": ["fabhmemcl", "component", 7, "predict"]
         }
        },
        {"aid": "fab1[7]", "name": "fab1_score_c7_Iris-setosa", "scale": "real", "is_excluded": false,
         "cid": "fab1", "cindex": 7, "values": null, "is_kept": false,
         "context": {
             "field_path": ["fabhmemcl", "component", 7, "score", "Iris-setosa"]
         }
        },
        {"aid": "fab1[8]", "name": "fab1_score_c7_Iris-versicolor", "scale": "real", "is_excluded": false,
         "cid": "fab1", "cindex": 8, "values": null, "is_kept": false,
         "context": {
             "field_path": ["fabhmemcl", "component", 7, "score", "Iris-versicolor"]
         }
        },
        {"aid": "fab1[9]", "name": "fab1_score_c7_Iris-virginica", "scale": "real", "is_excluded": false,
         "cid": "fab1", "cindex": 9, "values": null, "is_kept": false,
         "context": {
             "field_path": ["fabhmemcl", "component", 7, "score", "Iris-virginica"]
         }
        },

         ...

     ],
    "links": [
        {"source": "dl1[1]", "target": "fab1[2]"},
        {"source": "dl1[1]", "target": "fab1[1]"},
        {"source": "dl1[1]", "target": "fab1[3]"},
        {"source": "dl1[0]", "target": "fab1[5]"},
        {"source": "dl1[0]", "target": "fab1[12]"},
        {"source": "dl1[0]", "target": "fab1[4]"},
        {"source": "dl1[0]", "target": "fab1[8]"},
        {"source": "dl1[0]", "target": "fab1[2]"},
        {"source": "dl1[0]", "target": "fab1[9]"},
        {"source": "dl1[0]", "target": "fab1[3]"},
        {"source": "dl1[0]", "target": "fab1[6]"},
        {"source": "dl1[0]", "target": "fab1[13]"},
        {"source": "dl1[0]", "target": "fab1[7]"},
        {"source": "dl1[0]", "target": "fab1[10]"},
        {"source": "dl1[0]", "target": "fab1[1]"},
        {"source": "dl1[0]", "target": "fab1[11]"},
        {"source": "dl1[2]", "target": "fab1[2]"},
        {"source": "dl1[2]", "target": "fab1[1]"},
        {"source": "dl1[2]", "target": "fab1[3]"},
        {"source": "dl1[3]", "target": "fab1[2]"},
        {"source": "dl1[3]", "target": "fab1[1]"},
        {"source": "dl1[3]", "target": "fab1[3]"},
        {"source": "dl1[4]", "target": "fab1[0]"}
    ]
}

See also

Attribute metadata file format in Attribute Metadata File Specification


Model

The model of this component can be described by the following parameters.

Model Parameter | Type | Domain | Description
fic | float | (-inf, inf) | Factorized Information Criterion. The asymptotic approximation value used by FAB/HME.
num_initial_comps | int | [0, inf) | The initial number of components before iterations.
num_active_comps | int | [0, inf) | The terminal number of active components after iterations.
gate_tree | dict | See Description | Dictionary form of the gating tree structure.
prediction_formulas | pandas.DataFrame | See Description | Component weights and bias for each prediction formula.

The gate_tree dictionary keys are described below:

Gate Tree Dictionary Key | Type | Domain | Description
gate_type | str | ‘bern’ | The type of gate.
hard_gate | bool | true / false | Whether the gate is a hard gate or not.
nodes | list of dict | See Description | List of node dictionaries.
edges | list of dict | See Description | List of edge dictionaries.

The keys of each node dictionary in nodes are described below:

Node Dictionary Key | Type | Domain | Description
node_id | int | [0, inf) | The node ID.
node_type | str | {‘gate’, ‘component’} | The node type.
gate_func | dict | See Description | The gate_func dictionary contains the gate function parameters for the Bernoulli gate. Specifiable if node_type is “gate”.
comp_id | int | [0, inf) | The component ID. Specifiable if node_type is “component”.

The keys of each edge dictionary in edges are described below:

Edge Dictionary Key | Type | Domain | Description
source | int | [0, inf) | The node_id of the source node.
target | int | [0, inf) | The node_id of the target node.
is_left | bool | true / false | Whether the target node is the left-child of the source.

The keys of the gate_func dictionary are described below:

Gate Function Dictionary Key | Type | Domain | Description
attr_name | str | See Description | The attribute name.
aid | str | See Description | The attribute ID.
threshold | float | (-inf, inf) | Threshold value of the Bernoulli-gating function.
prob_left_smaller_than_threshold | float | [0.0, 1.0] | Probability that the value of left-child node is smaller than the threshold.

When the model is loaded in the SAMPO API, the model parameters will be output as a single dictionary.

See also

Obtaining process results via ProcessResultLoader

{'fic': -64.281909689671252,
 'num_initial_comps': 32,
 'num_active_comps': 2,
 'gate_tree':
     {'gate_type': 'bern',
      'hard_gate': True,
      'nodes': [
          {'node_type': 'gate',
           'node_id': 0,
           'gate_func':
               {'threshold': 3.3499999999999996,
                'aid': 'dl[1]',
                'attr_name': 'petal_length_in_cm',
                'prob_left_smaller_than_threshold': 0.0}},
          {'comp_id': 17, 'node_type': 'component', 'node_id': 1},
          {'comp_id': 19, 'node_type': 'component', 'node_id': 2}],
      'edges': [
          {'source': 0, 'target': 1, 'is_left': True},
          {'source': 0, 'target': 2, 'is_left': False}]},
 'prediction_formulas':
                         prediction_formula_17_Iris-setosa  prediction_formula_17_Iris-versicolor  prediction_formula_17_Iris-virginica  prediction_formula_19_Iris-setosa  prediction_formula_19_Iris-versicolor  prediction_formula_19_Iris-virginica
     attr_name
     sepal_length_in_cm                           0.120060                               0.000000                              2.648575                      -2.698730e-12                               2.543482                              3.747440
     petal_length_in_cm                           0.065109                               0.000000                             -2.231421                       0.000000e+00                               0.000000                              0.000000
     bias                                         0.390583                                   -inf                             -7.419183                       3.617190e-01                             -12.697070                            -20.274437}

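Each column of ‘prediction_formulas’ is named prediction_formula_<hme_comp_id>_<target_class>, where <hme_comp_id> is the FAB/HME component ID and <target_class> is the nominal value of the target class.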
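
To show how the gate_tree dictionary could be walked to find the component assigned to a sample, here is a minimal sketch. It rests on assumptions that this specification does not state explicitly: a sample whose attribute value is smaller than threshold goes to the left child with probability prob_left_smaller_than_threshold (and with one minus that probability otherwise), the comparison is strict, and a hard gate follows the more probable branch. The function name assign_component is hypothetical, and the dictionary is the gate_tree of the example above.

```python
# gate_tree copied from the example model dictionary above.
GATE_TREE = {
    "gate_type": "bern",
    "hard_gate": True,
    "nodes": [
        {"node_type": "gate", "node_id": 0,
         "gate_func": {"threshold": 3.3499999999999996, "aid": "dl[1]",
                       "attr_name": "petal_length_in_cm",
                       "prob_left_smaller_than_threshold": 0.0}},
        {"comp_id": 17, "node_type": "component", "node_id": 1},
        {"comp_id": 19, "node_type": "component", "node_id": 2},
    ],
    "edges": [
        {"source": 0, "target": 1, "is_left": True},
        {"source": 0, "target": 2, "is_left": False},
    ],
}


def assign_component(gate_tree, sample):
    """Follow hard gates from the root and return the reached comp_id."""
    nodes = {n["node_id"]: n for n in gate_tree["nodes"]}
    # children[source][is_left] -> node_id of the child on that side
    children = {}
    for edge in gate_tree["edges"]:
        children.setdefault(edge["source"], {})[edge["is_left"]] = edge["target"]
    # The root is the only node that is never the target of an edge.
    targets = {edge["target"] for edge in gate_tree["edges"]}
    (root_id,) = [node_id for node_id in nodes if node_id not in targets]

    node = nodes[root_id]
    while node["node_type"] == "gate":
        func = node["gate_func"]
        p_left = func["prob_left_smaller_than_threshold"]
        if not sample[func["attr_name"]] < func["threshold"]:
            p_left = 1.0 - p_left  # assumed behaviour on the other side
        go_left = p_left >= 0.5    # assumed hard-gate decision
        node = nodes[children[node["node_id"]][go_left]]
    return node["comp_id"]


short_petal = assign_component(GATE_TREE, {"petal_length_in_cm": 1.4})
long_petal = assign_component(GATE_TREE, {"petal_length_in_cm": 4.7})
print(short_petal, long_petal)
```

Under these assumptions, a probability of 0.0 at the root sends samples below the threshold to the right child and all other samples to the left child.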

External Format

When convert_process is executed, the model parameters are saved into different files and are grouped as: general information, gating function, and prediction formula.

General Information

This file describes \(FIC\) after learning the model, the initial number of components, and the terminal number of components.

fic,num_initial_comps,num_active_comps
-1.294308e+02,8,3

Gate Tree

This file describes the structure and parameters of the gate-tree of the model.

{
    "gate_tree": {
        "gate_type": "bern",
        "hard_gate": true,
        "nodes": [
            {
                "node_id": 1,
                "node_type": "gate",
                "gate_func": {
                    "aid": "dl1[1]",
                    "attr_name": "sepal_width_in_cm",
                    "threshold": 2.5499999999999998e+00,
                    "prob_left_smaller_than_threshold": 1.0000000000000000e+00
                }
            },
            {
                "node_id": 0,
                "node_type": "gate",
                "gate_func": {
                    "aid": "dl1[1]",
                    "attr_name": "sepal_width_in_cm",
                    "threshold": 3.7500000000000000e+00,
                    "prob_left_smaller_than_threshold": 1.0000000000000000e+00
                }
            },
            ...
            {
                "node_id": 2,
                "node_type": "component",
                "comp_id": 2
            },
            {
                "node_id": 5,
                "node_type": "component",
                "comp_id": 12
            },
            ...
        ],
        "edges": [
            {
                "source": 1,
                "target": 3,
                "is_left": false
            },
            {
                "source": 1,
                "target": 2,
                "is_left": true
            },
            ...
        ]
    }
}

Prediction Formulas

This file describes parameters of prediction formulas: weights and bias values.

aid,attr_name,prediction_formula_2_Iris-setosa,prediction_formula_2_Iris-versicolor,prediction_formula_2_Iris-virginica
dl1[0],sepal_length_in_cm,0.0000000000000000e+00,0.0000000000000000e+00,0.0000000000000000e+00
dl1[1],sepal_width_in_cm,0.0000000000000000e+00,0.0000000000000000e+00,0.0000000000000000e+00
dl1[2],petal_length_in_cm,6.7879035725582071e-13,2.0337236935969112e+00,4.5556358511732196e+00
dl1[3],petal_width_in_cm,0.0000000000000000e+00,0.0000000000000000e+00,0.0000000000000000e+00
,bias,9.9999999999719602e-01,-4.7795545358542579e+00,-1.7393829518577114e+01
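
The following sketch evaluates these prediction formulas for one sample, assuming that each score is the weighted sum of the attribute values plus the bias. The helper names load_formulas and score are hypothetical, and the petal values of the sample (1.4 and 0.2) are assumed inputs that do not appear in this document; the sepal values are those of sample 0 in the input example.

```python
import csv
import io

# File content copied from the example above.
PREDICTION_FORMULAS_CSV = """\
aid,attr_name,prediction_formula_2_Iris-setosa,prediction_formula_2_Iris-versicolor,prediction_formula_2_Iris-virginica
dl1[0],sepal_length_in_cm,0.0000000000000000e+00,0.0000000000000000e+00,0.0000000000000000e+00
dl1[1],sepal_width_in_cm,0.0000000000000000e+00,0.0000000000000000e+00,0.0000000000000000e+00
dl1[2],petal_length_in_cm,6.7879035725582071e-13,2.0337236935969112e+00,4.5556358511732196e+00
dl1[3],petal_width_in_cm,0.0000000000000000e+00,0.0000000000000000e+00,0.0000000000000000e+00
,bias,9.9999999999719602e-01,-4.7795545358542579e+00,-1.7393829518577114e+01
"""


def load_formulas(text):
    """Return {column_name: (weights_by_attr_name, bias)} from the CSV text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = [c for c in rows[0] if c.startswith("prediction_formula_")]
    formulas = {}
    for col in columns:
        weights = {r["attr_name"]: float(r[col])
                   for r in rows if r["attr_name"] != "bias"}
        bias = next(float(r[col]) for r in rows if r["attr_name"] == "bias")
        formulas[col] = (weights, bias)
    return formulas


def score(formula, sample):
    """Linear score: sum of weight * value over attributes, plus the bias."""
    weights, bias = formula
    return sum(w * sample[name] for name, w in weights.items()) + bias


formulas = load_formulas(PREDICTION_FORMULAS_CSV)
sample = {
    "sepal_length_in_cm": 5.1,
    "sepal_width_in_cm": 3.5,
    "petal_length_in_cm": 1.4,  # assumed value, not given in this document
    "petal_width_in_cm": 0.2,   # assumed value, not given in this document
}
scores = {name: score(f, sample) for name, f in formulas.items()}
best = max(scores, key=scores.get)
print(best)
for name, value in scores.items():
    print(name, format(value, ".6e"))
```

With the assumed petal length of 1.4, the three scores agree to the printed precision with the fab1_score_c2_* values of sample 0 in the predict result example, which supports the linear reading of the file.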

Predict Result File

This file describes the prediction result of the component.

_sid,fab1_actual,fab1_predict,fab1_score_Iris-setosa,fab1_score_Iris-versicolor,fab1_score_Iris-virginica,fab1_assigned_comp_id,fab1_predict_c2,fab1_score_c2_Iris-setosa,fab1_score_c2_Iris-versicolor,fab1_score_c2_Iris-virginica
0,Iris-setosa,Iris-setosa,1.000000e+00,-1.932341e+00,-1.101594e+01,2,Iris-setosa,1.000000e+00,-1.932341e+00,-1.101594e+01
1,Iris-setosa,Iris-setosa,1.000000e+00,-1.932341e+00,-1.101594e+01,2,Iris-setosa,1.000000e+00,-1.932341e+00,-1.101594e+01
2,Iris-setosa,Iris-setosa,1.000000e+00,-2.135714e+00,-1.147150e+01,2,Iris-setosa,1.000000e+00,-2.135714e+00,-1.147150e+01
...
28,Iris-virginica,Iris-virginica,1.000000e+00,7.016043e+00,9.028858e+00,2,Iris-virginica,1.000000e+00,7.016043e+00,9.028858e+00
29,Iris-virginica,Iris-virginica,1.000000e+00,7.626160e+00,1.039555e+01,2,Iris-virginica,1.000000e+00,7.626160e+00,1.039555e+01

Predict Result Evaluation File

This file describes the evaluation of the prediction result of the component.

accuracy_weighted_average,classification_error_weighted_average,precision_weighted_average,recall_weighted_average,specificity_weighted_average,false_positive_rate_weighted_average,false_negative_rate_weighted_average,f_measure_weighted_average,true_positive_Iris-setosa,false_positive_Iris-setosa,true_negative_Iris-setosa,false_negative_Iris-setosa,accuracy_Iris-setosa,classification_error_Iris-setosa,precision_Iris-setosa,recall_Iris-setosa,specificity_Iris-setosa,false_positive_rate_Iris-setosa,false_negative_rate_Iris-setosa,f_measure_Iris-setosa,true_positive_Iris-versicolor,false_positive_Iris-versicolor,true_negative_Iris-versicolor,false_negative_Iris-versicolor,accuracy_Iris-versicolor,classification_error_Iris-versicolor,precision_Iris-versicolor,recall_Iris-versicolor,specificity_Iris-versicolor,false_positive_rate_Iris-versicolor,false_negative_rate_Iris-versicolor,f_measure_Iris-versicolor,true_positive_Iris-virginica,false_positive_Iris-virginica,true_negative_Iris-virginica,false_negative_Iris-virginica,accuracy_Iris-virginica,classification_error_Iris-virginica,precision_Iris-virginica,recall_Iris-virginica,specificity_Iris-virginica,false_positive_rate_Iris-virginica,false_negative_rate_Iris-virginica,f_measure_Iris-virginica,cf_Iris-setosa_Iris-setosa,cf_Iris-setosa_Iris-versicolor,cf_Iris-setosa_Iris-virginica,cf_Iris-versicolor_Iris-setosa,cf_Iris-versicolor_Iris-versicolor,cf_Iris-versicolor_Iris-virginica,cf_Iris-virginica_Iris-setosa,cf_Iris-virginica_Iris-versicolor,cf_Iris-virginica_Iris-virginica
8.222222e-01,1.777778e-01,8.194444e-01,7.333333e-01,8.666667e-01,1.333333e-01,2.666667e-01,6.705517e-01,5,3,7,0,8.000000e-01,2.000000e-01,6.250000e-01,1.000000e+00,7.000000e-01,3.000000e-01,0.000000e+00,7.692308e-01,5,1,9,0,9.333333e-01,6.666667e-02,8.333333e-01,1.000000e+00,9.000000e-01,1.000000e-01,0.000000e+00,9.090909e-01,1,0,10,4,7.333333e-01,2.666667e-01,1.000000e+00,2.000000e-01,1.000000e+00,0.000000e+00,8.000000e-01,3.333333e-01,5,0,0,0,5,0,3,1,1

Prediction Result Evaluation

The following is a classwise evaluation index list for each target class, \(i\). Weighted averages of evaluation indices are subsequently computed wherein the weight of a target class is the proportion of the occurrences of the class in the actual population.

Evaluation Index | Type | Description
true_positive_<i> (**) | int | Number of samples determined as positive for each target class, \(i\), correctly (TP).
false_positive_<i> (**) | int | Number of samples determined as positive for each target class, \(i\), incorrectly (FP).
true_negative_<i> (**) | int | Number of samples determined as negative for each target class, \(i\), correctly (TN).
false_negative_<i> (**) | int | Number of samples determined as negative for each target class, \(i\), incorrectly (FN).
accuracy_<i> | float | Proportion of true results for each target class, \(i\), in the population: \(\frac{\mbox{TP}_{i} + \mbox{TN}_{i}}{\mbox{TP}_{i} + \mbox{FP}_{i} + \mbox{TN}_{i} + \mbox{FN}_{i}}\)
classification_error_<i> | float | Proportion of false results for each target class, \(i\), in the population: \(\frac{\mbox{FP}_{i} + \mbox{FN}_{i}}{\mbox{TP}_{i} + \mbox{FP}_{i} + \mbox{TN}_{i} + \mbox{FN}_{i}} = 1 - \mbox{accuracy}_{i}\)
precision_<i> | float | Proportion of the true_positive of each target class, \(i\), against all samples determined as positive: \(\frac{\mbox{TP}_{i}}{\mbox{TP}_{i} + \mbox{FP}_{i}}\)
recall_<i> | float | Proportion of the true_positive of each target class, \(i\), against all the actual positive samples: \(\frac{\mbox{TP}_{i}}{\mbox{TP}_{i} + \mbox{FN}_{i}}\)
specificity_<i> | float | Proportion of the true_negative of each target class, \(i\), against all the actual negative samples: \(\frac{\mbox{TN}_{i}}{\mbox{TN}_{i} + \mbox{FP}_{i}}\)
false_positive_rate_<i> | float | Proportion of the false_positive of each target class, \(i\), against all the actual negative samples: \(\frac{\mbox{FP}_{i}}{\mbox{TN}_{i} + \mbox{FP}_{i}} = 1 - \mbox{specificity}_{i}\)
false_negative_rate_<i> | float | Proportion of the false_negative of each target class, \(i\), against all the actual positive samples: \(\frac{\mbox{FN}_{i}}{\mbox{TP}_{i} + \mbox{FN}_{i}} = 1 - \mbox{recall}_{i}\)
f_measure_<i> | float | Harmonic mean of precision and recall of each target class, \(i\): \(\frac{2 \times \mbox{precision}_{i} \times \mbox{recall}_{i}}{\mbox{precision}_{i} + \mbox{recall}_{i}}\)
cf_<i>_<j> (**) | int | Confusion matrix values that show the number of samples of actual class \(i\) predicted as \(j\). There are \(\mbox{num\_target\_classes}^{2}\) cf index values for every evaluation.

(**) Weighted average is not computed for this index.

When obtaining these evaluation results in SAMPO API, a pandas.DataFrame is loaded with the evaluation indices as the columns of the DataFrame.

See also

Obtaining process results via ProcessResultLoader

External Format

When convert_process is executed, the evaluation results are saved as a CSV file with the evaluation indices as the header of the CSV.

This file describes the evaluation of the prediction result of the component.

accuracy_weighted_average,classification_error_weighted_average,precision_weighted_average,recall_weighted_average,specificity_weighted_average,false_positive_rate_weighted_average,false_negative_rate_weighted_average,f_measure_weighted_average,true_positive_Iris-setosa,false_positive_Iris-setosa,true_negative_Iris-setosa,false_negative_Iris-setosa,accuracy_Iris-setosa,classification_error_Iris-setosa,precision_Iris-setosa,recall_Iris-setosa,specificity_Iris-setosa,false_positive_rate_Iris-setosa,false_negative_rate_Iris-setosa,f_measure_Iris-setosa,true_positive_Iris-versicolor,false_positive_Iris-versicolor,true_negative_Iris-versicolor,false_negative_Iris-versicolor,accuracy_Iris-versicolor,classification_error_Iris-versicolor,precision_Iris-versicolor,recall_Iris-versicolor,specificity_Iris-versicolor,false_positive_rate_Iris-versicolor,false_negative_rate_Iris-versicolor,f_measure_Iris-versicolor,true_positive_Iris-virginica,false_positive_Iris-virginica,true_negative_Iris-virginica,false_negative_Iris-virginica,accuracy_Iris-virginica,classification_error_Iris-virginica,precision_Iris-virginica,recall_Iris-virginica,specificity_Iris-virginica,false_positive_rate_Iris-virginica,false_negative_rate_Iris-virginica,f_measure_Iris-virginica,cf_Iris-setosa_Iris-setosa,cf_Iris-setosa_Iris-versicolor,cf_Iris-setosa_Iris-virginica,cf_Iris-versicolor_Iris-setosa,cf_Iris-versicolor_Iris-versicolor,cf_Iris-versicolor_Iris-virginica,cf_Iris-virginica_Iris-setosa,cf_Iris-virginica_Iris-versicolor,cf_Iris-virginica_Iris-virginica
8.222222e-01,1.777778e-01,8.194444e-01,7.333333e-01,8.666667e-01,1.333333e-01,2.666667e-01,6.705517e-01,5,3,7,0,8.000000e-01,2.000000e-01,6.250000e-01,1.000000e+00,7.000000e-01,3.000000e-01,0.000000e+00,7.692308e-01,5,1,9,0,9.333333e-01,6.666667e-02,8.333333e-01,1.000000e+00,9.000000e-01,1.000000e-01,0.000000e+00,9.090909e-01,1,0,10,4,7.333333e-01,2.666667e-01,1.000000e+00,2.000000e-01,1.000000e+00,0.000000e+00,8.000000e-01,3.333333e-01,5,0,0,0,5,0,3,1,1
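
The classwise indices and their weighted averages can be recomputed from the confusion matrix alone. The sketch below is an illustration rather than the component's implementation: the function names evaluate and weighted_average are hypothetical, the matrix is taken from the cf_<i>_<j> values of the example row, and no guard is included for classes where a denominator would be zero.

```python
def evaluate(cf, classes):
    """Compute classwise indices from cf[i][j], the number of samples of
    actual class i predicted as class j."""
    n = len(classes)
    total = sum(sum(row) for row in cf)
    result = {}
    for i, name in enumerate(classes):
        tp = cf[i][i]
        fn = sum(cf[i]) - tp                         # actual i, predicted other
        fp = sum(cf[r][i] for r in range(n)) - tp    # predicted i, actual other
        tn = total - tp - fn - fp
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        result[name] = {
            # Weight: proportion of the class in the actual population.
            "weight": (tp + fn) / total,
            "accuracy": (tp + tn) / total,
            "classification_error": (fp + fn) / total,
            "precision": precision,
            "recall": recall,
            "specificity": tn / (tn + fp),
            "f_measure": 2 * precision * recall / (precision + recall),
        }
    return result


def weighted_average(result, index):
    """Average an index over classes, weighted by actual class proportion."""
    return sum(m["weight"] * m[index] for m in result.values())


classes = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
# Rows are actual classes, columns are predicted classes (cf_<i>_<j>).
cf = [
    [5, 0, 0],
    [0, 5, 0],
    [3, 1, 1],
]
result = evaluate(cf, classes)
accuracy_avg = weighted_average(result, "accuracy")
f_measure_avg = weighted_average(result, "f_measure")
print(format(accuracy_avg, ".6e"), format(f_measure_avg, ".6e"))
```

The recomputed accuracy and f-measure averages agree with accuracy_weighted_average and f_measure_weighted_average in the example row.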

Details

If a data set has samples with missing or +/-Inf values, this component ignores those samples.