SVMCl Component Specification

Overview

The SVMCl component is a binary linear classification component that uses the liblinear library. It currently supports the following solvers:

  • L2-regularized L2-loss support vector classification (dual)

  • L2-regularized L2-loss support vector classification (primal)

  • L2-regularized L1-loss support vector classification (dual)

  • L1-regularized L2-loss support vector classification

Example:

  • SPD:

    # svmcl.spd
    
    dl1 -> svmcl1
    
    ---
    
    components:
        dl1:
            component: DataLoader
        svmcl1:
            component: SVMClComponent
            features: name == 'Sepal.Length' or name == 'Sepal.Width'
            target: name == 'Species'
            positive_label: 'versicolor'
            solver_type: 'L1R_L2LOSS_SVC'
            epsilon: 0.01
            parameter_c: 1
            bias: 1.0
            weight: [1.0, 1.0]
    
    global_settings:
        keep_attributes:
            - 'Species'
        feature_exclude:
            - 'Species'
    
  • Input of the component:

    _sid  Sepal.Length  Sepal.Width  Species
    0     4.9           2.5          virginica
    1     6.2           2.8          virginica
    2     7.2           3.6          virginica
    28    6.2           2.9          versicolor
    29    6.7           3.1          versicolor


  • Output of the component:

    _sid  svmcl1_actual  svmcl1_predict  svmcl1_score
    0     -1             1               2.657069e+00
    1     -1             1               6.524541e-01
    2     -1             -1              -1.600153e+00
    28    1              1               6.524541e-01
    29    1              -1              -1.080094e+00
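
The svmcl1_actual column follows from positive_label: in the example, samples whose Species equals 'versicolor' become 1 and all other samples become -1. A minimal Python sketch of that mapping is given here. The helper name binarize_target is hypothetical and is not part of the component, and the ±1 encoding is taken from the example output rather than from the component's source.

```python
def binarize_target(labels, positive_label):
    """Map each label to 1 if it equals positive_label, otherwise to -1.

    Hypothetical helper for illustration; the +1/-1 encoding is an
    assumption read off the example output.
    """
    return [1 if label == positive_label else -1 for label in labels]


# Species values of the five input samples shown in the example.
species = ["virginica", "virginica", "virginica", "versicolor", "versicolor"]
actual = binarize_target(species, positive_label="versicolor")
print(actual)  # [-1, -1, -1, 1, 1]
```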

This component has component-specific external formats for model and prediction result evaluation.

See also

Component-common external format files in convert_process


Parameters

Here are the component-specific parameters for the SVMCl component.

SPD

The following parameters are for the “components” section of the SPD.

Parameter Name: positive_label [1]
    Type: str
    Domain: See Description
    Description: Choose one value of the target attribute to be treated as the positive class. The domain of this parameter is the set of values of the target attribute.

Parameter Name: solver_type
    Type: str
    Domain: See Description
    Default Value: 'L1R_L2LOSS_SVC'
    Description: Specifies the solver type. One of the following:

      • L2R_L2LOSS_SVC_DUAL: L2-regularized L2-loss support vector classification (dual)

      • L2R_L2LOSS_SVC: L2-regularized L2-loss support vector classification (primal)

      • L2R_L1LOSS_SVC_DUAL: L2-regularized L1-loss support vector classification (dual)

      • L1R_L2LOSS_SVC: L1-regularized L2-loss support vector classification

Parameter Name: epsilon
    Type: float
    Domain: (0, inf)
    Default Value: See Description
    Description: Sets the tolerance of the termination criterion. The default value depends on solver_type:

      • L2R_L2LOSS_SVC_DUAL or L2R_L1LOSS_SVC_DUAL

        • Dual maximal violation <= eps; similar to libsvm (default 0.1).

      • L2R_L2LOSS_SVC

        • |f'(w)|_2 <= eps*min(pos,neg)/l*|f'(w0)|_2, where f is the primal function and pos/neg are the numbers of positive/negative data (default 0.01).

      • L1R_L2LOSS_SVC

        • |f'(w)|_inf <= eps*min(pos,neg)/l*|f'(w0)|_inf, where f is the primal function (default 0.01).

Parameter Name: parameter_c
    Type: float
    Domain: (0, inf)
    Default Value: 1
    Description: Sets the parameter C, the cost of constraint violation.

Parameter Name: bias
    Type: float
    Domain: [0, inf)
    Description: If bias >= 0, then instance x becomes [x; bias].

Parameter Name: weight
    Type: list of two float values
    Domain: (0, inf) for each element
    Description: Weights that adjust the parameter C for each class. The weights are given in the order positive class, negative class.

[1] Required parameter
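
To make the roles of parameter_c, weight and bias concrete, here is a small illustrative sketch. It assumes the liblinear convention that each class weight multiplies C for that class, and that bias is appended to every instance as one extra constant feature. The function names effective_costs and append_bias are hypothetical and are not part of the component.

```python
def effective_costs(parameter_c, weight):
    """Return the per-class cost C, assuming cost = weight * parameter_c.

    weight is given in the order [positive class, negative class].
    """
    positive_weight, negative_weight = weight
    return {
        "positive": parameter_c * positive_weight,
        "negative": parameter_c * negative_weight,
    }


def append_bias(x, bias):
    """Return [x; bias] when bias >= 0, otherwise x unchanged."""
    return list(x) + [bias] if bias >= 0 else list(x)


# Values taken from the SPD example: parameter_c: 1, weight: [1.0, 1.0], bias: 1.0
costs = effective_costs(parameter_c=1, weight=[1.0, 1.0])
augmented = append_bias([4.9, 2.5], bias=1.0)
print(costs)      # {'positive': 1.0, 'negative': 1.0}
print(augmented)  # [4.9, 2.5, 1.0]
```

With equal weights both classes are penalized identically; a larger first element would make misclassified positive samples more costly than negative ones.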


Utilizable Sample Metadata

There are no component-specific sample metadata available.


Output Attributes

The SVMCl component generates the following attributes:

Attribute Name            Scale     Description
<component_id>_actual     INTEGER   Binarized values of the target attribute based on positive_label.
<component_id>_predict    INTEGER   Predicted values.
<component_id>_score      REAL      A prediction result can be obtained by classifying this value according to a boundary.

These attributes are in the component output data. They can be loaded through the SAMPO API.

See also

Obtaining process results via ProcessResultLoader.

When convert_process is executed, the component output data will be saved in <component_id>_predict_result.csv.

This file describes a prediction result by the component:

_sid,svmcl1_actual,svmcl1_predict,svmcl1_score
0,1,1,8.554352e-01
1,1,1,1.272770e+00
2,1,1,1.168148e+00
3,1,1,1.428549e+00
...
36,-1,-1,-1.363943e+00
37,-1,-1,-1.205856e+00
38,-1,-1,-4.361886e-01
39,-1,-1,-1.260474e+00
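
As a quick consistency check on such a file, the sketch below re-derives the predicted class from the score column and compares it with the stored prediction. It assumes a decision boundary of 0 (positive score means class 1), which matches the rows shown here but is not stated by the specification. The sample rows are embedded as a string; a real file would be read from <component_id>_predict_result.csv instead.

```python
import csv
import io

# Rows copied from the example file above (the elided middle rows are omitted).
PREDICT_RESULT_CSV = """\
_sid,svmcl1_actual,svmcl1_predict,svmcl1_score
0,1,1,8.554352e-01
1,1,1,1.272770e+00
2,1,1,1.168148e+00
3,1,1,1.428549e+00
36,-1,-1,-1.363943e+00
37,-1,-1,-1.205856e+00
38,-1,-1,-4.361886e-01
39,-1,-1,-1.260474e+00
"""


def classify(score, boundary=0.0):
    """Return 1 when score is above the boundary, otherwise -1 (assumed rule)."""
    return 1 if score > boundary else -1


rows = list(csv.DictReader(io.StringIO(PREDICT_RESULT_CSV)))
rederived = [classify(float(row["svmcl1_score"])) for row in rows]
stored = [int(row["svmcl1_predict"]) for row in rows]
print(rederived == stored)  # True for the rows above
```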

Attribute Metadata

The metadata of the output attributes is created with the following rules.

Context Rule

Attribute Name: All the output attributes of this component
    Context Name: field_path
    Description: List of the superordinate concepts of each output attribute based on the following hierarchical structure of the output attributes:

    root
    └── binary_classification
       ├── actual
       ├── predict
       └── score

Attribute Name: <component_id>_actual, <component_id>_predict
    Context Name: positive_map
    Description: Mapping between a positive value and a positive label.

Attribute Name: <component_id>_actual, <component_id>_predict
    Context Name: negative_map
    Description: Mapping between a negative value and a negative label.

Derivation Rule

Attribute Name            Derived From
<component_id>_actual     Derived from the target attribute.
<component_id>_predict    Derived from the attributes which have non-zero coefficients in any prediction formula.
<component_id>_score      Derived from the attributes which have non-zero coefficients in any prediction formula.

Example

{
    "nodes": [
        {"aid": "dl1[1]", "name": "sepal_width_in_cm", "scale": "real", "is_excluded": false,
         "cid": "dl1", "cindex": 1, "values": null, "is_kept": false, "context": null},
        {"aid": "svmcl1[1]", "name": "svmcl1_predict", "scale": "integer", "is_excluded": false,
         "cid": "svmcl1", "cindex": 1, "values": null, "is_kept": false,
         "context":
             {"field_path": ["binary_classification", "predict"],
              "positive_map": {"1": ["Iris-setosa"]},
              "negative_map": {"-1": ["Iris-versicolor"]}}},
        {"aid": "dl1[0]", "name": "sepal_length_in_cm", "scale": "real", "is_excluded": false,
         "cid": "dl1", "cindex": 0, "values": null, "is_kept": false, "context": null},
        {"aid": "dl1[2]", "name": "petal_length_in_cm", "scale": "real", "is_excluded": false,
         "cid": "dl1", "cindex": 2, "values": null, "is_kept": false, "context": null},
        {"aid": "svmcl1[2]", "name": "svmcl1_score", "scale": "real", "is_excluded": false,
         "cid": "svmcl1", "cindex": 2, "values": null, "is_kept": false,
         "context":
             {"field_path": ["binary_classification", "score"]}},
        {"aid": "_sid", "name": "_sid", "scale": "integer", "is_excluded": false,
         "cid": null, "cindex": 0, "values": null, "is_kept": false, "context": null},
        {"aid": "svmcl1[0]", "name": "svmcl1_actual", "scale": "integer", "is_excluded": false,
         "cid": "svmcl1", "cindex": 0, "values": null, "is_kept": false,
         "context":
             {"field_path": ["binary_classification", "actual"],
              "positive_map": {"1": ["Iris-setosa"]},
              "negative_map": {"-1": ["Iris-versicolor"]}}},
        {"aid": "dl1[3]", "name": "petal_width_in_cm", "scale": "real", "is_excluded": false,
         "cid": "dl1", "cindex": 3, "values": null, "is_kept": false, "context": null},
        {"aid": "dl1[4]", "name": "class", "scale": "nominal", "is_excluded": true,
         "cid": "dl1", "cindex": 4, "values": ["Iris-setosa", "Iris-versicolor"],
         "is_kept": true, "context": null}
    ],
    "links": [
        {"source": "dl1[0]", "target": "svmcl1[1]"},
        {"source": "dl1[0]", "target": "svmcl1[2]"},
        {"source": "dl1[2]", "target": "svmcl1[1]"},
        {"source": "dl1[2]", "target": "svmcl1[2]"},
        {"source": "dl1[4]", "target": "svmcl1[0]"}
    ]
}
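
The "links" array encodes the derivation rule: each entry points from a source attribute to the output attribute derived from it. The sketch below groups sources by target so the inputs of each output can be read off directly. Only the "links" part of the example is reproduced, and the function name sources_by_target is hypothetical.

```python
import json
from collections import defaultdict

# The "links" section of the example metadata above.
METADATA_JSON = """
{
    "links": [
        {"source": "dl1[0]", "target": "svmcl1[1]"},
        {"source": "dl1[0]", "target": "svmcl1[2]"},
        {"source": "dl1[2]", "target": "svmcl1[1]"},
        {"source": "dl1[2]", "target": "svmcl1[2]"},
        {"source": "dl1[4]", "target": "svmcl1[0]"}
    ]
}
"""


def sources_by_target(metadata):
    """Group the source attribute ids of all links by their target attribute id."""
    derived = defaultdict(list)
    for link in metadata["links"]:
        derived[link["target"]].append(link["source"])
    return dict(derived)


derived_from = sources_by_target(json.loads(METADATA_JSON))
for target, sources in sorted(derived_from.items()):
    print(target, "<-", sources)
```

In the example, svmcl1[0] (the actual value) is linked only to dl1[4], the target attribute, while svmcl1[1] and svmcl1[2] are linked to dl1[0] and dl1[2], the two features that appear in the model.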

See also

Attribute metadata file format in Attribute Metadata File Specification


Model

The model of this component can be described by its parameters.

SVMCl Model Parameters   Type               Domain            Description
prediction_formula       pandas.DataFrame   See Description   DataFrame containing the weight of each feature and the bias.

When loaded in the SAMPO API, the model is represented as a dict of its parameters.

See also

Obtaining process results via ProcessResultLoader.

{'prediction_formula':
    sepal_length_in_cm     0.2281450981660121
    petal_length_in_cm    -0.9329267820373003
    bias                   1.253715607666645
    dtype: float64}

External Format

This file describes the weights of each attribute:

aid,attr_name,prediction_formula
dl1[0],sepal_length_in_cm,0.2281450981660121
dl1[2],petal_length_in_cm,-0.9329267820373003
,bias,1.253715607666645
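
To illustrate how such a model file could be used, the sketch below reads the weights and computes a linear score for one sample. It assumes the score is the weighted sum of the listed features plus the bias entry added as a constant; the specification does not spell out this formula, and the sample values are hypothetical.

```python
import csv
import io

# Contents of the example model file above.
MODEL_CSV = """\
aid,attr_name,prediction_formula
dl1[0],sepal_length_in_cm,0.2281450981660121
dl1[2],petal_length_in_cm,-0.9329267820373003
,bias,1.253715607666645
"""


def load_weights(text):
    """Return a dict mapping attr_name to its weight."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["attr_name"]: float(row["prediction_formula"]) for row in reader}


def linear_score(weights, sample):
    """Weighted sum of the sample's features plus the bias entry (assumed formula)."""
    score = weights.get("bias", 0.0)
    for name, weight in weights.items():
        if name != "bias":
            score += weight * sample[name]
    return score


weights = load_weights(MODEL_CSV)
sample = {"sepal_length_in_cm": 5.0, "petal_length_in_cm": 1.5}  # hypothetical sample
score = linear_score(weights, sample)
print(round(score, 6))  # 0.995051
```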

Prediction Result Evaluation

The indices used in evaluating prediction results of this component are described below.

Evaluation Index: true_positive
    Type: int
    Description: Number of samples determined as positive correctly (TP).

Evaluation Index: false_positive
    Type: int
    Description: Number of samples determined as positive incorrectly (FP).

Evaluation Index: true_negative
    Type: int
    Description: Number of samples determined as negative correctly (TN).

Evaluation Index: false_negative
    Type: int
    Description: Number of samples determined as negative incorrectly (FN).

Evaluation Index: accuracy
    Type: float
    Description: Proportion of true results in the population as shown below:

    \(\frac{\mbox{TP} + \mbox{TN}}{\mbox{TP} + \mbox{FP} + \mbox{TN} + \mbox{FN}}\)

Evaluation Index: classification_error
    Type: float
    Description: Proportion of false results in the population as shown below:

    \(\frac{\mbox{FP} + \mbox{FN}}{\mbox{TP} + \mbox{FP} + \mbox{TN} + \mbox{FN}} = 1 - \mbox{accuracy}\)

Evaluation Index: precision
    Type: float
    Description: Proportion of the true_positive against all samples determined as positive as shown below:

    \(\frac{\mbox{TP}}{\mbox{TP} + \mbox{FP}}\)

Evaluation Index: recall
    Type: float
    Description: Proportion of the true_positive against all the actual positive samples as shown below:

    \(\frac{\mbox{TP}}{\mbox{TP} + \mbox{FN}}\)

Evaluation Index: specificity
    Type: float
    Description: Proportion of the true_negative against all the actual negative samples as shown below:

    \(\frac{\mbox{TN}}{\mbox{TN} + \mbox{FP}}\)

Evaluation Index: false_positive_rate
    Type: float
    Description: Proportion of the false_positive against all the actual negative samples as shown below:

    \(\frac{\mbox{FP}}{\mbox{TN} + \mbox{FP}} = 1 - \mbox{specificity}\)

Evaluation Index: false_negative_rate
    Type: float
    Description: Proportion of the false_negative against all the actual positive samples as shown below:

    \(\frac{\mbox{FN}}{\mbox{TP} + \mbox{FN}} = 1 - \mbox{recall}\)

Evaluation Index: f_measure
    Type: float
    Description: Harmonic mean of precision and recall as shown below:

    \(\frac{2 \times \mbox{precision} \times \mbox{recall}}{\mbox{precision} + \mbox{recall}}\)

Evaluation Index: auc
    Type: float
    Description: Area under ROC (Receiver Operating Characteristic) curve.

Evaluation Index: area_under_precision_recall
    Type: float
    Description: Area under PR (Precision-Recall) curve.

When these evaluation results are obtained through the SAMPO API, they are loaded as a pandas.DataFrame whose columns are the evaluation indices.

See also

Obtaining process results via ProcessResultLoader

External Format

When convert_process is executed, the evaluation results are saved as a CSV file with the evaluation indices as the header of the CSV.

This file describes the evaluation for a prediction result by the component:

true_positive,false_positive,true_negative,false_negative,accuracy,classification_error,precision,recall,specificity,false_positive_rate,false_negative_rate,f_measure,auc,area_under_precision_recall
30,0,30,0,1.000000e+00,0.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,0.000000e+00,0.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00
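
The count-based indices in this file can be recomputed from the four confusion-matrix counts using the formulas given in the table of evaluation indices. The sketch below does so for the example row. The function name evaluate is hypothetical, auc and area_under_precision_recall are left out because they need the per-sample scores, and zero denominators are not handled.

```python
def evaluate(tp, fp, tn, fn):
    """Compute the count-based evaluation indices from TP, FP, TN and FN."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / total,
        "classification_error": (fp + fn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "false_positive_rate": fp / (tn + fp),
        "false_negative_rate": fn / (tp + fn),
        "f_measure": 2 * precision * recall / (precision + recall),
    }


# Counts from the example row: true_positive=30, false_positive=0,
# true_negative=30, false_negative=0.
metrics = evaluate(tp=30, fp=0, tn=30, fn=0)
print(metrics["accuracy"], metrics["classification_error"])  # 1.0 0.0
```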

Details

If a data set has samples with missing or +/-Inf values, this component ignores those samples.
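
The effect of this rule can be pictured with a short sketch that drops every sample containing a missing or non-finite value. It illustrates the described behavior only and is not the component's implementation; representing missing values as None or NaN is an assumption, and the sample values are made up.

```python
import math


def has_only_finite_values(sample):
    """Return True when no value in the sample is missing (None/NaN) or +/-Inf."""
    return all(value is not None and math.isfinite(value) for value in sample)


samples = [
    [4.9, 2.5],
    [6.2, None],            # missing value
    [7.2, float("inf")],    # +Inf
    [6.2, float("nan")],    # NaN treated as missing (assumption)
    [6.7, 3.1],
]
kept = [sample for sample in samples if has_only_finite_values(sample)]
print(kept)  # [[4.9, 2.5], [6.7, 3.1]]
```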