SVMCl Component Specification
Overview
The SVMCl component is a binary linear classification component built on the liblinear library. It currently supports the following solvers:
L2-regularized L2-loss support vector classification (dual)
L2-regularized L2-loss support vector classification (primal)
L2-regularized L1-loss support vector classification (dual)
L1-regularized L2-loss support vector classification
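As a rough illustration of how these solvers differ, the sketch below evaluates the objective that each one minimizes, following the formulations given in the liblinear documentation. The helper `svc_objective` and its arguments are hypothetical and are not part of the SAMPO API. The dual and primal variants of the L2-regularized L2-loss solver minimize the same objective, so only the regularizer and the loss are parameters here.

```python
def svc_objective(w, X, y, C=1.0, regularizer="l2", loss="l2"):
    """Objective value of a linear SVC (hypothetical helper, not SAMPO API).

    w: list of weights, X: list of feature lists, y: labels in {+1, -1}.
    """
    # Regularization term: 0.5 * ||w||_2^2 for L2, ||w||_1 for L1.
    if regularizer == "l2":
        reg = 0.5 * sum(wj * wj for wj in w)
    elif regularizer == "l1":
        reg = sum(abs(wj) for wj in w)
    else:
        raise ValueError("regularizer must be 'l1' or 'l2'")
    if loss not in ("l1", "l2"):
        raise ValueError("loss must be 'l1' or 'l2'")

    total_loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        hinge = max(0.0, 1.0 - margin)
        # L1-loss uses the hinge itself, L2-loss uses its square.
        total_loss += hinge if loss == "l1" else hinge ** 2
    return reg + C * total_loss


# Toy data, made up for illustration only.
X = [[1.0, 2.0], [2.0, 0.5]]
y = [1, -1]
w = [0.5, -0.5]

l2r_l2loss = svc_objective(w, X, y, regularizer="l2", loss="l2")
l2r_l1loss = svc_objective(w, X, y, regularizer="l2", loss="l1")
l1r_l2loss = svc_objective(w, X, y, regularizer="l1", loss="l2")
```

The three values differ only through the penalty on `w` and the way margin violations are counted, which is the practical difference between the solver types.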
Example:
SPD:
# svmcl.spd
dl1 -> svmcl1
---
components:
    dl1:
        component: DataLoader
    svmcl1:
        component: SVMClComponent
        features: name == 'Sepal.Length' or name == 'Sepal.Width'
        target: name == 'Species'
        positive_label: 'versicolor'
        solver_type: 'L1R_L2LOSS_SVC'
        epsilon: 0.01
        parameter_c: 1
        bias: 1.0
        weight: [1.0, 1.0]
global_settings:
    keep_attributes:
        - 'Species'
    feature_exclude:
        - 'Species'
Input of the component:
_sid | Sepal.Length | Sepal.Width | Species
---|---|---|---
0 | 4.9 | 2.5 | virginica
1 | 6.2 | 2.8 | virginica
2 | 7.2 | 3.6 | virginica
… | … | … | …
28 | 6.2 | 2.9 | versicolor
29 | 6.7 | 3.1 | versicolor
Output of the component:
_sid | svmcl1_actual | svmcl1_predict | svmcl1_score
---|---|---|---
0 | -1 | 1 | 2.657069e+00
1 | -1 | 1 | 6.524541e-01
2 | -1 | -1 | -1.600153e+00
… | … | … | …
28 | 1 | 1 | 6.524541e-01
29 | 1 | -1 | -1.080094e+00
This component has component-specific external formats for model and prediction result evaluation.
See also
Component-common external format files in convert_process
Parameters
This section describes the component-specific parameters of the SVMCl component.
SPD
The following parameters are set in the "components" section of the SPD.
Parameter Name | Type | Domain | Default Value | Description
---|---|---|---|---
positive_label [1] | str | See Description | – | The value of the target attribute to be treated as positive. The domain of this parameter corresponds to that of the target attribute.
solver_type | str | See Description | 'L1R_L2LOSS_SVC' | Specifies the solver type, chosen from the solvers listed in the Overview.
epsilon | float | (0, inf) | See Description | Tolerance of the termination criterion. The default value of this parameter depends on the solver type.
parameter_c | float | (0, inf) | 1 | The parameter C, which is the cost of constraint violation.
bias | float | [0, inf) | – | If bias >= 0, then instance x becomes [x; bias].
weight | list of two float values | (0, inf) for each element | – | Weights that adjust the parameter C for each class, given in the order positive class, negative class.

[1] Required parameter
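The effect of bias and weight can be sketched in a few lines. This is a minimal illustration that assumes the liblinear conventions: the bias value is appended to each instance as an extra feature, and the cost for each class is parameter_c multiplied by that class's weight. The helper names are hypothetical and are not part of the SAMPO API.

```python
def augment_with_bias(x, bias):
    """Return [x; bias] when bias >= 0, otherwise x unchanged."""
    return list(x) + [bias] if bias >= 0 else list(x)


def per_class_cost(parameter_c, weight):
    """Effective C per class, assuming C is scaled by the class weight.

    weight is given in the order (positive class, negative class).
    """
    positive_weight, negative_weight = weight
    return {
        "positive": parameter_c * positive_weight,
        "negative": parameter_c * negative_weight,
    }


# Feature values taken from the first input row of the example above.
augmented = augment_with_bias([4.9, 2.5], bias=1.0)

# Illustrative weights: violations on positive samples cost four times more.
costs = per_class_cost(parameter_c=1, weight=[2.0, 0.5])
```

Raising the weight of one class makes margin violations on that class more expensive, which can help when the two classes are imbalanced.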
Output Attributes
The SVMCl component generates the following attributes:
Attribute Name | Scale | Description
---|---|---
<component_id>_actual | INTEGER | Binarized values of the target attribute based on positive_label.
<component_id>_predict | INTEGER | Predicted values.
<component_id>_score | REAL | Decision scores. A prediction result can be obtained by classifying these values according to a boundary.
These attributes are part of the component output data and can be loaded through the SAMPO API.
See also
Obtaining process results via ProcessResultLoader.
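The relationship between the three output attributes can be sketched as follows. This is an illustration only: it assumes that the positive class is encoded as 1 and the negative class as -1, and that the decision boundary on the score is 0, which is consistent with the example output shown earlier but is not stated explicitly in this specification. The function names are hypothetical.

```python
def binarize(target_values, positive_label):
    """Map the target attribute to 1 (positive) or -1 (negative)."""
    return [1 if value == positive_label else -1 for value in target_values]


def predict_from_score(scores, boundary=0.0):
    """Classify scores against a boundary (assumed to be 0 here)."""
    return [1 if score > boundary else -1 for score in scores]


# Rows 0, 1, 2, 28 and 29 of the example input and output tables.
species = ["virginica", "virginica", "virginica", "versicolor", "versicolor"]
scores = [2.657069e+00, 6.524541e-01, -1.600153e+00, 6.524541e-01, -1.080094e+00]

actual = binarize(species, positive_label="versicolor")
predict = predict_from_score(scores)
```

With these assumptions, `actual` and `predict` reproduce the svmcl1_actual and svmcl1_predict columns for the same rows of the example.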
When convert_process is executed, the component output data will be saved in <component_id>_predict_result.csv.
This file describes a prediction result by the component:
_sid,svmcl1_actual,svmcl1_predict,svmcl1_score
0,1,1,8.554352e-01
1,1,1,1.272770e+00
2,1,1,1.168148e+00
3,1,1,1.428549e+00
...
36,-1,-1,-1.363943e+00
37,-1,-1,-1.205856e+00
38,-1,-1,-4.361886e-01
39,-1,-1,-1.260474e+00
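A file in this format can be processed with the standard csv module. The sketch below counts the four confusion-matrix cells from such a file. The inline text is a shortened copy of the rows shown above, and the helper `confusion_counts` is hypothetical rather than part of the SAMPO API.

```python
import csv
import io

# Shortened copy of <component_id>_predict_result.csv for illustration.
csv_text = """_sid,svmcl1_actual,svmcl1_predict,svmcl1_score
0,1,1,8.554352e-01
1,1,1,1.272770e+00
36,-1,-1,-1.363943e+00
37,-1,-1,-1.205856e+00
"""


def confusion_counts(rows, component_id):
    """Count TP, FP, TN and FN from rows of a prediction result file."""
    counts = {
        "true_positive": 0,
        "false_positive": 0,
        "true_negative": 0,
        "false_negative": 0,
    }
    for row in rows:
        actual = int(row[f"{component_id}_actual"])
        predicted = int(row[f"{component_id}_predict"])
        if predicted == 1:
            key = "true_positive" if actual == 1 else "false_positive"
        else:
            key = "true_negative" if actual == -1 else "false_negative"
        counts[key] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(csv_text)))
counts = confusion_counts(rows, "svmcl1")
```

For a real file, replace `io.StringIO(csv_text)` with an open file handle.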
Attribute Metadata
The metadata of the output attributes is created with the following rules.
Context Rule
Attribute Name | Context Name | Description
---|---|---
All the output attributes of this component | field_path | List of the superordinate concepts of each output attribute, based on the hierarchical structure of the output attributes shown after this table.
<component_id>_actual, <component_id>_predict | positive_map | Mapping between a positive value and a positive label.
<component_id>_actual, <component_id>_predict | negative_map | Mapping between a negative value and a negative label.

Hierarchical structure of the output attributes:
root
└── binary_classification
    ├── actual
    ├── predict
    └── score
Derivation Rule
Attribute Name | Derived From
---|---
<component_id>_actual | Derived from the target attribute.
<component_id>_predict | Derived from the attributes which have non-zero coefficients in any prediction formula.
<component_id>_score | Derived from the attributes which have non-zero coefficients in any prediction formula.
Example
{
"nodes": [
{"aid": "dl1[1]", "name": "sepal_width_in_cm", "scale": "real", "is_excluded": false,
"cid": "dl1", "cindex": 1, "values": null, "is_kept": false, "context": null},
{"aid": "svmcl1[1]", "name": "svmcl1_predict", "scale": "integer", "is_excluded": false,
"cid": "svmcl1", "cindex": 1, "values": null, "is_kept": false,
"context":
{"field_path": ["binary_classification", "predict"],
"positive_map": {"1": ["Iris-setosa"]},
"negative_map": {"-1": ["Iris-versicolor"]}}},
{"aid": "dl1[0]", "name": "sepal_length_in_cm", "scale": "real", "is_excluded": false,
"cid": "dl1", "cindex": 0, "values": null, "is_kept": false, "context": null},
{"aid": "dl1[2]", "name": "petal_length_in_cm", "scale": "real", "is_excluded": false,
"cid": "dl1", "cindex": 2, "values": null, "is_kept": false, "context": null},
{"aid": "svmcl1[2]", "name": "svmcl1_score", "scale": "real", "is_excluded": false,
"cid": "svmcl1", "cindex": 2, "values": null, "is_kept": false,
"context":
{"field_path": ["binary_classification", "score"]}},
{"aid": "_sid", "name": "_sid", "scale": "integer", "is_excluded": false,
"cid": null, "cindex": 0, "values": null, "is_kept": false, "context": null},
{"aid": "svmcl1[0]", "name": "svmcl1_actual", "scale": "integer", "is_excluded": false,
"cid": "svmcl1", "cindex": 0, "values": null, "is_kept": false,
"context":
{"field_path": ["binary_classification", "actual"],
"positive_map": {"1": ["Iris-setosa"]},
"negative_map": {"-1": ["Iris-versicolor"]}}},
{"aid": "dl1[3]", "name": "petal_width_in_cm", "scale": "real", "is_excluded": false,
"cid": "dl1", "cindex": 3, "values": null, "is_kept": false, "context": null},
{"aid": "dl1[4]", "name": "class", "scale": "nominal", "is_excluded": true,
"cid": "dl1", "cindex": 4, "values": ["Iris-setosa", "Iris-versicolor"],
"is_kept": true, "context": null}
],
"links": [
{"source": "dl1[0]", "target": "svmcl1[1]"},
{"source": "dl1[0]", "target": "svmcl1[2]"},
{"source": "dl1[2]", "target": "svmcl1[1]"},
{"source": "dl1[2]", "target": "svmcl1[2]"},
{"source": "dl1[4]", "target": "svmcl1[0]"}
]
}
See also
Attribute metadata file format in Attribute Metadata File Specification
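The derivation rules can be read back from the "links" section of such a metadata file. The sketch below uses a trimmed copy of the example above, keeping only the "aid" and "name" fields, and groups source attribute names by the output attribute they feed. The helper `derived_from` is hypothetical and is not part of the SAMPO API.

```python
import json

# Trimmed copy of the example metadata (only aid and name are kept).
metadata_text = """
{"nodes": [
   {"aid": "dl1[0]", "name": "sepal_length_in_cm"},
   {"aid": "dl1[2]", "name": "petal_length_in_cm"},
   {"aid": "dl1[4]", "name": "class"},
   {"aid": "svmcl1[0]", "name": "svmcl1_actual"},
   {"aid": "svmcl1[1]", "name": "svmcl1_predict"},
   {"aid": "svmcl1[2]", "name": "svmcl1_score"}],
 "links": [
   {"source": "dl1[0]", "target": "svmcl1[1]"},
   {"source": "dl1[0]", "target": "svmcl1[2]"},
   {"source": "dl1[2]", "target": "svmcl1[1]"},
   {"source": "dl1[2]", "target": "svmcl1[2]"},
   {"source": "dl1[4]", "target": "svmcl1[0]"}]}
"""


def derived_from(metadata):
    """Map each derived attribute name to the names of its sources."""
    names = {node["aid"]: node["name"] for node in metadata["nodes"]}
    sources = {}
    for link in metadata["links"]:
        target_name = names[link["target"]]
        sources.setdefault(target_name, []).append(names[link["source"]])
    return sources


derivation = derived_from(json.loads(metadata_text))
```

In this example, the actual attribute traces back to the target attribute only, while predict and score trace back to the two features that have non-zero coefficients.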
Model
The model of this component can be described by its parameters.
SVMCl Model Parameters | Type | Domain | Description
---|---|---|---
prediction_formula | pandas.DataFrame | See Description | DataFrame containing the weight of each feature and the bias.
When loaded in the SAMPO API, the model is represented as a dict of its parameters.
See also
Obtaining process results via ProcessResultLoader.
{'prediction_formula':
sepal_length_in_cm 0.2281450981660121
petal_length_in_cm -0.9329267820373003
bias 1.253715607666645
dtype: float64}
External Format
This file describes the weights of each attribute:
aid,attr_name,prediction_formula
dl1[0],sepal_length_in_cm,0.2281450981660121
dl1[2],petal_length_in_cm,-0.9329267820373003
,bias,1.253715607666645
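To show how such a file could be used, the sketch below computes a score for one sample as a weighted sum of its feature values. It assumes that the "bias" entry is added as a constant term, which holds if the bias feature value is 1.0 as in the SPD example; this specification does not confirm that convention. The sample values and the helper names are hypothetical.

```python
import csv
import io

# Copy of the model external format shown above.
model_csv = """aid,attr_name,prediction_formula
dl1[0],sepal_length_in_cm,0.2281450981660121
dl1[2],petal_length_in_cm,-0.9329267820373003
,bias,1.253715607666645
"""


def load_formula(text):
    """Read attribute names and their weights from the model CSV."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["attr_name"]: float(row["prediction_formula"]) for row in reader}


def linear_score(formula, sample):
    """Weighted sum of feature values plus the bias entry.

    Assumption: the 'bias' weight is added as-is (bias feature value 1.0).
    """
    total = formula.get("bias", 0.0)
    for name, coefficient in formula.items():
        if name != "bias":
            total += coefficient * sample[name]
    return total


formula = load_formula(model_csv)
# Hypothetical sample, not taken from the specification.
sample = {"sepal_length_in_cm": 5.1, "petal_length_in_cm": 1.4}
score = linear_score(formula, sample)
```

Under the further assumption that the decision boundary is 0, a positive score corresponds to the positive class.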
Prediction Result Evaluation
The indices used in evaluating prediction results of this component are described below.
Evaluation Index | Type | Description
---|---|---
true_positive | int | Number of samples determined as positive correctly (TP).
false_positive | int | Number of samples determined as positive incorrectly (FP).
true_negative | int | Number of samples determined as negative correctly (TN).
false_negative | int | Number of samples determined as negative incorrectly (FN).
accuracy | float | Proportion of true results in the population as shown below: \(\frac{\mbox{TP} + \mbox{TN}}{\mbox{TP} + \mbox{FP} + \mbox{TN} + \mbox{FN}}\)
classification_error | float | Proportion of false results in the population as shown below: \(\frac{\mbox{FP} + \mbox{FN}}{\mbox{TP} + \mbox{FP} + \mbox{TN} + \mbox{FN}} = 1 - \mbox{accuracy}\)
precision | float | Proportion of the true_positive against all samples determined as positive as shown below: \(\frac{\mbox{TP}}{\mbox{TP} + \mbox{FP}}\)
recall | float | Proportion of the true_positive against all the actual positive samples as shown below: \(\frac{\mbox{TP}}{\mbox{TP} + \mbox{FN}}\)
specificity | float | Proportion of the true_negative against all the actual negative samples as shown below: \(\frac{\mbox{TN}}{\mbox{TN} + \mbox{FP}}\)
false_positive_rate | float | Proportion of the false_positive against all the actual negative samples as shown below: \(\frac{\mbox{FP}}{\mbox{TN} + \mbox{FP}} = 1 - \mbox{specificity}\)
false_negative_rate | float | Proportion of the false_negative against all the actual positive samples as shown below: \(\frac{\mbox{FN}}{\mbox{TP} + \mbox{FN}} = 1 - \mbox{recall}\)
f_measure | float | Harmonic mean of precision and recall as shown below: \(\frac{2 \times \mbox{precision} \times \mbox{recall}}{\mbox{precision} + \mbox{recall}}\)
auc | float | Area under ROC (Receiver Operating Characteristic) curve.
area_under_precision_recall | float | Area under PR (Precision-Recall) curve.
When these evaluation results are obtained through the SAMPO API, they are loaded as a pandas.DataFrame whose columns are the evaluation indices.
See also
Obtaining process results via ProcessResultLoader
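The count-based indices in the table above follow directly from the four confusion-matrix cells. The sketch below implements those formulas in plain Python. The function `evaluate` is hypothetical, and returning NaN for a zero denominator is an assumption made here rather than documented behavior of the component. The curve-based indices (auc, area_under_precision_recall) need the scores and are not covered.

```python
def evaluate(tp, fp, tn, fn):
    """Compute the count-based evaluation indices from TP, FP, TN and FN."""

    def ratio(numerator, denominator):
        # Assumption: undefined ratios are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    total = tp + fp + tn + fn
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, total),
        "classification_error": ratio(fp + fn, total),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "false_positive_rate": ratio(fp, tn + fp),
        "false_negative_rate": ratio(fn, tp + fn),
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }


# A perfect classifier on 60 samples, and a made-up imperfect one.
perfect = evaluate(tp=30, fp=0, tn=30, fn=0)
mixed = evaluate(tp=3, fp=1, tn=4, fn=2)
```

The identities in the table, such as classification_error = 1 - accuracy, can be checked against the returned dictionary.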
External Format
When convert_process is executed, the evaluation results are saved as a CSV file with the evaluation indices as the header of the CSV.
This file describes the evaluation for a prediction result by the component:
true_positive,false_positive,true_negative,false_negative,accuracy,classification_error,precision,recall,specificity,false_positive_rate,false_negative_rate,f_measure,auc,area_under_precision_recall
30,0,30,0,1.000000e+00,0.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,0.000000e+00,0.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00