=============================
SVMCl Component Specification
=============================

.. contents:: Contents
    :local:

Overview
========
The **SVMCl component** is a binary linear classification component built on the **liblinear** library.
It currently supports the following solvers:

*  L2-regularized L2-loss support vector classification (dual)
*  L2-regularized L2-loss support vector classification (primal)
*  L2-regularized L1-loss support vector classification (dual)
*  L1-regularized L2-loss support vector classification

**Example**:

* SPD:

  .. code-block:: python

    # svmcl.spd

    dl1 -> svmcl1

    ---

    components:
        dl1:
            component: DataLoader
        svmcl1:
            component: SVMClComponent
            features: name == 'Sepal.Length' or name == 'Sepal.Width'
            target: name == 'Species'
            positive_label: 'versicolor'
            solver_type: 'L1R_L2LOSS_SVC'
            epsilon: 0.01
            parameter_c: 1
            bias: 1.0
            weight: [1.0, 1.0]

    global_settings:
        keep_attributes:
            - 'Species'
        feature_exclude:
            - 'Species'

* Input of the component:

 +--------+----------------+---------------+----------------+
 |   _sid |   Sepal.Length |   Sepal.Width |   Species      |
 +========+================+===============+================+
 | 0      | 4.9            | 2.5           | virginica      |
 +--------+----------------+---------------+----------------+
 | 1      | 6.2            | 2.8           | virginica      |
 +--------+----------------+---------------+----------------+
 | 2      | 7.2            | 3.6           | virginica      |
 +--------+----------------+---------------+----------------+
 | ...    | ...            | ...           | ...            |
 +--------+----------------+---------------+----------------+
 | 28     | 6.2            | 2.9           | versicolor     |
 +--------+----------------+---------------+----------------+
 | 29     | 6.7            | 3.1           | versicolor     |
 +--------+----------------+---------------+----------------+

|

* Output of the component:

 +----------+------------------+-------------------+----------------------+
 |   _sid   |   svmcl1_actual  |   svmcl1_predict  |   svmcl1_score       |
 +==========+==================+===================+======================+
 | 0        | -1               | 1                 | 2.657069e+00         |
 +----------+------------------+-------------------+----------------------+
 | 1        | -1               | 1                 | 6.524541e-01         |
 +----------+------------------+-------------------+----------------------+
 | 2        | -1               | -1                | -1.600153e+00        |
 +----------+------------------+-------------------+----------------------+
 | ...      | ...              | ...               | ...                  |
 +----------+------------------+-------------------+----------------------+
 | 28       |  1               | 1                 | 6.524541e-01         |
 +----------+------------------+-------------------+----------------------+
 | 29       |  1               | -1                | -1.080094e+00        |
 +----------+------------------+-------------------+----------------------+

This component has component-specific external formats for the model and for the prediction result evaluation.

.. seealso::

    Component-common external format files in :ref:`convert_process`

|

Parameters
==========
Here are the component-specific parameters for the **SVMCl component**.

SPD
---

The following parameters are for the "components" section of the SPD.

.. list-table::
  :header-rows: 1
  :widths: 10, 5, 15, 10, 50

  * - Parameter Name
    - Type
    - Domain
    - Default Value
    - Description
  * - positive_label [1]_
    - str
    - See Description
    - --
    - | Choose one value from the target attribute to be considered as positive.
      | The domain of this parameter corresponds to that of the target attribute.
  * - solver_type
    - str
    - See Description
    - 'L1R_L2LOSS_SVC'
    - Specifies the solver type. Choose one of the following:

      .. list-table::
        :header-rows: 1
        :widths: 1, 2

        * - Solver Type
          - Description
        * - L2R_L2LOSS_SVC_DUAL
          - L2-regularized L2-loss support vector classification (dual)
        * - L2R_L2LOSS_SVC
          - L2-regularized L2-loss support vector classification (primal)
        * - L2R_L1LOSS_SVC_DUAL
          - L2-regularized L1-loss support vector classification (dual)
        * - L1R_L2LOSS_SVC
          - L1-regularized L2-loss support vector classification
  * - epsilon
    - float
    - (0, inf)
    - See Description
    - Sets the tolerance of the termination criterion. The default value depends on ``solver_type``.

      - L2R_L2LOSS_SVC_DUAL or L2R_L1LOSS_SVC_DUAL

        - Dual maximal violation <= eps; similar to libsvm (default 0.1).

      - L2R_L2LOSS_SVC

        - ``|f'(w)|_2 <= eps*min(pos,neg)/l*|f'(w0)|_2``,
          where f is the primal function, pos/neg are the numbers of
          positive/negative samples, and l is the number of training samples (default 0.01).

      - L1R_L2LOSS_SVC

        - ``|f'(w)|_inf <= eps*min(pos,neg)/l*|f'(w0)|_inf``,
          where f is the primal function (default 0.01).
  * - parameter_c
    - float
    - (0, inf)
    - 1
    - Sets the parameter C, the cost of constraint violation.
  * - bias
    - float
    - [0, inf)
    - --
    - If bias >= 0, each instance x becomes [x; bias], that is, a constant feature with the value of ``bias`` is appended.
  * - weight
    - list of two float values
    - (0, inf) for each element
    - --
    - Weights that adjust the parameter C for each class.
      They are given in the order positive class, negative class.

.. [1] Required parameter
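
The interplay of ``solver_type``, ``epsilon``, ``parameter_c``, ``weight`` and ``bias`` can be sketched in plain Python.
This is only an illustration of the table above, not the component's implementation:
``resolve_params`` and ``append_bias`` are hypothetical helper names, and the per-class cost
``weight * C`` follows the liblinear convention that class weights scale C.

.. code-block:: python

    import math

    # Default epsilon per solver type, as listed in the parameter table.
    DEFAULT_EPSILON = {
        "L2R_L2LOSS_SVC_DUAL": 0.1,
        "L2R_L1LOSS_SVC_DUAL": 0.1,
        "L2R_L2LOSS_SVC": 0.01,
        "L1R_L2LOSS_SVC": 0.01,
    }


    def resolve_params(solver_type="L1R_L2LOSS_SVC", epsilon=None,
                       parameter_c=1.0, weight=None):
        """Hypothetical helper: fill in defaults and derive per-class costs."""
        if solver_type not in DEFAULT_EPSILON:
            raise ValueError(f"unknown solver_type: {solver_type!r}")
        if epsilon is None:
            epsilon = DEFAULT_EPSILON[solver_type]
        if not (0 < epsilon < math.inf and 0 < parameter_c < math.inf):
            raise ValueError("epsilon and parameter_c must lie in (0, inf)")
        # Assumption: without an explicit weight, both classes use C unchanged.
        w_pos, w_neg = weight if weight is not None else (1.0, 1.0)
        return {
            "epsilon": epsilon,
            "c_positive": w_pos * parameter_c,
            "c_negative": w_neg * parameter_c,
        }


    def append_bias(x, bias=None):
        """Extend a feature vector x to [x; bias] when a bias >= 0 is given."""
        if bias is not None and bias >= 0:
            return list(x) + [bias]
        return list(x)


    print(resolve_params("L2R_L2LOSS_SVC_DUAL", parameter_c=2.0, weight=[3.0, 1.0]))
    print(append_bias([4.9, 2.5], 1.0))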

|

Utilizable Sample Metadata
==========================
There are no component-specific sample metadata available.

|

Output Attributes
=================
The **SVMCl component** generates the following attributes:

.. list-table::
  :header-rows: 1
  :widths: 1,1,3

  * - Attribute Name
    - Scale
    - Description
  * - *<component_id>*\ _actual
    - INTEGER
    - Binarized values of target attribute based on ``positive_label``.
  * - *<component_id>*\ _predict
    - INTEGER
    - Predicted values.
  * - *<component_id>*\ _score
    - REAL
    - Decision score. The prediction result is obtained by classifying these values against a decision boundary.

These attributes are part of the component output data. They can be loaded with the SAMPO API.
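
How the three attributes relate to each other can be illustrated with pandas.
The sketch below uses four rows from the Overview example and assumes that the decision boundary is 0,
which is consistent with the sample output but is not stated explicitly in this specification.

.. code-block:: python

    import pandas as pd

    # Four rows taken from the Overview example (target value and score).
    data = pd.DataFrame({
        "Species": ["virginica", "virginica", "versicolor", "versicolor"],
        "score": [2.657069, -1.600153, 0.6524541, -1.080094],
    })
    positive_label = "versicolor"

    # *_actual: +1 for the positive label, -1 for every other target value.
    actual = data["Species"].eq(positive_label).map({True: 1, False: -1})

    # *_predict: assumption: the boundary is 0, so the sign of the score decides.
    predict = data["score"].gt(0).map({True: 1, False: -1})

    result = pd.DataFrame({
        "svmcl1_actual": actual,
        "svmcl1_predict": predict,
        "svmcl1_score": data["score"],
    })
    print(result)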

.. seealso::

    Obtaining process results via `ProcessResultLoader <../../api/process_result_loader.html>`_.

When :ref:`convert_process` is executed,
the component output data is saved to *<component_id>*\_predict_result.csv.

This file contains the prediction result of the component::

    _sid,svmcl1_actual,svmcl1_predict,svmcl1_score
    0,1,1,8.554352e-01
    1,1,1,1.272770e+00
    2,1,1,1.168148e+00
    3,1,1,1.428549e+00
    ...
    36,-1,-1,-1.363943e+00
    37,-1,-1,-1.205856e+00
    38,-1,-1,-4.361886e-01
    39,-1,-1,-1.260474e+00
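
Once exported, the file can be inspected with any CSV reader.
As a minimal sketch, the snippet below parses a few rows of the listing above with pandas and tabulates actual against predicted values;
in practice the path of the exported file would be passed to ``pd.read_csv`` instead of the in-memory text.

.. code-block:: python

    import io

    import pandas as pd

    # A few rows copied from the listing above.
    csv_text = """_sid,svmcl1_actual,svmcl1_predict,svmcl1_score
    0,1,1,8.554352e-01
    1,1,1,1.272770e+00
    36,-1,-1,-1.363943e+00
    37,-1,-1,-1.205856e+00
    """.replace("    ", "")
    result = pd.read_csv(io.StringIO(csv_text), index_col="_sid")

    # Rows: actual class, columns: predicted class.
    confusion = pd.crosstab(result["svmcl1_actual"], result["svmcl1_predict"])
    print(confusion)

    # Samples whose prediction disagrees with the actual value.
    errors = result[result["svmcl1_actual"] != result["svmcl1_predict"]]
    print(len(errors))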

|

Attribute Metadata
==================
The metadata of the output attributes is created with the following rules.

Context Rule
------------
.. list-table::
  :header-rows: 1
  :widths: 2,1,3

  * - Attribute Name
    - Context Name
    - Description
  * - All the output attributes of this component
    - field_path
    - List of the superordinate concepts of each output attribute based on the following hierarchical structure of the output attributes::

          root
          └── binary_classification
             ├── actual
             ├── predict
             └── score

  * - *<component_id>*\ _actual, *<component_id>*\ _predict
    - positive_map
    - Mapping between a positive value and a positive label.
  * - *<component_id>*\ _actual, *<component_id>*\ _predict
    - negative_map
    - Mapping between a negative value and a negative label.

Derivation Rule
---------------
.. list-table::
  :header-rows: 1
  :widths: 1,3

  * - Attribute Name
    - Derived From
  * - *<component_id>*\ _actual
    - Derived from the target attribute.
  * - *<component_id>*\ _predict
    - Derived from the attributes which have non-zero coefficients in any prediction formula.
  * - *<component_id>*\ _score
    - Derived from the attributes which have non-zero coefficients in any prediction formula.

Example
-------
.. code-block:: javascript

    {
        "nodes": [
            {"aid": "dl1[1]", "name": "sepal_width_in_cm", "scale": "real", "is_excluded": false,
             "cid": "dl1", "cindex": 1, "values": null, "is_kept": false, "context": null},
            {"aid": "svmcl1[1]", "name": "svmcl1_predict", "scale": "integer", "is_excluded": false,
             "cid": "svmcl1", "cindex": 1, "values": null, "is_kept": false,
             "context":
                 {"field_path": ["binary_classification", "predict"],
                  "positive_map": {"1": ["Iris-setosa"]},
                  "negative_map": {"-1": ["Iris-versicolor"]}}},
            {"aid": "dl1[0]", "name": "sepal_length_in_cm", "scale": "real", "is_excluded": false,
             "cid": "dl1", "cindex": 0, "values": null, "is_kept": false, "context": null},
            {"aid": "dl1[2]", "name": "petal_length_in_cm", "scale": "real", "is_excluded": false,
             "cid": "dl1", "cindex": 2, "values": null, "is_kept": false, "context": null},
            {"aid": "svmcl1[2]", "name": "svmcl1_score", "scale": "real", "is_excluded": false,
             "cid": "svmcl1", "cindex": 2, "values": null, "is_kept": false,
             "context":
                 {"field_path": ["binary_classification", "score"]}},
            {"aid": "_sid", "name": "_sid", "scale": "integer", "is_excluded": false,
             "cid": null, "cindex": 0, "values": null, "is_kept": false, "context": null},
            {"aid": "svmcl1[0]", "name": "svmcl1_actual", "scale": "integer", "is_excluded": false,
             "cid": "svmcl1", "cindex": 0, "values": null, "is_kept": false,
             "context":
                 {"field_path": ["binary_classification", "actual"],
                  "positive_map": {"1": ["Iris-setosa"]},
                  "negative_map": {"-1": ["Iris-versicolor"]}}},
            {"aid": "dl1[3]", "name": "petal_width_in_cm", "scale": "real", "is_excluded": false,
             "cid": "dl1", "cindex": 3, "values": null, "is_kept": false, "context": null},
            {"aid": "dl1[4]", "name": "class", "scale": "nominal", "is_excluded": true,
             "cid": "dl1", "cindex": 4, "values": ["Iris-setosa", "Iris-versicolor"],
             "is_kept": true, "context": null}
        ],
        "links": [
            {"source": "dl1[0]", "target": "svmcl1[1]"},
            {"source": "dl1[0]", "target": "svmcl1[2]"},
            {"source": "dl1[2]", "target": "svmcl1[1]"},
            {"source": "dl1[2]", "target": "svmcl1[2]"},
            {"source": "dl1[4]", "target": "svmcl1[0]"}
        ]
    }

.. seealso::

    Attribute metadata file format in :ref:`Attribute Metadata File Specification <attribute-metadata>`
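
The ``links`` section makes it possible to trace which input attributes an output attribute is derived from.
The following sketch works on a trimmed copy of the example above (only ``aid`` and ``name`` are kept);
``derived_from`` is a hypothetical helper name, not part of the SAMPO API.

.. code-block:: python

    import json

    # Trimmed version of the example metadata: only the fields used here.
    metadata = json.loads("""
    {
      "nodes": [
        {"aid": "dl1[0]", "name": "sepal_length_in_cm"},
        {"aid": "dl1[2]", "name": "petal_length_in_cm"},
        {"aid": "dl1[4]", "name": "class"},
        {"aid": "svmcl1[0]", "name": "svmcl1_actual"},
        {"aid": "svmcl1[1]", "name": "svmcl1_predict"}
      ],
      "links": [
        {"source": "dl1[0]", "target": "svmcl1[1]"},
        {"source": "dl1[2]", "target": "svmcl1[1]"},
        {"source": "dl1[4]", "target": "svmcl1[0]"}
      ]
    }
    """)

    # Map each attribute ID to its human-readable name.
    names = {node["aid"]: node["name"] for node in metadata["nodes"]}


    def derived_from(attr_name):
        """Return the sorted names of the attributes that attr_name is derived from."""
        aid = next(a for a, n in names.items() if n == attr_name)
        return sorted(names[link["source"]] for link in metadata["links"]
                      if link["target"] == aid)


    print(derived_from("svmcl1_predict"))
    print(derived_from("svmcl1_actual"))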

|

Model
=====
The model of this component can be described by its parameters.

.. list-table::
  :header-rows: 1
  :widths: 2,1,1,3

  * - SVMCl Model Parameters
    - Type
    - Domain
    - Description
  * - prediction_formula
    - pandas.DataFrame
    - See Description
    - DataFrame containing the weight of each feature and the bias.

When loaded in the SAMPO API, the model is represented as a dict of its parameters.

.. seealso::

    Obtaining process results via `ProcessResultLoader <../../api/process_result_loader.html>`_.

::

    {'prediction_formula':
        sepal_length_in_cm     0.2281450981660121
        petal_length_in_cm    -0.9329267820373003
        bias                   1.253715607666645
        dtype: float64}


External Format
---------------
This file describes the weight of each attribute and the bias::

    aid,attr_name,prediction_formula
    dl1[0],sepal_length_in_cm,0.2281450981660121
    dl1[2],petal_length_in_cm,-0.9329267820373003
    ,bias,1.253715607666645
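
As a rough illustration of how such a model could be applied, the sketch below reads the listing above with pandas and scores one sample.
It assumes that the score is the weighted sum of the feature values plus the ``bias`` entry and that a positive score maps to the positive class;
the sample values are made up.

.. code-block:: python

    import io

    import pandas as pd

    # Content of the external model file shown above.
    csv_text = """aid,attr_name,prediction_formula
    dl1[0],sepal_length_in_cm,0.2281450981660121
    dl1[2],petal_length_in_cm,-0.9329267820373003
    ,bias,1.253715607666645
    """.replace("    ", "")
    formula = (pd.read_csv(io.StringIO(csv_text))
               .set_index("attr_name")["prediction_formula"])

    bias = formula["bias"]
    weights = formula.drop("bias")

    # Hypothetical sample; the values are illustrative only.
    sample = pd.Series({"sepal_length_in_cm": 5.0, "petal_length_in_cm": 1.5})

    # Assumption: score = sum_i weight_i * x_i + bias.
    score = float((weights * sample[weights.index]).sum() + bias)
    predict = 1 if score > 0 else -1
    print(score, predict)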

|

Prediction Result Evaluation
============================

.. include:: ./fabhme/cl_predict_result_evaluation_indices.rst

When these evaluation results are obtained through the SAMPO API, they are loaded as a pandas.DataFrame
with the evaluation indices as its columns.

.. seealso::

    Obtaining process results via `ProcessResultLoader <../../api/process_result_loader.html>`_

External Format
---------------
When :ref:`convert_process` is executed, the evaluation results
are saved as a CSV file with the evaluation indices as the header of the CSV.

This file describes the evaluation of a prediction result produced by the component::

    true_positive,false_positive,true_negative,false_negative,accuracy,classification_error,precision,recall,specificity,false_positive_rate,false_negative_rate,f_measure,auc,area_under_precision_recall
    30,0,30,0,1.000000e+00,0.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,0.000000e+00,0.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00
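
The count-based indices in this file can be reproduced from the four confusion counts.
The sketch below uses the textbook definitions, which are assumed (not confirmed here) to match those of the component;
``evaluation_indices`` is a hypothetical helper, undefined ratios are returned as NaN by choice,
and ``auc`` and ``area_under_precision_recall`` are omitted because they require the individual scores.

.. code-block:: python

    def _ratio(numerator, denominator):
        """Divide, returning NaN when the denominator is zero."""
        return numerator / denominator if denominator else float("nan")


    def evaluation_indices(tp, fp, tn, fn):
        """Textbook binary-classification indices from confusion counts."""
        precision = _ratio(tp, tp + fp)
        recall = _ratio(tp, tp + fn)
        return {
            "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
            "classification_error": _ratio(fp + fn, tp + fp + tn + fn),
            "precision": precision,
            "recall": recall,
            "specificity": _ratio(tn, tn + fp),
            "false_positive_rate": _ratio(fp, fp + tn),
            "false_negative_rate": _ratio(fn, fn + tp),
            "f_measure": _ratio(2 * precision * recall, precision + recall),
        }


    # Counts from the example row above.
    indices = evaluation_indices(tp=30, fp=0, tn=30, fn=0)
    print(indices)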

|

Details
=======
If a data set has samples with missing or +/-Inf values, this component ignores those samples.
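
The effect of this rule can be reproduced on a pandas.DataFrame as follows.
This is a sketch of the described behavior with made-up data, not the component's actual implementation.

.. code-block:: python

    import numpy as np
    import pandas as pd

    data = pd.DataFrame({
        "Sepal.Length": [4.9, np.nan, 7.2, 6.2],
        "Sepal.Width": [2.5, 2.8, np.inf, 2.9],
    })

    # Treat +/-Inf like missing values, then drop every affected sample.
    clean = data.replace([np.inf, -np.inf], np.nan).dropna()
    print(clean)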
