LinearHSICFS Component Specification

Overview

The LinearHSICFS component is a feature selector. It calculates the Linear HSIC (Hilbert-Schmidt Independence Criterion) score of each input feature against the target and outputs the features with the highest scores. The scale of each input feature must be INTEGER or REAL.
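The scoring step can be illustrated with a minimal sketch. It assumes the common biased empirical estimator trace(KHLH) / (n - 1)^2 with linear kernels, and it assumes that rows with a missing value in either column are dropped. Neither the normalization nor the missing-value handling of the component is described in this specification, so the hypothetical `linear_hsic` function below is not expected to reproduce the scores shown in the example.

```python
import numpy as np


def linear_hsic(x, y):
    """Biased empirical HSIC of two 1-D samples, using linear kernels.

    Hypothetical helper for illustration; not part of the component's API.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Assumption: keep only the rows where both values are present.
    mask = ~(np.isnan(x) | np.isnan(y))
    x, y = x[mask], y[mask]

    n = x.size
    if n < 2:
        return float("nan")  # not enough samples to estimate anything

    K = np.outer(x, x)                   # linear kernel matrix of the feature
    L = np.outer(y, y)                   # linear kernel matrix of the target
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix

    return float(np.trace(K @ H @ L @ H) / (n - 1) ** 2)
```

For scalar inputs and linear kernels this quantity equals the squared sample covariance of the two columns, so a feature that has no linear relationship with the target scores close to zero.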

Example:

  • SPD:

    dl1 -> ts1 -> fs1
    
    ---
    
    components:
        dl1:
            component: DataLoader
    
        ts1:
            component: TimeshiftFDComponent
            features: scale == 'integer' or scale == 'real'
            shift: [["name == 'humidity'", [3, -2]],
                    ["name == 'temperature'", [-1]]]
    
        fs1:
            component: LinearHSICFSComponent
            features: all()
            target: name == 'price'
            max_num_output_features: 2
    
    global_settings:
        keep_attributes:
            - 'price'
        feature_exclude:
            - 'price'
    
  • Input of the fs1 component:

    _sid  ts1(3)_humidity  ts1(-2)_humidity  ts1(-1)_temperature  price
    0     73               NaN               NaN                  100
    1     66               NaN               22.3                 110
    2     NaN              89                21.8                 90
    3     NaN              88                22.8                 100
    4     NaN              50                23.4                 120

  • Linear HSIC score of each input feature in fs1:

    ts1(3)_humidity  ts1(-2)_humidity  ts1(-1)_temperature
    0.038            0.488             0.505

  • Output of the fs1 component:

    Since the value of max_num_output_features is 2, fs1 outputs the two highest-scoring features, ts1(-2)_humidity and ts1(-1)_temperature.

    _sid  fs1_ts1(-2)_humidity  fs1_ts1(-1)_temperature
    0     NaN                   NaN
    1     NaN                   22.3
    2     89                    21.8
    3     88                    22.8
    4     50                    23.4
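The selection step in this example can be sketched as follows, using the scores listed above. The helper name `select_top_features` is hypothetical. Keeping the selected features in their input order matches the example output, while the tie-breaking rule is only an assumption.

```python
def select_top_features(scores, max_num_output_features, component_id):
    """Return the output names of the highest-scoring features.

    Hypothetical helper for illustration; not part of the component's API.
    """
    if max_num_output_features < 1:
        raise ValueError("max_num_output_features must be in [1, inf)")

    # Rank feature names by descending score. sorted() is stable, so
    # tied features keep their input order (an assumption about ties).
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    keep = set(ranked[:max_num_output_features])

    # Emit the kept features in input order, prefixed with the component ID.
    return [f"{component_id}_{name}" for name in scores if name in keep]


scores = {
    "ts1(3)_humidity": 0.038,
    "ts1(-2)_humidity": 0.488,
    "ts1(-1)_temperature": 0.505,
}
selected = select_top_features(scores, 2, "fs1")
print(selected)  # ['fs1_ts1(-2)_humidity', 'fs1_ts1(-1)_temperature']
```

If max_num_output_features is larger than the number of input features, this sketch simply returns all of them.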

This component has no component-specific external formats.

See also

Component-common external format files in convert_process


Parameters

Here are the component-specific parameters for the LinearHSICFS component.

SPD

The following parameter is specified in the “components” section of the SPD.

Parameter Name               Type  Domain    Default Value  Description
max_num_output_features [1]  int   [1, inf)  -              Specifies the upper limit on the number of output features.

[1] Required parameter

The target is also required. It is used to calculate the Linear HSIC score of each input feature.

Output Attributes

The LinearHSICFS component generates the following attributes:

Attribute Name                            Scale                                         Description
<component_id>_<original_attribute_name>  Same as the scale of the original attribute.  Same as the value of the original attribute.

The resulting objects or converted output files contain these output attributes and other attributes specified in keep_attributes of the SPD.

See also

Obtaining process results via ProcessResultLoader.


Attribute Metadata

The metadata of the output attributes is created with the following rules.

Context Rule

Attribute Name                            Context Name  Description
<component_id>_<original_attribute_name>  score         The calculated Linear HSIC score of the attribute.

Derivation Rule

Each new attribute is derived from the corresponding input attribute, which was selected on the basis of its Linear HSIC score.

Example

{
    "nodes": [
        {"aid": "_sid", "name": "_sid", "scale": "integer", ...},
        ...,
        {"aid": "dl1[0]", "name": "temperature", "scale": "real",    "is_excluded": false, ...},
        {"aid": "dl1[1]", "name": "humidity",    "scale": "integer", "is_excluded": false, ...},
        {"aid": "dl1[2]", "name": "price",       "scale": "real",    "is_excluded": true,  ...},
        ...,
        {"aid": "ts1[0]", "name": "ts1(3)_humidity",     "scale": "integer", "cid": "ts1", "cindex": 0, "context": {"shift": 3}, ...},
        {"aid": "ts1[1]", "name": "ts1(-2)_humidity",    "scale": "integer", "cid": "ts1", "cindex": 1, "context": {"shift": -2}, ...},
        {"aid": "ts1[2]", "name": "ts1(-1)_temperature", "scale": "real",    "cid": "ts1", "cindex": 2, "context": {"shift": -1}, ...},
        ...,
        {"aid": "fs1[0]", "name": "fs1_ts1(-2)_humidity", "scale": "integer", "cid": "fs1", "cindex": 0,
                         "context": {"score": 4.8804398568390484e-01}},
        {"aid": "fs1[1]", "name": "fs1_ts1(-1)_temperature", "scale": "real", "cid": "fs1", "cindex": 1,
                         "context": {"score": 5.0526028145921520e-01}}
    ],
    "links": [
        {"source": "dl1[1]", "target": "ts1[0]"},
        {"source": "dl1[1]", "target": "ts1[1]"},
        {"source": "dl1[0]", "target": "ts1[2]"},
        ...,
        {"source": "ts1[2]", "target": "fs1[1]"},
        {"source": "ts1[1]", "target": "fs1[0]"}
    ]
}
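As a sketch of how this metadata might be consumed, the snippet below pairs each attribute produced by fs1 with its source attribute and its score context. It works on a trimmed copy of the example with the elided entries left out. The helper `selected_scores` is hypothetical and not a SAMPO API, and it assumes that each selected attribute has exactly one incoming link.

```python
# Trimmed copy of the example metadata; the elided "..." entries are omitted.
metadata = {
    "nodes": [
        {"aid": "ts1[1]", "name": "ts1(-2)_humidity", "cid": "ts1",
         "context": {"shift": -2}},
        {"aid": "ts1[2]", "name": "ts1(-1)_temperature", "cid": "ts1",
         "context": {"shift": -1}},
        {"aid": "fs1[0]", "name": "fs1_ts1(-2)_humidity", "cid": "fs1",
         "context": {"score": 4.8804398568390484e-01}},
        {"aid": "fs1[1]", "name": "fs1_ts1(-1)_temperature", "cid": "fs1",
         "context": {"score": 5.0526028145921520e-01}},
    ],
    "links": [
        {"source": "ts1[2]", "target": "fs1[1]"},
        {"source": "ts1[1]", "target": "fs1[0]"},
    ],
}


def selected_scores(metadata, component_id):
    """Map each output name of a component to (source name, score).

    Hypothetical helper; assumes one incoming link per output attribute.
    """
    nodes = {node["aid"]: node for node in metadata["nodes"]}
    source_of = {link["target"]: link["source"] for link in metadata["links"]}

    result = {}
    for node in metadata["nodes"]:
        if node.get("cid") != component_id:
            continue  # skip attributes produced by other components
        source = nodes[source_of[node["aid"]]]
        result[node["name"]] = (source["name"], node["context"]["score"])
    return result


for name, (source, score) in selected_scores(metadata, "fs1").items():
    print(f"{name} <- {source}: score={score:.3f}")
```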

See also

Attribute metadata file format in Attribute Metadata File Specification


Model

The model of this component can be described by its fd_params.

fd_params          Type            Description
source_attr_names  list of string  A list of the attribute names from which the output attribute is derived.
params             dict            The keys of this dictionary are the same as the context of this component’s Attribute Metadata.

These attributes are in the component output data. They can be loaded through the SAMPO API or saved as data.csv after executing convert_process.

See also

Obtaining process results via ProcessResultLoader.

{'fd_params':
    [{'source_attr_names': ['humidity'], 'params': {}},
     {'source_attr_names': ['temperature'], 'params': {}}]}

Details

Nothing.