sampo_ps convert

Overview

Warning

The sampo_ps convert command is deprecated.

sampo_ps convert converts data stored in a ProcessStore from a SAMPO-internal format to a human-readable external format.

The convertible data and their corresponding external formats are listed in the table below:

Convertible Data                     External Format
-----------------------------------  ------------------------------------------------------------
SPD (SAMPO Process Description)      Same as SPD input file format. See the SPD File Specification.
SRC (SAMPO Run Configuration)        Same as SRC input file format. See the SRC File Specification.
ASD (Attributes Schema Description)  Same as ASD input file format. See the ASD File Specification.
                                     The attributes in the ASD are component-specific. See component
                                     specification.
Attribute metadata                   See Attribute Metadata File Specification
Selected attributes                  See Selected Attributes File Specification
Model                                A component-specific format. See each component specification.
Component output data                Same as SAMPO CSV input file format. See the SAMPO CSV File
                                     Specification. The attributes in the output data are
                                     component-specific. See component specification.
Prediction result evaluation         A component-specific format. See component specification.


Synopsis

See the sampo_ps convert command help:

$ sampo_ps convert --help

Examples

You can convert all the convertible files in a ProcessStore, only those in a specific process, or only those of a specific component in a process.

  • Converting data in a process.

    All the convertible files in the process my_process_a are output.

    • Command:

      $ sampo_ps convert -s file:///var/process_store_storage -l "my_process_a/*" -o output_dir/
      
    • Output:

      output_dir
      └── my_process_a
          ├── attr_metadata
          │   └── attr_metadata.json
          ├── components
          │   ├── dl1
          │   │   └── component_output_data
          │   │       ├── data.asd
          │   │       └── data.csv
          │   └── fabreg1
          │       ├── comp_output_data
          │       │   └── rg_predict_result.csv
          │       ├── comp_output_evaluation
          │       │   └── comp_output_evaluation.csv
          │       ├── model
          │       │   ├── fabhmerg_info.csv
          │       │   ├── gate_tree.json
          │       │   └── prediction_formulas.csv
          │       └── selected_attrs
          │           └── selected_attrs.json
          ├── spd
          │   └── my_process_a.spd
          └── src
              └── my_process_a_predict.src
      

  • Converting data of a specific component in a process.

    All the convertible files in the component fabreg1 in the process my_process_a are output along with attr_metadata, spd, and src.

    • Command:

      $ sampo_ps convert -s file:///var/process_store_storage -l "my_process_a/fabreg1" -o output_dir/
      
    • Output:

      output_dir
      └── my_process_a
          ├── attr_metadata
          │   └── attr_metadata.json
          ├── components
          │   └── fabreg1
          │       ├── comp_output_data
          │       │   └── rg_predict_result.csv
          │       ├── comp_output_evaluation
          │       │   └── comp_output_evaluation.csv
          │       ├── model
          │       │   ├── fabhmerg_info.csv
          │       │   ├── gate_tree.json
          │       │   └── prediction_formulas.csv
          │       └── selected_attrs
          │           └── selected_attrs.json
          ├── spd
          │   └── my_process_a.spd
          └── src
              └── my_process_a_predict.src
      

  • Converting all the convertible files in a ProcessStore.

    All the convertible files in the ProcessStore are output.

    • Command:

      $ sampo_ps convert -s file:///var/process_store_storage -o output_dir/
      
    • Output:

      output_dir
      ├── my_process_a
      │   ├── attr_metadata
      │   │   └── attr_metadata.json
      │   ├── components
      │   │   ├── dl1
      │   │   │   └── ...
      │   │   └── fabreg1
      │   │       ├── ...
      │   │       .
      │   │       .
      │   │       .
      │   ├── spd
      │   │   └── my_process_a.spd
      │   └── src
      │       └── my_process_a_predict.src
      ├── my_process_b
      │   ├── ...
      │   .
      │   .
      │   .
      ├── my_process_c
      │   ├── ...
      .   .
      .   .
      .   .
      

  • Converting process data in a flat directory structure.

    The converted files are output directly beneath output_dir.

    • Command:

      $ sampo_ps convert -s "file:///var/process_store_storage" -o output_dir/ --flat
      
    • Output:

      $ tree output_dir
      output_dir/
      ├── my_process_a_learn_attr_metadata.json
      ├── my_process_a_learn_dl1_data.asd
      ├── my_process_a_learn_dl1_data.csv
      ├── my_process_a_learn_fabreg1_comp_output_evaluation.csv
      ├── my_process_a_learn_fabreg1_fabhmerg_info.csv
      ├── my_process_a_learn_fabreg1_gate_tree.json
      ├── my_process_a_learn_fabreg1_prediction_formulas.csv
      ├── my_process_a_learn_fabreg1_rg_predict_result.csv
      ├── my_process_a_learn_fabreg1_selected_attrs.json
      ├── my_process_a_learn_my_process_a.spd
      ├── my_process_a_learn_my_process_a_predict.src
      ├── my_process_b_learn_attr_metadata.json
      ├── ...
      ├── my_process_c_predict_attr_metadata.json
      ├── ...
      .
      .
      .
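
As a rough illustration of the nested (non --flat) layout shown in the first three examples, the following Python sketch groups converted files by process and component. It assumes only the output_dir/<process>/components/<component>/... structure visible in those listings. The helper name files_by_component is hypothetical, and the demo builds a throwaway directory with a few of the file names from the examples instead of running sampo_ps.

```python
import tempfile
from pathlib import Path


def files_by_component(output_dir):
    """Map (process, component) to the sorted file paths below that component.

    Assumes the nested layout output_dir/<process>/components/<component>/...
    """
    result = {}
    for comp_dir in sorted(Path(output_dir).glob("*/components/*")):
        if not comp_dir.is_dir():
            continue
        process = comp_dir.parent.parent.name
        files = sorted(
            p.relative_to(comp_dir).as_posix()
            for p in comp_dir.rglob("*")
            if p.is_file()
        )
        result[(process, comp_dir.name)] = files
    return result


# Demo on a temporary directory that mimics part of the example output.
with tempfile.TemporaryDirectory() as tmp:
    for rel in [
        "my_process_a/components/dl1/component_output_data/data.asd",
        "my_process_a/components/dl1/component_output_data/data.csv",
        "my_process_a/components/fabreg1/model/gate_tree.json",
        "my_process_a/spd/my_process_a.spd",
    ]:
        path = Path(tmp) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    listing = files_by_component(tmp)

for (process, component), files in listing.items():
    print(process, component, files)
```

Process-level files such as the spd and src outputs sit outside the components directory, so this sketch deliberately leaves them out.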
      

Output Format

Attribute Metadata File Format

  • The Attribute Metadata File describes the metadata of attributes and the derivation relations between them in a process.

  • Attribute metadata is represented as a DAG (Directed Acyclic Graph) consisting of nodes and links.
    • The Nodes section holds the information of each attribute.

    • The Links section holds the derivation relationships between attributes.

  • The file follows the JSON format.

Example:

{
    "nodes": [
        {"aid": "dl1[0]", "name": "A", "scale": "integer", "is_excluded": false, "cid": "dl1",
         "cindex": 0, "values": null, "is_kept": true, "context": null},
        {"aid": "dl1[1]", "name": "B", "scale": "nominal", "is_excluded": true, "cid": "dl1",
         "cindex": 1, "values": ["A", "B", "O"], "is_kept": true, "context": null},
        {"aid": "rg1[0]", "name": "actual", "scale": "real", "is_excluded": false, "cid": "rg1",
         "cindex": 0, "values": null, "is_kept": false, "context": {"field_path": ["regression", "actual"]}}
    ],
    "links": [
        {"source": "dl1[0]", "target": "rg1[0]"}
    ]
}
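
The node-and-link structure can be read with any JSON library. The Python sketch below is one possible way to index nodes by aid and follow the links. It uses a trimmed copy of the example (only aid and name are kept) and assumes that each link points from a source attribute to the attribute derived from it; the function and variable names are hypothetical.

```python
import json
from collections import defaultdict

# Trimmed copy of the example: only the fields used here are kept.
SAMPLE = """
{
    "nodes": [
        {"aid": "dl1[0]", "name": "A"},
        {"aid": "dl1[1]", "name": "B"},
        {"aid": "rg1[0]", "name": "actual"}
    ],
    "links": [
        {"source": "dl1[0]", "target": "rg1[0]"}
    ]
}
"""


def load_graph(text):
    """Return (nodes indexed by aid, list of source aids for each target aid)."""
    doc = json.loads(text)
    nodes_by_aid = {node["aid"]: node for node in doc["nodes"]}
    sources_by_target = defaultdict(list)
    for link in doc["links"]:
        # Assumption: "target" is the attribute derived from "source".
        sources_by_target[link["target"]].append(link["source"])
    return nodes_by_aid, dict(sources_by_target)


nodes_by_aid, sources_by_target = load_graph(SAMPLE)
for target, sources in sources_by_target.items():
    source_names = [nodes_by_aid[aid]["name"] for aid in sources]
    print(f"{nodes_by_aid[target]['name']} <- {source_names}")
```

For a real converted file, the text would come from attr_metadata.json instead of the inline sample.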

Nodes Section

The Nodes section describes all attributes generated in a process.

Each attribute property is defined as follows:

Property     Description
-----------  ------------------------------------------------------------------
aid          Attribute ID.
name         Attribute name.
scale        Scale of the attribute.
is_excluded  Whether the attribute is excluded as a feature.
cid          ID of the component that generated the attribute.
cindex       Index of the attribute within the component that generated it.
values       Domain of a NOMINAL attribute (null if the scale is not NOMINAL).
is_kept      Whether the attribute is kept even after every component has run.
context      Context information of the attribute.
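
To make the properties concrete, here is a small Python sketch that summarises the three nodes from the example above. The helper names are hypothetical. The scale is compared case-insensitively because the example file uses lowercase "nominal" while this table writes NOMINAL; that handling is an assumption, not documented behaviour.

```python
import json
from collections import defaultdict

# The "nodes" array from the Attribute Metadata example above.
SAMPLE_NODES = json.loads("""
[
    {"aid": "dl1[0]", "name": "A", "scale": "integer", "is_excluded": false, "cid": "dl1",
     "cindex": 0, "values": null, "is_kept": true, "context": null},
    {"aid": "dl1[1]", "name": "B", "scale": "nominal", "is_excluded": true, "cid": "dl1",
     "cindex": 1, "values": ["A", "B", "O"], "is_kept": true, "context": null},
    {"aid": "rg1[0]", "name": "actual", "scale": "real", "is_excluded": false, "cid": "rg1",
     "cindex": 0, "values": null, "is_kept": false,
     "context": {"field_path": ["regression", "actual"]}}
]
""")


def nominal_domains(nodes):
    """Map the aid of each nominal attribute to its domain ("values")."""
    return {n["aid"]: n["values"] for n in nodes if n["scale"].lower() == "nominal"}


def names_by_component(nodes):
    """Group attribute names by generating component (cid), ordered by cindex."""
    grouped = defaultdict(list)
    for n in sorted(nodes, key=lambda n: (n["cid"], n["cindex"])):
        grouped[n["cid"]].append(n["name"])
    return dict(grouped)


def excluded_aids(nodes):
    """List the aids of attributes flagged with is_excluded."""
    return [n["aid"] for n in nodes if n["is_excluded"]]


print(nominal_domains(SAMPLE_NODES))
print(names_by_component(SAMPLE_NODES))
print(excluded_aids(SAMPLE_NODES))
```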

Selected Attributes File Format

  • The Selected Attributes File contains information about the attributes that a learning component selected.

  • The file follows the JSON format.

Example:

{
    "selected_features": [
        {"aid": "dl1[0]", "name": "A", "scale": "integer", "is_excluded": false,
         "cid": "dl1", "cindex": 0, "values": null, "is_kept": false, "context": null},
        {"aid": "dl1[1]", "name": "B", "scale": "real", "is_excluded": false,
         "cid": "dl1", "cindex": 1, "values": null, "is_kept": false, "context": null},
        {"aid": "dl1[2]", "name": "C", "scale": "real", "is_excluded": false,
         "cid": "dl1", "cindex": 2, "values": null, "is_kept": false, "context": null}
    ],
    "selected_targets": [
        {"aid": "dl1[3]", "name": "Z", "scale": "integer", "is_excluded": true,
         "cid": "dl1", "cindex": 3, "values": null, "is_kept": true, "context": null}
    ]
}

Selected Features Section

The Selected Features Section describes the attributes selected as features.

Each attribute property is defined in the same way as in the Nodes Section of the Attribute Metadata File.

Selected Targets Section

The Selected Targets Section describes the attributes selected as targets.

Each attribute property is defined in the same way as in the Nodes Section of the Attribute Metadata File.
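
A converted Selected Attributes File can be read the same way as any other JSON document. The Python sketch below extracts feature and target names from a trimmed copy of the example above (only aid and name are kept); the function name selected_names is hypothetical.

```python
import json

# Trimmed copy of the Selected Attributes example: only aid and name are kept.
SAMPLE = """
{
    "selected_features": [
        {"aid": "dl1[0]", "name": "A"},
        {"aid": "dl1[1]", "name": "B"},
        {"aid": "dl1[2]", "name": "C"}
    ],
    "selected_targets": [
        {"aid": "dl1[3]", "name": "Z"}
    ]
}
"""


def selected_names(text):
    """Return (feature names, target names) in file order."""
    doc = json.loads(text)
    features = [attr["name"] for attr in doc["selected_features"]]
    targets = [attr["name"] for attr in doc["selected_targets"]]
    return features, targets


features, targets = selected_names(SAMPLE)
print("features:", features)
print("targets:", targets)
```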