sampo.api.process_store
create
sampo.api.process_store.create(url)
Creates a process store.
- Parameters
- url: str
Process store URL.
- Raises
- ValidationError
If url is not str.
Examples
>>> from sampo.api import process_store
>>> process_store.create('file:///var/process_store_storage')
open_process
sampo.api.process_store.open_process(url, process_name)
Opens a process in the process store and returns a ProcessResultLoader object.
- Parameters
- url: str
Process store URL.
- process_name: str or ProcessKey
A process name or a ProcessKey object.
- Returns
- prl: ProcessResultLoader
A ProcessResultLoader object.
- Raises
- ValidationError
If url is not str.
If process_name is neither str nor ProcessKey.
Examples
>>> from sampo.api import process_store
>>> with process_store.open_process('tmpstore', 'fabhmerg_predict') as prl:
...     output_df = prl.load_comp_output('fabrg1')
list_process_metadata
sampo.api.process_store.list_process_metadata(url, all=False)
Lists the process metadata of the processes in a process store as a pandas.DataFrame.
- Parameters
- url: str
Process store URL.
- all: bool
If True, lists all processes in the process store, including in-progress processes and previous versions of processes.
- Returns
- list_df: pandas.DataFrame
DataFrame that lists the processes from the process store and their metadata.
Columns:
- process name <dtype: object>
Name of the process.
- version <dtype: object>
Version of the process.
- started at <dtype: datetime64[ns]>
Date and time that the process started.
- running time <dtype: timedelta64[ns]>
Time that the process has taken to run. If the status is In progress, this column is NaT.
- status <dtype: object>
Status of the process. There are three statuses:
Succeeded : The process finished successfully.
Failed : The process finished with a failure.
In progress : The process is still running.
- Raises
- ValidationError
If url is not str.
If all is not bool.
Examples
Example1: Listing current versions of processes in the process store.
In [1]:
from sampo.api import process_store
pstore_url = './pstore'
process_store.list_process_metadata(pstore_url)
Example2: Listing current and older versions of processes and in-progress processes.
In [2]:
from sampo.api import process_store
pstore_url = './pstore'
process_store.list_process_metadata(pstore_url, True)
list_comp_metadata
sampo.api.process_store.list_comp_metadata(url, all=False)
Lists the component metadata of the processes in a process store as a pandas.DataFrame.
- Parameters
- url: str
Process store URL.
- all: bool
If True, lists all processes in the process store, including in-progress processes and previous versions of processes.
- Returns
- list_df: pandas.DataFrame
DataFrame that lists the components of each process in the process store and their metadata.
Columns:
- process name <dtype: object>
Name of the process that the component belongs to.
- version <dtype: object>
Version of the process that the component belongs to.
- cid <dtype: object>
Component ID
- started at <dtype: datetime64[ns]>
Date and time that the component started.
- running time <dtype: timedelta64[ns]>
Time that the component has taken to run. If the status is In progress, this column is NaT.
- Raises
- ValidationError
If url is not str.
If all is not bool.
Examples
Example1: Listing the components of current versions of processes in the process store.
In [1]:
from sampo.api import process_store
pstore_url = './pstore'
process_store.list_comp_metadata(pstore_url)
Example2: Listing the components of current and older versions of processes in the process store.
In [2]:
from sampo.api import process_store
pstore_url = './pstore'
process_store.list_comp_metadata(pstore_url, all=True)
remove_process
sampo.api.process_store.remove_process(url, process_name)
Removes a process from the process store.
If the removed process was the current version, the remaining version with the latest finished_at date in the process metadata becomes the current version.
Warning
If a process name or a ProcessKey with no version is specified, all versions will be removed.
- Parameters
- url: str
Process store URL.
- process_name: str or ProcessKey
Process name or ProcessKey object.
- Raises
- ValidationError
If url is not str.
If process_name is neither str nor ProcessKey.
Examples
Example1: Removing by ProcessKey
In [1]:
# Initial state
from sampo.api import process_store
process_store.list_process_metadata('./pstore1', all=True)
In [2]:
# Removes and lists result.
process_store.remove_process('./pstore1', 'fabhmerg_learn.57120ab6-5e60-40da-95fe-a9b412b55ccf')
process_store.list_process_metadata('./pstore1', all=True)
Example2: Removing all versions of a process by process name
In [3]:
# Initial state
from sampo.api import process_store
process_store.list_process_metadata('./pstore2', all=True)
In [4]:
# Removes and lists result.
process_store.remove_process('./pstore2', 'fabhmerg_learn')
process_store.list_process_metadata('./pstore2', all=True)
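The rule for choosing a new current version after a removal can be sketched in plain Python. This is an illustration only, not SAMPO's implementation: the pick_current_version helper, the version labels, and the dictionary layout are hypothetical, and only the finished_at field name is taken from the description of remove_process.

```python
from datetime import datetime


def pick_current_version(remaining_versions):
    """Return the version with the latest finished_at, or None if none remain."""
    if not remaining_versions:
        return None
    latest = max(remaining_versions, key=lambda v: v["finished_at"])
    return latest["version"]


# Hypothetical metadata for three versions of one process.
versions = [
    {"version": "v-a", "finished_at": datetime(2020, 1, 1, 9, 0)},
    {"version": "v-b", "finished_at": datetime(2020, 1, 3, 9, 0)},
    {"version": "v-c", "finished_at": datetime(2020, 1, 2, 9, 0)},
]

# Suppose "v-b" was the current version and has just been removed.
remaining = [v for v in versions if v["version"] != "v-b"]
new_current = pick_current_version(remaining)
```

Here new_current is "v-c", because it has the latest finished_at among the remaining versions.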
rename_process
sampo.api.process_store.rename_process(url, process_name, new_process_name)
Renames the process.
The process name is changed, but the process version is kept unchanged.
When you rename a specific process, the SRC file in the ProcessStore is reconfigured to reflect the new name.
If the renamed process was the current version, the version with the latest finished_at date in the process metadata becomes the current version.
If a process name or a ProcessKey with no version is specified, all versions will be renamed.
- Parameters
- url: str
Process store URL.
- process_name: str or ProcessKey
Process name or ProcessKey object.
- new_process_name: str
New process name.
- Raises
- ValidationError
If url is not str.
If process_name is neither str nor ProcessKey.
If new_process_name is not str.
Examples
Example1: Renaming by ProcessKey
In [1]:
# Initial state
from sampo.api import process_store
process_store.list_process_metadata('./pstore1', all=True)
In [2]:
# Renames and lists result.
process_store.rename_process('./pstore1', 'fabhmerg_learn.15d61636-73e1-4323-9313-bcdf58ea9785', 'new_name')
process_store.list_process_metadata('./pstore1', all=True)
Example2: Renaming all versions of a process by process name
In [3]:
# Initial state
from sampo.api import process_store
process_store.list_process_metadata('./pstore2', all=True)
In [4]:
# Renames and lists result.
process_store.rename_process('./pstore2', 'fabhmerg_learn', 'new_name')
process_store.list_process_metadata('./pstore2', all=True)
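The examples for this function pass either a bare process name or a string of the form <process name>.<version>. Since the first form affects every version, it can help to check which form a string has before calling the API. The sketch below assumes that the version part is a UUID, as it is in the examples here; the split_process_key helper is hypothetical and not part of sampo.

```python
import uuid


def split_process_key(text):
    """Split '<name>.<version>' into (name, version).

    The version is None when the text carries no UUID-shaped suffix.
    """
    name, sep, tail = text.rpartition(".")
    if sep:
        try:
            uuid.UUID(tail)  # Assumption: versions look like UUIDs.
        except ValueError:
            pass
        else:
            return name, tail
    return text, None


one_version = split_process_key("fabhmerg_learn.15d61636-73e1-4323-9313-bcdf58ea9785")
all_versions = split_process_key("fabhmerg_learn")
```

A None version in the result signals that the call would apply to all versions of the process.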
convert_process
sampo.api.process_store.convert_process(url, process_name, dest_dir, cids=None, flat=False)
Converts stored data in a process store from a SAMPO-internal format to a human-readable process external format.
- Parameters
- url: str
Process store URL.
- process_name: str
A process name.
- dest_dir: str
Path to destination directory.
- cids: list or None
Target components to convert. Only the data of components whose cid matches an item in this list is converted. If None, all components in the specified process are converted.
Example specifying some components:
['dl', 'rg', 'bexp']
- flat: bool
If True, outputs files in a flat directory structure. See the examples for details. Each output filename consists of the process name, the cid, and the data name, joined with underscores ('_'):
<process name>_<cid>_<data name>
For example:
fabreg_learn_fabreg1_reg_predict_result.csv
fabreg_learn_fabreg1_selected_attrs.json
- Raises
- ValidationError
If url is not str.
If process_name is not str.
If dest_dir is not str.
If cids is not list and not None.
If flat is not bool.
Examples
Converting data in a process.
>>> from sampo.api import process_store
>>> process_store.convert_process('./pstore', 'my_process_a', './convert_dir')
Output:
convert_dir/
└── my_process_a
    ├── attr_metadata
    │   └── attr_metadata.json
    ├── components
    │   ├── dl
    │   │   └── comp_output_data
    │   │       ├── data.asd
    │   │       └── data.csv
    │   ├── rg
    │   │   ├── comp_output_data
    │   │   │   └── rg_predict_result.csv
    │   │   ├── comp_output_evaluation
    │   │   │   └── comp_output_evaluation.csv
    │   │   ├── model
    │   │   │   ├── fabhmerg_info.csv
    │   │   │   ├── gate_tree.json
    │   │   │   └── prediction_formulas.csv
    │   │   └── selected_attrs
    │   │       └── selected_attrs.json
    │   └── std
    │       ├── comp_output_data
    │       │   ├── data.asd
    │       │   └── data.csv
    │       └── selected_attrs
    │           └── selected_attrs.json
    ├── spd
    │   └── my_process_a.spd
    └── src
        └── dump.src
Converting data in specific components.
>>> from sampo.api import process_store
>>> process_store.convert_process('./pstore', 'my_process_a', './convert_dir', ['dl', 'rg'])
Output:
convert_dir/
└── my_process_a
    ├── attr_metadata
    │   └── attr_metadata.json
    ├── components
    │   ├── dl
    │   │   └── comp_output_data
    │   │       ├── data.asd
    │   │       └── data.csv
    │   └── rg
    │       ├── comp_output_data
    │       │   └── rg_predict_result.csv
    │       ├── comp_output_evaluation
    │       │   └── comp_output_evaluation.csv
    │       ├── model
    │       │   ├── fabhmerg_info.csv
    │       │   ├── gate_tree.json
    │       │   └── prediction_formulas.csv
    │       └── selected_attrs
    │           └── selected_attrs.json
    ├── spd
    │   └── my_process_a.spd
    └── src
        └── dump.src
Converting data in a flat directory structure.
>>> from sampo.api import process_store
>>> process_store.convert_process('./pstore', 'my_process_a', './convert_dir', flat=True)
Output:
convert_dir
├── my_process_a_attr_metadata.json
├── my_process_a_bexp_data.asd
├── my_process_a_bexp_data.csv
├── my_process_a_bexp_selected_attrs.json
├── my_process_a_dl_data.asd
├── my_process_a_dl_data.csv
├── my_process_a_dump.src
├── my_process_a_my_process_a.spd
├── my_process_a_rg_comp_output_evaluation.csv
├── my_process_a_rg_fabhmerg_info.csv
├── my_process_a_rg_gate_tree.json
├── my_process_a_rg_prediction_formulas.csv
├── my_process_a_rg_rg_predict_result.csv
├── my_process_a_rg_selected_attrs.json
├── my_process_a_std_data.asd
├── my_process_a_std_data.csv
└── my_process_a_std_selected_attrs.json
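The relationship between the nested and the flat layouts can be sketched as a small path mapping. This is inferred from the example listings and the <process name>_<cid>_<data name> rule, not from SAMPO's source: the flat_name helper is hypothetical, and the treatment of process-level files (no cid in the name) is an assumption based on the flat listing.

```python
from pathlib import PurePosixPath


def flat_name(nested_path):
    """Map a path in the nested layout to its filename in the flat layout."""
    parts = PurePosixPath(nested_path).parts
    process_name, filename = parts[0], parts[-1]
    # Files under components/<cid>/ carry the cid in their flat name.
    if len(parts) > 2 and parts[1] == "components":
        cid = parts[2]
        return f"{process_name}_{cid}_{filename}"
    # Process-level files (attr_metadata, spd, src) have no cid part.
    return f"{process_name}_{filename}"


component_file = flat_name("my_process_a/components/rg/model/gate_tree.json")
process_file = flat_name("my_process_a/spd/my_process_a.spd")
```

Under these assumptions, component_file is my_process_a_rg_gate_tree.json and process_file is my_process_a_my_process_a.spd, matching the flat listing.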
Process External Format
Convertible data and their corresponding external format
The convertible data and their corresponding external formats are shown in the table below:
Convertible Data | External Format |
---|---|
SPD (SAMPO Process Description) | Same as SPD input file format. See the SPD File Specification. |
SRC (SAMPO Run Configuration) | Same as SRC input file format. See the SRC File Specification. |
ASD (Attributes Schema Description) | Same as ASD input file format. See the ASD File Specification. The attributes in the ASD are component-specific. See each component specification. |
Attribute metadata | See Attribute Metadata File Format. |
Selected attributes | See Selected Attributes File Format. |
Model | A component-specific format. See each component specification. |
Component output data | Same as SAMPO CSV input file format. See the SAMPO CSV File Specification. The attributes in the output data are component-specific. See each component specification. |
Prediction result evaluation | A component-specific format. See each component specification. |
Attribute Metadata File Format
The Attribute Metadata File describes the metadata of attributes and their derivation relations in a process.
- Attribute metadata is represented as a DAG (Directed Acyclic Graph) consisting of nodes and links.
The nodes section holds the information of each attribute.
The links section holds the derivation relationships between attributes.
The file is in JSON format.
Example:
{
"nodes": [
{"aid": "dl1[0]", "name": "A", "scale": "integer", "is_excluded": false,"cid": "dl1",
"cindex": 0, "values": null, "is_kept": true, "context": null},
{"aid": "dl1[1]", "name": "B", "scale": "nominal", "is_excluded": true, "cid": "dl1",
"cindex": 1, "values": ["A", "B", "O"], "is_kept": true, "context": null},
{"aid": "rg1[0]", "name": "actual", "scale": "real", "is_excluded": false, "cid": "rg1",
"cindex": 0, "values": null, "is_kept": false, "context": {"field_path": ["regression", "actual"]}}
],
"links": [
{"source": "dl1[0]", "target": "rg1[0]"}
]
}
Nodes Section
The nodes section holds the information of all attributes generated in a process.
Each attribute has the following properties:
Property | Description |
---|---|
aid | Attribute ID. |
name | Attribute name. |
scale | Scale of the attribute. |
is_excluded | Whether the attribute is excluded as a feature. |
cid | ID of the component that generated the attribute. |
cindex | Index of the attribute within the component that generated it. |
values | Domain of a NOMINAL attribute. (null if the scale is not NOMINAL.) |
is_kept | Whether the attribute is kept even after every component has run. |
context | Context information of the attribute. |
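As an illustration of how these properties can be read, the sketch below parses the nodes from the example file with the standard json module. The helper names are hypothetical; in practice the JSON text would come from a converted attr_metadata.json file.

```python
import json

# Nodes copied from the Attribute Metadata example.
ATTRIBUTE_METADATA = """
{
  "nodes": [
    {"aid": "dl1[0]", "name": "A", "scale": "integer", "is_excluded": false, "cid": "dl1",
     "cindex": 0, "values": null, "is_kept": true, "context": null},
    {"aid": "dl1[1]", "name": "B", "scale": "nominal", "is_excluded": true, "cid": "dl1",
     "cindex": 1, "values": ["A", "B", "O"], "is_kept": true, "context": null},
    {"aid": "rg1[0]", "name": "actual", "scale": "real", "is_excluded": false, "cid": "rg1",
     "cindex": 0, "values": null, "is_kept": false,
     "context": {"field_path": ["regression", "actual"]}}
  ],
  "links": [{"source": "dl1[0]", "target": "rg1[0]"}]
}
"""


def nominal_domains(nodes):
    """Map each nominal attribute name to its value domain."""
    return {n["name"]: n["values"] for n in nodes if n["scale"] == "nominal"}


def non_excluded_names(nodes):
    """Names of attributes that are not excluded as features."""
    return [n["name"] for n in nodes if not n["is_excluded"]]


nodes = json.loads(ATTRIBUTE_METADATA)["nodes"]
domains = nominal_domains(nodes)
candidates = non_excluded_names(nodes)
```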
Links Section
The links section holds the derivation relationships between attributes:
"links": [
{"source": "dl1[0]", "target": "rg1[0]"}
]
In the example above, the links section states that the attribute rg1[0] was derived from the attribute dl1[0].
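Because the links form a DAG, the full derivation history of an attribute can be recovered by following links backwards from target to source. The sketch below is a minimal illustration: the ancestors helper is hypothetical, and the second link (to ev1[0]) is invented here only to show a two-step chain.

```python
import json

LINKS_JSON = """
{"links": [
  {"source": "dl1[0]", "target": "rg1[0]"},
  {"source": "rg1[0]", "target": "ev1[0]"}
]}
"""


def ancestors(links, aid):
    """Return every attribute ID that the given attribute was derived from."""
    parents = {}
    for link in links:
        parents.setdefault(link["target"], []).append(link["source"])
    found, stack = [], [aid]
    while stack:
        for source in parents.get(stack.pop(), []):
            if source not in found:
                found.append(source)
                stack.append(source)
    return found


links = json.loads(LINKS_JSON)["links"]
direct = ancestors(links, "rg1[0]")
chain = ancestors(links, "ev1[0]")
```

With these links, direct contains only dl1[0], while chain contains rg1[0] followed by dl1[0].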
Selected Attributes File Format
The Selected Attributes file contains information about the attributes that a learning component selected.
The file is in JSON format.
Example:
{
"selected_features": [
{"aid": "dl1[0]", "name": "A", "scale": "integer", "is_excluded": false,
"cid": "dl1", "cindex": 0, "values": null, "is_kept": false, "context": null},
{"aid": "dl1[1]", "name": "B", "scale": "real", "is_excluded": false,
"cid": "dl1", "cindex": 1, "values": null, "is_kept": false, "context": null},
{"aid": "dl1[2]", "name": "C", "scale": "real", "is_excluded": false,
"cid": "dl1", "cindex": 2, "values": null, "is_kept": false, "context": null}
],
"selected_targets": [
{"aid": "dl1[3]", "name": "Z", "scale": "integer", "is_excluded": true,
"cid": "dl1", "cindex": 3, "values": null, "is_kept": true, "context": null}
]
}
Selected Features Section
The selected features section describes the attributes selected as features.
Each attribute has the same properties as in the Nodes Section of the Attribute Metadata File Format.
Selected Targets Section
The selected targets section describes the attributes selected as targets.
Each attribute has the same properties as in the Nodes Section of the Attribute Metadata File Format.
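A Selected Attributes file can be read with the standard json module. The sketch below uses the data from the example, trimmed to a few properties for brevity; the selected_names helper is hypothetical.

```python
import json

# Trimmed copy of the Selected Attributes example (some properties omitted).
SELECTED_ATTRS = """
{
  "selected_features": [
    {"aid": "dl1[0]", "name": "A", "scale": "integer"},
    {"aid": "dl1[1]", "name": "B", "scale": "real"},
    {"aid": "dl1[2]", "name": "C", "scale": "real"}
  ],
  "selected_targets": [
    {"aid": "dl1[3]", "name": "Z", "scale": "integer"}
  ]
}
"""


def selected_names(selected, section):
    """Return the attribute names listed in one section of the file."""
    return [attr["name"] for attr in selected[section]]


selected = json.loads(SELECTED_ATTRS)
feature_names = selected_names(selected, "selected_features")
target_names = selected_names(selected, "selected_targets")
```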
export_process
sampo.api.process_store.export_process(url, process_name, dest_dir, truncate_comp_output=False)
Exports stored process data from a process store to a directory. The exported data can be imported into any process store with the sampo.api.process_store.import_process() API. This API can export only successful processes; failed or in-progress processes cannot be exported.
- Parameters
- url: str
Process store URL.
- process_name: str
Process name to be exported.
- dest_dir: str
Exported data output path.
- truncate_comp_output: bool
Whether to truncate the component outputs of the exported processes.
- Raises
- ValidationError
If url is not str.
If process_name is not str.
If dest_dir is not str.
If truncate_comp_output is not bool.
Examples
Example1: Exporting the current version of a selected process in the process store
In [1]:
from sampo.api import process_store
pstore_url = './pstore'
process_store.export_process(pstore_url, 'fabhmerg_learn', './export_dir')
# Checking exported processes
! ls ./export_dir
Example2: Exporting the current version of all processes in the process store
In [2]:
from sampo.api import process_store
pstore_url = './pstore'
process_list = process_store.list_process_metadata(pstore_url)['process name'].values
for process_name in process_list:
    process_store.export_process(pstore_url, process_name, './export_dir')
# Checking exported processes
! ls ./export_dir
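Since only successful processes can be exported, it may be useful to filter the listing by its status column before looping. The sketch below shows the filter on plain dictionaries standing in for DataFrame rows (for instance, as produced by pandas' DataFrame.to_dict('records')); the sample rows and the exportable_process_names helper are hypothetical.

```python
def exportable_process_names(rows):
    """Keep only the names of processes whose status is 'Succeeded'."""
    return [row["process name"] for row in rows if row["status"] == "Succeeded"]


# Hypothetical rows using the documented column names and status values.
rows = [
    {"process name": "fabhmerg_learn", "status": "Succeeded"},
    {"process name": "fabhmerg_predict", "status": "Failed"},
    {"process name": "fabreg_learn", "status": "In progress"},
]

names = exportable_process_names(rows)
```

Only fabhmerg_learn remains in names, so only that process would be passed to export_process.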
import_process
sampo.api.process_store.import_process(input_dir_path, process_name, url)
Imports process data exported by sampo.api.process_store.export_process() into a process store.
- Parameters
- input_dir_path: str
Directory path of the process data to import.
- process_name: str or None
Process name. If None, all processes are imported.
- url: str
Process store URL.
- Raises
- ValidationError
If input_dir_path is not str.
If process_name is neither str nor None.
If url is not str.
Examples
Example: Importing processes exported by export_process() from a directory to a new process store.
In [1]:
import os
from sampo.api import process_store
new_pstore_url = './new_pstore'
process_store.create(new_pstore_url)
with os.scandir('./export_dir') as exported_processes:
    for entry in exported_processes:
        if entry.is_dir():
            process_store.import_process(entry.path, None, new_pstore_url)
# Checking imported processes
process_store.list_process_metadata(new_pstore_url)
process_to_spd
sampo.api.process_store.process_to_spd(url, process_name, dest_dir)
Converts a learned process into an SPD file. When a process has a Feature Learner, it is converted to a Feature Descriptor. Output attributes that were not used by a successor component are removed. Components without output attributes are removed from the process.
- Parameters
- url: str
Process store URL.
- process_name: str
A process name.
- dest_dir: str
Directory path to which the SPD file is output, with the following name:
<dest_dir>/<original SPD file name>_<process name>.spd
- Raises
- ValidationError
If url is not str.
If process_name is not str.
If dest_dir is not str.
Examples
aad.spd
dl -> hr -> fs -> bin -> fs -> ts -> fs
---
components:
  dl:
    component: DataLoader
  hr:
    component: HingeRampFLComponent
    features: scale == 'real' or scale == 'integer'
    max_num_output_features: 15
  bin:
    component: BinarizeFLComponent
    features: scale == 'real' or scale == 'integer'
    max_num_output_features: 10
  ts:
    component: TimeshiftFDComponent
    features: scale == 'real' or scale == 'integer'
    shift: [["all()", [2,10]]]
  fs:
    component: LinearHSICFSComponent
    features: scale == 'real' or scale == 'integer'
    target: name == 'target_value'
    max_num_output_features: 5
global_settings:
  keep_attributes:
    - target_value
  feature_exclude:
    - target_value
Suppose pstore is a process store containing a process that was learned using the aad.spd shown above.
>>> from sampo.api import process_store
>>> process_store.process_to_spd('pstore', 'my_process_A', './output_dir')
aad_my_process_A.spd is output.
aad_my_process_A.spd
dl -> bin -> fs
dl -> ts -> fs
---
components:
  dl:
    component: DataLoader
  ts:
    component: TimeshiftFDComponent
    features: scale == 'real' or scale == 'integer'
    shift:
      - [name =="attr2", [2]]
      - [name =="attr4", [2]]
  bin:
    component: BinarizeFDComponent
    features: scale == 'real' or scale == 'integer'
    binarize_param:
      - [name =="attr2", [{threshold: 3.5}]]
  fs:
    component: DummyFDComponent
    features: name == 'bin(3.5)_attr2' or name == 'ts(2)_attr2' or name == 'ts(2)_attr4'
global_settings:
  keep_attributes:
    - target_value
  feature_exclude:
    - target_value
Each component is generated according to the following rules:
BinarizeFLComponent is converted to BinarizeFDComponent. LinearHSICFSComponent is converted to DummyFDComponent.
Output attributes of TimeshiftFDComponent that were not used by a successor are removed. (Output attributes of BinarizeFLComponent and HingeRampFLComponent are removed in the same way.)
HingeRampFLComponent is removed from the process, because none of its output attributes were selected by feature selection.
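The output file name can be worked out ahead of time from the naming rule given for dest_dir. The sketch below is an illustration only: the spd_output_path helper is hypothetical, and it assumes, as the aad.spd example suggests, that the original SPD file name is used without its extension.

```python
import posixpath


def spd_output_path(dest_dir, original_spd_file, process_name):
    """Build <dest_dir>/<original SPD file name>_<process name>.spd."""
    stem, _ext = posixpath.splitext(posixpath.basename(original_spd_file))
    return posixpath.join(dest_dir, f"{stem}_{process_name}.spd")


expected_path = spd_output_path("./output_dir", "aad.spd", "my_process_A")
```

For the example above, expected_path is ./output_dir/aad_my_process_A.spd.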