ASD (Attribute Schema Description) Specification
Overview
ASD (Attribute Schema Description) describes the mapping between attribute names and their scales. SAMPO loads the input data according to the order and number of attributes described in the ASD.
The ASD can be prepared in two formats: as a Python object or as a file.
ASD File Example:
_sid: {scale: INTEGER}
_datetime: {scale: DATE}
col_a: {scale: INTEGER}
col_b: {scale: REAL}
col_c: {scale: NOMINAL, domain: [a, b, c]}
Format
The ASD can be prepared either as:
- a Python object (usable in SAMPO API)
- a text file (usable in SAMPO API and SAMPO Command)
ASD Object
The ASD object is a Python OrderedDict that holds the input attribute names as dictionary keys and the scale and domain information as the corresponding values. Each attribute follows the rules described in Attribute Patterns. ASD objects are usable only via the SAMPO API.
ASD Object Example:
>>> from collections import OrderedDict
>>> asd = OrderedDict([('_sid', {'scale': 'INTEGER'}),
...                    ('col1', {'scale': 'INTEGER'}),
...                    ('_datetime', {'scale': 'DATE'}),
...                    ('col3', {'scale': 'DATE'}),
...                    ('col4', {'scale': 'REAL'}),
...                    ('col5', {'scale': 'NOMINAL', 'domain': ['aaa', 'bbb', 'ccc']}),
...                    ('col6', {'scale': 'NOMINAL', 'domain': ['0', '1']}),
...                    ('col7', {'scale': 'REAL'})])
ASD File
The ASD file is a text file that describes the schema of the input attributes for SAMPO, with one attribute per line. Each attribute follows the rules described in Attribute Patterns. This file can be used with both the SAMPO API and the SAMPO Command.
ASD files follow YAML syntax.
- YAML Version 1.2
ASD File Example:
_sid: {scale: INTEGER}
col1: {scale: INTEGER}
_datetime: {scale: DATE}
col3: {scale: DATE}
col4: {scale: REAL}
col5: {scale: NOMINAL, domain: [aaa, bbb, ccc]}
col6: {scale: NOMINAL, domain: ['0', '1']}
col7: {scale: DATE}
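To connect the two formats, an ASD object can be rendered as ASD-file lines. The following is a minimal sketch and not part of SAMPO: `asd_to_lines` and `quote` are hypothetical helper names, and the sketch assumes attribute names are plain strings that need no YAML quoting. Every domain value is single-quoted so that values such as 0 or true stay strings when the file is read back.

```python
from collections import OrderedDict


def quote(value):
    """Return value as a YAML single-quoted scalar ('' escapes a quote)."""
    return "'" + value.replace("'", "''") + "'"


def asd_to_lines(asd):
    """Render an ASD object as ASD-file lines, one attribute per line.

    Hypothetical helper for illustration; SAMPO itself is not called here.
    """
    lines = []
    for name, spec in asd.items():
        fields = ["scale: " + spec["scale"]]
        if "domain" in spec:
            # Quote every domain value so YAML never reads it as a
            # number, boolean, or null.
            domain = ", ".join(quote(v) for v in spec["domain"])
            fields.append("domain: [" + domain + "]")
        lines.append("{}: {{{}}}".format(name, ", ".join(fields)))
    return lines


asd = OrderedDict([
    ("_sid", {"scale": "INTEGER"}),
    ("col5", {"scale": "NOMINAL", "domain": ["aaa", "bbb", "ccc"]}),
    ("col6", {"scale": "NOMINAL", "domain": ["0", "1"]}),
])

for line in asd_to_lines(asd):
    print(line)
```

Because the OrderedDict preserves insertion order, the lines come out in the same order in which the attributes were declared.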
The ASD file must fit the following constraints:
Property | Constraint |
---|---|
File name | <ASCII characters>.asd |
Character code | Python 3: UTF-8 (ASCII + Japanese Characters); Python 2: ASCII |
Newline code | CRLF (Recommended), LF (Not Recommended) |
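The file-level constraints can be checked before handing a file to SAMPO. This is a rough sketch under stated assumptions: `check_asd_file` is a hypothetical helper (not a SAMPO function), it targets Python 3, and it reports LF newlines as a warning rather than an error because the table only marks them as not recommended.

```python
import os
import tempfile


def check_asd_file(path):
    """Return (errors, warnings) for the file-level ASD constraints.

    Hypothetical helper for illustration; assumes Python 3.
    """
    errors, warnings = [], []

    name = os.path.basename(path)
    if not name.endswith(".asd"):
        errors.append("file name must end with .asd")
    try:
        name.encode("ascii")
    except UnicodeEncodeError:
        errors.append("file name must contain only ASCII characters")

    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        errors.append("file content is not valid UTF-8")

    # Any LF left after removing CRLF pairs is a bare LF newline.
    if b"\n" in raw.replace(b"\r\n", b""):
        warnings.append("LF newlines found; CRLF is recommended")

    return errors, warnings


# Write a small file with CRLF newlines and check it.
tmp_dir = tempfile.mkdtemp()
good = os.path.join(tmp_dir, "example.asd")
with open(good, "wb") as f:
    f.write(b"_sid: {scale: INTEGER}\r\ncol_a: {scale: REAL}\r\n")

print(check_asd_file(good))
```

The file is opened in binary mode so that Python's newline translation does not hide the actual newline bytes.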
Attribute Patterns
Each attribute mapping in the ASD follows one of the patterns below:
<attribute_name>: {scale: <scale>}
<attribute_name>: {scale: NOMINAL, domain: [<domain_value>, ...]}
<attribute_name>
Specifies an attribute name.
Attribute names that start with ‘_’ (underscore) are treated as sample metadata (e.g. ‘_sid’ as the sample ID).
‘_sid’ must be specified and must have the INTEGER scale.
‘_datetime’, if specified, must have the DATE scale.
These rules also apply when attribute names contain non-ASCII multibyte characters (in Python 3).
<scale>
Specifies an attribute scale of SAMPO.
Supported attribute scales are INTEGER, REAL, DATE and NOMINAL.
<domain_value>
Specifies the domain values of a NOMINAL scale attribute, given as a list.
Non-ASCII multibyte characters are allowed in NOMINAL values in Python 3.
In ASD objects, each domain value must be written as a Python string literal.
In ASD files, per the YAML specification, values that would otherwise be read as null, boolean, integer, or floating-point but need to be interpreted as strings should be enclosed in quotation marks.
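The rules in this section can be expressed as a small check on an ASD object. The sketch below is illustrative only: `validate_asd` is a hypothetical function, not part of SAMPO, and treating a NOMINAL attribute without a `domain` list as a violation is an assumption read from the second pattern.

```python
from collections import OrderedDict

SCALES = {"INTEGER", "REAL", "DATE", "NOMINAL"}


def validate_asd(asd):
    """Return a list of rule violations found in an ASD object.

    Hypothetical helper for illustration; SAMPO's own checks may differ.
    """
    problems = []
    if asd.get("_sid", {}).get("scale") != "INTEGER":
        problems.append("'_sid' must be specified with scale INTEGER")
    if "_datetime" in asd and asd["_datetime"].get("scale") != "DATE":
        problems.append("'_datetime' must have scale DATE")
    for name, spec in asd.items():
        scale = spec.get("scale")
        if scale not in SCALES:
            problems.append("{}: unsupported scale {!r}".format(name, scale))
        elif scale == "NOMINAL":
            domain = spec.get("domain")
            # Assumption: NOMINAL requires a domain list of strings.
            if not isinstance(domain, list) or not all(
                isinstance(value, str) for value in domain
            ):
                problems.append(
                    "{}: NOMINAL needs a domain list of strings".format(name)
                )
    return problems


good = OrderedDict([
    ("_sid", {"scale": "INTEGER"}),
    ("col5", {"scale": "NOMINAL", "domain": ["aaa", "bbb"]}),
])
bad = OrderedDict([
    ("_sid", {"scale": "REAL"}),
    ("col6", {"scale": "NOMINAL", "domain": [0, 1]}),
])

print(validate_asd(good))
print(validate_asd(bad))
```

The second object fails twice: `_sid` is not INTEGER, and the domain of `col6` holds integers instead of string literals.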
Sample Metadata
Sample metadata is handled differently from other attributes.
Sample metadata is never selected by attribute selection.
Some components use sample metadata as part of their component specification.
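Since sample metadata is identified purely by the leading underscore in the attribute name, an ASD object can be partitioned with a simple prefix test. This is only a sketch of that naming rule: `split_sample_metadata` is a hypothetical helper and does not reflect how SAMPO handles metadata internally.

```python
from collections import OrderedDict


def split_sample_metadata(asd):
    """Split an ASD object into (sample metadata, ordinary attributes).

    Hypothetical helper for illustration; relies only on the '_' prefix rule.
    """
    metadata = OrderedDict()
    attributes = OrderedDict()
    for name, spec in asd.items():
        target = metadata if name.startswith("_") else attributes
        target[name] = spec
    return metadata, attributes


asd = OrderedDict([
    ("_sid", {"scale": "INTEGER"}),
    ("col1", {"scale": "INTEGER"}),
    ("_datetime", {"scale": "DATE"}),
    ("col4", {"scale": "REAL"}),
])

metadata, attributes = split_sample_metadata(asd)
print(list(metadata))
print(list(attributes))
```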