ASD (Attribute Schema Description) Specification

Overview

ASD (Attribute Schema Description) describes the relations between attribute names and scales. SAMPO loads the input data following the order and number of attributes described in the ASD.

The ASD can be prepared in two formats: Python object or file.

ASD File Example:

_sid: {scale: INTEGER}
_datetime: {scale: DATE}
col_a: {scale: INTEGER}
col_b: {scale: REAL}
col_c: {scale: NOMINAL, domain: [a, b, c]}

Format

The ASD can be prepared either as:

  1. a Python object (usable in SAMPO API)

  2. a text file (usable in SAMPO API and SAMPO Command)

ASD Object

The ASD object is a Python OrderedDict that holds the input attribute names as keys and the scale and domain information as the corresponding values. Each attribute follows the rules described in Attribute Patterns. ASD objects are usable only via the SAMPO API.

ASD Object Example:

>>> from collections import OrderedDict
>>> asd = OrderedDict([('_sid', {'scale': 'INTEGER'}),
...                    ('col1', {'scale': 'INTEGER'}),
...                    ('_datetime', {'scale': 'DATE'}),
...                    ('col3', {'scale': 'DATE'}),
...                    ('col4', {'scale': 'REAL'}),
...                    ('col5', {'scale': 'NOMINAL', 'domain': ['aaa', 'bbb', 'ccc']}),
...                    ('col6', {'scale': 'NOMINAL', 'domain': ['0', '1']}),
...                    ('col7', {'scale': 'REAL'})])

ASD File

The ASD file is a text file that describes the schemata of input attributes for SAMPO, with one attribute per line. Each attribute follows the rules described in Attribute Patterns. ASD files can be used with both the SAMPO API and the SAMPO Command.

ASD files follow the YAML syntax.

ASD File Example:

_sid: {scale: INTEGER}
col1: {scale: INTEGER}
_datetime: {scale: DATE}
col3: {scale: DATE}
col4: {scale: REAL}
col5: {scale: NOMINAL, domain: [aaa, bbb, ccc]}
col6: {scale: NOMINAL, domain: ['0', '1']}
col7: {scale: DATE}

The ASD file must fit the following constraints:

Property          Constraint
File name         <ASCII characters>.asd
Character code    Python 3: UTF-8 (ASCII + Japanese characters); Python 2: ASCII
Newline code      CRLF (recommended), LF (not recommended)
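
The file constraints above can be illustrated with a minimal sketch that renders an ASD object as ASD file text and writes it as UTF-8 with CRLF newlines. The helper names (quote, asd_to_text, write_asd_file) are hypothetical and not part of SAMPO. The sketch assumes Python 3 and attribute names that are valid plain YAML scalars, and it always single-quotes domain values so that YAML reads them as strings.

```python
import os
from collections import OrderedDict


def quote(value):
    """Return value as a YAML single-quoted scalar (a quote is escaped by doubling it)."""
    return "'" + str(value).replace("'", "''") + "'"


def asd_to_text(asd):
    """Render an ASD object as ASD file text, one attribute per line."""
    lines = []
    for name, spec in asd.items():
        fields = ["scale: " + spec["scale"]]
        if "domain" in spec:
            values = ", ".join(quote(v) for v in spec["domain"])
            fields.append("domain: [" + values + "]")
        lines.append(name + ": {" + ", ".join(fields) + "}")
    return "\n".join(lines) + "\n"


def write_asd_file(asd, path):
    """Write the ASD to path as UTF-8 text with CRLF newlines."""
    file_name = os.path.basename(path)
    if not file_name.endswith(".asd"):
        raise ValueError("ASD file name must end with .asd")
    if any(ord(ch) > 127 for ch in file_name):
        raise ValueError("ASD file name must consist of ASCII characters")
    # newline="\r\n" makes Python translate every "\n" written into CRLF.
    with open(path, "w", encoding="utf-8", newline="\r\n") as f:
        f.write(asd_to_text(asd))


asd = OrderedDict([
    ("_sid", {"scale": "INTEGER"}),
    ("col_b", {"scale": "REAL"}),
    ("col_c", {"scale": "NOMINAL", "domain": ["0", "1"]}),
])
print(asd_to_text(asd), end="")
```

Calling write_asd_file(asd, "example.asd") would then produce a file that satisfies the three constraints in the table.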


Attribute Patterns

Each attribute mapping in the ASD follows one of the two patterns below:

<attribute_name>: {scale: <scale>}
<attribute_name>: {scale: NOMINAL, domain: [<domain_value>, ...]}
  • <attribute_name>

    • Specifies an attribute name.

      • An attribute name that starts with ‘_’ (underscore) is treated as sample metadata (e.g. ‘_sid’ is the sample ID).

      • ‘_sid’ must be specified and must be INTEGER scale.

      • ‘_datetime’ must be DATE scale if specified.

      • The above rules still apply when attribute names contain non-ASCII multibyte characters (Python 3 only).

  • <scale>

    • Specifies an attribute scale of SAMPO.

      • Supported attribute scales are INTEGER, REAL, DATE and NOMINAL.

  • <domain_value>

    • Specifies NOMINAL scale domain values as a list.

      • Using non-ASCII multibyte characters in NOMINAL values is allowed in Python 3.

      • In ASD objects, each domain value must be written as a Python string literal.

      • In ASD files, values that YAML would otherwise interpret as null, boolean, integer, or floating point must be enclosed in quotation marks so that they are read as strings.
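
The rules listed above can be checked mechanically. The following is a minimal sketch of such a check for an ASD object. validate_asd is a hypothetical helper, not a SAMPO function, and the sketch assumes Python 3. It covers only the rules stated in this section, and it checks domain values only when a domain is present.

```python
from collections import OrderedDict

SUPPORTED_SCALES = {"INTEGER", "REAL", "DATE", "NOMINAL"}


def validate_asd(asd):
    """Return a list of rule violations found in an ASD object (empty if none)."""
    errors = []
    if "_sid" not in asd:
        errors.append("'_sid' must be specified")
    for name, spec in asd.items():
        scale = spec.get("scale")
        if scale not in SUPPORTED_SCALES:
            errors.append("%s: unsupported scale %r" % (name, scale))
        if name == "_sid" and scale != "INTEGER":
            errors.append("'_sid' must be INTEGER scale")
        if name == "_datetime" and scale != "DATE":
            errors.append("'_datetime' must be DATE scale")
        # Domain values must be Python strings, e.g. '0' rather than 0.
        domain = spec.get("domain")
        if domain is not None and not all(isinstance(v, str) for v in domain):
            errors.append("%s: domain values must be strings" % name)
    return errors


good = OrderedDict([
    ("_sid", {"scale": "INTEGER"}),
    ("_datetime", {"scale": "DATE"}),
    ("col6", {"scale": "NOMINAL", "domain": ["0", "1"]}),
])
bad = OrderedDict([
    ("_datetime", {"scale": "REAL"}),
    ("col6", {"scale": "NOMINAL", "domain": [0, 1]}),
])
print(validate_asd(good))  # prints []
for message in validate_asd(bad):
    print(message)
```

For the second object the sketch reports three problems: the missing ‘_sid’, the wrong scale of ‘_datetime’, and the non-string domain values of col6.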


Sample Metadata

Sample metadata is different from other attributes.

  • Sample metadata is not selected by attribute selection.

  • Some components utilize sample metadata for component specification.
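
Because sample metadata is identified purely by the leading underscore, it can be separated from ordinary attributes by name alone. The sketch below shows this on the ASD object example from earlier in this document. split_metadata is a hypothetical helper for illustration and does not reflect how SAMPO performs attribute selection internally.

```python
from collections import OrderedDict


def split_metadata(asd):
    """Split attribute names into (sample metadata, ordinary attributes), keeping ASD order."""
    metadata = [name for name in asd if name.startswith("_")]
    attributes = [name for name in asd if not name.startswith("_")]
    return metadata, attributes


asd = OrderedDict([
    ("_sid", {"scale": "INTEGER"}),
    ("col1", {"scale": "INTEGER"}),
    ("_datetime", {"scale": "DATE"}),
    ("col3", {"scale": "DATE"}),
    ("col4", {"scale": "REAL"}),
    ("col5", {"scale": "NOMINAL", "domain": ["aaa", "bbb", "ccc"]}),
    ("col6", {"scale": "NOMINAL", "domain": ["0", "1"]}),
    ("col7", {"scale": "REAL"}),
])

metadata, attributes = split_metadata(asd)
print(metadata)    # ['_sid', '_datetime']
print(attributes)  # ['col1', 'col3', 'col4', 'col5', 'col6', 'col7']
```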