SAMPO ARFF File Specification

Overview

Warning

ARFF is a deprecated input data format. Use a CSV file together with an ASD file instead.

A SAMPO ARFF file is a text file that describes input data for learning or prediction.

The format is based on ARFF.

https://weka.wikispaces.com/ARFF+%28book+version%29

However, it differs from standard ARFF in the following ways:

  • Several reserved attributes are treated as sample metadata (e.g. ‘_sid’ is the sample ID).

    • For details, see the Reserved Attributes section below.

  • The STRING attribute type is not supported.

  • The DATE attribute supports only the following date formats:

    • yyyy-MM-dd

    • yyyy-MM-dd'T'HH:mm:ss

    • yyyy-MM-dd'T'HH:mm:ss.S
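As an illustration of the DATE restriction, the following Python sketch accepts only the three formats listed above. The function name `parse_sampo_date` is hypothetical and not part of SAMPO, and the translation of the Java-style pattern letters into `strptime` directives is an assumption (in particular, `S` is mapped to `%f`, which accepts one to six fractional digits).

```python
from datetime import datetime

# Hypothetical strptime equivalents of the three allowed DATE patterns.
# Assumption: the fractional-second letter 'S' maps to %f (1 to 6 digits).
ALLOWED_DATE_FORMATS = (
    "%Y-%m-%d",              # yyyy-MM-dd
    "%Y-%m-%dT%H:%M:%S",     # yyyy-MM-dd'T'HH:mm:ss
    "%Y-%m-%dT%H:%M:%S.%f",  # yyyy-MM-dd'T'HH:mm:ss.S
)


def parse_sampo_date(text):
    """Return a datetime if text parses with one of the allowed formats, else raise ValueError."""
    for fmt in ALLOWED_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next allowed format
    raise ValueError("unsupported DATE value: %r" % text)


print(parse_sampo_date("2013-04-01T00:00:00"))  # 2013-04-01 00:00:00
```

A value such as `2013/04/01` is rejected because it matches none of the three patterns.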

Example:

% 1. Title: Iris Plants Database
%
% 2. Sources:
%      (a) Creator: R.A. Fisher
%      (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
%      (c) Date: July, 1988
%
@RELATION iris

@ATTRIBUTE _sid         INTEGER
@ATTRIBUTE sepallength  REAL
@ATTRIBUTE sepalwidth   REAL
@ATTRIBUTE petallength  REAL
@ATTRIBUTE petalwidth   REAL
@ATTRIBUTE class        {Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
  .
  .
  .

A SAMPO ARFF file must satisfy the following constraints:

Property         Constraint
File name        ASCII characters, with the .arff extension
Character code   Python 3: UTF-8 (ASCII + Japanese characters)
                 Python 2: ASCII
Newline code     CRLF (recommended), LF (not recommended)
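The file-level constraints can be checked before a file is handed to SAMPO. This is a minimal sketch, assuming the file name and the raw file contents (as bytes) are already available; `check_arff_constraints` is a hypothetical helper, the `.arff` extension check is case-sensitive by assumption, and LF line endings are only reported, because the table lists them as not recommended rather than forbidden.

```python
def check_arff_constraints(filename, data, python3=True):
    """Return a list of problems found in a SAMPO ARFF file (empty if none).

    filename: the file's base name, e.g. "iris.arff"
    data: the raw file contents as bytes
    python3: True for the Python 3 rule (UTF-8), False for Python 2 (ASCII)
    """
    problems = []

    # File name: ASCII characters with the .arff extension.
    if not all(ord(ch) < 128 for ch in filename):
        problems.append("file name contains non-ASCII characters")
    if not filename.endswith(".arff"):
        problems.append("file name does not end with .arff")

    # Character code: UTF-8 under Python 3, ASCII under Python 2.
    encoding = "utf-8" if python3 else "ascii"
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        problems.append("contents are not valid " + encoding)

    # Newline code: count LF characters that are not part of a CRLF pair.
    bare_lf = data.count(b"\n") - data.count(b"\r\n")
    if bare_lf > 0:
        problems.append("found %d LF-only line ending(s); CRLF is recommended" % bare_lf)

    return problems


print(check_arff_constraints("iris.arff", b"@RELATION iris\r\n"))  # []
```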


Attribute Names

There are some naming rules that must be followed:

  • An attribute name that contains any of the following characters must be enclosed in double quotes ("):
    • curly brackets (‘{’ and ‘}’)

    • at sign (‘@’)

    • comma (‘,’)

    • space

  • To include a double quote inside a quoted name, escape it by doubling it, e.g. "attr_""name".

  • Attribute names must not contain the following characters:
    • square brackets (‘[’ and ‘]’)

  • Names of ordinary (non-reserved) attributes must not begin with an underscore (‘_’).

  • The rules above also apply to attribute names that contain non-ASCII multibyte characters (Python 3 only).
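A minimal sketch of how these naming rules could be applied when writing an `@ATTRIBUTE` line. `format_attribute_name` is a hypothetical helper; it assumes that only `_sid` and `_datetime` are reserved (per the Reserved Attributes table) and, as a precaution, it also quotes any name that contains a double quote, which the rules above do not state explicitly.

```python
# Characters that require the name to be enclosed in double quotes.
QUOTE_TRIGGERS = set("{}@, ")
# Characters that must not appear in a name.
FORBIDDEN = set("[]")
# Assumption: these are the only reserved attribute names.
RESERVED = {"_sid", "_datetime"}


def format_attribute_name(name):
    """Return name as it should appear in an @ATTRIBUTE line, or raise ValueError."""
    if FORBIDDEN & set(name):
        raise ValueError("square brackets are not allowed: %r" % name)
    if name.startswith("_") and name not in RESERVED:
        raise ValueError("non-reserved names must not start with '_': %r" % name)
    if QUOTE_TRIGGERS & set(name) or '"' in name:
        # Enclose in double quotes; an embedded double quote is doubled.
        return '"' + name.replace('"', '""') + '"'
    return name


print(format_attribute_name("petal width"))  # "petal width"
```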

Relation Names

Relation names must adhere to the rules specified for attribute names.


Nominal Names

Nominal names must adhere to the rules specified for attribute names.


Reserved Attributes

The following reserved attributes are defined and may be used in your data:

Attribute name   Data type   Required/Optional   Description
_sid             INTEGER     Required            Sample ID. Each value must be unique.
_datetime        DATE        Optional            Date/time data.
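The reserved-attribute rules can be checked once the header declarations and data rows have been separated. This sketch assumes declarations are given as `(name, type)` pairs and each data row as a list of raw strings; `check_reserved_attributes` is hypothetical, and reporting a missing `_sid` value (`?`) as an error is an assumption, since the table only requires the value to be unique.

```python
def check_reserved_attributes(attributes, rows):
    """Return a list of reserved-attribute problems (empty if none).

    attributes: list of (name, type) pairs in declaration order
    rows: list of data rows, each a list of raw string values
    """
    types = dict(attributes)
    names = [name for name, _ in attributes]

    if "_sid" not in types:
        return ["required attribute _sid is missing"]

    problems = []
    if types["_sid"].upper() != "INTEGER":
        problems.append("_sid must be declared INTEGER")
    # DATE declarations may carry a format string, so check the prefix only.
    if "_datetime" in types and not types["_datetime"].upper().startswith("DATE"):
        problems.append("_datetime must be declared DATE")

    # Every _sid value must be an integer that appears only once.
    index = names.index("_sid")
    seen = set()
    for row in rows:
        value = row[index]
        try:
            sid = int(value)
        except ValueError:
            problems.append("_sid is not an integer: %r" % value)
            continue
        if sid in seen:
            problems.append("duplicate _sid: %d" % sid)
        seen.add(sid)
    return problems


attrs = [("_sid", "INTEGER"), ("sepallength", "REAL")]
print(check_reserved_attributes(attrs, [["0", "5.1"], ["0", "4.9"]]))  # ['duplicate _sid: 0']
```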


Missing Values

A question mark (‘?’) and an empty string (‘’) in the data section are treated as missing values.
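A minimal sketch of the missing-value rule. `parse_cell` is hypothetical. Stripping surrounding whitespace before the comparison is an assumption, and the plain `split(",")` in the example does not handle quoted values that contain commas.

```python
# Tokens that represent a missing value in the data section.
MISSING_TOKENS = {"?", ""}


def parse_cell(raw):
    """Return None for a missing value, otherwise the whitespace-stripped string."""
    value = raw.strip()  # assumption: surrounding whitespace is not significant
    return None if value in MISSING_TOKENS else value


row = "3,4.6,?,,0.2,Iris-setosa".split(",")
print([parse_cell(cell) for cell in row])
# ['3', '4.6', None, None, '0.2', 'Iris-setosa']
```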


Attribute Scale Mapping between ARFF and SAMPO

When an ARFF file is loaded into SAMPO, the scale of each attribute is converted to a SAMPO internal scale. The mapping between ARFF scales and SAMPO internal scales is as follows:

ARFF       SAMPO           Remarks
INTEGER    INTEGER
REAL       REAL
NUMERIC    REAL            Deprecated
NOMINAL    NOMINAL
DATE       DATE
STRING     Not supported
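The mapping table can be expressed as a lookup. This is a sketch, not SAMPO's internal code: `sampo_scale` and `ARFF_TO_SAMPO` are hypothetical names, and emitting a `DeprecationWarning` for NUMERIC is an illustrative choice. Nominal attributes have no keyword in ARFF; they are recognized by their `{...}` value list.

```python
import warnings

# ARFF type keyword -> SAMPO internal scale, following the table above.
ARFF_TO_SAMPO = {
    "INTEGER": "INTEGER",
    "REAL": "REAL",
    "NUMERIC": "REAL",  # deprecated
    "DATE": "DATE",
}


def sampo_scale(type_spec):
    """Map an ARFF type declaration (the text after the attribute name) to a SAMPO scale."""
    spec = type_spec.strip()
    if spec.startswith("{"):
        return "NOMINAL"  # e.g. {Iris-setosa,Iris-versicolor,Iris-virginica}
    # DATE may be followed by a format string, so only the first word matters.
    keyword = spec.split()[0].upper() if spec else ""
    if keyword == "STRING":
        raise ValueError("STRING attributes are not supported")
    if keyword == "NUMERIC":
        warnings.warn("NUMERIC is deprecated and is loaded as REAL", DeprecationWarning)
    if keyword not in ARFF_TO_SAMPO:
        raise ValueError("unknown ARFF type: %r" % type_spec)
    return ARFF_TO_SAMPO[keyword]


print(sampo_scale('DATE "yyyy-MM-dd\'T\'HH:mm:ss"'))  # DATE
```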


Examples

Non-series Data:

@RELATION iris

@ATTRIBUTE _sid         INTEGER
@ATTRIBUTE sepallength  REAL
@ATTRIBUTE sepalwidth   REAL
@ATTRIBUTE petallength  REAL
@ATTRIBUTE petalwidth   REAL
@ATTRIBUTE class        {Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
  .
  .
  .

Time-series Data:

@RELATION temperature_and_humidity_report_2013_04

@ATTRIBUTE _sid            INTEGER
@ATTRIBUTE _datetime       DATE     "yyyy-MM-dd'T'HH:mm:ss"
@ATTRIBUTE temperature-avg REAL
@ATTRIBUTE temperature-max REAL
@ATTRIBUTE temperature-min REAL
@ATTRIBUTE humidity-avg    REAL
@ATTRIBUTE humidity-min    REAL

@DATA
1,2013-04-01T00:00:00,7.9,11.6,2.3,62.6,41
2,2013-04-02T00:00:00,11.8,13.5,9.6,89.7,60
3,2013-04-03T00:00:00,12.2,13.5,11.0,83.4,63
 .
 .
 .
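Putting these rules together, the following sketch reads the header and data sections of files like the examples above into plain Python structures. It illustrates the format under stated assumptions and is not SAMPO's loader: `read_sampo_arff` is hypothetical, values are returned as raw strings without type conversion, data lines are split on commas without quote handling, and the `.` lines in the examples are elisions that must be removed before parsing.

```python
import re

# @ATTRIBUTE <name> <type>, where <name> is a bare token or a double-quoted
# string in which "" stands for a literal double quote.
ATTRIBUTE_RE = re.compile(r'@ATTRIBUTE\s+("(?:[^"]|"")*"|\S+)\s+(.+)', re.IGNORECASE)
MISSING = {"?", ""}


def read_sampo_arff(text):
    """Parse SAMPO ARFF text into (relation, attributes, rows).

    attributes: list of (name, type_spec) pairs in declaration order.
    rows: list of dicts mapping attribute names to raw string values,
          with None for missing values.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            names = [name for name, _ in attributes]
            values = [v.strip() for v in line.split(",")]
            rows.append({n: (None if v in MISSING else v) for n, v in zip(names, values)})
            continue
        keyword = line.split(None, 1)[0].upper()
        if keyword == "@RELATION":
            relation = line.split(None, 1)[1]
        elif keyword == "@ATTRIBUTE":
            match = ATTRIBUTE_RE.match(line)
            if match is None:
                raise ValueError("malformed attribute line: %r" % line)
            name = match.group(1)
            if name.startswith('"'):
                name = name[1:-1].replace('""', '"')  # unquote and unescape
            attributes.append((name, match.group(2).strip()))
        elif keyword == "@DATA":
            in_data = True
    return relation, attributes, rows


sample = """@RELATION temperature_and_humidity_report_2013_04
@ATTRIBUTE _sid            INTEGER
@ATTRIBUTE _datetime       DATE     "yyyy-MM-dd'T'HH:mm:ss"
@ATTRIBUTE temperature-avg REAL
@DATA
1,2013-04-01T00:00:00,7.9
2,2013-04-02T00:00:00,?
"""
relation, attributes, rows = read_sampo_arff(sample)
print(relation)  # temperature_and_humidity_report_2013_04
print(rows[1])   # {'_sid': '2', '_datetime': '2013-04-02T00:00:00', 'temperature-avg': None}
```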