SAMPO ARFF File Specification
Overview
Warning
The ARFF file is a deprecated input data format. Use a CSV file with an ASD file instead.
A SAMPO ARFF file is a text file that describes the input data used for learning or prediction.
The format is based on ARFF, with the following differences:
- Several reserved attributes serve as sample metadata (e.g. '_sid' as the sample ID). For further details, see the Reserved Attributes section below.
- The STRING attribute is not supported.
- The DATE attribute supports only the following date formats:
  yyyy-MM-dd
  yyyy-MM-dd'T'HH:mm:ss
  yyyy-MM-dd'T'HH:mm:ss.S
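DATE values can be checked against the three supported patterns before a file is handed to SAMPO. The following is a minimal sketch, not SAMPO's own parser: the function name `parse_sampo_date` is hypothetical, and the `strptime` directives are an assumed translation of the patterns. In particular, `%f` reads the digits after the dot as a fraction of a second, which may differ from how SAMPO interprets `.S`.

```python
from datetime import datetime

# Assumed strptime equivalents of the three documented patterns.
_DATE_FORMATS = (
    "%Y-%m-%d",              # yyyy-MM-dd
    "%Y-%m-%dT%H:%M:%S",     # yyyy-MM-dd'T'HH:mm:ss
    "%Y-%m-%dT%H:%M:%S.%f",  # yyyy-MM-dd'T'HH:mm:ss.S
)


def parse_sampo_date(text):
    """Return a datetime if text matches one of the documented patterns.

    strptime is lenient about zero padding, so this is an approximate check.
    """
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("unsupported DATE value: %r" % (text,))
```

A value such as `2013/04/01` matches none of the patterns and raises `ValueError`.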
Example:
% 1. Title: Iris Plants Database
%
% 2. Sources:
% (a) Creator: R.A. Fisher
% (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
% (c) Date: July, 1988
%
@RELATION iris
@ATTRIBUTE _sid INTEGER
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
.
.
.
The SAMPO ARFF file must satisfy the following constraints:
Property | Constraint |
---|---|
File name | ASCII characters only, with the .arff extension |
Character code | Python 3: UTF-8 (ASCII + Japanese characters); Python 2: ASCII |
Newline code | CRLF (recommended), LF (not recommended) |
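The constraints in this table can be checked mechanically before loading a file. The sketch below is illustrative only and assumes Python 3; `check_arff_file` is a hypothetical helper, not part of SAMPO. It reports an LF newline as a finding even though LF is only discouraged, not forbidden.

```python
import os


def check_arff_file(path):
    """Return a list of findings for the file name, encoding, and newlines."""
    findings = []
    name = os.path.basename(path)
    if not name.endswith(".arff"):
        findings.append("file name must end with .arff")
    try:
        name.encode("ascii")
    except UnicodeEncodeError:
        findings.append("file name must contain only ASCII characters")

    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        findings.append("content is not valid UTF-8")
    # Any LF left after removing CRLF pairs is a bare LF newline.
    if b"\n" in raw.replace(b"\r\n", b""):
        findings.append("LF newlines found; CRLF is recommended")
    return findings
```

An empty list means that none of the checked constraints was violated.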
Attribute Names
Attribute names must follow these rules:
- An attribute name that contains any of the following characters must be enclosed in double quotes ("):
  - curly brackets ('{' and '}')
  - at sign ('@')
  - comma (',')
  - space
  To include a double quote inside a quoted name, escape it by doubling it, as in "attr_""name".
- An attribute name must not contain square brackets ('[' and ']').
- The name of an ordinary (non-reserved) attribute must not begin with an underscore ('_').
- When attribute names contain non-ASCII multibyte characters (Python 3 only), the above rules still apply.
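The naming rules above can be sketched as a small helper for writing `@ATTRIBUTE` lines. The function name `format_attribute_name` and its `reserved` flag are hypothetical. The sketch also assumes that a name containing a double quote is itself enclosed in double quotes, which the rules imply but do not state outright.

```python
# Characters that force the name to be enclosed in double quotes.
_NEEDS_QUOTING = set('{}@, "')


def format_attribute_name(name, reserved=False):
    """Return name as it could be written after @ATTRIBUTE, quoted if needed."""
    if "[" in name or "]" in name:
        raise ValueError("square brackets are not allowed: %r" % (name,))
    if name.startswith("_") and not reserved:
        raise ValueError("non-reserved names must not begin with '_': %r" % (name,))
    if any(ch in _NEEDS_QUOTING for ch in name):
        # Escape embedded double quotes by doubling them, then wrap in quotes.
        return '"' + name.replace('"', '""') + '"'
    return name
```

For example, `format_attribute_name("max temp")` yields the quoted form `"max temp"`, while `temperature-avg` is returned unchanged.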
Reserved Attributes
Reserved attributes are defined as follows. You can use these attributes in your data.
Attribute name | Data type | Required or Optional | Description |
---|---|---|---|
_sid | INTEGER | Required | Sample ID. Must be a unique value. |
_datetime | DATE | Optional | Date/Time data. |
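The requirements in this table (a mandatory, unique `_sid` of type INTEGER and an optional `_datetime` of type DATE) could be verified as follows. This is a sketch under assumptions: `check_reserved_attributes` is a hypothetical helper, and it expects the header to have been parsed already into a mapping from attribute name to declared type keyword.

```python
from collections import Counter


def check_reserved_attributes(attributes, sids):
    """Check reserved attributes.

    attributes maps attribute name -> declared type keyword;
    sids is the list of values read from the _sid column.
    """
    problems = []
    if attributes.get("_sid") != "INTEGER":
        problems.append("_sid is required and must be INTEGER")
    if "_datetime" in attributes and attributes["_datetime"] != "DATE":
        problems.append("_datetime must be DATE")
    # Collect every _sid value that occurs more than once.
    duplicates = sorted(s for s, n in Counter(sids).items() if n > 1)
    if duplicates:
        problems.append("duplicate _sid values: %s" % duplicates)
    return problems
```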
Missing Values
A question mark ('?') and an empty string ('') in the data section are treated as missing values.
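A reader that follows this rule could map both markers to `None` while splitting a data line. The sketch below is not SAMPO's loader; `parse_data_row` is a hypothetical name, and it assumes that fields are comma-separated and that the standard `csv` module's quoting rules are close enough for the data at hand.

```python
import csv


def parse_data_row(line):
    """Split one @DATA line into fields, mapping '?' and '' to None."""
    fields = next(csv.reader([line]))
    return [None if field in ("?", "") else field for field in fields]
```

With this sketch, the line `1,?,,3.5` becomes `["1", None, None, "3.5"]`.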
Attribute Scale Mapping between ARFF and SAMPO
After an ARFF file is loaded, the scale of each attribute is converted to the SAMPO internal scale. The mapping between the ARFF scale and the SAMPO internal scale is as follows:
ARFF | SAMPO | Remarks |
---|---|---|
INTEGER | INTEGER | |
REAL | REAL | |
NUMERIC | REAL | Deprecated |
NOMINAL | NOMINAL | |
DATE | DATE | |
STRING | - | Not supported |
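The mapping table can be expressed as a lookup. The following is a sketch only: `ARFF_TO_SAMPO` and `sampo_scale` are hypothetical names, type keywords are assumed to be case-insensitive, and a nominal attribute is recognized by its `{...}` value list, as in the examples in this document.

```python
ARFF_TO_SAMPO = {
    "INTEGER": "INTEGER",
    "REAL": "REAL",
    "NUMERIC": "REAL",  # deprecated
    "DATE": "DATE",
}


def sampo_scale(arff_type):
    """Return the SAMPO scale for the type part of an @ATTRIBUTE declaration."""
    declared = arff_type.strip()
    if not declared:
        raise ValueError("empty ARFF type")
    if declared.startswith("{"):
        return "NOMINAL"  # nominal attributes are declared as {value1,value2,...}
    # Keep only the keyword; this drops the format string after DATE, if any.
    keyword = declared.split(None, 1)[0].upper()
    if keyword == "STRING":
        raise ValueError("STRING attributes are not supported")
    try:
        return ARFF_TO_SAMPO[keyword]
    except KeyError:
        raise ValueError("unknown ARFF type: %r" % (arff_type,))
```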
Examples
Non-series Data:
@RELATION iris
@ATTRIBUTE _sid INTEGER
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
.
.
.
Time-series Data:
@RELATION temperature_and_humidity_report_2013_04
@ATTRIBUTE _sid INTEGER
@ATTRIBUTE _datetime DATE "yyyy-MM-dd'T'HH:mm:ss"
@ATTRIBUTE temperature-avg REAL
@ATTRIBUTE temperature-max REAL
@ATTRIBUTE temperature-min REAL
@ATTRIBUTE humidity-avg REAL
@ATTRIBUTE humidity-min REAL
@DATA
1,2013-04-01T00:00:00,7.9,11.6,2.3,62.6,41
2,2013-04-02T00:00:00,11.8,13.5,9.6,89.7,60
3,2013-04-03T00:00:00,12.2,13.5,11.0,83.4,63
.
.
.