=============================
SAMPO ARFF File Specification
=============================

.. contents:: Contents
    :local:

Overview
========
.. warning::

   The ARFF file is a deprecated input data format. Use a CSV file with an ASD file instead.

A SAMPO ARFF file is a text file that describes the input data used for learning or prediction.

The format is based on ARFF:

    https://weka.wikispaces.com/ARFF+%28book+version%29

However, there are some differences from ARFF:

* There are several **reserved attributes** which can be considered as sample metadata (e.g. **'_sid'** as sample ID).

  * For further details, see the **Reserved Attributes** section below.

* The STRING attribute type is not supported.
* The DATE attribute supports only the following date formats:

  * yyyy-MM-dd
  * yyyy-MM-dd'T'HH:mm:ss
  * yyyy-MM-dd'T'HH:mm:ss.S
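
As a rough illustration of the three supported DATE formats, the following Python sketch tries each one in turn. The function name ``parse_sampo_date`` and the ``strptime`` patterns are assumptions made for this example; SAMPO's own parser is not described in this document and may behave differently.

.. code-block:: python

    from datetime import datetime

    # strptime counterparts of the three documented patterns (assumed mapping).
    DATE_FORMATS = (
        "%Y-%m-%dT%H:%M:%S.%f",  # yyyy-MM-dd'T'HH:mm:ss.S
        "%Y-%m-%dT%H:%M:%S",     # yyyy-MM-dd'T'HH:mm:ss
        "%Y-%m-%d",              # yyyy-MM-dd
    )


    def parse_sampo_date(text):
        """Return a datetime for a supported DATE value, else raise ValueError."""
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(text, fmt)
            except ValueError:
                continue  # try the next documented format
        raise ValueError("unsupported DATE value: %r" % (text,))

Note that ``%f`` reads the trailing digits as a decimal fraction of a second, which may not match how SAMPO interprets the ``.S`` field.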

**Example**::

    % 1. Title: Iris Plants Database
    %
    % 2. Sources:
    %      (a) Creator: R.A. Fisher
    %      (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    %      (c) Date: July, 1988
    %
    @RELATION iris

    @ATTRIBUTE _sid         INTEGER
    @ATTRIBUTE sepallength  REAL
    @ATTRIBUTE sepalwidth   REAL
    @ATTRIBUTE petallength  REAL
    @ATTRIBUTE petalwidth   REAL
    @ATTRIBUTE class        {Iris-setosa,Iris-versicolor,Iris-virginica}

    @DATA
    0,5.1,3.5,1.4,0.2,Iris-setosa
    1,4.9,3.0,1.4,0.2,Iris-setosa
    2,4.7,3.2,1.3,0.2,Iris-setosa
    3,4.6,3.1,1.5,0.2,Iris-setosa
      .
      .
      .

|

A SAMPO ARFF file must satisfy the following constraints:

+----------------+------------------------------------------------------+
| Property       | Constraint                                           |
+================+======================================================+
| File name      | *ASCII characters*.arff                              |
+----------------+------------------------------------------------------+
| Character code | | Python 3: UTF-8 (ASCII + Japanese Characters)      |
|                | | Python 2: ASCII                                    |
+----------------+------------------------------------------------------+
| Newline code   | CRLF (Recommended), LF (Not Recommended)             |
+----------------+------------------------------------------------------+
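
One possible way to produce a file that meets these constraints from Python 3 is sketched below. The helper names ``has_valid_file_name`` and ``write_sampo_arff`` are hypothetical, and the check assumes that the ``.arff`` extension is written in lower case.

.. code-block:: python

    import os


    def has_valid_file_name(path):
        """Check that the file name is ASCII-only and ends with '.arff'."""
        name = os.path.basename(path)
        extension = os.path.splitext(name)[1]
        return extension == ".arff" and all(ord(ch) < 128 for ch in name)


    def write_sampo_arff(path, text):
        """Write ARFF text as UTF-8 with the recommended CRLF newlines."""
        # newline="\r\n" makes Python translate each "\n" written into CRLF,
        # so `text` is expected to use plain "\n" line endings.
        with open(path, "w", encoding="utf-8", newline="\r\n") as handle:
            handle.write(text)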

|

Attribute Names
===============
There are some naming rules that must be followed:

* An attribute name that contains any of the following characters must be enclosed in **double quotes (")**:

  * **curly brackets ('{' and '}')**
  * **at sign ('@')**
  * **comma (',')**
  * **space**

  To use a double quote (") inside a quoted name, escape it by doubling it, as in "attr\_""name".

* An attribute name must not contain the following characters:

  * **square brackets ('[' and ']')**

* The name of an ordinary (non-reserved) attribute must not begin with **an underscore ('_')**.
* When non-ASCII multibyte characters are used in attribute names (in Python 3), the above rules still apply.
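
The naming rules above can be sketched as a small Python helper. The function ``format_attribute_name`` and its ``reserved`` flag are hypothetical and not part of SAMPO. The sketch also assumes that a name containing a double quote must itself be quoted.

.. code-block:: python

    # Characters that force quoting and characters that are never allowed,
    # taken from the rules above. Treating '"' as "needs quoting" is an assumption.
    NEEDS_QUOTING = set('{}@, "')
    FORBIDDEN = set("[]")


    def format_attribute_name(name, reserved=False):
        """Return `name` as it could be written in an @ATTRIBUTE line."""
        if any(ch in FORBIDDEN for ch in name):
            raise ValueError("square brackets are not allowed: %r" % (name,))
        if name.startswith("_") and not reserved:
            raise ValueError("only reserved attributes may start with '_': %r" % (name,))
        if any(ch in NEEDS_QUOTING for ch in name):
            # Escape embedded double quotes by doubling them, then wrap the name.
            return '"' + name.replace('"', '""') + '"'
        return name

For example, ``format_attribute_name("petal width")`` returns the quoted form ``"petal width"``, while a plain name such as ``sepallength`` is returned unchanged.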


Relation Names
==============
Relation names must adhere to the rules specified for attribute names.

|

Nominal Names
=============
Nominal names must adhere to the rules specified for attribute names.

|

Reserved Attributes
===================
Reserved attributes are defined as follows. You can use these attributes in your data.

+-------------------+----------+---------------------+--------------------------------------------------+
|Attribute name     |Data type |Required or Optional |Description                                       |
+===================+==========+=====================+==================================================+
|_sid               |INTEGER   |Required             | Sample ID. Must be a unique value.               |
+-------------------+----------+---------------------+--------------------------------------------------+
|_datetime          |DATE      |Optional             | Date/Time data.                                  |
+-------------------+----------+---------------------+--------------------------------------------------+
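
Before loading a file, the requirement on ``_sid`` could be checked with something like the following sketch. The function ``check_sample_ids`` is hypothetical, and it assumes that the attribute names and data rows have already been split into Python lists.

.. code-block:: python

    def check_sample_ids(attribute_names, rows):
        """Raise ValueError unless `_sid` is declared and unique across rows.

        Returns the number of distinct sample IDs found.
        """
        if "_sid" not in attribute_names:
            raise ValueError("required attribute '_sid' is missing")
        column = attribute_names.index("_sid")
        seen = set()
        for row in rows:
            sid = int(row[column])  # _sid is declared as INTEGER
            if sid in seen:
                raise ValueError("duplicate _sid: %d" % sid)
            seen.add(sid)
        return len(seen)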

|

Missing Values
==============
**A question mark ('?')** and **an empty string ('')** in the data section are treated as missing values.
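
A minimal Python sketch of this rule follows. The function ``parse_data_row`` is hypothetical. It assumes that values contain no quoted commas and that whitespace around a value can be ignored, neither of which is stated in this specification.

.. code-block:: python

    def parse_data_row(line):
        """Split one @DATA line on commas and map '?' and '' to None."""
        cells = [cell.strip() for cell in line.rstrip("\r\n").split(",")]
        # Both missing-value markers become None; other cells stay as strings.
        return [None if cell in ("?", "") else cell for cell in cells]

With this sketch, the line ``3,?,3.1,,Iris-setosa`` yields ``None`` for its second and fourth values.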

|

Attribute Scale Mapping between ARFF and SAMPO
==============================================
After a file is loaded, the scale of each attribute is converted to the SAMPO internal scale.
The mapping between ARFF scale and SAMPO internal scale is as follows:

+-----------------+-----------------+-----------------+
|ARFF             |SAMPO            |Remarks          |
+=================+=================+=================+
|INTEGER          |INTEGER          |                 |
+-----------------+-----------------+-----------------+
|REAL             |REAL             |                 |
+-----------------+-----------------+-----------------+
|NUMERIC          |REAL             |Deprecated       |
+-----------------+-----------------+-----------------+
|NOMINAL          |NOMINAL          |                 |
+-----------------+-----------------+-----------------+
|DATE             |DATE             |                 |
+-----------------+-----------------+-----------------+
|STRING           |---              |Not supported    |
+-----------------+-----------------+-----------------+
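
The table can be expressed as a simple lookup, as in the Python sketch below. The names ``ARFF_TO_SAMPO_SCALE`` and ``sampo_scale`` are hypothetical. Treating a ``{...}`` declaration as NOMINAL and matching type keywords case-insensitively are assumptions borrowed from common ARFF usage.

.. code-block:: python

    # ARFF type keyword -> SAMPO internal scale, as listed in the table above.
    ARFF_TO_SAMPO_SCALE = {
        "INTEGER": "INTEGER",
        "REAL": "REAL",
        "NUMERIC": "REAL",  # deprecated
        "NOMINAL": "NOMINAL",
        "DATE": "DATE",
    }


    def sampo_scale(arff_type):
        """Map the type part of an @ATTRIBUTE line to a SAMPO scale."""
        declared = arff_type.strip()
        if not declared:
            raise ValueError("empty attribute type")
        if declared.startswith("{"):
            return "NOMINAL"  # e.g. {Iris-setosa,Iris-versicolor}
        keyword = declared.split()[0].upper()  # 'DATE "fmt"' -> 'DATE'
        if keyword not in ARFF_TO_SAMPO_SCALE:
            # STRING and any unknown type are rejected.
            raise ValueError("unsupported attribute type: %r" % (arff_type,))
        return ARFF_TO_SAMPO_SCALE[keyword]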

|

Examples
========
Non-series Data::

    @RELATION iris

    @ATTRIBUTE _sid         INTEGER
    @ATTRIBUTE sepallength  REAL
    @ATTRIBUTE sepalwidth   REAL
    @ATTRIBUTE petallength  REAL
    @ATTRIBUTE petalwidth   REAL
    @ATTRIBUTE class        {Iris-setosa,Iris-versicolor,Iris-virginica}

    @DATA
    0,5.1,3.5,1.4,0.2,Iris-setosa
    1,4.9,3.0,1.4,0.2,Iris-setosa
    2,4.7,3.2,1.3,0.2,Iris-setosa
    3,4.6,3.1,1.5,0.2,Iris-setosa
      .
      .
      .

|

Time-series Data::

    @RELATION temperature_and_humidity_report_2013_04

    @ATTRIBUTE _sid            INTEGER
    @ATTRIBUTE _datetime       DATE     "yyyy-MM-dd'T'HH:mm:ss"
    @ATTRIBUTE temperature-avg REAL
    @ATTRIBUTE temperature-max REAL
    @ATTRIBUTE temperature-min REAL
    @ATTRIBUTE humidity-avg    REAL
    @ATTRIBUTE humidity-min    REAL

    @DATA
    1,2013-04-01T00:00:00,7.9,11.6,2.3,62.6,41
    2,2013-04-02T00:00:00,11.8,13.5,9.6,89.7,60
    3,2013-04-03T00:00:00,12.2,13.5,11.0,83.4,63
     .
     .
     .

