====================================
SAMPO Pandas DataFrame Specification
====================================

.. contents:: Contents
   :local:

Overview
========
A pandas DataFrame is a two-dimensional tabular data structure with labeled axes.

For details, see the pandas documentation:
https://pandas.pydata.org/pandas-docs/version/0.25/generated/pandas.DataFrame.html


**Example**

 +--------+------------+-------------+------------+-------------+------------+-----------------+
 |   _sid | _datetime  | SepalLength | SepalWidth | PetalLength | PetalWidth | Name            |
 +========+============+=============+============+=============+============+=================+
 |   0    | 2017-01-23 | 5.1         | 3.5        | 1.4         | 0.2        | Iris-setosa     |
 +--------+------------+-------------+------------+-------------+------------+-----------------+
 |   1    | 2017-01-23 | 4.9         | 3.0        | 1.4         | 0.2        | Iris-setosa     |
 +--------+------------+-------------+------------+-------------+------------+-----------------+
 |   2    | 2017-01-23 | 4.7         | 3.2        | 1.3         | 0.2        | Iris-setosa     |
 +--------+------------+-------------+------------+-------------+------------+-----------------+
 |   3    | 2017-01-23 | 4.6         | 3.1        | 1.5         | 0.2        | Iris-setosa     |
 +--------+------------+-------------+------------+-------------+------------+-----------------+
 |   4    | 2017-01-23 | 5.0         | 3.6        | 1.4         | 0.2        | Iris-setosa     |
 +--------+------------+-------------+------------+-------------+------------+-----------------+


String data in a SAMPO pandas DataFrame must satisfy the following constraint:

+----------------+-----------------------------------------------------+
| Property       | Constraint                                          |
+================+=====================================================+
| Character code || Python 3: UTF-8 (ASCII + Japanese Characters)      |
|                || Python 2: ASCII                                    |
+----------------+-----------------------------------------------------+

|

Attribute Names
===============
Attribute names are the column labels of the pandas.DataFrame. Attributes whose names begin with an
underscore ('_') are treated as sample metadata.
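
As a minimal sketch of this naming rule, the metadata columns can be separated from the ordinary
attributes by testing for the leading underscore. The variable names below are illustrative only and
are not part of SAMPO::

    import pandas as pd

    df = pd.DataFrame({
        '_sid': [0, 1],
        'SepalLength': [5.1, 4.9],
        'Name': ['Iris-setosa', 'Iris-setosa'],
    })

    # Columns whose names begin with '_' are treated as sample metadata.
    metadata_columns = [c for c in df.columns if str(c).startswith('_')]

    # All remaining columns are ordinary attributes.
    attribute_columns = [c for c in df.columns if not str(c).startswith('_')]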

|

Attribute Scales
================
Attribute scales are derived from the dtype of the pandas.DataFrame column.
Refer to the supported schema table of the `gen_asd_from_pandas_df() API <../sampotools/api/df2asd.html>`_ for
the mapping between column dtypes and attribute scales.
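
Because the scale is derived from the dtype, it can help to inspect the column dtypes before passing a
DataFrame to SAMPO. The sketch below uses only standard pandas calls. It does not call
``gen_asd_from_pandas_df()``, and the name ``column_dtypes`` is an illustrative assumption::

    import pandas as pd

    df = pd.DataFrame({
        '_sid': [0, 1],
        '_datetime': pd.to_datetime(['2017-01-23', '2017-01-24']),
        'SepalLength': [5.1, 4.9],
        'Name': ['Iris-setosa', 'Iris-setosa'],
    })

    # Map each column name to the string form of its dtype.
    # The exact dtype reported for string and datetime columns
    # can vary between pandas versions.
    column_dtypes = {name: str(dtype) for name, dtype in df.dtypes.items()}

    for name, dtype in column_dtypes.items():
        print(name, dtype)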

|

Reserved Attributes
===================
The following attribute names are reserved. When they appear in your data, they must conform to the
definitions below.

+-------------------+-----------------+---------------------+--------------------------------------------------+
|Attribute name     |Data type        |Required or Optional |Description                                       |
+===================+=================+=====================+==================================================+
|_sid               |numpy.int64      |Required             | Sample ID. Must be a unique value. SAMPO will    |
|                   |                 |                     | automatically add this attribute if not found.   |
+-------------------+-----------------+---------------------+--------------------------------------------------+
|_datetime          |numpy.datetime64 |Optional             | Date/Time data.                                  |
+-------------------+-----------------+---------------------+--------------------------------------------------+
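
The constraints in this table can be checked before handing data to SAMPO. The following is a minimal
sketch, not SAMPO's own validation: ``check_reserved_attributes`` is a hypothetical helper written for
this example, and a missing ``_sid`` is not reported because SAMPO adds it automatically::

    import pandas as pd
    from pandas.api.types import is_datetime64_dtype

    def check_reserved_attributes(df):
        """Return a list of problems found in the reserved attributes."""
        problems = []
        if '_sid' in df.columns:
            # _sid must be numpy.int64 and every value must be unique.
            if str(df['_sid'].dtype) != 'int64':
                problems.append("'_sid' is not int64")
            if not df['_sid'].is_unique:
                problems.append("'_sid' contains duplicate values")
        if '_datetime' in df.columns:
            # _datetime is optional, but must be numpy.datetime64 when present.
            if not is_datetime64_dtype(df['_datetime']):
                problems.append("'_datetime' is not datetime64")
        return problems

    good = pd.DataFrame({
        '_sid': [0, 1],
        '_datetime': pd.to_datetime(['2017-01-23', '2017-01-24']),
    })
    bad = pd.DataFrame({
        '_sid': [0, 0],                              # duplicate sample IDs
        '_datetime': ['2017-01-23', '2017-01-24'],   # plain strings, not datetime64
    })

    good_problems = check_reserved_attributes(good)
    bad_problems = check_reserved_attributes(bad)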

|

Examples
========
::

    >>> import pandas as pd
    >>> df = pd.DataFrame(
    ...     [[0, pd.to_datetime('2017-01-23'), 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
    ...      [1, pd.to_datetime('2017-01-24'), 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
    ...      [2, pd.to_datetime('2017-01-25'), 4.7, 3.2, 1.3, 0.2, 'Iris-setosa'],
    ...      [3, pd.to_datetime('2017-01-26'), 4.6, 3.1, 1.5, 0.2, 'Iris-setosa'],
    ...      [4, pd.to_datetime('2017-01-27'), 5.0, 3.6, 1.4, 0.2, 'Iris-setosa']],
    ...     columns=['_sid', '_datetime', 'SepalLength', 'SepalWidth',
    ...              'PetalLength', 'PetalWidth', 'Name'])
