sampo_util csv2arffheader

Overview

Warning

The sampo_util csv2arffheader command is deprecated.

Given an input CSV file, sampo_util csv2arffheader generates an @ATTRIBUTE statement for each column, together with the column's distinct values and the number of times each value occurs. Use this information to finalize the @ATTRIBUTE statements, and the data scale of each attribute, in the header section of an ARFF file. The result is written to stdout.
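
To make the counting step concrete, here is a small Python sketch that produces similar three-line blocks. It is an illustration, not the tool's implementation. It assumes, consistent with the examples, that ? and empty fields are missing values and that values are ordered by descending frequency with ties broken alphabetically. The helper names summarize, report, and quote are hypothetical, the quoting rule is simplified, and unlike the tool the sketch always emits a nominal specification (it does not detect DATE columns).

```python
import csv
import io
from collections import Counter

# Assumption (matches the examples): "?" and empty fields are missing values.
MISSING = {"", "?"}


def quote(value: str) -> str:
    """Double-quote a value containing a comma or whitespace (simplified ARFF quoting)."""
    if any(ch == "," or ch.isspace() for ch in value):
        return f'"{value}"'
    return value


def summarize(csv_text: str) -> dict[str, Counter]:
    """Count the occurrences of each non-missing value, column by column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: Counter() for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row.get(name)
            if value is not None and value not in MISSING:
                counts[name][value] += 1
    return counts


def report(counts: dict[str, Counter]) -> str:
    """Render each column as a nominal @ATTRIBUTE line, a value line, and a frequency line."""
    blocks = []
    for name, counter in counts.items():
        # Assumed ordering: descending frequency, ties broken alphabetically.
        ordered = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        values = ",".join(quote(value) for value, _ in ordered)
        freqs = ",".join(str(count) for _, count in ordered)
        blocks.append(f"@ATTRIBUTE {name} {{{values}}}\n{values}\n{freqs}\n")
    return "\n".join(blocks)
```

For the sample input used in the examples, report(summarize(text)) gives the same dayofweek and comment blocks as the tool; the date column, however, would come out as a nominal list rather than DATE.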


Synopsis

See the sampo_util command help:

$ sampo_util csv2arffheader --help
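
To capture the command's stdout from a script, a Python wrapper along these lines could be used. It is a sketch that only passes the input file and the two threshold options that appear in the examples on this page (--numeric-nominal-threshold and --string-nominal-threshold). The function names are hypothetical, and running it assumes sampo_util is installed and on PATH. Because the command is deprecated, check --help for the options your installed version accepts.

```python
import subprocess
from typing import Optional


def build_command(csv_path: str,
                  numeric_threshold: Optional[int] = None,
                  string_threshold: Optional[int] = None) -> list[str]:
    """Assemble the argument list, adding only the threshold options that were given."""
    cmd = ["sampo_util", "csv2arffheader", csv_path]
    if numeric_threshold is not None:
        cmd += ["--numeric-nominal-threshold", str(numeric_threshold)]
    if string_threshold is not None:
        cmd += ["--string-nominal-threshold", str(string_threshold)]
    return cmd


def run_csv2arffheader(csv_path: str, **thresholds) -> str:
    """Run the command and return its stdout (raises CalledProcessError on failure)."""
    result = subprocess.run(build_command(csv_path, **thresholds),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_csv2arffheader("data2.csv", numeric_threshold=5, string_threshold=5) would run the same command as the second example.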

Examples

  • No nominal thresholds specified.

    • Command:

      $ sampo_util csv2arffheader data2.csv
      
    • Input:

      _sid,dayofweek,flowername,temperature,date,comment
      0,MON,Iris-setosa,23.5,2012-01-01T01:23:45.678+09:00,
      1,TUE,Iris-setosa,20.1,2012-01-01T01:23:45.678+09:00,"John, Doe"
      2,?,Iris-setosa,,,
      3,THU,Iris-setosa,22.1,2012-04-01T04:23:45.678-10:00,
      4,FRI,Iris-setosa,21,2012-06-01T06:23:45.678Z,
      5,SAT,Iris-versicolor,24.1,2012-06-01T06:23:45.678Z,"Jacob, Smith"
      6,SUN,Iris-versicolor,21.5,2012-08-01T06:23:45.678Z,
      7,MON,Iris-versicolor,19.2,2012-09-01T06:23:45.678Z,"Ethan, Johnson"
      8,TUE,Iris-virginica,22.3,2012-04-01T06:23:45.678Z,
      
    • Output:

      @ATTRIBUTE _sid {0,1,2,3,4,5,6,7,8}
      0,1,2,3,4,5,6,7,8
      1,1,1,1,1,1,1,1,1
      
      @ATTRIBUTE dayofweek {MON,TUE,FRI,SAT,SUN,THU}
      MON,TUE,FRI,SAT,SUN,THU
      2,2,1,1,1,1
      
      @ATTRIBUTE flowername {Iris-setosa,Iris-versicolor,Iris-virginica}
      Iris-setosa,Iris-versicolor,Iris-virginica
      5,3,1
      
      @ATTRIBUTE temperature {19.2,20.1,21,21.5,22.1,22.3,23.5,24.1}
      19.2,20.1,21,21.5,22.1,22.3,23.5,24.1
      1,1,1,1,1,1,1,1
      
      @ATTRIBUTE date DATE "yyyy-MM-dd'T'HH:mm:ss"
      2012-01-01T01:23:45.678+09:00,2012-06-01T06:23:45.678Z,2012-04-01T04:23:45.678-10:00,2012-04-01T06:23:45.678Z,2012-08-01T06:23:45.678Z,2012-09-01T06:23:45.678Z
      2,2,1,1,1,1
      
      @ATTRIBUTE comment {"Ethan, Johnson","Jacob, Smith","John, Doe"}
      "Ethan, Johnson","Jacob, Smith","John, Doe"
      1,1,1
      

  • With nominal thresholds specified.

    • Command:

      $ sampo_util csv2arffheader data2.csv --numeric-nominal-threshold 5 --string-nominal-threshold 5
      
    • Input:

      Same as in the previous example.

    • Output:

      @ATTRIBUTE _sid INTEGER
      0,1,2,3,4,5,6,7,8
      1,1,1,1,1,1,1,1,1
      
      @ATTRIBUTE dayofweek STRING
      MON,TUE,FRI,SAT,SUN,THU
      2,2,1,1,1,1
      
      @ATTRIBUTE flowername {Iris-setosa,Iris-versicolor,Iris-virginica}
      Iris-setosa,Iris-versicolor,Iris-virginica
      5,3,1
      
      @ATTRIBUTE temperature REAL
      19.2,20.1,21,21.5,22.1,22.3,23.5,24.1
      1,1,1,1,1,1,1,1
      
      @ATTRIBUTE date DATE "yyyy-MM-dd'T'HH:mm:ss"
      2012-01-01T01:23:45.678+09:00,2012-06-01T06:23:45.678Z,2012-04-01T04:23:45.678-10:00,2012-04-01T06:23:45.678Z,2012-08-01T06:23:45.678Z,2012-09-01T06:23:45.678Z
      2,2,1,1,1,1
      
      @ATTRIBUTE comment {"Ethan, Johnson","Jacob, Smith","John, Doe"}
      "Ethan, Johnson","Jacob, Smith","John, Doe"
      1,1,1
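
The second example suggests how the thresholds behave. Columns with more distinct values than the threshold (_sid with 9, dayofweek with 6, temperature with 8) get a plain INTEGER, STRING, or REAL type, while flowername and comment (3 each) stay nominal. The Python sketch below encodes that reading. It is a hypothetical rule inferred from the examples, not the tool's documented logic: the examples do not show what happens when a column has exactly as many distinct values as the threshold, the mapping of each option to a column type is inferred from the option names, and DATE detection is not modeled.

```python
from typing import Callable, Optional


def quote(value: str) -> str:
    """Double-quote a value containing a comma or whitespace (simplified ARFF quoting)."""
    if any(ch == "," or ch.isspace() for ch in value):
        return f'"{value}"'
    return value


def parses_as(values: list[str], convert: Callable[[str], object]) -> bool:
    """True if every value can be converted with `convert` (e.g. int or float)."""
    try:
        for value in values:
            convert(value)
    except ValueError:
        return False
    return True


def attribute_statement(name: str, distinct: list[str],
                        numeric_threshold: Optional[int] = None,
                        string_threshold: Optional[int] = None) -> str:
    """Choose between a nominal spec and INTEGER/REAL/STRING (hypothetical rule)."""
    if parses_as(distinct, int):
        kind = "INTEGER"
    elif parses_as(distinct, float):
        kind = "REAL"
    else:
        kind = "STRING"
    # Assumption: numeric columns use the numeric threshold, others the string one.
    threshold = string_threshold if kind == "STRING" else numeric_threshold
    # Assumption: a column stops being nominal when it has MORE distinct values
    # than its threshold; with no threshold the column is always nominal.
    if threshold is not None and len(distinct) > threshold:
        return f"@ATTRIBUTE {name} {kind}"
    return "@ATTRIBUTE {} {{{}}}".format(name, ",".join(quote(v) for v in distinct))
```

With both thresholds set to 5, this reproduces the _sid, dayofweek, flowername, temperature, and comment statements of the second example. With no thresholds, it reproduces the nominal statements of the first example.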
      

Output Format

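For each attribute, three lines are printed, and consecutive attributes are separated by a blank line:

  • The @ATTRIBUTE statement.

  • The distinct attribute values, separated by commas.

  • The number of occurrences of each value, separated by commas, in the same order as the values.

As the examples show, missing values (? and empty fields) are neither listed nor counted.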
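
Because quoted values such as "Ethan, Johnson" contain commas, splitting the value line on commas would misread it. Below is a minimal Python sketch for reading this output back. It assumes that each attribute occupies exactly three non-blank lines and that the value line follows CSV quoting rules. An attribute whose values are all missing could produce an empty value line, which this sketch does not handle. The names parse_output and AttributeSummary are hypothetical.

```python
import csv
from dataclasses import dataclass


@dataclass
class AttributeSummary:
    statement: str          # the @ATTRIBUTE line, as printed
    values: list[str]       # distinct values, with CSV quotes removed
    frequencies: list[int]  # occurrence count of each value, same order


def parse_output(text: str) -> list[AttributeSummary]:
    """Parse three-line attribute blocks; blank separator lines are skipped."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    summaries = []
    for i in range(0, len(lines), 3):
        statement, value_line, freq_line = lines[i:i + 3]
        # csv.reader keeps "Ethan, Johnson" together as a single value.
        values = next(csv.reader([value_line]))
        freqs = [int(n) for n in freq_line.split(",")]
        if len(values) != len(freqs):
            raise ValueError(f"value/frequency count mismatch for: {statement}")
        summaries.append(AttributeSummary(statement, values, freqs))
    return summaries
```

Summing an attribute's frequencies gives its number of non-missing values. For flowername in the examples, that is 5 + 3 + 1 = 9.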