sampo_util csv2arffheader
Overview
Warning
The sampo_util csv2arffheader command is deprecated.
Given an input CSV file, sampo_util csv2arffheader generates, for each column, an @ATTRIBUTE statement, the list of attribute values, and the frequency with which each value occurs. This information is what you need to finalize the @ATTRIBUTE statements and their respective data scales in the header section of an ARFF file. The result is written to stdout.
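To make the per-column counting concrete, here is a minimal Python sketch of the idea. It is not the tool's implementation: the function name `column_value_counts` is hypothetical, and treating empty cells and `?` as missing values is an assumption based on the example on this page, where neither appears in the output.

```python
import csv
import io
from collections import Counter


def column_value_counts(csv_text, missing=("", "?")):
    """Return {column name: Counter of values}, skipping missing cells.

    Assumption: empty cells and "?" are treated as missing.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: Counter() for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row.get(name)
            if value is not None and value not in missing:
                counts[name][value] += 1
    return counts


# A small made-up sample, not the data2.csv file used in the examples.
sample = """_sid,dayofweek,flowername
0,MON,Iris-setosa
1,TUE,Iris-setosa
2,?,Iris-versicolor
3,MON,Iris-setosa
"""

for name, counter in column_value_counts(sample).items():
    values = list(counter)  # distinct values in first-seen order
    print(name)
    print(",".join(values))
    print(",".join(str(counter[v]) for v in values))
```

The sketch only counts values. It does not infer attribute types or date formats, which the real command also reports.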
Examples
Without designated nominal thresholds.
Command:
$ sampo_util csv2arffheader data2.csv
Input:
_sid,dayofweek,flowername,temperature,date,comment
0,MON,Iris-setosa,23.5,2012-01-01T01:23:45.678+09:00,
1,TUE,Iris-setosa,20.1,2012-01-01T01:23:45.678+09:00,"John, Doe"
2,?,Iris-setosa,,,
3,THU,Iris-setosa,22.1,2012-04-01T04:23:45.678-10:00,
4,FRI,Iris-setosa,21,2012-06-01T06:23:45.678Z,
5,SAT,Iris-versicolor,24.1,2012-06-01T06:23:45.678Z,"Jacob, Smith"
6,SUN,Iris-versicolor,21.5,2012-08-01T06:23:45.678Z,
7,MON,Iris-versicolor,19.2,2012-09-01T06:23:45.678Z,"Ethan, Johnson"
8,TUE,Iris-virginica,22.3,2012-04-01T06:23:45.678Z,
Output:
@ATTRIBUTE _sid {0,1,2,3,4,5,6,7,8}
0,1,2,3,4,5,6,7,8
1,1,1,1,1,1,1,1,1
@ATTRIBUTE dayofweek {MON,TUE,FRI,SAT,SUN,THU}
MON,TUE,FRI,SAT,SUN,THU
2,2,1,1,1,1
@ATTRIBUTE flowername {Iris-setosa,Iris-versicolor,Iris-virginica}
Iris-setosa,Iris-versicolor,Iris-virginica
5,3,1
@ATTRIBUTE temperature {19.2,20.1,21,21.5,22.1,22.3,23.5,24.1}
19.2,20.1,21,21.5,22.1,22.3,23.5,24.1
1,1,1,1,1,1,1,1
@ATTRIBUTE date DATE "yyyy-MM-dd'T'HH:mm:ss"
2012-01-01T01:23:45.678+09:00,2012-06-01T06:23:45.678Z,2012-04-01T04:23:45.678-10:00,2012-04-01T06:23:45.678Z,2012-08-01T06:23:45.678Z,2012-09-01T06:23:45.678Z
2,2,1,1,1,1
@ATTRIBUTE comment {"Ethan, Johnson","Jacob, Smith","John, Doe"}
"Ethan, Johnson","Jacob, Smith","John, Doe"
1,1,1
With designated nominal thresholds.
Command:
$ sampo_util csv2arffheader data2.csv --numeric-nominal-threshold 5 --string-nominal-threshold 5
Input:
Same as in the example above.
Output:
@ATTRIBUTE _sid INTEGER
0,1,2,3,4,5,6,7,8
1,1,1,1,1,1,1,1,1
@ATTRIBUTE dayofweek STRING
MON,TUE,FRI,SAT,SUN,THU
2,2,1,1,1,1
@ATTRIBUTE flowername {Iris-setosa,Iris-versicolor,Iris-virginica}
Iris-setosa,Iris-versicolor,Iris-virginica
5,3,1
@ATTRIBUTE temperature REAL
19.2,20.1,21,21.5,22.1,22.3,23.5,24.1
1,1,1,1,1,1,1,1
@ATTRIBUTE date DATE "yyyy-MM-dd'T'HH:mm:ss"
2012-01-01T01:23:45.678+09:00,2012-06-01T06:23:45.678Z,2012-04-01T04:23:45.678-10:00,2012-04-01T06:23:45.678Z,2012-08-01T06:23:45.678Z,2012-09-01T06:23:45.678Z
2,2,1,1,1,1
@ATTRIBUTE comment {"Ethan, Johnson","Jacob, Smith","John, Doe"}
"Ethan, Johnson","Jacob, Smith","John, Doe"
1,1,1
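This page does not spell out how the two threshold options work. The two example outputs suggest a plausible reading: a column stays nominal when its number of distinct values does not exceed the relevant threshold, and otherwise falls back to INTEGER, REAL, or STRING. The Python sketch below encodes that reading only. The function name `choose_attribute_type` is hypothetical, and the behavior when the count equals the threshold is a guess, because the examples do not cover that case. DATE detection and quoting of values that contain commas are left out.

```python
def _parses_as(cast, text):
    """Return True if cast(text) succeeds, e.g. cast=int or cast=float."""
    try:
        cast(text)
        return True
    except ValueError:
        return False


def choose_attribute_type(values, numeric_threshold, string_threshold):
    """Guess an ARFF type from non-missing cell values.

    Assumption (inferred from the examples, not documented): a column is
    nominal when its distinct-value count is at most the threshold.
    """
    distinct = list(dict.fromkeys(values))  # distinct values, first-seen order
    if all(_parses_as(int, v) for v in distinct):
        base, threshold = "INTEGER", numeric_threshold
    elif all(_parses_as(float, v) for v in distinct):
        base, threshold = "REAL", numeric_threshold
    else:
        base, threshold = "STRING", string_threshold
    if len(distinct) <= threshold:
        return "{" + ",".join(distinct) + "}"
    return base


# Non-missing values of four columns from the example input.
columns = {
    "_sid": ["0", "1", "2", "3", "4", "5", "6", "7", "8"],
    "dayofweek": ["MON", "TUE", "THU", "FRI", "SAT", "SUN", "MON", "TUE"],
    "flowername": ["Iris-setosa"] * 5 + ["Iris-versicolor"] * 3 + ["Iris-virginica"],
    "temperature": ["23.5", "20.1", "22.1", "21", "24.1", "21.5", "19.2", "22.3"],
}

for name, values in columns.items():
    print("@ATTRIBUTE", name, choose_attribute_type(values, 5, 5))
```

With both thresholds set to 5, this reading yields the same types as the example output for these four columns: only flowername, with three distinct values, stays nominal.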
Output Format
The following information is printed for each attribute, in this order:
The @ATTRIBUTE statement.
The attribute values, separated by commas.
The frequency of occurrence of each attribute value, separated by commas, in the same order as the values.
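If you want to post-process this report, a short parser is enough. The Python sketch below assumes that each of the three items occupies its own line and that values containing commas are double-quoted, as in the comment attribute of the examples. The function name `parse_header_report` is hypothetical and is not part of sampo_util.

```python
import csv


def parse_header_report(text):
    """Split the report into one record per attribute.

    Assumption: every attribute is printed as exactly three lines
    (statement, values, frequencies); blank lines are ignored.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    report = []
    for i in range(0, len(lines) - 2, 3):
        statement, values_line, freq_line = lines[i:i + 3]
        # csv.reader keeps quoted values such as "John, Doe" in one piece.
        values = next(csv.reader([values_line]))
        freqs = [int(n) for n in freq_line.split(",")]
        report.append({"statement": statement, "counts": dict(zip(values, freqs))})
    return report


# Two attributes copied from the example output.
sample_report = '''@ATTRIBUTE flowername {Iris-setosa,Iris-versicolor,Iris-virginica}
Iris-setosa,Iris-versicolor,Iris-virginica
5,3,1
@ATTRIBUTE comment {"Ethan, Johnson","Jacob, Smith","John, Doe"}
"Ethan, Johnson","Jacob, Smith","John, Doe"
1,1,1
'''

for record in parse_header_report(sample_report):
    print(record["statement"])
    print(record["counts"])
```

Pairing each value with its frequency this way makes it easy to spot rare categories before deciding whether an attribute should remain nominal.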