Single-label classification

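This section walks through an example analysis with the RAPID Machine Learning Time-Series Numerical Analysis Python API. The scenario uses sensor data that contains only information from normal operation: we create a prediction model that detects anomalies for an anomaly detection system, run prediction with that model, and evaluate the prediction results.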

Prepare the data

  1. Place rapid-tsa-python-getting_started.zip in a directory that the user can access.

  2. Extract the file.

    $ unzip rapid-tsa-python-getting_started.zip -d ~/work
    
  3. Move to the [work/examples/classification_ocn] directory.

    $ cd ~/work/examples/classification_ocn/
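
The same preparation can also be done from Python, for example when `unzip` is not available. The following is a minimal sketch using only the standard library; the helper name `extract_archive` is hypothetical and is not part of the RAPID API.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a zip archive into dest_dir and return the destination path."""
    dest = Path(dest_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(Path(archive_path).expanduser()) as zf:
        zf.extractall(dest)
    return dest


if __name__ == "__main__":
    archive = Path("rapid-tsa-python-getting_started.zip")
    # Only runs when the archive is present in the current directory.
    if archive.exists():
        work_dir = extract_archive(archive, "~/work")
        # Directory named in step 3 above.
        example_dir = work_dir / "examples" / "classification_ocn"
        print(example_dir, example_dir.is_dir())
```

If you work from a notebook, change into the example directory (for instance with `os.chdir`) before running the cells below, because they resolve paths from the current directory.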
    

Run the analysis

1. Create the prediction model

Create a prediction model from the training data.

In [1]:
import os
from rapid_tsa_python import exec_train

# Resolve the sample data and model directories relative to the current directory.
ROOT_DIR = os.path.abspath(os.path.curdir)
DATA_DIR = os.path.join(ROOT_DIR, 'data')
preprocess_def_path = os.path.join(DATA_DIR, 'preprocess_def.json')
model_dir = os.path.join(ROOT_DIR, 'model')
# Train on the training label file; model_dir is passed to exec_predict in the next step.
exec_train('cls', '1DOCN', os.path.join(DATA_DIR, 'train_label_ocn.lab'), model_dir,
           preprocess_def_path=preprocess_def_path, param_conf_path=os.path.join(DATA_DIR, 'train_param.conf'))
[ Initialize the process. ]	
[ Load preprocessing definition from file(/home/aapfuser/rapid_tsa/rapid-tsa-python/doc/rapid-tsa-python-getting_started/examples/classification_ocn/data/preprocess_def.json). ]	
[ Training start. ]	
[Elapsed time],[File],[Status],[Progress in file]	
0:00:00,1/1,LOAD,0%	
0:00:00,1/1,LOAD,50%	
0:00:00,1/1,LOAD,100%	
0:00:00,1/1,PREPROCESS(sampling),0%	
0:00:00,1/1,PREPROCESS(sampling),100%	
0:00:00,1/1,PREPROCESS(normalize),0%	
0:00:00,1/1,PREPROCESS(normalize),100%	
0:00:00,1/1,SET DATA,0%	
0:00:00,1/1,SET DATA,100%	
[Elapsed time],[Total count],[Learning rate],[Average error],[Epoch],[Status],[Progress in epoch]	
0:00:05,118,0.010000,1.594758e-01,1/5,TRAIN,41%	
0:00:10,249,0.010000,1.363300e-01,1/5,TRAIN,86%	
0:00:11,292,0.010000,1.318809e-01,1/5,TRAIN,100%	
0:00:16,422,0.007525,9.828929e-02,2/5,TRAIN,45%	
0:00:21,544,0.007525,9.426031e-02,2/5,TRAIN,87%	
0:00:23,584,0.007525,9.338711e-02,2/5,TRAIN,100%	
0:00:28,705,0.005050,8.545949e-02,3/5,TRAIN,42%	
0:00:33,828,0.005050,8.395370e-02,3/5,TRAIN,84%	
0:00:35,876,0.005050,8.347237e-02,3/5,TRAIN,100%	
0:00:40,1004,0.002575,7.647706e-02,4/5,TRAIN,44%	
0:00:45,1135,0.002575,7.568922e-02,4/5,TRAIN,89%	
0:00:46,1168,0.002575,7.550640e-02,4/5,TRAIN,100%	
0:00:51,1298,0.000100,7.084814e-02,5/5,TRAIN,45%	
0:00:56,1428,0.000100,7.097346e-02,5/5,TRAIN,90%	
0:00:57,1460,0.000100,7.114370e-02,5/5,TRAIN,100%	
0:01:06,1460,0.000100,7.114370e-02,5/5,OUTPUT,0%	
0:01:06,1460,0.000100,7.114370e-02,5/5,OUTPUT,100%	
[ Training end. ]	
[ Finish the process normally. ]	
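
To check how the average error develops across epochs, the progress rows can be parsed into records. The sketch below assumes the progress output has been captured as a string (how to capture it is not covered here) and relies only on the column layout shown in the header row above. The function names are hypothetical, and the sample rows are copied from the log above.

```python
TRAIN_FIELDS = ["elapsed", "total_count", "learning_rate",
                "average_error", "epoch", "status", "progress"]


def parse_train_progress(log_text):
    """Return one dict per TRAIN progress row found in the captured log text."""
    rows = []
    for line in log_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        # Skip headers, LOAD/PREPROCESS rows (fewer columns) and OUTPUT rows.
        if len(parts) != len(TRAIN_FIELDS) or parts[5] != "TRAIN":
            continue
        row = dict(zip(TRAIN_FIELDS, parts))
        row["total_count"] = int(row["total_count"])
        row["learning_rate"] = float(row["learning_rate"])
        row["average_error"] = float(row["average_error"])
        rows.append(row)
    return rows


def last_row_per_epoch(rows):
    """Keep the final TRAIN row of each epoch (dicts preserve insertion order)."""
    latest = {}
    for row in rows:
        latest[row["epoch"]] = row
    return list(latest.values())


sample_log = """\
0:00:05,118,0.010000,1.594758e-01,1/5,TRAIN,41%
0:00:11,292,0.010000,1.318809e-01,1/5,TRAIN,100%
0:00:23,584,0.007525,9.338711e-02,2/5,TRAIN,100%
0:01:06,1460,0.000100,7.114370e-02,5/5,OUTPUT,100%
"""

for row in last_row_per_epoch(parse_train_progress(sample_log)):
    print(row["epoch"], row["learning_rate"], row["average_error"])
```

In the log above, the learning rate falls by the same amount each epoch (0.010000, 0.007525, 0.005050, 0.002575, 0.000100), and the average error at the end of each epoch decreases from about 0.132 to about 0.071.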

2. Evaluate the results

Run prediction using the prediction data and the prediction model.

In [2]:
from rapid_tsa_python import exec_predict

label_path = os.path.join(DATA_DIR, 'predict_label_ocn.lab')
result_dir = os.path.join(ROOT_DIR, 'result')
exec_predict(model_dir, label_path=label_path, preprocess_def_path=preprocess_def_path, output_dir=result_dir,
             param_conf_path=os.path.join(DATA_DIR, 'predict_param.conf'))
[ Initialize the process. ]	
[ Load preprocessing definition from file(/home/aapfuser/rapid_tsa/rapid-tsa-python/doc/rapid-tsa-python-getting_started/examples/classification_ocn/data/preprocess_def.json). ]	
[ Predicting start. ]	
[ [Elapsed time],[Total count],[File],[Status],[Progress in file] ]	
0:00:00,0,1/1,LOAD,0%	
0:00:00,0,1/1,LOAD,50%	
0:00:00,0,1/1,LOAD,100%	
0:00:00,0,1/1,PREPROCESS(sampling),0%	
0:00:00,0,1/1,PREPROCESS(sampling),100%	
0:00:00,0,1/1,PREPROCESS(normalize),0%	
0:00:00,0,1/1,PREPROCESS(normalize),100%	
0:00:00,0,1/1,SET DATA,0%	
0:00:00,0,1/1,SET DATA,100%	
0:00:00,0,1/1,PREDICTION,0%	
0:00:00,98,1/1,PREDICTION,100%	
[ Predicting end. ]	
[ Finish the process normally. ]	

Plot the actual and predicted labels at each time point.

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

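# Load the actual labels and the predicted labels as (start, end, label) rows.
df_actual = pd.read_csv(label_path).reset_index()
df_actual.columns = ['start', 'end', 'actual']
df_predict = pd.read_csv(os.path.join(result_dir, 'predict_label_ocn.result')).reset_index()
df_predict.columns = ['start', 'end', 'predict']

# Build a frame indexed by the start time of each prediction window.
df_result = pd.DataFrame(np.zeros((len(df_predict), 2), dtype=object),
                         columns=['actual', 'predict'], index=df_predict['start'])
df_result['predict'] = df_predict.set_index('start')['predict']
# Give each prediction window the actual label of the interval that fully contains it.
# Windows not covered by any interval keep the initial value 0, which coincides with NG after the mapping below.
for idx in range(len(df_actual)):
    mask = (df_actual['end'][idx] >= df_predict['end']) & (df_actual['start'][idx] <= df_predict['start'])
    df_result.loc[mask.values, 'actual'] = df_actual.loc[idx,'actual']

# Map the labels to numbers so that they can be plotted.
df_result.replace({'NG': 0, 'OK': 1.0}, inplace=True)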

df_result.plot()
plt.xlabel("Time")
plt.ylabel("Label")
plt.yticks([0.0, 1.0],['NG', 'OK'])
plt.show()
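
The plot gives a visual comparison. A numeric summary can complement it. The following is a minimal sketch, not part of the RAPID API: the helper `summarize_labels` is hypothetical, and the labels in the example are illustrative values, not results from the sample data.

```python
from collections import Counter


def summarize_labels(actual, predicted):
    """Return (pair counts, accuracy) for two equal-length label sequences."""
    actual, predicted = list(actual), list(predicted)
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("no labels to evaluate")
    # counts[(a, p)] is the number of windows with actual label a and predicted label p.
    counts = Counter(zip(actual, predicted))
    correct = sum(n for (a, p), n in counts.items() if a == p)
    return counts, correct / len(actual)


# Illustrative labels only.
toy_actual = ["OK", "OK", "NG", "NG", "OK"]
toy_predict = ["OK", "NG", "NG", "NG", "OK"]
counts, accuracy = summarize_labels(toy_actual, toy_predict)
for (a, p), n in sorted(counts.items()):
    print(f"actual={a} predict={p}: {n}")
print(f"accuracy = {accuracy:.2f}")
```

With the notebook's data, the call would be `summarize_labels(df_result['actual'], df_result['predict'])`, assuming every prediction window was assigned an actual label. For anomaly detection, the count of actual NG windows predicted as OK is usually the figure to inspect first.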