Classification with Multiple Labels

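This section walks through an example analysis with the RAPID Machine Learning Time-Series Numerical Analysis Python API. The sensor data contains both abnormal and normal periods. The scenario creates a prediction model that detects anomalies for an anomaly detection system, runs prediction with that model, and evaluates the prediction results.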

Prepare the data

  1. Place rapid-tsa-python-getting_started.zip in a directory that the user can access.

  2. Extract the archive.

    $ unzip rapid-tsa-python-getting_started.zip -d ~/work
    
  3. Move to the [work/examples/classification] directory.

    $ cd ~/work/examples/classification/
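
Where the `unzip` command is not available, the extraction step can also be done from Python. The following is a minimal sketch using only the standard library; the helper name `extract_archive` is hypothetical and not part of the RAPID API, and the archive is assumed to be in the current directory.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the archive member names."""
    dest = Path(dest_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(Path(zip_path).expanduser()) as zf:
        zf.extractall(dest)
        return zf.namelist()


if __name__ == "__main__":
    # Assumption: the archive sits in the current working directory.
    archive = Path("rapid-tsa-python-getting_started.zip")
    if archive.exists():
        names = extract_archive(archive, "~/work")
        print(f"extracted {len(names)} entries")
```

After extraction, `os.chdir` can take the place of the `cd` step if the rest of the work is done in the same Python session.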
    

Run the analysis

1. Create the prediction model

Create the prediction model from the training data.

In [1]:
import os
from rapid_tsa_python import exec_train

ROOT_DIR = os.path.abspath(os.path.curdir)
DATA_DIR = os.path.join(ROOT_DIR, 'data')
preprocess_def_path = os.path.join(DATA_DIR, 'preprocess_def.json')
model_dir = os.path.join(ROOT_DIR, 'model')
exec_train('cls', '1DCNN', os.path.join(DATA_DIR, 'train_label.lab'), model_dir,
           preprocess_def_path=preprocess_def_path, param_conf_path=os.path.join(DATA_DIR, 'train_param.conf'))
[ Initialize the process. ]	
[ Load preprocessing definition from file(/home/aapfuser/rapid_tsa/rapid-tsa-python/doc/rapid-tsa-python-getting_started/examples/classification/data/preprocess_def.json). ]	
[ Training start. ]	
[Elapsed time],[File],[Status],[Progress in file]	
0:00:00,1/1,LOAD,0%	
0:00:00,1/1,LOAD,50%	
0:00:00,1/1,LOAD,100%	
0:00:00,1/1,PREPROCESS(sampling),0%	
0:00:00,1/1,PREPROCESS(sampling),100%	
0:00:00,1/1,PREPROCESS(normalize),0%	
0:00:00,1/1,PREPROCESS(normalize),100%	
0:00:00,1/1,SET DATA,0%	
0:00:00,1/1,SET DATA,100%	
[Elapsed time],[Total count],[Learning rate],[Average error],[Epoch],[Status],[Progress in epoch]	
0:00:05,17190,0.010000,1.552734e-02,1/5,TRAIN,87%	
0:00:05,19827,0.010000,1.443377e-02,1/5,TRAIN,100%	
0:00:10,38925,0.007525,7.111058e-03,2/5,TRAIN,97%	
0:00:10,39654,0.007525,7.133604e-03,2/5,TRAIN,100%	
0:00:15,58143,0.005050,5.400561e-03,3/5,TRAIN,94%	
0:00:16,59481,0.005050,5.275680e-03,3/5,TRAIN,100%	
0:00:21,78852,0.002575,5.194904e-03,4/5,TRAIN,98%	
0:00:21,79308,0.002575,5.191385e-03,4/5,TRAIN,100%	
0:00:26,97923,0.000100,4.538004e-03,5/5,TRAIN,94%	
0:00:26,99135,0.000100,4.538002e-03,5/5,TRAIN,100%	
0:00:26,99135,0.000100,4.538002e-03,5/5,OUTPUT,0%	
0:00:26,99135,0.000100,4.538002e-03,5/5,OUTPUT,100%	
[ Training end. ]	
[ Finish the process normally. ]	
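
The training rows in the log above follow the seven-column header `[Elapsed time],[Total count],[Learning rate],[Average error],[Epoch],[Status],[Progress in epoch]`. To check convergence, it can help to pull out the average error at the end of each epoch. The sketch below assumes the log text has already been captured into a string (how to capture it is not covered here); the function names and the `LOG` sample, which copies a few rows from the output above, are illustrative only.

```python
import csv
import io

COLUMNS = ["elapsed", "total_count", "learning_rate", "average_error",
           "epoch", "status", "progress"]


def parse_train_log(text):
    """Return one dict per progress row that has the seven training columns."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        fields = [f.strip() for f in fields]
        # Skip blank lines, bracketed messages and headers, and the
        # four-column file-loading rows.
        if len(fields) != len(COLUMNS) or fields[0].startswith("["):
            continue
        row = dict(zip(COLUMNS, fields))
        row["total_count"] = int(row["total_count"])
        row["learning_rate"] = float(row["learning_rate"])
        row["average_error"] = float(row["average_error"])
        rows.append(row)
    return rows


def final_error_per_epoch(rows):
    """Map each epoch label (e.g. '1/5') to the average error at TRAIN,100%."""
    return {r["epoch"]: r["average_error"]
            for r in rows
            if r["status"] == "TRAIN" and r["progress"] == "100%"}


# Sample rows copied from the training output shown above.
LOG = """\
[Elapsed time],[Total count],[Learning rate],[Average error],[Epoch],[Status],[Progress in epoch]
0:00:05,19827,0.010000,1.443377e-02,1/5,TRAIN,100%
0:00:10,39654,0.007525,7.133604e-03,2/5,TRAIN,100%
0:00:26,99135,0.000100,4.538002e-03,5/5,TRAIN,100%
0:00:26,99135,0.000100,4.538002e-03,5/5,OUTPUT,100%
"""

errors = final_error_per_epoch(parse_train_log(LOG))
print(errors)
```

In this run the average error falls from about 1.44e-02 after the first epoch to about 4.54e-03 after the fifth.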

2. Evaluate the results

Run prediction using the prediction data and the prediction model.

In [2]:
from rapid_tsa_python import exec_predict

label_path = os.path.join(DATA_DIR, 'predict_label.lab')
result_dir = os.path.join(ROOT_DIR, 'result')
exec_predict(model_dir, label_path=label_path, preprocess_def_path=preprocess_def_path, output_dir=result_dir,
             param_conf_path=os.path.join(DATA_DIR, 'predict_param.conf'))
[ Initialize the process. ]	
[ Config file (/home/aapfuser/rapid_tsa/rapid-tsa-python/doc/rapid-tsa-python-getting_started/examples/classification/data/predict_param.conf) : item [1DOCN_FILTER_SIZE] is not specified. Use default value(1). ]	
[ Config file (/home/aapfuser/rapid_tsa/rapid-tsa-python/doc/rapid-tsa-python-getting_started/examples/classification/data/predict_param.conf) : item [1DOCN_TOLERANCE] is not specified. Use default value(1). ]	
[ Load preprocessing definition from file(/home/aapfuser/rapid_tsa/rapid-tsa-python/doc/rapid-tsa-python-getting_started/examples/classification/data/preprocess_def.json). ]	
[ Predicting start. ]	
[ [Elapsed time],[Total count],[File],[Status],[Progress in file] ]	
0:00:00,0,1/1,LOAD,0%	
0:00:00,0,1/1,LOAD,50%	
0:00:00,0,1/1,LOAD,100%	
0:00:00,0,1/1,PREPROCESS(sampling),0%	
0:00:00,0,1/1,PREPROCESS(sampling),100%	
0:00:00,0,1/1,PREPROCESS(normalize),0%	
0:00:00,0,1/1,PREPROCESS(normalize),100%	
0:00:00,0,1/1,SET DATA,0%	
0:00:00,0,1/1,SET DATA,100%	
0:00:00,0,1/1,PREDICTION,0%	
0:00:00,97,1/1,PREDICTION,100%	
[ Predicting end. ]	
[ Finish the process normally. ]	

Display the actual and predicted labels at each time.

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Load the actual and predicted labels as (start, end, label) rows.
df_actual = pd.read_csv(label_path).reset_index()
df_actual.columns = ['start', 'end', 'actual']
df_predict = pd.read_csv(os.path.join(result_dir, 'predict_label.result')).reset_index()
df_predict.columns = ['start', 'end', 'predict']

# Build one row per predicted window, indexed by the window start time.
df_result = pd.DataFrame(np.zeros((len(df_predict), 2), dtype=object),
                         columns=['actual', 'predict'], index=df_predict['start'])
df_result['predict'] = df_predict.set_index('start')['predict']
# Give each predicted window the actual label of the interval that fully contains it.
for idx in range(len(df_actual)):
    mask = (df_actual['end'][idx] >= df_predict['end']) & (df_actual['start'][idx] <= df_predict['start'])
    df_result.loc[mask.values, 'actual'] = df_actual.loc[idx,'actual']

# Map the label names to numeric levels so that they can be plotted.
df_result.replace({'Error': 0, 'Warning': 0.5, 'Normal': 1.0}, inplace=True)

df_result.plot()
plt.xlabel("Time")
plt.ylabel("Label")
plt.yticks([0.0, 0.5, 1.0],['Error', 'Warning', 'Normal'])
plt.show()
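
The plot gives a visual comparison. A numeric summary such as accuracy and a confusion matrix can complement it. The following is a minimal sketch, not part of the RAPID API: the helper `align_labels` is hypothetical, the toy intervals and labels are made up for illustration, and the alignment rule (a predicted window takes the actual label of the interval that fully contains it) mirrors the loop in the cell above.

```python
import pandas as pd


def align_labels(df_actual, df_predict):
    """Pair each predicted window with the actual label of the interval containing it."""
    actual = pd.Series(index=df_predict.index, dtype=object)
    for row in df_actual.itertuples(index=False):
        inside = (df_predict['start'] >= row.start) & (df_predict['end'] <= row.end)
        actual[inside] = row.actual
    return pd.DataFrame({'actual': actual, 'predict': df_predict['predict']})


# Hypothetical toy data: two actual intervals and six predicted windows.
df_actual = pd.DataFrame({
    'start': [0, 30],
    'end': [29, 59],
    'actual': ['Normal', 'Error'],
})
df_predict = pd.DataFrame({
    'start': [0, 10, 20, 30, 40, 50],
    'end': [9, 19, 29, 39, 49, 59],
    'predict': ['Normal', 'Normal', 'Warning', 'Error', 'Error', 'Normal'],
})

# Windows that fall in no actual interval have no label and are dropped.
aligned = align_labels(df_actual, df_predict).dropna()

# Fraction of windows whose predicted label equals the actual label.
accuracy = (aligned['actual'] == aligned['predict']).mean()

# Rows are actual labels, columns are predicted labels.
confusion = pd.crosstab(aligned['actual'], aligned['predict'])

print(f"accuracy = {accuracy:.3f}")
print(confusion)
```

With the notebook's data, `align_labels(df_actual, df_predict)` could be called right after the two `read_csv` steps, while the labels are still strings rather than the numeric levels used for plotting.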