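Predicting Continuous Values

This section shows an example analysis using the RAPID Machine Learning Time Series Numerical Analysis Python API. In the scenario, an anomaly detection system creates a prediction model that predicts the degree of normality of equipment from sensor data, runs predictions with that model, and evaluates the prediction results.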

Prepare the data

  1. Place rapid-tsa-python-getting_started.zip in a directory that the user can access.

  2. Extract the file.

    $ unzip rapid-tsa-python-getting_started.zip -d ~/work
    
  3. Move to the [work/examples/regression] directory.

    $ cd ~/work/examples/regression/
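
Before running the analysis, it can help to confirm that the extracted example contains the files the cells below read. The following is a minimal sketch and not part of the RAPID API: `missing_inputs` is a hypothetical helper, and the file list is an assumption taken from the paths used later in this walkthrough (all under the `data` directory).

```python
from pathlib import Path

# File names taken from the paths used in the cells of this walkthrough.
EXPECTED_FILES = [
    "preprocess_def.json",
    "train_label.lab",
    "train_param.conf",
    "predict_label.lab",
    "predict_param.conf",
]


def missing_inputs(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir.

    Hypothetical helper for illustration; it is not part of rapid_tsa_python.
    """
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Assumes the current directory is ~/work/examples/regression.
    missing = missing_inputs(Path.cwd() / "data")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All expected input files are present.")
```

An empty list means every file referenced by the cells below was found.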
    

Run the analysis

1. Create the prediction model

Create a prediction model from the training data.

In [1]:
import os
from rapid_tsa_python import exec_train

# Resolve paths relative to the current directory (examples/regression).
ROOT_DIR = os.path.abspath(os.path.curdir)
DATA_DIR = os.path.join(ROOT_DIR, 'data')
preprocess_def_path = os.path.join(DATA_DIR, 'preprocess_def.json')
model_dir = os.path.join(ROOT_DIR, 'model')

# Train on the labels in train_label.lab, passing model_dir as the model directory.
exec_train('reg', '1DCNN', os.path.join(DATA_DIR, 'train_label.lab'), model_dir,
           preprocess_def_path=preprocess_def_path,
           param_conf_path=os.path.join(DATA_DIR, 'train_param.conf'))
[ Initialize the process. ]	
[ Load preprocessing definition from file(/home/aapfuser/rapid_tsa/rapid-tsa-python/doc/rapid-tsa-python-getting_started/examples/regression/data/preprocess_def.json). ]	
[ Training start. ]	
[Elapsed time],[File],[Status],[Progress in file]	
0:00:00,1/1,LOAD,0%	
0:00:00,1/1,LOAD,50%	
0:00:00,1/1,LOAD,100%	
0:00:00,1/1,PREPROCESS(sampling),0%	
0:00:00,1/1,PREPROCESS(sampling),100%	
0:00:00,1/1,PREPROCESS(normalize),0%	
0:00:00,1/1,PREPROCESS(normalize),100%	
0:00:00,1/1,SET DATA,0%	
0:00:00,1/1,SET DATA,100%	
[Elapsed time],[Total count],[Learning rate],[Average error],[Epoch],[Status],[Progress in epoch]	
0:00:05,17126,0.010000,3.310125e-02,1/5,TRAIN,87%	
0:00:05,19807,0.010000,3.186446e-02,1/5,TRAIN,100%	
0:00:10,38721,0.007525,2.089667e-02,2/5,TRAIN,96%	
0:00:10,39614,0.007525,2.087175e-02,2/5,TRAIN,100%	
0:00:15,58426,0.005050,1.829225e-02,3/5,TRAIN,95%	
0:00:16,59421,0.005050,1.823982e-02,3/5,TRAIN,100%	
0:00:21,78735,0.002575,1.669550e-02,4/5,TRAIN,98%	
0:00:21,79228,0.002575,1.669504e-02,4/5,TRAIN,100%	
0:00:26,98651,0.000100,1.569601e-02,5/5,TRAIN,99%	
0:00:26,99035,0.000100,1.567782e-02,5/5,TRAIN,100%	
0:00:26,99035,0.000100,1.567782e-02,5/5,OUTPUT,0%	
0:00:26,99035,0.000100,1.567782e-02,5/5,OUTPUT,100%	
[ Training end. ]	
[ Finish the process normally. ]	
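
The training progress rows are comma-separated, with the columns named in the header row of the log. The following is a minimal sketch, assuming the log text has already been captured as a string (how to capture it is outside this example), that extracts the average error reported at the end of each epoch. `final_error_per_epoch` is a hypothetical helper, not part of rapid_tsa_python; the sample rows are copied from the output above.

```python
def final_error_per_epoch(log_text):
    """Map epoch number -> average error reported at TRAIN,100%.

    Hypothetical helper; expects rows in the format
    elapsed,total_count,learning_rate,average_error,epoch,status,progress
    """
    errors = {}
    for line in log_text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        # Training rows have exactly seven fields; skip headers and other rows.
        if len(fields) != 7 or fields[5] != "TRAIN" or fields[6] != "100%":
            continue
        epoch = int(fields[4].split("/")[0])
        errors[epoch] = float(fields[3])
    return errors


# Rows copied from the training log shown above.
sample_log = """\
0:00:05,17126,0.010000,3.310125e-02,1/5,TRAIN,87%
0:00:05,19807,0.010000,3.186446e-02,1/5,TRAIN,100%
0:00:10,39614,0.007525,2.087175e-02,2/5,TRAIN,100%
0:00:16,59421,0.005050,1.823982e-02,3/5,TRAIN,100%
0:00:21,79228,0.002575,1.669504e-02,4/5,TRAIN,100%
0:00:26,99035,0.000100,1.567782e-02,5/5,TRAIN,100%
0:00:26,99035,0.000100,1.567782e-02,5/5,OUTPUT,100%
"""

for epoch, err in sorted(final_error_per_epoch(sample_log).items()):
    print(epoch, err)
```

In this run the average error falls from about 0.0319 at the end of epoch 1 to about 0.0157 at the end of epoch 5.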

2. Evaluate the results

Run prediction using the prediction data and the prediction model.

In [2]:
from rapid_tsa_python import exec_predict

label_path = os.path.join(DATA_DIR, 'predict_label.lab')
result_dir = os.path.join(ROOT_DIR, 'result')
exec_predict(model_dir, label_path=label_path, preprocess_def_path=preprocess_def_path, output_dir=result_dir,
             param_conf_path=os.path.join(DATA_DIR, 'predict_param.conf'))
[ Initialize the process. ]	
[ Config file (/home/aapfuser/rapid_tsa/rapid-tsa-python/doc/rapid-tsa-python-getting_started/examples/regression/data/predict_param.conf) : item [1DOCN_FILTER_SIZE] is not specified. Use default value(1). ]	
[ Config file (/home/aapfuser/rapid_tsa/rapid-tsa-python/doc/rapid-tsa-python-getting_started/examples/regression/data/predict_param.conf) : item [1DOCN_TOLERANCE] is not specified. Use default value(1). ]	
[ Load preprocessing definition from file(/home/aapfuser/rapid_tsa/rapid-tsa-python/doc/rapid-tsa-python-getting_started/examples/regression/data/preprocess_def.json). ]	
[ Predicting start. ]	
[ [Elapsed time],[Total count],[File],[Status],[Progress in file] ]	
0:00:00,0,1/1,LOAD,0%	
0:00:00,0,1/1,LOAD,50%	
0:00:00,0,1/1,LOAD,100%	
0:00:00,0,1/1,PREPROCESS(sampling),0%	
0:00:00,0,1/1,PREPROCESS(sampling),100%	
0:00:00,0,1/1,PREPROCESS(normalize),0%	
0:00:00,0,1/1,PREPROCESS(normalize),100%	
0:00:00,0,1/1,SET DATA,0%	
0:00:00,0,1/1,SET DATA,100%	
0:00:00,0,1/1,PREDICTION,0%	
0:00:00,97,1/1,PREDICTION,100%	
[ Predicting end. ]	
[ Finish the process normally. ]	

Display the actual and predicted label values at each time.

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Load the actual labels and the prediction results; each is named (start, end, value).
df_actual = pd.read_csv(label_path).reset_index()
df_actual.columns = ['start', 'end', 'actual']
df_predict = pd.read_csv(os.path.join(result_dir, 'predict_label.result')).reset_index()
df_predict.columns = ['start', 'end', 'predict']

# Build a table indexed by prediction start time, initialized to 0.
df_result = pd.DataFrame(np.zeros((len(df_predict), 2), dtype=object),
                         columns=['actual', 'predict'], index=df_predict['start'])
df_result['predict'] = df_predict.set_index('start')['predict']
# Assign each actual label to the prediction windows that lie within its interval.
for idx in range(len(df_actual)):
    mask = (df_actual['end'][idx] >= df_predict['end']) & (df_actual['start'][idx] <= df_predict['start'])
    df_result.loc[mask.values, 'actual'] = df_actual.loc[idx, 'actual']

df_result.plot()
plt.xlabel("Time")
plt.ylabel("Label")
plt.show()
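
The plot gives a visual comparison; a numeric summary can complement it. The following is a minimal sketch, assuming a DataFrame shaped like `df_result` with numeric values in its `actual` and `predict` columns. `regression_errors` is a hypothetical helper, and the values in `df_demo` are made up for illustration; they are not output from the example data.

```python
import numpy as np
import pandas as pd


def regression_errors(df):
    """Return MAE and RMSE between the 'actual' and 'predict' columns."""
    # The columns may have object dtype, so convert to float explicitly.
    actual = df["actual"].astype(float).to_numpy()
    predict = df["predict"].astype(float).to_numpy()
    diff = predict - actual
    return {
        "mae": float(np.mean(np.abs(diff))),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
    }


# Made-up values for illustration only.
df_demo = pd.DataFrame(
    {"actual": [1.0, 1.0, 0.5, 0.0], "predict": [0.5, 1.0, 0.5, 0.5]},
    index=pd.Index([0, 10, 20, 30], name="start"),
)
print(regression_errors(df_demo))
```

To apply this to the walkthrough data, call `regression_errors(df_result)`. Note that prediction windows not covered by any actual-label interval keep the initial value 0 in `actual` and would count toward the error, so it may be appropriate to exclude them first.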