{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
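    "# Easily Generating Effective Attributes with Attribute Generation Components\n",
    "\n",
    "## Table of Contents\n",
    "\n",
    "- [1. Introduction](#1.-はじめに)\n",
    "- [2. Preparing the Data](#2.-データの準備)\n",
    "- [3. Conditions for Attributes Output by a Component](#3.-コンポーネントが出力する属性の条件)\n",
    "- [4. Duplication and Concatenation](#4.-複製と結合)\n"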
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
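    "## 1. Introduction<a id=\"1.-はじめに\"></a>\n",
    "\n",
    "This chapter builds your understanding of attribute generation, which adds effective attributes that help improve a model's prediction accuracy.  \n",
    "You will also learn how attribute generation components make operations such as binary expansion and standardization easy to perform.\n",
    "\n",
    "The specific goals are as follows.\n",
    "\n",
    "- **Understand the conditions that determine which attributes a component outputs**\n",
    "- **Write an SPD with an understanding of \"duplication\" and \"concatenation\"**\n",
    "  - Duplication: when one component passes data to multiple components, every receiving component gets the same data\n",
    "  - Concatenation: when multiple components pass data to one component, the data is combined before it is passed on\n",
    "\n",
    "This chapter uses the analysis data for predicting automobile fuel consumption to demonstrate duplication, concatenation, and the attributes that components output.\n"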
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
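    "## 2. Preparing the Data<a id=\"2.-データの準備\"></a>\n",
    "\n",
    "This section prepares the data in the same way as [2. Preparing the Data in \"Running Learning/Prediction and Checking Results\"](../simple/simple.ipynb#2.-データの準備).\n",
    "\n",
    "Data preparation consists of the following steps.\n",
    "\n",
    "1. Add `_sid` to the analysis data\n",
    "2. Create the ASD (attribute schema)\n",
    "3. Split the analysis data into learning and prediction portions (this chapter writes only the learning portion to a file)\n",
    "\n",
    "The following code loads the analysis data used for automobile fuel consumption prediction. Rows with missing values (written as `?` in the file) are dropped before `_sid` is added.\n",
    "\n",
    "The data is the UCI open dataset Auto MPG Data Set (https://archive.ics.uci.edu/ml/datasets/auto+mpg) with the attribute `car_name` removed."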
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>_sid</th>\n",
       "      <th>mpg</th>\n",
       "      <th>cylinders</th>\n",
       "      <th>displacement</th>\n",
       "      <th>horsepower</th>\n",
       "      <th>weight</th>\n",
       "      <th>acceleration</th>\n",
       "      <th>model_year</th>\n",
       "      <th>origin</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>18.0</td>\n",
       "      <td>8</td>\n",
       "      <td>307.0</td>\n",
       "      <td>130</td>\n",
       "      <td>3504</td>\n",
       "      <td>12.0</td>\n",
       "      <td>70</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>15.0</td>\n",
       "      <td>8</td>\n",
       "      <td>350.0</td>\n",
       "      <td>165</td>\n",
       "      <td>3693</td>\n",
       "      <td>11.5</td>\n",
       "      <td>70</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>18.0</td>\n",
       "      <td>8</td>\n",
       "      <td>318.0</td>\n",
       "      <td>150</td>\n",
       "      <td>3436</td>\n",
       "      <td>11.0</td>\n",
       "      <td>70</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>16.0</td>\n",
       "      <td>8</td>\n",
       "      <td>304.0</td>\n",
       "      <td>150</td>\n",
       "      <td>3433</td>\n",
       "      <td>12.0</td>\n",
       "      <td>70</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>17.0</td>\n",
       "      <td>8</td>\n",
       "      <td>302.0</td>\n",
       "      <td>140</td>\n",
       "      <td>3449</td>\n",
       "      <td>10.5</td>\n",
       "      <td>70</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   _sid   mpg  cylinders  displacement  horsepower  weight  acceleration  \\\n",
       "0     0  18.0          8         307.0         130    3504          12.0   \n",
       "1     1  15.0          8         350.0         165    3693          11.5   \n",
       "2     2  18.0          8         318.0         150    3436          11.0   \n",
       "3     3  16.0          8         304.0         150    3433          12.0   \n",
       "4     4  17.0          8         302.0         140    3449          10.5   \n",
       "\n",
       "   model_year  origin  \n",
       "0          70       1  \n",
       "1          70       1  \n",
       "2          70       1  \n",
       "3          70       1  \n",
       "4          70       1  "
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "input_data = pd.read_csv('./data/auto-mpg.csv', na_values='?')\n",
    "\n",
    "input_data.dropna(inplace=True)\n",
    "input_data.insert(0, '_sid', list(range(input_data.shape[0])))\n",
    "\n",
    "input_data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following code creates an ASD from the analysis data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>scale</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>_sid</th>\n",
       "      <td>INTEGER</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mpg</th>\n",
       "      <td>REAL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cylinders</th>\n",
       "      <td>INTEGER</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>displacement</th>\n",
       "      <td>REAL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>horsepower</th>\n",
       "      <td>INTEGER</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>weight</th>\n",
       "      <td>INTEGER</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>acceleration</th>\n",
       "      <td>REAL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>model_year</th>\n",
       "      <td>INTEGER</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>origin</th>\n",
       "      <td>INTEGER</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                scale\n",
       "_sid          INTEGER\n",
       "mpg              REAL\n",
       "cylinders     INTEGER\n",
       "displacement     REAL\n",
       "horsepower    INTEGER\n",
       "weight        INTEGER\n",
       "acceleration     REAL\n",
       "model_year    INTEGER\n",
       "origin        INTEGER"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sampotools.api import gen_asd_from_pandas_df\n",
    "\n",
    "asd = gen_asd_from_pandas_df(input_data)\n",
    "pd.DataFrame(asd).T[['scale']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
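    "As shown in [2. Preparing the Data in \"Running Learning/Prediction and Checking Results\"](../simple/simple.ipynb#2.-データの準備), correct the scales of `weight` (to REAL) and `origin` (to NOMINAL with the domain 1, 2, 3) in the ASD above, and save the result as an ASD file."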
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>scale</th>\n",
       "      <th>domain</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>_sid</th>\n",
       "      <td>INTEGER</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mpg</th>\n",
       "      <td>REAL</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cylinders</th>\n",
       "      <td>INTEGER</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>displacement</th>\n",
       "      <td>REAL</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>horsepower</th>\n",
       "      <td>INTEGER</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>weight</th>\n",
       "      <td>REAL</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>acceleration</th>\n",
       "      <td>REAL</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>model_year</th>\n",
       "      <td>INTEGER</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>origin</th>\n",
       "      <td>NOMINAL</td>\n",
       "      <td>[1, 2, 3]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                scale     domain\n",
       "_sid          INTEGER        NaN\n",
       "mpg              REAL        NaN\n",
       "cylinders     INTEGER        NaN\n",
       "displacement     REAL        NaN\n",
       "horsepower    INTEGER        NaN\n",
       "weight           REAL        NaN\n",
       "acceleration     REAL        NaN\n",
       "model_year    INTEGER        NaN\n",
       "origin        NOMINAL  [1, 2, 3]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sampotools.api import save_asd\n",
    "\n",
    "# Correct the scales of weight and origin\n",
    "asd['weight'] = {'scale': 'REAL'}\n",
    "asd['origin'] = {'scale': 'NOMINAL', 'domain': ['1', '2', '3']}\n",
    "\n",
    "# Save the ASD to a file\n",
    "save_asd(asd_object=asd, file_path='./data/auto-mpg.asd')\n",
    "\n",
    "pd.DataFrame(asd).T[['scale', 'domain']]"
   ]
  },
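  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The ASD above stores the `origin` domain as the strings `'1'`, `'2'`, and `'3'`, while pandas reads `origin` as integers. The following is a minimal sketch of a sanity check that every value in a column appears in a NOMINAL domain, comparing values as strings. `check_nominal_domain` is a hypothetical helper, not part of `sampotools`. Whether SAMPO itself matches values this way is an assumption.\n",
    "\n",
    "```python\n",
    "def check_nominal_domain(values, attr_def):\n",
    "    # Hypothetical helper: returns the values (as strings) that are not in the\n",
    "    # ASD domain. Values are converted with str() because the domain above is a\n",
    "    # list of strings while pandas reads origin as integers.\n",
    "    domain = set(attr_def.get('domain', []))\n",
    "    return sorted({str(v) for v in values} - domain)\n",
    "\n",
    "\n",
    "origin_def = {'scale': 'NOMINAL', 'domain': ['1', '2', '3']}\n",
    "print(check_nominal_domain([1, 2, 3, 1], origin_def))  # []\n",
    "print(check_nominal_domain([1, 4], origin_def))        # ['4']\n",
    "```\n",
    "\n",
    "In this notebook it could be called as `check_nominal_domain(input_data['origin'], asd['origin'])`. An empty list means that no value falls outside the domain."
   ]
  },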
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
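    "Running the code below takes the first `n_all - n_all // 10` records (about 90% of the total, in file order without shuffling) as learning data and writes them to a CSV file.\n"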
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_all = len(input_data)\n",
    "n_predict = n_all // 10\n",
    "n_learn = n_all - n_predict\n",
    "\n",
    "learn_data = input_data.iloc[0:n_learn,:]\n",
    "learn_data.to_csv('./data/auto-mpg_learn.csv', sep=\",\", index=False)"
   ]
  },
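  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The split arithmetic can be checked with a small sketch. `split_counts` is a hypothetical helper that repeats the computation in the cell above. The slice `rows[n_learn:]` shows which rows would form the prediction portion, which this notebook does not write out.\n",
    "\n",
    "```python\n",
    "def split_counts(n_all):\n",
    "    # Same arithmetic as the cell above: floor(10%) for prediction, the rest for learning\n",
    "    n_predict = n_all // 10\n",
    "    n_learn = n_all - n_predict\n",
    "    return n_learn, n_predict\n",
    "\n",
    "\n",
    "rows = list(range(25))                 # stand-in for 25 data rows\n",
    "n_learn, n_predict = split_counts(len(rows))\n",
    "learn_rows = rows[:n_learn]            # like input_data.iloc[0:n_learn, :]\n",
    "predict_rows = rows[n_learn:]          # the remaining rows\n",
    "print(n_learn, n_predict)              # 23 2\n",
    "```\n",
    "\n",
    "Because `//` rounds down, the learning share is slightly above 90% unless the row count is a multiple of 10. Because the rows are not shuffled, the split follows the row order of the CSV file."
   ]
  },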
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For a detailed explanation of this section, see [2. Preparing the Data in \"Running Learning/Prediction and Checking Results\"](../simple/simple.ipynb#2.-データの準備)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Conditions for Attributes Output by a Component<a id=\"3.-コンポーネントが出力する属性の条件\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
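    "The figure below shows two points:\n",
    " - The analysis data is input from the data loader component to the binary expansion attribute generation component\n",
    " - The attributes output by the binary expansion attribute generation component are input to the predictor component\n",
    "\n",
    "Binary expansion generates numeric attributes from a categorical attribute. The predictor component receives these generated attributes together with the attributes specified in the SPD's `keep_attributes`. The original categorical attribute itself is not passed on.\n"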
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Data flow when generating attributes](../_static/sampo_attr_single.PNG)"
   ]
  },
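  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Binary expansion is essentially one-hot encoding. The following minimal pure-Python sketch illustrates the idea and is not the implementation of `BinaryExpandFDComponent`: each domain value becomes one numeric 0/1 column. The column names imitate the `bexp(1)_origin` style names that appear in the `load_comp_output('bexp')` result later in this section. The `prefix` parameter is hypothetical. In this sketch a column is produced for every domain value, even one that does not occur in the data.\n",
    "\n",
    "```python\n",
    "def binary_expand(values, domain, attr_name, prefix='bexp'):\n",
    "    # One 0/1 column per domain value: 1.0 where the row equals that value.\n",
    "    # Values are compared as strings because the ASD domain holds strings.\n",
    "    return {\n",
    "        f'{prefix}({d})_{attr_name}': [1.0 if str(v) == d else 0.0 for v in values]\n",
    "        for d in domain\n",
    "    }\n",
    "\n",
    "\n",
    "expanded = binary_expand([1, 2, 3, 1], ['1', '2', '3'], 'origin')\n",
    "for name, column in expanded.items():\n",
    "    print(name, column)\n",
    "# bexp(1)_origin [1.0, 0.0, 0.0, 1.0]\n",
    "# bexp(2)_origin [0.0, 1.0, 0.0, 0.0]\n",
    "# bexp(3)_origin [0.0, 0.0, 1.0, 0.0]\n",
    "```"
   ]
  },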
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
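    "An attribute generation component outputs attributes according to the following conditions.\n",
    "\n",
    "- Attributes that are output\n",
    "    - Attributes generated by the component (x3 and x4 in the figure)\n",
    "    - Attributes specified in `keep_attributes` under `global_settings` in the SPD (y in the figure)\n",
    "\n",
    "- Attributes that are not output\n",
    "    - Attributes that did not match the attribute selection condition (x1 in the figure)\n",
    "    - Attributes that matched the attribute selection condition and were input to the component (x2 in the figure)\n",
    "\n",
    "However, an attribute that matched the attribute selection condition and was input to the component is still output if it is specified in `keep_attributes` under `global_settings` in the SPD.\n"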
   ]
  },
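  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Taken together, the two \"not output\" rules mean that an attribute leaves the component only if the component generated it or it is listed in `keep_attributes`. Whether it matched the selection condition does not change the result. The following minimal sketch expresses that rule using the attribute names from the figure. `component_output_attrs` is a hypothetical helper, and the output order is an arbitrary choice of the sketch.\n",
    "\n",
    "```python\n",
    "def component_output_attrs(incoming, generated, keep_attributes):\n",
    "    # Attributes listed in keep_attributes pass through unchanged;\n",
    "    # every other incoming attribute (selected or not) is dropped.\n",
    "    kept = [a for a in incoming if a in keep_attributes]\n",
    "    # Attributes created by the component are always output.\n",
    "    return kept + list(generated)\n",
    "\n",
    "\n",
    "incoming = ['x1', 'x2', 'y']   # x2 matches the selection condition\n",
    "generated = ['x3', 'x4']       # attributes created from x2\n",
    "print(component_output_attrs(incoming, generated, ['y']))        # ['y', 'x3', 'x4']\n",
    "print(component_output_attrs(incoming, generated, ['x2', 'y']))  # ['x2', 'y', 'x3', 'x4']\n",
    "```"
   ]
  },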
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
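    "The SPD's `global_settings` has the following parameters.\n",
    "\n",
    "- **keep_attributes**\n",
    "    - Specifies the names of attributes that must always be passed on to subsequent components. Set attributes that need to reach the predictor component, such as the target variable.\n",
    "\n",
    "- **feature_exclude**\n",
    "    - Specifies the names of attributes that are never used, even if they match a component's attribute selection condition. The target variable must be listed here so that it is not used as a feature."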
   ]
  },
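  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect of `feature_exclude` can be sketched as a filter that overrides a component's `features` condition. The sketch below is hypothetical: `select_features` is not a SAMPO function. It lower-cases the scale on the assumption that scales are compared case-insensitively (the ASD uses `REAL` while the SPD conditions use `'real'`). It skips `_sid` on the assumption that `_sid` is treated as a sample ID rather than a feature. With a condition like the one on the `std` component in Section 4, `mpg` would be selected if `feature_exclude` did not remove it.\n",
    "\n",
    "```python\n",
    "def select_features(asd, condition, feature_exclude, id_attrs=('_sid',)):\n",
    "    # Hypothetical model of attribute selection for one component:\n",
    "    # an attribute is used if it matches the condition and is not excluded.\n",
    "    return [\n",
    "        name for name, spec in asd.items()\n",
    "        if name not in id_attrs                      # assumed: IDs are never features\n",
    "        and name not in feature_exclude              # feature_exclude always wins\n",
    "        and condition(name, spec['scale'].lower())   # e.g. scale == 'real'\n",
    "    ]\n",
    "\n",
    "\n",
    "def numeric(name, scale):\n",
    "    # Like the condition: scale == 'real' or scale == 'integer'\n",
    "    return scale in ('real', 'integer')\n",
    "\n",
    "\n",
    "toy_asd = {\n",
    "    '_sid': {'scale': 'INTEGER'},\n",
    "    'mpg': {'scale': 'REAL'},\n",
    "    'cylinders': {'scale': 'INTEGER'},\n",
    "    'origin': {'scale': 'NOMINAL', 'domain': ['1', '2', '3']},\n",
    "}\n",
    "print(select_features(toy_asd, numeric, feature_exclude=['mpg']))  # ['cylinders']\n",
    "print(select_features(toy_asd, numeric, feature_exclude=[]))       # ['mpg', 'cylinders']\n",
    "```"
   ]
  },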
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following is an example SPD that uses the binary expansion attribute generation component."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "spd_content = '''\n",
    "dl -> bexp -> rg\n",
    "\n",
    "---\n",
    "\n",
    "components:\n",
    "    dl:\n",
    "        component: DataLoader\n",
    "\n",
    "    bexp:\n",
    "        component: BinaryExpandFDComponent\n",
    "        features: scale == 'nominal'\n",
    "        \n",
    "    rg:\n",
    "        component: FABHMEBernGateLinearRgComponent\n",
    "        features: name != 'mpg'\n",
    "        target: name == 'mpg'\n",
    "        standardize_target: True\n",
    "        tree_depth: 3\n",
    "\n",
    "global_settings:\n",
    "    keep_attributes:\n",
    "        - mpg\n",
    "    feature_exclude:\n",
    "        - mpg\n",
    "'''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following is an example learning SRC."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "learn_src_templ = '''\n",
    "fabhmerg_learn_attr:\n",
    "    type: learn\n",
    "\n",
    "    data_sources:\n",
    "        dl:\n",
    "            path: ./data/auto-mpg_learn.csv\n",
    "            attr_schema: ./data/auto-mpg.asd\n",
    "'''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run learning with the SPD and learning SRC above.\n",
    "To make them readable by SAMPO/FAB, generate the SPD with the gen_spd() function and the learning SRC with the gen_src() function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "fabhmerg_learn_attr.6929da69-b371-4f19-ae85-cd65ba3dcdf3"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import os\n",
    "from sampo.api import gen_spd, gen_src, process_store, process_runner\n",
    "\n",
    "pstore_url = './pstore_attr'\n",
    "if not os.path.isdir(pstore_url):\n",
    "    process_store.create(pstore_url)\n",
    "\n",
    "spd = gen_spd(template=spd_content)\n",
    "learn_src = gen_src(template=learn_src_templ)\n",
    "process_runner.run(src=learn_src, spd=spd, pstore_url=pstore_url)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
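    "After learning completes, check the attributes output by the attribute generation component.\n",
    "Passing the component ID of the attribute generation component (`bexp`) to the load_comp_output() function shows, as in the table below, that the generated attributes and the attribute specified in `keep_attributes` (`mpg`) are output. The original `origin` attribute is not included."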
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mpg</th>\n",
       "      <th>bexp(1)_origin</th>\n",
       "      <th>bexp(2)_origin</th>\n",
       "      <th>bexp(3)_origin</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>_sid</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>18.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>15.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>18.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>16.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>17.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       mpg  bexp(1)_origin  bexp(2)_origin  bexp(3)_origin\n",
       "_sid                                                      \n",
       "0     18.0             1.0             0.0             0.0\n",
       "1     15.0             1.0             0.0             0.0\n",
       "2     18.0             1.0             0.0             0.0\n",
       "3     16.0             1.0             0.0             0.0\n",
       "4     17.0             1.0             0.0             0.0"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sampo.api import process_store\n",
    "import pandas as pd\n",
    "\n",
    "with process_store.open_process(pstore_url, 'fabhmerg_learn_attr') as prl:\n",
    "    bexp_df = prl.load_comp_output('bexp')\n",
    "\n",
    "bexp_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
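    "## 4. Duplication and Concatenation<a id=\"4.-複製と結合\"></a>\n",
    "\n",
    "This section demonstrates duplication and concatenation in a case that uses multiple attribute generation components.\n",
    "  - Duplication: when one component passes data to multiple components, every receiving component gets the same data\n",
    "  - Concatenation: when multiple components pass data to one component, the data is combined before it is passed on"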
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
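    "The figure below shows an example that uses two attribute generation components.\n",
    "\n",
    "  - When the analysis data is input from the data loader component to multiple attribute generation components, the data is duplicated so that each attribute generation component receives the same input.\n",
    "  - The predictor component concatenates the attributes output by the two attribute generation components and uses them as its input.\n"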
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Data flow with concatenation](../_static/sampo_attr_multi.PNG)"
   ]
  },
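  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At the data level, duplication and concatenation behave like copying one table into two branches and then joining the branch outputs column-wise on `_sid`. The following pandas sketch illustrates this with the first three rows shown in Section 2. `bexp_like` and `std_like` are hypothetical stand-ins, not the SAMPO components. Using the sample standard deviation (the pandas default) for standardization is also an assumption.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# First three rows of the analysis data (subset of columns), indexed by _sid\n",
    "data = pd.DataFrame(\n",
    "    {'_sid': [0, 1, 2], 'mpg': [18.0, 15.0, 18.0],\n",
    "     'weight': [3504.0, 3693.0, 3436.0], 'origin': [1, 1, 1]}\n",
    ").set_index('_sid')\n",
    "\n",
    "\n",
    "def bexp_like(df):\n",
    "    # Hypothetical stand-in for binary expansion of origin (domain 1, 2, 3)\n",
    "    return pd.DataFrame({f'bexp({d})_origin': (df['origin'].astype(str) == d).astype(float)\n",
    "                         for d in ['1', '2', '3']})\n",
    "\n",
    "\n",
    "def std_like(df):\n",
    "    # Hypothetical stand-in for standardization of weight\n",
    "    w = df['weight']\n",
    "    return pd.DataFrame({'std_weight': (w - w.mean()) / w.std()})\n",
    "\n",
    "\n",
    "# Duplication: each branch gets its own identical copy of the same input\n",
    "outputs = [branch(data.copy()) for branch in (bexp_like, std_like)]\n",
    "# Concatenation: join the branch outputs (and the kept target) on _sid\n",
    "joined = pd.concat(outputs + [data[['mpg']]], axis=1)\n",
    "print(joined.columns.tolist())\n",
    "# ['bexp(1)_origin', 'bexp(2)_origin', 'bexp(3)_origin', 'std_weight', 'mpg']\n",
    "```"
   ]
  },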
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
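    "The following is an example SPD that uses multiple attribute generation components as in the figure above.  \n",
    "In the data flow section below, the data from `dl` is duplicated and input to both `bexp` and `std`, so `dl` can be omitted in front of `std`."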
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "spd_content_attr = '''\n",
    "dl -> bexp -> rg\n",
    "   -> std -> rg\n",
    "\n",
    "---\n",
    "\n",
    "components:\n",
    "    dl:\n",
    "        component: DataLoader\n",
    "\n",
    "    bexp:\n",
    "        component: BinaryExpandFDComponent\n",
    "        features: scale == 'nominal'\n",
    "\n",
    "    std:\n",
    "        component: StandardizeFDComponent\n",
    "        features: scale == 'real' or scale == 'integer'\n",
    "\n",
    "    rg:\n",
    "        component: FABHMEBernGateLinearRgComponent\n",
    "        features: name != 'mpg'\n",
    "        target: name == 'mpg'\n",
    "        standardize_target: True\n",
    "        tree_depth: 3\n",
    "\n",
    "global_settings:\n",
    "    keep_attributes:\n",
    "        - mpg\n",
    "    feature_exclude:\n",
    "        - mpg\n",
    "'''"
   ]
  },
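  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see how the data flow section expresses duplication and concatenation, the following sketch parses the arrow lines into edges. It then reports nodes with several successors (duplication) and nodes with several predecessors (concatenation). This is a simplified, hypothetical parser, not SAMPO's. It handles only the pattern used here and treats a line that starts with `->` as continuing from the first node of the previous line. The actual rules are in the SPD specification in the `Analytics Reference`.\n",
    "\n",
    "```python\n",
    "from collections import defaultdict\n",
    "\n",
    "\n",
    "def parse_flow(spd_text):\n",
    "    # Keep only the data flow section (the part before the first '---')\n",
    "    flow_part = spd_text.split('---', 1)[0]\n",
    "    edges, prev_head = [], None\n",
    "    for line in flow_part.splitlines():\n",
    "        if not line.strip():\n",
    "            continue\n",
    "        nodes = [n.strip() for n in line.split('->')]\n",
    "        if nodes[0] == '':\n",
    "            # Omitted head such as '   -> std -> rg': simplified assumption that\n",
    "            # it refers to the first node of the previous line\n",
    "            nodes[0] = prev_head\n",
    "        prev_head = nodes[0]\n",
    "        edges.extend(zip(nodes, nodes[1:]))\n",
    "    return edges\n",
    "\n",
    "\n",
    "def fan_out_in(edges):\n",
    "    succ, pred = defaultdict(list), defaultdict(list)\n",
    "    for src, dst in edges:\n",
    "        succ[src].append(dst)\n",
    "        pred[dst].append(src)\n",
    "    duplicated = {n: d for n, d in succ.items() if len(d) > 1}    # one -> many\n",
    "    concatenated = {n: s for n, s in pred.items() if len(s) > 1}  # many -> one\n",
    "    return duplicated, concatenated\n",
    "\n",
    "\n",
    "flow = '''\n",
    "dl -> bexp -> rg\n",
    "   -> std -> rg\n",
    "\n",
    "---\n",
    "'''\n",
    "edges = parse_flow(flow)\n",
    "print(edges)\n",
    "# [('dl', 'bexp'), ('bexp', 'rg'), ('dl', 'std'), ('std', 'rg')]\n",
    "print(fan_out_in(edges))\n",
    "# ({'dl': ['bexp', 'std']}, {'rg': ['bexp', 'std']})\n",
    "```\n",
    "\n",
    "Because `parse_flow` ignores everything after `---`, it could also be applied directly to `spd_content_attr`."
   ]
  },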
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the specification of the data flow section, see `SPD (SAMPO Process Description) Specification` in the `Analytics Reference`.\n",
    "\n",
    "The learning SRC is shown below. It is the same as the one used in Section 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "learn_src_templ = '''\n",
    "fabhmerg_learn_attr:\n",
    "    type: learn\n",
    "\n",
    "    data_sources:\n",
    "        dl:\n",
    "            path: ./data/auto-mpg_learn.csv\n",
    "            attr_schema: ./data/auto-mpg.asd\n",
    "'''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run learning with the SPD and SRC above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "fabhmerg_learn_attr.5cfd4c92-31bc-4ff3-a71c-d2dab6a8289d"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import os\n",
    "from sampo.api import gen_spd, gen_src, process_store, process_runner\n",
    "\n",
    "pstore_url = './pstore_attr'\n",
    "if not os.path.isdir(pstore_url):\n",
    "    process_store.create(pstore_url)\n",
    "\n",
    "spd = gen_spd(template=spd_content_attr)\n",
    "learn_src = gen_src(template=learn_src_templ)\n",
    "process_runner.run(src=learn_src, spd=spd, pstore_url=pstore_url)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
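    "Check that the attributes output by each attribute generation component were concatenated and input to the predictor component.  \n",
    "To do this, load the prediction formula information from the executed process and display its `attr_name` column."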
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>attr_name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>bexp(1)_origin</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>bexp(2)_origin</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>bexp(3)_origin</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>std_cylinders</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>std_displacement</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>std_horsepower</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>std_weight</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>std_acceleration</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>std_model_year</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>bias</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>variance</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           attr_name\n",
       "0     bexp(1)_origin\n",
       "1     bexp(2)_origin\n",
       "2     bexp(3)_origin\n",
       "3      std_cylinders\n",
       "4   std_displacement\n",
       "5     std_horsepower\n",
       "6         std_weight\n",
       "7   std_acceleration\n",
       "8     std_model_year\n",
       "9               bias\n",
       "10          variance"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sampo.api import process_store\n",
    "import pandas as pd\n",
    "\n",
    "with process_store.open_process(pstore_url, 'fabhmerg_learn_attr') as prl:\n",
    "    df = prl.load_model('rg')\n",
    "\n",
    "result_df = df['prediction_formulas']\n",
    "result_df.reset_index(inplace=True)\n",
    "result_df.iloc[:, 0:1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
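    "`attr_name` contains both binary-expanded attributes (`bexp(...)_origin`) and standardized attributes (`std_...`). This confirms that each attribute generation component received the same input data from the data loader component and generated its own attributes, and that these attributes were concatenated and input to the predictor component. `std_mpg` does not appear because `mpg` is listed in `feature_exclude`."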
   ]
  },
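  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same check can be made programmatically by grouping the `attr_name` values by the prefix of the component that generated them. `group_by_generator` is a hypothetical helper. The prefixes `bexp` and `std` are taken from the names in the table above. Names such as `bias` and `variance`, which are model terms rather than generated attributes, end up in the `other` list.\n",
    "\n",
    "```python\n",
    "def group_by_generator(attr_names, prefixes=('bexp', 'std')):\n",
    "    # Assign each name to the first prefix it starts with, e.g. 'bexp(1)_origin'\n",
    "    # or 'std_weight'; anything else (bias, variance, ...) goes to 'other'.\n",
    "    groups = {p: [] for p in prefixes}\n",
    "    other = []\n",
    "    for name in attr_names:\n",
    "        for p in prefixes:\n",
    "            if name.startswith(p + '(') or name.startswith(p + '_'):\n",
    "                groups[p].append(name)\n",
    "                break\n",
    "        else:\n",
    "            other.append(name)\n",
    "    return groups, other\n",
    "\n",
    "\n",
    "names = ['bexp(1)_origin', 'bexp(2)_origin', 'bexp(3)_origin', 'std_cylinders',\n",
    "         'std_displacement', 'std_horsepower', 'std_weight', 'std_acceleration',\n",
    "         'std_model_year', 'bias', 'variance']\n",
    "groups, other = group_by_generator(names)\n",
    "print({p: len(v) for p, v in groups.items()}, other)\n",
    "# {'bexp': 3, 'std': 6} ['bias', 'variance']\n",
    "```\n",
    "\n",
    "In this notebook it could be applied to the actual result as `group_by_generator(result_df['attr_name'])`. Non-empty groups for both prefixes indicate that attributes from both branches reached `rg`."
   ]
  },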
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Back to top](#top)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
