Machine learning introduction

  • $X$: feature domain (input); $Y$: range (output)

  • $f: X \rightarrow Y$ is the function that maps features to a classification label (or to a regression value if $Y \subseteq \mathbb{R}$). Its analytic form is unknown; only a partial data set $D$ is observed.

  • Assume the features $\{ x_1, x_2, \cdots, x_N \}$ of the data set $D$ follow a stationary probability distribution $P$.

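  • The training data are generated by drawing random samples $\mathbf{x} = \{x_1, x_2, \cdots, x_N\}$ from the unknown data distribution $P$. Then $\mathbf{x}$ and its target values $y$ are given to the learner, whose search space when learning the target function $f$ is the hypothesis set $H$ (a set of functions).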

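  • After observing the training data $\mathbf{x}$, the learner must select from the hypothesis set $H$ a final hypothesis (function) $g$, which should ideally be a good estimate of the target function $f$ behind the data set $D$, whose distribution is unknown.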

  • Finally, we evaluate the learner by how well the trained hypothesis $g$ performs on new data from $X$ (leave-one-out, K-fold cross-validation, or other unobserved samples).
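The setup above can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration and does not come from the notes: the target `target_f`, a uniform distribution on [0, 1] standing in for $P$, a hypothesis set of threshold classifiers, and a learner that simply minimizes training error.

```python
import random

def target_f(x):
    # Hypothetical stand-in for the unknown target f; the learner never sees it.
    return 1 if x > 0.6 else 0

def sample_data(n, rng):
    # Draw x_1..x_N i.i.d. from P (assumed uniform on [0, 1]) and label them with f.
    xs = [rng.random() for _ in range(n)]
    return [(x, target_f(x)) for x in xs]

# Hypothesis set H (assumed): threshold classifiers h_t(x) = 1 if x > t else 0.
THRESHOLDS = [i / 20 for i in range(21)]

def error(t, data):
    # Fraction of examples that the threshold hypothesis h_t misclassifies.
    return sum((1 if x > t else 0) != y for x, y in data) / len(data)

def learn(data):
    # Select the final hypothesis g from H by minimizing the training error.
    return min(THRESHOLDS, key=lambda t: error(t, data))

def k_fold_error(data, k):
    # Average held-out error over k folds: train on k-1 folds, test on the rest.
    folds = [data[i::k] for i in range(k)]
    errs = []
    for i in range(k):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        g_i = learn(train)
        errs.append(error(g_i, folds[i]))
    return sum(errs) / k

rng = random.Random(0)
D = sample_data(200, rng)      # observed data set D
g = learn(D)                   # final hypothesis, stored as its threshold
cv_err = k_fold_error(D, 5)    # estimate of performance on unseen data
print(f"chosen threshold g = {g:.2f}, 5-fold CV error = {cv_err:.3f}")
```

Because the learner only sees `D`, the cross-validated error is the honest measure of how close $g$ is to $f$; the training error alone would be optimistic.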

Example: credit card approval

  • Below is the personal credit data of one applicant. Should the credit card application be approved?
Field               Value
age                 23
gender              female
annual salary       NTD 1,000,000
years in residence  1 year
years in job        3 years
current debt        200,000
  • The input is therefore $x = \lbrace 23, \text{female}, 1000000, 1\ \text{year}, 3\ \text{years}, 200000 \rbrace$.
    • output $y = 1$: approve; $y = 0$: reject.
    • unknown target function $f: x \rightarrow y$.
    • hypothesis: $g: x \rightarrow y$.
    • The goal is for the hypothesis $g$ produced by the machine learning model to approximate the unknown true target function $f$.
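As a rough sketch of how this example might look in code, the applicant record can be encoded as a numeric vector and passed to one candidate hypothesis. The `Applicant` class, the encoding of gender, and the weights and threshold below are all hypothetical: they are chosen by hand for illustration, not learned from data, and say nothing about how a real approval rule should behave.

```python
from dataclasses import dataclass

@dataclass
class Applicant:
    age: int
    gender: str
    annual_salary: int        # NTD
    years_in_residence: int
    years_in_job: int
    current_debt: int

def to_features(a):
    # Encode the record as a numeric input vector x (gender encoding is arbitrary).
    return [
        a.age,
        1.0 if a.gender == "female" else 0.0,
        a.annual_salary,
        a.years_in_residence,
        a.years_in_job,
        a.current_debt,
    ]

# Hypothetical hand-picked parameters: one candidate hypothesis from H, not a learned g.
WEIGHTS = [0.0, 0.0, 1.0, 0.0, 50_000.0, -2.0]
THRESHOLD = 500_000.0

def hypothesis(x):
    # Weighted score compared against a threshold: 1 = approve, 0 = reject.
    score = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1 if score > THRESHOLD else 0

applicant = Applicant(23, "female", 1_000_000, 1, 3, 200_000)
x = to_features(applicant)
y = hypothesis(x)
print(x, y)
```

A learning algorithm would replace the hand-picked `WEIGHTS` and `THRESHOLD` with values chosen from past approval records, so that the resulting function approximates the unknown $f$.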

Machine learning and data mining

  • ML: use data to compute hypothesis that approximates target f.
  • DM: use (huge) data to find property that is interesting.

  • If "interesting property" == "hypothesis that approximates target"

    • then ML == DM

Machine learning and artificial intelligence

  • AI: compute something that shows intelligent behavior.

  • A g that approximates f is something that shows intelligent behavior

    • ML can realize AI, among other routes.
    • e.g., chess playing.

Machine learning and statistics

  • statistics: use data to make inference about an unknown process.

  • g is an inference outcome;

    • f is something unknown
    • statistics can be used to achieve ML
    • traditional statistics also focuses on provable results under mathematical assumptions, and cares less about computation.
