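Notes on Doing Bayesian Data Analysis, 2nd Edition
- Although none of the book's examples use PyMC, we will rewrite them as PyMC3 programs and discuss those.
- The book has three main parts:
  - An introduction to Bayesian probability, models, and inference.
  - An introduction to modern Bayesian analysis and MCMC (Markov chain Monte Carlo), with examples.
  - Applying Bayesian models to real data.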
Chapter 2: Introduction: Credibility, Models, and Parameters
Where no confusion can arise, the text uses the word credibility in place of probability.
Bayesian data analysis has two foundational ideas.
- Bayesian inference is reallocation of credibility across possibilities. (That is, credibility, or probability, is redistributed among the candidate possibilities as evidence comes in.)
- The possibilities, over which we allocate credibility, are parameter values in meaningful mathematical models.
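In symbols, this reallocation is usually written as Bayes' rule. The symbols here are assumptions introduced for illustration and do not appear in the notes above: $\theta$ stands for the parameter values and $D$ for the observed data.

$$
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)},
\qquad
p(D) = \sum_{\theta'} p(D \mid \theta')\, p(\theta')
$$

The prior $p(\theta)$ is the credibility before seeing the data, and the posterior $p(\theta \mid D)$ is the credibility after reallocation. For continuous parameters the sum becomes an integral.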
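Suppose we step outside in the morning and find that the sidewalk is wet. Possible causes include recent rain, recent garden irrigation, a newly erupted spring, a burst sewer pipe, a passerby who spilled a drink, and so on. If all we know is that the sidewalk is wet, we can rely only on the prior probabilities of these causes, which reflect their prior credibility; for example, recent rain should be more probable than a spilled drink. As we keep walking, we observe new data. If everything we see along the way is wet, including the trees and the cars parked by the road, we raise the credibility of recent rain. If instead only a small patch of the sidewalk is wet and we find an empty cup nearby, we raise the credibility of the spilled drink. This reallocation of credibility, expressed as probability, is the foundation of Bayesian inference.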
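The wet-sidewalk reasoning can be sketched numerically. This is a minimal illustration, not code from the book: every probability below is a made-up assumption, chosen only so that rain starts out more credible than a spilled drink, and the function name `reallocate` is hypothetical.

```python
# Hypothetical prior credibilities for each cause of the wet sidewalk.
prior = {
    "rain": 0.40,
    "irrigation": 0.30,
    "spring": 0.05,
    "burst_pipe": 0.15,
    "spilled_drink": 0.10,
}

# Assumed likelihoods p(observation | cause) for two possible observations.
# Observation A: everything in sight is wet, including trees and cars.
likelihood_everything_wet = {
    "rain": 0.90,
    "irrigation": 0.05,
    "spring": 0.02,
    "burst_pipe": 0.05,
    "spilled_drink": 0.001,
}
# Observation B: only a small wet patch, with an empty cup nearby.
likelihood_patch_and_cup = {
    "rain": 0.01,
    "irrigation": 0.05,
    "spring": 0.05,
    "burst_pipe": 0.05,
    "spilled_drink": 0.80,
}


def reallocate(prior, likelihood):
    """Apply Bayes' rule over a discrete set of candidate causes."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())
    return {h: value / evidence for h, value in unnormalized.items()}


posterior_a = reallocate(prior, likelihood_everything_wet)
posterior_b = reallocate(prior, likelihood_patch_and_cup)

for name, posterior in [("everything wet", posterior_a),
                        ("small patch and cup", posterior_b)]:
    print(name)
    for cause, credibility in sorted(posterior.items(), key=lambda kv: -kv[1]):
        print(f"  {cause:14s} {credibility:.3f}")
```

Under these assumed numbers, the first observation shifts most of the credibility to rain and the second shifts it to the spilled drink, even though the prior is the same in both cases.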
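Sherlock Holmes: "How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?" Even if an explanation has a very small prior probability, once every other explanation has been ruled out we should still believe it, however improbable it seemed beforehand.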
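Holmes's rule is the special case in which the data drive the credibility of some possibilities to zero, so the remaining credibility is renormalized over what is left. A minimal sketch, assuming four hypothetical suspects with made-up prior credibilities (the names, numbers, and the `eliminate` helper are all illustrative):

```python
# Hypothetical prior credibilities; suspect D starts out very improbable.
prior = {"A": 0.70, "B": 0.25, "C": 0.04, "D": 0.01}


def eliminate(credibility, ruled_out):
    """Set ruled-out possibilities to zero and renormalize the rest."""
    remaining = {h: (0.0 if h in ruled_out else p)
                 for h, p in credibility.items()}
    total = sum(remaining.values())
    if total == 0.0:
        raise ValueError("every possibility was ruled out")
    return {h: p / total for h, p in remaining.items()}


# Evidence rules out A, then B, then C; D absorbs all the credibility.
after_a = eliminate(prior, {"A"})
after_ab = eliminate(after_a, {"B"})
after_abc = eliminate(after_ab, {"C"})

print(after_a)
print(after_ab)
print(after_abc)
```

Each elimination raises the credibility of every surviving suspect in proportion to its previous credibility, which is why the initially improbable suspect ends up with all of it.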
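The figure above assumes a deterministic relationship between an outcome and its cause. In reality, the relationship is usually probabilistic. For example, from a footprint found at a crime scene we can guess which shoe the culprit wore from the print's size and dimensions, but we cannot be one hundred percent sure. In this example the cause is the shoe and the measured effect is the size of the footprint, and the relationship between them is random. Measurements in scientific experiments are likewise full of randomness, because external factors often affect their accuracy. All measured data contain noise, and our goal is to find the underlying trend in noisy data.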
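The footprint example can be sketched as inference from noisy measurements. Everything numeric here is an assumption for illustration: the candidate shoe lengths, the five footprint measurements, and the Gaussian noise model with a fixed standard deviation are not taken from the book.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution, used as the noise model."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical candidate causes: shoe lengths in centimetres.
candidate_sizes_cm = [25.0, 26.0, 27.0, 28.0, 29.0]
# Assumed measurement noise: a footprint differs from the shoe length
# by normally distributed error with this standard deviation.
noise_sd_cm = 0.8
# Hypothetical noisy footprint measurements from the scene.
footprints_cm = [26.4, 27.9, 27.2, 26.8, 27.5]

# Equal prior credibility for every candidate size.
prior = {size: 1.0 / len(candidate_sizes_cm) for size in candidate_sizes_cm}

# Likelihood of all measurements under each candidate size.
unnormalized = {
    size: prior[size] * math.prod(
        normal_pdf(x, size, noise_sd_cm) for x in footprints_cm
    )
    for size in candidate_sizes_cm
}
evidence = sum(unnormalized.values())
posterior = {size: value / evidence for size, value in unnormalized.items()}

for size, credibility in posterior.items():
    print(f"shoe length {size:.0f} cm: {credibility:.4f}")
```

Because the measurements are noisy, no candidate reaches a credibility of exactly one; the data only make some sizes far more credible than others.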
THE STEPS OF BAYESIAN DATA ANALYSIS
- Identify the data relevant to the research questions. (That is, define the features, or predictors, and the target variable to be predicted.)
- Define a descriptive model for the relevant data. The mathematical form and its parameters should be meaningful and appropriate to the theoretical purposes of the analysis.
- Specify a prior distribution on the parameters.
- Use Bayesian inference to re-allocate credibility across parameter values. Interpret the posterior distribution with respect to theoretically meaningful issues (assuming that the model is a reasonable description of the data; see next step).
- Check that the posterior predictions mimic the data with reasonable accuracy (i.e., conduct a “posterior predictive check”). If not, then consider a different descriptive model.
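The five steps above can be sketched on a toy problem. This is not the book's example and not PyMC3: it assumes hypothetical coin-flip data, a Bernoulli model with a single bias parameter `theta`, a uniform prior, and a grid approximation swapped in for MCMC so that the sketch needs only the standard library.

```python
from math import comb

# Step 1: identify the data. Hypothetical coin flips (1 = heads).
flips = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
n_flips = len(flips)
n_heads = sum(flips)

# Step 2: descriptive model. Each flip is Bernoulli(theta), where theta
# is the coin's bias; theta is restricted to a grid of candidate values.
grid = [i / 1000 for i in range(1001)]


def likelihood(theta):
    return theta ** n_heads * (1 - theta) ** (n_flips - n_heads)


# Step 3: prior. Assumed uniform over the grid.
prior = [1.0 / len(grid)] * len(grid)

# Step 4: Bayesian inference. Reallocate credibility across theta values.
unnormalized = [p * likelihood(t) for p, t in zip(prior, grid)]
evidence = sum(unnormalized)
posterior = [u / evidence for u in unnormalized]
posterior_mean = sum(t * p for t, p in zip(grid, posterior))


# Step 5: posterior predictive check. Probability of seeing k heads in a
# new set of n_flips flips, averaged over the posterior on theta.
def predictive_prob(k):
    return sum(
        p * comb(n_flips, k) * t ** k * (1 - t) ** (n_flips - k)
        for t, p in zip(grid, posterior)
    )


predictive = [predictive_prob(k) for k in range(n_flips + 1)]
most_likely_heads = max(range(n_flips + 1), key=predictive.__getitem__)

print(f"observed heads: {n_heads} of {n_flips}")
print(f"posterior mean of theta: {posterior_mean:.3f}")
print(f"most likely number of heads in replicated data: {most_likely_heads}")
```

If the observed count fell far into the tail of the predictive distribution, that would be a reason to reconsider the model in step 2. With more than a handful of parameters a grid becomes impractical, which is where an MCMC sampler such as the one in PyMC3 would take over.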
Data analysis without parametric models?
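- The book does not cover Bayesian nonparametric models.
  - A nonparametric model is not a model without parameters. Rather, it has so many parameters (potentially infinitely many) that they cannot be written down as a fixed, finite set.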
- For a tutorial on Bayesian nonparametric models, see Gershman and Blei (2012); for a recent review, see Müller and Mitra (2013); and for textbook applications, see Gelman et al. (2013).
References
- [Doing Bayesian Data Analysis, 2nd Edition](http://store.elsevier.com/Doing-Bayesian-Data-Analysis/John-Kruschke/isbn-9780124059160/)