Statistics @ ToTo 奇妙の冒險

Nov 15 Sat 2008 05:01
Cross。Correlation。Function

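Two years ago I ran into the question of how to measure the degree of correlation between two time series, and I was stuck on it for quite a while. When I asked Mark, who was still my supervisor at the time, he told me to simply fit a mixed model and treat the resulting parameter estimate as the correlation between the two series. I never liked that answer. It is not that his suggestion was wrong, but it has two drawbacks. First, fitting a mixed model is complicated: you have to do model selection up front and model diagnostics afterwards, which is not easy for the average user. Second, the parameter estimate does not fall between -1 and 1, so it is not as intuitive as an ordinary correlation coefficient.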

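After that the question was more or less left hanging, until two years later (which felt like many more) I stumbled on a solution by accident.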

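Here I would like to introduce a method called CCF. Of course, this CCF does not refer to 陳金鋒 (Chen, Chin-Feng) but to the cross-correlation function. How is it translated into Chinese? Honestly, I don't know either. 囧rz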
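To make the idea concrete, here is a minimal Python sketch of a sample cross-correlation function. It is my own illustration rather than code from any particular package: the function name `cross_correlation`, the `max_lag` parameter, and the sign convention (lag k pairs x[t + k] with y[t]) are all assumptions. Each value is normalized by the lag-0 sums of squares, so it stays between -1 and 1 like an ordinary correlation coefficient.

```python
import math

def cross_correlation(x, y, max_lag):
    """Sample cross-correlation for lags -max_lag..max_lag.

    The value at lag k correlates x[t + k] with y[t] (an assumed
    convention); every value lies in [-1, 1].
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    mx = sum(x) / n
    my = sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    # Normalize by the lag-0 sums of squares of both series.
    denom = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    ccf = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            s = sum(dx[t + k] * dy[t] for t in range(n - k))
        else:
            s = sum(dx[t] * dy[t - k] for t in range(n + k))
        ccf[k] = s / denom
    return ccf

# Toy data: y repeats x two steps later.
x = [(4 * t) % 11 for t in range(40)]
y = [x[max(t - 2, 0)] for t in range(40)]
ccf = cross_correlation(x, y, max_lag=5)
best_lag = max(ccf, key=ccf.get)
print(best_lag, ccf[best_lag])
```

In this toy example y lags x by two steps, so under the convention above the largest value shows up at lag -2.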

Posted by cchien on PIXNET

Category: Statistics


Dec 02 Sun 2007 21:56
Nearest。Neighbor。Imputation

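To raise the professional standard of this blog, starting today I will occasionally post some serious statistical material.

(Whispering) OK, I admit it... this is actually going into my proposal. I am just writing it down first and will translate it into English when I have time >_<

Can't read the text above? No problem, it's not important, just skip it!

Today I will talk about a fun data imputation method called nearest neighbor imputation. As for the Chinese translation, um, 最靠近鄰居插補法 (literally "closest-neighbor imputation method")...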
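As a rough illustration only, here is a minimal Python sketch of one common form of nearest neighbor imputation: a record with a missing value borrows that value from the complete record (the donor) that is closest on the other variables. The function name `nearest_neighbor_impute`, the use of `None` for missing values, and the Euclidean distance are my assumptions, not details taken from the post.

```python
import math

def nearest_neighbor_impute(rows, target):
    """Return a copy of rows with missing (None) values in column
    `target` filled from the nearest complete record (the donor).

    Distance is Euclidean over the other columns, which are assumed
    to be numeric and complete.
    """
    donors = [r for r in rows if r[target] is not None]
    if not donors:
        raise ValueError("no complete record to borrow from")
    filled = []
    for r in rows:
        if r[target] is not None:
            filled.append(list(r))
            continue

        def dist(d, r=r):
            # Skip the target column, which is missing in r.
            return math.sqrt(sum((a - b) ** 2
                                 for j, (a, b) in enumerate(zip(r, d))
                                 if j != target))

        donor = min(donors, key=dist)
        new_row = list(r)
        new_row[target] = donor[target]
        filled.append(new_row)
    return filled

# Toy data: column 0 is an auxiliary variable, column 1 has a gap.
rows = [[1.0, 10.0], [2.0, 20.0], [2.1, None], [5.0, 50.0]]
filled = nearest_neighbor_impute(rows, target=1)
print(filled[2])
```

The third record is closest to the second one on column 0, so it borrows that record's value.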


Jul 22 Sun 2007 13:24
Heteroscedasticity Test in Linear Regression Model

It has been a long time since I posted a serious statistics article...

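In regression analysis, nothing is more troublesome than model diagnostics, and checking whether the model violates homoscedasticity is the part many people find hardest. Homoscedasticity means that the variance of the residuals must be a fixed constant. For that to hold, the variance of the dependent variable, Var(Y), must not change as its expected value E(Y) changes. In a typical university regression course, students are taught to draw a scatter plot of the standardized residuals (or studentized residuals) vs. the predicted values. If the points in this plot are scattered in a horizontal band, the homoscedasticity assumption is not violated, as shown below:

[Figure: residuals vs. predicted values, points scattered in a horizontal band]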

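If the points show a particular pattern, such as a saddle or a fan shape, the model violates this assumption, as in the plot below:

[Figure: residuals vs. predicted values, points in a saddle or fan shape]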

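But anyone can "tell a story from a picture". Sometimes the plot clearly shows a bit of a trend, yet you can still insist that it is a horizontal band. It is a subjective call anyway, and there is no scientific criterion for how horizontal a band has to be before it counts as truly horizontal. A small sample is another source of misjudgment: the points may form a horizontal band when the sample is small, but who knows whether they will turn into a fan shape as the sample grows...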

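It would be very convenient if this assumption could be checked with a statistic that yields an actual p-value, which would also prevent such forced readings of a plot. My teachers never taught one, and textbooks rarely mention it. Yet such a test does exist: Breusch and Pagan published it back in 1979, and it is called the Breusch-Pagan test.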

Breusch, T. and Pagan, A. (1979), "A Simple Test for Heteroscedasticity and Random Coefficient Variation," Econometrica, 47, 1287-1294.

Shortly afterwards, White proposed another test for homoscedasticity (called White's test).

White, H. (1980), "A Heteroscedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroscedasticity," Econometrica, 48, 817-838.

SAS does not put these two tests in PROC REG; they are in PROC MODEL. If you are interested, you can copy the code from the link below.

A Simple Regression Model with Correction of Heteroscedasticity
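For readers without SAS, here is a minimal Python sketch of the idea behind the Breusch-Pagan test. Note the assumptions: it covers only a simple regression with one predictor, and it uses the studentized n·R² form (usually attributed to Koenker) rather than the original 1979 statistic. The function names `ols_simple` and `breusch_pagan_simple` and the toy data are mine. The squared residuals are regressed on x, and LM = n·R² is referred to a chi-square distribution with 1 degree of freedom.

```python
import math

def ols_simple(x, y):
    """Least-squares fit y = a + b*x; returns (a, b, residuals, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sst = sum((yi - my) ** 2 for yi in y)
    sse = sum(e * e for e in resid)
    return a, b, resid, 1.0 - sse / sst

def breusch_pagan_simple(x, y):
    """Studentized Breusch-Pagan test for one predictor.

    Returns (LM, p_value), where LM = n * R^2 from the auxiliary
    regression of the squared residuals on x.
    """
    _, _, resid, _ = ols_simple(x, y)
    squared = [e * e for e in resid]
    _, _, _, r2 = ols_simple(x, squared)
    lm = max(0.0, len(x) * r2)  # guard against tiny negative rounding
    # For 1 degree of freedom: P(chi2 > lm) = erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

# Toy data: the same deterministic "noise", with constant spread in
# y_hom and spread growing with x in y_het.
x = list(range(1, 41))
noise = [((2 * t) % 5) - 2 for t in x]
y_hom = [2 + 3 * xi + ei for xi, ei in zip(x, noise)]
y_het = [2 + 3 * xi + xi * ei for xi, ei in zip(x, noise)]
lm_hom, p_hom = breusch_pagan_simple(x, y_hom)
lm_het, p_het = breusch_pagan_simple(x, y_het)
print(lm_hom, p_hom)
print(lm_het, p_het)
```

A small p-value suggests the residual variance changes with x. In this toy example the fan-shaped series is flagged while the constant-spread one is not.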


Feb 06 Mon 2006 02:58
The test of multivariate normal distribution

In some statistical analyses, we would like to test the normality
assumption before analyzing the data. In the univariate case, we all
know that a Q-Q plot or the K-S statistic can be used to assess
normality. But what about the multivariate case?

Mardia's statistic is a test for multivariate normality. It is based
on functions of skewness and kurtosis, and Mardia's PK should be less
than 3 for the assumption of multivariate normality to be considered
met. However, neither SAS nor SPSS offers a simple statement in any
procedure to compute it.

In SAS, we need to use a macro to calculate Mardia's PK statistic.
SAS Institute released the code on its official website. Please
check the following link:

http://support.sas.com/ctx/samples/index.jsp?sid=480

Similarly, in SPSS we need to use a macro to examine
bivariate/multivariate normality. See:

http://www.columbia.edu/~ld208/
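If you have neither SAS nor SPSS at hand, the following minimal Python sketch computes Mardia's multivariate skewness and kurtosis coefficients for the bivariate case. It is only an illustration under assumptions: the function name `mardia_bivariate` is mine, it handles two variables only, and it does not reproduce the PK statistic reported by the macros above. Under normality, n*b1/6 is approximately chi-square with 4 degrees of freedom and the standardized kurtosis is approximately standard normal.

```python
import math

def mardia_bivariate(data):
    """Mardia's skewness b1 and kurtosis b2 for two-column data.

    Returns (b1, b2, skew_stat, kurt_z), where skew_stat = n*b1/6 is
    compared with chi-square(4) and kurt_z = (b2 - 8) / sqrt(64/n)
    with the standard normal distribution.
    """
    n = len(data)
    p = 2
    m0 = sum(r[0] for r in data) / n
    m1 = sum(r[1] for r in data) / n
    c = [(r[0] - m0, r[1] - m1) for r in data]
    # Maximum-likelihood covariance matrix (divide by n, not n - 1).
    s00 = sum(a * a for a, _ in c) / n
    s11 = sum(b * b for _, b in c) / n
    s01 = sum(a * b for a, b in c) / n
    det = s00 * s11 - s01 * s01
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    i00, i11, i01 = s11 / det, s00 / det, -s01 / det

    def q(u, v):
        # Quadratic form u' S^{-1} v.
        return (u[0] * (i00 * v[0] + i01 * v[1])
                + u[1] * (i01 * v[0] + i11 * v[1]))

    b1 = sum(q(u, v) ** 3 for u in c for v in c) / n ** 2
    b2 = sum(q(u, u) ** 2 for u in c) / n
    skew_stat = n * b1 / 6.0
    kurt_z = (b2 - p * (p + 2)) / math.sqrt(8.0 * p * (p + 2) / n)
    return b1, b2, skew_stat, kurt_z

# Toy data, far too small for a real test; for illustration only.
data = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4), (7, 9), (8, 6)]
b1, b2, skew_stat, kurt_z = mardia_bivariate(data)
print(b1, b2, skew_stat, kurt_z)
```

Both coefficients are unchanged by any invertible linear transformation of the data, which is a handy sanity check on an implementation.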

