I met with Marianne, a postdoc in the School of Nursing at UNC-CH, to discuss some remaining problems in her publication. Those problems confused me as well, but my supervisor, Mark, didn't have time to join our meeting last Friday, so I stopped by Mark's office again today (Monday). I came away from that meeting with some useful conclusions.

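The first question is whether the Likelihood Ratio Test (LRT) is suitable for comparing two models that have the same independent variables, where some of those variables are coded differently. For example, a variable is continuous in one model; what if we use a categorical version of it in the same model instead? That is the key point of this question. The only thing that confused me is that the LRT is only valid for nested models, and I was not sure whether this situation counts as nested. Mark told me that it does: we can regard the model with the continuous variable as the reduced model and the model with the categorical variable as the full model, because the categorical version contributes more parameters (one per level, minus one). (Strictly speaking, this holds when the categories correspond to the distinct values of the continuous variable, so that a linear trend across the levels is a special case of a separate effect for each level.) Based on this, we can use the LRT as usual.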
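A minimal sketch of Mark's argument, assuming ordinary least-squares models with normal errors (Marianne's actual models and SAS procedure are not described here). The function names, the toy data, and the pure-Python chi-square tail function are illustrative assumptions. The predictor takes 4 distinct values, so the categorical coding uses 3 level effects against a single slope, giving 3 - 1 = 2 degrees of freedom (the intercept is shared by both models).

```python
import math
from collections import defaultdict


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1 (closed forms)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite correction series
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def rss_linear(x, y):
    """Residual SS of y = b0 + b1*x: the reduced model (continuous coding)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


def rss_categorical(x, y):
    """Residual SS when each distinct x value gets its own mean: the full model."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    rss = 0.0
    for ys in groups.values():
        m = sum(ys) / len(ys)
        rss += sum((v - m) ** 2 for v in ys)
    return rss


def lrt_continuous_vs_categorical(x, y):
    n, k = len(y), len(set(x))
    rss_r, rss_f = rss_linear(x, y), rss_categorical(x, y)
    # Normal errors with ML variance RSS/n: -2 log L = n*log(RSS/n) + constant,
    # so the LRT statistic (difference in -2 log L) is n*log(RSS_r / RSS_f).
    stat = n * math.log(rss_r / rss_f)
    df = (k - 1) - 1  # k-1 level effects versus one slope (shared intercept)
    return stat, df, chi2_sf(stat, df)


# Hypothetical data: a predictor with 4 distinct values, 2 observations each
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [1.0, 1.4, 3.1, 2.7, 3.2, 3.6, 2.9, 3.3]
stat, df, p = lrt_continuous_vs_categorical(x, y)
print(f"LRT = {stat:.3f}, df = {df}, p = {p:.4f}")
```

Because every level gets its own mean in the full model, its residual sum of squares can never exceed that of the straight-line fit; that is what makes the statistic non-negative and the two models nested.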

The second question is more complicated. Marianne had already finished the model selection and just needed an LRT to confirm that her final models are the best ones. With the reduced (final) model as the null hypothesis, the LRT should be non-significant with a large p-value, so that we fail to reject the null hypothesis. However, the result was significant, which completely contradicts that expectation. I checked the original SAS code and found no problem there either. Mark then pointed out that, given Marianne's study design, she needs to keep two important variables in the model whether they are significant or not. After including those two variables in the model, the conflict was eliminated.

Still, I wondered whether one of them largely overlaps with the other, because both are geographic variables and are highly similar. I dropped the less important one and fit the model again, and the result looked better. The lesson from this problem is that we need to understand the variables better before fitting a model; that will reduce this kind of confusion.
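One quick way to check how much two predictors overlap before fitting is their correlation and the variance inflation factor (VIF). The sketch below assumes both geographic variables are numeric; the variable names and values are hypothetical. With exactly two predictors, each predictor's VIF reduces to 1 / (1 - r^2). In SAS, the VIF option on the MODEL statement of PROC REG reports the general version.

```python
import math


def pearson_r(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(a, b):
    """With exactly two predictors, each one's VIF is 1 / (1 - r^2).

    The VIF grows without bound as |r| approaches 1.
    """
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)


# Hypothetical geographic predictors (names and values are made up)
miles_to_city = [5, 12, 30, 45, 60, 80]
rurality_index = [1, 2, 4, 5, 6, 9]
r = pearson_r(miles_to_city, rurality_index)
print(f"r = {r:.3f}, VIF = {vif_two_predictors(miles_to_city, rurality_index):.1f}")
```

A commonly quoted rule of thumb treats a VIF above 5 or 10 as a sign that one of the two variables is largely redundant given the other.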

The two solutions have already been emailed to Marianne. I hope she finds them useful.

### Posts for 200602 (4)

- Feb 27 Mon 2006 05:03
## Consulting Case Study -- No.20060224

- Feb 08 Wed 2006 05:35
## Consulting Case Study -- No.20060207

June Cho, a Korean woman, is a postdoc in the School of Nursing at UNC-CH. I consulted on her dissertation from December 2004 to May 2005, and she graduated smoothly in July 2005. Her husband is a professor in the School of Pharmacy. I guess they are U.S. citizens. After she graduated, she stayed here as her advisor's postdoc and has kept extending the research from her dissertation.

She wants to use a two-way ANOVA to compare simple main effects in her current study. It's very easy, but she just needed my confirmation. I wrote a macro for her, so she can simply call it to fit all of her models (18 models). However, simple main effects are normally examined only when the interaction term is significant. I ran only one model, and its interaction term was significant, but I expect that not all of the models will have significant interaction terms. Yet simple main effects are the sole purpose of her current research. How can we handle them when the interaction is not significant?

As usual, I asked my supervisor, Mark. He said that even if the interaction term is not significant, we can still keep it in the GLM model. That way, the simple main effects from all 18 models are reported consistently, because every model includes the interaction term. This also leads to a more suitable conclusion in the discussion section.
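Her macro is not shown here, so the following is only a hypothetical sketch of what the simple-main-effect computation does for one model. It tests factor A at each level of factor B, using the error mean square from the full two-way model that keeps the A*B interaction (the cell-means model), which is what keeping the interaction in every model implies. In SAS, the SLICE= option of the LSMEANS statement in PROC GLM requests these tests; this sketch stops at the F statistic and its degrees of freedom rather than computing p-values. The factor names and data are made up.

```python
from collections import defaultdict


def simple_main_effects(data):
    """Simple main effect of factor A at each level of factor B.

    data: list of (a_level, b_level, y) tuples.
    Returns {b_level: (F, df_effect, df_error)}.
    """
    cells = defaultdict(list)
    for a, b, y in data:
        cells[(a, b)].append(y)
    means = {cell: sum(ys) / len(ys) for cell, ys in cells.items()}

    # Error term from the full two-way model WITH the A*B interaction,
    # i.e. the pooled within-cell variation (cell-means model).
    sse = sum((y - means[cell]) ** 2 for cell, ys in cells.items() for y in ys)
    df_error = sum(len(ys) for ys in cells.values()) - len(cells)
    mse = sse / df_error

    results = {}
    for b_level in sorted({b for _, b in cells}):
        slice_cells = [cell for cell in cells if cell[1] == b_level]
        n_b = sum(len(cells[c]) for c in slice_cells)
        mean_b = sum(sum(cells[c]) for c in slice_cells) / n_b
        # Between-cell SS among the A levels within this slice of B
        ss = sum(len(cells[c]) * (means[c] - mean_b) ** 2 for c in slice_cells)
        df_effect = len(slice_cells) - 1
        results[b_level] = ((ss / df_effect) / mse, df_effect, df_error)
    return results


# Hypothetical 2x2 layout with two observations per cell
data = [("a1", "b1", 1), ("a1", "b1", 3), ("a2", "b1", 5), ("a2", "b1", 7),
        ("a1", "b2", 4), ("a1", "b2", 6), ("a2", "b2", 4), ("a2", "b2", 6)]
for b_level, (F, df1, df2) in simple_main_effects(data).items():
    print(f"A at {b_level}: F({df1}, {df2}) = {F:.2f}")
```

Because the error term always comes from the model with the interaction, the same computation applies whether or not the interaction is significant, which is the consistency Mark was after.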

-----

- Feb 06 Mon 2006 02:58
## The test of multivariate normal distribution

In some statistical analyses, we'd like to test the assumption of normality before analyzing the data. In the univariate case, we all know that a Q-Q plot and the Kolmogorov-Smirnov (K-S) statistic can be used to assess normality. But what about multivariate normality?

Mardia's statistic is a test for multivariate normality based on functions of multivariate skewness and kurtosis. Mardia's PK should be less than 3 for the assumption of multivariate normality to be considered met. However, in both SAS and SPSS, no procedure has a simple statement or option that performs this test.
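To show what such a macro computes under the hood, here is a minimal pure-Python sketch of Mardia's multivariate skewness (b1) and kurtosis (b2) with their usual large-sample tests. It is not the SAS or SPSS macro, and it does not assume how those macros define "PK"; the data are hypothetical, and the hand-written matrix inversion is only meant for a handful of variables.

```python
import math


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small p only)."""
    p = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(p):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def mardia(X):
    """Mardia's multivariate skewness b1 and kurtosis b2 for rows of X (n x p)."""
    n, p = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(p)]
    D = [[row[j] - mean[j] for j in range(p)] for row in X]
    # ML covariance (divide by n), as in Mardia's definitions
    S = [[sum(d[i] * d[j] for d in D) / n for j in range(p)] for i in range(p)]
    Sinv = invert(S)
    # G[i][k] = (x_i - mean)' S^-1 (x_k - mean)
    SD = [[sum(Sinv[a][b] * d[b] for b in range(p)) for a in range(p)] for d in D]
    G = [[sum(di[a] * sdk[a] for a in range(p)) for sdk in SD] for di in D]
    b1 = sum(g ** 3 for row in G for g in row) / n ** 2
    b2 = sum(G[i][i] ** 2 for i in range(n)) / n
    # Large-sample tests: n*b1/6 ~ chi-square(p(p+1)(p+2)/6), kurt_z ~ N(0, 1)
    skew_stat = n * b1 / 6
    skew_df = p * (p + 1) * (p + 2) // 6
    kurt_z = (b2 - p * (p + 2)) / math.sqrt(8 * p * (p + 2) / n)
    return b1, b2, skew_stat, skew_df, kurt_z


# Hypothetical bivariate data (rows = subjects, columns = variables)
X = [[1.2, 2.0], [2.1, 1.5], [3.3, 4.8], [4.0, 3.1], [5.9, 4.4], [2.4, 2.6]]
b1, b2, skew_stat, skew_df, kurt_z = mardia(X)
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}, "
      f"skew chi2 = {skew_stat:.3f} (df = {skew_df}), kurt z = {kurt_z:.3f}")
```

With a single variable, b1 and b2 reduce to the squared univariate skewness and the univariate kurtosis, which is a handy sanity check.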

In SAS, we need to use a macro to calculate Mardia's PK statistic. SAS Institute has released the code on its official website; see the following link:

http://support.sas.com/ctx/samples/index.jsp?sid=480

Similarly, in SPSS we need a macro to examine bivariate/multivariate normality. Check it out:

http://www.columbia.edu/~ld208/

-----

- Feb 03 Fri 2006 08:44
## Consulting Case Study -- No.20060203

Lindsey Austin is a master's student (I guess) who works for a professor in some capacity (I am not sure whether she is a TA). Her professor asked her to analyze some records to assess students' academic ability. However, she is not good at statistics, so she sent the data set to me.

The question is very easy: how to calculate the correlation between individual scores and GPA in reading, math, science, and fundamentals in some courses. The individual score variable is on a continuous scale (0 to 100), but GPA is ordinal (A+, A, A-,...., F).

In correlation analysis, there are three correlation coefficients we commonly use: Pearson, Kendall's tau, and Spearman. However, none of them is designed specifically for the "scale vs. ordinal" case.

I wondered whether there is some special correlation coefficient I didn't know about. I checked the SAS manual for PROC CORR, but it offers no special correlation for this case. My supervisor, Mark, even dug out his old handouts (he also graduated from the biostatistics department at UNC-CH) to search for any evidence, but found nothing either.

Finally, we concluded that we can rank the individual score variable and use the Spearman correlation.
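The ranking approach can be sketched as follows. The full letter-grade scale (including grades such as D- that the post's list elides) and the sample records are hypothetical. Both variables are converted to ranks, with ties sharing their average rank, and the Pearson correlation of the two rank vectors is the Spearman coefficient.

```python
import math

# Hypothetical letter-grade scale, lowest to highest (not taken from the post)
GPA_ORDER = ["F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]
GPA_CODE = {grade: i for i, grade in enumerate(GPA_ORDER)}


def average_ranks(values):
    """Rank values 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return sab / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


def spearman(scores, grades):
    """Spearman's rho: Pearson correlation of the two sets of ranks."""
    return pearson(average_ranks(scores), average_ranks([GPA_CODE[g] for g in grades]))


# Hypothetical records: individual scores (0 to 100) and letter grades
scores = [72, 88, 95, 60, 81]
grades = ["B-", "A-", "A", "C", "A-"]
print(f"Spearman rho = {spearman(scores, grades):.3f}")
```

In SAS, the SPEARMAN option of PROC CORR performs the same ranking internally, so the score variable does not need to be ranked by hand first.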

This is a pretty special case. I think there should be a specific correlation coefficient for this situation, but we haven't figured it out yet. If we do, I will post it here.