Sunday, August 22, 2010

What's the difference between inferential and descriptive statistics?

I mean, in English and not text book terminology, what is the difference?

Descriptive statistics are, as the name implies, used to describe the characteristics of the data you have collected, that is, your sample. The mean, standard deviation, median, 25th percentile, 75th percentile, etc. are descriptive statistics.
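To make that concrete, here is a minimal Python sketch that computes those summaries with the standard library. The eight outcome values are made up purely for illustration; they do not come from any real study.

```python
import statistics

# Hypothetical outcome measurements for eight subjects (invented numbers).
outcomes = [4.1, 5.0, 5.6, 6.2, 6.8, 7.3, 8.0, 9.4]

mean = statistics.mean(outcomes)
sd = statistics.stdev(outcomes)      # sample standard deviation (n - 1 denominator)
median = statistics.median(outcomes)

# quantiles(n=4) returns the three quartile cut points:
# the 25th, 50th and 75th percentiles.
q25, q50, q75 = statistics.quantiles(outcomes, n=4)

print(f"mean={mean:.2f}  sd={sd:.2f}  median={median:.2f}")
print(f"25th percentile={q25:.2f}  75th percentile={q75:.2f}")
```

Nothing here tests a hypothesis; these numbers only summarize the data in hand.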





Inferential statistics relates to hypothesis testing: you use your sample to draw conclusions about the wider population it came from. Usually, you have a null hypothesis and you want to assess whether your data are inconsistent with it. A t-test, for example, is an inferential tool, as is a confidence interval.





I am a biostatistician and my work concerns clinical trials, in which we test drugs or other treatments on people. As a simple example, we might design a trial to test an active treatment (A) against a placebo (B). Subjects who qualify for the trial are randomized to receive either A or B, but the placebo is made to look like the active treatment, so usually neither the subject nor their physician knows what they are getting.





The null hypothesis in such a clinical trial is that A and B are no different in their effect on a specific outcome. We then test whether our data are consistent with the null hypothesis of no difference. If we let Y(A) = the mean outcome for those randomized to A and define Y(B) similarly, then a test statistic is of the form T = (Y(A) - Y(B)) / SE, where SE is the standard error (a measure of precision) of the numerator. If T is large in absolute value (usually larger than 1.96 or so, depending on sample size), then we reject the hypothesis of no treatment difference. If T is positive and a higher outcome is better, we then make the inference that A is a better treatment than B for this particular outcome.
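Here is a rough Python sketch of that test statistic. The text above does not say how SE is computed, so the unpooled formula sqrt(var(A)/n(A) + var(B)/n(B)) used here is an assumption, as are the function name t_statistic and the made-up outcome values.

```python
import math
import statistics

def t_statistic(group_a, group_b):
    """Return T = (Y(A) - Y(B)) / SE for two lists of outcomes.

    Assumption: SE is the unpooled standard error of the difference
    in means, sqrt(var_a / n_a + var_b / n_b).
    """
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    se = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return (mean_a - mean_b) / se

# Hypothetical outcomes for six subjects per arm (invented numbers).
treatment_a = [5.1, 6.3, 5.8, 7.0, 6.4, 5.9]
placebo_b = [4.2, 5.0, 4.8, 5.5, 4.6, 5.1]

t = t_statistic(treatment_a, placebo_b)
print(f"T = {t:.2f}")
print("reject the null hypothesis" if abs(t) > 1.96 else "do not reject the null hypothesis")
```

With samples this small the 1.96 cutoff is only approximate; the exact threshold depends on sample size, as noted above.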





From the value of T, you can compute something called the p-value, which is the probability that you would have seen a value at least as large as your observed T just by chance if the null hypothesis were true. The smaller p is, the more evidence you have that the null hypothesis may be false. Usually, p < 0.05 is considered sufficient evidence to reject the null hypothesis.
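As a sketch of how T turns into a p-value, the snippet below assumes the sample is large enough that T is approximately standard normal under the null hypothesis, and that the test is two-sided. Both are assumptions on my part, as is the function name two_sided_p_value; with small samples you would use a t distribution instead.

```python
import math

def two_sided_p_value(t):
    """Two-sided p-value for T under a standard normal approximation."""
    # For Z ~ N(0, 1), P(|Z| >= |t|) equals erfc(|t| / sqrt(2)).
    return math.erfc(abs(t) / math.sqrt(2))

for t in (0.5, 1.96, 3.0):
    p = two_sided_p_value(t)
    verdict = "reject" if p < 0.05 else "do not reject"
    print(f"T = {t:4.2f}  p = {p:.4f}  -> {verdict} the null hypothesis")
```

This is where the 1.96 threshold comes from: under the normal approximation, |T| = 1.96 corresponds to a two-sided p of about 0.05.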





Now, even if the true difference between A and B is small in clinical terms, you can make T large by designing a very large trial. So, there are two concepts to take into account: statistical significance and clinical significance. Many people make the mistake of relying only on statistical significance (p <= 0.05) to decide whether a new active treatment is worthwhile compared with placebo. They should also take account, however, of clinical significance, by which I mean asking whether the observed difference Y(A) - Y(B) is large enough that a doctor would want to adopt the new treatment for their patients. The statistical significance is purely inferential, but you can only assess clinical significance through descriptive statistics.
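A small Python sketch shows how sample size alone drives T. It assumes two equal-sized arms with the same standard deviation, so that SE = sd * sqrt(2 / n); the observed difference of 0.5, the standard deviation of 10 and the function name t_for_trial are all hypothetical.

```python
import math

def t_for_trial(difference, sd, n_per_arm):
    """T for a fixed observed difference in means.

    Assumption: both arms have n_per_arm subjects and the same
    standard deviation sd, so SE = sd * sqrt(2 / n_per_arm).
    """
    se = sd * math.sqrt(2.0 / n_per_arm)
    return difference / se

difference = 0.5   # hypothetical, clinically small difference in means
sd = 10.0          # hypothetical standard deviation of the outcome

for n in (100, 1000, 10000):
    t = t_for_trial(difference, sd, n)
    flag = "statistically significant" if abs(t) > 1.96 else "not significant"
    print(f"n per arm = {n:>5}  T = {t:.2f}  ({flag})")
```

The observed difference is identical in every row; only the trial size changes. That is why the size of Y(A) - Y(B) has to be judged separately from the p-value.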





You really need to look at both, however, to assess the results of a trial. If you have statistical significance, then you can feel confident that there is a "real" difference between the active treatment and placebo. If you have clinical significance from assessing the magnitude of your observed treatment difference, then you would feel confident that the new drug has meaningful advantages over the placebo.





Sometimes B is not a placebo but another active drug; all of the above still applies.

