
Wednesday, February 8, 2012

The Power in a Sample

My last assignment included developing a sampling plan, determining a sample size, and using power to estimate that sample size. Since then, I have been interested in learning more about the theory of power (a test's capacity to detect a difference) and its best practices. The researcher's "real goal is to design a high-quality study" (Lenth, 2001, p. 1), and an ethical, high-quality study does not rely on "shortcuts related to power and sample size" (Lenth, 2001, p. 1). Finding the right variances, according to Lenth (2001), requires an understanding that "power functions usually involve parameters unrelated to the hypotheses…they often involve one or more variances" (p. 5). Lenth (2001) reported that "sample size is but one of several quality characteristics of a statistical study" (p. 6). If the sample size is too small, the study is under-powered and may fail to detect an effect that matters; if the sample size is larger than necessary, the study is over-powered and uses more resources than it needs (Lenth, 2001, p. 6).

Lenth (2001) wrote that "eliciting meaningful effect sizes and estimating error variances constitute two potentially difficult obstacles in addressing sample-size problems" (p. 8). The choice of "instrumentation has a huge effect on the results, and so it should affect your sample-size calculations" (Lenth, 2001, p. 8). Lenth also cautioned that "standardized-effect-size goals are misused in many situations" (Lenth, 2001, p. 8). For example, in a "simple linear regression of a variable y on another variable x, the correlation (or squared correlation) between x and y can serve as a standardized effect-size measure" (Lenth, 2001, p. 8).

This measure combines three quantities that must be considered separately rather than lumped together into a single R² value: "the slope of the line, the error variance, and the variance of the x values, which are, respectively, absolute effect size, variance, and experimental design" (Lenth, 2001, p. 8). Lenth (2001) also examined the common claim that a non-significant test with high observed power provides strong statistical evidence that H0 is true. He rejected that claim: since the "observed power increases as the P value decreases, high observed power constitutes evidence against the null hypothesis" (p. 9).
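Lenth's point that observed power rises as the P value falls can be checked numerically. The sketch below is my own illustration, not something taken from Lenth (2001): it assumes a two-sided z test, and the function name `observed_power` is hypothetical. It treats the observed z statistic as if it were the true effect, which is how "observed power" is usually computed.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def observed_power(p_value, alpha=0.05):
    """Observed (post hoc) power of a two-sided z test.

    Assumption: the observed |z| is plugged in as if it were the true
    standardized effect. This is an illustrative sketch only.
    """
    z_obs = STD_NORMAL.inv_cdf(1 - p_value / 2)   # |z| implied by the P value
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)    # two-sided critical value
    # Probability of landing in either rejection region.
    return STD_NORMAL.cdf(z_obs - z_crit) + STD_NORMAL.cdf(-z_obs - z_crit)


p_values = [0.80, 0.50, 0.20, 0.05, 0.01]
powers = [observed_power(p) for p in p_values]

for p, pw in zip(p_values, powers):
    print(f"P = {p:.2f}  observed power = {pw:.3f}")
```

Under these assumptions the printed power climbs steadily as P shrinks, and a result sitting exactly at P = 0.05 has observed power of about one half. Observed power is therefore just a transformation of the P value and adds no separate evidence in favor of H0.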

"Sample-size problems are context-dependent" (Lenth, 2001, p. 10). Whether to increase the sample size to account for uncertainty depends upon practical and ethical criteria. Sample size is one aspect of a study design's quality (Lenth, 2001). Lenth (2001) reported that in addition to the "power approach discussed here, there are other respectable approaches to sample-size planning, including Bayesian ones and frequentist methods that focus on estimation rather than testing" (p. 10). "While technically different, those approaches also require care in considering scientific goals, incorporating pilot data, ethics, and study design" (Lenth, 2001, p. 10).

Luh and Guo (2010) reported that the literature was minimal on the subject of "allocating participants into different treatment groups to achieve the desired power when one group is fixed" (p. 14). Their key subject was determining the sample size for the second group in the two-sample t test, and they determined that with the trimmed mean method "the sample size needed is less than that of the traditional B.L. Welch test especially for nonnormal distributions" (p. 14). Their simulation results also demonstrated the accuracy of the proposed formula in terms of Type I error and statistical power.

Methods used for data analysis include the trimmed mean method, which provides a robust estimate for non-normal distributions (Luh and Guo, 2010). Regarding heterogeneous variance, many researchers reported that the trimmed mean method was frequently used for analyzing real data (Luh and Guo, 2010). After devising new formulas and checking them in simulations, Luh and Guo (2010) wrote that sample size determination is one of the key issues researchers face during the planning phase of an investigation, since "underestimation will reduce the power to detect an experiment effect" (p. 22). They also noted that "power is a function of the significance level, the true alternative hypothesis, the sample size, and the particular test used" (Luh and Guo, 2010, p. 22).
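To make that dependence concrete, here is a minimal Python sketch of a power function. It is my own simplification, not the formula from Luh and Guo: it assumes a two-sided, two-sample z test with equal group sizes and a known common standard deviation, and the name `power_two_sample_z` and its parameters are hypothetical.

```python
from statistics import NormalDist


def power_two_sample_z(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z test.

    delta        true difference in means (the alternative hypothesis)
    sigma        common standard deviation, assumed known
    n_per_group  sample size in each group
    alpha        significance level
    """
    std = NormalDist()
    standard_error = sigma * (2 / n_per_group) ** 0.5
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = delta / standard_error  # standardized distance from H0
    # Chance that the test statistic falls in either rejection region.
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


# Power grows with sample size for a fixed effect and significance level.
for n in (20, 40, 64, 100):
    print(n, round(power_two_sample_z(delta=0.5, sigma=1.0, n_per_group=n), 3))
```

Each of the four inputs named in the quotation appears here: `alpha` is the significance level, `delta` and `sigma` describe the true alternative, `n_per_group` is the sample size, and the choice of a z test fixes the formula itself. A different test, such as the Welch or trimmed t test, would need a different power function.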

When a test's assumptions are unmet, the trimmed mean method and the calculation of its corresponding sample size provide robust statistical results. Luh and Guo's (2010) study used "the trimmed mean method for variance heterogeneity, had a fixed number of subjects for one group at the time of planning, and derived the sample size determination for another group" (p. 22). Simulation using the proposed method resulted in a "consistent pattern" (Luh and Guo, 2010, p. 22): the Type I error rate stayed under control and the desired power was achieved (Luh and Guo, 2010). Advice for researchers includes a consideration of "a range of population parameters because the adequacy of the sample size depends on the accuracy of the initial specifications of the assumed parameters in the population" (Luh and Guo, 2010, p. 22). Unfortunately, current statistical software packages are not able to analyze some unique situations, but Luh and Guo's (2010) recommendations fill that gap and provide a good approximation.
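For readers unfamiliar with the trimmed mean itself, the following minimal Python sketch shows the basic idea. It covers only the estimate, not Luh and Guo's sample size formula, and it assumes symmetric trimming with 20% removed from each tail; the function name `trimmed_mean` and the sample data are made up for illustration.

```python
def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of the data.

    Assumption: symmetric trimming, with the number of observations cut
    from each tail rounded down.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    cut = int(proportion * len(ordered))      # observations removed per tail
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)


# Hypothetical scores with one extreme value.
scores = [2, 3, 4, 5, 6, 7, 8, 9, 10, 100]
print(sum(scores) / len(scores))   # ordinary mean, pulled up by the outlier
print(trimmed_mean(scores))        # 20% trimmed mean
```

With these made-up scores the ordinary mean is 15.4, while the trimmed mean is 6.5, because the single extreme value is discarded along with the other observations in the tails. That resistance to outliers is what makes the estimate robust for non-normal data.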

Houser (2007) discusses the relationship between power and sample size. The sample must be large enough to detect the effects of one or more variables on other variables in a study and to give the researcher confidence in the analytical results. With inadequate power, a real effect can go undetected, leading the researcher to conclude that a treatment was not successful when it actually was (a Type II error).

When a treatment judged unsuccessful is later shown to be effective, the original "sample size was inadequate" (Houser, 2007, p. 1). Conversely, as samples get larger and the test's results more precise, it becomes easier to detect differences that are clinically inconsequential. Guidelines for determining power include (1) planning the statistical test, (2) determining a detectable effect size, (3) choosing an acceptable power level, and (4) considering the sample's particular attributes (Houser, 2007). Understanding and applying the information above will have an important place in the development of my first dissertation.
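The guidelines above can be turned into a rough calculation. The Python sketch below is my own illustration, not a procedure from Houser (2007): it assumes a two-sided, two-sample z test with equal groups and a known standard deviation, and the function name `n_per_group` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, power=0.80, alpha=0.05):
    """Approximate sample size per group for a two-sample z test.

    Uses the normal-approximation formula
        n = 2 * ((z_{1 - alpha/2} + z_{power}) * sigma / delta) ** 2
    and rounds up. A t test would need slightly more participants.
    """
    std = NormalDist()
    z_alpha = std.inv_cdf(1 - alpha / 2)   # step 1: the planned test
    z_power = std.inv_cdf(power)           # step 3: acceptable power level
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2   # step 2: effect size
    return math.ceil(n)


# Medium standardized effect (delta / sigma = 0.5), 80% power, alpha = 0.05.
print(n_per_group(delta=0.5, sigma=1.0))
# Halving the detectable effect roughly quadruples the required sample.
print(n_per_group(delta=0.25, sigma=1.0))
```

Under these assumptions, detecting a half-standard-deviation difference with 80% power takes about 63 participants per group. The fourth guideline, the sample's particular attributes, enters through `sigma`: a more variable population needs a larger sample for the same effect.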


References:

Houser, J. (2007). How many are enough? Statistical power analysis and sample size estimation in clinical research. Journal of Clinical Research Best Practices, 3(3), 1-4. Retrieved from http://firstclinical.com/journal/2007/0703_Power.pdf

Lenth, R. V. (2001, March). Some practical guidelines for effective sample-size determination. Retrieved from http://www.stat.uiowa.edu/techrep/tr303.pdf

Luh, W., & Guo, J. (2009, Fall). The sample size needed for the trimmed t test when one group size is fixed. Journal of Experimental Education, 78(1), 14-25. Retrieved from EBSCOHost.
