As a scholar-practitioner, you need to understand that a hypothesis test indicating a relationship between an intervention and an outcome, a difference between groups, or a correlation between two constructs does not by itself tell you how important that finding is. A statistically significant result can still reflect a very minute relationship, a very small difference, or a very weak correlation. In the end, we need to ask whether the relationships or differences observed are large enough to justify a practical change in policy or practice.
For this Discussion, you will explore statistical significance and meaningfulness.
To prepare for this Discussion:
- Review the Learning Resources related to hypothesis testing, meaningfulness, and statistical significance.
- Review Magnusson’s web blog found in the Learning Resources to further your visualization and understanding of statistical power and significance testing.
- Review the American Statistical Association’s press release and consider the misconceptions and misuse of p-values.
- Consider the scenario:
- A research paper claims a meaningful contribution to the literature based on finding statistically significant relationships between predictor and response variables. In the footnotes, you see the following statement, “given this research was exploratory in nature, traditional levels of significance to reject the null hypotheses were relaxed to the .10 level.”
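One way to think about what "relaxing" the significance level means is to simulate studies in which the null hypothesis is true and count how often each threshold rejects it. The sketch below is only an illustration under simplifying assumptions that are not part of the scenario: two groups of 30 drawn from the same normal population, a two-sample z test with known standard deviation, and a hypothetical helper named `simulate_false_positive_rate`.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_false_positive_rate(alpha, n_per_group=30, n_sims=5000, seed=0):
    """Share of simulated null studies whose two-sided p-value falls below alpha.

    Assumption for illustration: both groups come from N(0, 1), so any
    rejection is a false positive (Type I error).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = math.sqrt(2 / n_per_group)  # standard error of the mean difference, sigma = 1
    rejections = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (fmean(a) - fmean(b)) / se
        p = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        if p < alpha:
            rejections += 1
    return rejections / n_sims


rate_05 = simulate_false_positive_rate(0.05)
rate_10 = simulate_false_positive_rate(0.10)
print(f"false positive rate at alpha = .05: {rate_05:.3f}")
print(f"false positive rate at alpha = .10: {rate_10:.3f}")
```

Under these assumptions the rejection rate should land near the chosen alpha, so moving from .05 to .10 roughly doubles the share of null studies reported as significant. Whether that trade-off is acceptable in exploratory work is the judgment the footnote asks you to evaluate.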
BY DAY 3
Post your response to the scenario in which you critically evaluate this footnote. As a reader/reviewer, what response would you provide to the authors about this footnote?
Scenarios are listed as follows:
1. The p-value was slightly above conventional threshold, but was described as “rapidly approaching significance” (i.e., p = .06).
An independent samples t test was used to determine whether student satisfaction levels in a quantitative reasoning course differed between the traditional classroom and on-line environments. The samples consisted of students in four face-to-face classes at a traditional state university (n = 65) and four online classes offered at the same university (n = 69). Students reported their level of satisfaction on a five-point scale, with higher values indicating higher levels of satisfaction. Since the study was exploratory in nature, levels of significance were relaxed to the .10 level. The test was significant t(132) = 1.8, p = .074, wherein students in the face-to-face class reported lower levels of satisfaction (M = 3.39, SD = 1.8) than did those in the online sections (M = 3.89, SD = 1.4). We therefore conclude that on average, students in online quantitative reasoning classes have higher levels of satisfaction. The results of this study are significant because they provide educators with evidence of what medium works better in producing quantitatively knowledgeable practitioners.
2. A results report that does not find any effect and also has small sample size (possibly no effect detected due to lack of power).
A one-way analysis of variance was used to test whether a relationship exists between educational attainment and race. The dependent variable of education was measured as number of years of education completed. The race factor had three attributes of European American (n = 36), African American (n = 23) and Hispanic (n = 18). Descriptive statistics indicate that on average, European Americans have higher levels of education (M = 16.4, SD = 4.6), with African Americans slightly trailing (M = 15.5, SD = 6.8) and Hispanics having on average lower levels of educational attainment (M = 13.3, SD = 6.1). The ANOVA was not significant F(2, 74) = 1.789, p = .175, indicating there are no differences in educational attainment across these three races in the population. The results of this study are significant because they shed light on the current social conversation about inequality.
3. Statistical significance is found in a study, but the effect in reality is very small (i.e., there was a very minor difference in attitude between men and women). Were the results meaningful?
An independent samples t test was conducted to determine whether differences exist between men and women on cultural competency scores. The samples consisted of 663 women and 650 men taken from a convenience sample of public, private, and non-profit organizations. Each participant was administered an instrument that measured his or her current levels of cultural competency. The cultural competency score ranges from 0 to 10, with higher scores indicating higher levels of cultural competency. The descriptive statistics indicate women have higher levels of cultural competency (M = 9.2, SD = 3.2) than men (M = 8.9, SD = 2.1). The results were significant t(1311) = 2.0, p < .05, indicating that women are more culturally competent than are men. These results tell us that gender-specific interventions targeted toward men may assist in bolstering cultural competency.
4. A study has results that seem fine, but there is no clear association to social change. What is missing?
A correlation test was conducted to determine whether a relationship exists between level of income and job satisfaction. The sample consisted of 432 employees equally represented across public, private, and non-profit sectors. The results of the test demonstrate a strong positive correlation between the two variables, r = .87, p < .01, showing that as level of income increases, job satisfaction increases as well.
© 2016 Laureate Education, Inc.
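None of the four write-ups reports an effect size, yet the summary statistics they give are enough to estimate one. The following is a minimal sketch, not part of the assignment, that recomputes the test statistics and adds Cohen's d, eta squared, and r squared. It assumes equal-variance (pooled) t tests and a standard one-way ANOVA, and the helper names `pooled_t_and_d` and `anova_f_and_eta_sq` are hypothetical.

```python
import math


def pooled_t_and_d(m1, s1, n1, m2, s2, n2):
    """Return (t, df, Cohen's d) for an equal-variance two-sample t test
    computed from group means, standard deviations, and sample sizes."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    diff = m1 - m2
    return diff / se, df, diff / pooled_sd


def anova_f_and_eta_sq(groups):
    """groups: list of (mean, sd, n). Return (F, df_between, df_within, eta squared)."""
    n_total = sum(n for _, _, n in groups)
    grand_mean = sum(m * n for m, _, n in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for m, _, n in groups)
    ss_within = sum((n - 1) * s**2 for _, s, n in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within, ss_between / (ss_between + ss_within)


# Scenario 1: online (M = 3.89, SD = 1.4, n = 69) vs. face-to-face (M = 3.39, SD = 1.8, n = 65)
t1, df1, d1 = pooled_t_and_d(3.89, 1.4, 69, 3.39, 1.8, 65)

# Scenario 2: years of education across the three groups
f2, dfb2, dfw2, eta_sq2 = anova_f_and_eta_sq(
    [(16.4, 4.6, 36), (15.5, 6.8, 23), (13.3, 6.1, 18)]
)

# Scenario 3: women (M = 9.2, SD = 3.2, n = 663) vs. men (M = 8.9, SD = 2.1, n = 650)
t3, df3, d3 = pooled_t_and_d(9.2, 3.2, 663, 8.9, 2.1, 650)

# Scenario 4: share of variance in job satisfaction associated with income
r_squared = 0.87**2

print(f"Scenario 1: t({df1}) = {t1:.2f}, Cohen's d = {d1:.2f}")
print(f"Scenario 2: F({dfb2}, {dfw2}) = {f2:.3f}, eta squared = {eta_sq2:.3f}")
print(f"Scenario 3: t({df3}) = {t3:.2f}, Cohen's d = {d3:.2f}")
print(f"Scenario 4: r squared = {r_squared:.2f}")
```

The recomputed t and F values agree with those reported in scenarios 1 through 3, which suggests the pooled-variance assumption matches what the write-ups used. In scenario 3 the difference is statistically significant with d of only about 0.11, while scenario 2 is not significant with eta squared of about 0.05 from just 77 participants.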
Press release as follows:
AMERICAN STATISTICAL ASSOCIATION RELEASES STATEMENT ON STATISTICAL SIGNIFICANCE AND P-VALUES
Provides Principles to Improve the Conduct and Interpretation of Quantitative Science
March 7, 2016
The American Statistical Association (ASA) has released a “Statement on Statistical Significance and P-Values” with six principles underlying the proper use and interpretation of the p-value [http://amstat.tandfonline.com/doi/abs/10.1080/00031305.2016.1154108#.Vt2XIOaE2MN]. The ASA releases this guidance on p-values to improve the conduct and interpretation of quantitative science and inform the growing emphasis on reproducibility of science research. The statement also notes that the increased quantification of scientific research and a proliferation of large, complex data sets has expanded the scope for statistics and the importance of appropriately chosen techniques, properly conducted analyses, and correct interpretation.
Good statistical practice is an essential component of good scientific practice, the statement observes, and such practice “emphasizes principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean.”
“The p-value was never intended to be a substitute for scientific reasoning,” said Ron Wasserstein, the ASA’s executive director. “Well-reasoned statistical arguments contain much more than the value of a single number and whether that number exceeds an arbitrary threshold. The ASA statement is intended to steer research into a ‘post p<0.05 era.’”
“Over time it appears the p-value has become a gatekeeper for whether work is publishable, at least in some fields,” said Jessica Utts, ASA president.
“This apparent editorial bias leads to the ‘file-drawer effect,’ in which research with statistically significant outcomes are much more likely to get published, while other work that might well be just as important scientifically is never seen in print. It also leads to practices called by such names as ‘p-hacking’ and ‘data dredging’ that emphasize the search for small p-values over other statistical and scientific reasoning.”
The statement’s six principles, many of which address misconceptions and misuse of the p-value, are the following:
1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
The statement has short paragraphs elaborating on each principle. In light of misuses of and misconceptions concerning p-values, the statement notes that statisticians often supplement or even replace p-values with other approaches. These include methods “that emphasize estimation over testing such as confidence, credibility, or prediction intervals; Bayesian methods; alternative measures of evidence such as likelihood ratios or Bayes factors; and other approaches such as decision-theoretic modeling and false discovery rates.”
“The contents of the ASA statement and the reasoning behind it are not new; statisticians and other scientists have been writing on the topic for decades,” Utts said.
“But this is the first time that the community of statisticians, as represented by the ASA Board of Directors, has issued a statement to address these issues.”
“The issues involved in statistical inference are difficult because inference itself is challenging,” Wasserstein said. He noted that more than a dozen discussion papers are being published in the ASA journal The American Statistician with the statement to provide more perspective on this broad and complex topic.
“What we hope will follow is a broad discussion across the scientific community that leads to a more nuanced approach to interpreting, communicating, and using the results of statistical methods in research.”
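The press release mentions confidence intervals as one way to emphasize estimation over testing. As a rough sketch of that idea applied to the cultural competency scenario, the code below builds an approximate 95% interval for the difference between women's and men's mean scores. It assumes a pooled standard error and a normal critical value (reasonable here only because the samples are large), and the function name `mean_difference_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def mean_difference_ci(m1, s1, n1, m2, s2, n2, level=0.95):
    """Approximate confidence interval for m1 - m2 from summary statistics.

    Assumptions for illustration: pooled (equal-variance) standard error and
    a normal critical value in place of the t distribution.
    """
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    diff = m1 - m2
    return diff - z_crit * se, diff + z_crit * se


# Women (M = 9.2, SD = 3.2, n = 663) vs. men (M = 8.9, SD = 2.1, n = 650)
low, high = mean_difference_ci(9.2, 3.2, 663, 8.9, 2.1, 650)
print(f"95% CI for the mean difference: ({low:.2f}, {high:.2f}) on a 0 to 10 scale")
```

Under these assumptions the interval runs from just above 0 to roughly 0.6 points on a 10-point scale. That conveys both the uncertainty and the small size of the difference, which the bare statement "p < .05" does not.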