Statistics are the basis of scientific data analysis, and with the flood of data coming from new genomics technologies, biostatistics has truly become an inseparable part of modern science. Nevertheless, a fundamental statistical technique, correlation analysis, which measures the relationship between two variables, is often employed incorrectly, leading to erroneous conclusions about the true nature of the relationship between the studied phenomena.
The primary task of correlation analysis is to test for a relationship, or agreement, between two variables of interest, such as smoking and the incidence of lung cancer. Furthermore, provided that the study was carried out on a sufficiently large sample, the degree of correlation between the observed phenomena can be roughly assessed and quantified as the linear correlation coefficient.
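For readers who want the quantity spelled out: the text does not say which coefficient it has in mind, so the formula below assumes the standard Pearson product-moment coefficient. The symbols are assumptions introduced here for illustration: $x_i$ and $y_i$ are the $n$ paired observations, and $\bar{x}$ and $\bar{y}$ are their sample means.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

Under this definition, values near $+1$ or $-1$ indicate a strong linear relationship, and values near $0$ indicate little or no linear relationship.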
This coefficient must then be interpreted and critically analyzed, because correlation analysis does not aim to explain the nature of the quantitative agreement, that is, whether one variable causes the other. In addition to assuming causality, researchers commonly fall victim to two other misconceptions: inferring the nature of individuals from group findings, and thinking that a correlation of zero implies independence. ...
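The last misconception, that a correlation of zero implies independence, can be illustrated with a small numerical sketch. It assumes the Pearson coefficient as the "linear correlation coefficient"; the helper `pearson_r` and the toy data are hypothetical and written only for this illustration. Here `y` is completely determined by `x`, yet the linear coefficient is zero because the relationship is symmetric rather than linear.

```python
import math

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations (numerator of r)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (used in the denominator of r)
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Toy data (illustrative only): x values symmetric around zero
xs = [-3, -2, -1, 0, 1, 2, 3]

# y depends entirely on x, but through a quadratic, not a linear, relationship
ys_quadratic = [x ** 2 for x in xs]
r_quadratic = pearson_r(xs, ys_quadratic)

# For comparison: a perfectly linear relationship
ys_linear = [2 * x + 1 for x in xs]
r_linear = pearson_r(xs, ys_linear)

print(f"quadratic dependence: r = {r_quadratic:.3f}")
print(f"linear dependence:    r = {r_linear:.3f}")
```

In this sketch the quadratic case gives a coefficient of zero even though knowing `x` tells you `y` exactly, whereas the linear case gives a coefficient of one. A zero coefficient therefore rules out only a linear association, not dependence in general.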