Association (statistics)

In statistics, an association is a relationship between two variables that makes them statistically dependent. Association is often confused with causation, but an association between two variables does not imply that one causes the other. In formal statistics, correlation and association are related but not entirely overlapping concepts.

For example, the United Nations studied governmental failure (when governments fall or are overthrown) and found that the best indicator of a government about to fall was the infant mortality rate. It may seem as though the deaths of children cause the government to fall, but the mere association between the two does not imply this.

Another example is the rates of ice cream consumption and murder, which exhibit a strong positive association. Which causes which: does eating ice cream cause murder, or does murder make people eat ice cream? The answer is neither: increases in both ice cream consumption and murder are associated with hot weather.

Another perspective on the relationship between association and causality is that association does not imply a direct causal connection between the associated variables. If, however, the association is nonrandom (i.e., not due purely to chance), then it implies that some causal mechanism is operative. Often, the causal mechanism underlying an association is the joint influence of one or more common causes operating on the variables in question. For example, both ice cream consumption and murder may increase during warm weather (a conclusion that would require further information to confirm or refute). If this were so, then the association between ice cream consumption and murder would be a manifestation of causation, but not in the simple, direct fashion that one initially might be tempted to assume. An association of this sort, in which a third variable jointly causes the association between the two original variables, is often termed a "spurious association."
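The common-cause idea can be illustrated with a small simulation. The sketch below is only an illustration under invented assumptions: the variable names, coefficients, and noise levels are hypothetical and not taken from any real data. Temperature drives both simulated series, the raw correlation between them comes out strongly positive, and the partial correlation that holds temperature fixed comes out close to zero.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(xs, ys, zs):
    """First-order partial correlation of xs and ys, controlling for zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical model: temperature is the only common cause.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
murders = [0.5 * t + rng.gauss(0, 3) for t in temperature]

r_raw = pearson(ice_cream, murders)
r_partial = partial_corr(ice_cream, murders, temperature)

print(f"raw correlation:     {r_raw:.3f}")
print(f"partial correlation: {r_partial:.3f}")
```

In this invented model neither series influences the other, so the association disappears once temperature is accounted for. With real data, a near-zero partial correlation would be consistent with a common cause but would not by itself establish one.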

Several tests can be used to assess association. The most common are computation of one of several versions of the correlation coefficient, the P test, the t-test, and the chi-squared test.
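As a rough sketch of two of these tests, the code below computes the chi-squared statistic for a 2x2 contingency table and the t statistic for a correlation coefficient. The function names and the example counts are hypothetical. No continuity correction is applied, and the p-value shortcut is valid only for one degree of freedom, which is the 2x2 case.

```python
import math


def chi_squared_2x2(table):
    """Chi-squared test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). Assumes one degree of freedom and
    applies no continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def correlation_t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation r from n pairs is zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical counts: rows are exposure (yes/no), columns are outcome (yes/no).
observed = [[30, 10],
            [10, 30]]
chi2, p = chi_squared_2x2(observed)
print(f"chi-squared = {chi2:.2f}, p = {p:.2g}")

t = correlation_t_statistic(0.5, 27)
print(f"t = {t:.3f} with {27 - 2} degrees of freedom")
```

A small p-value indicates that an association this strong would be unlikely under independence. It says nothing about which variable, if either, is the cause.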

