Liars, *d Liars and Statisticians

The Truth About Statisticians

Statistics provide a way to capture a more or less accurate picture of the world. Any "lying" that is done is done by (1) those statisticians who are unprofessional enough to allow the funding organization or individual to influence their research design or (2) the funding organization that selects the numbers that support its position and ignores those that refute it. Consumers of statistics need to know enough to determine whether the picture being painted is accurate or warped by the goals and motivations of those who paid the statistician to collect the data. A free society deserves accurate, unbiased information: the statistics themselves. In perfectly applied statistical techniques, "It is what it is."

Statistics come in two forms:
1. Descriptive statistics. These do exactly what the name implies. They describe. The most familiar descriptive statistics for most Americans are the Census data, which describe the population of the US.
2. Inferential statistics. This is the use of statistical methodology to INFER things. Market research, for example, infers the coming behavior of consumers from past behavior.
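
To make the difference between the two forms concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the "population" is 10,000 hypothetical households with randomly generated sizes, and the sample size of 500 is an arbitrary choice. It is not real Census or market research data.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the example is repeatable

# Hypothetical population: household sizes for 10,000 made-up households.
population = [rng.randint(1, 6) for _ in range(10_000)]

# Descriptive: summarize data you actually hold in full (a census).
population_mean = statistics.mean(population)

# Inferential: estimate the same quantity from a random sample of 500,
# where every household has an equal chance of being chosen.
sample = rng.sample(population, 500)
sample_mean = statistics.mean(sample)

print(f"described (all 10,000): {population_mean:.2f}")
print(f"inferred (sample of 500): {sample_mean:.2f}")
```

The two printed numbers should land close to each other, which is the whole point of inference: a well-drawn sample stands in for a population that is too large to measure completely.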

Some people are bothered by the use of sampling procedures. "How," they ask, "can the picture be accurate if only part of the population is surveyed?" The answer is that any statistical data is reported with a confidence level and/or a margin of error. Confidence levels are usually reported as percentages, as in "14.9% of the American population claim to be atheists, agnostics, or unaffiliated with a specific religion, with a confidence level of 90%." What this means, in the simplest terms, is that if you took 100 samples of the exact same size, with each and every American having an equal chance of being included, about 90 of the samples would produce an estimate that falls within the margin of error of the true percentage of Americans reporting that religious identification. The other 10 or so samples would miss by more than that, some higher, some lower. This is where a margin of error comes in. It is usually reported as "plus or minus" (+/-) some percentage points, as in "14.9% +/- 2.5%." The way to read this is "12.4% to 17.4%," and that range is the confidence interval.
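
The arithmetic behind a margin of error can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method behind the figure quoted above: it uses the common normal-approximation formula for a proportion, a z-score of about 1.645 for a 90% confidence level, and a hypothetical sample size of 550, chosen only because it gives a margin close to 2.5 points. The function names are invented for this example.

```python
import math
import random

Z_90 = 1.645  # approximate z-score for a 90% confidence level


def margin_of_error(p_hat, n, z=Z_90):
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def confidence_interval(p_hat, n, z=Z_90):
    """Return (low, high): the estimate plus or minus the margin of error."""
    moe = margin_of_error(p_hat, n, z)
    return p_hat - moe, p_hat + moe


def coverage(true_p, n, trials, seed=0):
    """Draw many simulated samples and return the share whose interval
    contains the true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        yes = sum(rng.random() < true_p for _ in range(n))
        low, high = confidence_interval(yes / n, n)
        if low <= true_p <= high:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothetical sample size; the quoted survey's real size is not given.
    low, high = confidence_interval(0.149, 550)
    print(f"interval: {low:.1%} to {high:.1%}")
    print(f"share of samples covering the truth: {coverage(0.149, 550, 2000):.0%}")
```

With these assumptions the interval comes out near 12.4% to 17.4%, and the simulated share of samples whose interval contains the true value should sit close to 90%, which is what the confidence level promises.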

Sampling is used because of limitations of time and money. Consider, for example, a political poll regarding which candidate voters prefer. What good would it do to poll every registered voter, or every registered voter in one political party, if polling them all took longer than the time between today and the election? It would also be very expensive to locate each address and deliver a poll form to it, much less to have a poll-taker visit. Prohibitively expensive!

Data, to be useful, must be relevant, sufficiently specific, and timely, as well as accurate.

Good Articles on Statistics

Six Ways to Separate Lies From Statistics