There is no difference in the mean number of safety violations between site A and site B over the 1-year period.
The stated alternative hypothesis, abbreviated Ha, is the opposite of this and, assuming a two-tailed test, would be:
Ha: There is a difference in the mean number of safety violations between site A and site B over the 1-year period.
The analysis we perform will allow us to either reject or fail to reject our null hypothesis. The rule here is that we never accept a hypothesis. Why? Because we can never be 100% certain what the relationship between two things is “in reality” at a given point in time, for reasons stated earlier in this section. Instead, we use hypothesis testing and statistics to make probabilistic inferences about the relationship between two sets of measured data or observations. Interpreting our hypotheses now requires the use of statistics, and also a brief introduction to theoretical probability distributions, which explain why we can be confident in our conclusions even though we can never be completely certain.
Probability Distributions
If someone were to ask you what the probability is of flipping a normal coin and having it come up heads, you would no doubt say that it is a 50/50 chance, or 50% of the time. Yet you would also likely agree that it is quite possible that you could flip a coin and heads would come up three times in a row. How can this be? Two reasons. One is that each coin flip is not dependent on the previous one. There are two sides of the coin, so you only have two possible outcomes. Each time you flip, they are equally likely to come up (if the coin is balanced and not a trick coin). The second is that we know if you continue to flip over and over again, the proportions of heads and tails will start to even out. In statistical language, we would say the observed proportion of heads grows closer to 0.5 as your n (number of flips) increases. Suffice it to say that flipping a coin has a known probability. Could we observe 37 heads in a row? Sure, but it is highly unlikely.
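For readers who want to see the coin example in action, the following is a minimal sketch in Python, not a formal analysis. It simulates a fair coin to show the observed proportion of heads settling toward 0.5 as n grows, and it computes the exact chance of a run of heads under the assumptions stated above (a balanced coin and independent flips). The function names, the seed value, and the flip counts are our own choices for illustration.

```python
import random


def proportion_of_heads(n_flips, rng):
    """Simulate n_flips fair coin flips and return the fraction that were heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


def prob_heads_in_a_row(k):
    """Exact probability that k independent flips of a fair coin are all heads."""
    return 0.5 ** k


# Fixed seed (an arbitrary choice) so that repeated runs give the same flips.
rng = random.Random(42)

# The observed proportion wanders for small n and tightens around 0.5 as n grows.
for n in (10, 100, 10_000, 100_000):
    print(f"{n:>7} flips: proportion of heads = {proportion_of_heads(n, rng):.4f}")

# Three heads in a row is unremarkable; 37 in a row is possible but vanishingly rare.
print(f"P(3 heads in a row)  = {prob_heads_in_a_row(3)}")       # 0.125
print(f"P(37 heads in a row) = {prob_heads_in_a_row(37):.2e}")  # about 7.28e-12
```

Three heads in a row happens about one time in eight, which is why it does not surprise us. Thirty-seven in a row would be expected roughly once in 137 billion attempts.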
Most phenomena in the world have a distribution of measurement, whether height, weight, income, hair length, etc. Consider height. There is a range of heights of individuals throughout the world. Some are quite tall, and others are not. If, for example, we see someone who is 8 feet tall, we might think that it is unusual, but not impossible. (Seeing is believing.) But how do we test this statistically?
Here we offer a non-statistical explanation of a statistical occurrence. As we stated before, at a point in time, all phenomena are theoretically measurable. If we are examining a data element that is continuous in nature, such as height, then at a point in time there is also a “true” mean of the observed data: right now there is a “true” mean height of all people in the world. Similarly, if we were to measure all persons, there would also be a “true” standard deviation around that mean. Some measurements will be close to the mean and others further away. We would expect that observations that were further from the mean would be less likely to occur, as with our 8-foot friend. From statistics we know how likely a given observation is to occur by measuring how far it lies from the mean in units of standard deviation. The reason for this is that many types of data are distributed normally, that is, with values spread symmetrically on either side of the mean in the shape of a bell curve (Figure 2-3A). For example, if we were to know that the “true” mean of heights in the United States for men is 68 inches, and the “true” standard deviation is 2 inches, someone who is 8 feet tall (96 inches) would be 14 standard deviations above the mean, or (96–68)/2. And because all data can be examined by how far in standard deviations they are from the mean, we can construct a theoretical normal distribution in which the mean is zero and the horizontal axis is marked in units of standard deviation, called z-values; the area under the curve between any two z-values then gives the probability of an observation falling in that range. Why is the mean zero? If the distances along the horizontal axis are measured in standard deviations, then how many standard deviations away from the mean is the mean? Zero. Not only does this allow us to assess the probability of occurrence for certain data, it allows us to compare any type of data because the units are the same (standard