Table 1.

Glossary of commonly used statistical and research design terms

 
Term Practical definition Why it is important to researchers
Bias Any factor that could systematically influence the study outcome away from the truth. If study results are biased, inferences should not be made from the study.
Blinding Ensuring that no one involved in describing or measuring study variables, care of the animals, or analysis of the data has knowledge of treatment exposure. Blinding is one of the most effective mechanisms to prevent bias in outcome assessment and is essential when subjective variables are the primary outcomes.
Confidence interval (CI) A range of values, calculated from the sample data, constructed so that a specified percentage of such intervals would contain the true population parameter. For a 95% CI, if a population were sampled 100 times, the 95% CIs calculated from approximately 95 of those samples would be expected to encompass the true population parameter. The larger the sample size, the more precise the estimate of the outcome, resulting in a narrower CI. Similar to the P-value, a CI provides information about the uncertainty surrounding an estimate of the numerical difference in an outcome of interest between sampled treatment or observation groups as a reflection of the true difference between populations. In contrast to a P-value, a CI also provides information about the probable magnitude of effect, which is helpful when considering the clinical relevance of the results.
Confounding A specific type of bias when a factor other than the treatment or observation factor of interest is associated with the study outcome but this factor is not evenly distributed between treatment or observation groups. When confounding is present (or possible) and not controlled, distinguishing treatment effects from the effect of the potential confounder is impossible, leading to an inability to draw firm conclusions from the research.
Experimental or observational unit The smallest independent physical unit that is assigned to a treatment (experimental study) or observed (observational study); each experimental or observational unit must be able to receive a different treatment. The experimental or observational unit for each hypothesis must be correctly identified by the investigator to ensure that the study has adequate sample size (and therefore power) and that the statistical tests were properly performed. Livestock studies often have hierarchical data with animals nested within pens nested within buildings nested within farms or with repeated measurements of the same units being taken over time. These types of hierarchical data structures make selection of the correct experimental or observational unit and the appropriate statistical test more challenging and may result in different experimental or observational units being appropriate for different hypotheses in the same study.
External validity The ability of study results to be accurately generalized to other populations. Research may be internally valid yet performed in a population much different from the population of clinical interest; therefore, extrapolating study results may not be possible.
Internal validity The study design is appropriate for the hypotheses and study variables while controlling for potential issues related to bias. If the study is not internally valid, conclusions based on results should not be made.
Interaction The effect of one variable on the outcome of interest is modified by the effect of another variable. Interactions are relatively common in biologic studies, and if present, they can influence interpretation of study outcomes.
Least squares mean The result of a statistical analysis to approximate the solution in a model fitting the outcome and adjusting for other variables in the model. Least squares means are calculated from a model that adjusts the estimated mean based on variables included in the model. This adjustment should result in a more accurate estimate of the population mean than a simple arithmetic mean of the sample data.
Multivariable analysis A statistical analysis that incorporates the relationship of more than 1 variable when evaluating the outcome of interest. Biologic systems are complex, and often, experimental or observational units are not completely independent. Multivariable analyses allow for evaluation of effects while adjusting for variables that may be confounding, resulting in a more accurate estimate of effects.
Null hypothesis The starting assumption for most research; the assumption that there is no difference among treatment or observation groups. Because the initial assumption is no difference between treatment or observation groups, if the statistical tests fail to identify a difference, no real conclusions should be drawn from the findings (one can say only that the treatments were not statistically different at the magnitude observed with this study sample size). Failure to disprove the null hypothesis does not indicate that treatment or observation groups are the same, only that they did not statistically differ in this study.
Numerical differences The outcomes (e.g., mean, median, relative risk, odds ratio, proportion, etc.) of 2 treatment or observation groups differ, but the difference could be due to chance, bias, or true treatment or observation group differences. Study findings may be described as numerically different but the difference could be due solely to chance and biological variability. If a statistical difference was not identified in the presence of numerical differences between treatment or observation groups, this means either the sample size was too small to detect a true population difference of small magnitude or no difference exists. Conclusions should not be based on numerical differences alone.
P-value The result of a statistical test that reports the probability of observing an outcome difference as great as or greater than that described by the study data if a specific model for the data were true (the model typically assumes that study populations are truly not different, along with other assumptions). The P-value is used to determine statistical significance but can be interpreted only if the study is internally valid and the interpreter has knowledge about the study design and the topic being investigated. A P-value does not test for bias, study design, or the appropriateness of applying study results to other populations dissimilar to the population tested. In addition, it does not measure the probability that the hypothesis is true or the probability that the data were produced by chance alone.
Pseudoreplication The error of treating multiple observations from the same experimental or observational unit as replications of independent experimental or observational units. Taking multiple samples from a single experimental or observational unit and treating them as independent samples can lead to the danger of concluding that a difference exists when that may not be true.
Randomization A common process of assigning experimental units to treatment groups used to prevent inadvertent bias based on selection criteria. Randomization is the primary mechanism to mitigate the danger of selection bias and confounding. Randomization attempts to prevent a factor outside the study criteria being present in unequal distributions among the treatment groups.
Sample size The number of experimental or observational units for each treatment or observation factor group within the study. Adequate sample size is based on the outcome of interest, the expected variability of the outcome, and the expected magnitude of effect of the treatment or observation factor on the outcome. Without adequate sample size, studies could be referred to as underpowered and are unlikely to identify true differences.
Statistically significant The designation given when the P-value from a statistical test comparing groups is at or below a preselected threshold. The specific definition of the threshold of “significance” may vary among researchers, but significance (α) levels of P ≤ 0.05 and P ≤ 0.01 are commonly used. Denoting statistical significance describes the probability of observing a difference as large as or larger than that identified in the study with the current sample size if there were truly no differences between treatment or observation groups (see P-value). This designation does not differentiate between studies with and without true differences between study groups, nor does a statistically significant difference indicate a finding that is necessarily biologically meaningful, and findings should be interpreted accordingly. The α selected to be considered statistically significant should be influenced by the number of comparisons being made, the type of hypothesis (confirmatory vs. discovery) being tested, and expert knowledge of the study subject.
Type I error Based on the statistical analysis, concluding that a difference among treatment or observation groups exists when there is truly no difference. Even when treatment or observation groups do not truly differ, some individual studies will identify statistical differences among treatment or observation groups by chance. The frequency with which this error occurs compared with true parameter differences is based on the significance (α) level or the CI selected and the underlying probability that the treatment or observation groups are truly different (discovery vs. confirmatory research).
Type II error Based on the statistical analysis, failing to reject the null hypothesis of no difference between treatment or observation groups when a difference truly exists. Even when treatment or observation groups truly differ, individual studies can fail to identify statistical differences among study groups. The frequency with which this error occurs is based on the power of the study (influenced by sample size and the selected beta or CI).
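
Several of the definitions in Table 1 (confidence interval, Type I error, pseudoreplication) describe what happens over many repeated studies, which a small simulation can make concrete. The following Python sketch is illustrative only and is not taken from the source. It assumes normally distributed outcomes with a known standard deviation so that a simple z-based interval and z-test can be used; in practice, t-based methods or mixed models would usually be more appropriate. All function names, parameter values, and seeds are hypothetical.

```python
import random
from statistics import NormalDist, mean


def ci_coverage(true_mean=10.0, sigma=2.0, n=30, n_studies=2000,
                level=0.95, seed=1):
    """Fraction of simulated studies whose CI contains the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    half_width = z * sigma / n ** 0.5          # assumes sigma is known
    hits = 0
    for _ in range(n_studies):
        sample_mean = mean(rng.gauss(true_mean, sigma) for _ in range(n))
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / n_studies


def type_i_error_rate(sigma=2.0, n=30, n_studies=2000, alpha=0.05, seed=2):
    """Fraction of studies declaring a difference when none truly exists."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se_diff = sigma * (2 / n) ** 0.5  # standard error of the mean difference
    false_positives = 0
    for _ in range(n_studies):
        group_a = mean(rng.gauss(0.0, sigma) for _ in range(n))
        group_b = mean(rng.gauss(0.0, sigma) for _ in range(n))
        if abs(group_a - group_b) / se_diff > z_crit:
            false_positives += 1
    return false_positives / n_studies


def pseudoreplication_error_rate(pens=4, animals=10, pen_sd=1.0,
                                 animal_sd=1.0, n_studies=2000,
                                 alpha=0.05, seed=3):
    """Type I error rate when animals sharing a pen are wrongly treated
    as independent units (the pen is the true experimental unit)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    n = pens * animals
    # Naive standard error: pretends all n animals per group are independent.
    naive_se = (2 * (pen_sd ** 2 + animal_sd ** 2) / n) ** 0.5

    def group_mean():
        values = []
        for _ in range(pens):
            pen_effect = rng.gauss(0.0, pen_sd)  # shared by animals in a pen
            values.extend(pen_effect + rng.gauss(0.0, animal_sd)
                          for _ in range(animals))
        return mean(values)

    false_positives = 0
    for _ in range(n_studies):
        if abs(group_mean() - group_mean()) / naive_se > z_crit:
            false_positives += 1
    return false_positives / n_studies


if __name__ == "__main__":
    print("95% CI coverage:", ci_coverage())
    print("Type I error rate:", type_i_error_rate())
    print("Type I error rate with pseudoreplication:",
          pseudoreplication_error_rate())
```

Under these assumptions, the first two values should fall close to 0.95 and 0.05, respectively, while the third should be well above 0.05, showing how analyzing animals as independent when the pen is the experimental unit inflates the chance of concluding that a difference exists when it does not.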