M1 - Descriptive Statistics
In this module, we will cover:
Objectives
By the end of this module, you will be able to:
- Calculate some descriptive statistics (Mean, Median, Percentiles, Variance and Standard Deviation) and construct a histogram given a set of univariate data.
- Understand the difference between measures of location and measures of variability.
- Construct a scatter plot of bivariate data and a box plot of univariate data.
1.1 Populations, Samples, and Processes
1.2 Pictorial and Tabular Methods in Descriptive Statistics
1.3 Measures of Location
1.4 Measures of Variability
1.5 Excel
Mean
=AVERAGE(select all data points)
Variability
- Deviation of each value from the mean: x - x.bar
- Tip: press F4 to lock the cell holding x.bar (absolute reference) before filling the formula down
If you add all the differences (x - x.bar) together, you get zero; in Excel this shows up as a number very close to zero because of rounding
- =SUM(x-x.bar)
So instead we square each difference (x - x.bar)
- =(x-x.bar)^2
Final Calculation: =SUM((x-x.bar)^2)/(COUNT(values)-1)
Ex. =SUM(C2:C29)/(COUNT(C2:C29)-1)
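The spreadsheet steps above can be sketched in Python to check the arithmetic. This is a minimal illustration, not part of the course materials; the `data` values are made up, and the variable names are assumptions chosen to mirror the spreadsheet columns.

```python
# Hypothetical sample values, used only for illustration.
data = [4.0, 7.0, 1.0, 9.0, 4.0]

n = len(data)
mean = sum(data) / n                      # same role as =AVERAGE(...)

deviations = [x - mean for x in data]     # the x - x.bar column
squared = [d ** 2 for d in deviations]    # the (x - x.bar)^2 column

sum_of_deviations = sum(deviations)       # zero, up to rounding
sample_variance = sum(squared) / (n - 1)  # same role as =SUM(...)/(COUNT(...)-1)

print(mean, sum_of_deviations, sample_variance)
```

The raw deviations cancel out, which is why they cannot measure spread on their own; squaring them first makes every term non-negative.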
OR use the built-in function: =VAR(select all data points)
- =VAR.P() for a population
- =VAR.S() for a sample; gives the same value as =VAR()
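These are different because the sample variance measures deviations from the sample mean x.bar, which is itself an estimate computed from the same data. That makes the squared deviations too small on average, and dividing by n-1 instead of n corrects the bias. When you are calculating the variance of a population, the mean is no longer an estimate; it is known (the true population mean), so there is no bias to correct and you divide by n.
Note: 99% of the time, you will be working with sample statistics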
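To see the two divisors side by side, here is a small sketch using Python's standard `statistics` module, where `variance` divides by n-1 (like =VAR.S()) and `pvariance` divides by n (like =VAR.P()). The `data` values are hypothetical.

```python
import statistics

# Hypothetical values, used only for illustration.
data = [4.0, 7.0, 1.0, 9.0, 4.0]
n = len(data)

sample_var = statistics.variance(data)       # divides by n - 1, like =VAR.S()
population_var = statistics.pvariance(data)  # divides by n, like =VAR.P()

# The two differ only by the factor n / (n - 1).
rescaled = population_var * n / (n - 1)

print(sample_var, population_var, rescaled)
```

For the same data the sample variance is always the larger of the two, and the gap shrinks as n grows.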
Standard Deviation
=SQRT(Variance)
OR =STDEV(select all data points)
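As a quick check that both routes agree, the sketch below takes the square root of the sample variance and compares it with `statistics.stdev`, which computes the sample standard deviation directly. The `data` values are hypothetical.

```python
import math
import statistics

# Hypothetical values, used only for illustration.
data = [4.0, 7.0, 1.0, 9.0, 4.0]

sd_from_variance = math.sqrt(statistics.variance(data))  # like =SQRT(Variance)
sd_direct = statistics.stdev(data)                       # sample standard deviation

print(sd_from_variance, sd_direct)
```

The standard deviation is in the same units as the data, which makes it easier to interpret than the variance.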
Summary
Fundamental Concepts
Data: Observations (measurements, counts, or categories) collected for study
Population: The entire collection of elements (individuals, objects, or measurements) under study
Sample: A subset of the population selected for analysis.
Variable: A characteristic of an element that can assume different values
Observation (Measurement): The value of a variable for a particular element.
Quantitative Data: Measurements expressed numerically (counts or amounts)
Qualitative (Categorical) Data: Measurements classified by labels or categories
Univariate Data: Data consisting of one variable measured on each element
Bivariate Data: Data consisting of two variables measured on each element
Multivariate Data: Data consisting of more than two variables measured on each element
Graphical Methods (Displaying Data)
Frequency Distribution: A tabular summary of data showing the number (frequency) of observations in each category or interval.
Relative Frequency: The proportion of the total number of observations falling into a class
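A frequency distribution and its relative frequencies can be tabulated in a few lines. The sketch below uses `collections.Counter` on made-up categorical observations; the labels and variable names are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical categorical observations.
observations = ["A", "B", "A", "C", "A", "B"]

frequency = Counter(observations)          # count of observations per category
total = sum(frequency.values())
relative_frequency = {label: count / total for label, count in frequency.items()}

for label in sorted(frequency):
    print(label, frequency[label], round(relative_frequency[label], 3))
```

The relative frequencies always sum to 1, which is a handy check on a hand-built table.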
Histogram: A bar-type graph showing frequencies (or relative frequencies) for classes of quantitative data; adjacent bars touch.
Stem-and-Leaf Display: A table where each observation is split into a “stem” (leading digits) and a “leaf” (final digit), showing the distribution while preserving data values.
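The stem/leaf split can be sketched as follows, assuming non-negative integers where the last digit is the leaf and the remaining leading digits are the stem. The helper name `stem_and_leaf` and the data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (leading digits) and leaf (last digit)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 23 -> stem 2, leaf 3
        groups[stem].append(leaf)
    return dict(groups)

# Hypothetical values, used only for illustration.
data = [23, 12, 30, 15, 21, 23]
display = stem_and_leaf(data)

for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```

Because every leaf is kept, the original values can be read back from the display, unlike a histogram.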
Dotplot: A simple display placing a dot above a number line for each data value, stacking dots for repeated values.
Boxplot (or Box-and-Whisker Plot): A graphical summary based on the five-number summary (min, Q1, median, Q3, max); shows spread, skewness, and outliers.
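The five-number summary behind a boxplot can be computed with the standard `statistics` module. This is a sketch under one assumption worth flagging: quartile conventions differ between textbooks and tools, and `statistics.quantiles` with its default method is only one of them, so Q1 and Q3 may differ slightly from values obtained by hand or in Excel. The data are hypothetical.

```python
import statistics

# Hypothetical values, used only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

summary = {
    "min": min(data),
    "Q1": q1,
    "median": statistics.median(data),
    "Q3": q3,
    "max": max(data),
}
print(summary)
```

The box spans Q1 to Q3, the line inside it marks the median, and the whiskers extend toward the minimum and maximum.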
Scatterplot: A plot of paired bivariate data points on an xy-plane, useful for studying relationships between two variables.
Populations and Samples
Pictorial and Tabular Methods
Measures of Location
Measures of Variability