Measurement Operations

Normalization explained

Understand how to normalize your data

A blue circle on the left and a yellow square on the right combine to make a green, rounded-corner square in the middle.

What is normalization?1

You’ve identified your big why, its indicators, and the datasets that support those indicators. Now you’re looking at a data table (or tables) full of mismatched units of measure. This isn’t a problem. A big why has many facets and can be measured in more than one way– that’s exactly what a compiled indicator does.

In this section, we will talk about math and some formulas. This is important because we want to be able to compare apples to apples. To be able to do this, we need all the data points to be in the

same unit of measure. We’ll break this down step-by-step for you.

The process of converting data points to a common unit is called normalization. There are many ways to normalize your data, and the min-max normalization method works well for our purposes to create a compiled indicator.

Here’s how it works:

Let’s walk through an example below.

Set the stage

Imagine that your big why is public engagement. Two indicators of public engagement are visitor satisfaction and number of visitors. You have data points for these indicators: several years of survey satisfaction scores and visitor counts. The datasets look like this.

Sample survey satisfaction scores

The table below is a small sample from thousands of survey responses you’ve received over the last five years. Each row in the chart represents an individual response from a single person, labeled with the year it was submitted. It shows that two individuals responded in 2023, 2021, and 2020, and one individual responded in 2022 and 2019. (In real life, this chart would be thousands of lines long, we’re just pretending for the sake of simplicity that the survey only got one or two responses per year.)

Survey YearSatisfaction (1-5, where 1=Very unsatisfied and 5=Very satisfied)
20235
20232
20223
20214
20211
20205
20203
20194

Visitor counts

The table below is a count of the visitors to an exhibition over the last five years.

YearVisitors
20232,602,981
20222,047,293
20211,109,098
20201,578,391
20193,495,209

It doesn’t make sense to squish together a five-point survey score and a quantity of people, so we’ll need to normalize our data. Here is what the min-max normalization formula means.

An equation reading: X super 1 equals x minus minimum paren x close paren over max paren x close paren minus min paren x close paren.

As you may remember from high school math, subtract the bits on the top and bottom of the fraction first, and then divide. The normalized value of x equals (x-min(x)) divided by (max(x)-min(x)).

To check your work, make sure that x’ isn’t less than 0 or greater than 1.

Gather materials

You’ll need access to spreadsheet software, but you can also do this manually.

To normalize the survey satisfaction scores, start by finding the largest and smallest numbers in the column:

Survey YearSatisfaction (1-5, where 1=Very unsatisfied and 5=Very satisfied)
20235 Maximum
20232
20223
20214
20211 Minimum
20205 Maximum
20203
20194

In the satisfaction score column above, the year 2021 stands as the minimum (1), and both 2023 and 2020 are the maximum (5), thus we can write min(x)=1 and max(x)=5

Next, we’ll need to “normalize” all the numbers that are in-between the min and max. In this example, that means we need to normalize 2, 3, and 4. Let’s start by normalizing the score of 2 from 2023.

To begin, we say that x=2. The formula is:

An equation reading: X super 1 equals x minus minimum paren x close paren over max paren x close paren minus min paren x close paren.

When you substitute 2 for x, 1 for min(x), and 5 for max(x), the normalized value of x equals 1/4.

Check your work: 0.25 isn’t less than 0 or greater than 1, so it’s a valid normalized data point. You can apply the same calculation to all the other scores in the table:

Survey YearSatisfactionMin Max MethodNormalized Satisfaction Score (x’)
202355-1/5-11
202322-1/5-1.25
202233-1/5-1.5
202144-1/5-1.75
202111-1/5-10
202055-1/5-11
202033-1/5-1.5
201944-1/5-1.75

This is a very small example, and real survey datasets can contain thousands of rows. Instead of normalizing every score by hand, you can write a formula in Excel or Google Sheets.

In a spreadsheet, use the formula =(B2-min(B:B))/(max(B:B)-min(B:B))

Copy the formula to all the rows in the dataset. You’ve successfully normalized the survey dataset.

Now, do the same for the “visitor count” dataset. Assuming that your data is in column C, do that quickly with this spreadsheet formula: =(C2-min(C:C))/(max(C:C)-min(C:C))

Copy the formula to the other rows:

Now the visitor count and the satisfaction scores are both on a scale between 0 and 1. You can combine the two indicators mathematically without the mismatched units of measure getting in the way.

Since the visitor data is reported by year, calculate the average satisfaction score for each year using the normalized values.

Tool tip: The AVERAGEIF function finds all the rows where column I=B2 (the two year columns match) and then averages those rows’ values from column K (normalized satisfaction score). AVERAGEIF works in both Excel and Google Sheets, and it’s a useful tool when you need to aggregate data by year, state, agency, etc.

Finally, calculate the composite indicator for public engagement by averaging the normalized visitors and average survey score (normalized) for each year.

You have successfully turned two mismatched datasets into a single composite indicator representing multiple facets of your big why.

Footnotes

  1. This section was authored by Katherine Petway with Ana Monroe serving as editor.

Up Next

Buy PDFs Book a call Share feedback