Missing Data, Sample Bias, and Why Representation Matters

Why Data Isn’t Always “Fair”

Data is often thought of as objective — just numbers, facts, and charts. But in reality, data is shaped by decisions:

  • Who we choose to ask

  • What questions we ask

  • What we ignore

  • And how we interpret the answers

These decisions can unintentionally (or intentionally) leave people out — especially those from underrepresented or marginalized communities.

If we only ask certain people certain questions, we’ll only get part of the truth. That’s not just a math problem — it’s a justice issue.

Missing Data & Missing Voices

Sometimes certain people are not represented in a dataset at all. This is one form of missing data, and it can happen for many reasons:

  • The data collectors didn’t think to include them

  • They were excluded due to technical barriers (like language, internet, or access)

  • They chose not to participate because they didn’t feel safe, respected, or seen

When voices are missing from data, entire stories and perspectives disappear from the conversation. This makes the final conclusions unfair, misleading, or incomplete.

For example, suppose a national survey about teen mental health collects responses only from schools with reliable Wi-Fi and English-language instruction. This leaves out:

  • Rural students without internet access

  • Newcomer students who don’t speak English

  • Teens who skip school due to trauma or instability

Even if the data shows “most teens are fine,” it may be ignoring the very teens who are most at risk, which means the findings fail to serve the people who need them most.

When the Data Doesn’t Reflect the Real World

Sample Bias

A sample is the group of people you study in order to understand a bigger population. But if your sample is too narrow, your results will be biased — and unequal experiences will be erased.

For example, if a study on teen friendships only includes students from elite private schools, the results might show low rates of bullying and strong social networks.

But those patterns could be very different for public school students, homeschooled students, or students in alternative schools.

Biased data leads to biased conclusions.

Biased conclusions lead to biased decisions — about policy, mental health, technology, and more.
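
To see how much a narrow sample can distort a result, here is a small sketch in Python. Every number in it is hypothetical and invented only for illustration; none of it comes from a real study. The names `population`, `bullying_rate`, and `narrow_sample` are also just placeholders. The sketch compares the bullying rate across all school types with the rate you would report if you only surveyed elite private schools.

```python
# Hypothetical numbers, made up for illustration only (not real survey data).
# Each entry: school type -> (students surveyed, students who report bullying)
population = {
    "elite private": (1000, 100),
    "public": (8000, 2400),
    "homeschooled": (500, 100),
    "alternative": (500, 200),
}


def bullying_rate(groups):
    """Share of students reporting bullying across the given groups."""
    total_students = sum(students for students, _ in groups.values())
    total_bullied = sum(bullied for _, bullied in groups.values())
    return total_bullied / total_students


# A narrow sample: only elite private school students are included.
narrow_sample = {"elite private": population["elite private"]}

true_rate = bullying_rate(population)
sample_rate = bullying_rate(narrow_sample)

print(f"Rate across all school types: {true_rate:.0%}")
print(f"Rate from the narrow sample:  {sample_rate:.0%}")
```

With these made-up numbers, the narrow sample reports a much lower rate than the full population. The math in both calculations is correct; the difference comes entirely from who was asked.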

More Than Just “Counting People”

Representational Equity

Sometimes people are technically “included” in the data — but their experiences are minimized, miscategorized, or erased.

Consider this:

A survey includes 1,000 teens, and 10 of them are Native American. That’s technically “representation,” but if:

  • The survey doesn’t allow them to select their tribal affiliation

  • None of their stories appear in the summary

  • Their answers are grouped under “Other”

… then their inclusion is performative, not meaningful.

Equity in data means making sure every voice is counted, heard, and visible in the analysis, not just squeezed into a chart.

Representation means people are counted.

Inclusion means their stories are respected.

Equity means we recognize that different groups may need different things in order to be seen fairly.
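
The “Other” problem from the survey example can also be sketched in a few lines of Python. Only the 1,000 total respondents and the 10 Native American respondents come from the example above. The labels “Group A,” “Group B,” and “Group C,” their counts, the `summarize` function, and the `min_count` cutoff are all assumptions made up for this illustration. The sketch shows how a common reporting shortcut, folding small groups into “Other,” makes a group disappear from the summary even though its members were counted.

```python
from collections import Counter

# Hypothetical responses. Only the total (1,000) and the 10 Native American
# respondents come from the example; the other groups are placeholders.
responses = (
    ["Group A"] * 600
    + ["Group B"] * 380
    + ["Native American"] * 10
    + ["Group C"] * 10
)


def summarize(labels, min_count):
    """Count each label, folding groups smaller than min_count into 'Other'."""
    counts = Counter(labels)
    summary = Counter()
    for label, n in counts.items():
        key = label if n >= min_count else "Other"
        summary[key] += n
    return dict(summary)


# Summary with a cutoff: small groups are merged into "Other".
collapsed = summarize(responses, min_count=50)

# Summary with no effective cutoff: every group keeps its own row.
detailed = summarize(responses, min_count=1)

print("Collapsed:", collapsed)
print("Detailed: ", detailed)
```

Both summaries still add up to 1,000 people, so everyone is “counted” either way. But in the collapsed version, no one reading the chart could tell that Native American teens took part at all.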
