July 30, 2025

How a Duplicate Lead Issue Led to Thousands in Lost Ad Spend: Lessons in Data Quality

In data engineering, even a small oversight can have outsized consequences. I learned this the hard way during one of our ad campaigns, when a simple data quality issue (duplicate leads) led to a significant financial loss. The mistake seemed minor at first glance, but it serves as a powerful reminder of the critical role that proper data preparation and validation play in business success.


The Cost of Ignoring Data Quality

The story begins with our internal database, which was specifically designed to track the performance of advertising campaigns. As a data engineer, I was tasked with maintaining the integrity and accuracy of this data.

However, a failure in our data pipeline allowed duplicate leads to slip through undetected. These inflated lead counts misled our business stakeholders into believing the campaign was performing better than it actually was. As a result, they increased ad spending based on faulty assumptions—a decision that cost the company thousands of dollars (Garzon, 2025).

The error wasn’t due to a major system failure or a malicious attack. It was simply a case of poorly managed data joins and a lack of automated quality checks. And unfortunately, this kind of problem is more common than many organizations realize.


Where It Went Wrong: Inner Joins and Duplicate Records

One of the most frequent culprits behind duplicate data is improper use of SQL joins, particularly inner joins. When joining two tables, it’s essential to think carefully about the cardinality of each relationship.

For example, if Table A has one record per user and Table B has multiple actions per user, an inner join on user_id returns one row for every matching action, so each user can appear several times in the result. If that fan-out goes unnoticed, it introduces duplicates that wreak havoc downstream, especially once metrics like leads or conversions are aggregated.
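
To make the fan-out concrete, here is a minimal SQL sketch. The table and column names (users, actions, user_id, lead_source, action_type) are hypothetical stand-ins for the kind of schema described above, not our actual tables:

  -- Table A: one row per user (hypothetical "users" table).
  -- Table B: many actions per user (hypothetical "actions" table).

  -- This join fans out to one row per action, so counting rows here
  -- overstates the number of leads:
  SELECT u.user_id, u.lead_source, a.action_type
  FROM users u
  INNER JOIN actions a
      ON a.user_id = u.user_id;

  -- One fix: aggregate the many-side before joining, so the result
  -- stays at one row per user.
  SELECT u.user_id, u.lead_source, act.action_count
  FROM users u
  INNER JOIN (
      SELECT user_id, COUNT(*) AS action_count
      FROM actions
      GROUP BY user_id
  ) act
      ON act.user_id = u.user_id;

The second query keeps the cardinality at one row per user by collapsing the many-side to a single aggregate before the join, which is usually the safest default when the joined data feeds a count or sum.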


Building a Culture of Data Quality

Mistakes happen, but with the right processes and tooling they can be caught early or avoided entirely. Here are a few key strategies that would have prevented our costly error:

1. Automated Data Quality Checks

Set up recurring jobs to monitor critical aspects of your data, such as:

  • NULL value checks on key columns
  • Date range validations to ensure freshness
  • Duplicate detection in key tables like leads or conversions

These checks can alert teams to issues early, before they impact decisions.
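
As a rough illustration, the checks above might look like the following Postgres-style queries against a hypothetical leads table with email, campaign_id, and created_at columns; a scheduled job can run them and alert whenever one returns rows or a non-zero flag:

  -- NULL check on a key column
  SELECT COUNT(*) AS null_emails
  FROM leads
  WHERE email IS NULL;

  -- Freshness check: returns 1 if the most recent lead is over 24 hours old
  SELECT CASE WHEN MAX(created_at) < NOW() - INTERVAL '1 day'
              THEN 1 ELSE 0 END AS stale_data
  FROM leads;

  -- Duplicate detection on a business key
  SELECT email, campaign_id, COUNT(*) AS dup_count
  FROM leads
  GROUP BY email, campaign_id
  HAVING COUNT(*) > 1;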

2. Clear Data Ownership

Every dataset should have a clearly defined owner responsible for its quality and correctness. When issues arise, this makes it easy to identify who needs to investigate and resolve them.

3. Definitions of “Good Data”

It's not enough to say “our data should be clean.” Each organization should define what that means in concrete terms. For instance:

  • No duplicate records in the leads table
  • All email fields must match a valid pattern
  • Dates must fall within active campaign timelines

Establishing and communicating these expectations helps reduce ambiguity and makes quality enforcement easier (Deur, 2024).
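
Once the rules are written down, many of them can be enforced mechanically. The sketch below is Postgres-style and assumes the same hypothetical leads table plus a campaigns table with start_date and end_date columns; the first two rules become schema constraints, while the date-window rule runs as a validation query:

  -- No duplicate records: enforce uniqueness on the business key
  ALTER TABLE leads
      ADD CONSTRAINT uq_leads_email_campaign UNIQUE (email, campaign_id);

  -- Valid email pattern: a simple shape check, not full RFC validation
  ALTER TABLE leads
      ADD CONSTRAINT chk_leads_email_format
      CHECK (email ~ '^[^@]+@[^@]+\.[^@]+$');

  -- Dates within campaign timelines: surface leads created outside their
  -- campaign's active window via a scheduled validation query
  SELECT l.lead_id
  FROM leads l
  JOIN campaigns c ON c.campaign_id = l.campaign_id
  WHERE l.created_at NOT BETWEEN c.start_date AND c.end_date;

Rules that would be too disruptive to enforce as hard constraints on existing data can start out as monitoring queries and be promoted to constraints once the backlog of violations is cleaned up.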


Final Thoughts

This experience reinforced a hard truth: data quality is not optional. It’s a foundational element that can make or break business decisions. As data professionals, it’s our job not just to build robust pipelines, but to ensure the data flowing through them is trustworthy.

A single misstep—like an unnoticed duplicate—can ripple through reports, dashboards, and executive decisions. But with proactive monitoring, thoughtful query design, and clear ownership, data quality issues can be caught early—and costly mistakes can be avoided.

If there’s one takeaway from this experience, it’s this: treat data quality with the same seriousness as system uptime or security. The financial consequences are just as real.


References:

  • Garzon, C. (2025, January 18). How poor data quality led to a $1 million loss – Lessons learned. Data Engineer Academy.
  • Deur, P. (2024, January 10). Why companies need to address bad data immediately. Forbes Technology Council.