Dirty Data Tax
The Dirty Data Tax is the compounding cost of bad data — mislabeled, stale, duplicated, or unreconciled records — that is silently priced into every decision built on top of it. You never see a line item for it, but you pay it in wrong calls, missed opportunities, and the slow erosion of trust in your own numbers.
I named this in Dirt, Data and Decisions to capture why so many real estate and operating businesses fail to get value from analytics and AI. They invest in dashboards and models while the underlying data is a swamp — rent rolls that do not reconcile to the bank, comps that are mislabeled, the same property entered three different ways. The tools then produce precise-looking answers built on rotten inputs.
It is a tax because it is unavoidable until you fix the source, it scales with how much you rely on the data, and it is regressive in a specific way: the more advanced your tooling, the faster dirty data converts into confident errors.
A firm deploys an AI underwriting assistant to speed up deals. The model is excellent. But the rent roll it reads counts three concession-burdened units as full-rent, an expense category is double-booked, and two comps are duplicates of the same sale entered under different addresses. The AI does exactly what it is told and returns a clean, confident value that is roughly 8% too high.
Nobody re-checks it, because the output looks authoritative. The firm overpays, and the loss never gets attributed to data quality; it gets attributed to “a bad deal.” That misattributed loss is the Dirty Data Tax. The fix was never a better model — it was reconciling the inputs first.
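To make "reconciling the inputs first" concrete, here is a minimal sketch in Python of the three checks that would have caught the errors in the example above. It is an illustration under stated assumptions, not a prescribed method: the field names (contract_rent, monthly_concession, underwritten_rent, sale_date, price) and every function name are hypothetical, and the rule that two comps sharing a sale date and price are probably the same sale is a heuristic that produces candidates for human review, not a verdict.

```python
from collections import Counter, defaultdict


def flag_overstated_rents(rent_roll):
    """Return units underwritten above contract rent net of concessions."""
    flagged = []
    for row in rent_roll:
        # Assumed field names; a real rent roll export will differ.
        effective_rent = row["contract_rent"] - row["monthly_concession"]
        if row["underwritten_rent"] > effective_rent:
            flagged.append(row["unit"])
    return flagged


def flag_double_booked_expenses(expenses):
    """Return (category, period, amount) entries that appear more than once."""
    counts = Counter(expenses)
    return [entry for entry, n in counts.items() if n > 1]


def flag_duplicate_comps(comps):
    """Group comp IDs that share a sale date and price, whatever the address says."""
    groups = defaultdict(list)
    for comp in comps:
        # Heuristic (an assumption): same date and same price suggests the same sale.
        groups[(comp["sale_date"], comp["price"])].append(comp["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def reconcile_inputs(rent_roll, expenses, comps):
    """Collect every flag into one report to review before any model runs."""
    return {
        "overstated_rents": flag_overstated_rents(rent_roll),
        "double_booked_expenses": flag_double_booked_expenses(expenses),
        "duplicate_comps": flag_duplicate_comps(comps),
    }
```

The design choice is that the checks run before the model and return flags rather than silently correcting anything. A person still decides whether a flagged comp is a true duplicate, but the model never sees inputs that nobody has looked at.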