Ground Truth: Verified Data from Primary Sources

February 5, 2026

Understand ground truth in business data—verified, authoritative information from primary sources rather than estimates or models.

Ground truth is verified, authoritative data derived from primary sources rather than estimates, models, or aggregated signals. In business verification, ground truth comes from official registries, observed transactions, and validated operating data—not inferred or modeled attributes.

Ground Truth vs. Estimates

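Much business data is estimated or modeled rather than observed. For each attribute below, compare a typical estimate with its ground-truth equivalent: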

Revenue

  • Estimate: Modeled from employee count and industry
  • Ground Truth: Actual transaction data

Employee count

  • Estimate: Inferred from office size
  • Ground Truth: Payroll records

Operating status

  • Estimate: Assumed from last filing date
  • Ground Truth: Observed recent transactions

Location

  • Estimate: Assumed from the registered address
  • Ground Truth: Verified operating site

Estimates have their place, but high-stakes decisions require ground truth.

Why Ground Truth Matters

Verification Accuracy

Estimates can be wildly wrong:

  • A company might incorporate in Delaware but have no operations there
  • Revenue models assume industry averages, while actual businesses vary enormously
  • A business might be registered but never have operated

Ground truth tells you what's real.

Risk Assessment

Risk models built on estimates inherit their errors:

  • Overestimated revenue → underestimated risk
  • Assumed active status → missed business closures
  • Modeled employee count → wrong business size classification

Ground truth enables accurate risk scoring.
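To make the first bullet concrete, here is a small worked example. It assumes a hypothetical risk measure (credit line divided by annual revenue) and made-up figures; neither comes from a real scoring model. It only shows how an inflated revenue estimate makes the same exposure look safer than it is.

```python
def exposure_ratio(credit_line: float, annual_revenue: float) -> float:
    """Hypothetical risk measure: requested credit as a share of annual revenue."""
    if annual_revenue <= 0:
        raise ValueError("annual_revenue must be positive")
    return credit_line / annual_revenue


credit_line = 400_000        # illustrative figure
modeled_revenue = 2_000_000  # estimate from an industry-average model (illustrative)
observed_revenue = 800_000   # ground truth from transaction data (illustrative)

estimated_risk = exposure_ratio(credit_line, modeled_revenue)
actual_risk = exposure_ratio(credit_line, observed_revenue)

print(f"ratio using modeled revenue:  {estimated_risk:.2f}")  # 0.20
print(f"ratio using observed revenue: {actual_risk:.2f}")     # 0.50
```

With these numbers the modeled figure understates the exposure ratio by more than half, which is the kind of error a downstream risk model would inherit.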

Regulatory Compliance

Regulators expect verified information:

  • KYB (Know Your Business) requires confirming business legitimacy
  • CDD (Customer Due Diligence) requires understanding the customer
  • EDD (Enhanced Due Diligence) requires source of funds verification

"We estimated they were legitimate" doesn't satisfy examiners.

Sources of Ground Truth

Official Registries

  • Secretary of State filings (entity existence, officers, registered agent)
  • IRS records (EIN, tax status)
  • State licensing databases (professional licenses, permits)
  • Court records (liens, judgments, bankruptcies)

Transaction Data

  • Card transaction records (actual revenue, operating status)
  • Banking data (account activity, cash flow)
  • Payment processor records (processing volume)

Direct Verification

  • Site visits (physical presence)
  • Utility records (operational indicators)
  • Business correspondence (verified contact)

Third-Party Validation

  • Credit bureau business records
  • Industry-specific databases
  • Verified review platforms

The Ground Truth Hierarchy

Not all sources are equal:

  • Tier 1: Government records (e.g., Secretary of State, IRS)
  • Tier 2: Financial transactions (e.g., card spend, bank records)
  • Tier 3: Licensed third parties (e.g., credit bureaus, D&B)
  • Tier 4: Self-reported, verified (e.g., applications with document upload)
  • Tier 5: Self-reported, unverified (e.g., form submissions)
  • Tier 6: Modeled/estimated (e.g., revenue models, inferred data)

Lower-numbered tiers provide stronger ground truth: Tier 1 is the most authoritative, and Tier 6 is an approximation rather than ground truth.
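One way to apply this ranking in code is to tag every observed value with its tier and, when sources disagree, prefer the lowest-numbered tier. The sketch below is a minimal illustration under that assumption; the names `SourceTier`, `Observation`, and `most_authoritative` are hypothetical, and the sample values are made up.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, Optional


class SourceTier(IntEnum):
    """The six tiers listed above; a lower number means more authority."""
    GOVERNMENT_RECORD = 1
    FINANCIAL_TRANSACTION = 2
    LICENSED_THIRD_PARTY = 3
    SELF_REPORTED_VERIFIED = 4
    SELF_REPORTED_UNVERIFIED = 5
    MODELED = 6


@dataclass(frozen=True)
class Observation:
    attribute: str
    value: str
    tier: SourceTier


def most_authoritative(
    observations: Iterable[Observation], attribute: str
) -> Optional[Observation]:
    """Return the observation for `attribute` from the lowest-numbered tier, or None."""
    candidates = [o for o in observations if o.attribute == attribute]
    return min(candidates, key=lambda o: o.tier, default=None)


observations = [
    Observation("address", "123 Main St, Austin, TX", SourceTier.SELF_REPORTED_UNVERIFIED),
    Observation("address", "500 Oak Ave, Austin, TX", SourceTier.GOVERNMENT_RECORD),
    Observation("revenue", "1200000", SourceTier.MODELED),
]

best = most_authoritative(observations, "address")
print(best.value, best.tier.name)  # 500 Oak Ave, Austin, TX GOVERNMENT_RECORD
```

Keeping the tier next to the value also means a Tier 6 revenue figure stays visibly labeled as modeled instead of being mistaken for verified data.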

Ground Truth in Practice

KYB Verification

Ground truth approach:

  1. Match application to Secretary of State record (Tier 1)
  2. Verify operating status via transaction data (Tier 2)
  3. Confirm ownership through registry (Tier 1)
  4. Validate address through multiple sources (Tiers 1-3)

Estimate approach:

  1. Accept stated name and address
  2. Model revenue from industry
  3. Assume active if recently filed
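The four ground truth steps can be sketched as explicit checks against source data. This is an illustrative outline, not a production KYB implementation: `RegistryRecord`, `ground_truth_checks`, the 90-day activity window, the two-source address rule, and the loose string matching are all assumptions made for the example, and treating registry officers as owners is a simplification.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class RegistryRecord:
    """Hypothetical shape of a Secretary of State record (Tier 1)."""
    legal_name: str
    officers: List[str]
    address: str


def norm(text: str) -> str:
    """Lowercase, drop commas and periods, collapse whitespace for loose matching."""
    return " ".join(text.lower().replace(",", " ").replace(".", " ").split())


def ground_truth_checks(
    stated_name: str,
    stated_owner: str,
    stated_address: str,
    registry: Optional[RegistryRecord],
    last_transaction: Optional[date],
    other_addresses: List[str],
    today: date,
    active_window_days: int = 90,  # assumed threshold, not from the source
) -> dict:
    found = registry is not None
    # Step 1: match the application to the registry record.
    name_match = found and norm(registry.legal_name) == norm(stated_name)
    # Step 2: operating status from observed transactions, not filing dates.
    operating = (
        last_transaction is not None
        and today - last_transaction <= timedelta(days=active_window_days)
    )
    # Step 3: ownership checked against registry officers (a simplification).
    owner_match = found and norm(stated_owner) in {norm(o) for o in registry.officers}
    # Step 4: the stated address must agree with at least two sources.
    sources = ([registry.address] if found else []) + other_addresses
    agreeing = sum(norm(a) == norm(stated_address) for a in sources)
    return {
        "registry_match": name_match,
        "operating": operating,
        "ownership": owner_match,
        "address": agreeing >= 2,
    }


record = RegistryRecord("Acme Widgets, LLC", ["Dana Lee"], "500 Oak Ave, Austin, TX")
result = ground_truth_checks(
    stated_name="ACME Widgets LLC",
    stated_owner="Dana Lee",
    stated_address="500 Oak Ave., Austin TX",
    registry=record,
    last_transaction=date(2026, 1, 20),
    other_addresses=["500 Oak Ave, Austin, TX", "PO Box 12, Austin, TX"],
    today=date(2026, 2, 5),
)
print(result)
```

The estimate approach corresponds to skipping these checks and accepting the stated values as given. Calling the same function with no registry record and no transaction history returns False for every check, which makes the missing evidence explicit.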

When Estimates Are Acceptable

Ground truth isn't always available or necessary:

  • Low-risk decisions may tolerate estimates
  • Some attributes (future growth) can only be projected
  • Cost/benefit may favor estimates for certain use cases

The key is knowing when you have ground truth and when you don't.
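A simple way to keep that distinction visible is to store provenance alongside every value and let the stakes of the decision determine whether an estimate is usable. The sketch below assumes a hypothetical two-value provenance label ("verified" or "estimated") and a made-up business profile.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Field:
    value: object
    provenance: str  # hypothetical labels: "verified" or "estimated"


def usable(field: Field, high_stakes: bool) -> bool:
    """Verified values are always usable; estimates only for low-stakes decisions."""
    return field.provenance == "verified" or not high_stakes


def blocking_fields(profile: Dict[str, Field], high_stakes: bool) -> List[str]:
    """Names of fields that still need ground truth before the decision can proceed."""
    return sorted(name for name, f in profile.items() if not usable(f, high_stakes))


profile = {
    "operating_status": Field("active", "verified"),
    "revenue": Field(1_200_000, "estimated"),
}

print(blocking_fields(profile, high_stakes=True))   # ['revenue']
print(blocking_fields(profile, high_stakes=False))  # []
```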

Key Takeaways

  • Ground truth is verified data from primary, authoritative sources
  • Estimates and models are not ground truth—they're approximations
  • High-stakes decisions (verification, compliance, risk) require ground truth
  • Sources differ in authority: government records outrank self-reported data, which outranks models
  • Know what you have—distinguish ground truth from estimates in your data

Related: Entity Verification | Data Enrichment | Operating Status