Data quality

What's in every record — honestly.

We'd rather tell you a field is often missing than pretend it isn't. Here's exactly how complete the data is, how we collapse duplicates, and what happens when a posting comes down.

Always populated

100%
  • title: Role title, as listed.
  • company: Hiring company name.
  • location: At least the raw string; usually structured city/region/country.
  • url: Original posting URL on the source ATS.
  • source: Which of the 13 ATS platforms it came from.
  • role_category / seniority: Normalized taxonomy; defaulted when unclear, never blank.
  • duplicate_cluster_id: Assigned to every record for dedup.

Usually present

~90%
  • description_html: Full posting body, sanitized to HTML. Present on roughly 90% of postings (Starter+ to read).

Present when disclosed

15–70%
  • remote_policy: Meaningful on ~70% of postings; null when the source doesn't say.
  • salary: Only ~15–25% of postings disclose pay. When the source omits it, the field is null; we never fabricate a number.
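
Because salary and remote_policy can be null, client code should branch on that explicitly rather than assume a value. The sketch below is one way to do it, assuming records arrive as plain dictionaries keyed by the field names above; the helper names and the "Not disclosed" label are ours for illustration, and the shape of a non-null salary value is not specified here, so it is simply stringified.

```python
from typing import Any


def salary_label(record: dict[str, Any]) -> str:
    """Return a display string for salary, treating null as undisclosed."""
    salary = record.get("salary")
    if salary is None:
        # The source did not disclose pay; do not substitute a guess.
        return "Not disclosed"
    return str(salary)


def completeness(records: list[dict[str, Any]], field: str) -> float:
    """Fraction of records whose `field` is present and non-null."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)
```

Running completeness over your own mirror for salary or remote_policy is a quick way to compare what you receive against the rates quoted above.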

Deduplication

The same role is frequently posted to more than one board — a company might list it on Greenhouse and mirror it to LinkedIn-style feeds and niche boards. We detect those and group them under one duplicate_cluster_id. Records sharing that UUID are the same underlying posting seen on multiple sources, so you can collapse them to one row — or keep every variant — without guessing at fuzzy title/company matches yourself.
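
Collapsing to one row per cluster can be sketched as follows, assuming records are dictionaries carrying duplicate_cluster_id and source. The function name, the prefer_source parameter, and the tie-breaking rule (keep the first record seen unless a preferred source turns up) are illustrative choices, not part of the feed.

```python
from typing import Any, Optional


def collapse_duplicates(
    records: list[dict[str, Any]],
    prefer_source: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Keep one record per duplicate_cluster_id.

    The first record seen for a cluster wins, unless a later record
    comes from `prefer_source` and the kept one does not.
    """
    kept: dict[str, dict[str, Any]] = {}
    for record in records:
        cluster_id = record["duplicate_cluster_id"]
        current = kept.get(cluster_id)
        if current is None:
            kept[cluster_id] = record
        elif (
            prefer_source is not None
            and record["source"] == prefer_source
            and current["source"] != prefer_source
        ):
            kept[cluster_id] = record
    # dict preserves first-insertion order, so output order is stable.
    return list(kept.values())
```

To keep every variant instead, skip this step and group on duplicate_cluster_id at query time.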

Lifecycle of a posting

  1. Active window: 21 days

    A posting stays in the active set as long as we keep seeing it live, and for up to 21 days after we last did. Beyond that it rolls out of the active window.

  2. Delisted (status: removed)

    When a posting disappears from its source we don't silently drop it. We flip its status to “removed” so your mirror can reconcile, rather than leaving a stale row that looks active.

  3. valid_through on every record

    Each record carries a valid_through hint for when the posting is expected to expire, so you can age out roles consistently even before we mark them removed.
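
A mirror can combine the two signals above when deciding whether to show a role. The sketch below is a minimal illustration under stated assumptions: status is a string field whose value "removed" marks a delisted posting, and valid_through is an ISO 8601 timestamp string. The serialization of valid_through is not specified here, so adjust the parsing to whatever your feed actually delivers; is_live is a hypothetical helper name.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def is_live(record: dict[str, Any], now: Optional[datetime] = None) -> bool:
    """Return True if the record should still be shown as an open role."""
    now = now or datetime.now(timezone.utc)

    # A delisted posting is never live, whatever valid_through says.
    if record.get("status") == "removed":
        return False

    valid_through = record.get("valid_through")
    if valid_through is None:
        # Defensive default: no expiry hint, so rely on status alone.
        return True

    # Assumption: ISO 8601 string. Normalize a trailing "Z" so that
    # fromisoformat also accepts it on Python versions before 3.11.
    expires = datetime.fromisoformat(valid_through.replace("Z", "+00:00"))
    if expires.tzinfo is None:
        # Assumption: naive timestamps are UTC.
        expires = expires.replace(tzinfo=timezone.utc)
    return now <= expires
```

Checking status first means a removal always wins, while valid_through lets you age a role out before the removal has been recorded.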