This pulls live job postings into pandas and runs three real analyses — hiring by seniority, salary disclosure rates by category, and remote share over time — on a normalized dataset, not scraped HTML. Every snippet is complete and runnable against the live API with a free key.
Setup
Two dependencies: requests to pull the data and pandas to shape it. Set your API key as an environment variable so it never lands in the code.
# Python 3.10+
pip install requests pandas

Then export JLA_API_KEY=jla_live_your_key_here. A free key from the dashboard gives you 5,000 requests a month, which is plenty for analysis.
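For a rough sense of scale, the monthly budget can be turned into a pull count. This is a minimal sketch, not part of the API: the page size of 100 rows is an assumption (it mirrors the default used in the fetch code later in this article, and the API's actual maximum is not stated here), and `requests_for` and `pulls_per_month` are hypothetical helper names.

```python
import math


def requests_for(rows: int, page_size: int = 100) -> int:
    """Requests needed to page through `rows` at `page_size` rows per call.

    page_size=100 is an assumed value, not a documented API limit.
    """
    return math.ceil(rows / page_size)


def pulls_per_month(rows: int, page_size: int = 100, monthly_budget: int = 5000) -> int:
    """How many pulls of `rows` rows fit in the monthly request budget."""
    return monthly_budget // requests_for(rows, page_size)


print(requests_for(1000), pulls_per_month(1000))
```

Under those assumed numbers, a 1,000-row pull costs ten requests, so the 5,000-request budget covers 500 such pulls.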
Pulling a dataset
The fetch function pages through /v1/jobs with offset pagination, capping the pull at a row limit so an exploratory run stays cheap. The one piece that matters in practice is rate-limit handling: when the API returns 429, it tells you exactly how long to wait in the Retry-After header, so you honor that rather than guessing a backoff.
# fetch.py: pull a bounded dataset with polite rate-limit handling.
import os
import time

import requests

API_BASE = "https://api.joblistingsapi.com/v1"
API_KEY = os.environ["JLA_API_KEY"]
SESSION = requests.Session()
SESSION.headers["X-API-Key"] = API_KEY


def fetch_jobs(max_rows: int = 1000, page_size: int = 100, **filters) -> list[dict]:
    """Page through /v1/jobs with offset, capping the pull at max_rows.

    The free tier allows 10 requests/minute and 5,000/month, so a 1,000-row
    pull is ten requests, well inside the free budget. On a 429 we honor the
    Retry-After header instead of guessing a backoff.
    """
    jobs: list[dict] = []
    offset = 0
    while len(jobs) < max_rows:
        limit = min(page_size, max_rows - len(jobs))
        res = SESSION.get(
            f"{API_BASE}/jobs",
            params={"limit": limit, "offset": offset, **filters},
        )
        # Rate limited: wait exactly as long as the server asks, then retry.
        if res.status_code == 429:
            wait = int(res.headers.get("Retry-After", "5"))
            time.sleep(wait)
            continue
        res.raise_for_status()
        body = res.json()
        batch = body["jobs"]
        if not batch:
            break  # reached the end of the dataset before max_rows
        jobs.extend(batch)
        offset += len(batch)
        # Be a good citizen between pages even when not rate limited.
        time.sleep(0.2)
    return jobs[:max_rows]


if __name__ == "__main__":
    rows = fetch_jobs(max_rows=1000, country="GB")
    print(f"pulled {len(rows)} jobs")

Capping at 1,000–2,000 rows keeps the example inside the free tier: a 1,000-row pull is ten requests, against a budget of 10 per minute and 5,000 per month. Scale the cap up only when you move to a paid plan and need the volume.
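The fetch code above passes Retry-After straight to int(), which assumes the header is always a number of seconds. HTTP also allows the header to carry a date, and this article does not say which form the API sends. A tolerant parser could be sketched as follows. The name `retry_after_seconds` and the 5-second fallback are assumptions for illustration, not part of the API.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, default=5.0, now=None):
    """Turn a Retry-After header value into a non-negative wait in seconds.

    Accepts either delta-seconds ("120") or an HTTP-date. Falls back to
    `default` (an assumed value) when the header is missing or unparseable.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        # Treat a date without an offset as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means no wait is needed.
    return max(0.0, (when - now).total_seconds())
```

In fetch_jobs this would stand in for the int(...) call, for example `time.sleep(retry_after_seconds(res.headers.get("Retry-After")))`.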
Into a DataFrame
pd.json_normalize flattens the nested records. Two fields need attention: salary is a nested object that is null on most postings, so we lift its fields to flat columns and let absent pay become NaN; and listed_at is a timestamp we parse so we can resample by time later.
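To see what the flattening does before pointing it at live data, here is a toy run on two hand-written records. The field names mirror the ones used in this article, but the records are made up for illustration and are not API output.

```python
import pandas as pd

# Two made-up records: one with a disclosed salary, one without.
rows = [
    {
        "title": "Data Engineer",
        "salary": {"min": 50000, "max": 65000, "currency": "GBP"},
    },
    {"title": "Account Executive", "salary": None},
]

toy = pd.json_normalize(rows)

# The nested dict becomes dotted columns such as salary.min and salary.max.
# The record whose salary is null has NaN in those columns.
has_salary = toy["salary.min"].notna()
print(sorted(toy.columns))
print(has_salary.tolist())
```

The dotted column names are why the script below copies them into salary_min, salary_max, and salary_currency, which are easier to reference later.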
# analyze.py — normalize the rows into a flat DataFrame.
import pandas as pd
from fetch import fetch_jobs
rows = fetch_jobs(max_rows=1000, country="GB")
df = pd.json_normalize(rows)
# Salary is a nested object that is null on most postings. Pull the fields we
# need to flat columns; rows without disclosed pay become NaN, which is correct.
for field in ("min", "max", "currency"):
    col = f"salary.{field}"
    df[f"salary_{field}"] = df[col] if col in df.columns else pd.NA
# Parse the listing timestamp so we can resample by time. Some postings have no
# listed_at; coerce keeps those as NaT instead of raising.
df["listed_at"] = pd.to_datetime(df["listed_at"], errors="coerce", utc=True)
df["has_salary"] = df["salary_min"].notna()
print(df[["title", "company", "seniority", "role_category", "has_salary"]].head())

Three analyses
Each of these is a few lines on the DataFrame above. They are the questions people most often bring to posting data.
Postings by seniority
A straight value_counts on the normalized seniority column tells you where hiring is weighted. Because the API normalizes seniority, title variants such as "Sr.", "Senior", and "II" have already been resolved to a standard level, so you are not counting spelling variants.
# 1. Postings by seniority — where is the hiring weighted?
by_seniority = df["seniority"].value_counts(dropna=False)
print(by_seniority)
# senior    412
# mid       301
# junior    188
# lead       74
# NaN        25   (postings where the source did not declare a level)

Salary disclosure rate by category
Group by role category and take the mean of the boolean has_salary column. The mean of a 0/1 series is the share that is true, so this is the percentage of postings in each category that disclosed pay. Disclosure varies a lot by function, which makes this a useful cut.
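A rate computed from a handful of postings is noisy, so it can help to report the group size next to it. The sketch below uses a made-up five-row frame rather than API data, with the same column names as the DataFrame built earlier.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "role_category": ["engineering", "engineering", "engineering", "sales", "sales"],
    "has_salary": [True, False, True, False, False],
})

# Named aggregation: the mean of the boolean column is the disclosure share,
# and size is the number of postings that share is based on.
summary = toy.groupby("role_category")["has_salary"].agg(rate="mean", postings="size")
summary["rate"] = summary["rate"].mul(100).round(1)
print(summary)
```

Reading the two columns together shows which rates rest on enough postings to be worth comparing.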
# 2. Salary disclosure rate by role category — who actually posts pay?
# Group by category and take the mean of the boolean has_salary column: the
# mean of a 0/1 series is the share that is True, i.e. the disclosure rate.
disclosure = (
    df.groupby("role_category")["has_salary"]
    .mean()
    .sort_values(ascending=False)
    .mul(100)
    .round(1)
)
print(disclosure)
# engineering    34.2
# data           29.8
# sales          18.1
# (each value is the percent of postings in that category that disclosed pay)

Remote share over time
Resample the postings into weekly buckets and take the mean of is_remote. Being boolean, its weekly mean is the remote share of that week's postings.
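Weekly buckets from resample("W") end on Sundays, so a pull that starts or stops mid-week can leave the first and last buckets with only a few postings. A toy sketch, on made-up dates rather than API data, that reports the count beside the share so thin weeks are visible:

```python
import pandas as pd

# Made-up postings: two in the week ending 2024-01-07, two in the next week.
toy = pd.DataFrame({
    "listed_at": pd.to_datetime(
        ["2024-01-01", "2024-01-03", "2024-01-08", "2024-01-10"], utc=True
    ),
    "is_remote": [True, False, True, True],
})

weekly = (
    toy.set_index("listed_at")["is_remote"]
    .resample("W")
    .agg(["mean", "count"])  # share of remote postings, and how many were counted
)
weekly["share_pct"] = weekly["mean"].mul(100).round(1)
print(weekly[["share_pct", "count"]])
```

A week with a small count is better read as incomplete than as a real swing in remote share.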
# 3. Remote share over time — is remote rising or falling week to week?
# Resample the postings into weekly buckets and take the mean of is_remote,
# which (being boolean) gives the remote share of each week's postings.
weekly_remote = (
    df.dropna(subset=["listed_at"])
    .set_index("listed_at")
    .sort_index()["is_remote"]
    .resample("W")
    .mean()
    .mul(100)
    .round(1)
)
print(weekly_remote)
# Caveat: the live dataset is a rolling 21-day window, so this resample only
# spans ~3 weeks. For a real time series, snapshot on a schedule (see below).

Caveats
Two honest limits shape what these numbers can and cannot tell you.
- The live window is 21 days. The endpoint holds a rolling 21-day active window, so the remote-over-time resample only spans about three weeks. For trends over months, you cannot read them out of one pull — snapshot the dataset on a schedule (a daily or weekly cron writing each pull to storage) and build your time series from your own snapshots. The data is a current census, not a historical archive.
- Salary is self-reported by the employer. The salary fields are populated only when the posting states pay, and only roughly 15–25% of postings do. Disclosure rates therefore measure how often employers in a category choose to post pay, which is not the same as the underlying pay distribution. Never read an absent salary as a low one — it is simply undisclosed.
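The snapshot approach from the first caveat could look roughly like this. It is a minimal sketch under stated assumptions: each record is assumed to carry a unique "id" field (the schema is not shown in this article), the file naming is arbitrary, and `save_snapshot` and `load_snapshots` are hypothetical helpers rather than anything the API provides.

```python
import json
from datetime import date
from pathlib import Path


def save_snapshot(rows, directory, day=None):
    """Write one pull to <directory>/jobs-YYYY-MM-DD.json and return the path."""
    day = day or date.today()
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"jobs-{day.isoformat()}.json"
    path.write_text(json.dumps(rows), encoding="utf-8")
    return path


def load_snapshots(directory, key="id"):
    """Merge all snapshots, keeping one record per posting.

    `key` is an assumed unique identifier field. A posting that stays live
    appears in several snapshots; the copy from the latest file wins.
    """
    seen = {}
    # ISO dates in the filenames sort chronologically.
    for path in sorted(Path(directory).glob("jobs-*.json")):
        for row in json.loads(path.read_text(encoding="utf-8")):
            seen[row[key]] = row
    return list(seen.values())
```

A daily cron job would call save_snapshot(fetch_jobs(...), "snapshots"), and the merged list from load_snapshots("snapshots") can go through pd.json_normalize in the same way as a single pull.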
For ready-made aggregates you do not have to compute yourself — live counts by source, category, and country — see the Job Market Pulse, or the docs for the full schema and filters. To put the same data on a page instead of in a notebook, see building a job board with Next.js.