Case study · Activation · open source

Qualify and route every inbound lead in seconds, not hours

Speed-to-Lead Agent is an open-source AI agent that qualifies and routes inbound leads in seconds, self-hosted and running locally with zero keys. A LangGraph pipeline enriches each lead, scores its fit with a fine-tuned classifier, drafts a tailored reply, and routes it to a CRM or Slack. Speed is the highest-leverage variable in inbound sales: the canonical HBR study found that responding within an hour makes a team about 7× more likely to reach a decision-maker, yet most first replies still take hours.

PythonFastAPILangGraphRedisDocker
PEFT / LoRAMCP serverSelf-hosted
0.938
classifier accuracy
0.933
macro-F1
~2×
vs the keyword baseline
seconds
first-response time
The problem

Good leads go cold while someone is in a meeting

Every inbound lead has a half-life. The longer a real buyer waits for a first response, the colder the conversation gets, and the data on this is brutal: a lead worked within the hour is multiples more likely to convert than one worked an hour later. But staffing a human to read, qualify, and personally reply to every form fill in seconds, around the clock, does not scale. The honest fix is not a faster human. It is an agent that does the reading and the first draft, and knows when it is confident enough to send on its own versus hand a human a ready-to-go reply.

How it works

A LangGraph pipeline from webhook to routed reply

Webhook (202 instant) Redis queue Research / enrich Qualify (classifier) Draft reply Route (CRM / Slack / send) Funnel metrics

The webhook returns 202 immediately and a worker runs the slow part, so lead capture never blocks on an LLM call. Spam and non-buyers are discarded and logged; real leads get drafted and routed.

The classifier

A fine-tuned model that beats keyword rules for ~$0

Qualification runs behind one interface with two implementations: a transparent rule baseline (the keyless default), and a DistilBERT intent classifier fine-tuned with PEFT/LoRA, 744K trainable params (1.1% of the model). make train produces the adapter in about 30 seconds on a laptop; when present it loads automatically.

StrategyAccuracyMacro-F1$ / 1k leads
Rule baseline (keyword)0.5000.500$0
LoRA classifier0.9380.933~$0 (local)

Measured on a hand-written, held-out set of realistic messages the model never saw in training, nearly 2× the intent accuracy of keyword rules on unseen phrasing, for about $0, locally, in milliseconds. That is the case for fine-tuning over a per-lead LLM call. Full methodology lives in the repo’s benchmarks and model card.

Why it is production-shaped

Explainable, provider-agnostic, and yours to host

Explainable qualification

Every lead gets a tier (hot / warm / cold / spam), an ICP-fit score, a buyer-intent label, and human-readable reasons. A confidence gate decides auto-send versus human review, the same discipline as never presenting a guess as fact.

Provider-agnostic drafts

Intent-aware first-touch replies through litellm (Gemini / Groq / OpenAI and others), with a keyless template fallback so a missing key disables a feature instead of breaking the app.

Real GTM integrations

Push leads to Twenty or HubSpot, alert a Slack channel, or send email, all behind your own keys. An MCP server exposes the same capabilities to Claude or Cursor as tools.

Funnel analytics

Source attribution, qualification rate, and speed-to-lead p50 / p95 latency, exposed as JSON and Prometheus, so the speed gain shows up in numbers, not claims.

Self-hosted instant lead response, without the per-seat SaaS

It is open-source (MIT) and runs locally with zero keys. Clone it, run make demo, and watch sample leads get qualified, scored, and routed, then bring your own keys to put it in production.