SourceKit user reference guide
Shareable product description

SourceKit finds engineers by what they've built.

SourceKit is an evidence-first sourcing system for technical hiring. You start with a role or JD; SourceKit turns it into a discovery plan, verifies signals across GitHub and market context, and returns a ranked pipeline ready for outreach.

Artifact-first discovery · EEA criteria support · persistent Websets · parallel company graph · ranked candidate pipeline
Run your first search
Integrates with: Exa · Claude · Harmonic · Parallel · GitHub

Quick start paths

Pick one input path, get ranked candidates, and take one clear first action.

Role + Company (fastest)
  • Input: `Staff Backend Engineer` + target company context.
  • Expected output: repo set, shortlist of contributor-led candidates, score distribution.
  • First action: edit top repos before running outreach.

Full JD (most complete)
  • Input: paste the full JD with stack, seniority, and constraints.
  • Expected output: criteria draft + market-adjacent discovery + stronger filtering.
  • First action: convert top criteria to binary EEA checks.

Job URL (operational)
  • Input: paste a Lever/Greenhouse/Ashby link.
  • Expected output: auto-parsed scope, ranked candidates, Webset-ready criteria.
  • First action: promote durable searches into a weekly Webset.

Suggested EEA signals are added for every search

If strategy output has no usable EEA criteria, SourceKit seeds 3-5 draft checks you can edit before creating a Webset. This keeps every search evidence-first by default.

Founding ML Engineer

ML + infra
  • Contribution ownership in core repos (maintainer/reviewer/top contributor).
  • Production ML system ownership with reliability/scale evidence.
  • Model-infrastructure depth across training/serving/tooling surfaces.
  • Public technical artifact tied to shipped ML work.

Staff Backend (Distributed)

Systems
  • Contribution ownership in distributed systems repos.
  • Reliability/latency/throughput improvement ownership.
  • Maintainer/reviewer/RFC-author behavior on infra projects.
  • Recent shipped impact in the last 12 months.

Security Engineer

Security
  • Security remediation ownership (CVE/advisory/critical patch).
  • Public security artifact (talk, writeup, audit, or analysis).
  • Contribution ownership in security-focused repositories.
  • Recent shipped security impact with public proof.

Staff Frontend Platform

Frontend
  • Framework/tooling contribution beyond app-level changes.
  • Platform performance or build-system ownership evidence.
  • Contribution ownership in shared frontend surfaces.
  • Public artifact showing cross-team DX impact.

Value: Faster setup

Operators start from concrete criteria in every run instead of writing EEA checks from scratch.

Value: Lower noise

Binary evidence criteria reduce false positives before effort accumulates in scoring and outreach.

Value: Better spend

Verification-first criteria keep enrichment and outreach spend focused on candidates with proof.

Value proposition

Why teams use SourceKit instead of title search and static LinkedIn filters.

Hidden gem rate: ~40%

Top candidates often have limited profile visibility. Artifact-led discovery surfaces strong builders before they become heavily recruited.

Pipeline behavior: always on

Websets convert one strong search into a persistent, auto-updating candidate stream with verified entrants.

Screening logic: binary proof

Criteria can be framed as pass/fail against public evidence, reducing soft interpretation and resume-style noise.
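
To make the pass/fail idea concrete, here is a minimal sketch of binary evidence checks expressed as named predicates. The evidence fields and check names are illustrative assumptions, not SourceKit's actual schema.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Evidence:
    # Hypothetical public-evidence record; fields are assumptions.
    contributor_rank: int | None  # rank in a target repo, None if absent
    shipped_prod_ml: bool         # production ML system ownership
    public_artifacts: int         # talks, writeups, papers

# Each EEA check is a named pass/fail predicate over public evidence.
CHECKS: dict[str, Callable[[Evidence], bool]] = {
    "top10_contributor": lambda e: e.contributor_rank is not None and e.contributor_rank <= 10,
    "production_ml_ownership": lambda e: e.shipped_prod_ml,
    "public_technical_artifact": lambda e: e.public_artifacts >= 1,
}

def passes(evidence: Evidence, required: int = 2) -> bool:
    """Admit only when at least `required` checks pass -- no soft scoring."""
    return sum(check(evidence) for check in CHECKS.values()) >= required
```

Because every check is a hard boolean, a candidate either clears the admission bar or does not; there is no "strong-ish" middle ground to argue over.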

End-to-end walkthrough

The operating flow from intake to pipeline. Each stage covers what to do, what the system does, and what success looks like.

Operator action

Start from role input, full JD, or job URL.
  • Input: role statement, full JD, or job URL.
  • Action: keep stack and constraints explicit.
  • Output: cleaner repo targeting at the planning step.

Details: specificity at intake is the highest-leverage quality control for the entire run.

System output: normalized role context for planning APIs.

Success signal: the role statement is precise enough to eliminate generic repo suggestions.
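
As a sketch of what a "normalized role context" might look like, the structure below captures role, stack, seniority, and constraints in one object. The field names are assumptions for illustration, not SourceKit's actual planning-API payload.

```python
# Illustrative normalized role context; keys are assumptions.
role_context = {
    "role": "Staff Backend Engineer",
    "focus": "distributed systems",
    "stack": ["go", "python", "grpc"],
    "seniority": "staff",
    "constraints": {"recency_months": 12},
    "must_have_evidence": [
        "maintainer_or_reviewer_signal",
        "production_scale_system",
    ],
}
```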

Feature reference

Core capabilities and where they create leverage in the workflow.

Role to Search Strategy

Planning
  • Accepts role text, full JD, or job URL.
  • Generates repo targets, company targets, and criteria draft.
  • Supports manual refinement before execution.

Multi-API Discovery Layer

Discovery
  • Combines Exa, Parallel, and GitHub signals.
  • Expands discovery through technical and market adjacency.
  • Returns artifact-backed candidate context.

Builder Score Evaluation

Scoring
  • Scores candidates 0-100 across contribution dimensions.
  • Weights recency, commit velocity, stack match, and impact.
  • Produces ranked shortlist with evidence markers.
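
A minimal sketch of how a weighted 0-100 score over recency, commit velocity, stack match, and impact could be computed. The weights here are invented for illustration; SourceKit's actual scoring model is not specified in this guide.

```python
# Assumed weights for illustration only.
WEIGHTS = {"recency": 0.25, "commit_velocity": 0.25, "stack_match": 0.30, "impact": 0.20}

def builder_score(signals: dict[str, float]) -> float:
    """Each signal is pre-normalized to [0, 1]; returns a 0-100 score."""
    return round(100 * sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS), 1)

# Example: strong stack match, solid velocity, moderate recency.
print(builder_score({"recency": 0.6, "commit_velocity": 0.8,
                     "stack_match": 0.9, "impact": 0.7}))  # 76.0
```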

Pipeline Workflow

Execution
  • Stage-based candidate movement for sourcing operations.
  • Supports compare, summarize, and batch actions.
  • Designed for recruiter throughput after technical filtering.

Exa Websets

Persistent search
  • Creates auto-updating candidate collections from criteria.
  • Appends new verified matches on schedule.
  • Supports enrich, monitor, and override workflows.

Exports and Integrations

Output
  • Exports via API and CSV for downstream workflows.
  • Feeds Clay/Parallel-style sequencing and enrichment flows.
  • Keeps artifact-level signal attached to candidate records.
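
For the CSV path, a downstream filter can be as simple as the sketch below. The column names are assumptions about an export's shape, not a documented schema.

```python
import csv

# Keep only candidates above the score bar; column names are assumed.
with open("sourcekit_export.csv", newline="") as f:
    for row in csv.DictReader(f):
        if float(row["builder_score"]) >= 82:
            print(row["name"], row["github_url"])
```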

How to get maximum value

Operational habits that improve quality and reduce wasted effort.

Do this
  • Use precise role framing. Specific role language drives better repo targeting and less scoring noise.
  • Edit the repo list every run. Repo quality is the biggest upstream lever for candidate quality.
  • Define 3-5 verifiable EEA markers. Use objective proof signals before enrichment and outreach.
  • Convert durable roles to Websets. Let the best searches compound through weekly or daily monitoring.

Avoid this
  • Generic criteria text. Criteria like "strong engineer" will inflate false positives.
  • Title-only filtering. Front-loading title filters reintroduces profile bias and misses builders.
  • Enriching before verification. Verify first, enrich survivors second to protect spend and quality.
  • Single-market assumptions. Use adjacency signals to reach less saturated ecosystems.

Sample EEA criteria + Webset guidance

Practical examples you can reuse. Keep criteria verifiable from public artifacts and keep Websets strict at admission.

Sample EEA criteria by role

Founding ML Engineer

  • Top-10 contributor to frontier ML infra repo.
  • Shipped production inference or training system.
  • Publication or conference artifact in ML systems.

Staff Backend (Distributed)

  • 50+ commits to distributed systems project.
  • Maintainer/reviewer or RFC ownership signal.
  • Evidence of production-scale reliability work.

Security Engineer

  • CVE discovery, advisory, or remediation ownership.
  • Security commits to relevant OSS repos.
  • Public proof: talks, writeups, or audits.

Staff Frontend Platform

  • Core contributions to framework/tooling ecosystem.
  • Perf or build-system optimization ownership.
  • Cross-team DX/platform impact evidence.
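
Several of these criteria reduce to contributor rank, which is checkable against GitHub's public REST API: the /contributors endpoint lists a repo's contributors ordered by commit count, so list position works as a rank proxy. A minimal sketch follows; unauthenticated calls are rate-limited, and the candidate login is hypothetical.

```python
import requests

def contributor_rank(owner: str, repo: str, login: str, limit: int = 50) -> int | None:
    """Return the login's rank in the repo's contributor list, or None if absent."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contributors"
    resp = requests.get(url, params={"per_page": limit})
    resp.raise_for_status()
    for rank, contributor in enumerate(resp.json(), start=1):
        if contributor["login"] == login:
            return rank
    return None

# Binary check: top-10 contributor to a frontier ML infra repo.
rank = contributor_rank("vllm-project", "vllm", "some-candidate")
print("top10_contributor:", rank is not None and rank <= 10)
```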

Webset operating playbook

1) Set strict admission criteria

Use 3-5 binary checks. Avoid soft language like "strong" or "solid."

2) Verify first, enrich second

Run criteria filters before adding contact or publication enrichments.

3) Use weekly cadence by default

Daily is best for urgent hiring; weekly is cleaner for most durable roles.

4) Replace stale criteria quickly

If false positives rise, tighten criteria before scaling outreach volume.

5) Track conversion by criteria set

Keep the criteria version with each cohort to learn which definition works.
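
One lightweight way to do this, sketched with invented fields: stamp each admitted candidate with the criteria version that admitted them, then compare reply rates per version.

```python
from collections import defaultdict

# Hypothetical cohort records; field names are illustrative.
cohorts = [
    {"candidate": "a", "criteria_version": "v1", "replied": True},
    {"candidate": "b", "criteria_version": "v1", "replied": False},
    {"candidate": "c", "criteria_version": "v2", "replied": True},
]

stats = defaultdict(lambda: [0, 0])  # version -> [replies, total]
for c in cohorts:
    stats[c["criteria_version"]][0] += c["replied"]
    stats[c["criteria_version"]][1] += 1

for version, (replies, total) in sorted(stats.items()):
    print(version, f"{replies}/{total} replied")
```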

Use case recipes

Example patterns teams can run immediately.

Founding ML Engineer (stealth startup)

ML Infra
  • Search setup: seed with model infra repos (`vllm`, `transformers`, `triton`) and narrow to startup/early-team surfaces.
  • EEA criteria: top contributor rank, shipped production inference/training system, and paper/talk signal in relevant venues.
  • Expected output: high-signal shortlist of builders with maintainer velocity and low profile saturation.
  • Target repos: 10-20. Score bar: 85+. Webset cadence: weekly.

Forward Deployed Engineer

Deployment
  • Search setup: target distributed systems repos plus customer deployment indicators and implementation-depth constraints.
  • EEA criteria: production ownership proof, systems reliability changes, and evidence of customer-facing technical delivery.
  • Expected output: candidates with both backend depth and field execution signal, not pure platform-only profiles.
  • Primary stack: Go + Python. Score bar: 82+. Pipeline stage: contact fast.

Staff Frontend Platform Engineer

Frontend Platform
  • Search setup: focus on framework core repos, perf tooling ecosystems, and maintainership markers over title matching.
  • EEA criteria: core contribution to framework/tooling, performance ownership, and cross-team DX impact evidence.
  • Expected output: platform-minded ICs who improve system-level frontend velocity across teams.
  • Signal type: maintainer. Score bar: 80+. Key proof: tooling commits.

Security Engineer (product + OSS)

Security
  • Search setup: define criteria around CVE discovery, advisories, and security-centric repos with active remediation work.
  • EEA criteria: CVE or advisory contribution, sustained security commits, and public research artifact (talk or writeup).
  • Expected output: candidates with proof of practical offensive/defensive capability and product-grade security ownership.
  • Signal source: CVE + commits. Score bar: 83+. Review mode: strict verify.

Worked examples (realistic)

Concrete repo targets, criteria thresholds, and expected pipeline output to calibrate your first runs.

Example 1: Founding ML Infra Engineer, expected 35-60 candidates
  • Repo targets: `vllm`, `transformers`, `triton`, `deepspeed`, `llama.cpp`, `ray`.
  • Score threshold: Builder Score >= 85 and at least 2 EEA criteria met.
  • Bad criteria: "Strong ML engineer, startup mindset, good communicator."
  • Better criteria: top-10 contributor rank OR production inference ownership + public technical artifact.
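
The threshold above combines a score bar with a criteria count. A sketch of that admission rule, with candidate fields invented for illustration:

```python
def admit(candidate: dict) -> bool:
    """Example 1's rule: Builder Score >= 85 and at least 2 EEA criteria met."""
    criteria_met = sum([
        candidate.get("top10_contributor", False),
        candidate.get("prod_inference_ownership", False),
        candidate.get("public_technical_artifact", False),
    ])
    return candidate["builder_score"] >= 85 and criteria_met >= 2
```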

Example 2: Staff Backend (Distributed Systems), expected 45-80 candidates
  • Repo targets: `kubernetes`, `temporal`, `envoy`, `vitess`, `cockroachdb`, `grpc`.
  • Score threshold: Builder Score >= 82 with production ownership proof.
  • Bad criteria: "Great backend developer from top company."
  • Better criteria: 50+ commits + maintainer/reviewer signal + reliability/latency or scaling evidence.

Best for, not ideal for, and limits

Set expectations early so teams use SourceKit where it performs best.

Best for

  • Technical roles with strong public artifact signal.
  • Teams prioritizing objective proof over profile polish.
  • Durable searches that benefit from weekly Websets.

Not ideal for

  • Roles with little or no public technical footprint.
  • Hiring where title pedigree is the primary requirement.
  • One-off searches with no criteria refinement cycle.

What it does not do

  • Does not replace interview evaluation or references.
  • Does not guarantee intent-to-join or compensation fit.
  • Does not rely on self-reported profile claims alone.

Starter templates

Copy-ready prompts for new searches and Webset setup.

Role intake template

Search setup:

Role: Staff Backend Engineer (distributed systems)
Primary work: high-throughput APIs and workflow orchestration
Must-have evidence:
- 50+ meaningful commits to relevant repos
- ownership signal (maintainer/reviewer/RFC)
- production-scale system evidence
Target company surfaces: infra-heavy startups + OSS-adjacent teams

Webset criteria template

Persistent pipeline:

Build a weekly Webset for Founding ML Engineers.
Admit only if candidate matches at least 2/3:
1) top contributor to frontier ML repo
2) shipped production ML system
3) publication/talk evidence in relevant venues
Add enrichments: email, current company, GitHub stats.
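
Expressed as structured configuration, the same template might look like the sketch below. The keys are illustrative assumptions, not the Exa Websets API's actual request schema.

```python
# Illustrative Webset configuration; keys are assumptions.
webset_config = {
    "name": "Founding ML Engineers",
    "cadence": "weekly",
    "admission": {
        "min_criteria_met": 2,
        "criteria": [
            "top contributor to frontier ML repo",
            "shipped production ML system",
            "publication/talk evidence in relevant venues",
        ],
    },
    "enrichments": ["email", "current_company", "github_stats"],
}
```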

Run this now

Take one action to validate the workflow with your current role.

Start with one role, tighten criteria after first results, and convert the winning search into a weekly Webset.

Run your first search