AI Sourcing Benchmark 2025:
The First Independent Study of People Search Engines


Jerry Sahon · September 2025

7 min read

AI Sourcing / HR Tech / People search

Dozens of AI recruiting tools have launched since 2023. Most of them focus on later stages of hiring — screening, interviewing, engaging, onboarding. Useful, yes, but not groundbreaking. These are problems that are relatively well understood and solvable with enough automation.

The real challenge shows up much earlier: proactive candidate sourcing. That’s where companies get stuck when they need rare talent or senior experts, or when their inbound pipeline is full of irrelevant applications or empty altogether. Far fewer tools exist for sourcing than for automating recruitment workflows. We’re one of the few.

And here’s the truth: none of us have cracked sourcing. Not yet. No one has achieved the level of candidate relevance that makes the process feel solved.
People search is not the same as information search. And while AI has made extraordinary strides in general search, candidate sourcing remains a messy, multi-stage, bias-laden process.
Why we ran a benchmark
In spring 2025, we decided to run a study. Not to race competitors, but to map the ground we’re all standing on. We never thought of ourselves as fighting rivals. From the start, our team has believed that we, alongside the others in this space, are reshaping recruiting itself. Together, we’re creating a new framework for the industry and helping clients learn how to use it.

Our aim with the benchmark was simple: to understand the landscape of this still tiny domain, and to locate ourselves within it.

Over two and a half years of R&D, we had measured ourselves only against internal metrics of our own invention. But no public metrics existed. No fair ranking. No real measurement. With one of our founders, Vlad Sly, already published in academic research, and an investor event on the calendar, the conclusion was obvious: if no one else would measure this industry, we had to.
How we designed the benchmark
We aren’t part of an academic community — frankly, such a community barely exists yet in sourcing. But we wanted this benchmark to be reproducible, open, and transparent. That meant publishing it as a paper, with public data.

Here’s how we approached it:
  1. Real client queries. Natural language searches drawn directly from recruiter use cases. The same phrasing they use in practice.
  2. Cross-platform testing. The same queries run across multiple candidate sourcing platforms, producing more than 1,700 results.
  3. Consistent labeling. Pearch had already invested heavily in internal scoring, but we built a simpler framework so independent recruiters could judge relevance fairly.
  4. Independent reviewers. Professional recruiters evaluated candidates side by side in a blind test, scored through the Elo system.
     In chess, Elo measures player strength based on who they beat: defeating a strong opponent raises your score more than defeating a weaker one. Applied here, recruiters compared two candidates at a time and selected the better match; a minimal code sketch of this update follows the list.
  5. Scale. 1,000 candidate pairs. Dozens of hours of recruiter time.
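
To make step 4 concrete, here is a minimal sketch of a pairwise Elo update in Python. The starting rating, K-factor, and candidate ids are illustrative assumptions, not values from the study.

```python
# Minimal sketch of pairwise Elo scoring over blind recruiter judgments.
# K_FACTOR and START_RATING are illustrative; the study's actual settings may differ.

K_FACTOR = 32          # how strongly a single judgment moves a rating
START_RATING = 1000.0  # every candidate starts from the same baseline


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that candidate A is judged the better match, given current ratings."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings: dict[str, float], winner: str, loser: str) -> None:
    """Apply one blind pairwise judgment: the preferred candidate gains rating, the other loses it."""
    ra = ratings.setdefault(winner, START_RATING)
    rb = ratings.setdefault(loser, START_RATING)
    ea = expected_score(ra, rb)                  # expected outcome for the winner
    ratings[winner] = ra + K_FACTOR * (1 - ea)   # beating a higher-rated candidate earns more points
    ratings[loser] = rb - K_FACTOR * (1 - ea)


# Example: three judgments over candidates returned for the same query.
ratings: dict[str, float] = {}
for winner, loser in [("cand_17", "cand_03"), ("cand_17", "cand_42"), ("cand_42", "cand_03")]:
    update_elo(ratings, winner, loser)

print(sorted(ratings.items(), key=lambda kv: -kv[1]))  # highest-rated candidate first
```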
👉🏻 The point wasn’t to crown a winner. It was to create a clear, practical, replicable way to measure the state of AI candidate sourcing.
What the test revealed
This market is still early. Some tools just convert prompts into filters. Others rely on semantic search, which understands nuance but often misses precision. Data is still king — but without a strong engine, even petabytes of data don’t deliver.
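
To make that contrast concrete, here is a hedged sketch of both approaches under toy assumptions: a prompt reduced to hard filters versus a crude semantic ranking. Every name, the sample data, and the bag-of-words "embedding" are hypothetical stand-ins, not any vendor's actual pipeline.

```python
# Hypothetical contrast between filter-based and semantic candidate search.
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    title: str
    years_experience: int
    profile_text: str


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a learned text encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Approach 1: turn the prompt into hard filters. Precise, but anyone whose title
# doesn't literally contain the keyword is silently dropped.
def filter_search(candidates, title_keyword: str, min_years: int):
    return [c for c in candidates
            if title_keyword.lower() in c.title.lower() and c.years_experience >= min_years]


# Approach 2: rank by similarity between query and profile text. Catches different
# wording, but nothing enforces hard requirements such as seniority.
def semantic_search(candidates, query: str, top_k: int = 10):
    qv = embed(query)
    return sorted(candidates, key=lambda c: cosine(qv, embed(c.profile_text)), reverse=True)[:top_k]


people = [
    Candidate("A", "Machine Learning Engineer", 6, "built ranking models for marketplace search"),
    Candidate("B", "Data Scientist", 8, "recommendation systems and learning to rank"),
    Candidate("C", "ML Engineer", 2, "computer vision prototypes"),
]

print([c.name for c in filter_search(people, "machine learning engineer", min_years=5)])  # ['A']; C's "ML Engineer" title is dropped
print([c.name for c in semantic_search(people, "engineer for search ranking", top_k=2)])  # A ranks first; experience isn't checked
```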

From the outside, the category looks like it’s sprinting. Up close, progress is uneven. Some systems are optimized for speed and deliver generic results. Others dig deeper but wander off target. People search is not text search. It carries bias, context, career paths, and human intent. The bar for relevance — shaped by human judgment and domain expertise — is much higher than it seems. That makes it fundamentally different from general information search, where teams like Exa have focused on advancing large-scale semantic search across the web.
And yet, the progress from just a few years ago is real. Tools that didn’t exist then are running in production today. Even our own internal sense — that the best results in the market right now reach about four out of ten — lands as both underwhelming and remarkable. Underwhelming because the gap to ten is obvious. Remarkable because, not long ago, four out of ten wasn’t even possible.
What it means for us
This research gave us clarity: not just about where the market is, but about where it could go. For us, it confirmed that the right move isn’t to build another polished interface with colored buttons and endless integrations.

Our mission is deeper — to build one search engine for people that others in our field can rely on. Companies like Juicebox are building powerful tools designed directly for recruiters and professionals. By using our technology as a foundation, they can stay focused on the end-users — recruiters — and deliver maximum value, while we take care of the sourcing engine underneath.

Closing thoughts

The sourcing engine we’re building isn’t just for us. It’s infrastructure for an industry still in its infancy.

If sourcing becomes as measurable and transparent as other technologies, the winner won’t be any individual tool; it will be the entire ecosystem that grows on top of it.


Benchmarks aren’t about winners and losers; they’re about raising the floor. By publishing ours, we want to push sourcing forward together. And that will take more than one company.


Full study: our research paper on arXiv.org