FDE Screening Methodology

We don't do leetcode.
We drop them in a real repo.

A Forward-Deployed Engineer is assessed across four deep dimensions — and then put through a live battle test with an actual codebase. Theory without practice is noise. We want to see how they think when the stakes are real.

  • 4 — technical dimensions assessed in depth
  • <3% — pass rate through full screening
  • 72h — live repo challenge window
  • 30d — post-hire guarantee if fit isn't right

The Four Pillars
What we actually test
Four areas where most AI-era engineers have critical blind spots. We don't skim them — we go deep on each one.
Pillar 01
🧠

Soft Skills in AI Context

Working with AI is a collaboration, not a query engine. We look for engineers who have developed a genuine working relationship with AI tools — who know when to push, when to steer, and when to trust the output.

  • Delegates effectively — offloads boilerplate, keeps judgment for architecture
  • Prompts with context, not just instructions — gives the model what it needs to be right
  • Reviews AI output critically — doesn't ship what it can't explain
  • Communicates AI limitations clearly to non-technical stakeholders
  • Stays async-friendly — writes decisions down instead of just telling them to Claude
  • Flag: treats AI as a magic box, copies output without reading
  • Flag: over-corrects every output, never trusts the model at all
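"Prompts with context, not just instructions" can be made concrete. A minimal sketch, assuming a hypothetical `build_prompt` helper (the task text and context labels are illustrative, not from our actual rubric):

```python
# Sketch: bundle the constraints the model needs to be right,
# instead of sending a bare one-line instruction.

def build_prompt(task: str, context: dict[str, str]) -> str:
    """Assemble the instruction plus the surrounding context sections."""
    sections = [f"Task: {task}", ""]
    for label, body in context.items():
        sections.append(f"## {label}")
        sections.append(body)
        sections.append("")
    return "\n".join(sections).strip()

# A bare instruction vs. the same instruction with context attached:
bare = "Add pagination to the /candidates endpoint."
rich = build_prompt(
    "Add pagination to the /candidates endpoint.",
    {
        "Existing convention": "Other endpoints use cursor-based pagination via ?after=<id>.",
        "Constraint": "Response shape must stay backward compatible.",
        "Out of scope": "Do not touch the scoring logic.",
    },
)
```

The difference we look for is the candidate's instinct to attach the second kind of prompt, unprompted.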
Pillar 02
⚙️

QA & CI/CD Fluency

AI-generated code ships fast. Which means the testing and delivery pipeline matters more than ever. An FDE who can't wire a CI check is a liability at 10x velocity.

  • Writes tests as part of the feature, not after — before, during, with
  • Can set up GitHub Actions from scratch: lint, test, deploy stages
  • Understands the difference between unit, integration, and e2e — and when each is worth the cost
  • Reads CI failure output and fixes root causes, not symptoms
  • Treats a red build as a blocker, not a suggestion
  • Uses test coverage as a signal, not a target
  • Flag: "tests are for later" mentality
  • Flag: can't explain what their CI pipeline actually does
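"Tests as part of the feature" looks like this in practice: the function and its regression test land in the same change. A minimal pytest-style sketch (the batch scorer and its scoring rule are hypothetical):

```python
# Sketch: feature and test ship together, in the same commit.

def score_batch(candidates: list[dict]) -> list[float]:
    """Score each candidate on a 0-1 scale; empty batches are a no-op, not an error."""
    return [min(1.0, c.get("signal", 0) / 10) for c in candidates]

def test_score_batch_handles_empty_and_caps_at_one():
    # Edge cases are covered the moment the feature exists, not "later".
    assert score_batch([]) == []
    assert score_batch([{"signal": 25}]) == [1.0]
    assert score_batch([{"signal": 5}]) == [0.5]
```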
Pillar 03
🗄️

Data Modeling

Most AI-generated code has a weak data layer. A real FDE designs the schema first, shapes the API around the data, and understands what happens at 10M rows.

  • Can reason about normalization — knows when to denormalize and why
  • Designs for query patterns, not just storage — indexes aren't an afterthought
  • Understands relational vs document vs vector stores — picks the right tool
  • Knows how schema changes propagate through an app — migrations, backward compat
  • Can read an ERD and find the problems without being told where to look
  • Thinks about data lifecycle: creation, mutation, archival, deletion
  • Flag: designs schemas by vibes, not query patterns
  • Flag: no experience with migrations in a live system
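"Designs for query patterns" is testable in minutes. A minimal sqlite3 sketch, where the `jobs` table and its hot query are illustrative assumptions:

```python
# Sketch: the index is chosen to serve the known hot query
# (open jobs, newest first), not whatever column comes to mind.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL,
        created_at TEXT NOT NULL
    )
""")
# Composite index matching: WHERE status = ? ORDER BY created_at DESC
conn.execute("CREATE INDEX idx_jobs_status_created ON jobs (status, created_at)")

# The query plan should hit the index, not scan the table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM jobs WHERE status = 'open' "
    "ORDER BY created_at DESC"
).fetchall()
```

A candidate who reaches for `EXPLAIN` before arguing about indexes is showing exactly the instinct this pillar screens for.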
Pillar 04

LLM Theoretical Base

You can't build reliable AI features without understanding the substrate. We test for genuine understanding — not vendor docs memorized, but actual mental models of how the technology works.

  • Understands transformer attention — can explain what a context window actually is
  • Knows the difference between fine-tuning, RAG, and prompt engineering — uses each where appropriate
  • Understands tokenization and its implications for prompting and cost
  • Can reason about latency vs quality tradeoffs (model size, streaming, caching)
  • Knows how agents work: tool use, multi-step reasoning, memory patterns
  • Understands hallucination: causes, mitigation, when to trust the output
  • Flag: thinks "bigger model = always better"
  • Flag: hasn't thought about cost at production scale
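"Cost at production scale" is a back-of-envelope skill. A sketch of the reasoning we probe for, where the ~4-characters-per-token heuristic and the per-million-token prices are illustrative assumptions, not real vendor pricing:

```python
# Sketch: rough monthly LLM spend from request volume and payload size.
# Prices and the chars-per-token ratio are assumed, not vendor numbers.

def estimate_monthly_cost(
    requests_per_day: int,
    prompt_chars: int,
    completion_chars: int,
    usd_per_m_input: float = 3.0,    # assumed input price per 1M tokens
    usd_per_m_output: float = 15.0,  # assumed output price per 1M tokens
) -> float:
    tokens_in = prompt_chars / 4      # rough heuristic: ~4 chars per token
    tokens_out = completion_chars / 4
    per_request = (tokens_in * usd_per_m_input
                   + tokens_out * usd_per_m_output) / 1e6
    return per_request * requests_per_day * 30

# 50k requests/day with a 6k-char prompt adds up fast:
cost = estimate_monthly_cost(50_000, prompt_chars=6_000, completion_chars=800)
```

A candidate who can do this math in their head knows why trimming the prompt beats switching models.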

Instant disqualifiers — we stop the process here

  • Can't explain their own code
  • Ships without running locally
  • No opinion on tradeoffs
  • Blames the model for wrong output
  • No tests for any feature in their portfolio
  • Can't describe a past production incident
  • Uses AI to avoid understanding the problem
  • Passive about ownership

Live Battle Test
We give them a real repo.
Then we watch.
No whiteboard. No contrived puzzles. We hand over a production-grade codebase — bugs, messy history, and all — and give them 72 hours to ship something real.
fde-challenge — zsh
# Candidate receives this at 09:00 Monday
 
$ git clone git@github.com:recruiter-assistant/fde-challenge-2026.git
Cloning into 'fde-challenge-2026'... done.
 
$ cat CHALLENGE.md
 
## Your mission (72h window)
 
1. The /candidates/score endpoint is broken for batches >100.
   Find the root cause. Fix it. Add a regression test.
 
2. The recruiter agent loses context after 3 tool calls.
   Trace the issue. Propose a fix with tests. Ship it if confident.
 
3. The data model for `Job` is missing soft-delete support.
   Write the migration. Update the API. Don't break existing tests.
 
4. (Bonus) The CI pipeline skips linting on PRs from forks.
   Fix it. Explain why it was skipped in the first place.
 
## Rules
- Use whatever AI tools you normally use. We want to see how you work.
- Commit as you go. We read the git history, not just the final state.
- Write a short ADR for each decision that wasn't obvious.
- Open a PR when done. Self-review it first.
 
Deadline: Wednesday 09:00. Questions via async Slack only.
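Mission 3 is a good example of the bar. A sketch of what a passing submission might look like — a soft-delete migration with an explicit rollback path (the table shape is an assumption for illustration; it is not the challenge repo's real schema, and the real repo may use a migration framework rather than raw SQL):

```python
# Sketch: soft-delete migration for Job with a rollback path.
# Existing rows stay live because deleted_at defaults to NULL.
import sqlite3

def upgrade(conn: sqlite3.Connection) -> None:
    conn.execute("ALTER TABLE job ADD COLUMN deleted_at TEXT")
    # Reads will filter on deleted_at, so index it up front.
    conn.execute("CREATE INDEX idx_job_deleted_at ON job (deleted_at)")

def downgrade(conn: sqlite3.Connection) -> None:
    # Rollback undoes the upgrade in reverse order.
    conn.execute("DROP INDEX idx_job_deleted_at")
    conn.execute("ALTER TABLE job DROP COLUMN deleted_at")

# Exercising both directions against a throwaway schema:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, title TEXT)")
upgrade(conn)
cols_after_up = [r[1] for r in conn.execute("PRAGMA table_info(job)")]
downgrade(conn)
cols_after_down = [r[1] for r in conn.execute("PRAGMA table_info(job)")]
```

The migration that only goes forward is exactly the kind we flag in review.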
What we read

Git history

We watch every commit. Tiny atomic commits show discipline. One giant commit at the end shows panic. How they work under pressure is the data we actually want.

The PR description

A good FDE writes like a senior who's been burned before: what broke, why, what they changed, what they'd do differently with more time. If they can't write it, they can't own it.

The ADRs

Architecture Decision Records show whether they're making choices or guessing. We look for tradeoff reasoning, not just answers. "I did X because Y was slower given Z" beats "X seemed better".


Evaluation
What passing looks like
We're not looking for perfect code. We're looking for the right instincts.
We want to see this
  • Reads the existing code before writing a single line
  • Asks one precise clarifying question via async Slack
  • Reproduces the bug before touching the fix
  • Tests go in the same commit as the fix, not after
  • Migration has a rollback path
  • PR description explains tradeoffs, not just what changed
  • Uses Claude Code to speed up boilerplate, writes logic themselves
  • Comments on their own PR before asking for review
  • Mentions what they didn't get to and why
This ends the process
  • First commit is 47 files, no message
  • No tests anywhere in the submission
  • Migration has no rollback
  • PR description is "fixed stuff"
  • AI-generated code they clearly didn't read
  • Broke existing tests, didn't notice
  • Changed the challenge scope without flagging it
  • No ADRs — decisions made, none explained
  • Submitted at 08:59 Wednesday. Rushed output shows.

Scoring
How we weight the dimensions
Each dimension is scored independently. A perfect battle test doesn't offset a broken LLM mental model. All four gates must pass.
  • 25% — Soft skills in AI context
  • 25% — QA & CI/CD fluency
  • 25% — Data modeling
  • 25% — LLM theory
Battle test gate
Candidates must pass the battle test to proceed, regardless of pillar scores. A great theory score doesn't compensate for shipping broken code.
Domain fit overlay
After scoring, we overlay your domain requirements. A fintech FDE needs different instincts than a healthtech one. We match the final profile to your context.

“They sent me a candidate who had already shipped an AI-assisted migration in a fintech codebase. I hired them in the first 30 minutes of the call.”
— CTO, B2B SaaS, 40-person team

Want an engineer who passes this bar?

Tell us your domain, team size, and what “good” looks like. We'll find someone who clears it.

Start Hiring FDEs Back to Recruiter Assistant