Research-backed · AI-era technical hiring

Assess how candidates work with AI — not just what they submit.

FairShot is a process-aware technical assessment platform grounded in peer-reviewed research on behavioral telemetry. It captures 51 session-level signals during each coding session to distinguish strategic AI use from blind copying.

51 behavioral signals captured
96.75% model accuracy on synthetic benchmark
7 human–AI collaboration archetypes
HACI: Human–AI Collaboration Index
The future of hiring is not about banning AI — it is about understanding who uses it with judgment, speed, and real problem-solving ability.

Traditional evaluation breaks when AI rewrites the process.

Two candidates can submit identical correct code for completely different reasons. Final-output evaluation is blind to the difference.

01

Correct code proves nothing

AI can generate working solutions in seconds. A polished final result no longer signals understanding, problem-solving, or independent thought.

02

The question is how, not whether

Did they prompt strategically? Edit AI output critically? Debug intentionally? Or paste the first plausible result without verification?

03

Reviewers lack evidence

Hiring teams have no visibility into the collaboration process — only a deliverable stripped of every signal that actually mattered.

Built on peer-reviewed behavioral telemetry research.

FairShot's evaluation model is grounded in a controlled synthetic simulation study that defined 51 signals across five behavioral categories and reached 96.75% classification accuracy on held-out synthetic data.

Signal Distribution — 51 total

Code Evolution: 10 signals
AI Prompt NLP: 12 signals
IDE Interaction: 10 signals
Keystroke Dynamics: 9 signals
Temporal Workflow: 10 signals

Behavioral Telemetry for Process-Aware Evaluation of AI-Assisted Programming

−20.25 pts

Removing AI Prompt NLP signals drops accuracy by 20.25 points

The ablation study shows that AI interaction signals are by far the most predictive feature family: removing them costs more accuracy than removing IDE interaction, keystroke dynamics, code evolution, or temporal workflow signals.

0.0872

Silhouette score reveals a behavioral spectrum

Unsupervised clustering yields a silhouette score of 0.0872. A value this close to zero means the clusters overlap heavily, so collaboration styles don't form neat boxes; they sit on a continuum. The HACI index captures this gradient more faithfully than a binary label can.
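For readers unfamiliar with the metric, here is a minimal plain-Python sketch of the silhouette coefficient, which ranges from -1 to 1. It runs on invented one-dimensional toy data, not FairShot telemetry, and the function name and inputs are illustrative assumptions rather than anything taken from the study. It needs at least two clusters.

```python
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient for 1-D points and their cluster labels."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            mean(abs(p - q) for q in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Toy data, invented for illustration (not FairShot telemetry).
separated = silhouette_score([0.0, 0.1, 0.2, 10.0, 10.1, 10.2], [0, 0, 0, 1, 1, 1])
interleaved = silhouette_score([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [0, 1, 0, 1, 0, 1])

print(f"well separated: {separated:.2f}")
print(f"interleaved:    {interleaved:.2f}")
```

The well-separated groups score close to 1, while the interleaved groups score at or below zero. A reported value of 0.0872 sits much nearer the second case, which is why the page describes a spectrum rather than discrete boxes.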

XGBoost

96.75% held-out accuracy, robust 5-fold CV

A StandardScaler + XGBoost pipeline recovers the intended synthetic archetype labels. Random Forest follows closely at 95.30%, and both clearly outperform SVM.

Seven patterns of human–AI collaboration.

FairShot maps every session to one of seven research-defined archetypes — from independent problem-solvers who barely touch AI, to blind copiers who paste without review.

🧠 Independent Solver: 22.9%
🤝 Structured Collaborator: 21.3%
⚙️ Prompt Engineer Solver: 15.4%
🔄 Iterative Debugger: 14.9%
🤖 AI-Dependent Constructor: 15.2%
🔍 Exploratory Learner: 5.7%
📋 Blind Copier: 4.6%

Process beats output, every time.

SHAP feature importance analysis reveals a clear hierarchy: how a candidate interacts with AI output predicts collaboration style far better than raw activity counts.

SHAP Feature Importance

AI Output Edit Distance (process): 0.120
Prompt Refinement Count (process): 0.102
Max Paste Length (process): 0.089
Compile Events (process): 0.085
Avg Prompt Length (process): 0.078
Total Keystrokes (count only): 0.031
Files Opened (count only): 0.018
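To see the gap between the two labels at a glance, the seven values listed above can be totalled per label. This is only a sketch over the figures shown on this page; the variable names are illustrative, and the chart covers seven of the 51 signals, so the totals describe these features alone.

```python
# (feature, label, importance) exactly as listed in the chart above
shap_importance = [
    ("AI Output Edit Distance", "process", 0.120),
    ("Prompt Refinement Count", "process", 0.102),
    ("Max Paste Length", "process", 0.089),
    ("Compile Events", "process", 0.085),
    ("Avg Prompt Length", "process", 0.078),
    ("Total Keystrokes", "count only", 0.031),
    ("Files Opened", "count only", 0.018),
]

# Sum the importance values for each label
totals = {}
for _name, kind, value in shap_importance:
    totals[kind] = totals.get(kind, 0.0) + value

for kind, total in totals.items():
    print(f"{kind}: {total:.3f}")
# process: 0.474
# count only: 0.049
```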
20.25 pt

accuracy drop when AI Prompt NLP signals are removed

Accuracy falls from 96.75% to 76.50%, the single most dramatic finding from the ablation study. No other signal group comes close. Removing code evolution features actually increased accuracy slightly, which suggests they are partly redundant with other signals.
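The headline drop follows directly from the two accuracies quoted above. A small sketch of that arithmetic, using `Decimal` to avoid floating-point rounding; the helper name `ablation_drop` is hypothetical:

```python
from decimal import Decimal


def ablation_drop(baseline_pct: str, ablated_pct: str) -> Decimal:
    """Accuracy lost, in percentage points, when a signal group is removed."""
    return Decimal(baseline_pct) - Decimal(ablated_pct)


# Full model vs. model without AI Prompt NLP signals (figures from this page)
drop = ablation_drop("96.75", "76.50")
print(drop)  # 20.25
```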

Source: Ablation Study, Fig. 5
"AI interaction features are the most important feature family to consider."

Three steps to process-aware evaluation.

FairShot is built for controlled pilot assessments — with telemetry capture, session evidence, and reviewer-facing analysis.

1

Run a managed assessment

Candidates complete a technical task in a structured environment designed for AI-era workflows — not an AI-free fiction that bears no resemblance to real work.

2

Capture behavioral evidence

FairShot collects 51 session-level telemetry signals spanning IDE interaction, prompt patterns, code evolution, keystrokes, and temporal flow — preserving the path, not just the destination.

3

Support reviewer decisions

Reviewers get integrity-aware summaries, archetype classification, a HACI score, and evidence packs that make collaboration style visible — keeping humans in the loop at every step.
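As an illustration of step 2, the sketch below derives a few session-level signals from a raw event log. Everything here is an assumption made for the example: the `Event` schema, the field names, the simplified signal definitions, and the use of Levenshtein distance for the edit-distance signal. The page does not specify how FairShot computes any of these.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One hypothetical telemetry event; field names are illustrative only."""
    t: float        # seconds since session start
    kind: str       # "keystroke", "paste", or "prompt"
    text: str = ""  # pasted text or prompt text; empty for keystrokes


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def session_signals(events, ai_output, final_code):
    """Aggregate a few example signals for one session (assumed definitions)."""
    pastes = [e for e in events if e.kind == "paste"]
    return {
        "total_keystrokes": sum(e.kind == "keystroke" for e in events),
        "prompt_count": sum(e.kind == "prompt" for e in events),
        "max_paste_length": max((len(e.text) for e in pastes), default=0),
        # How far the submitted code moved away from what the AI produced
        "ai_output_edit_distance": levenshtein(ai_output, final_code),
    }


ai_output = "def add(a, b): return a+b"
final_code = "def add(a, b): return a + b"
events = [
    Event(1.0, "prompt", "write a function that adds two numbers"),
    Event(5.0, "paste", ai_output),
    Event(9.0, "keystroke"),
    Event(9.2, "keystroke"),
]

signals = session_signals(events, ai_output, final_code)
print(signals)
```

In this toy session the candidate pasted the AI output and added two spaces, so the edit distance is small. A session where the AI output was substantially reworked would show a much larger value, which is the kind of difference a final-output check cannot see.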

Pilot-ready for the teams that need it most.

Best suited today for forward-thinking partners who want better signal than traditional coding tests can provide.

Startups hiring engineers

Teams that want to evaluate tool-augmented performance in context — not memorized LeetCode solutions delivered under artificial constraints.

Bootcamps & training programs

Programs that need honest evidence of how learners use AI to solve problems — not just whether they submit something that runs.

Universities & placement cells

Academic settings exploring fair AI-era technical evaluation for emerging developers entering a workforce that already runs on AI assistance.

Join a small, curated pilot group.

FairShot is currently best suited for controlled pilots with hiring teams, bootcamps, or university partners. If you want to test AI-era technical evaluation with real reviewer workflows, let's talk.

Current stage

Pilot-ready for controlled users. Research-backed, hypothesis-validated. Not yet marketed as broad self-serve enterprise software.

By submitting, you're joining a curated waitlist. No spam — just a direct conversation about whether FairShot is a fit for your team.