Amazon

AI resume screening tool that systematically downgraded female candidates


Maturity stage

Abandoned

Use-case type

Analysis

Function

Human Resources/Talent

Company size

Enterprise

Evidence

Encoded gender bias through indirect markers

ROI / outcome figure

Abandoned system; reputational damage

Deep dive

The setup

Amazon built an internal resume-screening model trained on a decade of hiring decisions, intending to scale recruiter throughput across high-volume technical roles.

What happened

The model learned the historical bias in Amazon's own hiring patterns. Resumes containing markers correlated with female applicants (women's college names, certain extracurriculars, the word 'women's' as in 'women's chess club captain') were systematically downgraded. The system was abandoned before production.

Root cause

This is a data-readiness failure that masquerades as a model failure. The training data encoded protected-class proxies that no amount of debiasing the model could remove, because the proxies were the signal the model had learned. Resume-screening AI in particular now has four documented cases on record (Amazon, IBM, iTutorGroup, plus a UW academic audit), enough that the deployment archetype carries a standing warning across this site.
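To make the proxy problem concrete, here is a minimal sketch of a term-level audit over historical hiring labels. It is not Amazon's method or part of RAPID; the function name `term_outcome_gaps`, the `min_support` parameter, and the toy records are all hypothetical. For each term it compares the hire rate of resumes containing that term with the overall hire rate, so a strongly negative gap on a term such as 'women's' shows up before any model is trained.

```python
import re
from collections import Counter


def term_outcome_gaps(records, min_support=2):
    """Return {term: hire rate of resumes with the term - overall hire rate}.

    records: iterable of (resume_text, was_hired) pairs (hypothetical schema).
    Terms seen in fewer than min_support resumes are skipped as too noisy.
    """
    records = list(records)
    overall = sum(int(hired) for _, hired in records) / len(records)
    seen, hired_with = Counter(), Counter()
    for text, hired in records:
        # Count each term once per resume, not once per occurrence.
        for term in set(re.findall(r"[a-z']+", text.lower())):
            seen[term] += 1
            hired_with[term] += int(hired)
    return {
        term: hired_with[term] / seen[term] - overall
        for term in seen
        if seen[term] >= min_support
    }


# Toy historical data, invented purely for illustration.
records = [
    ("captain of women's chess club", False),
    ("women's college graduate, python", False),
    ("chess club captain, python", True),
    ("python developer, java", True),
    ("java developer", True),
    ("women's coding society, java", False),
]

gaps = term_outcome_gaps(records)
worst_term = min(gaps, key=gaps.get)
print(worst_term, gaps[worst_term])
```

On this toy data the term with the most negative gap is 'women's', while 'chess' and 'captain' sit at zero. A real audit would need far more data, significance testing, and a review of whether each flagged term is job-relevant or a proxy.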

Takeaway for teams considering similar work

RAPID's Readiness dimension scores data-quality and bias-audit discipline. Assessing R before building this system would have surfaced the proxy-encoded-bias risk during the data-readiness review, which is the only stage where it is cheap to fix.

What RAPID would have flagged

Failure mode: Data — Poor data quality, bias in training data, or insufficient data volume leading to unreliable AI outputs

Dimensions a pre-deployment RAPID assessment would have surfaced

Data & Technical Readiness (low score < 50%)

Mitigations the framework recommends

Conduct data quality audit before model training or deployment
Implement bias detection and monitoring for proxy variables
Establish data governance with clear ownership and quality standards
Build data validation pipelines with automated quality checks
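One way the bias-detection and automated-check mitigations above could be wired into a pipeline is an adverse impact ratio gate. The sketch below uses the four-fifths (80%) heuristic from US employment-selection guidance as the threshold; that choice is an assumption here, not something the RAPID framework is stated to prescribe, and the function names and toy batch are hypothetical.

```python
def selection_rates(outcomes):
    """Return {group: share selected} from (group, selected) pairs."""
    totals, selected = {}, {}
    for group, was_selected in outcomes:
        totals[group] = totals.get(group, 0) + 1
        selected[group] = selected.get(group, 0) + int(was_selected)
    return {g: selected[g] / totals[g] for g in totals}


def adverse_impact_ratio(outcomes):
    """Lowest group selection rate divided by the highest."""
    rates = selection_rates(outcomes)
    top = max(rates.values())
    if top == 0:
        raise ValueError("no candidates selected in any group")
    return min(rates.values()) / top


def check_batch(outcomes, threshold=0.8):
    """Return (passed, ratio); passed is False when the ratio is below threshold."""
    ratio = adverse_impact_ratio(outcomes)
    return ratio >= threshold, ratio


# Toy screening batch: group A selected 5 of 10, group B selected 2 of 10.
batch = (
    [("A", True)] * 5 + [("A", False)] * 5
    + [("B", True)] * 2 + [("B", False)] * 8
)

passed, ratio = check_batch(batch)
print(passed, round(ratio, 2))
```

Here the ratio is 0.2 / 0.5 = 0.4, so the batch fails the gate. A check like this needs group labels collected for audit purposes, and passing it does not show that proxy variables are absent; it only catches disparities large enough to appear in outcomes.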

Dimensions this case illuminates

R: Data & Technical Readiness
Is the data infrastructure and technical capability adequate?