Case · 2018
Amazon
AI resume screening tool that systematically downgraded female candidates
Maturity stage: Abandoned
Use-case type: Analysis
Function: Human Resources/Talent
Company size: Enterprise
Evidence: Encoded gender bias through indirect markers
ROI / outcome figure: Abandoned system; reputational damage
Deep dive
The setup
Amazon built an internal resume-screening model trained on a decade of hiring decisions, intending to scale recruiter throughput across high-volume technical roles.
What happened
The model learned the historical bias in Amazon's own hiring patterns. Resumes containing markers correlated with female applicants - women's college names, certain extracurriculars, the word 'women's' as in 'women's chess club captain' - were systematically downgraded. The system was abandoned before production.
Root cause
A data-readiness failure that masquerades as a model failure. The training data encoded proxies for a protected class, and model-level debiasing could not remove them because those proxies were the very signal the model had learned: neutralising one term leaves the model free to find another. Resume-screening AI in particular now accounts for four documented failure cases, three at enterprises (Amazon, IBM, iTutorGroup) plus a UW academic audit, enough that the deployment archetype carries a standing warning across this site.
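A data-readiness review can make the proxy problem concrete by measuring how strongly individual resume tokens are associated with a protected attribute before any model is trained. The sketch below is illustrative only and is not Amazon's method: the record layout, the function name `proxy_tokens`, and the `min_count` and `min_lift` thresholds are all assumptions, and it presumes an audit sample with self-reported gender labels is available.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; apostrophes are kept so "women's" stays intact.
    return set(re.findall(r"[a-z']+", text.lower()))


def proxy_tokens(records, group="F", min_count=2, min_lift=1.5):
    """Return tokens over-represented in resumes from `group`.

    records: iterable of (resume_text, group_label) pairs (hypothetical layout).
    lift = P(group | token) / P(group); values well above 1 suggest a proxy.
    """
    records = list(records)
    if not records:
        return {}
    base_rate = sum(1 for _, g in records if g == group) / len(records)
    if base_rate == 0:
        return {}  # no members of the group in the sample, nothing to measure

    seen, seen_in_group = Counter(), Counter()
    for text, g in records:
        for tok in tokenize(text):
            seen[tok] += 1
            if g == group:
                seen_in_group[tok] += 1

    flagged = {}
    for tok, n in seen.items():
        if n < min_count:
            continue  # too rare to say anything meaningful
        lift = (seen_in_group[tok] / n) / base_rate
        if lift >= min_lift:
            flagged[tok] = round(lift, 2)
    return flagged


# Toy audit sample, invented for illustration.
sample = [
    ("Captain of the women's chess club, Java developer", "F"),
    ("Women's coding society lead, Python developer", "F"),
    ("Chess club captain, Java developer", "M"),
    ("Python developer, robotics team", "M"),
]
print(proxy_tokens(sample))
```

On this toy sample the only flagged token is "women's". Tokens that appear equally across both groups, such as "chess" or "developer", have a lift of 1 and are not flagged. A real audit would need far more data and would also have to look at combinations of features, since proxies are rarely single words.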
Takeaway for teams considering similar work
RAPID's Readiness dimension scores data-quality and bias-audit discipline. Scoring R before building this system would likely have produced a low result and surfaced the proxy-encoded-bias risk during the data-readiness review, the stage where it is cheapest to fix.
What RAPID would have flagged
Failure mode: Data — Poor data quality, bias in training data, or insufficient data volume leading to unreliable AI outputs
Dimensions a pre-deployment RAPID assessment would have surfaced
- Data & Technical Readiness (low score: below 50%)
Mitigations the framework recommends
- Conduct data quality audit before model training or deployment
- Implement bias detection and monitoring for proxy variables
- Establish data governance with clear ownership and quality standards
- Build data validation pipelines with automated quality checks
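As one possible shape for the bias-monitoring and automated-check mitigations above, the sketch below compares shortlist rates across groups using the four-fifths rule of thumb from US employment-selection guidance: a group whose selection rate falls below 80% of the highest group's rate is flagged for review. The function names, the `(group, shortlisted)` record layout, and the sample counts are assumptions for illustration, not part of the RAPID framework.

```python
def selection_rates(outcomes):
    """outcomes: iterable of (group_label, was_shortlisted) pairs."""
    totals, selected = {}, {}
    for group, shortlisted in outcomes:
        totals[group] = totals.get(group, 0) + 1
        selected[group] = selected.get(group, 0) + (1 if shortlisted else 0)
    return {g: selected[g] / totals[g] for g in totals}


def adverse_impact_ratios(outcomes):
    """Each group's selection rate divided by the highest group's rate."""
    rates = selection_rates(outcomes)
    best = max(rates.values(), default=0)
    if best == 0:
        return {}  # nobody was shortlisted, so ratios are undefined
    return {g: r / best for g, r in rates.items()}


def flagged_groups(outcomes, threshold=0.8):
    """Groups whose ratio falls below the four-fifths threshold."""
    ratios = adverse_impact_ratios(outcomes)
    return sorted(g for g, ratio in ratios.items() if ratio < threshold)


# Invented screening outcomes: 5 of 10 shortlisted in group M, 3 of 10 in group F.
outcomes = (
    [("M", True)] * 5 + [("M", False)] * 5
    + [("F", True)] * 3 + [("F", False)] * 7
)
print(adverse_impact_ratios(outcomes))
print(flagged_groups(outcomes))
```

A check like this could run as a gate in a validation pipeline, both on the historical labels before training and on model outputs after deployment. It is a screening heuristic rather than a legal or statistical test, and small samples make the ratio noisy.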
Dimensions this case illuminates
R · Data & Technical Readiness
Is the data infrastructure and technical capability adequate?
More from Technology/Software
Industry deep-dive
Fortune 500 Customer Service Provider · CASE-014 · Success
14% productivity increase; 34% improvement for novice workers
Zillow · CASE-017 · Failure
Failed to predict rapid price swings across local markets
ZoomInfo · CASE-032 · Success
33% suggestion acceptance rate; 20% line acceptance rate; 90% of developers report time savings (median 20%); 72% developer satisfaction
Apply this to your team
Take the RAPID assessment to see whether your organisation is exposed to the same failure modes as this case - or already has the discipline that made it work.