Case · 2023
BCG (Internal)
Controlled experiment with BCG consultants using GPT-4 for 18 tasks
Maturity stage
Pilot/POC
Use-case type
Augmentation
Function
Strategy/Executive
Company size
Large
Evidence
12.2% more tasks; 25.1% faster; 40% higher quality
ROI / outcome figure
12.2% more tasks; 25.1% faster; 40% higher quality
Deep dive
The setup
BCG and Harvard ran a controlled experiment with 758 BCG consultants on 18 realistic consulting tasks (drafting, structuring, comparison, analysis), randomising access to GPT-4.
What happened
Consultants with AI completed 12.2% more tasks, 25.1% faster, and at 40% higher quality. The gains were largest for consultants whose baseline performance was below the median, and reversed on tasks at the edge of GPT-4's capability (the 'jagged frontier').
Root cause
The clearest evidence to date that GenAI is a great equaliser inside its capability frontier and a great trap outside it. Consultants who worked on tasks where GPT-4 had a known failure mode performed worse with AI than without.
Takeaway for teams considering similar work
RAPID's Alignment dimension surfaces this risk: AI initiatives without crisp scope are likely to drift across the jagged frontier, where outcomes become unpredictable. The takeaway for ops leaders is to build the deployment around the tasks where the model is reliably useful, then expand only after that envelope is well-mapped. This case anchors most of the drafting / structuring / comparison time-saved figures in the Task Mapper.
Why this case is cited as evidence
Measurement Maturity
Are there established KPIs, baselines, and evaluation cadences?
Apply this to your team
Take the RAPID assessment to see whether your organisation is exposed to the same failure modes as this case, or already has the discipline that made it work.