Case · 2023
BCG (Internal)
Controlled experiment with BCG consultants using GPT-4 for 18 tasks
Maturity stage
Pilot/POC
Use-case type
Augmentation
Function
Strategy/Executive
Company size
Large
Evidence
12.2% more tasks; 25.1% faster; 40% higher quality
ROI / outcome figure
12.2% more tasks; 25.1% faster; 40% higher quality
Deep dive
The setup
BCG and Harvard ran a controlled experiment with 758 BCG consultants on 18 realistic consulting tasks (drafting, structuring, comparison, analysis), randomising access to GPT-4.
What happened
Consultants with AI completed 12.2% more tasks, 25.1% faster, and at 40% higher quality. The gains were largest for consultants whose baseline performance was below the median, and reversed on tasks at the edge of GPT-4's capability (the 'jagged frontier').
Root cause
The clearest evidence to date that GenAI is a great equaliser inside its capability frontier and a great trap outside it. Consultants who worked on tasks where GPT-4 had a known failure mode performed worse with AI than without.
Takeaway for teams considering similar work
RAPID's Alignment dimension surfaces this risk: AI initiatives without crisp scope are likely to drift across the jagged frontier, where outcomes become unpredictable. The takeaway for ops leaders is to build the deployment around the tasks where the model is reliably useful, then expand only after that envelope is well-mapped. This case anchors most of the drafting / structuring / comparison time-saved figures in the Task Mapper.
Why this case is cited as evidence
Measurement Maturity
Are there established KPIs, baselines, and evaluation cadences?
Apply this to your team
Take the RAPID assessment to see whether your organisation is exposed to the same failure modes as this case, or already has the discipline that made it work.