CASE-040 · Success · Tier 1 · Professional Services/Consulting

Case · 2023

BCG (Internal)

Controlled experiment with BCG consultants using GPT-4 for 18 tasks

Maturity stage

Pilot/POC

Use-case type

Augmentation

Function

Strategy/Executive

Company size

Large

Evidence

12.2% more tasks; 25.1% faster; 40% higher quality

ROI / outcome figure

12.2% more tasks; 25.1% faster; 40% higher quality

Deep dive

The setup

BCG and Harvard ran a controlled experiment with 758 BCG consultants on 18 realistic consulting tasks (drafting, structuring, comparison, analysis), randomising access to GPT-4.

What happened

Consultants with AI access completed 12.2% more tasks, 25.1% faster, and at 40% higher quality than the control group. The gains were largest for consultants below median baseline performance, and the advantage disappeared, or reversed, on tasks placed beyond GPT-4's capability (the 'jagged frontier').

Root cause

This is the clearest evidence to date that GenAI is a great equaliser inside its capability frontier and a great trap outside it. Consultants who worked on tasks where GPT-4 had a known failure mode performed worse with AI than without.

Takeaway for teams considering similar work

RAPID's Alignment dimension surfaces this risk: AI initiatives without crisp scope are likely to drift across the jagged frontier, where outcomes become unpredictable. The takeaway for ops leaders is to build the deployment around the tasks where the model is reliably useful, then expand only after that envelope is well-mapped. This case anchors most of the drafting/structuring/comparison time-saved figures in the Task Mapper.

Why this case is cited as evidence

  • Measurement Maturity

    Are there established KPIs, baselines, and evaluation cadences?

Apply this to your team

Take the RAPID assessment to see whether your organisation is exposed to the same failure modes as this case, or already has the discipline that made it work.
