Generative AI · Strategy Case Study · Retail · Women's Fashion E-commerce

Customer Feedback Intelligence

Turning thousands of unstructured customer reviews into same-day operational decisions — what to fix, how urgent it is, and who should act.

MIT Professional Education · Applied AI & Data Science

01 / The Challenge

It isn't volume.
It's ambiguity.

A retailer's reviews are full of mixed signals — praise and problems in the same breath. Traditional sentiment analysis flattens them to "neutral," and the most fixable issues stay buried in the text.

of mid-rated reviews carry contrasting positive & negative signals

still recommend the product — while reporting a real problem

I love this dress — but the brand runs large. I'd order a size down.

+ love the design
– sizing runs large

A 3-class model reads this as neutral — and routes it nowhere. The sizing signal, and the customer, are lost.
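The flattening effect can be illustrated with a toy scorer. This is a minimal sketch, not any specific production model: the cue-word lists and the function names `cue_counts`, `three_class`, and `is_mixed` are assumptions made for the example.

```python
# Toy cue-word lexicon. These words are illustrative assumptions,
# not a vocabulary taken from the case study.
POSITIVE = {"love", "great", "perfect"}
NEGATIVE = {"large", "small", "sheer"}


def cue_counts(text: str) -> tuple[int, int]:
    """Count positive and negative cue words in a review."""
    words = [w.strip(".,!'\"").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos, neg


def three_class(text: str) -> str:
    """Net the cues against each other, as a 3-class scheme effectively does."""
    pos, neg = cue_counts(text)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def is_mixed(text: str) -> bool:
    """Keep both signals instead of cancelling them out."""
    pos, neg = cue_counts(text)
    return pos > 0 and neg > 0


review = "I love this dress, but the brand runs large. I'd order a size down."
print(three_class(review), is_mixed(review))
```

In this sketch the praise and the sizing complaint cancel to "neutral", while the mixed check still sees both.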

→ 01

Fragmentation

Separate NLP models for sentiment, classification, summarisation — each with its own data and upkeep.

→ 02

Complexity

Cost and latency compound when review volume triples during seasonal peaks.

→ 03

Delay

Insight arrives in periodic reports, days after the signal — too late to act on.

02 / The Insight

Same rating.
Different customers.

Clustering on behaviour — not stars — reveals four customer types that traditional review metrics fail to distinguish. A single response strategy can't serve them all.

Silhouette · k=4
K-Means clustering

Segments were derived from behavioural signals, including review length, recommendation patterns, and sentiment conflict, revealing actionable differences that rating averages alone could not capture.
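The behavioural signals named above could be turned into a feature vector roughly as follows. This is a sketch under stated assumptions: the `Review` fields, the function name, and in particular the definition of "sentiment conflict" (star rating and recommendation flag pointing in opposite directions) are hypothetical, since the case study does not spell them out.

```python
from dataclasses import dataclass


@dataclass
class Review:
    text: str
    rating: int       # 1 to 5 stars
    recommends: bool  # the "Recommends: Yes/No" flag


def behavioural_features(r: Review) -> dict:
    """Derive clustering inputs from behaviour rather than from stars alone."""
    word_count = len(r.text.split())
    # Assumed definition: a mid or low rating that still recommends,
    # or a high rating that does not recommend.
    conflict = (r.rating <= 3 and r.recommends) or (r.rating >= 4 and not r.recommends)
    return {
        "review_length": word_count,
        "recommends": int(r.recommends),
        "sentiment_conflict": int(conflict),
    }


sample = Review("Runs large but I love it", rating=3, recommends=True)
print(behavioural_features(sample))
```

Vectors like these would then be scaled and passed to a K-Means implementation, with k chosen by silhouette score.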

Highest leverage

~2%

Power Reviewers

Detailed, trusted reviews other buyers read. Every response amplifies brand trust.

Signal: Long, influential, mixed ratings; a leading indicator of market perception.

Response: Human-in-the-loop on every output. Treat as brand-critical.

Revenue risk

~18%

Vocal Critics

Active churn risk — but their complaints concentrate on specific, fixable issues.

Signal: Low ratings, near-zero recommendation, highest conflict rate.

Response: Fast recovery on Critical/Actionable items; root-cause the recurring fault.

Growth opportunity

~36%

Detailed Advocates

Persuasive reviews that drive new-customer acquisition. Underused as an asset.

Signal: High ratings, 100% recommend, long detailed reviews.

Response: Surface to marketing; amplify and feed acquisition.

Stable base

~44%

Quick Endorsers

High satisfaction, low engagement depth. No urgent action — but easy to take for granted.

Signal: Short positive reviews, low feedback count.

Response: Monitor only; protect the experience that keeps them loyal.

The highest-value insight isn't from angry customers. It's from the ones who still recommend — while telling you exactly what to fix.

03 / The Solution

One review in.
Five decisions out.

A single GenAI pass replaces a fragmented stack of models — and returns outputs teams can act on, not just analyse.

Input · raw review

"I love this top and was so happy when it came out in black. I'm typically an 8 on top and sized up to a 10, and it still feels slightly small across the shoulders and arms. It also requires an under tank because it's very sheer, even with the lining. Worth it though!"

★ 4 / 5 · Recommends: Yes

Output · decision-ready

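Sentiment

Four-label, including mixed-negative: the nuance 3-class systems miss.

Category

The issue area the review concerns, such as fit & sizing, fabric / material, design, or quality.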

Summary

The customer's core message in ≤25 words.

Message

A brand-toned, specific reply ready for the customer.

Insight + urgency

Root cause, owning team, action, and a four-tier urgency tag.
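One way to keep these outputs decision-ready is to validate the model's response against a fixed schema before anything is routed. The sketch below assumes the model returns JSON; the field names, the `Decision` class, and `parse_decision` are hypothetical. Only "mixed-negative" is named in the source, so the other three sentiment labels are placeholders, and the example values are illustrative rather than actual system output.

```python
import json
from dataclasses import dataclass

# Only "mixed-negative" appears in the case study; the other labels are assumed.
SENTIMENTS = {"positive", "mixed-positive", "mixed-negative", "negative"}
URGENCY_TIERS = {"Critical", "Actionable", "Monitor", "Insight-Only"}
SUMMARY_MAX_WORDS = 25


@dataclass
class Decision:
    sentiment: str
    category: str
    summary: str
    message: str
    root_cause: str
    owner: str
    action: str
    urgency: str


def parse_decision(raw: str) -> Decision:
    """Parse a JSON response and reject anything outside the agreed schema."""
    decision = Decision(**json.loads(raw))  # TypeError on missing or extra keys
    if decision.sentiment not in SENTIMENTS:
        raise ValueError(f"unknown sentiment: {decision.sentiment!r}")
    if decision.urgency not in URGENCY_TIERS:
        raise ValueError(f"unknown urgency tier: {decision.urgency!r}")
    if len(decision.summary.split()) > SUMMARY_MAX_WORDS:
        raise ValueError("summary exceeds 25 words")
    return decision


# Illustrative response for the sample review; not real model output.
raw = json.dumps({
    "sentiment": "mixed-positive",
    "category": "Fit & sizing",
    "summary": "Loves the top but finds it small across shoulders and very sheer.",
    "message": "Thank you for the detailed feedback on fit and fabric.",
    "root_cause": "Runs small across shoulders and arms",
    "owner": "Product & Merchandising",
    "action": "Review size chart for this style",
    "urgency": "Actionable",
})
print(parse_decision(raw).urgency)
```

Rejecting malformed responses at this point keeps bad labels out of the downstream queues.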

04 / The Operating Model

Every signal routed to
the team that owns it.

The real shift isn't the model; it's the redesigned flow around it. Each review is tagged by urgency and sent to the right team with a concrete action.

Tier 01 · Critical

Owner: Customer Service

Defective or wrong item received. Immediate recovery — escalate within hours before the relationship is lost.

Tier 02 · Actionable

Owner: Product & Merchandising

Recurring product issue — sizing, fit, durability. Fix the root cause to cut returns at the source.

Tier 03 · Monitor

Owner: Product Analytics

Isolated issue, not yet a pattern. Track for emerging trends before it scales.

Tier 04 · Insight-Only

Owner: Marketing

Positive feedback and advocacy. Leverage for acquisition and social proof.
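The four tiers and their owners amount to a lookup table. A minimal sketch of that routing step follows; the tier and team names come from the tiers above, while `ROUTES`, `route`, `group_by_owner`, and the review IDs are hypothetical.

```python
from collections import defaultdict

# Tier -> owning team, as listed in the operating model.
ROUTES = {
    "Critical": "Customer Service",
    "Actionable": "Product & Merchandising",
    "Monitor": "Product Analytics",
    "Insight-Only": "Marketing",
}


def route(urgency: str) -> str:
    """Return the owning team for an urgency tier."""
    try:
        return ROUTES[urgency]
    except KeyError:
        raise ValueError(f"unknown urgency tier: {urgency!r}") from None


def group_by_owner(tagged_reviews: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (review_id, urgency) pairs into one queue per owning team."""
    queues = defaultdict(list)
    for review_id, urgency in tagged_reviews:
        queues[route(urgency)].append(review_id)
    return dict(queues)


batch = [("r1", "Critical"), ("r2", "Actionable"), ("r3", "Actionable")]
print(group_by_owner(batch))
```

Failing loudly on an unknown tier, rather than defaulting to a team, keeps mislabelled reviews from disappearing into the wrong queue.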

Human-in-the-loop, by design. A validation gate sits before any automation — CX teams confirm sentiment, urgency and tone, logging every correction as a labelled example. The system proposes; people decide. Trust is earned before scale.
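The validation gate could be sketched as a function that applies a reviewer's edits and records each correction as a labelled example. The dictionary keys, the `validate` function, and the in-memory list used as a log are assumptions for illustration; the case study does not describe its storage or interface.

```python
# Fields a CX reviewer confirms; "message" stands in for tone here (an assumption).
REVIEWED_FIELDS = ("sentiment", "urgency", "message")


def validate(proposal: dict, reviewer_edits: dict, log: list) -> dict:
    """Apply human edits to a model proposal and log every changed field."""
    final = dict(proposal)
    for name in REVIEWED_FIELDS:
        if name in reviewer_edits and reviewer_edits[name] != proposal.get(name):
            # Each correction becomes a labelled example for later evaluation.
            log.append({
                "review_id": proposal["review_id"],
                "field": name,
                "proposed": proposal.get(name),
                "corrected": reviewer_edits[name],
            })
            final[name] = reviewer_edits[name]
    final["approved_by_human"] = True
    return final


corrections = []
proposal = {
    "review_id": "r1",
    "sentiment": "mixed-negative",
    "urgency": "Monitor",
    "message": "Thanks for your feedback!",
}
final = validate(proposal, {"urgency": "Actionable"}, corrections)
print(final["urgency"], len(corrections))
```

Nothing is marked approved until a person has passed it through the gate, which matches the "system proposes; people decide" principle.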

05 / The Value

From reactive support to proactive decisions.

Decision latency, collapsed.

Feedback-to-insight drops from days or weeks to same-day — early enough to act before issues scale.

Configuration-driven scalability.

New categories, departments, or classification schemes can be introduced through prompt and workflow configuration.
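As a rough illustration of what configuration-driven means here, the prompt can be assembled from a config object so that adding a category is a data change, not a code change. The config keys, `build_prompt`, the prompt wording, and the added "Delivery" category are all hypothetical; the four base categories are the ones reported later in this case study.

```python
CONFIG = {
    "categories": ["Fit & sizing", "Fabric / material", "Design", "Quality"],
    "urgency_tiers": ["Critical", "Actionable", "Monitor", "Insight-Only"],
    "summary_max_words": 25,
}


def build_prompt(config: dict, review_text: str) -> str:
    """Assemble the instruction text from configuration values."""
    categories = ", ".join(config["categories"])
    tiers = ", ".join(config["urgency_tiers"])
    return (
        "Classify the customer review below.\n"
        f"Allowed categories: {categories}\n"
        f"Allowed urgency tiers: {tiers}\n"
        f"Summarise in at most {config['summary_max_words']} words.\n"
        f"Review: {review_text}"
    )


# Introducing a new category only touches the configuration.
extended = {**CONFIG, "categories": CONFIG["categories"] + ["Delivery"]}
print(build_prompt(extended, "Arrived late but fits well."))
```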

Brand voice, at scale.

Every reply reflects the customer's specific experience while holding a consistent tone across thousands of responses.

A cost structure that invites experiments.

Testing a new category is effectively free — no budget-approval cycle to learn something.

Processing economics

Per 1,000 reviews: $0.24

Full dataset · ~23K reviews: ~$5

Annual API ceiling: <$100

The real investment is operational — embedding the workflow into teams — not the technology.
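The figures above can be sanity-checked with simple arithmetic. The sketch below takes the per-1,000 rate and the approximate dataset size from the page; reading the annual ceiling as a volume limit at that same rate is an assumption, and the variable names are hypothetical.

```python
COST_PER_1000_REVIEWS = 0.24   # USD, from the processing economics above
DATASET_SIZE = 23_000          # approximate ("~23K reviews")
ANNUAL_CEILING = 100.0         # USD, upper bound quoted above


def processing_cost(n_reviews: int) -> float:
    """API cost in USD for a given number of reviews at the quoted rate."""
    return n_reviews * COST_PER_1000_REVIEWS / 1000


dataset_cost = processing_cost(DATASET_SIZE)
# Assumption: how many reviews per year fit under the ceiling at this rate.
max_reviews_per_year = ANNUAL_CEILING / COST_PER_1000_REVIEWS * 1000

print(f"Full dataset: ${dataset_cost:.2f}")
print(f"Reviews per year under the ceiling: {max_reviews_per_year:,.0f}")
```

At the quoted rate the full dataset comes to about $5.52, consistent with the rounded ~$5, and roughly 416,000 reviews a year would stay under $100.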

48% Fit & sizing

24% Fabric / material

18% Design

9% Quality

The system's own output names the primary cost driver: fit & sizing — the single highest-leverage fix in apparel retail.
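A breakdown like this one is a straightforward tally over the per-review category labels. A minimal sketch follows, assuming each processed review carries a single category string; the function name and the four-item sample are illustrative and are not the case study's data.

```python
from collections import Counter


def category_shares(categories: list[str]) -> dict[str, int]:
    """Percentage share of each category, largest first."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.most_common()}


# Illustrative labels only; not taken from the dataset.
labels = ["Fit & sizing", "Design", "Fit & sizing", "Quality"]
shares = category_shares(labels)
primary_driver = next(iter(shares))
print(shares, primary_driver)
```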

06 / Under the Hood

Built for production, not demos.

Six prompt configurations — zero-shot, few-shot and chain-of-thought, each in baseline and business-aware versions — were scored against two LLM-as-judge rubrics.

Few-shot won on consistency, not peak score. Chain-of-thought reached the highest single scores but proved unstable under strict, business-aligned evaluation — the wrong trade-off for high-volume production. Choosing the reliable option over the flashy one is the call that matters.
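The selection logic described here, preferring the most stable configuration over the highest single score, can be sketched as below. The six configuration names follow the three strategies and two versions named above; the scores are placeholder numbers for illustration, not results from the study, and `most_stable` is a hypothetical helper.

```python
from itertools import product
from statistics import mean, pstdev

STRATEGIES = ["zero-shot", "few-shot", "chain-of-thought"]
VERSIONS = ["baseline", "business-aware"]

# Three strategies x two versions = six prompt configurations.
CONFIGS = [f"{s} / {v}" for s, v in product(STRATEGIES, VERSIONS)]


def most_stable(scores: dict[str, list[float]]) -> str:
    """Pick the lowest-variance configuration; break ties by higher mean."""
    return min(scores, key=lambda c: (pstdev(scores[c]), -mean(scores[c])))


# Placeholder judge scores, NOT data from the case study.
example_scores = {
    "few-shot / business-aware": [4.0, 4.0, 4.0],
    "chain-of-thought / business-aware": [5.0, 3.0, 4.5],
}

print(len(CONFIGS), most_stable(example_scores))
```

In this toy data chain-of-thought holds the single highest score, yet the stability criterion still selects few-shot.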

Recommendation-prediction accuracy on a held-out task the model was never optimised for.

SD 0

Lowest variance of any configuration tested — the basis for the production choice.
