Macbeth Characters
List the characters in Shakespeare's Macbeth, in order of first appearance, one per line, no other text.
- Outcome: 5/8 models achieved 100% accuracy (4/4 runs correct), and all five listed the witches first in every run. The remaining 3 models, all Claude variants, repeatedly omitted the witches or listed them late, misrepresenting the play's opening scene.
- Approach: Gemini models employed meticulous scene-by-scene verification, including minor characters like apparitions and both English/Scottish doctors. GPT-5 matched this thoroughness at roughly 1.8x the cost of Gemini 2.5 Pro ($0.082 vs. $0.045). Claude Opus 4.5 used an ultra-minimalist approach (79 output tokens avg) that catastrophically omitted the three witches.
- Performance: Among the perfect scorers, Gemini 3 Pro was fastest (39s) and Kimi K2 cheapest ($0.016); Grok 4 (99s, $0.064) beat GPT-5 on both speed and cost but trailed both Gemini models. Claude Opus 4.5 was fastest (3.14s) and cheapest ($0.000111) overall but scored 0/4. Kimi K2 was the slowest model (257s).
- Most Surprising: Claude Opus 4.5, the newest model tested, failed all 4 runs by omitting the iconic opening witches, a consistent error on what is a simple factual-recall task.
Summary
This verifiable task required precise ordering of Macbeth characters by first appearance. Five models achieved perfect 4/4 accuracy (Gemini 2.5 Pro, Gemini 3 Pro, GPT-5, Grok 4, Kimi K2), demonstrating strong consensus on the witches' primacy. The three Claude variants failed systematically: Opus 4.5 scored 0/4 by completely omitting the opening witches, Sonnet 4.5 scored 1/4, and Opus 4.1 showed inconsistency (2/4). Gemini 2.5 Pro emerged as the definitive winner through superior comprehensiveness, listing 36+ characters per run including apparitions, ghost, and both doctors, while maintaining high consistency.
Outcome Analysis
What the models produced:
Consensus: All five top-performing models began every run with the three witches (either as "First Witch, Second Witch, Third Witch" or "Three Witches"), correctly identifying them as the play's opening characters. These five models make up 62.5% (5/8) of those tested.
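The report never states its pass criterion explicitly, but every judgment in it treats a run as correct when the list opens with the witches, in either individual or grouped form. A minimal Python sketch of that rule, assuming exactly that criterion; `WITCH_LABELS`, `opens_with_witches`, and `pass_rate` are hypothetical names, not part of any grading harness described here:

```python
# Hypothetical scoring rule inferred from the report: a run passes when its
# first non-empty line names the witches, individually or as a group.
WITCH_LABELS = {"first witch", "three witches", "witches"}


def opens_with_witches(response: str) -> bool:
    """Return True if the first non-empty line of a response names the witches."""
    lines = [ln.strip().lower() for ln in response.splitlines() if ln.strip()]
    return bool(lines) and lines[0] in WITCH_LABELS


def pass_rate(runs: list[str]) -> str:
    """Report passing runs as 'k/n', matching the report's accuracy notation."""
    passed = sum(opens_with_witches(r) for r in runs)
    return f"{passed}/{len(runs)}"


runs = [
    "First Witch\nSecond Witch\nThird Witch\nDuncan",
    "Duncan\nMalcolm\nDonalbain",
    "Three Witches\nDuncan",
    "Witches\nDuncan",
]
print(pass_rate(runs))
```

Under this rule a run that opens with Duncan fails no matter how many characters it lists, so the short Claude lists and the long Gemini lists are scored on the same axis.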
Key Divergences:
- Critical Failure - Claude Opus 4.5: All 4 runs opened with Duncan and omitted the witches entirely, a catastrophic error for a "first appearance" task. Possible explanations include a training-data bias toward narrative summaries (which often start with Duncan) or a gap in knowledge of the play's structure.
- Partial Failure - Claude Sonnet 4.5: Only 1/4 runs included witches (Iteration 3, listed as "Witches"). Three runs began with Duncan, mirroring Opus 4.5's blind spot.
- Inconsistency - Claude Opus 4.1: Showed concerning non-determinism: Iterations 2 and 3 were correct, while Iterations 1 and 4 listed the witches late.
- Completeness Variation: Among accurate models, Gemini 2.5 Pro and GPT-5 were most thorough, listing 36-38 characters including supernatural entities (Apparitions, Ghost of Banquo) and both English/Scottish Doctors. Grok 4 and Kimi K2 were moderately thorough (29-34 characters), while Gemini 3 Pro balanced detail with conciseness.
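Character counts such as "36-38" depend on how name variants are merged: the raw responses use "Captain", "Sergeant", and "Bleeding Sergeant" for one role, and "Son", "Macduff's Son", and "Son to Macduff" for another. The sketch below canonicalizes names before counting. The alias table is an illustrative assumption built from variants visible in the responses, not the report's actual rubric, and `canonical` and `distinct_characters` are hypothetical helpers.

```python
# Illustrative alias table (an assumption, not the report's rubric): maps
# lowercase name variants seen in the raw responses to one canonical label.
ALIASES = {
    "captain": "Sergeant",
    "bleeding sergeant": "Sergeant",
    "king duncan": "Duncan",
    "banquo's ghost": "Ghost of Banquo",
    "son": "Son of Macduff",
    "macduff's son": "Son of Macduff",
    "son to macduff": "Son of Macduff",
    "a lord": "Lord",
    "waiting-gentlewoman": "Gentlewoman",
    "murderer 1": "First Murderer",
    "murderer 2": "Second Murderer",
    "murderer 3": "Third Murderer",
}


def canonical(name: str) -> str:
    """Map a listed name to its canonical label; unknown names pass through trimmed."""
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def distinct_characters(response: str) -> list[str]:
    """Canonicalize each non-empty line and drop repeats, keeping first-seen order."""
    names = (canonical(line) for line in response.splitlines() if line.strip())
    return list(dict.fromkeys(names))  # dicts preserve insertion order


print(distinct_characters("First Witch\nCaptain\nBleeding Sergeant\nSon\nMacduff's Son"))
```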
Approach Analysis
Best methodology: Gemini 2.5 Pro employed scene-by-scene verification, cross-referencing act/scene numbers to check its ordering. This systematic approach captured nuanced characters like "First/Second/Third Apparition" and distinguished between English and Scottish Doctors, demonstrating scholarly precision.
Most problematic approach: Claude Opus 4.5 used an ultra-minimalist output averaging only 79 tokens (vs. 4,537 for Gemini 2.5 Pro). While fast and cheap, these short lists omitted critical characters, showing that minimalism fails when completeness is required.
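Scene-by-scene verification amounts to sorting by an (act, scene, entry) key. The Python sketch below assumes a small hand-built index: the act/scene locations follow the standard act and scene divisions, while the within-scene entry numbers and the names `FIRST_APPEARANCE`, `order_by_first_appearance`, and `is_in_first_appearance_order` are hypothetical. Python compares tuples lexicographically, so act outranks scene, which outranks entry order.

```python
# Hypothetical first-appearance index: (act, scene, entry order within scene).
# Act/scene values follow the standard divisions; entry ordinals are illustrative.
FIRST_APPEARANCE = {
    "First Witch": (1, 1, 0),
    "Duncan": (1, 2, 0),
    "Macbeth": (1, 3, 1),
    "Lady Macbeth": (1, 5, 0),
    "Porter": (2, 3, 0),
    "Ghost of Banquo": (3, 4, 1),
    "Hecate": (3, 5, 1),
    "First Apparition": (4, 1, 1),
    "Lady Macduff": (4, 2, 0),
    "Seyton": (5, 3, 1),
}


def order_by_first_appearance(names: list[str]) -> list[str]:
    """Sort names by their (act, scene, entry) tuple; raises KeyError for unknown names."""
    return sorted(names, key=lambda n: FIRST_APPEARANCE[n])


def is_in_first_appearance_order(names: list[str]) -> bool:
    """Check that each name's key is no earlier than the previous name's key."""
    keys = [FIRST_APPEARANCE[n] for n in names]
    return all(a <= b for a, b in zip(keys, keys[1:]))


print(order_by_first_appearance(["Porter", "Duncan", "First Witch", "Hecate"]))
```

A full version would need every character. The lookup raising `KeyError` for unknown names is deliberate: it surfaces alias mismatches instead of silently misordering them.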
Structural differences:
- Gemini/GPT models: Used individual line items with ordinal designations (First Witch, Second Murderer) for clarity
- Claude models: Sometimes grouped characters ("Three Witches", "Murderers"), which is acceptable but occasionally ambiguous
- Kimi K2: Included thinking tokens (visible in metadata) suggesting internal verification, yet still produced ordering errors in some runs
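Grouped labels can only be compared with itemized lists after expansion. A minimal sketch, assuming a hypothetical `GROUPS` table and `expand_groups` helper. Mapping "Murderers" to three roles is itself a judgment call, because only two murderers appear in Act 3, Scene 1 and the Third Murderer joins in Act 3, Scene 3. That is exactly the ambiguity grouped labels create.

```python
# Hypothetical expansion table: group label (lowercase) -> individual roles.
# Treating "Murderers" as three roles is an assumption; the Third Murderer
# only appears from Act 3, Scene 3.
GROUPS = {
    "three witches": ["First Witch", "Second Witch", "Third Witch"],
    "witches": ["First Witch", "Second Witch", "Third Witch"],
    "murderers": ["First Murderer", "Second Murderer", "Third Murderer"],
}


def expand_groups(names: list[str]) -> list[str]:
    """Replace each group label with its members; other names pass through trimmed."""
    expanded: list[str] = []
    for name in names:
        expanded.extend(GROUPS.get(name.strip().lower(), [name.strip()]))
    return expanded


print(expand_groups(["Three Witches", "Duncan", "Murderers"]))
```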
Performance Table
| Model | Accuracy | Rank | Avg Cost | Avg Time | Avg Tokens (In/Out) |
|---|---|---|---|---|---|
| gemini-2.5-pro | 4/4 | 1st | $0.045 | 44s | 23/4537 |
| gemini-3-pro | 4/4 | 2nd | $0.056 | 39s | 24/4625 |
| gpt-5 | 4/4 | 3rd | $0.082 | 118s | 28/8192 |
| grok-4 | 4/4 | 4th | $0.064 | 99s | 705/4099 |
| kimi-k2-thinking | 4/4 | 5th | $0.016 | 257s | 30/6950 |
| claude-opus-4.1 | 2/4 | 6th | $0.009 | 4s | 32/120 |
| claude-sonnet-4.5 | 1/4 | 7th | $0.002 | 3s | 32/102 |
| claude-opus-4.5 | 0/4 | 8th | $0.0001 | 3s | 32/79 |
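Avg Cost on its own rewards failing cheaply. One way to fold accuracy into cost is spend per correct run, sketched below in Python. The sketch assumes Avg Cost is the mean cost of a single run. The figures are copied from the table, and `cost_per_correct_run` is a hypothetical helper. The same data reproduces GPT-5's cost premium over Gemini 2.5 Pro (about 82%).

```python
import math

# (correct runs out of 4, avg cost per run in USD), copied from the Performance Table.
# Assumption: "Avg Cost" is the mean cost of one run, so four runs cost 4x that.
TABLE = {
    "gemini-2.5-pro": (4, 0.045),
    "gemini-3-pro": (4, 0.056),
    "gpt-5": (4, 0.082),
    "grok-4": (4, 0.064),
    "kimi-k2-thinking": (4, 0.016),
    "claude-opus-4.1": (2, 0.009),
    "claude-sonnet-4.5": (1, 0.002),
    "claude-opus-4.5": (0, 0.0001),
}
RUNS_PER_MODEL = 4


def cost_per_correct_run(correct: int, avg_cost: float) -> float:
    """Total spend across all runs divided by the number of correct runs."""
    if correct == 0:
        return math.inf  # no correct runs: the spend buys nothing
    return avg_cost * RUNS_PER_MODEL / correct


# GPT-5's relative cost premium over Gemini 2.5 Pro
gpt5_premium = TABLE["gpt-5"][1] / TABLE["gemini-2.5-pro"][1] - 1

for model, (correct, cost) in TABLE.items():
    print(f"{model:18} {cost_per_correct_run(correct, cost):.4f}")
print(f"GPT-5 premium over Gemini 2.5 Pro: {gpt5_premium:.0%}")
```

By this measure Claude Opus 4.5's cost per correct run is infinite, which puts a number on the "false economy" point under Key Findings.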
Key Findings
Outcome:
- 5/8 models achieved 100% accuracy, creating clear tier separation
- The Claude family showed a systematic blind spot for the opening witches (all 3 Claude models failed)
- No middle ground: every model either scored 4/4 or failed at least half of its runs
Approach:
- 🏆 Gemini 2.5 Pro's scholarly precision: Included Act 4's three apparitions as distinct characters, showing deep textual knowledge
- GPT-5's exhaustive completeness: Matched Gemini's thoroughness but at 82% higher cost
- Claude Opus 4.5's catastrophic minimalism: 79-token average destroyed task accuracy
Performance:
- ⚡ Grok 4's middle ground: perfect accuracy at 99s and $0.064, faster and cheaper than GPT-5 but slower and costlier than both Gemini models
- 💰 Claude Opus 4.5's false economy: cheapest ($0.0001) and fastest (3s) but 0% accuracy, so its low cost buys nothing
- Kimi K2's puzzling inefficiency: slowest by far (257s) despite producing fewer output tokens than GPT-5, possibly reflecting architecture or serving overhead
Surprises & Outliers:
- 🚨 Claude Opus 4.5's complete systematic failure: the most advanced Claude model tested made the same error in every run, omitting the play's iconic opening; one possible cause is training data that favors plot summaries over the primary text
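The Kimi K2 observation under Performance can be made concrete as output throughput. A minimal sketch, assuming the output token figures and average times in the Performance Table are both per-run averages. `PERF` and `tokens_per_second` are hypothetical names. Throughput alone cannot separate serving overhead from model architecture, so either explanation remains a hypothesis.

```python
# (avg output tokens, avg seconds per run) for the 4/4 models, copied from the
# Performance Table. Assumption: both figures are per-run averages.
PERF = {
    "gemini-2.5-pro": (4537, 44),
    "gemini-3-pro": (4625, 39),
    "gpt-5": (8192, 118),
    "grok-4": (4099, 99),
    "kimi-k2-thinking": (6950, 257),
}


def tokens_per_second(output_tokens: int, seconds: float) -> float:
    """Average output throughput of a run."""
    return output_tokens / seconds


rates = {model: tokens_per_second(t, s) for model, (t, s) in PERF.items()}
for model, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"{model:18} {rate:6.1f} tok/s")
```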
Response Highlights
Best Response (Gemini 2.5 Pro, Iteration 4):
First Witch, Second Witch, Third Witch, Duncan, Malcolm, Donalbain, Lennox, Sergeant, Ross, Angus, Macbeth, Banquo, Lady Macbeth, Messenger, Macduff, Fleance, Porter, Old Man, First Murderer, Second Murderer, Third Murderer, Ghost of Banquo, Hecate, Lord, First Apparition, Second Apparition, Third Apparition, Lady Macduff, Son, Doctor, Gentlewoman, Menteith, Caithness, Seyton, Siward, Young Siward
Most Problematic (Claude Opus 4.5, all iterations):
Duncan, Malcolm, Donalbain, Sergeant, Lennox, Ross, Macbeth, Banquo, Angus, Lady Macbeth, Messenger, Fleance, Porter, Macduff, Lady Macduff, Son, Doctor, Gentlewoman, Siward, Young Siward
Most Creative Approach (Grok 4, Iteration 3):
First Witch, Second Witch, Third Witch, Duncan, Malcolm, Donalbain, Lennox, Bleeding Sergeant, Ross, Macbeth, Banquo, Angus, Lady Macbeth, Macduff, Fleance, Porter, Old Man, Hecate...
Ranking Justification
1st place (Gemini 2.5 Pro): Perfect 4/4 accuracy with unmatched comprehensiveness (36-38 characters/run), including nuanced distinctions like both doctors and three individual apparitions. High consistency and scholarly attention to detail justify the moderate cost.
2nd place (Gemini 3 Pro): Matched 4/4 accuracy with slightly leaner outputs (32-35 characters/run) and higher cost ($0.056 vs. $0.045) but faster speed (39s vs. 44s). Quality nearly identical to 2.5 Pro.
3rd place (GPT-5): Perfect accuracy with exhaustive detail, but penalized for excessive cost ($0.082 avg) and slow speed (118s), making it impractical for high-volume tasks despite quality.
4th place (Grok 4): Perfect accuracy at 99s and $0.064, faster and cheaper than GPT-5 but behind both Gemini models on speed and cost. Slightly less thorough than the top three.
5th place (Kimi K2): Perfect 4/4 on the witches-first criterion, but marred by minor ordering errors elsewhere (Banquo before Macbeth in Iteration 4) and extreme slowness (257s avg). High token usage without commensurate quality gains.
6th place (Claude Opus 4.1): Only 2/4 accuracy with inconsistent behavior across runs. Fast (4s) and cheap ($0.009) but unreliable for deterministic tasks.
7th place (Claude Sonnet 4.5): 1/4 accuracy; only one run listed the witches first. Marginally more reliable than Opus 4.5 but still systematically omits critical characters.
8th place (Claude Opus 4.5): 0/4 accuracy represents complete task failure despite being fastest (3.14s) and cheapest ($0.000111). Demonstrates that efficiency metrics are meaningless without outcome quality.
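Ordering slips such as Banquo before Macbeth can be quantified by counting inverted pairs against a reference order. This is the Kendall tau distance restricted to the names both lists share. A minimal Python sketch: `ordering_inversions` is a hypothetical helper, and the reference list is an assumption supplied by the caller. Omissions are deliberately not counted, so a list that drops the witches (as Claude Opus 4.5's did) needs a separate coverage check.

```python
from itertools import combinations


def ordering_inversions(response: list[str], reference: list[str]) -> int:
    """Count pairs of shared names that appear in the opposite order to `reference`.

    Names missing from either list are ignored, so omissions are not penalized here.
    O(n^2) in the number of shared names, which is fine for ~40 characters.
    """
    rank = {name: i for i, name in enumerate(reference)}
    shared = [name for name in response if name in rank]
    return sum(1 for a, b in combinations(shared, 2) if rank[a] > rank[b])


reference = ["First Witch", "Duncan", "Macbeth", "Banquo", "Lady Macbeth"]
print(ordering_inversions(["First Witch", "Duncan", "Banquo", "Macbeth"], reference))
```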
Raw Responses
Duncan
Malcolm
Donalbain
Lennox
Captain
Ross
Macbeth
Banquo
First Witch
Second Witch
Third Witch
Angus
Lady Macbeth
Messenger
Fleance
Porter
Macduff
Old Man
Murderers
Hecate
Lady Macduff
Son
Doctor
Gentlewoman
Menteith
Caithness
Seyton
Siward
Young Siward
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Donalbain
Sergeant
Ross
Lennox
Macbeth
Banquo
Angus
Lady Macbeth
Attendant
Fleance
Porter
Macduff
Old Man
Murderer 1
Murderer 2
Murderer 3
Hecate
Lord
Lady Macduff
Son
Messenger
Doctor
Gentlewoman
Menteith
Caithness
Seyton
Young Siward
Siward
Three Witches
Duncan
Malcolm
Donalbain
Lennox
Ross
Macbeth
Banquo
Angus
Lady Macbeth
Attendant
Fleance
Porter
Macduff
Old Man
Murderers
Hecate
Lords
Lady Macduff
Son
Messenger
Doctor
Gentlewoman
Menteith
Caithness
Seyton
Young Siward
Siward
Duncan
Malcolm
Donalbain
Lennox
Captain
Ross
Macbeth
Banquo
First Witch
Second Witch
Third Witch
Angus
Lady Macbeth
Messenger
Fleance
Porter
Macduff
Old Man
Murderers
Hecate
Lady Macduff
Son
Doctor
Gentlewoman
Menteith
Caithness
Seyton
Young Siward
Siward
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Donalbain
Lennox
Captain
Ross
Angus
Macbeth
Banquo
Lady Macbeth
Fleance
Porter
Macduff
Old Man
First Murderer
Second Murderer
Third Murderer
Hecate
Lady Macduff
Son of Macduff
English Doctor
Scottish Doctor
Gentlewoman
Menteith
Caithness
Seyton
Siward
Young Siward
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Donalbain
Lennox
Sergeant
Ross
Angus
Macbeth
Banquo
Lady Macbeth
Macduff
Fleance
Porter
Old Man
First Murderer
Second Murderer
Third Murderer
Hecate
Lord
Lady Macduff
Son of Macduff
English Doctor
Doctor of Physic
Gentlewoman
Menteith
Caithness
Seyton
Siward
Young Siward
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Donalbain
Lennox
Captain
Ross
Angus
Macbeth
Banquo
Lady Macbeth
Fleance
Porter
Macduff
Old Man
First Murderer
Second Murderer
Third Murderer
Hecate
Lord
First Apparition
Second Apparition
Third Apparition
Lady Macduff
Son of Macduff
English Doctor
Scottish Doctor
Gentlewoman
Menteith
Caithness
Seyton
Siward
Young Siward
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Donalbain
Lennox
Captain
Ross
Angus
Macbeth
Banquo
Lady Macbeth
Macduff
Fleance
Porter
Old Man
First Murderer
Second Murderer
Third Murderer
Hecate
Lord
First Apparition
Second Apparition
Third Apparition
Lady Macduff
Son of Macduff
Doctor
Gentlewoman
Menteith
Caithness
Seyton
Siward
Young Siward
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Donalbain
Lennox
Sergeant
Ross
Angus
Macbeth
Banquo
Lady Macbeth
Messenger
Macduff
Fleance
Porter
Old Man
First Murderer
Second Murderer
Servant
Third Murderer
Ghost of Banquo
Hecate
Lord
First Apparition
Second Apparition
Third Apparition
Lady Macduff
Son
Doctor
Gentlewoman
Menteith
Caithness
Seyton
Siward
Young Siward
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Donalbain
Lennox
Sergeant
Ross
Angus
Macbeth
Banquo
Lady Macbeth
Macduff
Fleance
Porter
Old Man
First Murderer
Second Murderer
Third Murderer
Ghost of Banquo
Hecate
Lord
First Apparition
Second Apparition
Third Apparition
Lady Macduff
Macduff's Son
Messenger
English Doctor
Scottish Doctor
Gentlewoman
Menteith
Caithness
Seyton
Young Siward
Siward
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Donalbain
Lennox
Sergeant
Ross
Angus
Macbeth
Banquo
Lady Macbeth
Attendant
Fleance
Porter
Macduff
Old Man
Servant
First Murderer
Second Murderer
Third Murderer
Banquo's Ghost
Hecate
Lord
First Apparition
Second Apparition
Third Apparition
Lady Macduff
Macduff's Son
Messenger
English Doctor
Scottish Doctor
Gentlewoman
Menteith
Caithness
Seyton
Siward
Young Siward
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Donalbain
Lennox
Sergeant
Ross
Angus
Macbeth
Banquo
Lady Macbeth
Messenger
Fleance
Servant
Porter
Macduff
Old Man
First Murderer
Second Murderer
Third Murderer
Ghost of Banquo
Hecate
A Lord
First Apparition
Second Apparition
Third Apparition
Lady Macduff
Son to Macduff
Doctor
Waiting-Gentlewoman
Menteith
Caithness
Seyton
Siward
Young Siward
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Donalbain
Lennox
Sergeant
Ross
Macbeth
Banquo
Angus
Lady Macbeth
Macduff
Fleance
Porter
Old Man
First Murderer
Second Murderer
Hecate
Third Murderer
Lord
Lady Macduff
Macduff's Son
English Doctor
Gentlewoman
Doctor
Menteith
Caithness
Seyton
Siward
Young Siward
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Donalbain
Lennox
Bleeding Sergeant
Ross
Macbeth
Banquo
Angus
Lady Macbeth
Macduff
Fleance
Porter
Old Man
Hecate
First Murderer
Second Murderer
Lady Macduff
Son
English Doctor
Gentlewoman
Doctor
Menteith
Caithness
Seyton
Siward
Young Siward
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Donalbain
Lennox
Sergeant
Ross
Macbeth
Banquo
Angus
Lady Macbeth
Messenger
Fleance
Servant
Porter
Macduff
Old Man
First Murderer
Second Murderer
Third Murderer
Hecate
Lord
Lady Macduff
Son of Macduff
English Doctor
Gentlewoman
Scottish Doctor
Menteith
Caithness
Siward
Young Siward
Seyton
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Donalbain
Lennox
Bleeding Sergeant
Ross
Macbeth
Banquo
Angus
Lady Macbeth
Macduff
Fleance
Porter
Old Man
First Murderer
Second Murderer
Third Murderer
Hecate
Lord
Lady Macduff
Son
English Doctor
Gentlewoman
Doctor
Menteith
Caithness
Seyton
Siward
Young Siward
Duncan
Malcolm
Donalbain
Sergeant
Lennox
Ross
Macbeth
Banquo
Angus
Lady Macbeth
Messenger
Fleance
Porter
Macduff
Lady Macduff
Son
Doctor
Gentlewoman
Siward
Young Siward
Duncan
Malcolm
Donalbain
Sergeant
Lennox
Ross
Macbeth
Banquo
Angus
Lady Macbeth
Messenger
Fleance
Porter
Macduff
Lady Macduff
Son of Macduff
Doctor
Gentlewoman
Siward
Young Siward
Duncan
Malcolm
Donalbain
Sergeant
Lennox
Ross
Macbeth
Banquo
Angus
Lady Macbeth
Messenger
Fleance
Porter
Macduff
Lady Macduff
Son
Doctor
Gentlewoman
Siward
Young Siward
Duncan
Malcolm
Donalbain
Sergeant
Lennox
Ross
Macbeth
Banquo
Angus
Lady Macbeth
Messenger
Fleance
Porter
Macduff
Lady Macduff
Son
Doctor
Gentlewoman
Siward
Young Siward
Duncan
Malcolm
Donalbain
Macbeth
Banquo
Ross
Angus
Lady Macbeth
Fleance
Macduff
Lennox
Porter
Lady Macduff
Son
Siward
Young Siward
Seyton
Hecate
Menteith
Caithness
Messenger
Doctor
Gentlewoman
Witches
Duncan
Malcolm
Captain
Lennox
Ross
Angus
Macbeth
Banquo
Lady Macbeth
Attendant
Fleance
Porter
Macduff
Donalbain
Old Man
Murderers
Servant
Hecate
Lady Macduff
Son
Messenger
Doctor
Gentlewoman
Menteith
Caithness
Siward
Young Siward
Seyton
Duncan
Malcolm
Donalbain
Macbeth
Banquo
Lennox
Ross
Angus
Lady Macbeth
Attendant
Messenger
Macduff
Porter
Old Man
Siward
Young Siward
Seyton
Doctor
Gentlewoman
Fleance
Hecate
Murderers
Duncan
Malcolm
Donalbain
Macbeth
Banquo
Lennox
Ross
Angus
Lady Macbeth
Attendant
Fleance
Porter
Macduff
Lady Macduff
Son
Siward
Young Siward
Seyton
Hecate
Witches
Murderers
Messenger
Doctor
Gentlewoman
Old Man
Servant
Captain
Lords
Soldiers
Attendants
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Donalbain
Lennox
Sergeant
Ross
Angus
Macbeth
Banquo
Lady Macbeth
Messenger
Macduff
Fleance
Servant
Porter
Old Man
First Murderer
Second Murderer
Third Murderer
Ghost of Banquo
Hecate
Lord
First Apparition
Second Apparition
Third Apparition
Lady Macduff
Macduff's Son
Doctor
Gentlewoman
Menteith
Caithness
Seyton
Siward
Young Siward
First Witch
Second Witch
Third Witch
King Duncan
Malcolm
Donalbain
Lennox
Sergeant
Ross
Macbeth
Banquo
Angus
Lady Macbeth
Messenger
Fleance
Servant
Porter
Macduff
Old Man
First Murderer
Second Murderer
Third Murderer
Hecate
Lord
First Apparition
Second Apparition
Third Apparition
Lady Macduff
Son
Doctor
Gentlewoman
Menteith
Caithness
Seyton
Siward
Young Siward
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Donalbain
Lennox
Sergeant
Ross
Angus
Macbeth
Banquo
Lady Macbeth
Messenger
Fleance
Porter
Macduff
Old Man
First Murderer
Second Murderer
Third Murderer
Hecate
First Apparition
Second Apparition
Third Apparition
Lady Macduff
Macduff's Son
Doctor
Gentlewoman
Menteith
Caithness
Seyton
Siward
Young Siward
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Donalbain
Lennox
Sergeant
Ross
Angus
Macbeth
Banquo
Lady Macbeth
Messenger
Fleance
Porter
Macduff
Old Man
First Murderer
Second Murderer
Third Murderer
Hecate
First Apparition
Second Apparition
Third Apparition
Lady Macduff
Macduff's Son
Doctor
Gentlewoman
Menteith
Caithness
Seyton
Siward
Young Siward
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Sergeant
Ross
Lennox
Banquo
Macbeth
Angus
Lady Macbeth
Messenger
Fleance
Porter
Macduff
Donalbain
Old Man
First Murderer
Second Murderer
Third Murderer
Hecate
Lord
Lady Macduff
Son
Doctor
Gentlewoman
Menteith
Caithness
Seyton
Siward
Young Siward
First Witch
Second Witch
Third Witch
Captain
Duncan
Malcolm
Donalbain
Lennox
Ross
Angus
Macbeth
Banquo
Lady Macbeth
Messenger
Fleance
Servant
Porter
Macduff
Old Man
Ghost of Banquo
Hecate
Lord
Lady Macduff
Son
Doctor
Siward
Gentlewoman
Menteith
Caithness
Seyton
Young Siward
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Donalbain
Lennox
Captain
Ross
Angus
Macbeth
Banquo
Lady Macbeth
Messenger
Macduff
Fleance
Porter
Old Man
First Murderer
Second Murderer
Third Murderer
Ghost of Banquo
Hecate
Lord
First Apparition
Second Apparition
Third Apparition
Lady Macduff
Macduff's Son
Doctor
Siward
Menteith
Caithness
Waiting-Gentlewoman
Young Siward
Seyton
First Witch
Second Witch
Third Witch
Duncan
Malcolm
Donalbain
Lennox
Captain
Ross
Banquo
Macbeth
Angus
Lady Macbeth
Macduff
Fleance
Porter
Old Man
First Murderer
Second Murderer
Third Murderer
Ghost of Banquo
Hecate
Servant
Lord
Lady Macduff
Macduff's Son
Doctor
Gentlewoman
Menteith
Caithness
Seyton
Siward
Young Siward