Question 1

Are you smarter than ChatGPT?

Accepted Answer

On certain types of questions, yes: most humans are smarter than ChatGPT. AI language models systematically fail at character counting (e.g., 'how many R's in strawberry'), novel Cognitive Reflection Test (CRT) math puzzles, logic riddles with hidden assumptions, and ARC-AGI visual pattern recognition. GPT-4o (without extended thinking) scores around 1/10 on the 10 challenges in this test.

Question 2

What is ARC-AGI and why does AI fail it?

Accepted Answer

ARC-AGI (Abstraction and Reasoning Corpus for Artificial General Intelligence) is a benchmark designed by François Chollet to test general intelligence, understood here as the ability to learn new rules from a few examples. Base frontier AI models score near 0% on ARC-AGI-2 (launched March 2025), while humans average 60%. AI fails because it relies on pattern memorization from training data, not genuine on-the-fly reasoning.
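
To make "learning a new rule from a few examples" concrete, here is a minimal sketch of what an ARC-style task looks like. The grids, the three candidate rules, and every function name are made up for illustration and are not taken from the benchmark. Real ARC-AGI tasks are hard precisely because the rule is novel and cannot be picked from a short fixed list like this one.

```python
# A made-up ARC-style task (not from the real benchmark).
# Each training pair is (input grid, output grid); cells are color codes.
train_pairs = [
    ([[1, 0, 0],
      [2, 0, 0]],
     [[0, 0, 1],
      [0, 0, 2]]),
    ([[0, 3],
      [4, 0]],
     [[3, 0],
      [0, 4]]),
]
test_input = [[5, 0, 0, 6]]


def flip_horizontal(grid):
    """Mirror each row left to right."""
    return [row[::-1] for row in grid]


def flip_vertical(grid):
    """Reverse the order of the rows."""
    return grid[::-1]


def transpose(grid):
    """Swap rows and columns."""
    return [list(col) for col in zip(*grid)]


# Hypothetical hypothesis space; a human solver is not limited to a list.
CANDIDATE_RULES = {
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}


def infer_rules(pairs):
    """Return the names of candidate rules that explain every training pair."""
    return [
        name
        for name, rule in CANDIDATE_RULES.items()
        if all(rule(grid_in) == grid_out for grid_in, grid_out in pairs)
    ]


consistent = infer_rules(train_pairs)
prediction = CANDIDATE_RULES[consistent[0]](test_input)
print(consistent)   # ['flip_horizontal']
print(prediction)   # [[6, 0, 0, 5]]
```

Two examples are enough to rule out the other candidates here. The benchmark asks for the same kind of inference, but over rules the solver has never seen before.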

Question 3

Why does ChatGPT get the strawberry question wrong?

Accepted Answer

ChatGPT processes text as 'tokens' (chunks of characters), not individual letters. 'Strawberry' may be tokenized as 'straw' + 'berry', hiding its internal character composition. This is why AI language models systematically fail at counting specific letters: they don't actually 'see' individual characters the way humans do.
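
The difference can be sketched in a few lines of Python. The vocabulary and the greedy `toy_tokenize` function below are assumptions for illustration only. They are not ChatGPT's actual tokenizer, and the real split of 'strawberry' may differ.

```python
# Toy vocabulary: an assumption for illustration, not a real model's token list.
TOY_VOCAB = ["straw", "berry", "s", "t", "r", "a", "w", "b", "e", "y"]


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match split of `word` into pieces from `vocab`."""
    tokens = []
    i = 0
    while i < len(word):
        # Pick the longest vocabulary entry that matches at position i.
        match = max(
            (piece for piece in vocab if word.startswith(piece, i)),
            key=len,
            default=word[i],  # no match: fall back to the single character
        )
        tokens.append(match)
        i += len(match)
    return tokens


word = "strawberry"
tokens = toy_tokenize(word)
# The model receives integer IDs, not the letters inside each token.
token_ids = [TOY_VOCAB.index(t) for t in tokens]
# A program (or a human) working on characters can simply count.
letter_count = word.count("r")

print(tokens)        # ['straw', 'berry']
print(token_ids)     # [0, 1]
print(letter_count)  # 3
```

In this sketch the model-side view is just the two IDs `[0, 1]`. Nothing in that sequence states that the first token contains one 'r' and the second contains two, so the count has to be recalled from training rather than read off the input.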

Can You Spot What ChatGPT Misses?

Why Does ChatGPT Fail These Questions?

What Is ARC-AGI?

How Do You Score vs. ChatGPT?

Want to Level Up Your Reasoning?

Try the Full ARC-AGI Test