You vs ChatGPT — Are You Smarter Than AI?
AI Blind Spot Challenge · Free · No Signup
🧠 You VS 🤖 ChatGPT

Can You Spot What ChatGPT Misses?

A fast reasoning challenge built around classic AI blind spots: letter counting, hidden assumptions, math traps, and one visual pattern puzzle.

🤖 GPT-4o's documented score on these questions: 1 / 10
10 Questions · 4 Categories · AI's Score: 1/10 · Always Free

Why Does ChatGPT Fail These Questions?

AI language models like ChatGPT (GPT-4o) are trained to predict the next token in a sequence, not to reason from first principles. This makes them surprisingly bad at tasks that seem simple to humans, such as the letter counting, hidden-assumption riddles, and math traps featured in this challenge.
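Letter counting shows the gap well. A program that sees individual characters solves it in one line, while a language model typically sees a word as a few multi-character tokens. The sketch below is illustrative only: the word "strawberry" and the token split are assumptions chosen for the example, and real tokenizers split text differently.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


# Hypothetical subword split, for illustration only; real tokenizers differ.
hypothetical_tokens = ["str", "aw", "berry"]

# Character-level view: join the pieces and count directly.
word = "".join(hypothetical_tokens)
print(word, "contains", count_letter(word, "r"), "r's")

# Token-level view: the letters are spread across pieces, so no single
# token carries the full count.
for token in hypothetical_tokens:
    print(repr(token), "->", count_letter(token, "r"))
```

The character-level count is trivial once the letters are visible one by one. The token-level loop shows why a model working on chunks has to combine partial counts it never sees explicitly.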

What Is ARC-AGI?

ARC-AGI (Abstraction and Reasoning Corpus for Artificial General Intelligence) was designed by deep learning pioneer François Chollet as a benchmark for general intelligence — not just memorized knowledge. Each puzzle shows a few input→output grid examples, and you must infer the rule to complete a new test case.
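To make the puzzle format concrete, here is a minimal sketch of an ARC-style task in Python. The grids, the three candidate rules, and the function names are all made up for illustration and are not taken from the ARC-AGI dataset. Real ARC tasks use far more varied rules, so a fixed candidate list like this one would not solve them.

```python
from typing import Callable, Optional

Grid = list[list[int]]  # each cell is a color index


def flip_vertical(grid: Grid) -> Grid:
    """Reverse the order of the rows."""
    return [row[:] for row in grid[::-1]]


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror each row left to right."""
    return [row[::-1] for row in grid]


def transpose(grid: Grid) -> Grid:
    """Swap rows and columns."""
    return [list(col) for col in zip(*grid)]


# Hypothetical hypothesis space: a toy solver only knows these three rules.
CANDIDATE_RULES: dict[str, Callable[[Grid], Grid]] = {
    "flip_vertical": flip_vertical,
    "flip_horizontal": flip_horizontal,
    "transpose": transpose,
}


def infer_rule(examples: list[tuple[Grid, Grid]]) -> Optional[str]:
    """Return the first candidate rule consistent with every example pair."""
    for name, rule in CANDIDATE_RULES.items():
        if all(rule(inp) == out for inp, out in examples):
            return name
    return None


# Made-up training pairs: each output is the input mirrored left to right.
train_examples = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 3, 0], [0, 4, 0]], [[0, 3, 3], [0, 4, 0]]),
]
test_input = [[5, 0, 0], [0, 6, 0]]

rule_name = infer_rule(train_examples)
if rule_name is not None:
    print(rule_name, CANDIDATE_RULES[rule_name](test_input))
```

A human solver does the same thing informally: propose a rule, check it against every example, and only then apply it to the test grid. The hard part of ARC is that the rule is new each time, so there is no fixed list to search.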

When ARC-AGI-2 launched in March 2025, every frontier AI model — GPT-4o, Claude 3.7, Gemini 2.0 Flash — scored between 0% and 1.3%. The human average is 60%. Only with extremely expensive multi-attempt scaffolding (costing $30–$77 per question) did AI systems approach human performance. The ARC-AGI dataset is released under the Apache 2.0 license by the ARC Prize Foundation.

How Do You Score vs. ChatGPT?

GPT-4o (without extended thinking) scores approximately 1/10 on the questions in this challenge, based on documented research and widely reported failure cases from 2024–2025. The one question it reliably gets right (the 28-day riddle) is now well-known enough that it appears in GPT-4o's training data. On novel variants of all the other questions, failure rates range from 40% to 100%.

Most humans who approach these questions carefully score 6–8/10. The ARC puzzle and the "portrait" logic riddle are the hardest — they trip up humans and AI alike, though for different reasons.

Want to Level Up Your Reasoning?

NerdSip teaches logic, cognitive biases, and AI literacy through 5-minute gamified micro-lessons. No fluff. Just the good stuff.

Download NerdSip Free

Try the Full ARC-AGI Test

5 original puzzles in the style of ARC-AGI, the benchmark that stumps frontier AI. Humans average 60%. Base AI scores ~0%. Free.