Why We Built Free Learning Tools
NerdSip exists to make learning effortless. Our app delivers bite-sized courses on science, psychology, history, and hundreds of other topics. But we also believe that some of the best learning moments happen outside of any app — during a study break, a slow afternoon, or a moment of curiosity.
That's why we built these tools. They're completely free, require no account, and work instantly in any browser. Each one is designed around a simple idea: make it absurdly easy to learn something new or study more effectively.
How These Tools Help You Learn
Random Fact Generator
Our fact generator pulls from the same curated knowledge base that powers our 3,000+ micro-lessons. Every fact is verified, fascinating, and designed to stick. Research on the spacing effect shows that encountering information in short sessions spread over time leads to better long-term retention than the same amount of concentrated study. One fact at a time, day after day, builds genuine breadth of knowledge.
General Knowledge Test
The testing effect is one of the most robust findings in learning science: taking a quiz is more effective for retention than re-reading material. Our knowledge test doesn't just measure what you know — it actively strengthens your memory of the topics covered. The category breakdown helps you identify blind spots, so you know exactly where to focus your learning.
Pomodoro Focus Timer
The Pomodoro Technique builds on a well-supported idea in cognitive science: brief, timed work intervals with regular breaks help sustain focus and limit the diminishing returns of marathon sessions. Our twist: during every break, you learn a new brain or science fact. Over a typical study day of 8 to 12 work intervals, that's 8 to 12 new facts you pick up without any extra effort.
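The estimate of 8 to 12 facts per day is simple arithmetic. As a rough sketch, assuming the classic 25-minute work interval and 5-minute break (these durations and the function name `facts_per_day` are illustrative assumptions, not documented settings of our timer, and longer breaks are ignored), one fact per break works out like this:

```python
def facts_per_day(study_hours: float, work_min: int = 25, break_min: int = 5) -> int:
    """Estimate facts learned in a study day, at one fact per break.

    Assumes every work interval is followed by exactly one break. The 25/5
    split is the classic Pomodoro default, used here only for illustration.
    """
    cycle_min = work_min + break_min
    # Only completed work-plus-break cycles count toward the total.
    return int(study_hours * 60 // cycle_min)


for hours in (4, 5, 6):
    print(hours, "hours ->", facts_per_day(hours), "facts")
```

Under these assumptions, a study day of four to six hours yields 8 to 12 facts.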
Cognitive Bias Detector
Cognitive biases are systematic errors in thinking that shape many of the decisions we make. Psychologists Daniel Kahneman and Amos Tversky identified dozens of these biases through decades of research, work for which Kahneman received the Nobel Memorial Prize in Economic Sciences in 2002. Our detector presents real-world scenarios and challenges you to identify which bias is at play. Recognizing biases in controlled settings can train you to spot them in your own daily decisions, from anchoring effects in salary negotiations to sunk cost fallacies in project management.
AI Benchmarks: How We Measure Machine Intelligence
Two of our tools — You vs ChatGPT and ARC-AGI Test — are built around the benchmarks that researchers use to measure AI reasoning. Understanding how these benchmarks evolved helps explain why certain questions still trip up the most advanced AI systems.
The Evolution of AI Benchmarks
AI evaluation has undergone a dramatic shift in just a few years. Early benchmarks focused on knowledge recall — could a model answer factual questions? As models mastered those, researchers moved to harder challenges:
- MMLU (2020) — 57 academic subjects, from elementary math to professional law. GPT-4 scored 86.4% when it launched. By 2025, frontier models routinely hit 90%+. MMLU is now considered saturated.
- GPQA Diamond (2023) — Graduate-level questions written by domain PhDs, hard enough that skilled non-experts with PhDs in other fields reach only about 34% accuracy on GPQA, even with web access. Claude 3.5 Sonnet scored 59.4%, approaching expert-level performance.
- SWE-bench Verified (2024) — Real GitHub issues from production codebases. Tests whether AI can actually fix bugs in complex software. This shifted benchmarking from "can it answer questions" to "can it do real work."
- ARC-AGI (2019, updated 2024) — Visual pattern puzzles designed by François Chollet to test fluid intelligence. Each puzzle requires learning a new rule from just a handful of examples, typically two to five. Base language models scored near 0% for years. ARC-AGI-2 (2025) raised the bar further with harder novel patterns.
- FrontierMath (2024) — Original research-level mathematics problems. At launch, even the best models solved less than 2%, a humbling reminder of how far AI was from research-level mathematical reasoning.
Why Some Questions Still Stump AI
Our "You vs ChatGPT" challenge is built around documented failure modes: not obscure trivia, but systematic weaknesses in how language models process information. Tokenization causes counting errors (the infamous "how many R's in strawberry"). Chain-of-thought reasoning can fail on problems with deliberate misdirection. These aren't simple bugs awaiting a patch; they stem from architectural choices in how models represent language.
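To see why tokenization makes letter counting awkward, compare what a program sees with what a model sees. The sketch below is illustrative only: the token split and the integer IDs are hypothetical, not the output of any real tokenizer.

```python
word = "strawberry"

# A program works on characters, so counting letters is trivial.
char_count = word.count("r")

# A language model works on tokens: multi-character chunks mapped to integer IDs.
# HYPOTHETICAL split and IDs, chosen for illustration only.
hypothetical_tokens = ["str", "aw", "berry"]
hypothetical_ids = {"str": 101, "aw": 102, "berry": 103}
model_input = [hypothetical_ids[t] for t in hypothetical_tokens]

# Sanity check: the chunks really do spell the original word.
assert "".join(hypothetical_tokens) == word

print("characters:", list(word))
print("letter count:", char_count)
print("model input:", model_input)  # no individual letters are visible here
```

The model receives a short list of IDs rather than ten separate letters, so the question "how many R's?" asks about structure it never directly observes.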
ARC-AGI tests a different dimension entirely: the ability to learn a novel concept from minimal examples and generalize it. Humans do this effortlessly — a child seeing two examples of a grid transformation can often infer the rule. AI models that rely on pattern-matching from trillions of training tokens struggle precisely because ARC puzzles are designed to be unlike anything in the training data.
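A toy version of this "infer the rule, then apply it" loop can be sketched in a few lines. Everything here is an assumption for illustration: the four candidate rules, the names `CANDIDATE_RULES` and `infer_rule`, and the tiny grids are hypothetical, and real ARC puzzles call for a far richer space of rules than a fixed menu like this.

```python
from typing import Callable, Optional

Grid = list[list[int]]

# A tiny, hypothetical hypothesis space of grid transformations.
CANDIDATE_RULES: dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: [row[:] for row in g],
    "flip_horizontal": lambda g: [row[::-1] for row in g],
    "flip_vertical": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def infer_rule(examples: list[tuple[Grid, Grid]]) -> Optional[str]:
    """Return the first candidate rule consistent with every example pair."""
    for name, rule in CANDIDATE_RULES.items():
        if all(rule(inp) == out for inp, out in examples):
            return name
    return None  # no rule in the hypothesis space explains the examples


# Two demonstration pairs (input grid, output grid).
examples = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 3, 0], [0, 4, 0]], [[0, 3, 3], [0, 4, 0]]),
]

rule_name = infer_rule(examples)
test_input = [[5, 0, 0], [0, 6, 0]]
prediction = CANDIDATE_RULES[rule_name](test_input) if rule_name else None
print(rule_name, prediction)
```

The hard part of ARC is precisely what this sketch skips: a human invents the right hypothesis on the spot, while the code can only pick from rules someone wrote down in advance.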
Where We Are in March 2026
The landscape is shifting fast. OpenAI's o3 system (with high compute) scored 87.5% on ARC-AGI-1 in late 2024, showing that with enough reasoning budget, frontier models can tackle novel pattern tasks. Claude Opus 4.6 and GPT-5.4 have pushed GPQA scores past 70%. But ARC-AGI-2 remains largely unsolved, and FrontierMath continues to expose the gap between statistical pattern matching and genuine mathematical reasoning.
Our tools let you experience these benchmarks firsthand. When you solve an ARC puzzle that stumped GPT-4 for years, you're demonstrating a form of intelligence that AI researchers are still trying to replicate.
The Science Behind Microlearning
All of our tools are built on the same principles that power the NerdSip app:
- Spacing effect — Small doses of information spread over time generally beat cramming for long-term retention
- Testing effect — Retrieving information from memory strengthens the neural pathways to that memory
- Curiosity-driven learning — You remember what you're genuinely interested in, which is why random facts work so well
- Interleaving — Mixing topics (science, history, psychology) in a single session improves overall retention
- Metacognition — Recognizing your own thinking patterns (like cognitive biases) improves decision-making across every domain
- Cognitive breaks — Brief diversions from focused work can help sustain attention over long sessions
Go Deeper With NerdSip
These tools give you the appetizer. The app gives you the full meal: 500+ interactive courses, AI-generated lessons on any topic, XP and leveling, and a community of curious minds.