⚠️ Attention! This article is a personal opinion! ⚠️
Let’s exchange ideas to broaden our knowledge.
In 2016, AlphaGo shocked the world by defeating Lee Sedol, one of the greatest Go players of all time. The victory wasn’t just a win in a board game; it was a profound demonstration that artificial intelligence could tackle a search space far beyond human comprehension.
AlphaGo’s success stemmed from learning through self-play, developing an intuition built on millions of experiences. This was real, dynamic intelligence.
Fast forward to today, and we find ourselves in the midst of another revolution in AI: large language models (LLMs).
Critics claim that LLMs are nothing more than “stochastic parrots” that merely predict the next word without true understanding.
They argue that, unlike AlphaGo, LLMs haven’t “played” the world—they’ve just absorbed terabytes of text without the richness of sensory experience or active learning.
But recent research on competitive programming with large reasoning models is challenging this narrative.
Table of Contents
The evolution of reasoning in AI
Are LLMs just guessing?
“Real” intelligence
My take
Debate
The evolution of reasoning in AI
A groundbreaking paper titled Competitive Programming with Large Reasoning Models reveals how reinforcement learning (RL) is transforming LLMs from simple text predictors into systems capable of deep reasoning.
Models like OpenAI’s o1 and its successors, o1-ioi and o3, are trained with RL to generate extended chains of thought.
This isn’t just about predicting words; it’s about methodically working through complex problems, verifying solutions by executing code, and even refining their approach based on feedback.
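To make that loop concrete, here is a minimal sketch, in Python, of what a generate-execute-refine harness could look like. Everything here is an illustrative assumption on my part, not OpenAI’s actual pipeline: `generate` is a placeholder for a call to a reasoning model, and the test format is invented for the example.

```python
import subprocess
import sys
import tempfile

def run_candidate(code: str, test_input: str) -> str:
    # Write the candidate program to a temp file and execute it,
    # feeding the test input on stdin and capturing stdout.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], input=test_input,
                                capture_output=True, text=True, timeout=5)
        return result.stdout.strip()
    except subprocess.TimeoutExpired:
        return "<timeout>"

def solve_with_feedback(problem: str, tests, generate, max_attempts: int = 4):
    # `generate(prompt)` is a stand-in for sampling a solution from a model;
    # it is a hypothetical callable, not a real API.
    prompt = problem
    for _ in range(max_attempts):
        code = generate(prompt)
        failures = []
        for test_input, expected in tests:
            got = run_candidate(code, test_input)
            if got != expected:
                failures.append((test_input, expected, got))
        if not failures:
            return code  # every public test passed
        # Refine: fold the first failing case back into the next prompt.
        test_input, expected, got = failures[0]
        prompt = (f"{problem}\n\nYour previous attempt failed on "
                  f"input {test_input!r}: expected {expected!r}, got {got!r}. "
                  f"Please fix it.")
    return None  # give up after max_attempts
```

The point of the sketch is the shape of the loop: the model’s output is not trusted blindly. It is executed and checked, and the failure signal feeds the next attempt.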
Key breakthroughs
Chain-of-Thought reasoning: Unlike traditional LLMs that output the next word, these models generate a sequence of internal reasoning steps. This resembles a human problem-solver carefully considering each move before arriving at a final decision.
Self-correction through RL: Just as AlphaGo learned strategy through self-play, models like o3 develop emergent strategies at test time. They might even generate a brute-force solution to validate a more optimized approach, catching potential errors in real time (see the sketch after this list).
Outperforming domain-specific systems: While o1-ioi was fine-tuned with hand-crafted strategies specifically for competitive programming, the general-purpose o3 surpasses it without any domain-specific tweaks. This suggests that scaling up RL is a robust path to achieving superior reasoning capabilities.
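That brute-force validation trick is worth seeing in code. Below is a minimal sketch using a classic problem, maximum subarray sum, which I chose purely for illustration: a slow but obviously correct oracle cross-checks a fast candidate on random inputs, which is exactly the kind of self-verification a model can perform by executing code.

```python
import random

def brute_force(nums):
    # O(n^2) reference solution: try every contiguous subarray.
    # Slow but obviously correct, so it serves as an oracle.
    return max(sum(nums[i:j])
               for i in range(len(nums))
               for j in range(i + 1, len(nums) + 1))

def optimized(nums):
    # O(n) candidate (Kadane's algorithm), the solution we would submit.
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

# Cross-validate the fast solution against the oracle on random inputs,
# mirroring how a reasoning model can check its own work by running code.
for _ in range(1000):
    nums = [random.randint(-50, 50) for _ in range(random.randint(1, 30))]
    assert optimized(nums) == brute_force(nums), f"mismatch on {nums}"
print("optimized solution agrees with the brute force on 1000 random tests")
```

If the two ever disagree, the assertion fails and prints the mismatching input, handing the solver a concrete counterexample to reason about.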
Are LLMs just guessing?
The common refrain—“LLMs just predict words”—misses the nuance of these advancements. Yes, at their core, LLMs operate by predicting text.
But when combined with reinforcement learning, they transcend simple memorization. They start to plan, verify, and iteratively improve, much like a human expert tackling a challenging problem.
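For readers who want the “just predicting text” baseline spelled out: at its simplest, autoregressive generation means repeatedly sampling the next token from a learned conditional distribution. The toy bigram model below is my own deliberately crude illustration of that mechanism, nothing like a real transformer, but the decoding loop has the same shape.

```python
import random
from collections import defaultdict

# Toy "language model": bigram counts over a tiny corpus. Real LLMs use
# deep neural networks trained on vast corpora, but the decoding loop
# below is the same basic idea: repeatedly sample the next token from a
# conditional distribution.
corpus = ("the model predicts the next word and "
          "the next word follows the model").split()

counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def sample_next(token: str) -> str:
    # Sample proportionally to how often each word followed `token`.
    followers = counts[token]
    words, weights = zip(*followers.items())
    return random.choices(words, weights=weights)[0]

token, output = "the", ["the"]
for _ in range(8):  # generate 8 more tokens autoregressively
    token = sample_next(token)
    output.append(token)
print(" ".join(output))
```

That loop is all a plain language model does; the RL-trained reasoning models layer planning, execution, and feedback on top of it.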
Consider competitive programming, a domain that demands not only the ability to write code but also to solve complex algorithmic puzzles under strict constraints.
The research shows that models like o3 achieve performance levels on platforms such as CodeForces that rival elite human competitors.
This is not the behavior of a model that is merely guessing; it’s evidence of a sophisticated internal process that mirrors strategic reasoning.
“Real” intelligence
What exactly is “real” intelligence?
Is it defined solely by sensory experience and interaction with the physical world? Or does it include the profound ability to articulate thoughts, reason abstractly, and communicate effectively through language?
Consider these points:
Sensory vs. Symbolic Intelligence:
– A blind person can still be deeply intelligent, communicating and reasoning without sight.
– My cat, despite lacking language, demonstrates intelligence through her behavior.
Language as a Marker of Intelligence:
– Vacslav Glukhov reminds us that language is not just a tool for communication—it’s a window into our ability to think, reason, and learn.
– In this light, LLMs, which excel in language processing, might be closer to “real” intelligence than we’ve given them credit for.
I challenge you to rethink what you define as intelligence. If an LLM can develop complex internal strategies and self-correct in a way that mirrors human reasoning, can we really say it’s just “guessing”? Or is it, in its own way, exhibiting a form of intelligence, one that we’re only beginning to understand?
My take
Don’t get me wrong—I’ve seen enough buzzwords and “magic” claims in AI to last a lifetime.
But dismissing LLMs as mere guessers overlooks the real progress that’s happening under the hood.
The truth is, we’re not stuck in a loop of statistical predictions anymore.
We’re witnessing an evolution where reinforcement learning transforms these models into systems that plan, verify, and even self-correct. This isn’t just incremental improvement—it’s a paradigm shift.
The future of AI isn’t about scaling up data and hoping for magic; it’s about engineering systems that learn to reason, much like a human would. And if that means shedding some of the old myths and embracing a more nuanced reality, so be it.
Debate
I’m not here to feed you another AI bubble.
Instead, I want to spark a debate.
Are we still underselling LLMs by reducing them to mere word predictors? Or are we ready to recognize the rising intelligence in these systems, intelligence that’s not only impressive in contests but also capable of solving real-world problems?
Drop your thoughts below.
Let’s challenge the hype and build AI systems that truly work for us, not just on paper, but in the messy, unpredictable real world.