Hey, have you ever thought about buying a product without searching for it? Ordering food without doing anything? What if your homework was done in seconds? Or your dinner reservation just magically appeared on your calendar with no clicks, no typing, no asking? Sounds wild, right? Like some sci-fi magic? Nope. Not anymore. ReAct is the AI method that makes all of this possible. And it’s not just theory, it’s already happening.
Before we dive into the details, let’s rewind a bit and set the stage. You’ve probably heard of ChatGPT by now. It’s part of a new generation of AI models that are everywhere. These are called large language models (LLMs), and they’re trained on mountains of text, kind of like how we learn in school by reading and absorbing information. In simpler terms: it’s like having a brain made of data. One that’s good at solving problems, answering questions, and now, even making decisions.
Think Like a Human, Act Like One Too
Imagine you’re in the kitchen, trying to whip up a meal. You’ve chopped the veggies, and your brain kicks in with, “Now that everything’s ready, I should heat the water.” Then you check the spice rack and realize you’re out of salt. Do you panic? Nope. You adapt, thinking, “Okay, no salt, let’s throw in some soy sauce and pepper instead.” Just as you’re feeling clever, another thought hits you, “Wait, how do I even make dough?” So naturally, you pull out your phone and search it up. While all this is happening, you’re not just standing around thinking. You’re opening the fridge, flipping through a cookbook, maybe sniffing around to see what’s still good. This whole mental juggling act of thinking through your next move while also doing stuff is how humans handle tasks. It’s messy, flexible, and brilliant. You reason, act, observe, rethink, and repeat. And that’s what helps us survive unpredictable situations, like making dinner with no plan and a half-stocked kitchen.
Reasoning 101: What Happens When AI Talks It Through
So let’s talk about reasoning. Not just giving answers, but actually thinking things through. That’s a big part of how we solve problems, right? We don’t just magically know the answer to everything; we work through it step by step, especially when things get tricky. Researchers wanted language models to do the same thing: to not just know stuff, but to think like us. That’s where Chain-of-Thought (CoT) prompting comes in. It’s one of the first big moves in teaching AI to reason like we do. Here’s a simple example: you guide the model to solve a math problem by walking through each step of the calculation instead of jumping straight to the answer. It’s like showing your work one step at a time, just like in the figure below.

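In code, a Chain-of-Thought prompt is nothing exotic: it’s a few-shot prompt whose worked examples spell out the intermediate steps instead of just the final answer. Here’s a minimal sketch; the demonstration problem and the `build_cot_prompt` helper are illustrative, not from any particular library:

```python
# A few-shot Chain-of-Thought prompt: the demonstration shows the
# intermediate reasoning, so the model imitates that style for new questions.
COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 cans with 3 balls each. How many balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.

Q: {question}
A:"""

def build_cot_prompt(question: str) -> str:
    # Append the new question after the worked demonstration,
    # leaving the trailing "A:" for the model to complete step by step.
    return COT_PROMPT.format(question=question)

prompt = build_cot_prompt("A baker makes 4 trays of 12 cookies. How many cookies in total?")
```

You’d hand this `prompt` string to whatever model API you’re using; the worked example nudges the model to write out “4 trays of 12 is 48…” before committing to an answer.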
Not Just Smart – Now AI Can Decide Stuff Too
Just like we taught language models to reason through problems, we can also teach them how to make decisions. Not just answering questions, but actually figuring out what to do next in order to reach a goal. Imagine an AI that does more than talk, what if it acts? That means choosing the right sequence of steps, whether it’s navigating a room, booking a table, or solving a task online. Early approaches, like SayCan, tried to go straight from goal to action. It worked sometimes, but often the model would stumble because there was no real plan behind what it was doing. That’s when researchers had a lightbulb moment. What if the model paused for a second and thought before it did anything? That idea turned into frameworks like Inner Monologue, where the AI has an internal voice that reasons through each step before taking action. This shift changed everything. It proved that blending thought and action makes AI way more capable. But it also opened up a new question: how do you design that back-and-forth between thinking and doing in the smartest way possible? And that’s exactly where things started to get really interesting.
ReAct: Like ChatGPT But With Legs
To give you a better idea of where all this is going, let’s rewind for a second. I started out talking about how language models can reason, like in the Chain of Thought method. That’s where the model doesn’t just spit out an answer but actually thinks through the steps, kind of like solving a math problem on paper.
Then I leveled it up. Because reasoning is cool, but what if the model needs to do something? That’s where decision-making comes in. I showed how earlier methods like SayCan tried to go straight from goal to action and sometimes with hilarious failures. Then came Inner Monologue, where the model first thinks, then acts. That simple pause to think made a huge difference.
So now we’ve got models that can reason and models that can act. The natural next question: what happens when we make them do both at once?
And now this is where things really get exciting. With ReAct (Reasoning + Acting), we can prompt language models to not just think through a task, but also take actions along the way. It’s like giving the model both an inner voice and a pair of hands. ReAct lets the model come up with thoughtful plans before acting and also gather more info from the environment when it needs to think better. It reasons so it can act, and sometimes it acts so it can reason. That back and forth is what makes it feel way more human.
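Concretely, a ReAct prompt interleaves free-form thoughts with actions and the observations those actions produce. A hand-written demonstration trajectory, in the Thought / Action / Observation format, might look like this (the question and the `search`/`finish` tool names follow the style of the original paper’s examples, but this particular trace is made up for illustration):

```python
# One ReAct trajectory: thoughts plan, actions query the world,
# observations feed back into the next thought.
TRAJECTORY = """\
Question: What year was the author of "Dune" born?
Thought 1: I need to find who wrote Dune, then look up their birth year.
Action 1: search[author of Dune]
Observation 1: Dune was written by Frank Herbert.
Thought 2: Now I need Frank Herbert's birth year.
Action 2: search[Frank Herbert birth year]
Observation 2: Frank Herbert was born on October 8, 1920.
Thought 3: The answer is 1920.
Action 3: finish[1920]
"""

# Extract just the step labels to see the alternation pattern.
steps = [line.split(":")[0] for line in TRAJECTORY.strip().splitlines()[1:]]
```

A few demonstrations like this go into the prompt, and the model continues the pattern on a new question: think, act, read the observation, think again.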
Technical Details
(Feel free to skip this if you’re just here to vibe with the concept)
But if you’re the kind of person who enjoys seeing how the magic works behind the scenes, this part’s for you. We’re about to break down how the ReAct framework actually ticks step by step, no fluff. Think of it like looking at the blueprint after seeing the building.
Consider a general setup of an agent interacting with an environment for task solving. At each time step t, the agent gets an observation oₜ, which is basically what it sees or perceives at that moment.
The agent then takes an action aₜ, chosen from a combined space B, where:
- A is the action space – things the agent can physically or digitally do (like “walk to the fridge” or “open a drawer”)
- L is the language space – internal thoughts or reasoning steps (like “maybe the soda is on the table”)
So, B = A ∪ L means the agent can either act or think at any given step.
The decision about what to do next is made using a policy:
π(aₜ | cₜ) – a fancy way of saying, “pick the next action or thought based on everything that’s happened so far.”
Here, the context cₜ is made up of all previous observations and actions:
cₜ = (o₁, a₁, …, oₜ₋₁, aₜ₋₁)
This policy is powered by a frozen large language model (specifically PaLM, in the ReAct paper).
The model doesn’t learn on the fly. Instead, it’s shown a few in-context examples (handcrafted demonstrations), and then it generates the next output one token at a time, deciding whether to think or act at each step.
Importantly:
- If the agent chooses an action (aₜ ∈ A), the environment responds with a new observation (oₜ₊₁)
- If it chooses a thought (aₜ ∈ L), the environment stays the same – the agent is just thinking, not changing anything externally
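Putting those pieces together, the whole control flow fits in a few lines. Here’s a minimal sketch of the loop; `llm` and `env` are hypothetical stand-ins for a frozen language model and a task environment, not any specific API:

```python
def react_episode(llm, env, prompt, max_steps=10):
    """Run one ReAct episode: the model alternates thoughts and actions.

    llm(context) -> next step string, e.g. "Thought: ..." or "Action: ..."
    env(action)  -> observation string for an action in the action space A
    """
    context = prompt  # c_t: the running history of observations and actions
    for _ in range(max_steps):
        step = llm(context)               # sample a_t from pi(a_t | c_t)
        context += "\n" + step
        if step.startswith("Thought:"):
            continue                      # a_t in L: only the context changes
        observation = env(step)           # a_t in A: environment returns o_{t+1}
        context += "\nObservation: " + observation
        if step.startswith("Action: finish"):
            break
    return context

# Tiny scripted stubs to show the control flow (not a real model or env).
script = iter([
    "Thought: The soda might be in the fridge.",
    "Action: open[fridge]",
    "Thought: Not there; try the table.",
    "Action: finish[soda found on table]",
])
fake_llm = lambda ctx: next(script)
fake_env = lambda act: "fridge is empty" if "fridge" in act else "you see a soda can"

trace = react_episode(fake_llm, fake_env, "Task: find the soda.")
```

Note how the two branches mirror the definitions above: a thought only grows the context cₜ, while an action also pulls a fresh observation from the environment.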
From Plan A to Plan Better: ReAct at Work
Now, if you’re wondering what this actually looks like in action, let’s go back to our soda-fetching example but this time with a twist. What if the soda isn’t in the fridge like the model expected? Instead, it’s sitting on the dining table. This is where ReAct really shines. Instead of blindly following a plan and failing, the model can think, adjust, and try again. It reasons, takes an action, observes what happened, and then loops that process until it figures things out. Check out the flow in the figure below to see how the model handles this little soda surprise.
From Language to Logic to Life
And that’s the ride. We started with language models that could think through problems like humans. Chain of Thought gave them that inner monologue. Then we pushed them into decision-making territory, where SayCan and Inner Monologue tried to teach models how to actually do things in the real (or digital) world.
But ReAct changes the game. It’s not just thinking. It’s not just doing. It’s both, bouncing back and forth like a real human trying to figure things out on the fly. It can plan, adjust, reason, act, observe, rethink, and act again. That loop is what makes ReAct feel less like a script and more like a mind.
This isn’t just about solving toy tasks. It’s a blueprint for smarter agents that can explore, adapt, and operate in messy, unpredictable environments, whether that’s ordering dinner, helping with homework, or finding that stubborn soda can.
ReAct shows that giving models both a voice and agency makes them way more powerful. The future of AI isn’t just intelligence. It’s intelligent interaction.