AI Agent Projects
Building autonomous AI agents that can plan, use tools, and execute multi-step tasks. Experiments with ReAct, tool-use, and memory patterns.
Overview
Agents are where LLMs stop being autocomplete and start being useful. The question isn't "can the model generate text?" — it's "can the model take actions in the world, recover from failures, and achieve a goal across multiple steps?"
This collection documents my experiments building agents that go beyond toy demos.
Agent Architectures Explored
ReAct (Reasoning + Acting)
The classic pattern. The model alternates between thinking steps ("I need to look up X") and action steps (calling a tool). Implemented it from scratch first, without LangChain, to understand what's really happening, then rebuilt it with LangChain for production use.
Key finding: The quality of tool descriptions matters more than the tool implementations. Vague tool descriptions cause the model to hallucinate tool signatures.
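A minimal from-scratch sketch of the loop, to make the thinking/acting alternation concrete. Everything named here is an assumption for illustration: `fake_model` is a scripted stand-in for an LLM call, `lookup` is a hypothetical tool, and the `Thought:` / `Action: tool[input]` / `Final:` text format is just one common convention, not necessarily the one used in these experiments.

```python
import re
from typing import Callable

# Hypothetical tool registry: name -> (description shown to the model, function).
TOOLS: dict[str, tuple[str, Callable[[str], str]]] = {
    "lookup": (
        "lookup(term: str) -> str. Return a one-line definition of `term`.",
        lambda term: {"faiss": "A library for vector similarity search."}.get(
            term.lower(), "no entry found"
        ),
    ),
}

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")
FINAL_RE = re.compile(r"Final:\s*(.*)", re.DOTALL)


def fake_model(transcript: str) -> str:
    """Scripted stand-in for an LLM call, so the loop runs offline."""
    if "Observation:" not in transcript:
        return "Thought: I need to look up FAISS.\nAction: lookup[FAISS]"
    return "Thought: I have what I need.\nFinal: FAISS is a vector search library."


def react(question: str, model: Callable[[str], str], max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = model(transcript)
        transcript += reply + "\n"
        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(reply)
        if action is None:
            # Feed the formatting problem back instead of crashing.
            transcript += "Observation: no valid action found, use Action: tool[input]\n"
            continue
        name, arg = action.groups()
        if name not in TOOLS:
            observation = f"unknown tool '{name}', available: {sorted(TOOLS)}"
        else:
            observation = TOOLS[name][1](arg)
        transcript += f"Observation: {observation}\n"
    return "stopped: step limit reached"


answer = react("What is FAISS?", fake_model)
print(answer)
```

In a real run the tool descriptions in `TOOLS` would be rendered into the prompt, which is where the wording of each description starts to matter.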
Tool-Use Agent with Structured Outputs
Built an agent that browses the local file system, reads CSV data, generates Python code to analyse it, executes that code in a sandboxed subprocess, and returns a summary, all in one chain.
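A rough sketch of the execution step, assuming the generated code is run in a separate Python interpreter with a timeout and a throwaway working directory. The function name `run_generated_code` and the five-second default are hypothetical. A child process on its own is not a full sandbox, so treat this as the process-isolation part only.

```python
import subprocess
import sys
import tempfile


def run_generated_code(code: str, timeout_s: float = 5.0) -> dict:
    """Run model-generated Python in a child interpreter and capture its output.

    This isolates the process, working directory, and run time only; a real
    sandbox would also need OS-level limits (containers, resource limits, etc.).
    """
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout_s}s"}
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}


result = run_generated_code("print(sum([1, 2, 3]))")
print(result)
```

Returning `stderr` alongside `stdout` means a failed run can be handed back to the model as context rather than silently dropped.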
The critical component is structured output: the model must return valid JSON for every action, not free text. Using Pydantic models as the output schema, and rejecting any action that fails validation, drops hallucinated tool calls to near zero.
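To illustrate the validation idea without pulling in a dependency, this sketch swaps Pydantic for a standard-library dataclass plus hand-written checks. It plays the same role (parse, validate, reject) but is not Pydantic's API. The tool names in `TOOL_SCHEMAS` and the `{"tool": ..., "args": ...}` shape are assumptions for illustration.

```python
import json
from dataclasses import dataclass

# Hypothetical tool registry: tool name -> required argument names and types.
TOOL_SCHEMAS = {
    "read_csv": {"path": str},
    "run_python": {"code": str},
}


@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: dict


def parse_action(raw: str) -> ToolCall:
    """Parse one model reply into a ToolCall, or raise ValueError with a reason."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict) or set(data) != {"tool", "args"}:
        raise ValueError('expected an object with exactly the keys "tool" and "args"')
    tool, args = data["tool"], data["args"]
    if not isinstance(tool, str) or tool not in TOOL_SCHEMAS:
        raise ValueError(f"unknown tool {tool!r}")
    schema = TOOL_SCHEMAS[tool]
    if not isinstance(args, dict) or set(args) != set(schema):
        raise ValueError(f"{tool} expects arguments {sorted(schema)}")
    for name, expected in schema.items():
        if not isinstance(args[name], expected):
            raise ValueError(f"{tool}.{name} must be {expected.__name__}")
    return ToolCall(tool=tool, args=args)


call = parse_action('{"tool": "read_csv", "args": {"path": "sales.csv"}}')
print(call)
```

An invented tool name or a missing argument raises `ValueError` before anything executes, and the error message can be sent back to the model as a correction prompt.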
Memory Patterns
Explored three memory patterns:
- Buffer memory: Simple sliding window of last N messages. Works for short tasks, falls apart on anything requiring long-horizon context.
- Summarisation memory: Compress old context into a running summary. Loses precision but scales. Good for conversational agents.
- Entity memory: Extract and store named entities (people, projects, dates) as a structured knowledge base. The agent can retrieve specific facts without loading full history. Best for domain-specific assistants.
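The three patterns can be sketched as small classes. This is a simplified illustration, not the code from these experiments: the class names are hypothetical, `summarise` would normally be a model call (here it is any callable the caller supplies), and entity extraction is left to the caller.

```python
from collections import deque
from typing import Callable


class BufferMemory:
    """Keep only the last `n` messages."""

    def __init__(self, n: int) -> None:
        self.messages: deque[str] = deque(maxlen=n)

    def add(self, message: str) -> None:
        self.messages.append(message)  # oldest message drops off automatically

    def context(self) -> str:
        return "\n".join(self.messages)


class SummaryMemory:
    """Keep the last `n` messages verbatim and fold older ones into a summary."""

    def __init__(self, n: int, summarise: Callable[[str, str], str]) -> None:
        self.n = n
        self.recent: deque[str] = deque()
        self.summary = ""
        self.summarise = summarise  # (old_summary, evicted_message) -> new_summary

    def add(self, message: str) -> None:
        self.recent.append(message)
        while len(self.recent) > self.n:
            evicted = self.recent.popleft()
            self.summary = self.summarise(self.summary, evicted)

    def context(self) -> str:
        return f"Summary: {self.summary}\n" + "\n".join(self.recent)


class EntityMemory:
    """Store facts keyed by entity name; retrieve without replaying history."""

    def __init__(self) -> None:
        self.facts: dict[str, list[str]] = {}

    def remember(self, entity: str, fact: str) -> None:
        self.facts.setdefault(entity.lower(), []).append(fact)

    def recall(self, entity: str) -> list[str]:
        return list(self.facts.get(entity.lower(), []))


buf = BufferMemory(n=2)
summ = SummaryMemory(n=1, summarise=lambda old, msg: (old + " | " + msg).strip(" |"))
for m in ["a", "b", "c"]:
    buf.add(m)
    summ.add(m)

entities = EntityMemory()
entities.remember("Alice", "leads Project X")
print(buf.context(), summ.context(), entities.recall("alice"), sep="\n---\n")
```

The trade-off shows up directly in the data structures: the buffer forgets "a" entirely, the summary keeps a lossy trace of it, and the entity store keeps a precise fact but only if something extracted it.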
A Real Use Case: Research Assistant Agent
Built a research assistant that:
- Accepts a research question
- Decomposes it into sub-questions
- Searches a local knowledge base (PDF documents via FAISS)
- Synthesises findings across sources
- Returns a structured report with citations
The pipeline runs end-to-end behind a FastAPI app: it accepts questions via a REST endpoint and streams the response via Server-Sent Events (SSE).
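The steps above can be sketched as a generator that emits SSE-formatted messages. To stay runnable offline, this sketch replaces the real components with stand-ins: `DOCS` is a toy dictionary instead of a FAISS index over PDFs, `decompose` splits on " and " instead of asking a model, and `search` uses keyword overlap instead of embeddings. All function names and event names are hypothetical.

```python
import json
from typing import Iterator

# Toy in-memory "knowledge base" standing in for the FAISS index over PDFs.
DOCS = {
    "react.pdf": "ReAct alternates reasoning steps with tool calls.",
    "memory.pdf": "Entity memory stores facts about named entities.",
}


def decompose(question: str) -> list[str]:
    """Stand-in for the model's decomposition step: split on ' and '."""
    return [part.strip(" ?") + "?" for part in question.split(" and ")]


def search(sub_question: str) -> list[tuple[str, str]]:
    """Stand-in for vector search: keyword overlap instead of embeddings."""
    words = {w.lower().strip("?") for w in sub_question.split() if len(w) > 3}
    hits = []
    for source, text in DOCS.items():
        doc_words = {w.lower().strip(".") for w in text.split()}
        if words & doc_words:
            hits.append((source, text))
    return hits


def sse(event: str, payload: dict) -> str:
    """Format one Server-Sent Events message with a single-line JSON payload."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def research(question: str) -> Iterator[str]:
    citations: list[str] = []
    for sub in decompose(question):
        sources = [source for source, _ in search(sub)]
        for source in sources:
            if source not in citations:
                citations.append(source)
        yield sse("finding", {"sub_question": sub, "sources": sources})
    # Final event: the structured report, here reduced to its citation list.
    yield sse("report", {"question": question, "citations": citations})


events = list(research("What is ReAct and how does entity memory work?"))
print("".join(events))
```

In a FastAPI app, a generator like `research` would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`; that wiring is omitted here to keep the sketch dependency-free.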
What Breaks Agents
- Token limits: Long tool outputs blow the context window. Need aggressive summarisation between steps.
- Reward hacking: The model takes the easiest path to a "done" state, which is not necessarily the correct one. Need explicit verification steps before a task counts as finished.
- Error recovery: Most frameworks don't prompt the model to retry intelligently after tool failures. Had to add custom retry logic that feeds the error context back into the prompt.
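Two of the fixes above, sketched under simplifying assumptions: `clip` is a crude head-and-tail truncation standing in for real summarisation of long tool output, and `call_with_retry` shows the error-context-in-prompt idea with `propose` standing in for the model. The names, the 2000-character budget, and the prompt wording are all hypothetical.

```python
from typing import Callable

MAX_OBSERVATION_CHARS = 2000  # assumed budget; tune to the model's context window


def clip(text: str, limit: int = MAX_OBSERVATION_CHARS) -> str:
    """Crude stand-in for summarisation: keep the head and tail of long output."""
    if len(text) <= limit:
        return text
    half = max(limit // 2, 1)
    omitted = len(text) - 2 * half
    return f"{text[:half]}\n...[{omitted} chars omitted]...\n{text[-half:]}"


def call_with_retry(
    propose: Callable[[str], str],
    tool: Callable[[str], str],
    task: str,
    max_attempts: int = 3,
) -> str:
    """Ask `propose` for a tool input; on failure, feed the error back and retry."""
    prompt = task
    for attempt in range(1, max_attempts + 1):
        tool_input = propose(prompt)
        try:
            return clip(tool(tool_input))
        except Exception as exc:  # broad on purpose: any tool failure is feedback
            prompt = (
                f"{task}\n\nAttempt {attempt} used input {tool_input!r} and failed "
                f"with {type(exc).__name__}: {exc}. Propose a corrected input."
            )
    raise RuntimeError(f"tool still failing after {max_attempts} attempts")


def divide(expr: str) -> str:
    """Toy tool: evaluate 'a/b' for two integers."""
    a, b = expr.split("/")
    return str(int(a) / int(b))


def scripted_propose(prompt: str) -> str:
    """Scripted stand-in for the model: fixes its input once it sees an error."""
    return "10/2" if "failed" in prompt else "10/0"


result = call_with_retry(scripted_propose, divide, "Divide ten by a number.")
print(result)
```

The same wrapper is a natural place to hang a verification step: check the tool result against the task before returning it, and treat a failed check like any other error.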
What's Next
- Multi-agent coordination: one planner, multiple specialist sub-agents
- Persistent memory across sessions using a vector store
- Evaluating agent trajectories, not just final outputs