AGENTS · IN PROGRESS · INTERMEDIATE · 2025-03

AI Agent Projects

Building autonomous AI agents that can plan, use tools, and execute multi-step tasks. Experiments with ReAct, tool-use, and memory patterns.

LangChain · OpenAI · Python · FastAPI

Overview

Agents are where LLMs stop being autocomplete and start being useful. The question isn't "can the model generate text?" — it's "can the model take actions in the world, recover from failures, and achieve a goal across multiple steps?"

This collection documents my experiments building agents that go beyond toy demos.

Agent Architectures Explored

ReAct (Reasoning + Acting)

The classic pattern. The model alternates between thinking steps ("I need to look up X") and action steps (calling a tool). I first implemented it from scratch, without LangChain, to understand what's really happening, then rebuilt it with LangChain for production use.

Key finding: The quality of tool descriptions matters more than the tool implementations. Vague tool descriptions cause the model to hallucinate tool signatures.
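A minimal sketch of the think/act loop, without any framework. Everything named here is hypothetical: the `word_count` tool, the `Action: tool[input]` text format, and `scripted_model`, which stands in for a real LLM call so the example runs offline. Note that the tool description spells out the signature and return value, which is the part the model actually sees.

```python
import re
from typing import Callable

# Hypothetical tool registry: name -> (description shown to the model, function).
TOOLS: dict[str, tuple[str, Callable[[str], str]]] = {
    "word_count": (
        "word_count(text: str) -> str. Returns the number of "
        "whitespace-separated words in text.",
        lambda text: str(len(text.split())),
    ),
}

# Assumed reply format: "Action: tool[input]" or "Final Answer: ...".
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.+)$", re.MULTILINE)


def scripted_model(prompt: str) -> str:
    """Stand-in for an LLM call: acts first, answers once an observation exists."""
    if "Observation:" not in prompt:
        return "Thought: I need to count the words.\nAction: word_count[the quick brown fox]"
    return "Thought: I have the count.\nFinal Answer: 4"


def react(question: str, model: Callable[[str], str], max_steps: int = 5) -> str:
    """Alternate model replies and tool observations until a final answer appears."""
    tool_help = "\n".join(f"- {desc}" for desc, _ in TOOLS.values())
    prompt = f"Tools:\n{tool_help}\n\nQuestion: {question}\n"
    for _ in range(max_steps):
        reply = model(prompt)
        prompt += reply + "\n"
        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(reply)
        if action is None:
            # Feed the formatting problem back instead of crashing.
            prompt += "Observation: no valid action found; use Action: tool[input]\n"
            continue
        name, arg = action.groups()
        if name not in TOOLS:
            prompt += f"Observation: unknown tool {name!r}\n"
            continue
        _, fn = TOOLS[name]
        prompt += f"Observation: {fn(arg)}\n"
    return "stopped: step limit reached"
```

Calling `react("How many words are in 'the quick brown fox'?", scripted_model)` walks through one action step and one final-answer step. Swapping `scripted_model` for a real API call is the only change needed to try it against an actual model.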

Tool-Use Agent with Structured Outputs

Built an agent that browses the local file system, reads CSV data, generates Python code to analyse it, executes the code in a sandboxed subprocess, and returns a summary, all in one chain.
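The execution step could look roughly like the sketch below. It is not the project's actual sandbox: `run_generated_code` is a hypothetical helper that runs the generated code in a separate interpreter, in a throwaway working directory, with a timeout. That limits accidents but is not a security boundary; real isolation would need containers or OS-level restrictions on top.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_generated_code(code: str, timeout_s: float = 5.0) -> tuple[int, str, str]:
    """Run model-generated Python in a child interpreter with a time limit.

    Returns (return code, stdout, stderr). A timeout is reported as code -1.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                # -I is isolated mode: ignores PYTHON* env vars and user site-packages.
                [sys.executable, "-I", str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return -1, "", f"timed out after {timeout_s}s"
    return proc.returncode, proc.stdout, proc.stderr
```

The captured stderr is worth returning to the agent as-is, since a traceback is usually the most useful input for its next attempt.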

The critical component is structured output: the model must return valid JSON for every action, not free text. Using Pydantic models as the output schema drops hallucinated tool calls to near zero.
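To show the idea without extra dependencies, the sketch below uses a standard-library dataclass as a stand-in for a Pydantic model; it is not the Pydantic API. The tool names and the `{"tool": ..., "args": ...}` shape are assumptions for illustration. The point is that every model reply is either parsed into a known action or rejected with a specific error.

```python
import json
from dataclasses import dataclass

ALLOWED_TOOLS = {"read_csv", "run_python"}  # hypothetical tool names


@dataclass
class ToolCall:
    tool: str
    args: dict


def parse_tool_call(raw: str) -> ToolCall:
    """Parse one model reply; raise ValueError on anything off-schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != {"tool", "args"}:
        raise ValueError("expected an object with exactly the keys 'tool' and 'args'")
    if not isinstance(data["tool"], str) or data["tool"] not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {data['tool']!r}")
    if not isinstance(data["args"], dict):
        raise ValueError("'args' must be an object")
    return ToolCall(tool=data["tool"], args=data["args"])
```

A rejected reply does not have to end the run: the `ValueError` message can be sent back to the model as the next observation so it can correct the call.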

Memory Patterns

Explored three memory patterns:

Buffer memory: Simple sliding window of last N messages. Works for short tasks, falls apart on anything requiring long-horizon context.

Summarisation memory: Compress old context into a running summary. Loses precision but scales. Good for conversational agents.

Entity memory: Extract and store named entities (people, projects, dates) as a structured knowledge base. The agent can retrieve specific facts without loading full history. Best for domain-specific assistants.
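The three patterns can be sketched in a few lines each. This is a simplified illustration, not the project's implementation: `naive_summarise` is a placeholder for an LLM summarisation call, and `EntityMemory.remember` is called by hand here, whereas a real agent would have the model extract the entities.

```python
from collections import deque
from typing import Callable


class BufferMemory:
    """Sliding window over the last n messages."""

    def __init__(self, n: int) -> None:
        self.messages: deque[str] = deque(maxlen=n)

    def add(self, message: str) -> None:
        self.messages.append(message)  # the oldest message is dropped automatically

    def context(self) -> str:
        return "\n".join(self.messages)


def naive_summarise(summary: str, evicted: str) -> str:
    """Placeholder for an LLM call: keeps the first 40 characters of each message."""
    snippet = evicted[:40]
    return f"{summary}; {snippet}" if summary else snippet


class SummaryMemory:
    """Keeps n recent messages verbatim and folds older ones into a running summary."""

    def __init__(self, n: int, summarise: Callable[[str, str], str]) -> None:
        self.n = n
        self.summarise = summarise
        self.summary = ""
        self.recent: deque[str] = deque()

    def add(self, message: str) -> None:
        self.recent.append(message)
        while len(self.recent) > self.n:
            self.summary = self.summarise(self.summary, self.recent.popleft())

    def context(self) -> str:
        return "\n".join([f"Summary: {self.summary}", *self.recent])


class EntityMemory:
    """Stores facts keyed by entity so they can be recalled without full history."""

    def __init__(self) -> None:
        self.facts: dict[str, list[str]] = {}

    def remember(self, entity: str, fact: str) -> None:
        self.facts.setdefault(entity.lower(), []).append(fact)

    def recall(self, entity: str) -> list[str]:
        return list(self.facts.get(entity.lower(), []))
```

The trade-offs show up directly in the code: `BufferMemory` forgets everything outside the window, `SummaryMemory` keeps only what the summariser chooses to keep, and `EntityMemory` answers narrow lookups but holds nothing that was never extracted.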

A Real Use Case: Research Assistant Agent

Built a research assistant that:

Accepts a research question
Decomposes it into sub-questions
Searches a local knowledge base (PDF documents via FAISS)
Synthesises findings across sources
Returns a structured report with citations

This runs end-to-end in FastAPI, accepts questions via a REST endpoint, and streams the response via SSE.
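For the streaming part, the wire format is simple enough to sketch with the standard library alone. `sse_events` is a hypothetical helper, and the `{"text": ...}` payload and the closing `done` event are assumptions rather than the project's actual schema. Each event is a `data:` line followed by a blank line; JSON-encoding the payload keeps raw newlines out of the frame.

```python
import json
from typing import Iterable, Iterator


def sse_events(chunks: Iterable[str]) -> Iterator[str]:
    """Frame text chunks as Server-Sent Events."""
    for chunk in chunks:
        payload = json.dumps({"text": chunk})  # newlines in chunk become \n escapes
        yield f"data: {payload}\n\n"
    # Assumed convention: a named event tells the client the report is complete.
    yield "event: done\ndata: {}\n\n"
```

In FastAPI, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.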

What Breaks Agents

Token limits: Long tool outputs blow the context window. Need aggressive summarisation between steps.
Reward hacking: The model finds the easiest path to a "done" state, not necessarily the correct one. Need explicit verification steps.
Error recovery: Most frameworks don't help the model retry intelligently after tool failures. Had to add custom retry logic that feeds the error context back into the prompt.
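A rough sketch of two of these mitigations: clipping long tool output, and retrying with the error fed back into the prompt. All names are hypothetical. `propose` stands in for the model choosing a tool input from the prompt, `tool` is any callable that may raise, and the 500-character budget is an arbitrary assumption.

```python
from typing import Callable

MAX_OBSERVATION_CHARS = 500  # assumed budget; tune for the model's context window


def clip(text: str, limit: int = MAX_OBSERVATION_CHARS) -> str:
    """Keep the head and tail of long output so it cannot flood the context."""
    if len(text) <= limit:
        return text
    half = limit // 2
    omitted = len(text) - 2 * half
    return f"{text[:half]}\n...[{omitted} chars omitted]...\n{text[len(text) - half:]}"


def call_with_retry(
    propose: Callable[[str], str],
    tool: Callable[[str], str],
    task: str,
    max_attempts: int = 3,
) -> str:
    """Ask for a tool input, run the tool, and show the model its last failure."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        tool_input = propose(task + feedback)
        try:
            return clip(tool(tool_input))
        except Exception as exc:  # broad on purpose: any tool failure is fed back
            feedback = (
                f"\nAttempt {attempt} with input {tool_input!r} failed: "
                f"{clip(str(exc))}\nTry a different input."
            )
    raise RuntimeError(f"tool failed after {max_attempts} attempts{feedback}")
```

Only the most recent failure is kept in `feedback`, which keeps the prompt short; keeping the full failure history is the alternative if the model repeats earlier mistakes.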

What's Next

Multi-agent coordination: one planner, multiple specialist sub-agents
Persistent memory across sessions using a vector store
Evaluating agent trajectories, not just final outputs