Workshop · After class, at home

Build a research assistant agent

Take everything from class and build something you'll actually use: an agent that researches a topic for your work. You'll grow it in six levels — skeleton, a simple tool, a real tool, parallel researchers you merge into one summary, then the harness pieces that make it real: observability and evals.

🎯 The goal

Give the agent a research question (“What are the trade-offs of vector databases for RAG?”, “Summarize recent work on X for my thesis”). It splits the question into subtopics (you supply them at first; a stretch goal automates this), researches each in parallel, and returns one clean, structured brief. Build it with Copilot at your side, using the same ask → test → improve loop from class.

Before you start

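Set your free GEMINI_API_KEY (see the Agents page) and run pip install pydantic-ai. Level 3 also needs a search API key (the example uses TAVILY_API_KEY), and Level 5 needs pip install logfire. Put a request cap (UsageLimits) on every run so you never blow through the free tier.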

The six levels

  1. Level 1 · Skeleton

    A minimal agent that returns structured output

    Start with the smallest thing that runs. Define the shape of a research brief and get the agent to fill it in — no tools yet.

    research_agent.py
    from pydantic import BaseModel, Field
    from pydantic_ai import Agent
    from pydantic_ai.usage import UsageLimits
    
    class Brief(BaseModel):
        topic: str
        key_points: list[str] = Field(description="3-5 concise findings")
        summary: str
    
    agent = Agent(
        "google-gla:gemini-2.0-flash",
        output_type=Brief,
        system_prompt="You are a rigorous research assistant. Be concise and factual.",
    )
    
    if __name__ == "__main__":
        out = agent.run_sync(
            "Give me a research brief on vector databases for RAG.",
            usage_limits=UsageLimits(request_limit=5),
        )
        print(out.output)

    Ask Copilot: “Explain what output_type does here and what happens if the model returns invalid data.”
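
With output_type=Brief, pydantic-ai asks the model for data in the shape of Brief and validates the reply with pydantic. The minimal sketch below runs that validation step by hand on made-up JSON so you can see what "invalid data" means. The helper validation_problems() and the sample strings are hypothetical, and min_length/max_length are an optional tightening we added: the description alone only guides the model, it does not enforce 3-5 items. As we understand it, when validation fails during a run, pydantic-ai sends the error details back to the model and asks it to try again; if it keeps failing past the retry limit, the run raises an exception.

```python
from pydantic import BaseModel, Field, ValidationError


class Brief(BaseModel):
    topic: str
    # description guides the model; min_length/max_length (our addition) enforce 3-5 items.
    key_points: list[str] = Field(
        description="3-5 concise findings", min_length=3, max_length=5
    )
    summary: str


def validation_problems(raw_json: str) -> list[str]:
    """Return the fields that failed validation (an empty list means valid)."""
    try:
        Brief.model_validate_json(raw_json)
    except ValidationError as err:
        # Each error carries a "loc" tuple naming the offending field.
        return [".".join(str(part) for part in e["loc"]) for e in err.errors()]
    return []


# Hypothetical model replies.
GOOD = '{"topic": "RAG", "key_points": ["a", "b", "c"], "summary": "Retrieval grounds answers."}'
TOO_FEW = '{"topic": "RAG", "key_points": ["a"], "summary": "Too thin."}'
MALFORMED = '{"topic": "RAG", "key_points": "not a list"}'  # wrong type, summary missing

if __name__ == "__main__":
    for name, raw in [("good", GOOD), ("too few", TOO_FEW), ("malformed", MALFORMED)]:
        print(name, validation_problems(raw))
```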

  2. Level 2 · A simple tool

    Give it its first action

    Add one easy tool so the model stops relying only on memory. Start with something trivial to prove the loop works — then you'll trust it with a real one.

    research_agent.py — add a tool
    from datetime import date
    
    @agent.tool_plain
    def today() -> str:
        """Return today's date as YYYY-MM-DD, for grounding time-sensitive claims."""
        return date.today().isoformat()

    Ask Copilot: “Write a quick test that runs the agent and asserts that Brief.topic is non-empty.” Then run it and watch whether the model decides to call today().
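
A cheap complement that needs no API key: keep the tool's logic in a plain function and test it directly. The sketch below repeats today() without the @agent.tool_plain decorator (so it runs on its own) and checks that it really returns a YYYY-MM-DD string; test_today_is_iso_date is a hypothetical test name. For agent-level tests that avoid real model calls, pydantic-ai also ships a TestModel (in pydantic_ai.models.test) that you can swap in with agent.override(model=TestModel()).

```python
from datetime import date


def today() -> str:
    """Return today's date as YYYY-MM-DD, for grounding time-sensitive claims."""
    return date.today().isoformat()


def test_today_is_iso_date() -> None:
    value = today()
    # fromisoformat raises ValueError if the string is not a valid YYYY-MM-DD date.
    parsed = date.fromisoformat(value)
    assert len(value) == 10
    assert parsed.isoformat() == value


if __name__ == "__main__":
    test_today_is_iso_date()
    print("today() ok:", today())
```

Run it with plain python or with pytest, which collects functions whose names start with test_.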

  3. Level 3 · A real tool

    Let it reach the outside world

    Now a tool that actually fetches information — a web search. Use any search API you like (Tavily, Brave, DuckDuckGo, SerpAPI). The agent calls it, reads the results, and grounds its brief in them.

    research_agent.py — real tool
    import os
    import httpx
    
    @agent.tool_plain
    async def web_search(query: str) -> list[str]:
        """Search the web and return the top result snippets for the query."""
        # Use the async client: a blocking httpx.post inside an async tool would
        # stall every other researcher running in parallel (Level 4).
        async with httpx.AsyncClient(timeout=30) as client:
            resp = await client.post(
                "https://api.tavily.com/search",
                json={"api_key": os.environ["TAVILY_API_KEY"],
                      "query": query, "max_results": 5},
            )
        resp.raise_for_status()
        return [r["content"] for r in resp.json().get("results", [])]

    Vibe-code it: ask Copilot to handle the case where the API returns no results by raising ModelRetry("No results — try a broader query.") so the agent reformulates instead of crashing. This is the “ask → test → improve” loop from class, on your own tool. (Tavily has a free tier; any search API works.)
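
If you want to compare notes with what Copilot writes, here is one minimal sketch of the no-results check, pulled out of the HTTP call so it can be tested offline. It assumes the Tavily-style response shape used above ({"results": [{"content": ...}]}), and extract_snippets is a hypothetical helper name. The local ModelRetry class is only a stand-in so the sketch runs without pydantic-ai; in research_agent.py you would use from pydantic_ai import ModelRetry instead. Raising ModelRetry inside a tool sends the message back to the model, which can then call web_search again (up to the tool's retry limit).

```python
class ModelRetry(Exception):
    """Stand-in for pydantic_ai.ModelRetry so this sketch runs on its own.

    In the real tool, use: from pydantic_ai import ModelRetry
    """


def extract_snippets(payload: dict) -> list[str]:
    """Pull non-empty snippets out of a Tavily-style JSON response.

    Raises ModelRetry when there is nothing useful, so the agent can
    reformulate its query instead of building a brief on no evidence.
    """
    snippets = [r["content"] for r in payload.get("results", []) if r.get("content")]
    if not snippets:
        raise ModelRetry("No results; try a broader query.")
    return snippets
```

Inside web_search, the last line would then become return extract_snippets(resp.json()).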

  4. Level 4 · Parallel + aggregate

    Many researchers at once, one merged answer

    The real power move: split the question into subtopics, run a research agent on each concurrently with asyncio.gather, then feed all the briefs to a final agent that synthesizes one report. Because the runs overlap, the fan-out takes roughly as long as the slowest subtopic rather than the sum of all of them, as long as your tools are non-blocking (async) and you stay within the API's rate limits.

    parallel_research.py
    import asyncio
    from pydantic import BaseModel
    from pydantic_ai import Agent
    from pydantic_ai.usage import UsageLimits
    
    # ... (Brief model + `agent` with web_search tool from Levels 1-3) ...
    
    class Report(BaseModel):
        question: str
        briefs: list[str]
        final_summary: str
    
    # A separate agent whose only job is to merge findings.
    synthesizer = Agent(
        "google-gla:gemini-2.0-flash",
        output_type=Report,
        system_prompt="Merge the research briefs into one coherent, non-repetitive report.",
    )
    
    async def research_one(subtopic: str) -> Brief:
        out = await agent.run(
            f"Research this subtopic and return a brief: {subtopic}",
            usage_limits=UsageLimits(request_limit=6),
        )
        return out.output
    
    async def main(question: str, subtopics: list[str]) -> Report:
        # 1. Fan out: all subtopics researched at the same time.
        briefs = await asyncio.gather(*(research_one(s) for s in subtopics))
    
        # 2. Fan in: hand every brief to the synthesizer to combine.
        joined = "\n\n".join(f"## {b.topic}\n{b.summary}" for b in briefs)
        out = await synthesizer.run(
            f"Question: {question}\n\nBriefs:\n{joined}",
            usage_limits=UsageLimits(request_limit=5),
        )
        return out.output
    
    if __name__ == "__main__":
        report = asyncio.run(main(
            "Should my team adopt a vector database for RAG?",
            ["performance & scaling", "cost", "alternatives to a dedicated vector DB"],
        ))
        print(report.final_summary)

    ✅ You just built a mini research pipeline

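    Fan out (many agents in parallel) → fan in (one agent merges). The same pattern works for 3 subtopics or 30, but with many subtopics you'll want to limit how many run at once (for example with asyncio.Semaphore) to stay inside the free tier's rate limits. Cap every run, and log how long the parallel version takes vs. running them one by one.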

  5. Level 5 · See inside it

    Observability — trace every step

    Right now your agent is a black box. Add tracing so you can see every prompt, tool call, retry, token count, and error. This is the harness piece that turns “it's broken somewhere” into “here's the exact call that failed.” After pip install logfire, it takes two lines with Logfire (free tier, made by the Pydantic team):

    research_agent.py — top of file
    import logfire
    
    logfire.configure()             # sign in once with `logfire auth`
    logfire.instrument_pydantic_ai()  # now every agent run is traced
    
    # ...define your agents and tools as before...

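    Run the agent, then open your Logfire dashboard. Each agent run appears as its own trace; to see the whole research job as one tree (each parallel researcher, every web_search call, and the synthesizer), wrap the body of main() in a span such as with logfire.span("research report"):. Then ask Copilot: “Where is most of the time spent?” and read the answer off the trace.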
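
The core idea behind a trace is simple: named, timed regions ("spans") that you can add up to find the slow step. The sketch below is a hypothetical, stdlib-only mini-tracer to make that concrete; it is not how Logfire works internally and is no substitute for it (Logfire's own logfire.span(...) context manager adds nesting, attributes, and a UI). The span() and slowest() names and the sleep durations are made up for illustration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Iterator

# Hypothetical mini-tracer: total seconds spent in each named step.
durations: dict[str, float] = defaultdict(float)


@contextmanager
def span(name: str) -> Iterator[None]:
    start = time.perf_counter()
    try:
        yield
    finally:
        # Accumulate so repeated steps (e.g. several web_search calls) add up.
        durations[name] += time.perf_counter() - start


def slowest() -> tuple[str, float]:
    """Return the step with the largest total time."""
    return max(durations.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for _ in range(3):
        with span("web_search"):
            time.sleep(0.05)  # stand-in for a network call
    with span("synthesizer"):
        time.sleep(0.02)  # stand-in for the merge step
    name, seconds = slowest()
    print(f"most time in {name}: {seconds:.2f}s")
```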

  6. Level 6 · Prove it works

    A tiny eval — catch regressions before they ship

    “Seemed fine” isn't good enough. Write a handful of cases and check the agent still passes them every time you change a prompt or a tool. Start dead simple:

    eval_agent.py
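    import asyncio
    
    from pydantic_ai.usage import UsageLimits
    
    from research_agent import agent  # the agent + tools from Levels 1-3
    
    # (input question, a keyword the good answer should contain)
    CASES = [
        ("Research briefly: what is RAG?", "retrieval"),
        ("Research briefly: what is a vector database?", "embedding"),
    ]
    
    async def run_evals() -> int:
        passed = 0
        for question, must_contain in CASES:
            # Evals run the real agent, so cap requests here too.
            out = await agent.run(question, usage_limits=UsageLimits(request_limit=5))
            brief = out.output
            # Check the summary and the key points, case-insensitively.
            text = " ".join([brief.summary, *brief.key_points]).lower()
            ok = must_contain in text
            print("PASS" if ok else "FAIL", "-", question)
            passed += ok
        print(f"{passed}/{len(CASES)} cases passed")
        return passed
    
    if __name__ == "__main__":
        asyncio.run(run_evals())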

    Keyword checks are a starting point. When you outgrow them, look at pydantic-evals — or add an LLM-as-judge that scores each answer. Either way: an eval you can re-run is what makes improvement measurable instead of vibes.
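
One small step up from a single keyword, before reaching for pydantic-evals: a checker that reports which required keywords are missing and flags phrases a good answer should never contain (such as a refusal). The grade() helper below is a hypothetical sketch; keeping it separate from the model call means you can unit-test the checker itself, then call it from run_evals() on the joined brief text.

```python
from collections.abc import Sequence


def grade(
    text: str,
    must_contain: Sequence[str],
    must_not_contain: Sequence[str] = (),
) -> list[str]:
    """Return a list of problems; an empty list means the answer passes."""
    lowered = text.lower()
    problems = [f"missing: {kw}" for kw in must_contain if kw.lower() not in lowered]
    problems += [f"unexpected: {kw}" for kw in must_not_contain if kw.lower() in lowered]
    return problems


if __name__ == "__main__":
    print(grade("Retrieval-augmented generation uses embeddings", ["retrieval", "embedding"]))
    print(grade("Sorry, I cannot browse the web.", ["retrieval"], ["cannot browse"]))
```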

Stretch goals

📎 Cite sources

Have web_search return URLs too, and make the Brief include a sources list.
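
A minimal sketch of the first half, assuming Tavily-style results that carry both "content" and "url" keys (as we understand Tavily's search response): format each hit so the URL travels with the snippet and the model can copy it into a sources field. format_results is a hypothetical helper; on the Brief side you would add something like sources: list[str].

```python
def format_results(payload: dict, max_results: int = 5) -> list[str]:
    """Turn a Tavily-style response into 'snippet (source: url)' strings.

    Assumes each result dict has "content" and "url" keys.
    """
    lines = []
    for r in payload.get("results", [])[:max_results]:
        # Keep the URL next to its snippet so citations stay attached to claims.
        lines.append(f"{r['content']} (source: {r['url']})")
    return lines
```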

🧭 Auto-plan subtopics

Add a planner agent that turns the question into the subtopic list — so you only pass the question.

🔁 Self-check

Add a tool or step that flags weak/contradictory findings and re-researches them.

💾 Save the report

Write the final report to a Markdown file you can drop into your notes.
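
A minimal sketch, assuming the Report fields from Level 4 (question, briefs, final_summary); render_markdown and save_report are hypothetical helper names and the layout is just one reasonable choice.

```python
from pathlib import Path


def render_markdown(question: str, briefs: list[str], final_summary: str) -> str:
    """Lay the report out as Markdown: title, summary, then one bullet per brief."""
    parts = [f"# {question}", "", "## Summary", "", final_summary, "", "## Briefs", ""]
    parts += [f"- {b}" for b in briefs]
    return "\n".join(parts) + "\n"


def save_report(path: Path, question: str, briefs: list[str], final_summary: str) -> Path:
    """Write the rendered report to path (overwriting it) and return the path."""
    path.write_text(render_markdown(question, briefs, final_summary), encoding="utf-8")
    return path
```

Usage after Level 4: save_report(Path("report.md"), report.question, report.briefs, report.final_summary).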

Deliverable checklist

Aim to tick all six levels.

🚀 The takeaway

You vibe-coded a real, useful agent, from a one-shot skeleton to a traced and evaluated parallel research pipeline, the same way professionals do: small steps, tools, tests, and a tight loop with the AI. That's the whole course in one project.