
Advanced Programming for Data Scientists

Vibe Coding &
Building Agents

Coding with AI, and building AI that acts.

Tel Aviv University · 3-hour session

Roadmap

Where we're going today

1 · Prepare — GitHub Student Pack, VS Code, Copilot · at home
2 · Vibe coding — solve a real PS1 question with AI · in class
3 · Agents — build AI that uses tools & loops · in class
4 · Workshop — a research-assistant agent · at home

Before class

You already set up your machine ✅

GitHub Student Pack (with your @mail.tau.ac.il email) → Copilot Pro → VS Code → the Copilot extension. If you haven't, do it now: the steps are on the course site.

02

Vibe Coding

Ask → visualize → test → fix → iterate.

Warm-up · ispow2(n)

Give the AI the constraints, not just the goal

sol1.py

# no bin / len / logs / strings — bit ops only. 0 is NOT a power of two.
def ispow2(n):
    return n > 0 and (n & (n - 1)) == 0

State the forbidden built-ins up front, or it will reach for bin().
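Why the one-liner works: subtracting 1 flips the lowest set bit to 0 and every bit below it to 1, so n & (n - 1) clears exactly the lowest set bit. A power of two has a single set bit, so the result is 0. A minimal sketch of checking this against a slow but obviously correct reference (`ispow2_reference` is a hypothetical test-only helper, so it may ignore the bit-ops rule):

```python
def ispow2(n: int) -> bool:
    # n & (n - 1) clears the lowest set bit; only powers of two become 0
    return n > 0 and (n & (n - 1)) == 0


def ispow2_reference(n: int) -> bool:
    """Test-only reference: keep halving while n is even."""
    if n <= 0:
        return False
    while n % 2 == 0:
        n //= 2
    return n == 1


# exhaustive check on a small range, including 0 and negatives
for n in range(-64, 1 << 12):
    assert ispow2(n) == ispow2_reference(n), n

print(ispow2(0), ispow2(1), ispow2(64), ispow2(96))  # → False True True False
```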

The mental model

The vibe-coding loop

Ask → Visualize → Test → Refactor safely ↺ Iterate

You stay the engineer. The AI is a very fast, very literal pair-programmer.

Demo · longest_run(n)

Watch the loop on a real question

longest_run(n) → length of the longest run of consecutive 1 bits.

Turn 1 — ask (it uses forbidden bin() 👀)
Turn 2 — visualize the bits
Turn 3 — write tests → green
Turn 4 — catch the rule break → refactor → still green
Turn 5 — push on edge cases & complexity
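For reference, one bit-ops-only answer the loop can converge to (a sketch assuming n >= 0, not necessarily what you will reach live). In n & (n >> 1), bit i survives only if bit i + 1 is also 1, so every run of 1s shrinks by exactly one bit per round. The number of rounds until n hits 0 is therefore the longest run, which also answers Turn 5: O(longest run) iterations. `longest_run_oracle` is a hypothetical test-only helper that may use format() because it never ships as the solution.

```python
def longest_run(n: int) -> int:
    """Length of the longest run of consecutive 1 bits in n (assumes n >= 0)."""
    if n < 0:
        # for negatives, n &= n >> 1 stays negative and never reaches 0
        raise ValueError("n must be non-negative")
    rounds = 0
    while n:
        n &= n >> 1   # every run of 1s loses exactly one bit
        rounds += 1
    return rounds


def longest_run_oracle(n: int) -> int:
    """Test-only oracle: splits the binary string on zeros."""
    return max(len(run) for run in format(n, "b").split("0"))


for n in range(1 << 12):
    assert longest_run(n) == longest_run_oracle(n), n

print(longest_run(0b1011), longest_run(0b1110111))  # → 2 3
```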

▶ Live now · longest_run

🎬 Switch to VS Code

Full walkthrough, live — ask, visualize, test, fix, iterate.

The takeaway

Passing tests ≠ acceptable solution

🎯 Constraints first

The AI happily breaks the rules unless you state them. bin() worked — but was forbidden.

🧪 Tests = freedom

Once you have tests, you can let the AI rewrite the code and know instantly if it broke.
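Tests can enforce the rules too, not just the outputs. A minimal sketch using Python's standard `ast` module: parse the solution's source and report any call to a forbidden name. The `FORBIDDEN` set and `forbidden_calls` helper are hypothetical, modeled on the warm-up rule (no bin / len / logs / strings). It only catches direct calls, so an aliased built-in would slip past.

```python
import ast

# Hypothetical rule set, modeled on the warm-up's "no bin / len / logs / strings".
FORBIDDEN = {"bin", "len", "str", "format", "log", "log2"}


def forbidden_calls(source: str, forbidden: set[str] = FORBIDDEN) -> set[str]:
    """Return the forbidden function names called anywhere in source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # bin(n) parses as an ast.Name; math.log2(n) as an ast.Attribute
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name in forbidden:
            found.add(name)
    return found


bad = "def ispow2(n):\n    return n > 0 and bin(n).count('1') == 1\n"
good = "def ispow2(n):\n    return n > 0 and (n & (n - 1)) == 0\n"
print(forbidden_calls(bad), forbidden_calls(good))  # → {'bin'} set()
```

Run it next to the behavior tests: green on both means the code is correct and allowed.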

Before your turn

How to talk to the AI

🎯 Be specific — exact behavior, not vibes
🚧 State constraints — what's forbidden & required
🔢 Give examples — input → output pairs
🪜 Small steps — solve → test → edge cases
🕵️ Read every line — confident ≠ correct

Your turn

reverse_bits(x, num_bits) — 30 min

Reverse the low num_bits bits of x. Same loop: ask (with constraints) → visualize → test → fix → improve.

Timer + visualizer are on the course site → In Class.
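For the test step, a minimal sketch of a checker you can point your own solution at. It does not contain the bit-ops answer: `reverse_bits_oracle` uses string formatting and exists only for testing. Both names are hypothetical, and the oracle assumes bits of x above num_bits are dropped; if the problem set says to keep them, adjust the oracle first.

```python
def reverse_bits_oracle(x: int, num_bits: int) -> int:
    """Test-only oracle (uses strings). Assumes bits above num_bits are dropped."""
    low = x & ((1 << num_bits) - 1)  # keep only the low num_bits bits
    return int(format(low, f"0{num_bits}b")[::-1], 2)


def check(reverse_bits, max_bits: int = 8) -> None:
    """Exhaustively compare reverse_bits with the oracle for 1..max_bits bits."""
    for num_bits in range(1, max_bits + 1):
        for x in range(1 << num_bits):
            got = reverse_bits(x, num_bits)
            want = reverse_bits_oracle(x, num_bits)
            assert got == want, f"reverse_bits({x:#b}, {num_bits}) = {got}, want {want}"
            # reversing twice must give back the original value
            assert reverse_bits(got, num_bits) == x


print(reverse_bits_oracle(0b0001, 4), reverse_bits_oracle(0b1101, 4))  # → 8 11
check(reverse_bits_oracle)  # sanity-check the checker itself
```

Usage: once you have a bit-ops `reverse_bits`, call `check(reverse_bits)`; any mismatch fails with the offending input.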

03

Building AI Agents

with PydanticAI

The core idea

A single answer vs. an agent that acts

🗨️ One LLM request

Only its training data
No live info, no access to your data
Can't verify itself
One shot

vs

🤖 Agent + tools

Fetches real info via tools
Grounds in your sources
Retries, loops, checks
Structured output

What makes it an agent

Reason → act → observe → repeat

Prompt → LLM (decide) → Call tool (search, run…) → Read result ↺ → Answer (structured)

Remove the loop and the tools → you're back to one chat message.
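The whole loop fits in a few lines of plain Python. A minimal sketch with no real model: `fake_llm` is a hypothetical stand-in that requests one tool call and then answers, and `ToolCall`, `FinalAnswer`, and `run_agent` are illustrative names, not PydanticAI's API. A framework like PydanticAI runs this loop for you.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class FinalAnswer:
    text: str


# Tools the "model" may call, by name.
TOOLS: dict[str, Callable[..., object]] = {
    "word_count": lambda text: len(text.split()),
}


def fake_llm(prompt: str, observations: list[str]) -> ToolCall | FinalAnswer:
    """Hypothetical model: call word_count once, then answer from the result."""
    if not observations:
        return ToolCall("word_count", {"text": prompt})
    return FinalAnswer(f"Your prompt has {observations[-1]} words.")


def run_agent(prompt: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):                          # cap the loop
        decision = fake_llm(prompt, observations)       # reason / decide
        if isinstance(decision, FinalAnswer):
            return decision.text                        # answer
        result = TOOLS[decision.name](**decision.args)  # act
        observations.append(str(result))                # observe, then repeat
    raise RuntimeError(f"no answer after {max_steps} steps")


print(run_agent("how many words are here"))  # → Your prompt has 5 words.
```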

What is a tool?

A normal typed function the model can call

tool

from pydantic_ai import Agent

agent = Agent("google-gla:gemini-2.0-flash")

@agent.tool_plain
def word_count(text: str) -> int:
    """Return the number of words in the given text."""
    return len(text.split())

The docstring is the tool's instruction manual for the model.
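Roughly what happens under the hood: the framework turns the function's name, type hints, and docstring into a schema the model reads. A simplified stand-in using only the standard library; `tool_schema` is a hypothetical helper, and PydanticAI builds its real schemas with Pydantic, supporting far more than these four types.

```python
import inspect
from typing import get_type_hints

# Map a few Python types to JSON Schema type names (simplified).
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def tool_schema(fn) -> dict:
    """Build a JSON-Schema-style tool description from type hints and docstring."""
    hints = get_type_hints(fn)
    params = inspect.signature(fn).parameters
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),  # what the model reads as instructions
        "parameters": {
            "type": "object",
            "properties": {n: {"type": JSON_TYPES[hints[n]]} for n in params},
            "required": [n for n, p in params.items() if p.default is inspect.Parameter.empty],
        },
    }


def word_count(text: str) -> int:
    """Return the number of words in the given text."""
    return len(text.split())


print(tool_schema(word_count))
```

A vague docstring produces a vague description, and the model has little else to go on when deciding whether to call the tool.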

Why bother

Tools + a loop beat better prompting

An LLM alone is a brilliant intern with no phone, no internet, no notebook. Tools give it the phone and the notebook.

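SWE-bench · GAIA · τ-bench

On these "doing" benchmarks (fixing real GitHub issues, multi-step assistant questions, tool-driven customer-service tasks), tool-using agents complete far more tasks than single prompts.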

Your stack = 3 choices

Framework · Model · Harness

🧱 Framework

PydanticAI, LangGraph, LlamaIndex…
We use PydanticAI.

🧠 Model

Gemini, Claude, GPT, Llama…
Swappable in one line. Reach them through the provider's API, OpenRouter, or a local Ollama.

🛠️ Harness

Tools · observability · tests · guardrails.
Where the real work is.

Model & framework are easy swaps. The harness is what makes an agent good.

The point

The harness is the agent

Tools · Observability · Tests & evals · Guardrails · Memory

Same model, same framework — a strong harness is the difference between a demo and something you'd ship.
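Observability is the easiest harness piece to start with: record every tool call. A hand-rolled sketch; `traced` and `TRACE` are hypothetical names, and real projects usually send traces to a tool such as Logfire or an OpenTelemetry backend instead of a list.

```python
import functools
import time
from typing import Any, Callable

TRACE: list[dict[str, Any]] = []  # one entry per successful tool call


def traced(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so every successful call is recorded in TRACE."""
    @functools.wraps(fn)  # keep the name and docstring the model sees
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACE.append({
            "tool": fn.__name__,
            "args": args,
            "kwargs": kwargs,
            "result": result,
            "seconds": time.perf_counter() - start,
        })
        return result
    return wrapper


@traced
def word_count(text: str) -> int:
    """Return the number of words in the given text."""
    return len(text.split())


word_count("vibe coding is fun")
word_count(text="the harness is the agent")
for entry in TRACE:
    print(entry["tool"], entry["args"], entry["kwargs"], "->", entry["result"])
```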

Architectures · first cut

Workflow vs. agent

🧭 Workflow

You wire the steps in code
Path is predefined
Predictable, cheap, debuggable

vs

🤖 Agent

The model decides the next step
Path emerges at runtime
Flexible — but slower & pricier

Same brick underneath: the augmented LLM (model + tools + memory).

Architectures · the ladder · simple → complex

Six patterns you compose

1 · Single agent — one loop + tools · start here
2 · Prompt chain — sub-agents in a fixed flow, A → B → C
3 · Routing — classify, hand off to the right specialist
4 · Parallelization — fan-out / fan-in (split, or vote)
5 · Orchestrator + specialists — boss plans & delegates · the workhorse
6 · Evaluator–optimizer — generate ⇄ critique, loop
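Two of these patterns as plain-Python sketches, with deterministic stand-ins where an LLM would go. `classify`, `SPECIALISTS`, `critique`, and `revise` are all hypothetical: routing is a classifier plus a dispatch table, and evaluator–optimizer is a capped generate ⇄ critique loop.

```python
from __future__ import annotations

from typing import Callable


# --- 3 · Routing: classify, then hand off to a specialist ---
def classify(query: str) -> str:
    """Stand-in for an LLM classifier: simple keyword rules."""
    q = query.lower()
    if "error" in q or "traceback" in q:
        return "debug"
    if "paper" in q or "cite" in q:
        return "research"
    return "general"


SPECIALISTS: dict[str, Callable[[str], str]] = {
    "debug": lambda q: f"[debug agent] {q}",
    "research": lambda q: f"[research agent] {q}",
    "general": lambda q: f"[general agent] {q}",
}


def route(query: str) -> str:
    return SPECIALISTS[classify(query)](query)


# --- 6 · Evaluator-optimizer: generate, critique, revise, repeat ---
def critique(draft: str, max_words: int) -> str | None:
    """Stand-in evaluator: None means accepted, otherwise feedback."""
    n = len(draft.split())
    return None if n <= max_words else f"{n} words, limit is {max_words}"


def revise(draft: str, feedback: str) -> str:
    """Stand-in generator: drops the last word (an LLM would use the feedback)."""
    return " ".join(draft.split()[:-1])


def evaluator_optimizer(draft: str, max_words: int, max_rounds: int = 10) -> str:
    for _ in range(max_rounds):  # cap the loop, as always
        feedback = critique(draft, max_words)
        if feedback is None:
            return draft
        draft = revise(draft, feedback)
    raise RuntimeError("no acceptable draft within max_rounds")


print(route("Why does this traceback mention KeyError?"))
print(evaluator_optimizer("agents loop until a critic accepts the draft", 5))
```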

Scaling up · & the one rule

Topologies — and start simple

Hierarchical

Supervisors of workers. High control, big tasks.

Swarm

Peers, no boss. Exploration at scale.

Mesh

3–8 peers, tight loops on one artifact.

Rule: every layer adds cost, latency & failure points. Use the fewest pieces that solve it.

Tools at scale

MCP — “USB-C for AI tools”

One open standard. Plug your agent into ready-made servers — GitHub, Slack, databases, filesystem, browser — and it gains all their tools at once.

Write once · Reuse across models · Client & server
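Under the hood, MCP messages are JSON-RPC 2.0: a client discovers tools with `tools/list` and invokes one with `tools/call`. A toy, in-process sketch of the server side; the `handle` function and `word_count` tool are illustrative, and a real server also handles the `initialize` handshake and a transport such as stdio, and would normally be built with an official MCP SDK.

```python
import json

TOOLS = {
    "word_count": {
        "description": "Return the number of words in the given text.",
        "inputSchema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
        "fn": lambda text: len(text.split()),
    }
}


def handle(request: dict) -> dict:
    """Answer one JSON-RPC request for the two tool methods."""
    method, params = request["method"], request.get("params", {})
    if method == "tools/list":
        result = {"tools": [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]}
    elif method == "tools/call":
        tool = TOOLS[params["name"]]
        value = tool["fn"](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": str(value)}], "isError": False}
    else:
        # -32601 is the standard JSON-RPC "Method not found" code
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


call = {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
        "params": {"name": "word_count", "arguments": {"text": "USB-C for AI tools"}}}
print(json.dumps(handle(call)))
```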

No credit card needed

Free tokens — and a cap so the loop can't run away

🟢 Gemini free tier — key from aistudio.google.com, generous requests/day
🔵 OpenRouter — one key, many models tagged :free
🚦 UsageLimits(request_limit=5) → raises UsageLimitExceeded

Because agents loop, always cap the number of requests.

▶ Live · first_agent.py

One tool · free model · structured output · capped

first_agent.py

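from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.usage import UsageLimits

class Answer(BaseModel):
    result: int

# "provider:model" string; needs a Gemini API key from aistudio.google.com
agent = Agent("google-gla:gemini-2.0-flash", output_type=Answer)

@agent.tool_plain
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

# at most 5 model requests; going over raises UsageLimitExceeded
out = agent.run_sync("What is 21 + 21? Use the add tool.",
                     usage_limits=UsageLimits(request_limit=5))
print(out.output.result)   # -> 42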

Assume it misbehaves

Failure modes → guardrails

Runaway loop → UsageLimits
Bad tool call → ModelRetry
Hallucination → structured output + eval
Prompt injection → treat tool output as untrusted
Unsafe action → human-in-the-loop
Silent regressions → tracing + evals
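The last rows rely on evals: a fixed set of cases you rerun after every change, so a regression shows up as a number dropping. A minimal exact-match sketch; `Case` and `run_evals` are hypothetical names, `fake_agent` stands in for a real agent call, and real evals often need fuzzier grading (numeric tolerance, an LLM judge).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    prompt: str
    expected: str


def run_evals(agent: Callable[[str], str], cases: list[Case]) -> float:
    """Run every case, print the failures, return the pass rate."""
    passed = 0
    for case in cases:
        got = agent(case.prompt)
        if got.strip().lower() == case.expected.strip().lower():
            passed += 1
        else:
            print(f"FAIL {case.prompt!r}: expected {case.expected!r}, got {got!r}")
    return passed / len(cases)


def fake_agent(prompt: str) -> str:
    """Stand-in for a real agent call: only knows one fact."""
    return "42" if "21 + 21" in prompt else "I don't know"


CASES = [
    Case("What is 21 + 21?", "42"),
    Case("What is 2 + 2?", "4"),
]
print(run_evals(fake_agent, CASES))  # prints one FAIL line, then 0.5
```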

At home · workshop

Build a research assistant agent

Skeleton → Simple tool → Real tool → Parallel + merge → Observe → Eval

Fan out with asyncio.gather, fan in with a synthesizer — then trace it and eval it. Full guide on the site.
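The Parallel + merge step in miniature. A minimal sketch, assuming each source is searched by an independent coroutine; `search`, `synthesize`, and `research` are hypothetical stand-ins (a real version would call a search tool and a synthesizer agent), and `asyncio.sleep` fakes network latency.

```python
import asyncio
import time


async def search(source: str, query: str) -> str:
    """Hypothetical tool: pretend to query one source."""
    await asyncio.sleep(0.1)  # fake network latency
    return f"{source}: notes on {query!r}"


async def synthesize(query: str, findings: list[str]) -> str:
    """Hypothetical synthesizer: a real one would be another agent call."""
    return f"Report on {query!r}:\n" + "\n".join(f"- {f}" for f in findings)


async def research(query: str, sources: list[str]) -> str:
    # fan out: start every search at once; gather keeps results in source order
    findings = await asyncio.gather(*(search(s, query) for s in sources))
    # fan in: merge everything into one answer
    return await synthesize(query, list(findings))


start = time.perf_counter()
report = asyncio.run(research("vibe coding", ["arxiv", "wikipedia", "github"]))
print(report)
print(f"took {time.perf_counter() - start:.2f}s")  # roughly 0.1s, not 0.3s: the searches overlap
```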

Recap

Ask · Visualize · Test
Fix · Iterate

The same loop — from a bit trick to a parallel agent pipeline.

Course site has everything: prep · demo · challenge · workshop