Course slot: Lab (2h) following the lecture
Aligned with slides: environment → API mastery → prompting workshop → applications (summarization/translation/rewriting/parameter tuning) → wrap‑up.
By the end of this lab you can:
- Call a model through LiteLLM and read choices[…], usage & errors
- Tune temperature, top_p, and max_tokens

Part 1 — Environment Setup (20m)
Part 2 — API Mastery (30m)
Part 3 — Prompting Workshop (40m)
Part 4 — Applications (30m)
If you finish early: try the Stretch Goals at the end.
mkdir lab08-litellm && cd lab08-litellm
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install litellm python-dotenv
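Optional sanity check before moving on (this only verifies that both packages import inside the virtualenv):
python -c "import litellm, dotenv; print('imports OK')"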
You only need one of these. (Use multiple if you want to compare models.)
Groq (Llama/Gemma etc.) https://console.groq.com/home
1) Sign up / sign in → Dashboard → API Keys
2) Create key → copy
3) Save as GROQ_API_KEY
in your .env
Google AI Studio (Gemini) https://aistudio.google.com/
1) Sign in with Google → Get API key
2) Create API key (Server) → copy
3) Save as GEMINI_API_KEY
OpenRouter (:free models, rate‑limited) https://openrouter.ai/
1) Sign up → Profile → API Keys
2) Create key → copy
3) Save as OPENROUTER_API_KEY
Instructor tip: Prepare a fallback key in case a service rate‑limits the class.
Create a file named .env in your project root:
# Add only the provider(s) you use
GROQ_API_KEY=your_key_here
GEMINI_API_KEY=your_key_here
OPENROUTER_API_KEY=your_key_here
# Pick ONE default model id AFTER you confirm it in your provider’s catalog
# (Examples below are placeholders; choose any supported chat/completion model)
MODEL=provider/model-id
See the model list available from https://docs.litellm.ai/docs/providers
Groq https://console.groq.com/docs/models
Gemini https://ai.google.dev/gemini-api/docs/pricing
OpenRouter https://openrouter.ai/models?fmt=cards&input_modalities=text&max_price=0
Minimal config loader (config.py)
import os
from dotenv import load_dotenv

load_dotenv()
PROVIDER_MODEL = os.getenv("MODEL")
assert PROVIDER_MODEL, "Please set MODEL in your .env to a supported model id"
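Optional sanity check (run from the project root after creating .env; it should print your chosen model id):
python -c "from config import PROVIDER_MODEL; print(PROVIDER_MODEL)"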
Create hello.py:
from litellm import completion
from config import PROVIDER_MODEL as MODEL
resp = completion(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Say hello in 5 words."}
    ],
    max_tokens=32,
)
print("REPLY:", resp.choices[0].message["content"])
print("USAGE:", getattr(resp, "usage", {}))
Run:
python hello.py
Checkpoint A: Do you see a reply + usage? If not, re‑check your API key and MODEL string.
Create parameters.py:
from litellm import completion
from config import PROVIDER_MODEL as MODEL
prompt = "Give 3 creative names for a smart water bottle."
for temp in [0.0, 0.5, 1.0]:
    r = completion(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=temp,
        top_p=1.0,
        max_tokens=150,
    )
    print("\n--- temperature =", temp)
    print(r.choices[0].message["content"])
Task: Change top_p → 0.6 and note differences.
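To isolate the effect of top_p, you can also hold temperature fixed and sweep top_p. A minimal sketch reusing the same prompt as parameters.py (the fixed temperature of 0.8 is an arbitrary choice):

from litellm import completion
from config import PROVIDER_MODEL as MODEL

prompt = "Give 3 creative names for a smart water bottle."

# Keep temperature fixed so only top_p changes between runs.
for top_p in [1.0, 0.6]:
    r = completion(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.8,
        top_p=top_p,
        max_tokens=150,
    )
    print("\n--- top_p =", top_p)
    print(r.choices[0].message["content"])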
Create robust.py:
import time, random
from litellm import completion, exceptions
from config import PROVIDER_MODEL as MODEL
for attempt in range(1, 4):
    try:
        r = completion(
            model=MODEL,
            messages=[{"role": "user", "content": "Two bullets on gradient descent."}],
            timeout=20,
            max_tokens=120,
        )
        print(r.choices[0].message["content"])
        print("USAGE:", getattr(r, "usage", {}))
        break
    except exceptions.RateLimitError:
        wait = (2 ** attempt) + random.random()
        print(f"Rate limited. Retrying in {wait:.1f}s…")
        time.sleep(wait)
    except Exception as e:
        print("Unexpected:", type(e).__name__, str(e))
        break
Task: Set timeout=0.001 to force a timeout and observe the behavior.
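The forced timeout surfaces as an exception. If you want to catch it explicitly rather than letting it fall into the generic handler, a sketch like the one below should work, assuming your installed litellm version exposes exceptions.Timeout (check its exceptions module if not):

from litellm import completion, exceptions
from config import PROVIDER_MODEL as MODEL

try:
    r = completion(
        model=MODEL,
        messages=[{"role": "user", "content": "Two bullets on gradient descent."}],
        timeout=0.001,  # deliberately tiny to force a timeout
        max_tokens=120,
    )
    print(r.choices[0].message["content"])
except exceptions.Timeout:
    # Assumption: litellm maps client-side timeouts to exceptions.Timeout
    print("Request timed out. Raise the timeout value or retry.")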
from litellm import completion
from config import PROVIDER_MODEL as MODEL
stream = completion(model=MODEL, messages=[{"role":"user","content":"Write 3 sentences about a traveling cat."}], stream=True)
for chunk in stream:
    delta = chunk.choices[0].delta.get("content") if chunk.choices and chunk.choices[0].delta else None
    if delta:
        print(delta, end="", flush=True)
print()
Checkpoint B: Where do you read token usage? (Hint: response.usage).
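For non‑streaming calls, response.usage follows the OpenAI‑style field names. A minimal sketch for printing them (some providers may omit one or more fields):

from litellm import completion
from config import PROVIDER_MODEL as MODEL

r = completion(
    model=MODEL,
    messages=[{"role": "user", "content": "One sentence about the sea."}],
    max_tokens=40,
)

usage = getattr(r, "usage", None)
if usage:
    # OpenAI-style usage fields
    print("prompt_tokens:    ", usage.prompt_tokens)
    print("completion_tokens:", usage.completion_tokens)
    print("total_tokens:     ", usage.total_tokens)
else:
    print("No usage reported for this response.")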
Match the slide flow: zero‑shot → few‑shot → role prompting → chain‑of‑thought (CoT).
from litellm import completion
from config import PROVIDER_MODEL as MODEL
r = completion(model=MODEL, messages=[{"role":"user","content":"Explain APIs in one sentence."}], temperature=0.3, max_tokens=60)
print(r.choices[0].message["content"])
from litellm import completion
from config import PROVIDER_MODEL as MODEL
shots = (
"Review: 'Amazing product!' → Positive\n"
"Review: 'Waste of money.' → Negative\n"
"Review: 'It's okay.' → Neutral\n"
)
q = "Review: 'Loved the build quality!' →"
msg = f"Classify sentiment.
Examples:
{shots}
Now continue:
{q}"
resp = completion(model=MODEL, messages=[{"role":"user","content":msg}], temperature=0.2)
print(resp.choices[0].message["content"])
from litellm import completion
from config import PROVIDER_MODEL as MODEL
system = "You are a senior Python tutor. Be precise and brief."
user = "Show a for‑loop example that sums numbers 1..5."
resp = completion(model=MODEL, messages=[{"role":"system","content":system},{"role":"user","content":user}], temperature=0.2)
print(resp.choices[0].message["content"])
from litellm import completion
from config import PROVIDER_MODEL as MODEL
problem = "A store sold 42, 38, and 51 pizzas on Mon/Tue/Wed at $18 each. Total revenue?"
prompt = (
    "Solve step‑by‑step.\n"
    "1) Sum pizzas 2) Multiply by price 3) State final.\n"
    f"Problem: {problem}"
)
resp = completion(model=MODEL, messages=[{"role":"user","content":prompt}], temperature=0.2)
print(resp.choices[0].message["content"])
Build three small utilities from the slides: summarizer, translator, style rewriter.
Summarizer (summarize.py)
from litellm import completion
from config import PROVIDER_MODEL as MODEL
def summarize(text, length="brief"):
lengths = {"brief":"in 1–2 sentences","medium":"in 3–4 sentences","detailed":"in 5–6 sentences with key points"}
r = completion(
model=MODEL,
messages=[
{"role":"system","content":f"You are an expert summarizer. Summarize {lengths.get(length,'in 2–3 sentences')}"},
{"role":"user","content":text}
],
temperature=0.3, max_tokens=180,
)
return r.choices[0].message["content"].strip()
if __name__ == "__main__":
sample = """Recent advances in AI… (paste any article here)"""
print(summarize(sample, "brief"))
Translator (translate.py)
from litellm import completion
from config import PROVIDER_MODEL as MODEL
def translate(text, target_lang):
    r = completion(
        model=MODEL,
        messages=[
            {"role": "system", "content": f"You are a professional translator to {target_lang}. Keep tone & meaning."},
            {"role": "user", "content": text},
        ],
        temperature=0.2, max_tokens=220,
    )
    return r.choices[0].message["content"].strip()

if __name__ == "__main__":
    print(translate("Hello, how are you today?", "French"))
Style rewriter (rewrite.py)
from litellm import completion
from config import PROVIDER_MODEL as MODEL
def rewrite(text, style):
    styles = {
        "formal": "formal, business‑appropriate",
        "casual": "friendly, conversational",
        "technical": "precise technical writing",
        "marketing": "persuasive, benefits‑led",
    }
    r = completion(
        model=MODEL,
        messages=[
            {"role": "system", "content": f"Rewrite in {styles.get(style, 'clear and concise')} style while preserving meaning."},
            {"role": "user", "content": text},
        ],
        temperature=0.4, max_tokens=200,
    )
    return r.choices[0].message["content"].strip()

if __name__ == "__main__":
    print(rewrite("Our new update improves performance and UX.", "marketing"))
1) Working scripts: hello.py, parameters.py, robust.py, summarize.py, translate.py, rewrite.py
2) Short README: how to run, observations on parameters, 1–2 screenshots
Troubleshooting
- Check that your .env is loaded.
- Use a MODEL id from your provider’s catalog (update .env).
- Adjust max_tokens or ask for shorter responses.

Stretch Goals
- Ask the model to answer in JSON, then json.loads it (validate keys).
- Build a small CLI: python qa.py --system 'You are..' --temperature 0.3 (add --stream); see the sketch below.
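One possible shape for the qa.py stretch goal. This is a sketch only: the positional question argument and default system prompt are assumptions, the flag names follow the command above, and the streaming loop mirrors the earlier streaming example.

# qa.py: minimal CLI Q&A sketch
import argparse
from litellm import completion
from config import PROVIDER_MODEL as MODEL

parser = argparse.ArgumentParser(description="Ask the model one question from the command line.")
parser.add_argument("question", help="the question to ask")
parser.add_argument("--system", default="You are a helpful assistant.", help="system prompt")
parser.add_argument("--temperature", type=float, default=0.3)
parser.add_argument("--stream", action="store_true", help="print tokens as they arrive")
args = parser.parse_args()

messages = [
    {"role": "system", "content": args.system},
    {"role": "user", "content": args.question},
]

if args.stream:
    # Same chunk handling as the streaming example earlier in the lab.
    for chunk in completion(model=MODEL, messages=messages,
                            temperature=args.temperature, stream=True):
        delta = chunk.choices[0].delta.get("content") if chunk.choices and chunk.choices[0].delta else None
        if delta:
            print(delta, end="", flush=True)
    print()
else:
    r = completion(model=MODEL, messages=messages, temperature=args.temperature)
    print(r.choices[0].message["content"])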
Provider quick reference
- Groq: GROQ_API_KEY=...; choose a supported chat model id and set MODEL=groq/<model-id>
- Gemini: GEMINI_API_KEY=...; set MODEL=gemini/<model-id>
- OpenRouter: OPENROUTER_API_KEY=...; set MODEL=openrouter/<provider>/<model-id>:free
Exact model strings change over time. Always copy the id from the provider’s current model list, then set it in .env as MODEL=... .