Course slot: Lab (2h) following the lecture
Aligned with slides: environment → API mastery → prompting workshop → applications (summarization/translation/rewriting/parameter tuning) → wrap‑up.
By the end of this lab you can:
- Call a model through LiteLLM and read choices[…], usage & errors
- Tune temperature, top_p, and max_tokens

Part 1 — Environment Setup (20m)
Part 2 — API Mastery (30m)
Part 3 — Prompting Workshop (40m)
Part 4 — Applications (30m)
If you finish early: try the Stretch Goals at the end.
mkdir lab08-litellm && cd lab08-litellm
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install litellm python-dotenv
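Optional sanity check before moving on (this only verifies that both packages import inside the virtualenv):
python -c "import litellm, dotenv; print('imports OK')"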
You only need one of these. (Use multiple if you want to compare models.)
Groq (Llama/Gemma etc.) https://console.groq.com/home
1) Sign up / sign in → Dashboard → API Keys
2) Create key → copy
3) Save as GROQ_API_KEY
in your .env
Google AI Studio (Gemini) https://aistudio.google.com/
1) Sign in with Google → Get API key
2) Create API key (Server) → copy
3) Save as GEMINI_API_KEY
OpenRouter (:free models, rate‑limited) https://openrouter.ai/
1) Sign up → Profile → API Keys
2) Create key → copy
3) Save as OPENROUTER_API_KEY
Instructor tip: Prepare a fallback key in case a service rate‑limits the class.
Create a file named .env in your project root:
# Add only the provider(s) you use
GROQ_API_KEY=your_key_here
GEMINI_API_KEY=your_key_here
OPENROUTER_API_KEY=your_key_here
# Pick ONE default model id AFTER you confirm it in your provider’s catalog
# (Examples below are placeholders; choose any supported chat/completion model)
MODEL=provider/model-id
See the model list available from https://docs.litellm.ai/docs/providers
Groq https://console.groq.com/docs/models
Gemini https://ai.google.dev/gemini-api/docs/pricing
OpenRouter https://openrouter.ai/models?fmt=cards&input_modalities=text&max_price=0
Minimal config loader (config.py)
import os
from dotenv import load_dotenv

load_dotenv()
PROVIDER_MODEL = os.getenv("MODEL")
assert PROVIDER_MODEL, "Please set MODEL in your .env to a supported model id"
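Optional sanity check (run from the project root after creating .env; it should print your chosen model id):
python -c "from config import PROVIDER_MODEL; print(PROVIDER_MODEL)"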
Create hello.py:
from litellm import completion
from config import PROVIDER_MODEL as MODEL
resp = completion(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Say hello in 5 words."}
    ],
    max_tokens=32,
)
print("REPLY:", resp.choices[0].message["content"])
print("USAGE:", getattr(resp, "usage", {}))
Run:
python hello.py
Checkpoint A: Do you see a reply + usage? If not, re‑check your API key and MODEL string.
Create parameters.py:
from litellm import completion
from config import PROVIDER_MODEL as MODEL
prompt = "Give 3 creative names for a smart water bottle."
for temp in [0.0, 0.5, 1.0]:
    r = completion(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=temp,
        top_p=1.0,
        max_tokens=150,
    )
    print("\n--- temperature =", temp)
    print(r.choices[0].message["content"])
Task: Change top_p → 0.6 and note differences.
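To isolate the effect of top_p, you can also hold temperature fixed and sweep top_p. A minimal sketch reusing the same prompt as parameters.py (the fixed temperature of 0.8 is an arbitrary choice):

from litellm import completion
from config import PROVIDER_MODEL as MODEL

prompt = "Give 3 creative names for a smart water bottle."

# Keep temperature fixed so only top_p changes between runs.
for top_p in [1.0, 0.6]:
    r = completion(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.8,
        top_p=top_p,
        max_tokens=150,
    )
    print("\n--- top_p =", top_p)
    print(r.choices[0].message["content"])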
Create robust.py:
import time, random
from litellm import completion, exceptions
from config import PROVIDER_MODEL as MODEL
for attempt in range(1, 4):
    try:
        r = completion(
            model=MODEL,
            messages=[{"role": "user", "content": "Two bullets on gradient descent."}],
            timeout=20,
            max_tokens=120,
        )
        print(r.choices[0].message["content"])
        print("USAGE:", getattr(r, "usage", {}))
        break
    except exceptions.RateLimitError:
        wait = (2 ** attempt) + random.random()
        print(f"Rate limited. Retrying in {wait:.1f}s…")
        time.sleep(wait)
    except Exception as e:
        print("Unexpected:", type(e).__name__, str(e))
        break
Task: Set timeout=0.001 to force a timeout and observe the behavior.
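The forced timeout surfaces as an exception. If you want to catch it explicitly rather than letting it fall into the generic handler, a sketch like the one below should work, assuming your installed litellm version exposes exceptions.Timeout (check its exceptions module if not):

from litellm import completion, exceptions
from config import PROVIDER_MODEL as MODEL

try:
    r = completion(
        model=MODEL,
        messages=[{"role": "user", "content": "Two bullets on gradient descent."}],
        timeout=0.001,  # deliberately tiny to force a timeout
        max_tokens=120,
    )
    print(r.choices[0].message["content"])
except exceptions.Timeout:
    # Assumption: litellm maps client-side timeouts to exceptions.Timeout
    print("Request timed out. Raise the timeout value or retry.")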
from litellm import completion
from config import PROVIDER_MODEL as MODEL
stream = completion(model=MODEL, messages=[{"role":"user","content":"Write 3 sentences about a traveling cat."}], stream=True)
for chunk in stream:
    delta = chunk.choices[0].delta.get("content") if chunk.choices and chunk.choices[0].delta else None
    if delta:
        print(delta, end="", flush=True)
print()
Checkpoint B: Where do you read token usage? (Hint: response.usage).
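For non‑streaming calls, response.usage follows the OpenAI‑style field names. A minimal sketch for printing them (some providers may omit one or more fields):

from litellm import completion
from config import PROVIDER_MODEL as MODEL

r = completion(
    model=MODEL,
    messages=[{"role": "user", "content": "One sentence about the sea."}],
    max_tokens=40,
)

usage = getattr(r, "usage", None)
if usage:
    # OpenAI-style usage fields
    print("prompt_tokens:    ", usage.prompt_tokens)
    print("completion_tokens:", usage.completion_tokens)
    print("total_tokens:     ", usage.total_tokens)
else:
    print("No usage reported for this response.")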
Match the slide flow: zero‑shot → few‑shot → role prompting → chain‑of‑thought (CoT).
from litellm import completion
from config import PROVIDER_MODEL as MODEL
r = completion(model=MODEL, messages=[{"role":"user","content":"Explain APIs in one sentence."}], temperature=0.3, max_tokens=60)
print(r.choices[0].message["content"])
from litellm import completion
from config import PROVIDER_MODEL as MODEL
shots = (
"Review: 'Amazing product!' → Positive\n"
"Review: 'Waste of money.' → Negative\n"
"Review: 'It's okay.' → Neutral\n"
)
q = "Review: 'Loved the build quality!' →"
msg = f"Classify sentiment.
Examples:
{shots}
Now continue:
{q}"
resp = completion(model=MODEL, messages=[{"role":"user","content":msg}], temperature=0.2)
print(resp.choices[0].message["content"])
from litellm import completion
from config import PROVIDER_MODEL as MODEL
system = "You are a senior Python tutor. Be precise and brief."
user = "Show a for‑loop example that sums numbers 1..5."
resp = completion(model=MODEL, messages=[{"role":"system","content":system},{"role":"user","content":user}], temperature=0.2)
print(resp.choices[0].message["content"])
from litellm import completion
from config import PROVIDER_MODEL as MODEL
problem = "A store sold 42, 38, and 51 pizzas on Mon/Tue/Wed at $18 each. Total revenue?"
prompt = (
    "Solve step‑by‑step.\n"
    "1) Sum pizzas 2) Multiply by price 3) State final.\n"
    f"Problem: {problem}"
)
resp = completion(model=MODEL, messages=[{"role":"user","content":prompt}], temperature=0.2)
print(resp.choices[0].message["content"])
Build three small utilities from the slides: summarizer, translator, style rewriter.
Summarizer (summarize.py)
from litellm import completion
from config import PROVIDER_MODEL as MODEL
def summarize(text, length="brief"):
lengths = {"brief":"in 1–2 sentences","medium":"in 3–4 sentences","detailed":"in 5–6 sentences with key points"}
r = completion(
model=MODEL,
messages=[
{"role":"system","content":f"You are an expert summarizer. Summarize {lengths.get(length,'in 2–3 sentences')}"},
{"role":"user","content":text}
],
temperature=0.3, max_tokens=180,
)
return r.choices[0].message["content"].strip()
if __name__ == "__main__":
sample = """Recent advances in AI… (paste any article here)"""
print(summarize(sample, "brief"))
Translator (translate.py)
from litellm import completion
from config import PROVIDER_MODEL as MODEL
def translate(text, target_lang):
    r = completion(
        model=MODEL,
        messages=[
            {"role": "system", "content": f"You are a professional translator to {target_lang}. Keep tone & meaning."},
            {"role": "user", "content": text},
        ],
        temperature=0.2, max_tokens=220,
    )
    return r.choices[0].message["content"].strip()

if __name__ == "__main__":
    print(translate("Hello, how are you today?", "French"))
Style rewriter (rewrite.py)
from litellm import completion
from config import PROVIDER_MODEL as MODEL
def rewrite(text, style):
    styles = {
        "formal": "formal, business‑appropriate",
        "casual": "friendly, conversational",
        "technical": "precise technical writing",
        "marketing": "persuasive, benefits‑led",
    }
    r = completion(
        model=MODEL,
        messages=[
            {"role": "system", "content": f"Rewrite in {styles.get(style, 'clear and concise')} style while preserving meaning."},
            {"role": "user", "content": text},
        ],
        temperature=0.4, max_tokens=200,
    )
    return r.choices[0].message["content"].strip()

if __name__ == "__main__":
    print(rewrite("Our new update improves performance and UX.", "marketing"))
1) Working scripts: hello.py, parameters.py, robust.py, summarize.py, translate.py, rewrite.py
2) Short README: how to run, observations on parameters, 1–2 screenshots
Troubleshooting
- Check that your .env is loaded.
- Use a MODEL id from your provider’s catalog (update .env).
- Adjust max_tokens or ask for shorter responses.

Stretch Goals
- Ask the model to answer in JSON, then json.loads it (validate keys).
- Build a small CLI: python qa.py --system 'You are..' --temperature 0.3 (add --stream); see the sketch below.
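One possible shape for the qa.py stretch goal. This is a sketch only: the positional question argument and default system prompt are assumptions, the flag names follow the command above, and the streaming loop mirrors the earlier streaming example.

# qa.py: minimal CLI Q&A sketch
import argparse
from litellm import completion
from config import PROVIDER_MODEL as MODEL

parser = argparse.ArgumentParser(description="Ask the model one question from the command line.")
parser.add_argument("question", help="the question to ask")
parser.add_argument("--system", default="You are a helpful assistant.", help="system prompt")
parser.add_argument("--temperature", type=float, default=0.3)
parser.add_argument("--stream", action="store_true", help="print tokens as they arrive")
args = parser.parse_args()

messages = [
    {"role": "system", "content": args.system},
    {"role": "user", "content": args.question},
]

if args.stream:
    # Same chunk handling as the streaming example earlier in the lab.
    for chunk in completion(model=MODEL, messages=messages,
                            temperature=args.temperature, stream=True):
        delta = chunk.choices[0].delta.get("content") if chunk.choices and chunk.choices[0].delta else None
        if delta:
            print(delta, end="", flush=True)
    print()
else:
    r = completion(model=MODEL, messages=messages, temperature=args.temperature)
    print(r.choices[0].message["content"])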
Provider quick reference
- Groq: GROQ_API_KEY=...; choose a supported chat model id and set MODEL=groq/<model-id>
- Gemini: GEMINI_API_KEY=...; set MODEL=gemini/<model-id>
- OpenRouter: OPENROUTER_API_KEY=...; set MODEL=openrouter/<provider>/<model-id>:free
Exact model strings change over time. Always copy the id from the provider’s current model list, then set it in .env as MODEL=... .