
2026-03-01 (updated 2026-04-01) · Experiment · 10 min read

Building an AI Sommelier That Buys Wine For Me

What happens when you give Claude a cron job, a taste profile, and three retailer logins?

The Experiment

1. The Idea

I like wine but I'm lazy about buying it. I tend to reach for the same regions — South African reds, California Pinot Noir, Rioja — and ignore most of the wine world. I wanted something that would push me out of my comfort zone without requiring me to think about it.

The obvious approach would be a subscription box, but those are generic. They don't know I think 15%+ ABV is too much, that I rated Château Musar 4.5/5, or that my cellar is already drowning in Gusbourne sparkling.

So I built a sommelier instead. Not a web app or a recommendation engine — just a Claude Code session that wakes up once a month, browses the same websites I would, thinks about what I'd enjoy, and orders it.

2. Architecture (Or Lack Thereof)

This is the interesting part. The entire system is a prompt and a shell script. No database, no API, no backend, no recommendation engine. There is no wine-selecting algorithm. There is no structured pipeline. The architecture looks like this:

cron (1st of month) → run.sh → claude -p "Read CLAUDE.md and pick wines"
→ Browse retailers (WebFetch) → Pick 6 wines → purchase.py (Playwright)
→ Publish to website → Notify via Telegram → Wines arrive

The launcher script is 15 lines of bash. It sets a lockfile, launches Claude with --dangerously-skip-permissions, and logs everything. The actual sommelier lives in CLAUDE.md — a 150-line prompt that tells Claude who it is, what I like, and what to do.
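To make the launcher's job concrete, here is a rough sketch in Python rather than the bash the real run.sh uses. The prompt text and the --dangerously-skip-permissions flag come from this post; the function names and the log path argument are hypothetical, and this is not the project's actual script.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

PROMPT = "Read CLAUDE.md and pick wines"  # prompt text quoted in the diagram above


def build_command(prompt: str = PROMPT) -> List[str]:
    """Assemble the headless Claude Code invocation described in the post."""
    return ["claude", "-p", prompt, "--dangerously-skip-permissions"]


def run_session(log_path: Path, command: Optional[List[str]] = None) -> int:
    """Run one session, appending stdout and stderr to a log file.

    The log location is an assumption; the post only says everything is logged.
    """
    command = command or build_command()
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

Everything interesting still happens inside the session; the launcher only decides when it starts and where its output goes.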

That's it. There is no code that selects wine. Claude reads the prompt, browses retailer websites, reasons about what I'd enjoy, and makes its picks. The prompt is the program.

This follows a pattern I've been using for other automations on my homelab: the agentic cron job. Instead of writing an application that does a specific task, you write a prompt that describes the task and give Claude the tools to carry it out. The traditional approach would be to build a scraper, a scoring model, a recommendation layer, a checkout integration — hundreds of lines of brittle code per retailer. The agentic approach replaces all of that with a markdown file and general-purpose web browsing.

The tradeoff is control. A traditional pipeline does exactly what you coded. An agentic session does roughly what you described, interpreted through a model's judgement. For wine selection — where taste, serendipity, and cultural knowledge matter more than precision — that tradeoff works in your favour. You want the agent to exercise judgement. That's the whole point.

The key insight

Most "AI agent" projects I see are elaborate Python frameworks orchestrating chains of API calls. This is a CLAUDE.md file and a cron job. The complexity lives in the prompt, not the code. If you can describe a task clearly enough for a competent human to do it, you can describe it clearly enough for Claude to do it autonomously.

What Claude gets

Claude decides how to browse (which categories, which search terms, which "new arrivals" pages to check), what to prioritise (a promising Douro red vs. a safe Rioja), and when to splurge. None of that logic is coded anywhere. It emerges from the interaction between the prompt, the taste profile, and whatever Claude finds on the retailer sites that day.

3. The Purchase Automation

This is the part I was most nervous about. Giving an AI agent the ability to spend real money on your behalf is... a trust exercise.

The purchase script uses Playwright (headless Chromium) to automate checkout on three UK wine retailers. Each retailer has its own purchaser class because every site is different:

Retailer | Login field | Add to cart | Cart URL | Status
Majestic | #mail_t1 | a[id^="add-to-cart"] | /customer/cart | working
Berry Bros & Rudd | #login-modal-email | "Add to basket" | /cart | needs password reset
The Wine Society | #email | .js-add-to-basket | /basket/ | working
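One way to picture the per-retailer differences is as plain data. The sketch below is illustrative only: the RetailerSelectors class and the dictionary keys are hypothetical, and the values are copied from the table above. For Berry Bros & Rudd the table gives button text rather than a CSS selector, so the sketch assumes Playwright's text= selector syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetailerSelectors:
    """Per-site details from the table above; the class itself is hypothetical."""

    name: str
    login_field: str   # selector for the email input on the login form
    add_to_cart: str   # selector for the add-to-cart control
    cart_path: str     # path of the basket page, relative to the site root


RETAILERS = {
    "majestic": RetailerSelectors(
        "Majestic", "#mail_t1", 'a[id^="add-to-cart"]', "/customer/cart"
    ),
    "bbr": RetailerSelectors(
        # Button text from the table, expressed as a Playwright text selector (assumption).
        "Berry Bros & Rudd", "#login-modal-email", "text=Add to basket", "/cart"
    ),
    "wine_society": RetailerSelectors(
        "The Wine Society", "#email", ".js-add-to-basket", "/basket/"
    ),
}
```

Keeping selectors in one place like this means a site redesign becomes a one-line change rather than a hunt through checkout logic.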

The first safety net is a hard budget cap of £250/month enforced in code: the script refuses to check out above that. My wife rightly pointed out that having my Amex as the default payment method on these accounts could result in a rude surprise, so I switched to a debit card with its own hard spending limit as a secondary backstop.

Good thing, too. During testing, a scraping error caused Claude to misread the price of a £75 Brunello di Montalcino as £15. It identified this as the deal of the century and tried to buy as much of it as it could. An agent that genuinely believes it's found a £75 wine for £15 will behave exactly like a human who believes the same thing — it goes all in. The budget cap caught it, but it was a vivid reminder that the failure modes of an autonomous purchasing agent aren't just "it buys the wrong wine." They're "it finds a too-good-to-be-true deal, correctly reasons that it should maximise, and spends your money with enthusiasm."

Defence in depth

Software budget cap in the purchase script. Debit card spending limit on the account. Two independent layers, because a scraping bug and an eager agent are a combination you don't want to bet your credit card statement on.
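A minimal sketch of the software layer, assuming the basket is available as (unit price, quantity) pairs. The £250 figure is from this post; the function and exception names are hypothetical and this is not the actual purchase.py.

```python
from decimal import Decimal

MONTHLY_CAP_GBP = Decimal("250")  # cap stated in the post


class BudgetExceeded(Exception):
    """Raised before checkout when the basket total is over the cap."""


def basket_total(items):
    """Sum (unit_price, quantity) pairs using Decimal to avoid float rounding."""
    return sum((Decimal(str(price)) * qty for price, qty in items), Decimal("0"))


def check_budget(items, cap=MONTHLY_CAP_GBP):
    """Return the total if it is within the cap, otherwise refuse to continue."""
    total = basket_total(items)
    if total > cap:
        raise BudgetExceeded(f"Basket £{total} exceeds cap £{cap}")
    return total
```

Called as check_budget([("15.00", 20)]), this raises BudgetExceeded, which is roughly the shape of the misread Brunello incident. Note that a check like this only sees the prices the script sees, which is why the card limit matters as an independent layer.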

The Cloudflare problem

Majestic is behind Cloudflare bot protection. A vanilla headless Chromium gets blocked instantly. The fix was surprisingly simple — disable the AutomationControlled Blink feature flag and remove the navigator.webdriver property. Not sophisticated, but enough for a legitimate use case (I'm buying wine from my own account, not scraping).
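In Playwright's Python API, those two tweaks could look something like the sketch below. The launch argument and add_init_script are real Playwright features; the helper name is hypothetical, and redefining the getter to return undefined is one common way of hiding the property, not necessarily what the project does.

```python
# Launch flag that disables the Blink feature advertising browser automation.
STEALTH_ARGS = ["--disable-blink-features=AutomationControlled"]

# Injected before any page script runs, so navigator.webdriver reads as undefined.
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', { get: () => undefined });"
)


def open_stealth_page(playwright):
    """Return (browser, page) from a sync Playwright handle with both tweaks applied."""
    browser = playwright.chromium.launch(headless=True, args=STEALTH_ARGS)
    context = browser.new_context()
    context.add_init_script(HIDE_WEBDRIVER_JS)
    return browser, context.new_page()
```

With the real library this would be called inside a `with sync_playwright() as p:` block, passing p as the argument.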

4. The Taste Profile

Getting the taste profile right turned out to be the hardest part. My first version was too prescriptive:

What went wrong (v1)

The initial profile listed specific regions and grapes I like, with detailed scoring. Claude interpreted this as a shopping list — all 6 wines came from The Wine Society, heavily weighted toward South Africa and California. The exact opposite of what I wanted.

The rewrite was less about what I like and more about how to surprise me:

What worked (v2)

Known preferences are labelled as "anchors — 1–2 per case max". The profile has a large "Where to Push Him" section listing 12+ discovery directions. An explicit rule says: "Each case should span at least 4 countries."

Lesson

When prompting an agent to make choices on your behalf, the constraints on variety matter more than the constraints on preference. An LLM will happily optimise for your stated preferences until every choice is the same. You have to explicitly tell it to explore.

5. The Feedback Loop

After each month's wines arrive, I send tasting notes via Telegram. A polling script (telegram-feedback.py, running every 15 minutes via cron) watches for messages starting with /wine and saves them to the feedback/ directory.
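For a sense of how little the polling script needs to do, here is a sketch built on the Telegram Bot API's getUpdates method. The endpoint and the update/message/text structure are Telegram's; the function names and the one-file-per-update naming scheme are assumptions, and this is not the actual telegram-feedback.py.

```python
import json
import urllib.request
from pathlib import Path

API_BASE = "https://api.telegram.org"


def fetch_updates(token, offset=None):
    """Call the Bot API getUpdates method and return the list of updates."""
    url = f"{API_BASE}/bot{token}/getUpdates"
    if offset is not None:
        url += f"?offset={offset}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response).get("result", [])


def wine_notes(updates):
    """Yield (update_id, note) for messages whose text starts with /wine."""
    for update in updates:
        text = update.get("message", {}).get("text", "")
        if text.startswith("/wine"):
            yield update["update_id"], text[len("/wine"):].strip()


def save_notes(updates, feedback_dir):
    """Write each note to <feedback_dir>/<update_id>.md and return the paths written."""
    feedback_dir = Path(feedback_dir)
    feedback_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for update_id, note in wine_notes(updates):
        path = feedback_dir / f"{update_id}.md"
        path.write_text(note + "\n", encoding="utf-8")
        written.append(path)
    return written
```

A cron-driven script would call fetch_updates with the bot token, pass the result to save_notes, and remember the highest update_id so the next poll can request only newer messages via offset.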

Next time the sommelier runs, it reads the feedback before picking new wines and updates the taste profile. The idea is that over months, the profile becomes increasingly accurate — not because I sat down and described my palate, but because the system learned from what I actually enjoyed drinking.

No completed feedback cycles yet. This is month one. Check back.

March 2026 — first run

The first real run produced a case I'm genuinely curious about. The agent went heavy on Northern Rhône (two Syrahs at £19 and £54) — which makes sense given the taste profile flags it as my biggest unexplored sweet spot. The wild card was a 25-year-old Oloroso Sherry for £17. The reasoning was interesting: "He rates d'Yquem 5.0 and old Rioja 4.7 — he already loves complex, oxidative styles even if he doesn't consciously frame it that way." That's... not wrong. I hadn't connected those dots myself.

The purchase automation needed live debugging. The Wine Society's product page for Hermitage defaulted to "Case of 3" (£162) instead of a single bottle (£54). The agent spotted the inflated basket total in screenshots, diagnosed the format selector issue, patched the purchase script mid-session to detect and switch case/bottle formats, and retried. It also had to add a JavaScript fallback for the checkout button after Playwright's CSS selectors couldn't find it. Three attempts total before the order went through. This is exactly the kind of brittleness I expected — and exactly the kind of self-repair I hoped an agent session could handle.

Total spend: £135.50 of the £250 budget. A £20 first-order discount from The Wine Society helped. The agent was conservative on its first run — I'd rather it under-spend and learn than over-spend and fail.

April 2026 — second run

The second run demonstrated both the power and the fragility of this approach. Two things broke before a single wine was ordered.

First, Berry Bros & Rudd upgraded their website and invalidated all existing passwords. The login page now redirects to a "Create a new password" flow. The automation hit this wall immediately — the credentials are correct, but the site won't accept them until the password is manually reset. This is exactly the kind of failure I flagged in the "How fragile is the checkout automation?" question above. Answer: very, and in ways you can't predict. A retailer doesn't announce "we're upgrading our auth system next Tuesday." You find out when the bot logs in and fails.

Second, The Wine Society's checkout requires a CVV (card security code) at payment. The automation reached the final "Complete your order" page, filled in the saved card details, but couldn't provide the CVV. This is a fundamentally harder problem than a changed CSS selector: it's a security boundary that shouldn't be automatable. The wines sit in the basket, ready for me to enter three digits.

On the selection side, something more interesting happened. My only feedback from March wasn't about any of the agent's picks; it was a note that I'd had Ridge Geyserville 2021 and loved it. Ridge Geyserville is a Zinfandel-dominant field blend from Sonoma, famous for restraint and complexity in a variety often associated with fruit bombs. The agent used this to calibrate: I value restraint over power and enjoy multi-variety blends where terroir trumps variety. This directly influenced the April pick of Mas de Daumas Gassac, a Cabernet-dominant Languedoc blend with 7+ grape varieties that's similarly focused on terroir complexity. Indirect feedback turned out to be more revealing than a tasting note.

Total case: £192.50 across 6 countries (Italy, France, South Africa, Greece, Austria, Portugal). More confident spending this month, wider geographic range. The Wine Society applied another £20 discount, bringing the actual total to £172.50.

6. What Broke Along the Way

Vivino API: walled off

I tried to pull my ratings from Vivino programmatically — REST API, headless browser, mobile user-agent. Everything hit Cloudflare's WAF with a 403. The only way to get my data was a GDPR export (which took days) transferred from my phone via scp.

BBR login: wrong everything

Berry Bros & Rudd is a Vue.js SPA. The initial login selectors were wrong (URL, form IDs, button text). Had to inspect with Playwright, fix three things at once, and add networkidle waits for the Vue hydration.

Wine Society: stealth format selectors

Some Wine Society product pages default to "Case of 3" rather than single bottles. The purchase script added one to cart at £162 instead of £54. The agent caught the discrepancy in a basket screenshot, diagnosed the <select> element defaulting to the case format, and patched the script to detect and switch it. A reminder that "add to cart" is never as simple as clicking a button — the page state matters.
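The detect-and-switch logic might be sketched like this. The "Case of 3" label is from this post; the single-bottle label wording, the default "select" selector and both function names are assumptions. The Playwright calls used (locator, all_text_contents, select_option) are real, but this is not the patch the agent actually wrote.

```python
import re


def pick_single_bottle(option_labels):
    """Return the label that looks like a single bottle, or None if there isn't one."""
    for label in option_labels:
        lowered = label.lower()
        if "case" in lowered:
            continue  # skip "Case of 3" and similar multi-bottle formats
        if re.search(r"\b(single|bottle)\b", lowered):
            return label
    return None


def ensure_single_bottle(page, select_css="select"):
    """Switch a Playwright page's format <select> to the single-bottle option."""
    select = page.locator(select_css).first
    # Collapse whitespace so labels match what select_option compares against.
    labels = [" ".join(t.split()) for t in select.locator("option").all_text_contents()]
    label = pick_single_bottle(labels)
    if label is None:
        raise RuntimeError(f"No single-bottle option among {labels!r}")
    select.select_option(label=label)
    return label
```

Failing loudly when no single-bottle option is found is deliberate: silently adding a case of three is exactly the failure described above.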

Nested Claude sessions

Running claude from inside a Claude Code session fails with a cryptic error. The fix: unset CLAUDECODE in the launcher script before invoking the inner session.
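The real fix is a one-line unset in the bash launcher. For anyone spawning the inner session from Python instead, an equivalent sketch is below; the helper name is hypothetical, and only the CLAUDECODE variable name comes from this post.

```python
import os


def env_without_claudecode(base_env=None):
    """Copy the environment, dropping CLAUDECODE so a nested session will start."""
    env = dict(os.environ if base_env is None else base_env)
    env.pop("CLAUDECODE", None)  # no error if the variable was never set
    return env
```

The result can be passed as the env argument to subprocess.run when invoking claude, leaving the parent process's own environment untouched.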

Duplicate sessions

Two sommelier sessions spawned simultaneously from two launch attempts, each picking different wines. The lockfile mechanism prevents this on subsequent runs, but the first double-order had to be manually untangled.
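A lockfile guard of this kind can be sketched in a few lines. This version is in Python rather than the launcher's bash, and the names are hypothetical; it relies on O_EXCL so that creating the lock is atomic and only one of two simultaneous launches can win.

```python
import os
from contextlib import contextmanager


class AlreadyRunning(Exception):
    """Another session holds the lock."""


@contextmanager
def session_lock(lock_path):
    """Hold an exclusive lockfile for the duration of the with-block."""
    try:
        # O_EXCL makes creation atomic: exactly one process can succeed.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise AlreadyRunning(f"Lock {lock_path} exists; is a session running?") from None
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))  # record the owner for debugging
        yield
    finally:
        os.remove(lock_path)
```

One caveat: if the process is killed outright, the lockfile is left behind and has to be removed by hand before the next run.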

7. Is This a Good Idea?

Honestly, I don't know yet. Ask me after 6 months when there's real feedback data. The questions I'm interested in:

The real experiment

This isn't really about wine. It's about whether an LLM agent with internet access and a clear brief can make good taste decisions on your behalf — decisions that require cultural knowledge, personal preference, and the ability to surprise. Wine is just a testbed where the stakes are low and the feedback is fun.

8. Where This Goes

If you squint, the sommelier is a minimal viable example of something more general: a stateless autonomous agent that takes real-world actions on a schedule. No persistent runtime. No memory between sessions beyond a few markdown files. It wakes up, reads its brief, does its job, and shuts down.

I'm deliberately running this with minimal guardrails because I want to see what happens in a relatively low-stakes environment. A bad wine pick costs me £30 and a disappointing Tuesday evening. That's a cheap tuition fee for understanding how autonomous agents actually behave when you let them loose — what they get right, where they go confidently wrong, and what kinds of failure modes you'd never anticipate from reading the prompt.

The pattern itself is general. The same architecture — cron, a CLAUDE.md brief, internet access, a feedback loop — could work for:

The common thread: tasks where taste and judgement matter more than precision, where you'd happily delegate to a knowledgeable friend, and where the cost of a bad pick is low enough that learning from mistakes is fine. You wouldn't use this pattern for tax filing or medication management. You'd absolutely use it for anything where "I trust you, surprise me" is a reasonable brief.

What makes this different from existing AI assistants is the autonomy and the real-world action. It's not suggesting wines for me to review and approve. It's buying them. That closed loop — from judgement to action to consequence to feedback — is where the interesting questions live. Does an agent that faces consequences (even mild ones, like me saying "this wine was terrible") learn faster than one that just makes recommendations? Does the taste profile converge or drift? Does the agent develop something that looks like a consistent aesthetic sensibility, or does it just pattern-match against my ratings?

I don't know the answers yet. The point of running this as a real experiment rather than a demo is to find out.

9. Try It Yourself

The project is open source at github.com/h1whelan/sommelier-claude. To run your own:

  1. Fork the repo
  2. Edit taste-profile.md with your own preferences
  3. Add retailer credentials to ~/.sommelier-claude/credentials.json
  4. Set up run.sh in your crontab

The CLAUDE.md prompt is the sommelier's brain — tweak it for your budget, your retailers, your level of adventurousness. The purchase automation currently supports Majestic, BBR, and The Wine Society, but the pattern is easy to extend to other retailers.

Monthly picks will be published at /wine/ as they happen.


Tools used: Claude Code (Opus), Playwright, Telegram Bot API. Total code written by hand: ~15 lines of bash. Everything else was generated in conversation.

Published: 1 March 2026 · Author: Henry Whelan
