60% of EY's citations were fake. Nobody checked.

by Luis Rodrigues

What I’m building with AI + one hot take + four links worth your time.

The Take

A friend called me from Australia this week. He's building a proptech company and using AI to help him speed up the process.

He defines documentation, architecture, and basically all specs with Claude. Then he moves to Claude Code, which "forgets" to read some documents or invents processes that were never specified.

My friend only realizes Claude Code ignored the docs when he runs the build and parts of it don't match what he wanted. "It feels like I have an intern who just says yes without understanding fully the problem space."

He's not the only one who has had problems with AI this week.

EY Canada retracted a 44-page cybersecurity report after GPTZero found that 60% of the references were fabricated. It had multiple fake footnotes and referenced a McKinsey report that does not exist. The report even contained contradictory data.

Three people were the "authors", and nobody bothered to check whether the data was real.

We all say AI can't be trusted, then some of us publish without checking.

The worst part is that this EY report was syndicated by 60+ Australian newspapers. GPTZero also confirmed that ChatGPT, Claude, and Perplexity were referencing the wrong data as credible.

Wrong data from a Big Four firm on the internet is basically "poisoning the well". Everyone who goes there to drink will be basing decisions on wrong data without knowing it.

The solution is not "less AI".

Everyone needs to stop trusting the models and check their own work.

You can use a model to check another model's work. For example, write with Opus and cross-check with Gemini. Use Perplexity to check all the data points before publishing.

If you don't want to pay $20 more per month for an extra model (which you probably should if you're using a lot of AI for work), start a clean context and ask the same model to cross-check.

If you're building with AI and your workflow doesn't include a second opinion from a different model, you don't have a workflow. You have a huge trust problem that will bite you sooner or later.
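Here is a minimal sketch of that cross-check step, assuming each model is wrapped in a plain Python function that takes a prompt and returns text. The names cross_check and fake_reviewer, the prompt wording, and the VERIFIED/UNVERIFIED convention are all hypothetical, not any vendor's API.

```python
from typing import Callable, List

# Hypothetical type: any function that sends a prompt to a model and returns its reply.
Model = Callable[[str], str]

def cross_check(claims: List[str], reviewer: Model) -> List[str]:
    """Return the claims the reviewer model did not explicitly confirm."""
    flagged = []
    for claim in claims:
        prompt = (
            "Reply with exactly VERIFIED or UNVERIFIED. "
            "Can you confirm this claim against a source you can name?\n" + claim
        )
        verdict = reviewer(prompt).strip().upper()
        # Anything other than an explicit VERIFIED goes to a human for checking.
        if verdict != "VERIFIED":
            flagged.append(claim)
    return flagged

# Stand-in reviewer for demonstration only; swap in a real call to a second model.
def fake_reviewer(prompt: str) -> str:
    return "UNVERIFIED" if "McKinsey" in prompt else "VERIFIED"

claims = [
    "The report is 44 pages long.",
    "A McKinsey report supports this figure.",
]
print(cross_check(claims, fake_reviewer))
```

The point of the design is that the default is distrust: a claim only passes if the second model explicitly confirms it, and everything else lands on a human's desk.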

Together with

Every meeting I have about Thoughtled generates action items, and I always forget some. Now Granola runs in the background, with no bot joining the call, and transcribes everything. New users get the first month free → https://www.granola.ai

The Build Review

TRMNL’s logistics software broke, and the vendor ghosted them. They ship e-ink displays from their warehouses in Georgia and Berlin, and suddenly had no system.

One person replaced a $20k/year platform with Claude in around 100 hours.

They took screenshots of ShipHero's UI, sent them to Claude Design, and got back high-fidelity mockups that worked the same way as the old interface.

That was a smart decision, because it preserves the muscle memory the team had developed with the previous software. The fastest way to fail when replacing a tool many people rely on for daily work is to make massive UI/UX changes.

They built something proper: order locking to prevent multiple people from changing the same order; a Swift printing tool that routes labels to different printers in both warehouses; automation rules, written in Python, Ruby, or Node, that select a shipping company based on SKU and destination; and live integrations with UPS, FedEx, and DHL, some of them with more than 1k lines of JSON.
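To make the automation rules concrete, here is a rough sketch of what a carrier-selection rule based on SKU and destination could look like in Python. TRMNL's actual rules are not shown in this issue, so the SKU prefix, the country logic, and the carrier assignments below are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Order:
    sku: str
    destination_country: str  # two-letter country code, e.g. "US" or "PT"

def select_carrier(order: Order) -> str:
    """Pick a carrier from SKU and destination. All rules here are illustrative."""
    # Hypothetical rule: US accessory orders go to one carrier.
    if order.sku.startswith("ACC-") and order.destination_country == "US":
        return "UPS"
    # Hypothetical rule: every other US order goes to another.
    if order.destination_country == "US":
        return "FedEx"
    # Hypothetical fallback for international orders.
    return "DHL"

print(select_carrier(Order(sku="ACC-CABLE", destination_country="US")))
print(select_carrier(Order(sku="DISPLAY-7", destination_country="PT")))
```

Rules like this are easy to write and easy to test, which is exactly why they deserve tests: each branch is a business decision someone will forget about in six months.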

With ShipHero, they were spending $0.76 per package. Now, after the change and accounting for platform maintenance, they estimate the cost is 2-3x lower.
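A quick back-of-the-envelope check of what that range means per package, assuming "2-3x" means the old cost divided by 2 or by 3 (my reading, not a figure from TRMNL):

```python
def cost_after(old_cost: float, factor: float) -> float:
    """Per-package cost if the new setup is `factor` times cheaper."""
    return old_cost / factor

old_cost = 0.76  # USD per package with ShipHero, as quoted above

for factor in (2, 3):
    new_cost = cost_after(old_cost, factor)
    saved = old_cost - new_cost
    print(f"{factor}x cheaper: ${new_cost:.2f} per package, saving ${saved:.2f}")
```

That puts the new cost somewhere between roughly $0.25 and $0.38 per package.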

They used the Superpowers skill (a Claude Code planning framework) to help them brainstorm, do deep architectural planning, and build tests before starting development.

What they don't mention is error handling for edge cases. What happens when FedEx changes the API spec?

This is where vibe-coded replacement tools can become dangerous. Imagine that, a few months in, shipments to Portugal start failing because of a minor rule change.
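One cheap defense is to validate every carrier response before trusting it, so a spec change fails loudly instead of silently. A minimal sketch, assuming a label response arrives as a dict; the field names and types below are hypothetical and not taken from any carrier's real API.

```python
# Hypothetical fields we expect in a carrier's label response.
REQUIRED_FIELDS = {"tracking_number": str, "label_url": str, "rate": float}

def validate_label_response(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload looks as expected."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            actual = type(payload[name]).__name__
            problems.append(f"wrong type for {name}: {actual}")
    return problems

# Example: the carrier starts sending the rate as a string and drops the label URL.
changed = {"tracking_number": "1Z999", "rate": "12.50"}
print(validate_label_response(changed))
```

If the list is not empty, stop the shipment and alert a person. A blocked label is annoying; a wrong one is expensive.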

TRMNL did it right. They ran many tests and ensured the architecture was correct before building anything.

The difference between success and a massive failure with AI comes down largely to whether you trust output that looks correct or actually check it.

If you're thinking of replacing a tool with an in-house build, check their article:
https://trmnl.com/blog/vibe-coding-shiphero

If you’re not building, pick a process that takes you a lot of time and try to build a tool for it in Claude Code, or in Lovable if you prefer something simpler. If it works, it will make your days go by faster.

On My Radar

Grok Build: another terminal coding agent enters the ring

Subagents, worktree isolation, and headless mode. Six months ago, Claude Code was the only serious terminal agent.

Mistral's 20x growth

Their models aren't better, but European enterprises need to know where their data lives. If you're building B2B SaaS in Europe, you'll need an answer to that question sooner than you think.

Codex is beating Claude for non-developers

The reason is packaging: one-click Skills in a single interface vs. three separate Claude products that don't talk to each other.

Claude for Small Business

QuickBooks, PayPal, HubSpot, Canva, DocuSign, and more are connected through Cowork. If you're building B2B SaaS for SMBs, this is the integration bar your product will be measured against.

Know someone without a strong workflow for AI? Forward them this.

New here? Someone forwarded this to you? → Subscribe to Build What Matters: https://luisrodrigues.ai/

P.S. Subscribers get $5K+ in software credits through our Secret partnership (AWS, Loom, Notion, Lovable, and more). Haven't claimed yours yet? → https://build-what-matters.joinsecret.com/
