
Artificial Confidence

January 7, 2026

Infrastructure Collapse in Real Time


This week infrastructure collapses in real time. Documentation teams vanish while usage grows, legal frameworks crack under products they weren't designed for, and benchmarks finally admit they've been lying. Every story here is about something that worked quietly until AI made the quiet part expensive, illegal, or obsolete.

The Automation Tax

Simon wrote about Adam Wathan's Tailwind CSS docs problem: documentation traffic is down 40% while the framework keeps growing. Wathan's team got gutted. Seventeen people.

Documentation was always a loss leader; you maintained it because it built the moat. Now the LLMs are the moat, the docs, and the business model. Those seventeen technical writers lost their jobs while framework adoption grew. Their work still exists, feeding models instead of readers. They built the infrastructure for their own obsolescence, and nobody's going to name a conference track after them.

Stanford and Nvidia published a paper on continual learning that claims to solve the cost problem. The trick is always in what counts as "inference cost." If you're doing gradient updates at test time, you're computing more than forward passes: a backward pass costs roughly twice the FLOPs of a forward pass, so each adaptation step runs around 3x a plain inference call, and someone's paying for that. The enterprise pitch is smart though: long context windows are a tax on everything. If this works on real ticket queues and not just benchmarks, I'll be impressed. Paper dropped yesterday, so check back in six months when someone tries deploying it.
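
For the skeptics, here's the shape of the cost argument in code. A minimal sketch, assuming a Hugging Face-style causal LM and vanilla test-time training, which may not be what the paper actually does; `answer_with_ttt`, the learning rate, and the step count are all mine, not theirs.

```python
# Sketch of test-time adaptation cost, NOT the paper's actual method.
import torch

def answer_with_ttt(model, tokenizer, ticket_text, lr=1e-5, steps=1):
    """Adapt on the incoming example before answering it.

    Each adaptation step runs a forward AND a backward pass. Backward
    is ~2x the FLOPs of forward, so per-step cost is ~3x a plain
    inference call, before you count the optimizer state in memory.
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    inputs = tokenizer(ticket_text, return_tensors="pt")
    for _ in range(steps):
        out = model(**inputs, labels=inputs["input_ids"])  # forward pass
        out.loss.backward()   # backward pass: the hidden "inference" cost
        opt.step()
        opt.zero_grad()
    with torch.no_grad():     # the pass you actually bill as inference
        return model.generate(**inputs, max_new_tokens=256)
```

Run that on every support ticket and "inference cost" stops being a forward-pass line item.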

The Trust & Safety Bonfire

Wired's documenting how Grok is mainstreaming AI "undressing" tools. The output is searchable, public, indexed. Musk speedran every Trust & Safety lesson the industry learned between 2015 and 2020, except this one's different because there's no quarterly walkback coming. He meant to do this. The Trust & Safety teams who spent a decade documenting why this fails got fired in 2023, so now we're learning it again, one non-consensual image at a time.

The Verge is asking if the law can stop Grok from undressing children. Every other lab spent 2023 building guardrails and Musk spent it calling them cowards on his own platform. Now Grok's generating CSAM, and here's the legal angle nobody's talking about: Section 230 shields platforms from liability for user content, but Grok's output isn't user content. It's X's own product, which makes X the publisher. The legal innovation here is whether "we're just providing tools" still works when the tool is your product and the platform is your megaphone. X's legal team knows exactly what liability looks like here and someone greenlit this anyway. That's a bet that they're too big to prosecute.

Someone used AI to generate a viral Reddit post about food delivery fraud. The fraud was real; the post wasn't. They generated outrage content because outrage performs, and Reddit ate it up because the underlying complaint was plausible. The fake post got 10x the reach of the correction. We spent twenty years worrying AI would be too alien to fool us; turns out it just needs to sound mad about the right things. Which brings us to the money.

The Valuation Thunderdome

Anthropic's raising $10B at a $350B valuation. That's the fair price if Claude stays 6-12 months ahead on safety theater that works. Third mega-round in a year and the money has nowhere else to go. Being the "responsible" AI company now commands a $350B valuation—just not how the safety researchers hoped.

xAI says it raised $20B and won't disclose if it's equity or debt. When you refuse to specify those terms, you're describing a balance sheet problem with a press release attached. Nvidia "investing" could mean chips on credit, could mean warrants, could mean literally anything when nobody defines terms. The Memphis data center makes more sense now—you don't raise $20B for model training, you raise it because you're already spending it and need to make payroll look less terrifying.

Those numbers sound wild until you see what's actually shipping.

The Reality Check Department

Dell just admitted what everyone outside Computex already knew: nobody's upgrading their laptop for a local LLM they'll use twice before going back to ChatGPT in a browser tab. The people who needed "AI PCs" were Intel's marketing department and OEMs desperate for an upgrade cycle. Consumers needed better battery life, ports that don't require a dongle rosary, keyboards that don't feel like typing on napkins.

Someone named an AI coding tool after Ralph Wiggum and VentureBeat called it AGI. You know what Ralph Wiggum is? A wrapper that keeps Claude Code running when it hits errors. It doesn't give up. That's it. That's the revolutionary insight that makes tech journalists invoke the holy grail. The bar for "AGI" keeps getting lower, and the journalists who used to understand the difference between persistence and intelligence now just need content that performs.
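
In case you think I'm exaggerating, here's roughly the whole mechanism. A sketch, to be clear: I haven't read the tool's source, so the function, the retry limits, and the commented-out invocation below are illustrative, not its actual internals.

```python
# Sketch of "keep the coding agent running until it stops erroring."
import subprocess
import time

def keep_trying(cmd, max_attempts=50, backoff=2.0):
    """Re-run a flaky command until it exits clean. This is the 'AGI'."""
    for attempt in range(1, max_attempts + 1):
        if subprocess.run(cmd).returncode == 0:
            return attempt        # it worked; persistence achieved
        time.sleep(backoff)       # wait, then try again
    raise RuntimeError(f"gave up after {max_attempts} attempts")

# keep_trying(["claude", "code"])  # hypothetical invocation
```

A for loop and a sleep. Hold your applause.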

Artificial Analysis is overhauling its AI Intelligence Index, replacing popular benchmarks with "real-world" tests. They're doing what everyone needed: admitting the benchmarks are cooked. MMLU stopped meaning anything in 2023—the models memorized it, the trainers gamed it, and we all kept pretending it mattered. "Real-world tests" sounds promising until you realize they'll be saturated in six months. You know what works? Deploy the thing, watch it fail, that's your benchmark.

Which matters if you're actually building something.

The Serious Stuff

Simon pointed you to Luis Cardoso's field guide to AI sandboxes, and yeah, it's worth your time: comprehensive without being academic about it. If you're executing LLM-generated code in production, you've already made three security decisions you didn't realize you were making. Containers aren't enough, gVisor's interesting until you hit the performance wall, and WebAssembly sounds great in theory.
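
To make "containers aren't enough" concrete, here's a sketch of a locked-down container run. The Docker flags are real; the image choice and the wrapper function are my illustrations, not anything from Luis's guide.

```python
# Sketch: execute LLM-generated Python inside a constrained container.
import subprocess

def run_untrusted(code: str, timeout: int = 10):
    cmd = [
        "docker", "run", "--rm",
        "--network=none",      # no exfiltration over the network
        "--memory=256m",       # cap memory
        "--pids-limit=64",     # no fork bombs
        "--cap-drop=ALL",      # drop every Linux capability
        "--read-only",         # immutable filesystem
        "python:3.12-slim",    # illustrative image choice
        "python", "-c", code,
    ]
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
```

Every flag there is doing honest work, and the payload still talks to the host kernel through the same syscall interface. That's the gap gVisor's userspace kernel and Wasm runtimes exist to close, and it's exactly what container defaults don't buy you.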

I've seen teams spend six months on sandbox choices that sounded elegant in the architecture review but fell apart when the threat model changed. Read Luis, then think about what you're defending against. The first junior dev to deploy this wrong will get blamed for the breach while the senior who said "ship it, we'll sandbox later" will be on a different team by then.

Jake Sullivan's furious that Trump destroyed his AI foreign policy. He spent two years building export controls that worked—keeping AI chips out of China without tanking Nvidia's stock price. Delicate, technical stuff that functioned because it wasn't theatrical. Now Trump's team is treating them like a loyalty test you can trade away. Two years of technical policy work just became a bargaining chip and the career staff who built it will still be there when it fails.


Stay skeptical.

— Morgan Cross
