What's the chance of getting Opus 4.5-level models running locally in the future...

dragonwriter · 2026-01-14T21:29:05 1768426145

So, there are two aspects of that:

(1) Opus 4.5-level models that have weights and inference code available, and

(2) Opus 4.5-level models whose resource demands are such that they will run adequately on the machines that the intended sense of “local” refers to.

(1) is probable in the relatively near future: open models trail frontier models, but not so much that that is likely to be far off.

(2) Depends on whether “local” is “in our on prem server room” or “on each worker’s laptop”. Both will probably eventually happen, but the laptop one may be pretty far off.

SOLAR_FIELDS · 2026-01-14T20:53:30 1768424010

Probably not too far off, but then you’ll probably still want the frontier model because it will be even better.

Unless we are hitting the maxima of what these things are capable of now of course. But there’s not really much indication that this is happening

woggy · 2026-01-14T20:56:55 1768424215

I was thinking about this the other day. If we did a plot of 'model ability' vs 'computational resources' what kind of relationship would we see? Is the improvement due to algorithmic improvements or just more and more hardware?

chasd00 · 2026-01-14T21:51:36 1768427496

i don't think adding more hardware does anything except increase performance scaling. I think most improvement gains are made through specialized training (RL) after the base training is done. I suppose more GPU RAM means a larger model is feasible, so in that case more hardware could mean a better model. I get the feeling all the datacenters being proposed are there to either serve the API or create and train various specialized models from a base general one.

ryoshu · 2026-01-14T21:07:49 1768424869

I think the harnesses are responsible for a lot of recent gains.

NitpickLawyer · 2026-01-14T21:11:27 1768425087

Not really. A 100 loc "harness" that is basically a llm in a loop with just a "bash" tool is way better today than the best agentic harness of last year.

Check out mini-swe-agent.

SOLAR_FIELDS · 2026-01-15T01:04:13 1768439053

Everyone is currently discovering independently that “Ralph Wigguming” is a thing

gherkinnn · 2026-01-14T21:13:20 1768425200

Opus 4.5 is at a point where it is genuinely helpful. I've got what I want and the bubble may burst for all I care. 640K of RAM ought to be enough for anybody.

dust42 · 2026-01-14T21:16:35 1768425395

I don't get all this frontier stuff. Up to today the best model for coding was DeepSeek-V3-0324. The newer models are getting worse and worse trying to cater for an ever larger audience. Already the absolute suckage of emoticons sprinkled all over the code in order to please lm-arena users. Honestly, who spends his time on lm-arena? And yet it spoils it for everybody. It is a disease.

Same goes for all these overly verbose answers. They are clogging my context window now with irrelevant crap. And being used to a model is often more important for productivity than SOTA frontier mega giga tera.

I have yet to see any frontier model that is proficient in anything but js and react. And often I get better results with a local 30B model running on llama.cpp. And the reason for that is that I can edit the answers of the model too. I can simply kick out all the extra crap of the context and keep it focused. Impossible with SOTA and frontier.

greenavocado · 2026-01-14T20:58:56 1768424336

GLM 4.7 is already ahead when it comes to troubleshooting a complex but common open source library built on GLib/GObject. Opus tried but ended up thrashing whereas GLM 4.7 is a straight shooter. I wonder if training time model censorship is kneecapping Western models.

sanex · 2026-01-14T21:07:12 1768424832

Glm won't tell me what happened in Tianenman square in 1989. Is that a different type of censorship?

lifetimerubyist · 2026-01-14T23:17:59 1768432679

Never because the AI companies are gonna buy up all the supply to make sure you can’t afford the hardware to do it.

teej · 2026-01-14T20:56:17 1768424177

Depends how many 3090s you have

woggy · 2026-01-14T20:57:37 1768424257

How many do you need to run inference for 1 user on a model like Opus 4.5?

ronsor · 2026-01-14T21:00:36 1768424436

8x 3090.

Actually better make it 8x 5090. Or 8x RTX PRO 6000.

worldsavior · 2026-01-14T21:04:03 1768424643

How is there enough space in this world for all these GPUs

filoleg · 2026-01-14T21:24:08 1768425848

Just try calculating how many RTX 5090 GPUs by volume would fit in a rectangular bounding box of a small sedan car, and you will understand how.

Honda Civic (2026) sedan has 184.8” (L) × 70.9” (W) × 55.7” (H) dimensions for an exterior bounding box. Volume of that would be ~12,000 liters.

An RTX 5090 GPU is 304mm × 137mm, with roughly 40mm of thickness for a typical 2-slot reference/FE model. This would make the bounding box of ~1.67 liters.

Do the math, and you will discover that a single Honda Civic would be an equivalent of ~7,180 RTX 5090 GPUs by volume. And that’s a small sedan, which is significantly smaller than an average or a median car on the US roads.

worldsavior · 2026-01-14T22:08:00 1768428480

What about what's around the GPU? Motherboard etc.

filoleg · 2026-01-16T19:49:29 1768592969

I didn’t do the napkin math on it earlier, because I don’t believe it really matters for making the point I was making.

I don’t care about looking up real numbers, so I will just overestimate heavily. Let’s say that for a large enough number of GPUs, the overhead of all the surrounding equipment would be around 20% (amortized).

So you can just take the number of GPUs I calculated in my previous comment, multiply by 0.8, and you get your answer.

worldsavior · 2026-01-17T20:55:45 1768683345

This is not 20% , it's 100%+.

antonvs · 2026-01-15T06:13:33 1768457613

Now factor in power and cooling...

reactordev · 2026-01-15T06:32:26 1768458746

Don’t forget to lease out idle time to your neighbors for credits per 1M tokens…

Forgeties79 · 2026-01-14T21:12:51 1768425171

Milk crates and fans, baby. Party like it’s 2012.

adastra22 · 2026-01-15T01:47:51 1768441671

48x 3090’s actually.

_flux · 2026-01-15T10:00:21 1768471221

None, if you have time to wait, and a bit of memory on the computer.

kgwgk · 2026-01-14T21:29:27 1768426167

99.99% but then you will want Opus 42 or whatever.

rvz · 2026-01-14T22:26:45 1768429605

Less than a decade.

heliumtera · 2026-01-14T21:29:08 1768426148

RAM and compute is sold out for the future, sorry. Maybe another timeline can work for you?