
I’ve been following Peter and his projects for 7-8 months now, and you fundamentally mischaracterize him.

Peter was a successful developer prior to this and an incredibly nice guy to boot, so I feel the need to defend him from anonymous hate like this.

What is particularly impressive about Peter is his throughput of publishing *usable utility software*. Over the last year he’s released a couple dozen projects, many of which have seen moderate adoption.

I don’t use the bot, but I do use several of his tools and have also contributed to them.

There is a place in this world for both serious, well-crafted software as well as lower-stakes slop. You don’t have to love the slop, but you would do well to understand that there are people optimizing these pipelines and they will continue to get better.


Yes!

pi is the best-architected harness available. You can do anything with it.

The creator, Mario, is a voice of reason in the codegen field too.

https://shittycodingagent.ai/

https://mariozechner.at/posts/2025-11-30-pi-coding-agent/


> I am writing this because almost no one talks about these issues openly, but everyone yelping about Claude Code.

Not sure where you frequent online, but there is ample discussion of these topics within certain niches on X. Happy to point out where to start if that's of interest to you.

As for CEOs, and I assume you're speaking of frontier model lab CEOs, they're pretty much all cashflow-negative at this point, requiring frequent funding raises. That requires a certain amount of overselling. That said, I feel like I've heard substantially fewer AGI claims the last six months...


Yep, this was a skillset combination I never envisioned. (am married to a Turk). Maybe a good reason to push past the intermediate plateau.

The dialog around AI resource use is frustratingly inane, because the benefits are never discussed in the same context.

LLMs/diffusers are inefficient from a traditional computing perspective, but they are also the most efficient technology humanity has created:

> AI systems (ChatGPT, BLOOM, DALL-E2, Midjourney) and human individuals performing equivalent writing and illustrating tasks. Our findings reveal that AI systems emit between 130 and 1500 times less CO2e per page of text generated compared to human writers, while AI illustration systems emit between 310 and 2900 times less CO2e per image than their human counterparts.

Source: https://www.nature.com/articles/s41598-024-54271-x


> In practice, it'll be incredible slow and you'll quickly regret spending that much money on it instead of just using paid APIs until proper hardware gets cheaper / models get smaller.

Yes, as someone who spent several thousand $ on a multi-GPU setup, the only reason to run local codegen inference right now is privacy or deep integration with the model itself.

It’s decidedly more cost efficient to use frontier model APIs. Frontier models trained to work with their tightly-coupled harnesses are worlds ahead of quantized models with generic harnesses.


Yeah, I think without a setup that costs 10k+ you can't even get remotely close in performance to something like claude code with opus 4.5.


$10k wouldn't even get you 1/4 of the way there. You couldn't even run this model or DeepSeek 3.2 etc. for that.

Esp with RAM prices now spiking.


$10k gets you a Mac Studio with 512GB of RAM, which definitely can run GLM-4.7 with normal, production-grade levels of quantization (in contrast to the extreme quantization that some people talk about).

The point in this thread is that it would likely be too slow due to prompt processing. (M5 Ultra might fix this with the GPU's new neural accelerators.)


> $10k gets you a Mac Studio with 512GB of RAM, which definitely can run GLM-4.7 with normal, production-grade levels of quantization (in contrast to the extreme quantization that some people talk about).

Please do give that a try and report back the prefill and decode speed. Unfortunately, I think again that what I wrote earlier will apply:

> In practice, it'll be incredible slow and you'll quickly regret spending that much money on it

I'd rather place that 10K on a RTX Pro 6000 if I was choosing between them.


> Please do give that a try and report back the prefill and decode speed.

M4 Max here w/ 128GB RAM. Can confirm this is the bottleneck.

https://pastebin.com/2wJvWDEH

I weighed getting a DGX Spark but thought the M4 would be competitive with equal RAM. Not so much.


I think the DGX Spark will likely underperform the M4 from what I've read.

However it will be better for training / fine tuning, etc. type workflows.


> I think the DGX Spark will likely underperform the M4 from what I've read.

For the DGX benchmarks I found, the Spark was mostly beating the M4. It wasn't cut and dry.


The Spark has more compute, so it should be faster for prefill (prompt processing).

The M4 Max has double the memory bandwidth, so it should be faster for decode (token generation).
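To make the compute-vs-bandwidth distinction concrete, here is a back-of-envelope throughput model. All hardware and model figures are illustrative assumptions (a hypothetical ~32B-active-parameter MoE at 8-bit, and two made-up machines), not measured specs:

```python
# Back-of-envelope model: prefill is compute-bound, decode is bandwidth-bound.
# Every number below is an illustrative assumption, not a real benchmark.

def prefill_tps(active_params_b: float, compute_tflops: float) -> float:
    """Prompt processing: roughly 2 FLOPs per active parameter per token."""
    flops_per_token = 2 * active_params_b * 1e9
    return compute_tflops * 1e12 / flops_per_token

def decode_tps(active_bytes_gb: float, mem_bandwidth_gbs: float) -> float:
    """Token generation: every active weight byte is read once per token."""
    return mem_bandwidth_gbs / active_bytes_gb

# Assumed profiles: one box with high bandwidth / modest compute,
# one with high compute / modest bandwidth.
machines = {
    "mac-like":   {"tflops": 34.0,  "bandwidth": 819.0},
    "spark-like": {"tflops": 100.0, "bandwidth": 273.0},
}

for name, hw in machines.items():
    print(f"{name}: prefill ~{prefill_tps(32, hw['tflops']):.0f} tok/s, "
          f"decode ~{decode_tps(32, hw['bandwidth']):.1f} tok/s")
```

Under these assumptions the high-compute box wins prefill and the high-bandwidth box wins decode, which is exactly the tradeoff described above.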


> I'd rather place that 10K on a RTX Pro 6000 if I was choosing between them.

One RTX Pro 6000 is not going to be able to run GLM-4.7, so it's not really a choice if that is the goal.


No, but the models you will be able to run will run fast, and many of them are Good Enough(tm) for quite a lot of tasks already. I mostly use GPT-OSS-120B and glm-4.5-air currently; both easily fit and run incredibly fast, and the runners haven't even been fully optimized for Blackwell yet, so time will tell how fast it can go.


You definitely could, the RTX Pro 6000 has 96 (!!!) gigs of memory. You could load 2 experts at once at an MXFP4 quant, or one expert at FP8.


No… that’s not how this works. 96GB sounds impressive on paper, but this model is far, far larger than that.

If you are running a REAP model (eliminating experts), then you are not running GLM-4.7 at that point — you’re running some other model which has poorly defined characteristics. If you are running GLM-4.7, you have to have all of the experts accessible. You don’t get to pick and choose.

If you have enough system RAM, you can offload some layers (not experts) to the GPU and keep the rest in system RAM, but the performance is asymptotically close to CPU-only. If you offload more than a handful of layers, then the GPU is mostly sitting around waiting for work. At which point, are you really running it “on” the RTX Pro 6000?

If you want to use RTX Pro 6000s to run GLM-4.7, then you really need 3 or 4 of them, which is a lot more than $10k.

And I don’t consider running a 1-bit superquant to be a valid thing here either. Quantization is often better than a smaller model, but only up to a point, and 1-bit is beyond it. Much better off running a smaller model at that point.
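The sizing argument is easy to check with simple arithmetic. Taking a ~355B-total-parameter MoE as an illustrative assumption (not an official GLM-4.7 spec), weight memory alone at various quantization widths looks like this:

```python
# Rough weight-memory footprint at different quantization widths.
# The parameter count is an illustrative assumption, not an official spec,
# and this ignores KV cache and activation memory, which only add to it.

def weights_gb(total_params_b: float, bits_per_weight: float) -> float:
    """GB needed just to hold the weights of a total_params_b-billion model."""
    return total_params_b * 1e9 * bits_per_weight / 8 / 1e9

TOTAL_B = 355  # assumed total parameters, in billions

for bits in (16, 8, 4):
    need = weights_gb(TOTAL_B, bits)
    print(f"{bits}-bit: ~{need:.0f} GB -> fits one 96 GB GPU: {need <= 96}")
```

Even at 4-bit, a model of that assumed size needs well over 96 GB for weights alone, while still fitting comfortably in a 512 GB unified-memory machine.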


You don't need a REAP-processed model to offload on a per-expert basis. All MoE models are inherently sparse, so you're only operating on a subset of activated experts as the prompt is processed. It's more of a PCIe bottleneck than a CPU one.

> And I don’t consider running a 1-bit superquant to be a valid thing here either.

I don't either. MXFP4 is scalar.


Yes, you can offload random experts to the GPU, but it will still be activating experts that are on the CPU, completely tanking performance. It won't suddenly make things fast. One of these GPUs is not enough for this model.

You're better off prioritizing the offload of the KV cache and attention layers to the GPU than trying to offload a specific expert or two, but the performance loss I was talking about earlier still means you're not offloading enough for a 96GB GPU to make things how they need to be. You need multiple, or you need a Mac Studio.

If someone buys one of these $8000 GPUs to run GLM-4.7, they're going to be immensely disappointed. This is my point.
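That "asymptotically close to CPU-only" effect can be sketched with a toy bandwidth model: if any activated weights live in slow system RAM, that read dominates each decode step. All bandwidth and size numbers below are illustrative assumptions:

```python
# Toy model of hybrid CPU/GPU decode for a partially offloaded MoE.
# Bandwidth and model-size numbers are illustrative assumptions.

GPU_BW = 1500.0  # GB/s, assumed GPU memory bandwidth
CPU_BW = 80.0    # GB/s, assumed system RAM bandwidth

def decode_tps(active_gb: float, frac_on_gpu: float) -> float:
    """Per-token time = GPU-resident reads + CPU-resident reads, sequentially."""
    t = (active_gb * frac_on_gpu) / GPU_BW + (active_gb * (1 - frac_on_gpu)) / CPU_BW
    return 1.0 / t

for frac in (0.0, 0.5, 0.9, 1.0):
    print(f"{frac:.0%} of active weights on GPU -> ~{decode_tps(32, frac):.1f} tok/s")
```

Under these assumptions, even with 90% of the active weights on the GPU you get only a modest multiple of CPU-only speed; the slow-memory term dominates until nearly everything fits in VRAM, which is why one 96 GB card isn't enough.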


> If someone buys one of these $8000 GPUs to run GLM-4.7, they're going to be immensely disappointed. This is my point.

Absolutely, same if they get a $10K Mac/Apple computer, immense disappointment ahead.

Best is of course to start looking at models that fit within 96GB, but that'd make too much sense.


$10k is > 4 years of a $200/mo sub to models which are currently far better, continue to get upgraded frequently, and have improved tremendously in the last year alone.
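The subscription math is simple break-even arithmetic:

```python
# Break-even: $10k of local hardware vs. a $200/mo frontier-model subscription.
HARDWARE_COST = 10_000
MONTHLY_SUB = 200

months_to_break_even = HARDWARE_COST / MONTHLY_SUB
print(f"{months_to_break_even:.0f} months ≈ {months_to_break_even / 12:.1f} years")
# → 50 months ≈ 4.2 years
```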

This almost feels like a retro computing kind of hobby than anything aimed at genuine productivity.


I don't think the calculation is that simple. With your own hardware, there are literally no limits on runtime, what models you use, what tooling you use, or availability; all of those things are up to you.

Maybe I'm old school, but I prefer those benefits over some cost/benefit analysis across 4 years which by the time we're 20% through it, everything has changed.

But I also use this hardware for training my own models, not just inference and not just LLMs, I'd agree with you if we were talking about just LLM inference.


They are better in some ways, but they're also neutered.


> $10k gets you a Mac Studio with 512GB of RAM

Because Apple has not adjusted their pricing yet for the new RAM pricing reality. The moment they do, it's not going to be a $10k system anymore but $15k+...

The amount of wafer capacity going to AI is insane and will influence more than just memory prices. Do not forget, the only reason Apple is currently immune to this is that they tend to make long-term contracts, but the moment those expire ... they will push the costs down to consumers.


Generous of you to predict Apple will only make it 50% more expensive.


Sorry for your loss - it's a terrible disease.

My mother is also a data point - she grew up on a farm where her father used it. She was diagnosed with Parkinson's in 2018.


You need to think like an owner/operator to understand why you would defer to the system.

It’s to prevent employees from stealing. To “defer to the tag” requires a manual price override of some sort, which becomes an abuse vector.


Form a Nonprofit X and a Corp Y:

Nonprofit X publishes outputs from competing AI, which are not copyrightable.

Corp Y ingests content published by Nonprofit X.


You're assessing them with the wrong criteria.

You don't hire architects to execute a demolition, and you also don't hire anyone heavily invested in keeping the building standing. But you DO hire people loyal to you to perform the work, who will receive staunch opposition from the latter group of people.


This is the bit I don't get. Why are there so many people ready to line up to defend the powerful against the weak, the rich against the poor?

What a brave and noble purpose! I'm real proud of people like that...

The richest man on earth in charge of cost cutting. Go and bathe in a jacuzzi of cash Musk, and stop acting like a sore winner...


If you hired Trump's idiots to handle a demolition, they would:

1. Bring the building down on their own heads, with bystanders inside.

2. Destabilize the neighboring buildings to the point where they also had to be demolished.

3. Flood the site and cause a giant gas leak.


4. Use the self-made catastrophe as evidence that more buildings are dangerous and need to be demolished


It's a mistake to dismiss them all as idiots. MAGA people have very different values from you, and you need to consider the possibility that they are competent and malicious.


Seconding this. Their reasoning is unsound to the point of comedy, their arguments are nonsense, they have a fundamental misunderstanding of science, technology, society, etc.

Because they don't fucking care. They have an agenda. Sometimes it's rooted in bigotry, religion, or just antisocial beliefs, but they have an agenda and they can (and will, and are) execute it.


4. Pat yourself on the back while people complaining about egg prices cheer as their kids die of measles.


Cool. Now address the fact that all of this is an unconstitutional coup.


How is the executive branch directing a department under their authority a coup? Please be specific.


Because they're simply upending/ignoring Congressional mandates in the form of law. For example, Congress directed the creation of USAID as an independent agency in 1961. It cannot simply be abolished by executive fiat.


Actually in 1961 it was purely an executive thing. I think it was either 1978 or 1998 when Congress put it into statute.


https://en.wikipedia.org/wiki/United_States_Agency_for_Inter...

Sadly one of the references in that paragraph is a dead link now, because it references the USAID website which has been taken offline.


>abolished

That doesn't seem to have happened, it looks like Rubio is now in charge of it. So again, how is this a coup?

