redox99 · 2026-01-27T23:11:35 1769555495

What do you even mean by "ChatGPT"? Copy pasting code into chatgpt.com?

AI-assisted coding has never been like that; that would be atrocious. The typical workflow was using Cursor with some model of your choice (almost always an Anthropic model like Sonnet before Opus 4.5 was released). Nowadays (in addition to IDEs) it's often a CLI tool like Claude Code with Opus or Codex CLI with GPT Codex 5.2 high/xhigh.

redox99 · 2026-01-27T23:01:14 1769554874

> And you most likely do not pay the actual costs.

This is one of the weakest anti-AI postures. "It's a bubble and when free VC money stops you'll be left with nothing". As if it were some kind of mystery how expensive these models are to run.

You have open weight models right now like Kimi K2.5 and GLM 4.7. These are very strong models, only months behind the top labs. And they are not very expensive to run at scale. You can do the math. In fact there are third parties serving these models for profit.

The money pit is training these models (and not even that much if you are efficient, like the Chinese models). Once they are trained, they are served with large profit margins compared to the inference cost.

OpenAI and Anthropic are without a doubt selling their API for a lot more than the cost of running the model.

redox99 · 2026-01-27T17:19:08 1769534348

My experience is the total opposite.

redox99 · 2026-01-27T17:07:03 1769533623

Cursor devs, who go out of their way to not mention their Composer model is based on GLM, are not going to like that.

msp26 · 2026-01-27T17:57:36 1769536656

Source? I've heard this rumour twice but never seen proof. I assume it would be based on tokeniser quirks?

redox99 · 2026-01-26T22:13:42 1769465622

Vibe coding in Unreal Engine is of limited use. It obviously helps with C++, but so much of your time is spent on things that are not C++. It hurts a lot that UE relies heavily on Blueprints; if they were code, you could just vibecode a lot of that.

redox99 · 2026-01-26T21:44:54 1769463894

People thinking this does not matter just because the code is awful, it used dependencies, or whatever, are missing the point.

6 months ago with previous models this was absolutely impossible. One of the biggest limitations of LLMs is their difficulty with long tasks. This has been steadily improving and this experiment was just another milestone. It will be interesting a year from now to test how much better new models fare at this task.

redox99 · 2026-01-24T13:46:54 1769262414

AI will beat humans at all tasks that are not subjective (such as a landing page being pretty), but instead can be determined to be correct or not (does an endpoint return the correct data? how fast?)

Just like a chess engine beats any human.

People think LLMs are still at the point of programming based on what they learned from the data they scraped. We're well past that. We're at the point of heavy reinforcement learning. LLMs will train on billions of LoC of synthetic code they generate, like chess engines train on self-play games.

abcde666777 · 2026-01-24T13:54:28 1769262868

Chess has a very specific win condition by which moves can be assessed. Many real world problems are much fuzzier than that and don't reduce neatly to algorithmic validation of 'correctness'.

redox99 · 2026-01-24T14:16:17 1769264177

That's the breakthrough of ML: it can handle fuzzy. And chess is in some ways similar. Outside of endgames and blunders (where you can just brute-force), you can't prove one move is superior to another. That's why chess engines used to have human-made heuristics.

redox99 · 2026-01-23T23:39:07 1769211547

For some reason a lot of people are unaware that Claude Code is proprietary.

atonse · 2026-01-24T00:15:39 1769213739

Probably because it doesn’t matter most of the time?

fragmede · 2026-01-24T01:41:20 1769218880

If the software is, say, Audacity, whose target market isn't specifically software developers, sure, but seeing as how Claude Code's target market has a lot of people who can read code and write software (some of them for a living!) it becomes material. Especially when CC has numerous bugs that have gone unaddressed for months that people in their target market could fix. I mean, I have my own beliefs as to why they haven't opened it, but at the same time, it's frustrating hitting the same bugs day after day.

rmunn · 2026-01-24T04:08:33 1769227713

> ... numerous bugs that have gone unaddressed for months that people in their target market could fix.

THIS. I get so annoyed when there's a longstanding bug that I know how to fix, the fix would be easy for me, but I'm not given the access I need in order to fix it.

For example, I use Docker Desktop on Linux rather than native Docker, because other team members (on Windows) use it, and there were some quirks in how it handled file permissions that differed from Linux-native Docker; after one too many times trying to sort out the issues, my team lead said, "Just use Docker Desktop so you have the same setup as everyone else, I don't want to spend more time on permissions issues that only affect one dev on the team". So I switched.

But there's a bug in Docker Desktop that was bugging me for the longest time. If you quit Docker Desktop, all your terminals would go away. I eventually figured out that this only happened to gnome-terminal, because Docker Desktop was trying to kill the instance of gnome-terminal that it kicked off for its internal terminal functionality, and getting the logic wrong. Once I switched to Ghostty, I stopped having the issue. But the bug has persisted for over three years (https://github.com/docker/desktop-linux/issues/109 was reported on Dec 27, 2022) without ever being resolved, because 1) it's just not a huge priority for the Docker Desktop team (who aren't experiencing it), and 2) the people for whom it IS a huge priority (because it's bothering them a lot) aren't allowed to fix it.

Though what's worse is a project that is open-source, has open PRs fixing a bug, and lets those PRs go unaddressed, eventually posting a notice in their repo that they're no longer accepting PRs because their team is focusing on other things right now. (Cough, cough, githubactions...)

fragmede · 2026-01-25T04:54:45 1769316885

GitHub Actions is a bit of a special case, because it's mostly run on their systems, but that's when you just fork and, I mean, the problems with their (original) branch are their problem.

pxc · 2026-01-24T04:59:27 1769230767

> I get so annoyed when there's a longstanding bug that I know how to fix, the fix would be easy for me, but I'm not given the access I need in order to fix it.

This exact frustration (in his case, with a printer driver) is responsible for provoking RMS to kick off the free software movement.

arthurcolle · 2026-01-24T05:03:48 1769231028

They are turning it into a distributed system that you'll have to pay to access. Anyone can see this. CLI is easy to make and easy to support, but you have to invest in the underlying infrastructure to really have this pay off.

Especially if they want to get into enterprise VPCs and "build and manage organizational intelligence"

storystarling · 2026-01-24T15:23:44 1769268224

The CLI is just the tip of the iceberg. I've been building a similar loop using LangGraph and Celery, and the complexity explodes once you need to manage state across async workers reliably. You basically end up architecting a distributed state machine on top of Redis and Postgres just to handle retries and long-running context properly.
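To make the "distributed state machine" point concrete, here is a minimal single-process sketch of the idea, not the actual LangGraph/Celery setup described above. Everything in it is hypothetical: `InMemoryStore` merely stands in for a durable store like Redis or Postgres, and `run_with_retries`, `Task`, and `flaky_step` are illustrative names. It only shows the shape of the problem: persist state before and after each attempt, and carry context across retries.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict


class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Task:
    task_id: str
    state: State = State.PENDING
    attempts: int = 0
    # Long-running context that must survive across retries.
    context: dict = field(default_factory=dict)


class InMemoryStore:
    """Hypothetical stand-in for a durable store (e.g. Redis or Postgres)."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Task] = {}

    def save(self, task: Task) -> None:
        self._tasks[task.task_id] = task

    def load(self, task_id: str) -> Task:
        return self._tasks[task_id]


def run_with_retries(
    store: InMemoryStore,
    task_id: str,
    step: Callable[[dict], None],
    max_attempts: int = 3,
) -> Task:
    """Drive one task through PENDING -> RUNNING -> DONE/FAILED with retries."""
    task = store.load(task_id)
    while task.state not in (State.DONE, State.FAILED):
        task.state = State.RUNNING
        task.attempts += 1
        store.save(task)  # persist before working, so a crash leaves a trace
        try:
            step(task.context)
            task.state = State.DONE
        except Exception as exc:
            task.context["last_error"] = str(exc)
            exhausted = task.attempts >= max_attempts
            task.state = State.FAILED if exhausted else State.PENDING
        store.save(task)  # persist the outcome of this attempt
    return task


def flaky_step(context: dict) -> None:
    """Illustrative step that fails twice, then succeeds."""
    context["calls"] = context.get("calls", 0) + 1
    if context["calls"] < 3:
        raise RuntimeError("transient failure")
    context["result"] = "ok"


store = InMemoryStore()
store.save(Task(task_id="job-1"))
final = run_with_retries(store, "job-1", flaky_step)
print(final.state.value, final.attempts)
```

In a real multi-worker setup the hard parts are exactly what this sketch skips: atomic state transitions, detecting workers that died while a task was RUNNING, and idempotent steps.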

mi_lk · 2026-01-24T00:22:04 1769214124

Same. If you're already using a proprietary model might as well just double down

swores · 2026-01-24T11:15:35 1769253335

But you don't have to be restricted to one model either? Codex being open source means you can choose to use Claude models, or Gemini, or...

It's fair enough to decide you want to just stick with a single provider for both the tool and the models, but surely still better to have an easy change possible even if not expecting to use it.

mi_lk · 2026-01-24T13:24:15 1769261055

Codex CLI with Opus, or Gemini CLI with 5.2-codex, because they're open sourced agents? Go ahead if you want but show me where it actually happens with practical values

behnamoh · 2026-01-24T04:04:04 1769227444

until Microsoft buys it and enshits it.

consumer451 · 2026-01-24T05:43:01 1769233381

This is a fun thought experiment. I believe that we are now at the $5 Uber (2014) phase of LLMs. Where will it go from here?

How much will a synthetic mid-level dev (Opus 4.5) cost in 2028, after the VC subsidies are gone? I would imagine as much as possible? Dynamic pricing?

Will the SOTA model labs even sell API keys to anyone other than partners/whales? Why even that? They are the personalized app devs and hosts!

Man, this is the golden age of building. Not everyone can do it yet, and every project you can imagine is greatly subsidized. How long will that last?

tern · 2026-01-24T05:57:09 1769234229

While I remember $5 Ubers fondly, I think this situation is significantly more complex:

- Models will get cheaper, maybe way cheaper

- Model harnesses will get more complex, maybe way more complex

- Local models may become competitive

- Capital-backed access to more tokens may become absurdly advantaged, or not

The only thing I think you can count on is that more money buys more tokens, so the more money you have, the more power you will have ... as always.

But whether some version of the current subsidy, which levels the playing field, will persist seems really hard to model.

All I can say is, the bad scenarios I can imagine are pretty bad indeed—much worse than that it's now cheaper for me to own a car, while it wasn't 10 years ago.

depr · 2026-01-24T10:56:38 1769252198

If the electric grid cannot keep up with the additional demand, inference may not get cheaper. The cost of electricity would go up for LLM providers, and VCs would have to subsidize them more until the price of electricity goes down, which may take longer than they can wait, if they have been expecting LLMs to replace many more workers within the next few years.

andai · 2026-01-24T06:02:37 1769234557

The real question is how long it'll take for Z.ai to clone it at 80% quality and offer it at cost. The answer appears to be "like 3 months".

consumer451 · 2026-01-24T06:16:16 1769235376

This is a super interesting dynamic! The CCP is really good at subsidizing and flooding global markets, but in the end, it takes power to generate tokens.

In my Uber comparison, it was physical hardware on location... taxis, but this is not the case with token delivery.

This is such a complex situation in that regard. However, once the market settles and monopolies are created, eventually the price will be what the market can bear. Will that actually create an increase in gross planet product, or will the SOTA token providers just eat up the existing gross planet product, with no increase?

I suppose whoever has the cheapest electricity will win this race to the bottom? But... will that ever increase global product?

___

Upon reflection, the comment above was likely influenced by this truly amazing quote from Satya Nadella's interview on the Dwarkesh podcast. This might be one of the most enlightened things that I have ever heard in regard to modern times:

> Us self-claiming some AGI milestone, that's just nonsensical benchmark hacking to me. The real benchmark is: the world growing at 10%.

https://www.dwarkesh.com/p/satya-nadella#:~:text=Us%20self%2...

YetAnotherNick · 2026-01-24T13:15:26 1769260526

With optimizations and new hardware, power is such a negligible cost that $5/month would be sufficient for all users, contrary to what people believe. You can get 5.5M tokens/s/MW[1] for Kimi K2 (= 20M tokens/kWh = 181M tokens/$), which is 400x cheaper than current pricing even if you exclude architecture/model improvements. The thing is, Nvidia is currently swallowing up a massive share of the revenue, which China could possibly solve by investing in R&D.

[1]: https://developer-blogs.nvidia.com/wp-content/uploads/2026/0...
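For anyone who wants to check the unit conversion, here is a rough sketch of the arithmetic. The throughput is the figure quoted above; the electricity price is an assumption (about $0.11/kWh, which is roughly what the quoted 181M tokens/$ implies), not something stated in the source. It covers energy only, not hardware or anything else.

```python
# Back-of-the-envelope conversion from throughput per MW to tokens per dollar.
TOKENS_PER_SEC_PER_MW = 5.5e6  # figure quoted in the comment above
USD_PER_KWH = 0.11             # ASSUMED electricity price, not from the source


def tokens_per_kwh(tokens_per_sec_per_mw: float) -> float:
    """Tokens generated per kWh of energy at the given throughput per MW."""
    tokens_per_mwh = tokens_per_sec_per_mw * 3600  # 1 MW for one hour = 1 MWh
    return tokens_per_mwh / 1000                   # 1 MWh = 1000 kWh


def tokens_per_dollar(tokens_per_sec_per_mw: float, usd_per_kwh: float) -> float:
    """Tokens per dollar of electricity, ignoring every non-energy cost."""
    return tokens_per_kwh(tokens_per_sec_per_mw) / usd_per_kwh


per_kwh = tokens_per_kwh(TOKENS_PER_SEC_PER_MW)
per_usd = tokens_per_dollar(TOKENS_PER_SEC_PER_MW, USD_PER_KWH)
print(f"{per_kwh / 1e6:.1f}M tokens/kWh")
print(f"{per_usd / 1e6:.0f}M tokens per dollar of electricity")
```

This comes out to about 19.8M tokens/kWh and roughly 180M tokens per dollar under the assumed price, in line with the rounded numbers in the comment. Changing `USD_PER_KWH` scales the second figure inversely.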

FuckButtons · 2026-01-24T07:53:43 1769241223

I can run Minimax-m2.1 on my M4 MacBook Pro at ~26 tokens/second. It’s not Opus, but it can definitely do useful work when kept on a tight leash. If models improve at anything like the rate we have seen over the last 2 years I would imagine something as good as Opus 4.5 will run on similarly specced new hardware by then.

consumer451 · 2026-01-24T08:27:38 1769243258

I appreciate this, however, as a ChatGPT, Claude.ai, Claude Code, and Windsurf user... who has tried nearly every single variation of Claude, GPT, and Gemini in those harnesses, and has tested all of those models via API for LLM integrations into my own apps... I just want SOTA, 99% of the time, for myself, and my users.

I have never seen a use case where a "lower" model was useful, for me, and especially my users.

I am about to get almost the exact MacBook that you have, but I still don't want to inflict non-SOTA models on my code, or my users.

This is not a judgement against you, or the downloadable weights, I just don't know when it would be appropriate to use those models.

BTW, I very much wish that I could run Opus 4.5 locally. The best that I can do for my users is the Azure agreement that they will not train on their data. I also have that setting set on my claude.ai sub, but I trust them far less.

Disclaimer: No model is even close to Opus 4.5 for agentic tasks. In my own apps, I process a lot of text/complex context and I use Azure GPT 4.1 for limited LLM tasks... but for my "chat with the data" UX, Opus 4.5 all day long. It has tested far superior.

barrenko · 2026-01-24T09:19:23 1769246363

Is Azure's pricing competitive on OpenAI's offerings through the API? Thanks!

consumer451 · 2026-01-24T09:38:34 1769247514

The last I checked, it is exactly equivalent per token to direct OpenAI model inference.

The one thing I wish for is that Azure Opus 4.5 had json structured output. Last I checked that was in "beta" and only allowed via direct Anthropic API. However, after many thousands of Opus 4.5 Azure API calls with the correct system and user prompts, not even one API call has returned invalid json.
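Without a structured-output guarantee, the usual fallback is to validate each response yourself. Below is a minimal sketch of that kind of check. The required keys are a hypothetical schema chosen purely for illustration, and `parse_model_json` is an illustrative helper, not part of any Azure or Anthropic SDK.

```python
import json
from typing import Optional

# Hypothetical schema, for illustration only.
REQUIRED_KEYS = {"answer", "sources"}


def parse_model_json(raw: str) -> Optional[dict]:
    """Return the parsed object if `raw` is a JSON object with the expected
    keys; return None so the caller can retry or log the bad response."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data


good = parse_model_json('{"answer": "42", "sources": []}')
bad = parse_model_json("Sure! Here is the JSON you asked for: {")
print(good is not None, bad is None)
```

A check like this also gives you a place to count failures, which is how you would confirm a "zero invalid responses" observation over many thousands of calls.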

EnPissant · 2026-01-24T11:09:15 1769252955

I'm guessing that's ~26 decode tokens/s for 2-bit or 3-bit quantized Minimax-m2.1 at 0 context, and it only gets worse as the context grows.

I'm also sure your prefill is slow enough to make the model mostly unusable, even at smallish context windows, but entirely at mid to large context.

stavros · 2026-01-24T01:47:00 1769219220

Can't really fault them when this exists:

https://github.com/anthropics/claude-code

bad_haircut72 · 2026-01-24T01:51:26 1769219486

What even is this repo? It's very deceptive.

adastra22 · 2026-01-24T01:52:26 1769219546

Issue tracker for submitting bug reports that no one ever reads or responds to.

stavros · 2026-01-24T02:27:33 1769221653

Now that's not fair, I'm sure they have Claude go through and ignore the reports.

adastra22 · 2026-01-24T02:42:29 1769222549

Unironically yes. If you file a bug report, expect a Claude bot to mark it as a duplicate of other issues already reported and close it. Upon investigation you will find either

(1) a circular chain of duplicate reports, all closed; or

(2) a game of telephone where each issue is subtly different from the next, eventually reaching an issue that has nothing at all to do with yours.

At no point along the way will you encounter an actual human from Anthropic.

kylequest · 2026-01-24T02:17:48 1769221068

By the way, I reverse engineered the Claude Code binary and started sharing different code snippets (on Twitter/Bluesky/Mastodon/Threads). There's a lot of code there, so I'm looking for requests in terms of what part of the code to share and analyze. One of the requests I got was about the LSP functionality in CC. Anything else you would find interesting to explore there?

I'll post the whole thing in a GitHub repo too at some point, but it's taking a while to prettify the code, so it looks more natural :-)

lifthrasiir · 2026-01-24T03:05:18 1769223918

Not only would this violate the ToS, but a newer native version of Claude Code also precompiles most JS source files into JavaScriptCore's internal bytecode format, so reverse engineering would soon become much more annoying, if not harder.

arianvanp · 2026-01-24T10:23:12 1769250192

Claude Code is very good at reverse engineering. I reverse engineer Apple products on my MacBook all the time to debug issues.

kylequest · 2026-01-24T03:36:26 1769225786

Also some WASM there too... though WASM is mostly limited to Tree Sitter for language parsing. Not touching those in phase 1 :-)

embedding-shape · 2026-01-24T13:25:26 1769261126

> Not only this would violate the ToS

What specific parts of the ToS does "sharing different code snippets" violate? Not that I don't believe you, just curious about the specifics as it seems like you've already dug through it.

pxc · 2026-01-24T05:03:15 1769230995

Using GitHub as an issue tracker for proprietary software should be prohibited. Not that it would, these days.

Codeberg at least has some integrity around such things.

majkinetor · 2026-01-24T07:39:36 1769240376

That must be the worst repo I have ever seen.

huevosabio · 2026-01-24T11:22:17 1769253737

I frankly don't understand why they keep CC proprietary. Feels to me that the key part is the model, not the harness, and they should make the harness public so the public can contribute.

causalmodels · 2026-01-24T00:00:17 1769212817

Yeah this has always seemed very silly. It is trivial to use claude code to reverse engineer itself.

mi_lk · 2026-01-24T00:20:42 1769214042

looks like it's trivial to you because I don't know how to

n2d4 · 2026-01-24T00:59:41 1769216381

If you're curious to play around with it, you can use Clancy [1] which intercepts the network traffic of AI agents. Quite useful for figuring out what's actually being sent to Anthropic.

[1] https://github.com/bazumo/clancy

fragmede · 2026-01-24T01:46:10 1769219170

If only there were some sort of artificial intelligence that could be asked about asking it to look at the minified source code of some application.

Sometimes prompt engineering is too ridiculous a term for me to believe there's anything to it, other times it does seem there is something to knowing how to ask the AI juuuust the right questions.

lsaferite · 2026-01-24T13:33:37 1769261617

Something I try to explain to people I'm getting up to speed on talking to an LLM is that specific word choices matter. Mostly it matters that you use the right jargon to orient the model. Sure, it's good at getting the semantics of what you said, but if you adjust and use the correct jargon the model gets closer faster. I also explain that they can learn the right jargon from the LLM and that sometimes it's better to start over once you've adjusted your vocabulary.

adastra22 · 2026-01-24T01:53:13 1769219593

That is against ToS and could get you banned.

Der_Einzige · 2026-01-24T01:57:27 1769219847

GenAI was built on an original sin of mass copyright infringement that Aaron Swartz could only have dreamed of. Those who live in glass houses shouldn't throw stones, and Anthropic may very well get screwed HARD in a lawsuit against them from someone they banned.

Unironically, the ToS of most of these AI companies should be, and hopefully are, legally unenforceable.

adastra22 · 2026-01-24T02:35:02 1769222102

Are you volunteering? Look, people should be aware that bans are being handed out for this, lest they discover it the hard way.

If you want to make this your cause and incur the legal fees and lost productivity, be my guest.

fragmede · 2026-01-24T02:59:57 1769223597

You're absolutely right! Hey Codex, Claude said you're not very good at reading obfuscated code. Can you tell me what this minified program does?

adastra22 · 2026-01-24T03:14:43 1769224483

I don't know what Codex's ToS are, but it would be against ToS to reverse engineer any agent with Claude.

chillfox · 2026-01-24T15:42:31 1769269351

Then use something like deepseek.

mlrtime · 2026-01-24T11:55:27 1769255727

How would they know what you do on your own computer?

adastra22 · 2026-01-24T15:19:53 1769267993

Claude is run on their servers.

redox99 · 2026-01-23T19:09:34 1769195374

Uploading your encryption keys is not just "any sort of feature".

gruez · 2026-01-23T19:11:53 1769195513

You're right, it's less intrusive than uploading your files directly, like a backup does.

lazide · 2026-01-23T21:24:12 1769203452

I’m still pissed about the third+ time OneDrive ‘helpfully’ backed up all my files after I disabled it.

So that may not be a great example if you’re trying to make people like Microsoft.

JoshTriplett · 2026-01-23T22:14:29 1769206469

On the contrary: a backup can be fully encrypted by a key under the user's control that isn't available to the storage provider.

redox99 · 2026-01-18T18:00:36 1768759236

It's true that in the past you had to use the black market. That's not a problem anymore. You just use your credit card and don't worry about the exchange rate.