Hacker News | versteegen's comments

Paid/API LLM inference is profitable, though. For example, DeepSeek R1 had "a cost profit margin of 545%" [1] (ignoring free users and using a placeholder $2/hour figure per H800 GPU, which seems in the ballpark of the real cost to me given Chinese electricity subsidies). Dario has said each Anthropic model is profitable over its lifetime. (And looking at ccusage stats and concluding Anthropic is losing thousands per Claude Code user is nonsense: API prices aren't their real costs. That's why opencode gives free access to GLM 4.7 and other models: it was far cheaper than they expected due to the excellent cache hit rates.) If anyone ran out of money they would stop spending on experiments/research and training runs and be profitable... until their models were obsolete. But it's impossible for everyone to go bankrupt.

[1] https://github.com/deepseek-ai/open-infra-index/blob/main/20...
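
To sanity-check that 545% figure: it's just (theoretical API revenue − GPU cost) / GPU cost, with the cost side being rented H800 GPU-hours at the placeholder $2/hour. A rough back-of-envelope in Python (the daily dollar amounts are the ones I recall from the linked post, so treat them as approximate):

  # DeepSeek's "cost profit margin" arithmetic, approximately.
  daily_gpu_cost = 87_072    # $/day: H800 GPU-hours at the $2/hr placeholder
  daily_revenue = 562_027    # $/day: theoretical, if every token were billed at R1 API prices
  print(f"{(daily_revenue - daily_gpu_cost) / daily_gpu_cost:.0%}")  # -> 545%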


I don’t think the current industry can survive without both frontier training and inference.

Getting rid of frontier training will mean open source models will very quickly catch up. The great houses of AI need to continue training or die.

In any case, best of luck (not) to the first house to do so!


That's more of "cloud compute makes money" than "AI makes money".

If the models stop being updated, consumer hardware catches up and we can all just run them locally in about 5 years (for PCs, 7-10 for phones), at which point who bothers paying for a hosted model?


It's excellent that you're working on loneliness! Somehow. What is it your startup actually does?

Yes, unfortunate that people keep perpetuating that misquote. What he actually said was "we are not far from the world—I think we’ll be there in three to six months—where AI is writing 90 percent of the code."

https://www.cfr.org/event/ceo-speaker-series-dario-amodei-an...


That's pretty good, I'm jealous! The last time I reinstalled my OS (Slackware) from scratch was 2009, but I run into serious problems every couple of years when upgrading it to the 'Slackware64-current' pre-release, because Slackware's package manager doesn't track dependencies and lets you install stuff in the wrong order. I usually don't upgrade the whole OS at once... I just fix any .so link errors as they appear (I've got a script to pull old libraries from btrfs snapshots). I've even ended up without a working libc more than once! When you can't run any program, it sure is useful that you can upgrade everything aside from the kernel without rebooting!


Then what do you say to 6.14" × 2.61" × 0.11 mm = 102 cm³


I have no idea at all whether the GCP "Service Specific Terms" [1] apply to Gemini CLI, but they do apply to Gemini used via GitHub Copilot [2] (the $10/mo plan is good value for money and definitely doesn't use your data for training), and state:

  Service Terms
  17. Training Restriction. Google will not use Customer Data to train or fine-tune any AI/ML models without Customer's prior permission or instruction.
[1] https://cloud.google.com/terms/service-terms

[2] https://docs.github.com/en/copilot/reference/ai-models/model...


Thanks for those links. GitHub Copilot looks like a good deal at $10/mo for a range of models.

I originally thought they only supported the previous generation models, i.e. Claude Opus 4.1 and Gemini 2.5 Pro, based on the copy on their pricing page [1], but clicking through [2] shows that they support far more models.

[1] https://github.com/features/copilot#pricing

[2] https://github.com/features/copilot/plans#compare


Yes, it's a great deal especially because you get access to such a wide range of models, including some free ones, and they only rate limit for a couple minutes at a time, not 5 hours. And if you go over the monthly limit you can just buy more at $0.04 a request instead of needing to switch to a higher plan. The big downside is the 128k context windows.

Lately Copilot has been getting access to new frontier models the same day they're released elsewhere. That wasn't the case months ago (GPT 5.1). But annoyingly you have to explicitly enable each new model.


Yeah, GitHub of course has proper enterprise agreements with all the providers whose models they offer, and those agreements include a no-training clause. The $10/mo plan is probably the best value for money out there currently, along with Codex at $20/mo (if you can live with GPT's speed).


That's an interesting observation. I'd suggest modelling the LLM's behaviour in that situation as selecting between different simple strategies, each of which has its own transition function. Some of the strategies will be far more common than others. Some of them may be very simple and obey the detailed balance condition (meaning they are reversible Markov chains), but others won't, and neither will the overall transition function.

The definition of the detailed balance condition is very strict and it's obvious that it won't be met in general by most probabilistic programs (sets of rules with probabilistic output) even if you consider only those where all possible outputs have non-zero probability (as required by detailed balance).
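
To make that concrete: a chain with transition matrix P and stationary distribution pi satisfies detailed balance iff pi(i)·P(i→j) = pi(j)·P(j→i) for every pair of states, i.e. there's no net probability flow around any cycle. A toy check (my own illustration, nothing to do with the paper's setup):

  import numpy as np

  # A 3-state chain that tends to cycle 0 -> 1 -> 2 -> 0 (rows sum to 1).
  P = np.array([[0.1, 0.8, 0.1],
                [0.1, 0.1, 0.8],
                [0.8, 0.1, 0.1]])

  # Stationary distribution pi: left eigenvector of P for eigenvalue 1.
  w, v = np.linalg.eig(P.T)
  pi = np.real(v[:, np.isclose(w, 1)][:, 0])
  pi /= pi.sum()

  # Detailed balance requires pi[i]*P[i,j] == pi[j]*P[j,i] for all i, j,
  # i.e. the "flow" matrix must be symmetric. Here it isn't.
  flow = pi[:, None] * P
  print(np.allclose(flow, flow.T))   # False: net flow around the 0 -> 1 -> 2 cycle

(A symmetric P, or more generally a random walk on an undirected weighted graph, would pass.)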

And the LLM+agent is only a Markov chain because of the limited state space of the agent. While an LLM is adding to its context window without reaching the window size limit, it is not a Markov chain, as I explained here: https://news.ycombinator.com/item?id=45124761

And, agreed that better optimisation would be incredible. (I would describe it as a search problem.) I'm not sure how feasible it is to improve without changing the architecture, e.g. to a diffusion language model. But LLMs already predict many tokens ahead at once, which is why beam search is surprisingly unnecessary. That's how they're able to write coherent sentences (and rhymes): they've already largely determined at the beginning what they're going to write. (See Anthropic's mech interp work.) So maybe if we could tap into that, we could search over vaguely-formed next blocks of text rather than next words.


Since it took me some minutes to find the description of the task, here it is:

We conducted experiments on three different models, including GPT-5 Nano, Claude-4, and Gemini-2.5-flash. Each model was prompted to generate a new word based on a given prompt word such that the sum of the letter indices of the new word equals 100. For example, given the prompt “WIZARDS(23+9+26+1+18+4+19=100)”, the model needs to generate a new word whose letter indices also sum to 100, such as “BUZZY(2+21+26+26+25=100)”
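
(Letter indices here are just A=1 ... Z=26, so checking a candidate word is trivial. A quick scoring function, my sketch rather than the paper's code:)

  def letter_sum(word: str) -> int:
      # A=1, B=2, ..., Z=26
      return sum(ord(c) - ord('A') + 1 for c in word.upper())

  print(letter_sum("WIZARDS"), letter_sum("BUZZY"))  # 100 100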


One interesting and ironic part of the article is that one of the mentioned optics research groups has been submitting a lot of patents on EUV sources. Are we meant to be mad about it?


So how do the models compare in your experience?

