> WD-40 themselves have come out with improved "Specialist" formulations that mostly just copy other, superior products.
We all know there is something better for the job than WD-40; its value comes from its convenience, affordability, availability, brand recognition, and the number of cases where it is "good enough".
The "Specialist" brand is exactly what its name implies: specialist products, each better for a specific application, but none as universal as the original. The original formulation is not magic, but it is the one we are familiar with, and it works well enough when you don't have anything better for your specific job.
It has been shown that LLMs don't know how they work. Researchers asked an LLM to perform computations and explain how it got to the result. The LLM's explanation is typical of how we do it: add the numbers digit by digit, with carry, and so on. But looking inside the neural network shows that the reality is completely different and much messier. None of this is surprising.
Still, feeding it back its own completely made-up self-reflection could be an effective strategy; reasoning models kind of work like this.
The explanation becomes part of the context, which can lead to better results in the next turn. It does work, but it does so in a completely misleading way.
Which should be expected, since the same is true for humans. Adding numbers digit by digit with carry works well on paper, but it's not an effective method for doing math in your head, and it's certainly not how I calculate 14+17. In fact, I can't really tell you how I calculate 14+17, since that's not in the "inner monologue" part of my brain, and I have little introspection into any of the other parts.
Still, feeding humans their completely made-up self-reflection back can be an effective strategy.
The difference is that if you are honest and pragmatic and someone asked you how you added two numbers, you would only say you did long addition if that's what you actually did. If you had no idea what you actually did, you would probably say something like "the answer came to me naturally".
LLMs work differently. Like a human, 14+17=31 may come naturally, but when asked about their thought process, LLMs will not self-reflect on their own cognition. Instead they treat the question as "in the training data, when someone is asked how they added numbers, what follows?", and usually what follows is long addition, so that is the answer you will get.
It is the same reason LLMs hallucinate: they imitate what their dataset has to say, and their dataset doesn't have a lot of "I don't know" answers. Besides, an LLM that learns to answer "I don't know" to every question wouldn't be very useful anyway.
>if you are honest and pragmatic and someone asked you how you added two numbers, you would only say you did long addition if that's what you actually did. If you had no idea what you actually did, you would probably say something like "the answer came to me naturally".
To me that misses the argument of the above comment. The key insight is that neither humans nor LLMs can express what actually happens inside their neural networks, but both have been taught to express e.g. addition using mathematical methods that can easily be verified. It still doesn't guarantee that either of them won't make mistakes; it only makes it reasonably possible for others to catch those mistakes. Always remember: all (mental) models are wrong. Some models are useful.
Life lesson for you: the internal functions of every individual's mind are unique. Your n=1 perspective is in no way representative of how humans as a category experience the world.
Plenty of humans do use longhand arithmetic methods in their heads. There's an entire universe of mental arithmetic methods. I use a geometric process because my brain likes problems to fit into a spatial graph instead of an imaginary sheet of paper.
Claiming you've not examined your own mental machinery is... concerning. Introspection is an important part of human psychological development. Like any machine, you will learn to use your brain better if you take a peek under the hood.
> Claiming you've not examined your own mental machinery is... concerning
The example was carefully chosen. I can introspect how I calculate 356*532, but I can't introspect how I calculate 14+17 or 1+3. I can deliberate on the question 14+17 more carefully, switching from "system 1" to "system 2" thinking (yes, I'm aware that's a flawed theory), but that's not how I'd normally solve it. Similarly, I can describe to you how I count six eggs in a row, but I can't describe how I count three eggs in a row. Sure, I know I'm subitizing, but that's just putting a word on "I know how many are there without conscious effort". And without conscious effort I can't introspect it. I can switch to a process I can introspect, but that's not at all the same.
Right. Last time I checked this was easy to demonstrate with word logic problems:
"Adam has two apples and Ben has four bananas. Cliff has two pieces of cardboard. How many pieces of fruit do they have?" (or slightly more complex, this would probably be easily solved, but you get my drift.)
Change the wording to something entirely random, i.e. something not likely to be found in the LLM's corpus, like walruses and skyscrapers and carbon molecules, and the LLM will give you a suitably nonsensical answer, showing that it is incapable of handling even simple substitutions that a middle schooler would recognize.
The lack of QA isn't felt right away. They are accumulating tech debt, which means problems become more frequent and harder to solve over time until they fix the fundamentals, and it doesn't feel like they intend to.
1. "isn't felt right away" then what's the correct timescale? Is it 2 years? Is it 5 years? We are looking at 10 years now. Do you have any studies on this that you can quote to prove that at Microsoft scale and for the product they develop, 10 years is the time when things go bad?
2. "becoming more frequent and harder to solve" how much more frequent and harder? Things works pretty fine during Windows 10, but these days I run into a bug in Windows 11 every other day myself.
I'd be surprised if this has more to do with the QA cuts from 2014 than with vibe coding.
These are multipliers. First, QA left, but nothing major happened for years; automated tests sufficed. Then vibe coding happened, which, combined with the lack of QA, led to disaster.
I doubt "studies" exist and proving every little assumption takes too much effort as per Brandolini's law.
Updates breaking stuff already started when they moved from the security/bugfix-only updates to the add-new-features-into-the-mix model with Windows 10. That was roughly 10 years ago.
Windows is like a fractal of progressive enhancement layers. You can drill into esoteric Windows features and almost physically see the different decades Windows has existed in, not unlike a physical tree (with leaves).
They won't fix the fundamentals; the next API layer will just be built over the broken one.
I'm waiting in morbid anticipation of the obvious next broken layer: They'll rename Windows to CopilotOS, and 90% of how you interact with the OS is through a LLM chat box. Of course, as is historically the case with Windows, there will be that 10% not brought into the new way, so you'll need to launch a traditional windows desktop+start menu to access that stuff. Just like 90% of the system today uses modern UI, but there's still that 10% using the legacy Windows look and feel, like the Run dialog and the Disk/Device manager.
The biggest differentiator is price. An entry-level Android phone is about $300 while an iPhone is in the $1000 range. And to be honest, anything more than an entry-level Android is a luxury these days. I say that because that's what I have, and I have never felt held back, except maybe for pictures, but it is good enough for my (lack of) skills as a photographer.
So, Android may actually benefit from a lack of differentiation: "like iOS, but for a third of the price" seems like a good value proposition.
Turns out I rarely need to know the sizes or indices of a UTF-8 string in anything other than bytes.
If I write a parser, for instance, what I usually want to know is "what is the sequence of bytes between this sequence of bytes and that sequence of bytes". That there are flag emoji or whatever in there doesn't matter, and the way UTF-8 works ensures that one character's representation never partially overlaps with another's.
What the byte sequences mean only really matters if you are writing something like an editor, where you need to know how many bytes to remove when the user presses backspace, for instance.
Truncation to prevent buffer overflow seems like a case where it would matter, but not really. An overflow is an error and should be treated as such. Truncation is a safety mechanism for when having your string cut short is the lesser evil, and at that point, ending up with half a flag emoji doesn't really matter.
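To make that concrete, here is a minimal Rust sketch (the function name and byte budget are mine, not from any particular library) of truncating to a byte budget without ever splitting a code point; UTF-8 being self-synchronizing is what makes backing up to a boundary trivial:

    /// Truncate `s` to at most `max_bytes` bytes without splitting a UTF-8
    /// code point. Continuation bytes are always of the form 10xxxxxx, so
    /// backing up to a character boundary takes at most three steps.
    fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
        if s.len() <= max_bytes {
            return s;
        }
        let mut end = max_bytes;
        while !s.is_char_boundary(end) {
            end -= 1;
        }
        &s[..end]
    }

    fn main() {
        let flags = "🇫🇷🇩🇪"; // two flag emoji: 4 code points, 16 bytes
        let cut = truncate_utf8(flags, 12);
        // Still valid UTF-8, but it ends with "half a flag": the second
        // flag's first regional-indicator code point on its own.
        assert_eq!(cut.len(), 12);
        assert_eq!(cut.chars().count(), 3);
        println!("{cut}");
    }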
1980s: one packet per keystroke is too much; we must find a way to bundle them together for efficiency (see Nagle's algorithm, delayed ACK). Also, let's send everything in plaintext, including passwords.
2020s: ha! With some advanced probabilistic models, we may be able to deduce something about what is being typed behind one of our layers of encryption; let's send 100 packets per keystroke to mitigate that.
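For the record, the 1980s behaviour is still something interactive applications opt out of explicitly today: one socket option, TCP_NODELAY, turns Nagle's coalescing off so each keystroke-sized write goes out on its own. A minimal Rust sketch, with a placeholder address:

    use std::io::Write;
    use std::net::TcpStream;

    fn main() -> std::io::Result<()> {
        // Placeholder endpoint, for illustration only.
        let mut stream = TcpStream::connect("127.0.0.1:2222")?;

        // Disable Nagle's algorithm (sets TCP_NODELAY): small writes are
        // sent immediately instead of being coalesced into larger segments.
        stream.set_nodelay(true)?;

        // One keystroke-sized write, roughly one packet on the wire.
        stream.write_all(b"a")?;
        Ok(())
    }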
Unfortunate result of the security theater: "Someone who has access to run a privileged application can run side-channel attacks! Let's drop CPU performance by 20 percent across the world."
As I understood it, it’s enough to have “access to run a privileged application” anywhere the packets pass through, so not necessarily at the client or server side. Or did I misunderstand?
What I find particularly ironic is that the title makes it feel like Rust gives a 5x performance improvement when it actually slows things down.
The problem: they have software written in Rust, and they need to use the libpg_query library, which is written in C. Because they can't use the C library directly, they had to use a Rust-to-C binding library, that uses Protobuf for portability reasons. The problem is that it is slow.
So what they did was write their own non-portable but much more optimized Rust-to-C bindings, with the help of an LLM.
But had they written their software in C, they wouldn't have needed to do any conversion at all. It means they could have titled the article "How we lowered the performance penalty of using Rust".
I don't know much about Rust or libpg_query, but they probably could have gone even faster by getting rid of the conversion entirely. It would most likely have involved major adaptations and some unsafe Rust though. Writing a converter has many advantages: portability, convenience, security, etc... but it has a cost, and ultimately, I think it is a big reason why computers are so fast and apps are so slow. Our machines keep copying, converting, serializing and deserializing things.
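For illustration, "getting rid of the conversion" would mean declaring the C entry points yourself and handing pointers straight across the boundary, with no Protobuf round trip in between. A rough sketch against a hypothetical C API (these are not libpg_query's real function names):

    use std::ffi::{CStr, CString};
    use std::os::raw::c_char;

    // Hypothetical C API, for illustration only: a parser that returns a
    // heap-allocated, NUL-terminated result the caller must hand back.
    extern "C" {
        fn parse_query(input: *const c_char) -> *mut c_char;
        fn free_parse_result(result: *mut c_char);
    }

    fn parse(sql: &str) -> String {
        let c_sql = CString::new(sql).expect("SQL must not contain NUL bytes");
        unsafe {
            // Straight into the C library: a pointer goes in, a pointer
            // comes out, and nothing is serialized or deserialized.
            let raw = parse_query(c_sql.as_ptr());
            let parsed = CStr::from_ptr(raw).to_string_lossy().into_owned();
            free_parse_result(raw);
            parsed
        }
    }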
Note: I have nothing against what they did, quite the opposite; I always appreciate those who care about performance, and what they did is reasonable and effective. Good job!
> What I find particularly ironic is that the title makes it feel like Rust gives a 5x performance improvement when it actually slows things down.
Rust didn't slow them down. The inefficient design of the external library did.
Calling into C libraries from Rust is extremely easy. It takes some work to create a safer wrapper around C libraries, but it's been done for many popular libraries.
This is the first and only time I've seen an external library connected via a Rube Goldberg-like contraption with protobufs in the middle. That's the problem.
Sadly they went with the "rewrite to Rust" meme in the headline for more clickability.
Writing Rust bindings for arbitrary C data structures is not hard. You just need to make sure every part of your safe Rust API code upholds the necessary invariants. (Sometimes that's non-trivial, but a little thinking will always yield a solution: if C code can do it, then it can be done, and if it can be done, then it can be done in Rust.)
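As a sketch of what upholding those invariants usually looks like in practice (hypothetical C API, not the one from the article): own the handle in a struct, free it exactly once in Drop, and never hand out the raw pointer from the safe surface.

    // Hypothetical C API, for illustration only.
    #[repr(C)]
    struct RawWidget {
        _private: [u8; 0], // opaque to Rust; only C knows the layout
    }

    extern "C" {
        fn widget_new() -> *mut RawWidget;
        fn widget_value(w: *const RawWidget) -> i32;
        fn widget_free(w: *mut RawWidget);
    }

    /// Safe wrapper: the pointer is always valid while a `Widget` exists,
    /// it is freed exactly once, and callers never see it directly.
    pub struct Widget {
        raw: *mut RawWidget,
    }

    impl Widget {
        pub fn new() -> Option<Widget> {
            let raw = unsafe { widget_new() };
            if raw.is_null() { None } else { Some(Widget { raw }) }
        }

        pub fn value(&self) -> i32 {
            // Safe because `self.raw` is non-null and owned by us.
            unsafe { widget_value(self.raw) }
        }
    }

    impl Drop for Widget {
        fn drop(&mut self) {
            unsafe { widget_free(self.raw) }
        }
    }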
What about the other way around? I recently had a use case where I needed a C shared library that persists complex C data structures into an RDBMS. Given that my team had minimal C experience and this needed to be production grade, I ended up writing a thin C lib that offloads the heavy lifting to a sidecar Go process. They interact via protobuf over a local Unix socket.
Would love to hear if I could've come up with a better design.
>> But had they written their software in C, they wouldn't have needed to do any conversion at all. It means they could have titled the article "How we lowered the performance penalty of using Rust".
That's not really fair. The library was doing serialization/deserialization, which was a poor design choice from a performance perspective. They just made a more sane API that doesn't do all that extra work. It might best be titled "Replacing protobuf with a normal API to go 5 times faster."
BTW what makes you think writing their end in C would yield even higher performance?
> BTW what makes you think writing their end in C would yield even higher performance?
C is not inherently faster, you are right about that.
But what I understand is that the library they use works with data structures that are designed for a C-like language and are presumably full of raw pointers. These are not ideal for working with in Rust, so, presumably, they wrote their own data model in Rust fashion, which means that now, they need to make a conversion, which is obviously slower than doing nothing.
They probably could have worked with the C structures directly, resulting in code that could be as fast as C, but that wouldn't make for great Rust code. In the end, they chose the compromise of speeding up conversion.
Also, the use of Protobuf may be a poor choice from a performance perspective, but it is a good choice for portability: it allows them to support plenty of languages more cheaply, and Rust was just one among many. The PgDog team gave Rust and their specific application special treatment.
> which means that now, they need to make a conversion, which is obviously slower than doing nothing.
One would think. But caches have grown so large, while memory speed and latency haven't scaled with compute, that as long as the conversion fits in the cache and operates on data already in the cache from previous operations (which admittedly takes some care), there's often an embarrassing amount of compute sitting idle, waiting for the next response from memory. So if your workload is memory, disk, or network bound, conversions can often be "free" in terms of wall-clock time, at the cost of slightly more wattage burnt by the CPU(s). Much depends on the size and complexity of the data structure.
I wonder why they didn't immediately FFI it: C is the easiest language to write Rust bindings for. It can get tedious if you're using many parts of a large API, but otherwise it's straightforward.
I write most of my applications and libraries in Rust, and lament that most of the libraries I wish I could FFI to are in C++ or Python, which are more difficult.
Protobuf sounds like the wrong tool. It has applications for wire serialization and similar, but is still kind of a mess there. I would not apply it to something that stays in memory.
It’s trivial to expose the raw C bindings (e.g. a -sys crate) because you just run bindgen on the header. The difficult part can be creating safe, high-performance abstractions.
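For anyone who hasn't seen it, the -sys half really is mostly boilerplate: a typical build.rs runs bindgen (as a build dependency) over the C header and tells Cargo what to link. A sketch with placeholder names, not the crate from the article:

    // build.rs for a hypothetical `foo-sys` crate (names are placeholders).
    use std::{env, path::PathBuf};

    fn main() {
        // Link against the C library.
        println!("cargo:rustc-link-lib=foo");

        // Generate raw Rust declarations straight from the C header.
        let bindings = bindgen::Builder::default()
            .header("vendor/foo.h")
            .generate()
            .expect("failed to generate bindings");

        let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
        bindings
            .write_to_file(out_dir.join("bindings.rs"))
            .expect("failed to write bindings");
    }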
No it’s not common for two pieces of code within a single process to communicate by serializing the protobuf into the wire format and deserializing it.
It’s however somewhat common to pass in-memory protobuf objects between code, because the author didn’t want to define a custom struct but preferred to use an existing protobuf definition.
Given that they heavily used LLMs for this optimization, it makes you wonder why they didn't just use them to port the C library to Rust entirely. I think the volume of library ports to more languages/the most performant languages is going to explode, especially given it's a relatively deterministic effort as long as you have good tests and API contracts, etc.
The underlying C library interacts directly with the Postgres query parser (and therefore the Postgres source). So unless you rewrite Postgres in Rust, you wouldn't be able to do that.
Well then why didn’t they just get the LLM to rewrite all of Postgres too /s
I agree that LLMs will make clients/interfaces in every language combination much more common, but I wonder the impact it’ll have on these big software projects if more people stop learning C.
> they had to use a Rust-to-C binding library, that uses Protobuf for portability reasons.
That sounds like a performance nightmare, putting Protobuf of all things between the language and Postgres. I'm surprised such a library ever got popular.
pg_query (TFA) has ~1 million downloads, the postgres crate has 11 million downloads and the related tokio-postgres crate has over 33 million downloads. The two postgres crates currently see around 50x as much traffic as the (special-purpose) crate from the article.
edit: There is also pq-sys with over 12 million downloads, used by diesel, and sqlx-postgres with over 16 million downloads, used by sqlx.
Notably, though, I believe neither the postgres nor the tokio-postgres crate parses SQL queries; they just pass them over the wire to the server. Generally the client side doesn't need to parse the query.
https://crates.io/crates/sqlparser has 48 million downloads, though. It's not exactly 100% compatible (yet!) but it's pretty darn great.
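For reference, a minimal sketch of how that crate is used (API names as I remember them from recent sqlparser versions; check the docs for yours):

    use sqlparser::dialect::PostgreSqlDialect;
    use sqlparser::parser::Parser;

    fn main() {
        let sql = "SELECT id, name FROM users WHERE active = true";

        // Parse into an AST (a Vec<Statement>) using the Postgres dialect.
        let ast = Parser::parse_sql(&PostgreSqlDialect {}, sql)
            .expect("failed to parse SQL");

        println!("{ast:?}");
    }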
I certainly can't memorize Homer's work, and why would I? In exchange I can do so much more. I can find an answer to just about any question on any subject better than the most knowledgeable ancient Greek specialist, because I can search the internet. I can travel faster and further than their best explorers, because I can drive and buy tickets. I have no fighting experience, but give me a gun and a few hours of training and I could defeat their best champions. I traded the ability to memorize the equivalent of entire books for a set of skills that, combined with modern technological infrastructure, give me what would be godlike powers in the time of the ancient Greeks.
In addition to these base skills, I also have specialized skills adapted to the modern world; that is my job. Combined with the internet and modern technology, I can reach a level of proficiency that no one could have reached in ancient times. And the best part: I am not some kind of genius, just a regular guy with a job.
And I still have time to swipe on social media. I don't know what kind of brainless activities the ancient Greeks did, but they certainly had the equivalent of swiping on social media.
The general idea is that the more we offload to machines, the more we can allocate our time to other tasks. To me, that's progress; that some of these tasks are not the most enlightening doesn't mean we did better before.
And I don't know what economists mean by "productivity", but we can certainly buy more stuff than before, which means that productivity must have increased somewhere (with some ups and downs). It may not appear in GDP calculations, but to me, it is the result that counts.
I don't count home ownership, because you don't produce land. In fact, land being so expensive is a sign of high global productivity. Since land is one of the few things we need and can't produce, the more we can produce of the other things we need, the higher the value of land becomes, proportionally.
If there is no in-house storage to match, how does it help the grid? It is still needed on cold winter nights, when demand is high and solar panels produce nothing. Hydro can provide the power, but the grid will be running at full load.
Most houses in Canada are heated with natural gas. I'm not disputing your overall comment, but in general, cold nights don't strain the grid because of heating needs.
(Still good news, as most of Canada's electricity generation is low-carbon hydro, and the remaining fossil generation can be pushed out with storage and renewables, although I don't have a link handy showing, by province, how much fossil generation needs to be pushed out.)
I live in New England. We do not have enough natural gas pipeline capacity to meet demand in long periods of very cold weather, and have very limited natural gas storage that can't buffer that for as long as a cold spell can last.
During these periods the grid traditionally keeps the lights on by switching a significant portion of generation over to burning oil, and/or with the occasional LNG tanker load into Everett, MA. These are both... pretty terrible and expensive solutions.
Burning less natural gas during the day still helps at night/at peak, because it means there's been less draw-down of our limited storage/more refill of it during the day, so we don't have to turn to worse options as heavily at night.