Yes, humans have found that you don't need officially stamped statistics (and in many cases they're unreliable or "doctored" anyway), and that they can make general observations on their own, through something they call experience.
And a near-universal experience with doctors, for anybody paying attention, is exactly that.
One can reject it, or accept it and improve on it after checking its predictive power; or pause their thinking and wait for some authority to hand down the official numbers.
>For example, you'd not want to hear the same track twice in a row, even though this is bound to happen in a strictly random shuffling.
Why would it be? A random shuffling of a unique set remains a unique set.
It's only when the next song is picked at random from the set each time that you're bound to eventually hear the same song twice in a row, but that's not random playlist shuffling (shuffling implies the new ordering is created all at once).
Or when the set repeats, and the second pass's random order puts songs from the end of the first pass near the start of the second, so you quickly hear them twice.
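To make the distinction concrete, here's a minimal Python sketch (the playlist contents are made up for the example) contrasting a true shuffle with pick-at-random-each-time:

    import random

    playlist = ["track_a", "track_b", "track_c", "track_d"]

    # True shuffle: build one random permutation of the whole set up front.
    # Within a single pass, no track can appear twice.
    shuffled = playlist[:]
    random.shuffle(shuffled)  # Fisher-Yates permutation, in place
    print(shuffled)

    # "Pick the next song at random each time" (sampling with replacement):
    # a back-to-back repeat is possible at every step, with probability
    # 1/len(playlist).
    picks = [random.choice(playlist) for _ in range(8)]
    print(picks)

The boundary case is the only way a true shuffle repeats quickly: reshuffle for a second pass, and a track from the end of pass one can land near the start of pass two.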
They decided they don't need to handle it, the justifications being: some debt is fine; it won't be anything huge; newer AI will eventually handle the debt too; and if quality suffers, who cares, companies didn't care much about putting out shitty software before either.
If crappy outsourced code was acceptable by modern standards for companies to churn out, somewhat crappy AI code will be too.
Partly because it's a good construct. Most people's writing is garbage compared to what LLMs output by default.
But the other part of it is: each conversation you have, and each piece of AI output you read online, is written by an LLM instance that has no memory of prior conversations, so it doesn't know that, from a human perspective, it has used this construct 20 times in the last hour. Human writers avoid repeating the same phrases in quick succession, even across different pieces of writing (e.g. I might not reuse a phrase in an email to person A because I just used it in an email to unrelated person B, and it feels like bad style).
Perhaps that's why reading LLM output feels like reading high school essays. Those essays all look alike because they're all written independently, and each is a self-contained piece where the author tries to show off their mastery of language. After reading 20 of them in a row, you too get tired of seeing the same few constructs used in nearly every one of them.
>But the other part of it is: each conversation you have, and each piece of AI output you read online, is written by an LLM instance that has no memory of prior conversations, so it doesn't know that, from a human perspective, it has used this construct 20 times in the last hour.
In theory we should be able to use these properties to detect LLM-generated output more reliably, if we can work out how they originate in the “default” trained feature space.
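A toy sketch of that idea in Python (the phrase list and the two sample documents are invented for illustration; a real detector would derive its features from the model's output distribution rather than hard-coding them):

    import re

    # Hypothetical list of constructs an LLM tends to overuse.
    STOCK_PHRASES = ["it's not just", "delve into", "in today's fast-paced world"]

    def stock_phrase_rate(text: str) -> float:
        """Occurrences of stock phrases per 100 words."""
        words = len(re.findall(r"\w+", text)) or 1
        hits = sum(text.lower().count(p) for p in STOCK_PHRASES)
        return 100.0 * hits / words

    docs = [
        "Let's delve into this. It's not just a tool, it's a paradigm.",
        "The shuffle bug was in the playlist code, nothing fancy.",
    ]
    for d in docs:
        print(round(stock_phrase_rate(d), 2), "per 100 words:", d)

Flagging any single document this way is noisy; the signal the parent describes only emerges across many independent outputs.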
Very much so. It feels like it can't have been that common in the original training corpus. Probably more common now given that we are training slop generators with slop.
No previous generation had to compete with massive, instant access to everything ever written (which makes plagiarism trivial), or with AI-generated slop...
And everything wasn't "content", nor were there massive numbers of influencers and public content creators, nor was there a push even for laymen to churn out heaps of text every day or to project an image to the whole world.
And until recently if you got caught plagiarizing you were shamed or even fired from journalism. Now it's just business as usual...
Follow that rule next time you read such a statement in a context that's not formal math.