There was a really great article or blog post published in the last few months about the author's very personal experience. Its gist was: "People complain that I sound/write like an LLM, but it's actually the inverse, because I grew up in X, where people are taught formal English to sound educated/Western, and writing from those regions is now heavily used for LLM training."
I wish I could find it again, if someone else knows the link please post it!
> These detectors, as I understand them, often work by measuring two key things: ‘Perplexity’ and ‘burstiness’. Perplexity gauges how predictable a text is. If I start a sentence, "The cat sat on the...", your brain, and the AI, will predict the word "floor."
I can't be the only one whose brain predicted "mat"?
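To make the quoted notion of perplexity concrete, here is a toy sketch: a unigram model with Laplace smoothing, where lower perplexity means the text is more predictable to the model. This is an illustration only; real detectors use large neural language models, and `unigram_perplexity` and its corpus are invented for this example.

```python
import math

def unigram_perplexity(text, corpus):
    # Estimate word probabilities from a tiny reference corpus.
    words = corpus.split()
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    total = len(words)
    vocab = len(counts) + 1  # +1 slot for unseen words (Laplace smoothing)

    # Perplexity = exp of the average negative log-probability per token.
    log_prob = 0.0
    tokens = text.split()
    for w in tokens:
        p = (counts.get(w, 0) + 1) / (total + vocab)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))

corpus = "the cat sat on the mat the dog sat on the floor"
# A sentence the model has "seen" scores lower (more predictable)
# than one made of words absent from the corpus.
print(unigram_perplexity("the cat sat on the mat", corpus))
print(unigram_perplexity("quantum flux perturbed ontology", corpus))
```

The same intuition scales up: a detector asks a language model how surprised it is by each token, and flags text that is uniformly unsurprising.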
I've been critical of people who default to "an em dash means the content was generated by an LLM", or "they've numbered their points, must be an LLM".
I do know that LLMs generate content heavy with those constructs, but they didn't create them out of thin air; they were in the training set, and common enough that LLMs picked them up as commonplace/best practice.
I think the messaging around this is going to be pretty important in heading off gut-reaction "it's all-or-nothing, locked into their world" first takes. It's probably attractive marketing to aim for "look how easy it is to use our entire ecosystem", but there's a risk to that too.
An interesting thought experiment would be a language/toolchain that is permissive when generating debug builds, but hard-requires a warning-free build before it will generate an optimized executable.
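A minimal sketch of that thought experiment, with invented names (`build`, the warning list, and the artifact filenames are all hypothetical, not any real toolchain's API): debug builds merely report warnings, while an optimized build refuses to proceed if any warning remains.

```python
def build(warnings, optimized):
    """Toy build step: warnings are fatal only for optimized builds."""
    if optimized and warnings:
        # The hard requirement: no optimized artifact until warnings are fixed.
        raise RuntimeError(f"release build rejected: {len(warnings)} warning(s)")
    for w in warnings:
        print("warning:", w)  # debug builds just report and carry on
    return "optimized.bin" if optimized else "debug.bin"

# Debug build tolerates warnings; release build with the same code fails.
build(["unused variable x"], optimized=False)
build([], optimized=True)
```

Existing toolchains can approximate this today (e.g. adding `-Werror` only to release compiler flags), but baking it into the language itself is the more interesting version of the experiment.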