More

kianN · 2026-01-21T17:00:12 1769014812

I actually conducted a similar analysis back in December. I was more focused on discovering the topics that most resonated with the community but ended up digging into this phenomenon as well (specifically focusing on the probability of getting over 100 upvotes)

The really interesting thing is that the number of posts were growing exponentially by year, but it was only in 2025 that the probability of landing on the front page dropped meaningfully. I attributed this to macroeconomic climate, and found some (shaky) evidence of voting rings based on the topics that had a unusually high likelihood of gaining 10 points and an unusually low likelihood of reaching 100 points given that they reached 10.

Analysis here if anyone is interested: https://blog.sturdystatistics.com/posts/show_hn/

altairprime · 2026-01-21T20:40:55 1769028055

Please email the mods your shaky evidence; they care about that and have more detailed logs to investigate with!

kianN · 2026-01-21T21:30:28 1769031028

I did not conduct a deep dive into the specific examples: this was my takeaway from a slope plot comparing which topics clear a 10 point threshold (eg escape the new page) vs which topics clear a 100 point threshold.

> Nearly every AI related topic does worse once it clears the 10 point threshold than any other category. This means that either the people looking through the New and Show sections are disproportionately interested in AI. This is very possible, but from my interaction with this crowd from my posts, these users tend to be more technically minded (think DIY hardware, rather than landing-page builders).

Last visual in the following section: https://blog.sturdystatistics.com/posts/show_hn/#digging-int...

It's good to know that this would be helpful. My tendency would be to dig in a bit more into the individual examples that fall into this more suspicious bucket before presenting this evidence formally, but curious if you think these high level results are sufficiently helpful?

altairprime · 2026-01-21T21:39:05 1769031545

No, seriously, you should find out by emailing the mods. Footer contact link. They’re not going to be upset at you bringing tentative concerns about voting rings with shaky evidence, so long as you aren’t knocking down the door with overconfidence and denying how shaky the evidence is — which you clearly aren’t.

I’m not even remotely equipped to judge the veracity of your work, but they are, and that you care at all is, like, 0.000001%. Take the plunge and write them a note (or simply link them your comment thread here with a one sentence FYI email!). It’ll be fine :)

kianN · 2026-01-22T01:22:02 1769044922

I just sent an email. Thank you for the push!

riku_iki · 2026-01-22T17:20:59 1769102459

I don't believe it is possible to prove fraud base on just votes count. But data of which user upvote which post could generate very strong evidence.

altairprime · 2026-01-23T01:16:03 1769130963

The onus isn’t on us to prove; it’s to report concerns for assessment by the team responsible.

kianN · 2026-01-16T05:44:48 1768542288

When I was in high school, I got hit head on by a car while walking. It wasn’t going fast but I got thrown 1-2 feet in the air and landed hard on my backpack.

Both my Thinkpad and I (thanks to my Thinkpad) were totally fine, and I continued to use it for 4 more years.

hahahahhaah · 2026-01-16T05:50:15 1768542615

Physics says the thinkpad didn't save you if it was fine.

thesumofall · 2026-01-16T06:08:52 1768543732

That might be true for today’s laptops, but back then laptops had a lot more empty space to compress. Combined with a tough but flexible shell, the thinkpad might indeed have saved him!

riskassessment · 2026-01-16T06:12:38 1768543958

The thinkpad shell could have undergone elastic deformation which could reduce peak force.

kianN · 2026-01-04T16:30:54 1767544254

I don’t love these “X is Bayesian” analogies because they tend to ignore the most critical part of Bayesian modeling: sampling with detailed with detailed balance.

This article goes into the implicit prior/posterior updating during LLM inference; you can even go a step further and directly implement hierarchical relationships between layers with H-Nets. However, even under an explicit Bayesian framework, there’s a stark difference in robustness between these H-Nets and the equivalent Bayesian model with the only variable being the parameter estimation process. [1]

[1] https://blog.sturdystatistics.com/posts/hnet_part_II/

kianN · 2025-12-22T19:19:00 1766431140

I do the exact same thing and this was my first thought. To be fair, I would probably not be able to format tables in a single cope/paste

kianN · 2025-12-21T12:23:41 1766319821

An aside that I do want to mention here because it is a really unique way for many people to interface with LLMs: many commenters mention the model over indexing on a few comments they made that do not necessarily reflect of the broader themes of their writing. This is not any issue in the author’s engineering but an inherent issue in LLMs. The reason it is so noticeable in this case is because the subject matter is extremely familiar to the user: themselves.

LLMs consistently misrepresent information in this exact same way in, more critical applications. Because they are often employed on datasets that engineers and potentially end users are not deeply familiar with, the results often seem exceptional.

Disclaimer via my HN wrapped: “The Anti LLM Manifesto You will write a 5,000-word blog post on why a single Bayesian prior is more 'sentient' than GPT-6, and it will be ignored because the summary was generated by a 3B parameter model.”

kianN · 2025-12-08T04:22:32 1765167752

> How does our imagination shrink when we consider our options of what we create with code to be choosing between the outputs of the LLM rather than starting from the blank slate of our imagination?

This has been my biggest hesitancy with adopting these technologies. All of the things of which I’m most proud of building were built from a foundation of deep understanding of several domains, not from the solutions of a series of one offs problems, but from the process of solving them.

kianN · 2025-12-05T03:14:05 1764904445

This resonated a lot with me. Thank you for your articulate writing.

kianN · 2025-11-28T16:25:20 1764347120

This approach has really helped me out in my work. I do something very similar using DuckDB to slurp output files anytime I write a custom hierarchical model. The single sql queryable file simplified my storage and analytics pipeline. I imagine SQLite would be especially ideal where long term data preservation is critical.

spdegabrielle · 2025-11-28T17:23:51 1764350631

I think the developers had the same idea https://fossil-scm.org/

kianN · 2025-11-20T20:14:13 1763669653

I’m just continually amazed by the DuckDB team. We had built out a naive solution with OpenSSL to encrypt duckdb files, but that lead to a 2x runtime cost for first time queries and used up a lot of ram because we were encrypting/decrypting the entire file all at once. It seems like because DuckDB is encrypting at the page level and leveraging modern processors native AES operations, they are able to perform read/writes at practically no cost.

PunchyHamster · 2025-11-20T20:21:15 1763670075

Why not just LUKS ? Kernel level, leverages acceleration, transparent to anything you run on top of it.

DB encryption is useful if you have multiple things that need separate ACL and encryption keys but if it is one app one DB there is no need for it

beala · 2025-11-20T21:16:32 1763673392

From the article:

> This allows for some interesting new deployment models for DuckDB, for example, we could now put an encrypted DuckDB database file on a Content Delivery Network (CDN). A fleet of DuckDB instances could attach to this file read-only using the decryption key. This elegantly allows efficient distribution of private background data in a similar way like encrypted Parquet files, but of course with many more features like multi-table storage. When using DuckDB with encrypted storage, we can also simplify threat modeling when – for example – using DuckDB on cloud providers. While in the past access to DuckDB storage would have been enough to leak data, we can now relax paranoia regarding storage a little, especially since temporary files and WAL are also encrypted.

kianN · 2025-11-20T23:19:06 1763680746

We are in the separate ACL/encryption key bucket. We provide a Bayesian data analytics platform/api for other companies. Each company can have hundreds to thousands of datasets ("indices") each of which has a separate encryption key, and those keys are also stored encrypted with an organizational level key that is rotated daily.

letmetweakit · 2025-11-20T21:03:22 1763672602

I believe it's also to protect against the occasionally "lost" DB file.

notorious_pgb · 2025-11-20T20:49:54 1763671794

With respect, none of this sounds like "amazing" work on DuckDB's part. It's not bad work, either! It's competent work.

Comparing it to a naive approach (encrypting an entire database file in a single shot and loading it all into memory at once) is always going to make competent work seem "amazing".

I say this not to shit on DuckDB (I see no reason to shit on them); rather, I think it's important that we as professionals have realistic standards that we expect _ourselves_ to hit. Work we view as "amazing" is work we allow ourselves not to be able to replicate. But this is not in that category, and therefore, you should hold yourself to the same standard.

kianN · 2025-11-20T23:14:49 1763680489

I'm more amazed that they released this as part of their open-source offering (not clear from my above comment). Encryption is a standard lever for open-source projects to monetize.

I run a small company and needed to budget solid amount of chunk of time for next year to dig into improving this component of our system. I respect your perspective around holding high standards, but I do think it's worth getting excited about and celebrating reliable performant software that demonstrates consistent competence.

vjerancrnjak · 2025-11-21T08:56:49 1763715409

It’s just pipelining. Encryption is free compared to reads or writes to storage.

kianN · 2025-11-06T21:53:13 1762465993

https://archive.ph/7Y5wV