Could you try with some open-weight models, e.g. Qwen3-coder, GLM-4.7 or Devst...

kevinday · 2026-01-21T21:07:35 1769029655

I tried GLM-4.7 running locally on a beefy GPU server. In about 3 minutes it got to 25846 cycles, but then it went in circles for about 90 minutes without making any meaningful progress, repeating the same mistakes and misdiagnosing the cause most of the time. It seemed to understand what needs to happen to reach the goal, but kept failing on the implementation side. It also recognized that beating the target would require an entirely new approach (it kept leaning towards a wavefront design), but it couldn't find a workable solution given the very limited ISA.