@mistersql

This weekend I'm going to informally benchmark Gemini 3 by giving it code that 2.5 wrote and asking it to find bugs. Not the best benchmark because you could give 2.5 code that 2.5 wrote and it would find bugs too.

Self-replies

I've been wanting to try [Claude|Codex|Gemini|OpenCode] CLI, but from my experience with Codex, I just can't think of good tasks that
- won't bankrupt me
- would probably work
The agentic stuff needs a super-focused loop: do this simple thing over and over. When the loop is "make a complex decision over and over and pick one of hundreds of possible tools," the bot's performance is trash.

I do want to try Cursor|Antigravity, but only because I think it would be a better way to get a bot to write/update a UI.
But:
- I don't like VSCode
- I suspect it would bankrupt me