Still trying to keep bot from needing to regen whole file to edit it.
Asking the bot to use `edlin` gives mixed results.
Asking the bot to create a unidiff patch and use git to apply it? Maybe?
Nope
- sometimes it nails it. Wow!
- sometimes it reads the file (or doesn't) and generates a patch for an entirely hallucinated file
- sometimes the unidiff format is wrong/corrupt. It requires careful line counting: the line numbers and line counts in each hunk header have to exactly fit the range they describe.
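To make the line-counting problem concrete, here is a rough sketch of a pre-check that could run before handing a bot-written patch to git. The function name `check_hunk_counts` is made up for illustration, and the sketch assumes a single-file patch. It only compares each `@@ -start,count +start,count @@` header against the lines that follow it; it does not check that the patch matches the real file.

```python
import re

# Matches a unified diff hunk header such as "@@ -10,4 +10,5 @@".
# A missing count means 1, per the unified diff format.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def check_hunk_counts(patch_text):
    """Return a list of hunks whose body disagrees with the header counts.

    Assumption: a single-file patch (file headers only before the first hunk).
    """
    problems = []
    hunk = None  # [header, old_expected, new_expected, old_seen, new_seen]

    def close(h):
        if h and (h[1], h[2]) != (h[3], h[4]):
            problems.append(
                f"{h[0]}: header says -{h[1]}/+{h[2]}, body has -{h[3]}/+{h[4]}"
            )

    for line in patch_text.splitlines():
        m = HUNK_HEADER.match(line)
        if m:
            close(hunk)
            old_n = int(m.group(2)) if m.group(2) is not None else 1
            new_n = int(m.group(4)) if m.group(4) is not None else 1
            hunk = [line, old_n, new_n, 0, 0]
        elif hunk is None or line.startswith("\\"):
            continue  # file headers, or "\ No newline at end of file"
        elif line.startswith("-"):
            hunk[3] += 1  # removed line: counts toward the old side only
        elif line.startswith("+"):
            hunk[4] += 1  # added line: counts toward the new side only
        else:
            hunk[3] += 1  # context line: counts toward both sides
            hunk[4] += 1
    close(hunk)
    return problems
```

An empty list only means the counts are internally consistent. A patch for a hallucinated file can still pass this check.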
Next, I'm going to see if it can do `sed` or `regex` replacements as an editing strategy. If and when it works, replace should be simpler than "use a text editor" or "create a perfect unidiff file".
Still trying to teach it to edit code. The bot is doing better with plain replace and plain insert at X. I think it thinks it is better at regex than it really is (or maybe hoping the escapes survive the trip through JSON and Python strings to the `re` library is too much to hope).
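For illustration, a minimal sketch of what "plain replace" and "plain insert at X" tools could look like. The names `replace_text` and `insert_at_line` and the exactly-one-match rule are assumptions, not the actual tools from this thread. The last few lines show the escaping the bot has to get right on the regex route.

```python
import json
import re

def replace_text(text, old, new):
    """Replace exactly one literal occurrence of `old`. No regex, no escaping."""
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly 1 match for {old!r}, found {count}")
    return text.replace(old, new, 1)

def insert_at_line(text, line_number, new_text):
    """Insert `new_text` as a new line before 1-based `line_number`."""
    lines = text.splitlines(keepends=True)
    if not 1 <= line_number <= len(lines) + 1:
        raise ValueError(f"line {line_number} is outside 1..{len(lines) + 1}")
    idx = line_number - 1
    # If the preceding line has no trailing newline, add one so lines don't merge.
    if idx > 0 and not lines[idx - 1].endswith("\n"):
        lines[idx - 1] += "\n"
    if not new_text.endswith("\n"):
        new_text += "\n"
    lines.insert(idx, new_text)
    return "".join(lines)

# The regex route: to deliver the pattern \d+\.py in a JSON tool call,
# the bot has to double every backslash in the JSON text.
payload = r'{"pattern": "\\d+\\.py"}'
pattern = json.loads(payload)["pattern"]     # JSON decoding halves the backslashes
print(pattern)                               # \d+\.py
print(bool(re.fullmatch(pattern, "42.py")))  # True
```

With literal replace there is only one layer to get right: the text itself. A single-backslash `"\d"` in the JSON payload would not even parse, because `\d` is not a valid JSON escape.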
GPT-4 with tools is smart.
GPT-3.5 with tools is dumber? It is like the cognitive complexity of using tools uses up its space for thinking.
But oh my, GPT-4 is expensive.
Latest bot-with-tools weirdness: I ask the bot to edit a document. It uses a replace text tool. The tool can throw an exception (Fail!) and initially returned nothing otherwise. So I decided the bot needed more feedback and began returning "Success".
It decided that "Success" meant "The edit was exactly correct", but the bot had actually made a mess of the edit. It just wasn't so bad that it threw an exception.
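One possible alternative to a bare "Success", sketched under assumptions: return evidence of what changed so the bot has something to check. The name `replace_and_report` and the report wording are hypothetical. `difflib.unified_diff` is from the Python standard library.

```python
import difflib

def replace_and_report(text, old, new):
    """Replace the first occurrence of `old` and return (updated_text, report).

    The report shows the actual change instead of claiming success.
    """
    if old not in text:
        raise ValueError(f"text to replace not found: {old!r}")
    updated = text.replace(old, new, 1)
    diff = difflib.unified_diff(
        text.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile="before",
        tofile="after",
        n=2,  # lines of context around each change
    )
    report = (
        "Replaced 1 occurrence. This only means the tool ran; "
        "check that the diff below is the edit you intended:\n" + "".join(diff)
    )
    return updated, report
```

Whether the bot actually reads the diff is another question, but at least the tool is no longer vouching for the edit.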
The bot is overconfident and reluctant to check its work with anything similar to a unit test, so it has achieved "mid career software developer" level of sentience.