Bot-driven unit testing
Yield (% of usable tests):
- Titan: 5%
- GPT-3.5: 30%
- GPT-4: 60%
"Hey this needs redesign" signal
- titan - bot struggles with all APIs, almost like watching a bad coding interview
- gpt3.5 - bot uses API, but gets confused if the API is bad.
- gpt4 - bot reads your mind and codes around a bad API
So if you want this signal, point the dumber bot at your code.
"How did the human's workflow improve"
- titan - bot exercises the test generation script's failure handling. Bot's tests are deleted as not worth salvaging.
- gpt3.5 - All the easy to write tests are done. Tests didn't find bugs. Woot! Time saver, prevents future bugs.
- gpt4 - Bot wrote unit tests that demo bugs. Woot! Code is actually better now. You are also now $10 poorer.
"Matt, why are you using Titan? You clearly hate it." Because it is cheap.
Other surprising elements
- Unit testing used to mean one test file per file of code under test. Now I have many, many batches of unit tests, each batch with one test file per code-under-test (CUT) file.
- When the code evolves, it is easier to regenerate a whole new suite of tests than to ask the bot to edit an existing test (or to have a human edit one).
- If it looks like an integration test (i.e., it tests the entrypoint of the whole app), the bots write bad tests that pass only with massive mocking or a miracle.