@AlSweigart 1) Most local bots are not even at ChatGPT-3.5 level
2) Inference speed is often unusably slow*
3) The hardware to run inference at a reasonable speed is ruinously expensive ($6000 is table stakes)
Self-replies
@AlSweigart * The smallest (dumbest) models with a small context and a small max output can be fast, but aside from things like NPCs in games, those are going to have the most limited applications
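Rough back-of-envelope on why small models are the only fast ones (my numbers are assumptions, not benchmarks): single-token decoding is mostly memory-bandwidth bound, so tokens/sec is roughly bandwidth divided by model size in bytes.

```python
# Back-of-envelope estimate (assumption, not a benchmark): decoding one token
# requires reading all the weights, so speed ~ memory bandwidth / model bytes.

def est_tokens_per_sec(params_billion: float, bytes_per_param: float,
                       bandwidth_gb_s: float) -> float:
    """Estimate decode speed as memory bandwidth divided by model size."""
    model_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_gb

# Hypothetical numbers: a 7B model quantized to 4-bit (~0.5 bytes/param) vs. a
# 70B model at 16-bit, both on a card with ~1000 GB/s of memory bandwidth.
print(est_tokens_per_sec(7, 0.5, 1000))    # ~285 tokens/sec (small, quantized)
print(est_tokens_per_sec(70, 2.0, 1000))   # ~7 tokens/sec (large, unquantized)
```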
I think some chip/card maker is going to have to make consumer-grade LLM cards. Graphics cards don't have enough memory. Or some other memory-efficiency breakthrough is needed.
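The napkin math on memory (my assumptions: weights only, ignoring KV cache and activation overhead, which only make it worse):

```python
# Minimal sketch of why consumer graphics cards run out of memory: the weights
# alone need params * bytes-per-param, before any KV cache or activations.

def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the model weights, in GB."""
    return params_billion * bytes_per_param

for params in (7, 13, 70):
    for label, bpp in (("fp16", 2.0), ("int4", 0.5)):
        print(f"{params}B @ {label}: ~{weights_gb(params, bpp):.0f} GB")

# 70B @ fp16 is ~140 GB -- far past the 24 GB on top consumer GPUs, and even
# 4-bit quantization (~35 GB) doesn't fit on a single card.
```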