@mistersql

@AlSweigart 1) Most local bots are not even ChatGPT3.5 level
2) Inference speed is often unusably slow*
3) The hardware to run inference at a reasonable speed is ruinously expensive ($6000 is table stakes)

Self-replies

@AlSweigart * The smallest (dumbest) models with a small context and a small max output can be fast, but except for NPCs in games, these are going to have the most limited applications

I think some chip/card maker is going to have to make consumer-grade LLM cards. Graphics cards don't have enough memory. Or some other memory-efficiency breakthrough is needed.
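A rough back-of-envelope sketch of the memory problem (the model sizes and byte-per-parameter figures here are illustrative assumptions, not benchmarks): just holding the weights of a mid-size model exceeds the VRAM of typical consumer graphics cards.

```python
# Rough VRAM estimate for holding LLM weights in memory.
# Parameter counts and precisions below are illustrative assumptions.

def weight_memory_gib(params_billions: float, bytes_per_param: float) -> float:
    """Approximate GiB needed just for the weights
    (ignores KV cache, activations, and runtime overhead)."""
    return params_billions * 1e9 * bytes_per_param / 2**30

for params, precision, bpp in [(7, "fp16", 2), (70, "fp16", 2), (70, "4-bit", 0.5)]:
    print(f"{params}B @ {precision}: ~{weight_memory_gib(params, bpp):.0f} GiB")
```

Even aggressively quantized, a 70B-parameter model needs on the order of 33 GiB for weights alone, beyond the 24 GiB ceiling of high-end consumer GPUs.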