The two numbers that decide local LLMs: 100 tokens/sec and 1M context
I ran a 754B-parameter model on a 512GB Mac Studio. It fits, it crawls, and its 1M-context flag turns out to be a warning message and a silent cap. Here is where local models fail as coding tools, and the one job they are great at.