DeepSeek's free 685B-parameter AI model runs at 20 tokens per second on Apple's Mac Studio while drawing just 200 watts, outperforming Claude Sonnet and challenging OpenAI's cloud-dependent business model.
I think the key part is that you can run these large-scale models cheaply in terms of energy cost. The price of the hardware will inevitably come down, but now we know there is no fundamental blocker to running models efficiently.
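
To make "cheaply in terms of energy cost" concrete, here's a quick back-of-envelope sketch (Python) using the 200 W and 20 tokens/s figures from the article; the $0.15/kWh electricity price is my own assumption and varies a lot by region:

    # Back-of-envelope energy cost per token from the article's figures.
    power_watts = 200.0        # reported draw while generating
    tokens_per_second = 20.0   # reported generation speed
    price_per_kwh = 0.15       # assumed electricity price (USD); varies by region

    joules_per_token = power_watts / tokens_per_second    # 10 J per token
    kwh_per_million = joules_per_token * 1e6 / 3.6e6      # ~2.78 kWh per 1M tokens
    cost_per_million = kwh_per_million * price_per_kwh    # ~$0.42 per 1M tokens

    print(f"{joules_per_token:.1f} J/token")
    print(f"{kwh_per_million:.2f} kWh per 1M tokens")
    print(f"${cost_per_million:.2f} per 1M tokens")

That works out to well under a dollar of electricity per million tokens, which is the point: the marginal energy cost of inference on hardware like this is nearly negligible.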
I generally agree, but given how niche a powerful SoC like this is, I doubt it matters in the near term (<5 years). I understand it proves a point, but I'd wager there's still a long way to go before power-efficient hardware like this gets cheaper (and that will most likely come from within China).
Yeah, a five-year or so timeline before SoC designs become dominant is a good guess. There are other interesting ideas, like analog chips, that have the potential to drastically cut power usage for neural networks as well. The next few years will be interesting to watch.