• Linktank@lemmy.today · 10 points · 6 days ago

    Okay, can somebody who knows about this stuff please explain what the hell a “token per second” means?

    • IndeterminateName@beehaw.org · 22 points · 6 days ago

      A token is a bit like a syllable when you’re talking about text-based responses. 20 tokens a second is faster than most people can read the output, so that’s sufficient for a real-time-feeling “chat”.
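      As a rough back-of-envelope (assuming ~0.75 English words per token and ~250 words per minute silent reading speed, both just rule-of-thumb numbers):

      ```python
      # Convert a generation speed in tokens/second into words/minute and
      # compare it with a typical silent reading speed.
      tokens_per_second = 20
      words_per_token = 0.75      # rule-of-thumb assumption for English text
      reading_speed_wpm = 250     # rule-of-thumb assumption for silent reading

      generation_wpm = tokens_per_second * words_per_token * 60
      print(f"generation: ~{generation_wpm:.0f} words/min, reading: ~{reading_speed_wpm} words/min")
      # -> generation: ~900 words/min, reading: ~250 words/min
      ```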

      • SteevyT@beehaw.org · 2 points · 5 days ago

        Huh, yeah, that actually is above my reading speed, assuming 1 token = 1 word. Although I find that anything above 100 words per minute, while slow to read, feels real-time to me, since that’s about the absolute top end of what most people type.

    • IrritableOcelot@beehaw.org · 10 points · 6 days ago

      Not somebody who knows a lot about this stuff, as I’m a bit of an AI Luddite, but I know just enough to answer this!

      “Tokens” are essentially just a unit of work: instead of operating directly on the user’s raw text, the model first “tokenizes” the input, breaking it down into small chunks (roughly words or word fragments) that the actual ML model can process more efficiently. The model then spits out a token, or a series of tokens, as its response, and those are expanded back into text (or whatever the output of the model is).
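      For a concrete feel of what that looks like, here’s a small sketch using OpenAI’s tiktoken library (just one example tokenizer; other models ship their own vocabularies and will split the same text differently):

      ```python
      # pip install tiktoken
      import tiktoken

      # Load one particular tokenizer vocabulary; different models use
      # different vocabularies, so token counts for the same text vary.
      enc = tiktoken.get_encoding("cl100k_base")

      text = "Okay, can somebody explain what a token per second means?"
      token_ids = enc.encode(text)       # text -> list of integer token IDs
      print(len(token_ids), token_ids)   # roughly one ID per short word or word fragment

      print(enc.decode(token_ids))       # token IDs -> back to the original text
      ```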

      I think tokens per second is the standard measure because most models use tokens, and use them in a similar way, so a token is the lowest-level common unit of work that lets you compare across devices and models.
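      And “tokens per second” itself is usually just the number of generated tokens divided by wall-clock time. A minimal sketch of that measurement (the generate_stream argument here is a hypothetical stand-in for whatever streaming API a given runtime exposes):

      ```python
      import time

      def tokens_per_second(generate_stream):
          """Time a generation run and report tokens/second.

          generate_stream: any iterable that yields one generated token at a
          time (a hypothetical placeholder for a real model's streaming API).
          """
          start = time.perf_counter()
          n_tokens = 0
          for _ in generate_stream:
              n_tokens += 1
          elapsed = time.perf_counter() - start
          return n_tokens / elapsed

      # Self-contained example with a fake "model" so the sketch runs as-is:
      fake_stream = (tok for tok in ["Hello", ",", " world", "!"] * 50)
      print(f"~{tokens_per_second(fake_stream):.0f} tokens/s")
      ```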