DeepSeek launched a free, open-source large language model in late December, claiming it was developed in just two months at a cost of under $6 million.
I get the tech, and still agree with the previous poster. I’d even go so far as to say that it’s probably making a lot of things worse right now, as it’s generating a lot of bullshit that sounds great on the surface but in reality is just regurgitated stuff that the AI has no clue about. For example, I’m tired of reading AI-generated text when a hand-written version would be much more precise and have at least some character…
Try getting a quick PowerShell script from Microsoft help or Spiceworks. Then do the same with GPT.
What should I expect? (I don’t do PowerShell, nor do I have a need for it.)
It’s one thing to be ignorant. It’s quite another to be confidently so in the face of overwhelming evidence that you’re wrong. Impressive.
That I’d really like to see. And I mean more than the marketing bullshit that AI companies are doing…
For the record, I was one of the first to jump on the AI hype train (as a programmer and computer scientist with a machine-learning background), following the development of GPT-1 through GPT-4 and being excited about writing less boilerplate code, getting help with rough ideas, etc. GPT-4 was almost at the point of being a help (similarly o1 etc., or Anthropic’s models). But I seldom use AI currently (and I’m observing the same with colleagues and other people I know), because it actually slows me down or gives me wrong ideas. I end up arguing with it, only to see it yet again get stuck in a local minimum (i.e. it doesn’t get better, no matter what input I try), so that I have to do it myself anyway… (which I should’ve done in the first place…).
The same is true for the image-generation side (first with GANs, now with diffusion-based models).
I can go into more detail about transformer/attention-based models and their current plateau phase (i.e. more hardware doesn’t actually make things significantly better; it gets exponentially more expensive to make things slightly better) if you really want…
I hope that we make a breakthrough, of course, and that a model actually learns to reason, but I fear that will take time, and it might even mean that we need a different type of hardware.
Any other AI company, and most of that would be legitimate criticism of the overhype used to generate more funding. But how does any of that apply to DeepSeek, and the code & paper they released?
Yeah, it’ll certainly be exciting to see where this goes, i.e. whether it really develops into a useful tool. Though I’m slightly cautious nonetheless. It’s not doing anything significantly different (i.e. it’s still an LLM); it’s just a lot cheaper and more efficient to train, and open to everyone (which is great).