• adoxographer@lemmy.world
    9 days ago

    While this is great, training is where the compute is spent. The news is also that R1 could be trained, still on an Nvidia cluster, but for about $6M instead of $500M.

    • orange@communick.news
      9 days ago

      That’s becoming less true. The cost of inference has been rising with bigger models, and even more so with “reasoning models”.

      Regardless, at the scale of 100M users, big one-off costs start looking small.

    • vrighter@discuss.tchncs.de
      9 days ago

      If, on a modern gaming PC, you get “breakneck” speeds of 5 tokens per second, then inference is actually quite energy-intensive too. Five per second of anything is very slow.
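      To put that 5 tokens/s figure in perspective, here is a back-of-envelope sketch. The response lengths and the faster comparison rate are illustrative assumptions, not numbers from the thread:

      ```python
      # Rough wall-clock time to decode a reply at a steady token rate.
      # 5 tok/s is the figure from the comment above; 500 tokens is an
      # assumed typical answer length, 50 tok/s an assumed server-class rate.

      def generation_time_s(num_tokens: int, tokens_per_s: float) -> float:
          """Seconds to decode num_tokens at a constant decode rate."""
          return num_tokens / tokens_per_s

      print(generation_time_s(500, 5))   # 100.0 s on the gaming PC
      print(generation_time_s(500, 50))  # 10.0 s at the assumed faster rate
      ```

      So a modest ~500-token answer ties up the GPU for well over a minute at 5 tok/s, which is why per-response energy use adds up.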