Nice of them to make even a 0.3B model; too bad it's the only one that isn't MoE. I've been wanting more small MoEs ever since Qwen 30B A3B.
On a random note, I'd really love to see this approach explored more. It would be really handy to have models that can learn and evolve over time through usage: https://github.com/babycommando/neuralgraffiti
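The gist, as I understand it, is a small add-on layer holding a memory vector that drifts with every input and gets blended into a frozen model's hidden states, so behavior shifts through usage without any retraining. Here's a rough sketch of that general idea; all the names (`SprayLayer`, `decay`, `blend`) are my own illustrative assumptions, not the repo's actual code or API:

```python
# Hypothetical sketch of a "spray layer" style evolving memory, loosely
# inspired by the Neural Graffiti idea: a state vector that drifts with
# usage and is mixed into a frozen LLM's hidden states at inference time.
import torch
import torch.nn as nn

class SprayLayer(nn.Module):
    def __init__(self, hidden_dim: int, decay: float = 0.95, blend: float = 0.1):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim, bias=False)
        # Persistent memory vector, updated across calls (not via backprop).
        self.register_buffer("memory", torch.zeros(hidden_dim))
        self.decay = decay
        self.blend = blend

    @torch.no_grad()
    def update_memory(self, hidden: torch.Tensor) -> None:
        # Liquid-style drift: old state decays, the new input nudges it.
        summary = hidden.mean(dim=(0, 1))  # average over batch and tokens
        nudge = torch.tanh(self.proj(summary))
        self.memory.mul_(self.decay).add_((1 - self.decay) * nudge)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Blend the evolving memory into the hidden states, then let it drift.
        out = hidden + self.blend * self.memory
        self.update_memory(hidden)
        return out

# Usage: wrap the final hidden states of a frozen model before the LM head.
spray = SprayLayer(hidden_dim=768)
h = torch.randn(1, 12, 768)   # (batch, seq, hidden) from the frozen model
out = spray(h)                # memory now carries a trace of this input
```

Nothing in the base model changes; only the tiny memory state evolves, which is why this kind of thing seems cheap enough to run per-user.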