DeepSeek is not stolen tech; it was trained using novel innovations that Western companies were not pursuing.
I thought the innovative part was using more efficient code, not what it’s trained on.
https://arxiv.org/abs/2405.20304 they invented their own reinforcement learning framework called Group Relative Policy Optimization
EDIT: DeepSeek publicly released and published the model and methods to the global community, and there is now an open effort by researchers to reproduce them: https://github.com/huggingface/open-r1. It is like the opposite of stealing.
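For anyone wondering what that framework actually changes versus standard PPO: as described in the GRPO paper, you sample a group of answers per prompt and use the group's own reward statistics as the baseline, so you don't need to train a separate critic/value model. A rough sketch of just that advantage step (toy code of mine, not DeepSeek's actual implementation):

```python
# Sketch of GRPO's group-relative advantage: sample G outputs for one
# prompt, score each with a reward model, then normalize each reward
# against the group's mean and standard deviation. The group average
# plays the role of PPO's learned value baseline, so no critic network.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Return (r - group_mean) / group_std for each reward in the group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: 4 sampled answers to one prompt, scored by some reward model.
# Better-than-average answers get positive advantage, worse get negative,
# and the advantages sum to ~0 across the group.
rewards = [1.0, 0.0, 0.5, 0.5]
advs = group_relative_advantages(rewards)
```

Those advantages are then plugged into the usual clipped policy-gradient objective; the saving is that you never have to train or store a value model the size of the policy.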
Yeah the original comment in this chain more describes US Telcos and shit, not this particular instance.
That's capitalism's dark secret: it's only innovative when it has to be.
@deranger @theunknownmuncher the US trying to stifle Chinese progress and stop chip exports has had exactly the effect anyone could see coming: China is making leaps and bounds in all sorts of tech areas, innovating around obstacles.
That’s what they said basically.
Like, you can compile better or more diverse datasets to train a model on. But you can also have better code training on the same dataset.
The model is what the code poops out after it's eaten the dataset.
I haven't read the paper, so no idea if the better training had to do with some super unique spin on their dataset, but I'm assuming it's better code.
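That "same dataset, different code, different model" point is easy to demo. A toy sketch (everything here is made up for illustration, nothing to do with DeepSeek's actual training code): one fixed dataset, two training loops that differ only in optimizer logic, two different models out the other end.

```python
# Fixed dataset: points from y = 2x + 1. Both trainers see exactly this.
dataset = [(x, 2 * x + 1) for x in range(10)]

def train_sgd(data, epochs=100, lr=0.01):
    """Training code A: plain per-sample gradient descent on a line fit."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

def train_sgd_momentum(data, epochs=100, lr=0.005, beta=0.5):
    """Training code B: same data, same model shape, but momentum updates."""
    w, b, vw, vb = 0.0, 0.0, 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            vw = beta * vw + err * x
            vb = beta * vb + err
            w -= lr * vw
            b -= lr * vb
    return w, b

model_a = train_sgd(dataset)
model_b = train_sgd_momentum(dataset)
# Same dataset in, different (w, b) out: the "model" is whatever
# the training code converged to, not the dataset itself.
```

Obviously DeepSeek's changes are at a very different scale (the RL framework, not the optimizer of a line fit), but the dataset/code distinction works the same way.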