RedPajama: Reproducing the LLaMA🦙 Dataset of 1.2 Trillion Tokens
The llama in the pajama picture is just too cute not to use.

What is RedPajama?
RedPajama is a project that has recreated the LLaMA training dataset of over 1.2 trillion tokens.
More importantly, the project is making the dataset open.
Beyond the data, it aims to create a set of leading, fully open-source models.
The first step is to create the training s…

