The MLnotes Newsletter

RedPajama Reproducing LLaMA🦙 Dataset on 1.2 Trillion Tokens

Angelina Yang
May 08, 2023

The llama in the pajama picture is just too cute not to use.

Source: Together

What is RedPajama?

RedPajama is a project that has recreated the LLaMA training dataset of over 1.2 trillion tokens.

More importantly, they are making the dataset open.

On top of that, they aim to create a set of leading, fully open-source models.

The first step is to create the training s…
