The MLnotes Newsletter

What's the Headache with Data For ML❓

Angelina Yang
May 10, 2022

What is data for ML?

In industry, this is also called "model-ready data". The key difference between data for ML and other data is this "readiness", which is typically achieved through preprocessing pipelines, feature transformation, and feature engineering. These pipelines can be part of the model itself (e.g., sklearn preprocessing) or run as separate ETL steps (e.g., Spark, DBT, Airflow).
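To make the "part of the model" option concrete, here is a minimal sketch using scikit-learn's `Pipeline` and `ColumnTransformer`. It assumes scikit-learn, pandas, and NumPy are installed. The column names, values, and labels are hypothetical, and the specific steps (median imputation, standard scaling, one-hot encoding) are illustrative choices, not the author's pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw data: missing numeric values and a string category,
# neither of which a linear model can consume directly.
raw = pd.DataFrame({
    "age": [25, 32, np.nan, 51, 46, 38],
    "income": [40_000, 52_000, 61_000, np.nan, 88_000, 57_000],
    "plan": ["basic", "pro", "basic", "enterprise", "pro", "basic"],
})
labels = np.array([0, 1, 0, 1, 1, 0])  # hypothetical targets

numeric_cols = ["age", "income"]
categorical_cols = ["plan"]

# Preprocessing that turns raw columns into "model-ready" features.
preprocess = ColumnTransformer([
    # Fill missing numbers with the training median, then standardize.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    # One-hot encode categories; unseen categories become all-zero columns.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Bundling preprocessing with the estimator means the same transformations,
# with statistics learned at fit time, are reapplied at inference.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])
model.fit(raw, labels)

# Inspect the model-ready matrix produced from the raw frame.
features = model.named_steps["preprocess"].transform(raw)
print(features.shape)  # (6, 5): 2 scaled numeric + 3 one-hot columns

# New raw rows go straight in; "premium" was never seen during fit.
new_rows = pd.DataFrame({"age": [29], "income": [np.nan], "plan": ["premium"]})
print(model.predict(new_rows))
```

Because the fitted statistics (medians, scaling parameters, category vocabulary) live inside `model`, serving code can pass raw rows directly. The ETL alternative would instead materialize transformed features in upstream jobs before training and serving, which keeps the model object simpler but means the same logic must be applied consistently wherever features are produced.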

In …

© 2025 MLnotes