Data Science Interview Challenge
Welcome to today’s data science interview challenge! This one is inspired by a Huggingface Transformer lecture (2022 version) at Stanford. Relax!
A warm-up question 🤓:
See if you can tell me (without writing it down) what the code looks like that creates a torch.tensor with the following contents:
Now tell me what the code looks like to compute the average of each row (.mean()) and of each column. What's the shape of the results?
I usually don’t do live coding questions, but this one is straightforward and you should be able to talk while you think. Have fun!
Now back to the basics:
Question 1: What does the tokenizer do for a language model?
Question 2: BERT is a groundbreaking model in the development of large language models. What does it attend to, and how would you explain the model and its attention mechanism?

Here are some tips for readers' reference:
Warm-up Question:
Is the following what you are envisioning?
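A minimal sketch of what the interviewer is after, in PyTorch. The original post shows a specific tensor, so the 2×3 contents below are just an assumed example; the row/column means work the same way for any shape:

```python
import torch

# Assumed example contents; substitute the tensor shown in the original post.
t = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

row_means = t.mean(dim=1)  # average over columns: one value per row -> shape (2,)
col_means = t.mean(dim=0)  # average over rows: one value per column -> shape (3,)
print(row_means.shape, col_means.shape)  # torch.Size([2]) torch.Size([3])
```

The takeaway: reducing along `dim=k` removes that dimension from the shape, which is exactly what the "what's the shape of the results?" part of the question is probing.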
Question 1:
Pretrained models ship with tokenizers that preprocess their inputs. A tokenizer takes a raw string or a list of strings and outputs what is effectively a dictionary containing the model inputs.
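To make that concrete, here is a quick sketch using the Hugging Face transformers library; the bert-base-uncased checkpoint is just an assumed example, any pretrained checkpoint works the same way:

```python
from transformers import AutoTokenizer

# Assumed checkpoint for illustration; loads the tokenizer paired with the model.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

batch = tokenizer(["Hello world!", "Tokenizers preprocess raw text."],
                  padding=True, return_tensors="pt")
# A dict-like BatchEncoding: input_ids, token_type_ids, attention_mask
print(batch.keys())
print(batch["input_ids"].shape)  # (batch_size, sequence_length)
```

Note that the output really is "effectively a dictionary": the keys map directly to the keyword arguments the model's forward pass expects.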
Check the lecturer’s explanation in the video below! (To jump to the answer, skip to roughly the 3-minute mark of the lecture.)
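For Question 2, one way to make the attention mechanism concrete in an interview is to sketch scaled dot-product self-attention. This is an illustration, not the lecturer's code: each token scores every other token, the scores are softmax-normalized into weights ("where to look"), and the output is a weighted sum of value vectors:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (seq_len, seq_len) similarities
    weights = F.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ v                             # weighted sum of value vectors

# Toy example: 4 tokens with 8-dimensional embeddings.
x = torch.randn(4, 8)
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)  # torch.Size([4, 8])
```

Because the weights are computed over the whole sequence at once, every token can attend to context on both its left and its right, which is the "bidirectional" part of BERT's name.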