Data Science Interview Challenge
Welcome to today's data science interview challenge! Here we go:
Question 1: Are vanishing/exploding gradients just an RNN problem?
Question 2: What is a skip connection?

Here are some tips for readers' reference:
Question 1:
No. Vanishing and exploding gradients can affect all neural architectures, including feed-forward and convolutional networks, and especially deep ones. RNNs are particularly susceptible because of their recurrent nature, but these problems are not exclusive to RNNs.
Key points from Stanford lecturer Abby See:
- Due to the chain rule and the choice of nonlinearity, the gradient can become vanishingly small as it back-propagates through the network.
- As a result, the lower layers learn very slowly (they are hard to train).
- Solutions: modern deep architectures add more direct connections that let the gradient flow (e.g., ResNet). Other remedies include careful weight initialization, activation functions like ReLU to mitigate vanishing gradients, and gradient clipping to control exploding gradients (see the short code sketch below).
Let’s hear how Abby See explains this:
To hear the explanation, jump to timestamp 52:48 in the video, and stick around a little longer to hear the full story!
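To make those remedies concrete, here is a minimal PyTorch sketch of a single training step that combines ReLU activations with gradient clipping. The layer sizes, dummy data, and max_norm value are made up for illustration, not taken from the lecture.

```python
import torch
import torch.nn as nn

# A small feed-forward model; ReLU avoids the saturation that makes
# sigmoid/tanh gradients shrink layer after layer.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 128)          # dummy batch (hypothetical data)
y = torch.randint(0, 10, (32,))   # dummy labels

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()

# Gradient clipping: rescale the global gradient norm to at most 1.0,
# so a single large gradient cannot produce an exploding update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

Clipping rescales the gradients just before the update, which controls the exploding side, while the ReLU choice (together with careful initialization) addresses the vanishing side.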
Question 2:
A skip connection, also known as a residual connection, is an architectural component that makes deep neural networks easier to train and improves their performance. It was introduced as a fundamental part of ResNet (Residual Network), which won the ImageNet competition in 2015 and significantly advanced the capabilities of deep learning.
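For readers who want to see the idea in code, here is a minimal sketch of a residual block, assuming PyTorch and omitting the batch normalization used in the original ResNet for brevity:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A simplified residual block: the input x skips over the conv layers
    and is added back to their output."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # the skip connection: add the input to the block's output

# Quick check with a dummy feature map
block = ResidualBlock(channels=16)
print(block(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])
```

Because the output is out + x, the gradient with respect to x includes an identity term, which is exactly the kind of "more direct connection" from Question 1 that keeps gradients flowing through very deep stacks.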