Practical aspects of deep learning
Practical aspects of deep learning >> Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization
1. If you have 10,000,000 examples, how would you split the train/dev/test set?
- 33% train . 33% dev . 33% test
- 98% train . 1% dev . 1% test
- 60% train . 20% dev . 20% test
2. The dev and test set should:
- Come from the same distribution
- Come from different distributions
- Be identical to each other (same (x,y) pairs)
- Have the same number of examples
3. If your Neural Network model seems to have high variance, what of the following would be promising things to try?
- Get more training data
- Make the Neural Network deeper
- Get more test data
- Add regularization
- Increase the number of units in each hidden layer
3. If your Neural Network model seems to have high variance, what of the following would be promising things to try?
- Make the Neural Network deeper
- Get more training data
- Add regularization
- Get more test data
- Increase the number of units in each hidden layer
4. You are working on an automated check-out kiosk for a supermarket, and are building a classifier for apples, bananas and oranges. Suppose your classifier obtains a training set error of 0.5%, and a dev set error of 7%. Which of the following are promising things to try to improve your classifier? (Check all that apply.)
- Increase the regularization parameter lambda
- Decrease the regularization parameter lambda
- Get more training data
- Use a bigger neural network
5. What is weight decay?
- Gradual corruption of the weights in the neural network if it is trained on noisy data.
- A regularization technique (such as L2 regularization) that results in gradient descent shrinking the weights on every iteration.
- A technique to avoid vanishing gradient by imposing a ceiling on the values of the weights.
- The process of gradually decreasing the learning rate during training.
6. What happens when you increase the regularization hyperparameter lambda?
- Weights are pushed toward becoming smaller (closer to 0)
- Weights are pushed toward becoming bigger (further from 0)
- Doubling lambda should roughly result in doubling the weights
- Gradient descent taking bigger steps with each iteration (proportional to lambda)
7. With the inverted dropout technique, at test time:
- You do not apply dropout (do not randomly eliminate units) and do not keep the 1/keep_prob factor in the calculations used in training
- You do not apply dropout (do not randomly eliminate units), but keep the 1/keep_prob factor in the calculations used in training.
- You apply dropout (randomly eliminating units) and do not keep the 1/keep_prob factor in the calculations used in training
- You apply dropout (randomly eliminating units) but keep the 1/keep_prob factor in the calculations used in training.
8.Increasing the parameter keep_prob from (say) 0.5 to 0.6 will likely cause the following: (Check the two that apply)
- Increasing the regularization effect
- Reducing the regularization effect
- Causing the neural network to end up with a higher training set error
- Causing the neural network to end up with a lower training set error
9. Which of these techniques are useful for reducing variance (reducing overfitting)? (Check all that apply.)
- Xavier initialization
- Vanishing gradient
- Gradient Checking
- Exploding gradient
- L2 regularization
- Dropout
- Data augmentation
10. Why do we normalize the inputs xx?
- It makes the parameter initialization faster
- Normalization is another word for regularization–It helps to reduce variance
- It makes it easier to visualize the data
- It makes the cost function faster to optimize