Questions and answers

DeepLearning_GPT3_questions

Collected questions and answers for this set.
Number of questions: 85
Question 41
Which of the following is a step-size adjustment technique used in the stochastic gradient descent algorithm for optimization in deep learning?
Learning rate decay

Learning rate decay. Learning rate decay gradually reduces the step size over the course of training, letting SGD take large steps early on and smaller, more careful steps as it approaches a minimum.
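
A minimal sketch of learning rate decay in PyTorch, using the torch.optim.lr_scheduler.StepLR scheduler; the one-layer model and random loss below are illustrative placeholders:

import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# StepLR multiplies the learning rate by gamma every step_size epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    optimizer.zero_grad()
    loss = model(torch.randn(32, 10)).pow(2).mean()  # dummy loss
    loss.backward()
    optimizer.step()
    scheduler.step()  # decay the step size on schedule

print(optimizer.param_groups[0]["lr"])  # 0.0125 after three decays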

Question 42
Which of the following is a disadvantage of using the gradient descent algorithm for optimization in deep learning?
It may converge to a local minimum

b) It may converge to a local minimum. Gradient descent is sensitive to the starting point: on a non-convex objective it can settle into a local minimum instead of the global minimum, which is a disadvantage.
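
A toy demonstration of this sensitivity, using the hypothetical non-convex function f(x) = x**4 - 3*x**2 + x, which has a local minimum near x = 1.13 and a global minimum near x = -1.30:

def grad(x):
    # f'(x) = 4x^3 - 6x + 1
    return 4 * x**3 - 6 * x + 1

for start in (2.0, -2.0):
    x = start
    for _ in range(500):
        x -= 0.01 * grad(x)          # plain gradient descent step
    print(start, "->", round(x, 3))  # 2.0 settles in the local minimum,
                                     # -2.0 reaches the global one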

Question 43
What is the gradient descent algorithm used for optimization in deep learning?
An algorithm used to find the minimum value of the objective function

b) An algorithm used to find the minimum value of the objective function. Gradient descent starts from an initial guess and repeatedly moves the parameters in the direction opposite to the gradient of the objective function, with the step size set by the learning rate, until it reaches a minimum.
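
A minimal sketch of this update loop; the quadratic objective f(x) = (x - 3)**2 is made up for the example:

def grad(x):
    return 2.0 * (x - 3.0)   # derivative of f(x) = (x - 3)**2

x = 0.0                      # starting point
lr = 0.1                     # learning rate (step size)
for _ in range(100):
    x -= lr * grad(x)        # step opposite to the gradient
print(round(x, 4))           # close to the minimum at x = 3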

Question 44
What is the purpose of the DataLoader class in PyTorch?
To load the data in mini-batches during training

d) To load the data in mini-batches during training.

Explanation: The DataLoader class in PyTorch wraps a dataset and serves it in mini-batches during training. It accepts parameters such as the batch size, whether to shuffle each epoch, and the number of worker processes, so the data can be loaded efficiently.
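
A minimal sketch of this usage, with random tensors standing in for a real dataset:

import torch
from torch.utils.data import DataLoader, TensorDataset

features = torch.randn(100, 8)             # 100 illustrative samples
labels = torch.randint(0, 2, (100,))
dataset = TensorDataset(features, labels)

# Shuffle each epoch and serve mini-batches of 16 samples.
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=0)
for x_batch, y_batch in loader:
    print(x_batch.shape)                   # torch.Size([16, 8]) for full batches
    break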

Question 45
Which of the following is true about the backward() method in PyTorch?

a) It computes the gradients of the loss function with respect to the model parameters.

Explanation: The backward() method in PyTorch is used to compute the gradients of the loss function with respect to the model parameters using automatic differentiation. These gradients can then be used to update the model parameters using an optimizer such as torch.optim.Adam.
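
A minimal sketch of one training step built around backward(); the tiny linear model and random data are illustrative:

import torch
import torch.nn.functional as F

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 4)
target = torch.randn(8, 1)

optimizer.zero_grad()                  # clear gradients from the previous step
loss = F.mse_loss(model(x), target)
loss.backward()                        # fill .grad for every model parameter
optimizer.step()                       # let Adam update the parameters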

Question 46
What is the purpose of torch.nn.functional in PyTorch?
To provide a set of pre-defined functions for neural network operations

d) To provide a set of pre-defined functions for neural network operations.

Explanation: The torch.nn.functional module provides a set of pre-defined functions for neural network operations such as activation functions, loss functions, and pooling functions, among others. These functions can be used in conjunction with the torch.nn module to define and construct neural network models.
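
A minimal sketch of the functional style; the tensors below are illustrative:

import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 8, 8)
h = F.relu(x)                            # activation function
h = F.max_pool2d(h, kernel_size=2)       # pooling function

logits = torch.randn(4, 10)
targets = torch.tensor([1, 0, 3, 9])
loss = F.cross_entropy(logits, targets)  # loss function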

Question 47
Which of the following is used for creating custom datasets in PyTorch?
torch.utils.data
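
torch.utils.data. The torch.utils.data module provides the Dataset base class: a custom dataset subclasses it and implements __len__ and __getitem__, after which it can be served in mini-batches by a DataLoader. A minimal sketch, with purely illustrative tensors:

import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.labels)          # number of samples

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]

dataset = MyDataset(torch.randn(50, 4), torch.randint(0, 2, (50,)))
print(len(dataset))                      # 50
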
Question 48
What is the PyTorch module used for optimizing neural network parameters?
torch.optim

b) torch.optim. The torch.optim module in PyTorch provides various optimization algorithms for updating the parameters of neural networks during training, including stochastic gradient descent and its variants.
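
A minimal sketch of the usual pattern; the single parameter tensor and loss are illustrative:

import torch

w = torch.randn(3, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.01, momentum=0.9)

loss = (w ** 2).sum()
loss.backward()         # compute gradients
optimizer.step()        # update w in place
optimizer.zero_grad()   # reset gradients before the next step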

Question 49
What is the PyTorch module used for building neural networks?
torch.nn

torch.nn. The torch.nn module in PyTorch provides several classes and functions for building neural networks, including various layers and activation functions.
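
A minimal sketch of a small feedforward network assembled from torch.nn layers; the sizes are illustrative:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 32),   # fully connected layer
    nn.ReLU(),           # activation
    nn.Linear(32, 2),
)
out = model(torch.randn(4, 16))
print(out.shape)         # torch.Size([4, 2])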

Question 50
Which of the following is not a PyTorch data type?
IntTensor

Answer: d) IntTensor. The commonly used PyTorch tensor types include FloatTensor, DoubleTensor, and LongTensor; strictly speaking, though, torch.IntTensor (32-bit integers) also exists, so IntTensor is in fact a valid PyTorch data type.

Question 51
What is the main difference between a traditional feedforward neural network and a recurrent neural network (RNN)?
RNNs can process sequential data of varying lengths while feedforward neural networks cannot.

a) RNNs can process sequential data of varying lengths while feedforward neural networks cannot. An RNN maintains a hidden state that is updated at every time step, so it can consume sequences of arbitrary length and emit an output at each step, whereas a feedforward network expects a fixed-size input.
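
A minimal sketch showing the same nn.RNN consuming sequences of two different lengths; the dimensions are illustrative:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

short_seq = torch.randn(1, 5, 8)      # 5 time steps
long_seq = torch.randn(1, 12, 8)      # 12 time steps
out_short, _ = rnn(short_seq)         # shape: (1, 5, 16), one output per step
out_long, _ = rnn(long_seq)           # shape: (1, 12, 16)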

Question 52
What is the learning rate schedule used in Adam?
A learning rate that adapts based on the history of the gradients

d) A learning rate that adapts based on the history of the gradients. The step taken by Adam is computed from bias-corrected moving averages of the gradient and the squared gradient: theta_t = theta_{t-1} - alpha * m_hat_t / (sqrt(v_hat_t) + epsilon). This tailors the effective learning rate to the history of the gradients, rather than keeping it fixed in advance.

Question 53
What is the role of the bias correction terms in Adam?
They correct for the fact that the moving averages start at zero

They correct for the fact that the moving averages start at zero. Because m_t and v_t are initialized to zero, they are biased toward zero during the first training steps; the bias-corrected estimates m_hat_t = m_t / (1 - beta_1^t) and v_hat_t = v_t / (1 - beta_2^t) compensate for this, and the correction fades as t grows.

Question 54
What is the update rule for the moving average of the gradient in Adam?
m_t = beta_1 * m_{t-1} + (1 - beta_1) * g_t

a) m_t = beta_1 * m_{t-1} + (1 - beta_1) * g_t. This update rule computes an exponentially decaying moving average of the gradient. The hyperparameter beta_1 controls the rate of decay, with values closer to 1 giving a longer memory and therefore more smoothing.

Question 55
What is the key feature of Adam that distinguishes it from other optimization algorithms?
It adapts the learning rate for each parameter

Adam (Adaptive Moment Estimation) is a popular optimization algorithm used for training neural networks. It combines ideas from both Adagrad and RMSprop to provide adaptive learning rates that are tailored to each parameter.
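
Pulling Questions 52-55 together, a minimal single-parameter sketch of the Adam update; the hyperparameters are the common defaults apart from the illustrative learning rate, and the objective f(x) = (x - 3)**2 is made up for the example:

import math

beta_1, beta_2, lr, eps = 0.9, 0.999, 0.1, 1e-8
theta, m, v = 0.0, 0.0, 0.0

def grad(x):
    return 2.0 * (x - 3.0)                   # gradient of f(x) = (x - 3)**2

for t in range(1, 501):
    g = grad(theta)
    m = beta_1 * m + (1 - beta_1) * g        # moving average of the gradient
    v = beta_2 * v + (1 - beta_2) * g * g    # moving average of the squared gradient
    m_hat = m / (1 - beta_1 ** t)            # bias correction (see Question 53)
    v_hat = v / (1 - beta_2 ** t)
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)

print(round(theta, 3))                       # converges near 3.0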

Question 56
Which of the following is an alternative to Adagrad that addresses its memory requirement issue?
Adadelta

a) Adadelta is an alternative to Adagrad that addresses its memory requirement issue. Instead of accumulating all past squared gradients, Adadelta maintains an exponentially decaying average of the squared gradients (and of the squared updates), which needs only constant state per parameter while still providing adaptive learning rates.

Question 57
What is the main disadvantage of Adagrad?
It requires a large amount of memory

c) One main disadvantage of Adagrad is that it must store an accumulator of squared gradients for every parameter, which adds memory overhead in large models. Because this sum only grows, the effective learning rate also decays monotonically and can become vanishingly small late in training.

Question 58
What is the advantage of using Adagrad over other optimization algorithms?
It can adapt the learning rate for each parameter separately

c) One advantage of Adagrad over other optimization algorithms is that it can adapt the learning rate for each parameter separately. This is particularly useful in deep learning models where the appropriate step size differs across parameters, for example when some features are sparse and their weights receive gradient updates only occasionally.

Question 59
How does Adagrad adapt the learning rate for each parameter?
It accumulates the squared past gradients for each parameter and uses them to scale the learning rate

c) Adagrad keeps a running sum of the squared past gradients for each parameter and divides the base learning rate by the square root of this sum (plus a small epsilon). Parameters with a history of large gradients therefore take smaller steps, while parameters with small or infrequent gradients retain a comparatively larger effective learning rate.

Question 60
What is Adagrad?
An optimization algorithm for training deep learning models
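
As described in Questions 57-59, Adagrad scales the step for each parameter by the accumulated squared gradients. A minimal single-parameter sketch, again on the illustrative objective f(x) = (x - 3)**2:

import math

lr, eps = 0.5, 1e-8
theta, accum = 0.0, 0.0

def grad(x):
    return 2.0 * (x - 3.0)                # gradient of f(x) = (x - 3)**2

for _ in range(200):
    g = grad(theta)
    accum += g * g                        # running sum of squared gradients
    theta -= lr * g / (math.sqrt(accum) + eps)

print(round(theta, 3))                    # approaches the minimum at 3.0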