What Does Batch Size Mean in Deep Learning? An In-Depth Guide

Written by Coursera Staff • Updated on

Learn about the hyperparameter batch size and how it affects the speed at which you train a deep learning model, such as a neural network.


Key takeaways

  • Batch size is a vital hyperparameter in machine learning and deep learning that influences how fast you can train a model.

  • The availability of computational resources, such as graphics processing units (GPUs), can affect your batch size.

  • Three common types of batch processing include batch gradient descent, stochastic gradient descent, and mini-batch gradient descent. 

  • You can find the optimal batch size by experimenting with powers of two, monitoring your training throughput, considering training time, and re-tuning other hyperparameters.

Discover more about the impact of batch size on training dynamics, the types of batch processing available, and how to optimize your batch size. Afterward, if you’re ready to strengthen your skills in deep learning, enroll in the Deep Learning Specialization from DeepLearning.AI. In just three months, you can explore how to build neural networks using TensorFlow, analyze variance for deep learning applications, implement vectorized neural networks, and more. 

 

What does batch size mean in deep learning?

In deep learning, the batch size is the number of training samples that pass forward and backward through a neural network in one iteration, meaning one parameter update. An epoch, by contrast, is one full pass through the entire training set and usually consists of many iterations. Choosing an appropriate batch size is crucial to the training process because it affects training speed, memory use, and the stability of learning, and it interacts with other hyperparameters such as the learning rate. Research in deep learning continues to search for the optimal batch size for training: some studies advocate for the largest batch size possible, while others find that smaller batch sizes work better. In practice, researchers typically find the optimal batch size by trial and error and usually settle on a size between two and 128.
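
To make the definition concrete, the short sketch below counts how many iterations (parameter updates) it takes to pass through a data set once for a given batch size. The function name and the sample count of 1,000 are illustrative assumptions, not values tied to any particular framework.

```python
import math


def iterations_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches (parameter updates) needed to see every sample once."""
    if num_samples <= 0 or batch_size <= 0:
        raise ValueError("num_samples and batch_size must be positive")
    # The last batch may be smaller than batch_size, so round up.
    return math.ceil(num_samples / batch_size)


# Hypothetical data set of 1,000 training samples.
for batch_size in (1, 32, 128, 1000):
    print(batch_size, iterations_per_epoch(1000, batch_size))
```

A smaller batch size means more updates per pass through the data, while a batch size equal to the data set size means exactly one update per pass.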

Impact on training dynamics: How does batch size affect training? 

Batch size affects both the training time and the resource consumption of a deep learning model. It impacts training dynamics in the following ways:

  • With proper hyperparameter re-tuning, increasing the batch size decreases the number of steps to reach the intended performance.

  • Increasing the batch size may require the purchase of new hardware, such as additional GPUs.

  • If you’re using a cloud provider, increasing batch size may increase the usage costs billed to you by your provider.

  • As you increase your batch size, other hyperparameters, such as learning rate and regularization, need retuning, which is time-consuming and potentially complex.

Why is batch size 32?

When training a machine learning algorithm, a default batch size of 32 can be a good starting point for maximizing accuracy while minimizing training time. However, this is not a mandatory value, and practitioners can experiment with different batch sizes to find the optimal one for their particular model. 

Types of batch processing

Different types of batch processing or gradient descent exist depending on the needs of your deep learning model. Three popular options include:

  • Batch gradient descent

  • Stochastic gradient descent 

  • Mini-batch gradient descent

Each type of batch processing deals with data differently. Explore more about each type below. 

Batch gradient descent

Batch gradient descent, sometimes simply called gradient descent, calculates the error for each sample in the training set but updates the model's parameters only after it has evaluated the entire data set. This makes the batch size equal to the total number of training samples in the data set. Batch gradient descent is computationally efficient, but it does not always produce the most accurate model.

Stochastic gradient descent

Stochastic gradient descent (SGD) updates its parameters after each training sample passes through the model, which means the batch size is one. These frequent updates can sometimes make SGD faster and more accurate than batch gradient descent. However, this speed and accuracy come at the cost of computational efficiency, and the gradients are noisy: the error jumps around from one update to the next.
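
The noise mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical one-parameter linear model and synthetic data, and it measures how much the gradient estimate varies across randomly drawn batches of different sizes. All names and values here are made up for illustration.

```python
import random
import statistics

random.seed(0)

# Hypothetical toy data: y is roughly 3 * x plus some noise.
xs = [random.uniform(-1.0, 1.0) for _ in range(256)]
ys = [3.0 * x + random.gauss(0.0, 0.5) for x in xs]


def batch_gradient(w, indices):
    """Average gradient of 0.5 * (w * x - y) ** 2 over the chosen samples."""
    return sum((w * xs[i] - ys[i]) * xs[i] for i in indices) / len(indices)


def gradient_spread(w, batch_size, draws=500):
    """Standard deviation of the gradient estimate across random batches."""
    estimates = [
        batch_gradient(w, random.sample(range(len(xs)), batch_size))
        for _ in range(draws)
    ]
    return statistics.pstdev(estimates)


spread = {b: gradient_spread(w=0.0, batch_size=b) for b in (1, 32, 256)}
for b, s in spread.items():
    print(f"batch size {b:>3}: gradient std = {s:.4f}")
```

With a batch size of one, each gradient estimate depends on a single sample and varies the most. When the batch covers the whole data set, every draw contains the same samples, so the spread is essentially zero.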

Mini-batch gradient descent

Mini-batch gradient descent combines the best of batch gradient descent and SGD into one method to achieve a balance of computational efficiency and accuracy. To do this, it splits the entire data set into smaller batches, runs those batches through the model, and updates the parameters after each smaller batch. The batch size for this method is larger than one but less than the total number of samples in the data set. 

Since deep learning models train using very large data sets, mini-batch gradient descent is the most common neural network training method. 
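
The three approaches differ only in the batch size, which the following sketch tries to show with a single training loop. It assumes a hypothetical one-parameter model fitted to synthetic data in plain Python. The `train` function, the learning rate, and the data are illustrative assumptions rather than a reference implementation.

```python
import random


def train(xs, ys, batch_size, epochs=50, lr=0.1, seed=0):
    """Fit y = w * x with gradient descent; return (w, number_of_updates)."""
    rng = random.Random(seed)
    w = 0.0
    updates = 0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Average gradient of 0.5 * (w * x - y) ** 2 over the batch.
            grad = sum((w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad  # one parameter update per batch
            updates += 1
    return w, updates


# Hypothetical noise-free toy data with a true slope of 2.0.
data_rng = random.Random(42)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(64)]
ys = [2.0 * x for x in xs]

for name, size in [("batch", len(xs)), ("stochastic", 1), ("mini-batch", 16)]:
    w, updates = train(xs, ys, batch_size=size)
    print(f"{name:<10} batch_size={size:>2} updates={updates:>4} w={w:.3f}")
```

Over the same number of epochs, the batch variant performs one update per epoch, the stochastic variant performs one update per sample, and the mini-batch variant falls in between.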

Read more: Layer Normalization vs. Batch Normalization: What’s the Difference?

Optimizing batch size for performance

The optimal batch size when training a deep learning model is usually the largest one your computer hardware can support. By optimizing the batch size, you control the speed and stability of the neural network's learning. However, batch size is not something to tune in isolation because, for every batch size you test, you also need to re-tune the hyperparameters around it, such as learning rate and regularization.

Finding the optimal batch size

Finding the optimal batch size when training your deep learning model is a process of trial and error, since calculating which batch size fits in your memory is difficult. Explore the following steps to help you find the optimal batch size when training a neural network:

  1. Create a set of batch size experiments that increase by powers of two (2, 4, 8, 16, 32, 64…) until you exceed your hardware's memory.

  2. Consider the training throughput. Even if your hardware supports a larger batch size, once your training throughput no longer increases as batch size increases, use that as your maximum batch size. 

  3. Consider the training time of different batch sizes. If increasing the batch size no longer reduces the number of training steps, use that batch size as your maximum, since increasing it further provides diminishing returns.

  4. Ensure you properly re-tune other hyperparameters as you experiment with different batch sizes to achieve optimal model performance. The learning rate, momentum, and regularization are important hyperparameters to always re-tune for each batch size. 

Finding the optimal batch size is an important early step in developing your deep learning model, as re-tuning each hyperparameter later on becomes expensive, difficult, and time-consuming.
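
The first two steps above can be sketched as a simple doubling search. This is a minimal sketch under stated assumptions: `fits_in_memory` and `measure_throughput` are hypothetical callables that you would implement by running a few real training steps on your own hardware, and the `min_gain` threshold is an arbitrary illustrative choice. The sketch does not cover re-tuning the learning rate, momentum, or regularization for each candidate batch size.

```python
from typing import Callable, List, Tuple


def find_max_useful_batch_size(
    fits_in_memory: Callable[[int], bool],
    measure_throughput: Callable[[int], float],
    start: int = 2,
    min_gain: float = 1.05,
) -> Tuple[int, List[Tuple[int, float]]]:
    """Double the batch size until memory runs out or throughput plateaus.

    Returns the chosen batch size and the (batch_size, throughput) history.
    """
    history: List[Tuple[int, float]] = []
    best = None
    batch_size = start
    while fits_in_memory(batch_size):
        throughput = measure_throughput(batch_size)  # samples per second
        history.append((batch_size, throughput))
        if best is not None and throughput < min_gain * history[-2][1]:
            break  # throughput no longer improves meaningfully
        best = batch_size
        batch_size *= 2
    if best is None:
        raise ValueError("even the starting batch size does not fit in memory")
    return best, history


# Hypothetical stand-ins; in practice these would run real training steps.
def fake_fits(batch_size: int) -> bool:
    return batch_size <= 512  # pretend 512 is the largest size that fits


def fake_throughput(batch_size: int) -> float:
    return float(min(batch_size, 128) * 10)  # pretend throughput saturates at 128


best, history = find_max_useful_batch_size(fake_fits, fake_throughput)
print(best)
```

With these made-up stand-ins, the search stops before reaching the memory limit because doubling the batch size past the saturation point no longer raises throughput.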

How to get started in deep learning

Deep learning builds on machine learning principles to create multilayered neural networks that are loosely modeled on the human brain. It requires a basic understanding of linear algebra, data science, and programming. Once you have these basics down, consider these steps to enhance your knowledge of deep learning:

  1. You can take an online deep learning course like IBM's Introduction to Deep Learning & Neural Networks with Keras on Coursera.

  2. Learn about deep learning frameworks like TensorFlow, PyTorch, and Keras.

  3. Consider a Guided Project, such as Deep Learning with PyTorch: Siamese Network on Coursera, or create your own project based on an online example.

  4. Practice regularly and find an online community to stay consistent.



This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.