How do PyTorch code and TensorFlow code compare? Maybe you’re in the beginning phases of your machine learning journey and deciding which framework to embrace, or maybe you’re an experienced ML practitioner considering a change of framework. Either way, you’re in the right place.

If you’re interested in a high-level comparison between the frameworks, considering the popularity and capabilities of each, this article is a good resource. In this post, we’re going to dig a bit deeper and look at actual code. In my PyTorch, Keras, and TensorFlow posts, I explain how you can classify images from the Fashion MNIST dataset, while introducing key concepts of each of these machine learning frameworks. In this post, I’ll focus on comparing the code written in each of these frameworks.

I assume that you’re familiar with machine learning concepts, and that you’ve used at least one of the frameworks before. If you’ve read one of my three introductory posts (PyTorch, Keras, or TensorFlow), you’ll be well prepared to understand this post.

All the code shown in this post can be found on GitHub.

The Fashion MNIST dataset is a collection of 70,000 grayscale 28 × 28 pixel images of articles of clothing, along with corresponding labels. Each label is an integer from 0 to 9, with the following meaning:

```
labels_map = {
    0: 'T-Shirt',
    1: 'Trouser',
    2: 'Pullover',
    3: 'Dress',
    4: 'Coat',
    5: 'Sandal',
    6: 'Shirt',
    7: 'Sneaker',
    8: 'Bag',
    9: 'Ankle Boot',
}
```

The PyTorch and TensorFlow versions of the code that loads the Fashion MNIST dataset appear very different, but their behavior is in fact quite similar.

PyTorch:

```
from typing import Tuple

from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

# DATA_PATH (the download directory) is defined elsewhere in the project.


def _get_data(batch_size: int) -> Tuple[DataLoader, DataLoader]:
    """Downloads Fashion MNIST data, and returns two DataLoader objects
    wrapping test and training data."""
    training_data = datasets.FashionMNIST(
        root=DATA_PATH,
        train=True,
        download=True,
        transform=ToTensor(),
    )
    test_data = datasets.FashionMNIST(
        root=DATA_PATH,
        train=False,
        download=True,
        transform=ToTensor(),
    )

    train_dataloader = DataLoader(training_data,
                                  batch_size=batch_size,
                                  shuffle=True)
    test_dataloader = DataLoader(test_data, batch_size=batch_size, shuffle=True)

    return (train_dataloader, test_dataloader)
```

TensorFlow:

```
from typing import Tuple

import tensorflow as tf


def _get_data(batch_size: int) -> Tuple[tf.data.Dataset, tf.data.Dataset]:
    """Downloads Fashion MNIST data, and returns two Dataset objects
    wrapping test and training data."""
    (training_images, training_labels), (
        test_images, test_labels) = tf.keras.datasets.fashion_mnist.load_data()

    train_dataset = tf.data.Dataset.from_tensor_slices(
        (training_images, training_labels))
    test_dataset = tf.data.Dataset.from_tensor_slices(
        (test_images, test_labels))

    # Convert uint8 pixels in [0, 255] to float32 values in [0, 1].
    train_dataset = train_dataset.map(
        lambda image, label: (tf.cast(image, tf.float32) / 255.0, label))
    test_dataset = test_dataset.map(
        lambda image, label: (tf.cast(image, tf.float32) / 255.0, label))

    # Shuffle individual examples, then group them into batches.
    # (Calling shuffle after batch would only reorder whole batches.)
    train_dataset = train_dataset.shuffle(500).batch(batch_size)
    test_dataset = test_dataset.shuffle(500).batch(batch_size)

    return (train_dataset, test_dataset)
```

PyTorch and TensorFlow both make popular datasets easily available to their users, including the Fashion MNIST dataset. PyTorch exposes it through the `torchvision.datasets.FashionMNIST` class, and TensorFlow through the `tensorflow.keras.datasets.fashion_mnist` module.

In both PyTorch and TensorFlow, we apply the same transformation to the image pixels, converting integer values between 0 and 255 into floating-point values between 0 and 1. In PyTorch we use the built-in `ToTensor` data transformation, while in TensorFlow we write the transformation code explicitly. We also shuffle the data in both frameworks: in PyTorch we pass `shuffle=True` to the `DataLoader`, and in TensorFlow we call the `shuffle` method on the `Dataset`.
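To make the pixel transformation concrete, here is a minimal NumPy sketch of the scaling for one grayscale image. The helper `to_unit_range` is hypothetical. It mimics the `[0, 1]` scaling and the leading channel dimension that `ToTensor` produces for a single-channel uint8 image, and, without the extra dimension, the division our TensorFlow `map` call performs.

```python
import numpy as np


def to_unit_range(image: np.ndarray, add_channel_dim: bool = True) -> np.ndarray:
    """Hypothetical helper: scales a uint8 grayscale image to float32 in [0, 1].

    With add_channel_dim=True the result has shape (1, H, W), like the
    channel-first output of torchvision's ToTensor for a grayscale image.
    With add_channel_dim=False the shape stays (H, W), like our TensorFlow map.
    """
    if image.dtype != np.uint8:
        raise ValueError('expected uint8 pixels in [0, 255]')
    scaled = image.astype(np.float32) / 255.0
    return scaled[np.newaxis, ...] if add_channel_dim else scaled


# A fake 28x28 image whose pixel values run from 0 to 255.
image = np.linspace(0, 255, 28 * 28).astype(np.uint8).reshape(28, 28)
pytorch_style = to_unit_range(image)
tensorflow_style = to_unit_range(image, add_channel_dim=False)
print(pytorch_style.shape, tensorflow_style.shape)  # (1, 28, 28) (28, 28)
print(pytorch_style.min(), pytorch_style.max())     # 0.0 1.0
```

The extra channel dimension is harmless for our models, because both start by flattening each image into 784 values.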

In both cases, we obtain an iterable object that lets us iterate over batches of data of a specified size: a `DataLoader` object in PyTorch, and a `Dataset` object in TensorFlow. For most practical purposes they work the same way. If we iterate over them using a `for` loop, we get a batch of `batch_size` items (specified as a parameter) on each iteration, until we’ve gone through the full dataset. By default, the last batch holds whatever items are left over. We’ll see code for this later, in the training section.
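The batching behavior both iterables share can be sketched in a few lines of pure Python. `iterate_batches` is a hypothetical helper, not part of either framework. It shuffles the examples, then yields `(images, labels)` lists of at most `batch_size` items, with a smaller final batch when the dataset size is not a multiple of the batch size.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def iterate_batches(images: Sequence, labels: Sequence, batch_size: int,
                    shuffle: bool = True,
                    seed: int = 0) -> Iterator[Tuple[List, List]]:
    """Hypothetical helper: yields (images, labels) batches of batch_size.

    Individual examples are shuffled before being grouped, and the final
    batch holds whatever examples are left over.
    """
    indices = list(range(len(images)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch_indices = indices[start:start + batch_size]
        yield ([images[i] for i in batch_indices],
               [labels[i] for i in batch_indices])


# 130 fake examples split into batches of 64: sizes 64, 64, and 2.
images = [f'image-{i}' for i in range(130)]
labels = [i % 10 for i in range(130)]
batch_sizes = [len(x) for (x, y) in iterate_batches(images, labels, 64)]
print(batch_sizes)  # [64, 64, 2]
```

With Fashion MNIST's 60,000 training images and a batch size of 64, the same arithmetic gives 938 batches per epoch, the last one holding 32 images.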

For model creation, we’ll look at PyTorch, the higher-level TensorFlow Keras API, and lower-level TensorFlow.

The differences between PyTorch and Keras are mainly cosmetic. PyTorch’s `Linear` layers are known as `Dense` layers in Keras. PyTorch `Linear` layers require both the number of inputs and the number of outputs, while Keras `Dense` layers only need the number of outputs (they infer the number of inputs from the outputs of the previous layer). And PyTorch requires a `Linear` layer and its associated activation function to be specified as separate layers, while the Keras `Dense` layer lets you specify an activation function by name as a convenience.
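The shape inference that Keras performs can be illustrated with a small pure-Python sketch. The helpers `dense_stack_shapes` and `parameter_count` are hypothetical. Given only each layer's output size, they derive every layer's `(inputs, outputs)` pair, which is the pair a PyTorch `Linear` layer asks you to write out. They then count the weights and biases of our 784 → 20 → 10 network.

```python
from typing import List, Tuple


def dense_stack_shapes(input_size: int,
                       output_sizes: List[int]) -> List[Tuple[int, int]]:
    """Hypothetical helper: infers each layer's (inputs, outputs) pair.

    As with Keras Dense layers, each layer only states its number of
    outputs; its number of inputs is the previous layer's number of outputs.
    """
    shapes = []
    in_features = input_size
    for out_features in output_sizes:
        shapes.append((in_features, out_features))
        in_features = out_features
    return shapes


def parameter_count(shapes: List[Tuple[int, int]]) -> int:
    """Weights (inputs x outputs) plus biases (outputs) for each layer."""
    return sum(n_in * n_out + n_out for (n_in, n_out) in shapes)


# Our model: 28 * 28 = 784 flattened pixels -> 20 hidden units -> 10 classes.
shapes = dense_stack_shapes(28 * 28, [20, 10])
print(shapes)                   # [(784, 20), (20, 10)]
print(parameter_count(shapes))  # 15910
```

The inferred pairs match the arguments of `nn.Linear(28 * 28, 20)` and `nn.Linear(20, 10)` in the PyTorch model.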

PyTorch:

```
import torch
from torch import nn


class NeuralNetwork(nn.Module):
    """Neural network that classifies Fashion MNIST-style images."""

    def __init__(self):
        super().__init__()
        self.sequence = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 20),
                                      nn.ReLU(), nn.Linear(20, 10))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y_prime = self.sequence(x)
        return y_prime
```

Keras:

```
import tensorflow as tf


class NeuralNetwork(tf.keras.Model):
    """Neural network that classifies Fashion MNIST-style images."""

    def __init__(self):
        super().__init__()
        self.sequence = tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(20, activation='relu'),
            tf.keras.layers.Dense(10)
        ])

    def call(self, x: tf.Tensor) -> tf.Tensor:
        y_prime = self.sequence(x)
        return y_prime
```

TensorFlow:

```
import tensorflow as tf


class NeuralNetwork(tf.keras.Model):
    """Neural network that classifies Fashion MNIST-style images."""

    def __init__(self):
        super().__init__()
        initializer = tf.keras.initializers.GlorotUniform()
        # Weights and biases for the 784 -> 20 and 20 -> 10 layers.
        self.w1 = tf.Variable(initializer(shape=(784, 20)))
        self.b1 = tf.Variable(tf.zeros(shape=(20,)))
        self.w2 = tf.Variable(initializer(shape=(20, 10)))
        self.b2 = tf.Variable(tf.zeros(shape=(10,)))

    def call(self, x: tf.Tensor) -> tf.Tensor:
        x = tf.reshape(x, [-1, 784])         # flatten each 28x28 image
        x = tf.matmul(x, self.w1) + self.b1  # first fully connected layer
        x = tf.nn.relu(x)                    # ReLU activation
        x = tf.matmul(x, self.w2) + self.b2  # logits for the 10 classes
        return x
```

Defining the model using lower-level TensorFlow components looks a bit different from the other two implementations, because we’re being explicit about the calculations that happen under the hood. We need to define our own `tf.Variable` objects, and we need to manually perform the matrix multiplications, additions, and activation function calls that are encapsulated in the PyTorch and Keras layers. Still, for an example as simple as this one, the code isn’t complicated.
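To see exactly which calculations the low-level model performs, here is the same forward pass written in NumPy as a sketch. The `glorot_uniform` helper is our own re-creation of the Glorot (Xavier) uniform rule. That rule samples from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)), and it is the documented behavior of `tf.keras.initializers.GlorotUniform`. The `forward` function mirrors the `call` method line by line. The random batch stands in for real images.

```python
import numpy as np


def glorot_uniform(rng: np.random.Generator, fan_in: int,
                   fan_out: int) -> np.ndarray:
    """Samples a (fan_in, fan_out) matrix from U(-limit, limit)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out)).astype(np.float32)


rng = np.random.default_rng(seed=0)
w1 = glorot_uniform(rng, 784, 20)
b1 = np.zeros(20, dtype=np.float32)
w2 = glorot_uniform(rng, 20, 10)
b2 = np.zeros(10, dtype=np.float32)


def forward(x: np.ndarray) -> np.ndarray:
    """Same arithmetic as the low-level TensorFlow call method."""
    x = x.reshape(-1, 784)   # flatten each 28x28 image
    x = x @ w1 + b1          # first fully connected layer
    x = np.maximum(x, 0.0)   # ReLU activation
    return x @ w2 + b2       # one logit per class


batch = rng.random((5, 28, 28), dtype=np.float32)
logits = forward(batch)
print(logits.shape)  # (5, 10)
```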

In both PyTorch and TensorFlow, you can create your own layers with custom behavior if the provided ones don’t suit your purposes. And in both frameworks, you can create a model by writing your own class, or you can use the framework’s built-in `Sequential` class directly as your model. Above, I show how to write your own class, which lets you set breakpoints within the `forward` or `call` method.

The code to train a neural network is a bit different in these three frameworks. Let’s start by comparing the code to train the network on a single batch of data.

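PyTorch:

```
from typing import Tuple

import torch
from torch.nn import CrossEntropyLoss
from torch.optim import Optimizer


def _fit_one_batch(x: torch.Tensor, y: torch.Tensor, model: NeuralNetwork,
                   loss_fn: CrossEntropyLoss,
                   optimizer: Optimizer) -> Tuple[torch.Tensor, torch.Tensor]:
    """Trains a single minibatch (backpropagation algorithm)."""
    y_prime = model(x)            # forward pass
    loss = loss_fn(y_prime, y)    # loss

    optimizer.zero_grad()         # clear gradients from the previous batch
    loss.backward()               # backward pass: compute gradients
    optimizer.step()              # gradient descent step

    return (y_prime, loss)
```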

TensorFlow:

```
from typing import Tuple

import tensorflow as tf


@tf.function
def _fit_one_batch(
    x: tf.Tensor, y: tf.Tensor, model: tf.keras.Model,
    loss_fn: tf.keras.losses.Loss, optimizer: tf.keras.optimizers.Optimizer
) -> Tuple[tf.Tensor, tf.Tensor]:
    """Trains a single minibatch (backpropagation algorithm)."""
    # Record the forward pass and the loss so they can be differentiated.
    with tf.GradientTape() as tape:
        y_prime = model(x, training=True)
        loss = loss_fn(y, y_prime)

    # Backward pass: gradients of the loss w.r.t. the trainable variables.
    grads = tape.gradient(loss, model.trainable_variables)
    # Gradient descent step.
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    return (y_prime, loss)
```

In both PyTorch and TensorFlow, training on a batch with backpropagation takes the same four steps. We do a forward pass by executing the model, and we call the loss function. We then do a backward pass to calculate the gradients (in `loss.backward` and `tape.gradient`), and we take a gradient descent step by applying the calculated gradients (in `optimizer.step` and `optimizer.apply_gradients`). The PyTorch call to `optimizer.zero_grad` is a technicality. PyTorch adds newly computed gradients to the ones already stored rather than replacing them, and it doesn’t clear them at the end of a backward pass, so we need to reset them manually before processing each batch.
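The four steps can be written out by hand in NumPy as a sketch. To keep the hand-derived gradient short, it uses a single linear layer (softmax regression) and random stand-in data rather than our two-layer network and Fashion MNIST. `fit_one_batch` is a hypothetical name. It uses the standard result that the gradient of the mean cross-entropy with respect to the logits is (softmax − one-hot) / n.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def fit_one_batch(x: np.ndarray, y: np.ndarray, w: np.ndarray, b: np.ndarray,
                  learning_rate: float) -> float:
    """One training step for a linear softmax classifier; updates w, b in place."""
    n = len(x)
    # 1. Forward pass: one logit per class for each example.
    logits = x @ w + b
    # 2. Loss: mean cross-entropy over the batch.
    probabilities = softmax(logits)
    loss = float(-np.log(probabilities[np.arange(n), y]).mean())
    # 3. Backward pass: d(loss)/d(logits) = (softmax - one_hot) / n,
    #    then the chain rule gives the gradients for w and b.
    grad_logits = probabilities.copy()
    grad_logits[np.arange(n), y] -= 1.0
    grad_logits /= n
    grad_w = x.T @ grad_logits
    grad_b = grad_logits.sum(axis=0)
    # 4. Gradient descent step. These gradients are fresh local arrays, so
    #    nothing carries over between calls; PyTorch instead adds new gradients
    #    into each parameter's .grad, which is why zero_grad() is needed.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    return loss


rng = np.random.default_rng(seed=0)
x = rng.normal(size=(64, 20))     # 64 examples with 20 features each
y = rng.integers(0, 10, size=64)  # labels in 0..9
w = np.zeros((20, 10))
b = np.zeros(10)
losses = [fit_one_batch(x, y, w, b, learning_rate=0.1) for _ in range(5)]
print(losses[0] > losses[-1])  # True: the loss on this batch decreases
```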

Notice that in TensorFlow, the forward pass needs to be executed within the `tf.GradientTape` context. That’s because, unlike PyTorch, TensorFlow does not automatically record forward operations for playback during differentiation. It only records the operations that run while a gradient tape is active. Therefore, we need to explicitly tell TensorFlow to hold on to all the data it will later need to calculate derivatives.
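The idea of recording operations for later playback can be shown with a toy, scalar-only tape in pure Python. `Tape` and `Var` are hypothetical classes written for illustration. TensorFlow's `tf.GradientTape` works on tensors and supports far more operations, but the principle is the same: only operations that run while the tape is active can be differentiated.

```python
from typing import List, Optional, Tuple


class Var:
    """A scalar that logs its operations on the active tape, if any."""

    active_tape: Optional['Tape'] = None

    def __init__(self, value: float) -> None:
        self.value = value

    def _record(self, out: 'Var', parents: List[Tuple['Var', float]]) -> 'Var':
        # Each parent is stored with the local derivative d(out)/d(parent).
        if Var.active_tape is not None:
            Var.active_tape.records.append((out, parents))
        return out

    def __add__(self, other: 'Var') -> 'Var':
        return self._record(Var(self.value + other.value),
                            [(self, 1.0), (other, 1.0)])

    def __mul__(self, other: 'Var') -> 'Var':
        return self._record(Var(self.value * other.value),
                            [(self, other.value), (other, self.value)])


class Tape:
    """Records operations while active, then plays them back in reverse."""

    def __init__(self) -> None:
        self.records: List[Tuple[Var, List[Tuple[Var, float]]]] = []

    def __enter__(self) -> 'Tape':
        Var.active_tape = self
        return self

    def __exit__(self, *exc_info) -> None:
        Var.active_tape = None

    def gradient(self, target: Var, sources: List[Var]) -> List[float]:
        grads = {id(target): 1.0}
        # Reverse recording order visits each output before its inputs.
        for out, parents in reversed(self.records):
            upstream = grads.get(id(out), 0.0)
            for parent, local in parents:
                grads[id(parent)] = grads.get(id(parent), 0.0) + upstream * local
        return [grads.get(id(source), 0.0) for source in sources]


x = Var(3.0)
y = Var(4.0)
with Tape() as tape:
    z = x * y + x                    # z = x*y + x
dz_dx, dz_dy = tape.gradient(z, [x, y])
print(dz_dx, dz_dy)                  # 5.0 3.0 (that is, y + 1 and x)

w = x * y                            # outside the tape: not recorded
print(len(tape.records))             # 2
```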

Notice also the `@tf.function` decorator in the TensorFlow code. It tells TensorFlow to execute the code in graph execution mode, instead of the eager execution mode used by default. In graph execution mode, we can’t set breakpoints as usual, so we typically develop the code in eager execution mode and add the decorator only when the code is complete. The payoff is that code compiled into a static graph often runs faster. It can also be exported and executed in environments without Python, which enables deployment to production scenarios such as embedded devices.
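Graph execution relies on tracing. TensorFlow runs the Python function once with symbolic placeholders, records the operations it sees into a graph, and reuses that graph on later calls. The toy tracer below is a sketch of that idea only. `Symbol` and `trace` are hypothetical names, and real `tf.function` tracing is far more involved (for example, it retraces for new input signatures). It shows why a `print` call or a breakpoint inside a traced function fires during tracing rather than on every call.

```python
import operator
from typing import Callable, Dict, List, Tuple


class Symbol:
    """Stands in for a value during tracing: arithmetic is recorded, not run."""

    def __init__(self, graph: List[Tuple], name: str) -> None:
        self.graph = graph
        self.name = name

    def _record(self, op: Callable, other: object) -> 'Symbol':
        out = Symbol(self.graph, f'tmp{len(self.graph)}')
        self.graph.append((op, self, other, out))
        return out

    def __add__(self, other: object) -> 'Symbol':
        return self._record(operator.add, other)

    def __mul__(self, other: object) -> 'Symbol':
        return self._record(operator.mul, other)


def trace(fn: Callable) -> Callable[[float], float]:
    """Runs fn once on a placeholder; returns a function that replays the graph."""
    graph: List[Tuple] = []
    placeholder = Symbol(graph, 'x')
    result = fn(placeholder)  # Python code in fn (print, breakpoints) runs here only

    def run(value: float) -> float:
        env: Dict[int, float] = {id(placeholder): value}

        def lookup(item: object) -> float:
            return env[id(item)] if isinstance(item, Symbol) else item

        for op, left, right, out in graph:
            env[id(out)] = op(lookup(left), lookup(right))
        return lookup(result)

    return run


def f(x):
    print('tracing f')  # a Python side effect
    return x * x + 1


graph_f = trace(f)                 # prints "tracing f" once
print(graph_f(3.0), graph_f(4.0))  # 10.0 17.0, with no further "tracing f"
```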

It’s worth mentioning that PyTorch can also gain the benefits of a static graph with the help of TorchScript, which ships with PyTorch as the `torch.jit` module. We won’t go into the details of TorchScript in this post.

Now let’s look at the code that trains the network on the entire dataset by repeatedly calling the `_fit_one_batch(...)` function.

PyTorch:

```
from typing import Tuple

from torch import nn
from torch.nn import CrossEntropyLoss
from torch.optim import Optimizer
from torch.utils.data import DataLoader


def _fit(device: str, dataloader: DataLoader, model: nn.Module,
         loss_fn: CrossEntropyLoss,
         optimizer: Optimizer) -> Tuple[float, float]:
    """Trains the given model for a single epoch."""
    loss_sum = 0
    correct_item_count = 0
    item_count = 0

    # Used for printing only.
    batch_count = len(dataloader)
    print_every = 100

    model.to(device)
    model.train()

    for batch_index, (x, y) in enumerate(dataloader):
        x = x.float().to(device)
        y = y.long().to(device)

        (y_prime, loss) = _fit_one_batch(x, y, model, loss_fn, optimizer)

        correct_item_count += (y_prime.argmax(1) == y).sum().item()
        # The loss is averaged over the batch, so weight it by the batch size
        # to get a per-item average at the end.
        loss_sum += loss.item() * len(x)
        item_count += len(x)

        # Printing progress.
        if ((batch_index + 1) % print_every == 0 or
                (batch_index + 1) == batch_count):
            accuracy = correct_item_count / item_count
            average_loss = loss_sum / item_count
            print(f'[Batch {batch_index + 1:>3d} - {item_count:>5d} items] ' +
                  f'loss: {average_loss:>7f}, ' +
                  f'accuracy: {accuracy*100:>0.1f}%')

    average_loss = loss_sum / item_count
    accuracy = correct_item_count / item_count

    return (average_loss, accuracy)
```

TensorFlow:

```
from typing import Tuple

import tensorflow as tf


def _fit(dataset: tf.data.Dataset, model: tf.keras.Model,
         loss_fn: tf.keras.losses.Loss,
         optimizer: tf.optimizers.Optimizer) -> Tuple[float, float]:
    """Trains the given model for a single epoch."""
    loss_sum = 0
    correct_item_count = 0
    item_count = 0

    # Used for printing only.
    batch_count = len(dataset)
    print_every = 100

    for batch_index, (x, y) in enumerate(dataset):
        # Match the model's float32 variables; argmax returns int64 indices.
        x = tf.cast(x, tf.float32)
        y = tf.cast(y, tf.int64)

        (y_prime, loss) = _fit_one_batch(x, y, model, loss_fn, optimizer)

        correct_item_count += (tf.math.argmax(y_prime,
                                              axis=1) == y).numpy().sum()
        # The loss is averaged over the batch, so weight it by the batch size.
        loss_sum += loss.numpy() * len(x)
        item_count += len(x)

        # Printing progress.
        if ((batch_index + 1) % print_every == 0 or
                (batch_index + 1) == batch_count):
            accuracy = correct_item_count / item_count
            average_loss = loss_sum / item_count
            print(f'[Batch {batch_index + 1:>3d} - {item_count:>5d} items] ' +
                  f'loss: {average_loss:>7f}, ' +
                  f'accuracy: {accuracy*100:>0.1f}%')

    average_loss = loss_sum / item_count
    accuracy = correct_item_count / item_count

    return (average_loss, accuracy)
```

The `_fit(...)` functions above simply iterate through the iterables we created earlier (the PyTorch `DataLoader` or the TensorFlow `Dataset`), which give us a batch of data on each iteration. We then pass each batch to the `_fit_one_batch(...)` function that we saw earlier. The rest of the code keeps track of the loss and accuracy, and prints them as training progresses.
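By default, both `CrossEntropyLoss` and `SparseCategoricalCrossentropy` return the mean loss over the batch. A per-item average over the epoch therefore has to weight each batch mean by the batch size, which matters because the last batch can be smaller. The bookkeeping is sketched below in pure Python, using a hypothetical `epoch_metrics` helper and made-up per-batch numbers.

```python
from typing import List, Tuple


def epoch_metrics(
        batch_results: List[Tuple[float, int, int]]) -> Tuple[float, float]:
    """Hypothetical helper: combines per-batch results into epoch metrics.

    Each entry is (mean loss over the batch, correct predictions, batch size).
    """
    loss_sum = 0.0
    correct_item_count = 0
    item_count = 0
    for (mean_loss, correct, size) in batch_results:
        loss_sum += mean_loss * size  # undo the per-batch averaging
        correct_item_count += correct
        item_count += size
    return (loss_sum / item_count, correct_item_count / item_count)


# Two full batches of 64 and a final partial batch of 32 (made-up numbers).
average_loss, accuracy = epoch_metrics([(0.5, 48, 64), (0.25, 56, 64),
                                        (1.0, 16, 32)])
print(average_loss, accuracy)  # 0.5 0.75
```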

Next, let’s take a look at the code that evaluates our model’s performance on a single batch.

PyTorch:

```
from typing import Tuple

import torch
from torch.nn import CrossEntropyLoss


def _evaluate_one_batch(
        x: torch.Tensor, y: torch.Tensor, model: NeuralNetwork,
        loss_fn: CrossEntropyLoss) -> Tuple[torch.Tensor, torch.Tensor]:
    """Evaluates a single minibatch."""
    # No backward pass is needed, so don't record operations for autograd.
    with torch.no_grad():
        y_prime = model(x)
        loss = loss_fn(y_prime, y)

    return (y_prime, loss)
```

TensorFlow:

```
from typing import Tuple

import tensorflow as tf


@tf.function
def _evaluate_one_batch(
        x: tf.Tensor, y: tf.Tensor, model: tf.keras.Model,
        loss_fn: tf.keras.losses.Loss) -> Tuple[tf.Tensor, tf.Tensor]:
    """Evaluates a single minibatch."""
    y_prime = model(x, training=False)
    loss = loss_fn(y, y_prime)

    return (y_prime, loss)
```

While evaluating a batch, we only need a forward pass through the network to obtain a prediction, and a call to the loss function to evaluate it. We don’t need to calculate derivatives in a backward pass. Since PyTorch records operations for differentiation by default, this time we wrap the forward pass in a `torch.no_grad()` context to avoid the cost of that unneeded bookkeeping. There’s no need to change the TensorFlow code, since TensorFlow only records operations inside a `tf.GradientTape` context.
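The difference in defaults can be sketched with a global recording switch in pure Python. `no_record` is a hypothetical stand-in for `torch.no_grad()`, and `multiply` is a toy operation. Recording is on unless explicitly paused, as in PyTorch. The TensorFlow tape has the opposite default: recording is off unless a tape is explicitly opened.

```python
import contextlib
from typing import Iterator, List

recording_enabled = True  # PyTorch-like default: operations are recorded
recorded_ops: List[str] = []


@contextlib.contextmanager
def no_record() -> Iterator[None]:
    """Hypothetical stand-in for torch.no_grad(): pauses recording."""
    global recording_enabled
    previous = recording_enabled
    recording_enabled = False
    try:
        yield
    finally:
        recording_enabled = previous


def multiply(a: float, b: float) -> float:
    """Computes a * b and, if enabled, records it for a later backward pass."""
    if recording_enabled:
        recorded_ops.append(f'mul({a}, {b})')
    return a * b


multiply(2.0, 3.0)      # training-style call: recorded
with no_record():
    multiply(4.0, 5.0)  # evaluation-style call: not recorded
print(recorded_ops)     # ['mul(2.0, 3.0)']
```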

As with the `_fit(...)` function, in both PyTorch and TensorFlow the `_evaluate(...)` function iterates through each batch of data and calls the `_evaluate_one_batch(...)` function. Most of the remaining code accumulates the loss and accuracy over the whole dataset.

PyTorch:

```
from typing import Tuple

import torch
from torch import nn
from torch.nn import CrossEntropyLoss
from torch.utils.data import DataLoader


def _evaluate(device: str, dataloader: DataLoader, model: nn.Module,
              loss_fn: CrossEntropyLoss) -> Tuple[float, float]:
    """Evaluates the given model for the whole dataset once."""
    loss_sum = 0
    correct_item_count = 0
    item_count = 0

    model.to(device)
    model.eval()

    with torch.no_grad():
        for (x, y) in dataloader:
            x = x.float().to(device)
            y = y.long().to(device)

            (y_prime, loss) = _evaluate_one_batch(x, y, model, loss_fn)

            correct_item_count += (y_prime.argmax(1) == y).sum().item()
            # Weight the batch-mean loss by the batch size.
            loss_sum += loss.item() * len(x)
            item_count += len(x)

    average_loss = loss_sum / item_count
    accuracy = correct_item_count / item_count

    return (average_loss, accuracy)
```

TensorFlow:

```
from typing import Tuple

import tensorflow as tf


def _evaluate(dataset: tf.data.Dataset, model: tf.keras.Model,
              loss_fn: tf.keras.losses.Loss) -> Tuple[float, float]:
    """Evaluates the given model for the whole dataset once."""
    loss_sum = 0
    correct_item_count = 0
    item_count = 0

    for (x, y) in dataset:
        # Match the model's float32 variables; argmax returns int64 indices.
        x = tf.cast(x, tf.float32)
        y = tf.cast(y, tf.int64)

        (y_prime, loss) = _evaluate_one_batch(x, y, model, loss_fn)

        correct_item_count += (tf.math.argmax(
            y_prime, axis=1).numpy() == y.numpy()).sum()
        # Weight the batch-mean loss by the batch size.
        loss_sum += loss.numpy() * len(x)
        item_count += len(x)

    average_loss = loss_sum / item_count
    accuracy = correct_item_count / item_count

    return (average_loss, accuracy)
```

Now that we have code that trains and evaluates our model on the entire dataset, let’s call it. We want to feed the full dataset into the neural network several times for training (one pass per epoch), and just once for evaluation. The code below shows how to do that using PyTorch, TensorFlow, and Keras. The PyTorch and TensorFlow versions call the `_fit(...)` and `_evaluate(...)` functions presented earlier. The Keras version instead uses the built-in methods `model.fit` and `model.evaluate`. If you plan on using the usual training and evaluation loops, Keras saves you from writing a lot of code, since you won’t need to provide the `_fit(...)`, `_fit_one_batch(...)`, `_evaluate(...)`, and `_evaluate_one_batch(...)` functions.

PyTorch:

```
import torch
from torch import nn

# NeuralNetwork, _get_data, _fit, _evaluate, and WEIGHTS_PATH are defined
# elsewhere in the project.


def training_phase(device: str):
    """Trains the model for a number of epochs, and saves it."""
    learning_rate = 0.1
    batch_size = 64
    epochs = 5

    (train_dataloader, test_dataloader) = _get_data(batch_size)

    model = NeuralNetwork()

    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

    print('\n***Training***')
    for epoch in range(epochs):
        print(f'\nEpoch {epoch + 1}\n-------------------------------')
        (train_loss, train_accuracy) = _fit(device, train_dataloader, model,
                                            loss_fn, optimizer)
        print(f'Train loss: {train_loss:>8f}, ' +
              f'train accuracy: {train_accuracy * 100:>0.1f}%')

    print('\n***Evaluating***')
    (test_loss, test_accuracy) = _evaluate(device, test_dataloader, model,
                                           loss_fn)
    print(f'Test loss: {test_loss:>8f}, ' +
          f'test accuracy: {test_accuracy * 100:>0.1f}%')

    torch.save(model.state_dict(), WEIGHTS_PATH)
```

TensorFlow:

```
import time

import tensorflow as tf

# NeuralNetwork, _get_data, _fit, _evaluate, and WEIGHTS_PATH are defined
# elsewhere in the project.


def training_phase():
    """Trains the model for a number of epochs, and saves it."""
    learning_rate = 0.1
    batch_size = 64
    epochs = 5

    (train_dataset, test_dataset) = _get_data(batch_size)

    model = NeuralNetwork()

    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    optimizer = tf.optimizers.SGD(learning_rate)

    print('\n***Training***')
    t_begin = time.time()
    for epoch in range(epochs):
        print(f'\nEpoch {epoch + 1}\n-------------------------------')
        (train_loss, train_accuracy) = _fit(train_dataset, model, loss_fn,
                                            optimizer)
        print(f'Train loss: {train_loss:>8f}, ' +
              f'train accuracy: {train_accuracy * 100:>0.1f}%')
    t_elapsed = time.time() - t_begin
    print(f'\nTime per epoch: {t_elapsed / epochs:>.3f} sec')

    print('\n***Evaluating***')
    (test_loss, test_accuracy) = _evaluate(test_dataset, model, loss_fn)
    print(f'Test loss: {test_loss:>8f}, ' +
          f'test accuracy: {test_accuracy * 100:>0.1f}%')

    model.save_weights(WEIGHTS_PATH)
```

Keras:

```
import tensorflow as tf

# NeuralNetwork, _get_data, and WEIGHTS_PATH are defined elsewhere in the project.


def training_phase():
    """Trains the model for a number of epochs, and saves it."""
    learning_rate = 0.1
    batch_size = 64
    epochs = 5

    (train_dataset, test_dataset) = _get_data(batch_size)

    model = NeuralNetwork()

    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    optimizer = tf.keras.optimizers.SGD(learning_rate)
    metrics = ['accuracy']
    model.compile(optimizer=optimizer, loss=loss_fn, metrics=metrics)

    print('\n***Training***')
    model.fit(train_dataset, epochs=epochs)

    print('\n***Evaluating***')
    (test_loss, test_accuracy) = model.evaluate(test_dataset)
    print(f'Test loss: {test_loss:>8f}, ' +
          f'test accuracy: {test_accuracy * 100:>0.1f}%')

    model.save(WEIGHTS_PATH)
```

As you can see below, the code to make a prediction is very similar in PyTorch and TensorFlow. The main difference is the additional `torch.no_grad()` context in the PyTorch code, which we’ve already covered. Also, just like before, we annotate the TensorFlow function with `@tf.function` when we’re done debugging, to get the benefits of graph execution.

PyTorch:

```
import numpy as np
import torch
from torch import nn


def _predict(model: nn.Module, x: np.ndarray, device: str) -> np.ndarray:
    """Makes a prediction for input x."""
    model.to(device)
    model.eval()

    x = torch.from_numpy(x).float().to(device)

    with torch.no_grad():
        y_prime = model(x)
        probabilities = nn.functional.softmax(y_prime, dim=1)
        predicted_indices = probabilities.argmax(1)

    return predicted_indices.cpu().numpy()
```

TensorFlow:

```
import numpy as np
import tensorflow as tf


@tf.function
def _predict(model: tf.keras.Model, x: np.ndarray) -> tf.Tensor:
    """Makes a prediction for input x."""
    y_prime = model(x, training=False)
    probabilities = tf.nn.softmax(y_prime, axis=1)
    predicted_indices = tf.math.argmax(input=probabilities, axis=1)
    return predicted_indices
```
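Both `_predict` functions convert the logits into probabilities with softmax before taking the argmax. Within a row, softmax divides every exponentiated logit by the same sum, so it preserves the ordering of the logits. The predicted index is therefore the same as the argmax of the raw logits, and the softmax step matters only if you also want the probabilities. A small NumPy check on made-up logits:

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


logits = np.array([[1.0, 3.0, -2.0],
                   [0.5, 0.1, 0.2]])
probabilities = softmax(logits)
print(probabilities.sum(axis=1))  # each row sums to 1 (up to rounding)
print(probabilities.argmax(axis=1), logits.argmax(axis=1))  # [1 0] [1 0]
```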

In addition, you’ll notice that the PyTorch code contains a `model.eval()` call, and that the TensorFlow code passes `training=False` to the model. Both tell the corresponding models to run in inference mode, which changes the behavior of some layers (for example, dropout and batch normalization layers).
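As an illustration of why the mode matters, here is a NumPy sketch of inverted dropout. This is the variant that both `torch.nn.Dropout` and `tf.keras.layers.Dropout` document: during training, each unit is zeroed with probability `rate` and the survivors are scaled by 1 / (1 − rate); in inference mode, the layer passes its input through unchanged. The function below is our own sketch, not either framework's implementation.

```python
import numpy as np


def dropout(x: np.ndarray, rate: float, training: bool,
            rng: np.random.Generator) -> np.ndarray:
    """Inverted dropout sketch: drops and rescales units during training only."""
    if not training or rate == 0.0:
        return x  # inference mode: identity
    keep = rng.random(x.shape) >= rate  # keep each unit with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(seed=0)
activations = np.ones((2, 8))
eval_output = dropout(activations, 0.5, training=False, rng=rng)
train_output = dropout(activations, 0.5, training=True, rng=rng)
print(eval_output)   # all ones: inference mode leaves inputs unchanged
print(train_output)  # a mix of 0.0 and 2.0 (survivors are scaled by 2)
```

The rescaling keeps the expected value of each unit the same in both modes, which is why no extra correction is needed at inference time.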

The PyTorch and TensorFlow versions of our `inference_phase` function can now obtain a predicted label by passing an image of an ankle boot to the appropriate version of the `_predict(...)` function above.

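PyTorch:

```
import numpy as np
import torch
from PIL import Image

# NeuralNetwork, _predict, labels_map, WEIGHTS_PATH, and IMAGE_PATH are
# defined elsewhere in the project.


def inference_phase(device: str):
    """Makes a prediction for a local image."""
    print('\n***Predicting***')

    model = NeuralNetwork()
    model.load_state_dict(torch.load(WEIGHTS_PATH))

    with Image.open(IMAGE_PATH) as image:
        # Scale the 28x28 grayscale image to [0, 1], as during training.
        x = np.asarray(image).reshape((-1, 28, 28)) / 255.0

    predicted_index = _predict(model, x, device)[0]
    predicted_class = labels_map[predicted_index]

    print(f'Predicted class: {predicted_class}')
```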

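TensorFlow:

```
import numpy as np
from PIL import Image

# NeuralNetwork, _predict, labels_map, WEIGHTS_PATH, and IMAGE_PATH are
# defined elsewhere in the project.


def inference_phase():
    """Makes a prediction for a local image."""
    print('\n***Predicting***')

    model = NeuralNetwork()
    model.load_weights(WEIGHTS_PATH)

    with Image.open(IMAGE_PATH) as image:
        # Scale the 28x28 grayscale image to [0, 1], as during training.
        x = np.asarray(image).reshape((-1, 28, 28)) / 255.0

    predicted_index = _predict(model, x).numpy()[0]
    predicted_name = labels_map[predicted_index]

    print(f'Predicted class: {predicted_name}')
```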

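Keras:

```
import numpy as np
import tensorflow as tf
from PIL import Image

# labels_map, WEIGHTS_PATH, and IMAGE_PATH are defined elsewhere in the project.


def inference_phase():
    """Makes a prediction for a local image."""
    print('\n***Predicting***')

    model = tf.keras.models.load_model(WEIGHTS_PATH)

    with Image.open(IMAGE_PATH) as image:
        # Scale the 28x28 grayscale image to [0, 1], as during training.
        x = np.asarray(image).reshape((-1, 28, 28)) / 255.0

    predicted_index = np.argmax(model.predict(x))
    predicted_name = labels_map[predicted_index]

    print(f'Predicted class: {predicted_name}')
```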

In contrast with TensorFlow and PyTorch, the Keras version calls the built-in `model.predict` method instead of a custom-written `_predict(...)` function. Once again, if our scenario can be accomplished using the standard prediction routine, Keras requires less code.

In this post, you saw how to accomplish the same training scenario in PyTorch, TensorFlow, and Keras. The PyTorch, Keras, and TensorFlow code for this project can be found on GitHub.

Historically, PyTorch and TensorFlow have had very distinct strengths, with PyTorch being much easier to debug, and TensorFlow having much better production deployment capabilities. For this reason, PyTorch is more popular in academic settings, and TensorFlow takes the lead in the industry. But in recent years the functionality gap has been reduced, with TensorFlow adding a debug-friendly eager execution mode and PyTorch improving its deployment offerings with TorchScript. I’m curious to see if and how the balance will shift in light of these new developments, and what the future will bring to both frameworks.