# Region of interest pooling in TensorFlow – example

In the previous post we explained what region of interest pooling (RoI pooling for short) is. In this one, we present an example of applying RoI pooling in TensorFlow, based on our custom RoI pooling TensorFlow operation. We also use Neptune to track the experiment's performance.

## Example overview

Our goal is to detect cars in the images. We’d like to construct a network that is able to automatically draw a box around every car.

In our example we deal with car images from the Pascal VOC 2007 dataset. For simplicity we choose only cars not marked as truncated.
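Selecting those annotations can be sketched with the standard library (a minimal illustration assuming the usual Pascal VOC XML layout; the function name is our own):

```python
import xml.etree.ElementTree as ET

def non_truncated_car_boxes(annotation_path):
    """Return [xmin, ymin, xmax, ymax] boxes for every non-truncated car
    in a single Pascal VOC 2007 annotation file."""
    root = ET.parse(annotation_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        if obj.findtext("name") != "car":
            continue                      # keep only cars
        if obj.findtext("truncated") == "1":
            continue                      # skip truncated objects
        bb = obj.find("bndbox")
        boxes.append([int(float(bb.findtext(tag)))
                      for tag in ("xmin", "ymin", "xmax", "ymax")])
    return boxes
```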


## Neptune

We manage our experiment using Neptune. It’s a pretty handy tool:

• We track the tuning in real time. In particular, we can preview the currently estimated bounding boxes.
• We can change model hyperparameters on the fly.
• We can easily integrate Neptune with TensorFlow and get all the charts, graphs and summary objects from the TensorFlow graph.
• We store the executed experiments in a tidy list.

## Network architecture

In our example we use the Fast R-CNN architecture.

The network has two inputs:

1. Batch of images
2. Batch of potential bounding boxes – RoI proposals

In the Fast R-CNN model, RoI proposals are generated by an external algorithm, for example selective search. In our example, we take ground-truth bounding boxes from the Pascal annotations and generate additional negative bounding boxes ourselves.

The network has two outputs:

1. Batch of RoI proposals not classified as background (with corrected coordinates)
2. Probabilities that each RoI proposal contains an object of each of the considered categories

The network consists of three main parts:

1. Deep convolutional neural network
• Input: images
• Output: feature map

We use the popular VGG16 network pretrained on the ImageNet dataset.

2. RoI pooling layer
• Input: feature map, RoI proposals rescaled to feature-map coordinates
• Output: max-pooled RoI proposals
3. Fully connected layer with RoI features
• Input: max-pooled RoI proposals
• Output: corrected RoI proposals, probabilities
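The RoI pooling step above can be sketched in plain NumPy (this is our illustration, not the custom TensorFlow op from the repository; we assume the box is already in feature-map coordinates):

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one RoI (x1, y1, x2, y2 in feature-map coordinates)
    into a fixed output_size grid, as the RoI pooling layer does."""
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2 + 1, x1:x2 + 1]   # crop the proposal
    # Split the region into an output_size grid of (roughly equal) cells.
    h_steps = np.linspace(0, region.shape[0], output_size[0] + 1, dtype=int)
    w_steps = np.linspace(0, region.shape[1], output_size[1] + 1, dtype=int)
    out = np.zeros(output_size + feature_map.shape[2:])
    for i in range(output_size[0]):
        for j in range(output_size[1]):
            # Guarantee each cell contains at least one element.
            cell = region[h_steps[i]:max(h_steps[i + 1], h_steps[i] + 1),
                          w_steps[j]:max(w_steps[j + 1], w_steps[j] + 1)]
            out[i, j] = cell.max(axis=(0, 1))
    return out
```

For a 8×8 feature map and the RoI covering it entirely, a 2×2 output simply keeps the maximum of each quadrant.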

We note that our detection task could also be solved with the Faster R-CNN architecture, which runs significantly faster :). However, Faster R-CNN requires much more code to implement, so we chose the simpler Fast R-CNN.

### Loss function

We tune the network to minimize the loss given by

$$loss = \frac{1}{n}\sum_{i=1}^n \frac{1}{k_i} \sum_{j=1}^{k_i} loss_{ij}$$

where:

• $$n$$ is the number of images in a batch,
• $$k_i$$ is the number of RoI proposals for image $$i$$,
• $$loss_{ij}$$ is the loss for RoI proposal $$j$$ of image $$i$$.

For a single RoI proposal, $$loss_{ij}$$ is the sum of the classification and regression loss, where:

• classification loss is the common cross entropy,
• regression loss is the smooth L1 distance between the rescaled coordinates of the RoI proposal and the ground-truth box. It is computed only if the ground-truth box is not categorized as background; otherwise it’s defined as 0.
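The per-proposal term $$loss_{ij}$$ can be sketched as follows (a minimal illustration; the function and variable names are our own, and we use the smooth-L1 threshold of 1 from the Fast R-CNN paper):

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear elsewhere."""
    x = np.abs(x)
    return np.where(x < 1, 0.5 * x ** 2, x - 0.5)

def roi_loss(class_probs, true_class, predicted_box, gt_box, background=0):
    """Classification (cross entropy) + regression (smooth L1) loss
    for one RoI proposal; the regression term is 0 for background."""
    cls_loss = -np.log(class_probs[true_class])
    if true_class == background:
        return cls_loss
    diff = np.asarray(predicted_box) - np.asarray(gt_box)
    return cls_loss + smooth_l1(diff).sum()
```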

## Implementation details

### Prerequisites

To run the code we provide, you need the following software:

### Repository

You can download our code from our GitHub repository. It consists of two folders with the following content:

The `code` folder:

| File | Purpose |
| --- | --- |
| main.py | The script to execute. |
| fast_rcnn.py | Builds the TensorFlow graph. |
| trainer.py | Preprocesses data and trains the network. |
| neptune_handler.py | Contains Neptune utilities. |
| config.yaml | Neptune configuration file. |

The `data` folder:

| File | Purpose |
| --- | --- |
| vgg16-20160129.tfmodel.torrent | References to weights of the pretrained network. |

### Description

When we run `main.py`, the script `trainer.py` first restores the VGG16 network with the pretrained weights. Then it adds the RoI pooling layer and the fully connected layer. Finally, it begins tuning the entire network using the provided images and RoI proposals. It also sends information to Neptune, so we can track the tuning progress in real time.

## Let’s test our RoI pooling in TensorFlow!

We run the script `main.py` from the `code` folder by typing:

If we also want to use a non-default learning rate or number of epochs, we can append the corresponding options to the end of the command.

After a while, we can start observing the tuning progress in Neptune:

Moreover, we can display the RoIs fitted to the cars by our network. We could simply load all the processed images, but that would consume a lot of resources. That’s why we decided to trigger this feature with a simple Neptune action.

To do that, we can go to the Actions tab and click ‘RUN’ to start sending the images.

After that, we can go to the Channels tab and expand the channels ‘region proposals for RoI pooling’ and ‘network detections’ by clicking ‘+’ signs.

Now we can see the RoIs in real time!

We can click the pictures to zoom in. If we want Neptune to stop sending new images, we go to the Actions tab and click ‘RUN’ again.

An exemplary NeptuneGo execution of our script can be found here.

Related:  Logo detection and brand visibility analytics - example

## Summary

We hope you enjoyed our example of RoI pooling in TensorFlow and the experiment management features offered by Neptune. If you’d like to comment on our work, don’t hesitate to leave us feedback!


## Comments
1. Atila Orhon says:

Great stuff, thank you! Do you have any plans on implementing ROIalign from MaskRCNN paper? Kaiming He and his FAIR group did not release their code yet.

• Blazej Osinski says:

Hi Atila!
Thanks for the comment. At the moment Fast and Faster RCNN are sufficient for our needs at deepsense.ai. We haven’t investigated MaskRCNN thoroughly yet.

Have you met a problem in which performance of these algorithms wasn’t enough? What were the characteristics of the data?

2. Fábio Uechi says:

Hi, first of all thanks for the article and sample code.
I’m trying to run the sample using my own dataset.
I could not figure out the logic behind generate_positive_roi and generate_negative_roi methods (both in the Trainer class).

Why 32 (positive) and 96 (negative)?
Why divide by 16?

```python
def generate_positive_roi(self, gt_boxes):
    res = []
    num_ = []
    for j in range(32):
        epsilon1 = np.random.randint(-10, 10)
        epsilon2 = np.random.randint(-10, 10)
        num_box = np.random.randint(0, len(gt_boxes))
        bxes = gt_boxes[num_box] + np.asarray([epsilon1, epsilon2, epsilon1, epsilon2])
        bxes /= 16
        res.append(bxes)
        num_.append(num_box)
    return np.asarray(res), np.asarray(num_)

def generate_negative_roi(self, gt_boxes):
    res = []
    for j in range(96):
        epsilon1 = np.random.randint(50, 300)
        epsilon2 = np.random.randint(50, 300)
        num_box = np.random.randint(0, len(gt_boxes))
        bxes = gt_boxes[num_box] + np.asarray([epsilon1, epsilon2, epsilon1 + 100, epsilon2 + 75])
        bxes /= 16
        res.append(bxes)
    return np.asarray(res)
```

• Krzysztof Dziedzic says:

Hi Fabio,

Thanks for the question.
There are 32 positives and 96 negatives because this 1:3 proportion of positives to negatives was proposed by the authors of the Fast R-CNN paper (the reference is contained in the post).
We need to divide the box coordinates by 16 because they are used in the RoI pooling layer. Before reaching this layer, the input image is downsampled 16 times (there are four max pooling layers before the RoI pooling layer), so all coordinates shrink 16 times.
Don’t hesitate to ask if you have any more questions.
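As a quick arithmetic illustration of that mapping (the box values below are our own example):

```python
import numpy as np

# A box in input-image coordinates (x1, y1, x2, y2).
box = np.array([128, 64, 256, 192])

# After four 2x2 max-pooling layers the image is 16x smaller,
# so the same box on the feature map is:
print(box // 16)  # → [ 8  4 16 12]
```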

• Himanshu Rai says:

Hi,

So, extending the question “how is the mapping from the input RoI to the feature-map RoI done” from the previous post, all that needs to be done is divide by a factor (16 here)? For example, if the original RoI coordinate was (16, 16), then on the feature map this will correspond to (1, 1)? Thanks!

• Krzysztof Dziedzic says:

Yes, that’s correct

• Karthik Suresh says:

Hi, thank you for a great blog post. Can you explain why you add random epsilon1 and epsilon2 before dividing by the downsampling ratio (16)? Also, why are these epsilon values different for positive and negative samples?

Thank you

3. Rafay Zia Mir says:

Hi, first of all that’s a great contribution. I need your help.
My question is that I have a region proposal method which generates 4 RoIs for one ground-truth bounding box.
```python
boxes_tr = self.bbox_transform(gt_boxes, im.shape)
for j in range(32):
    reg[j] = boxes_tr[ob_numbers[j]]
```
The lines above are confusing me. Are you trying to generate ground truth for each RoI? Kindly help me: if I have 4 RoIs for 1 ground-truth bbox, then what should I do?