Region of interest pooling in TensorFlow – example

In the previous post we explained what region of interest pooling (RoI pooling for short) is. In this one, we present an example of applying RoI pooling in TensorFlow. We base it on our custom RoI pooling TensorFlow operation. We also use Neptune to track the performance of our experiments.

Example overview

Our goal is to detect cars in the images. We’d like to construct a network that is able to automatically draw a box around every car.

In our example we deal with car images from the Pascal VOC 2007 dataset. For simplicity we choose only cars not marked as truncated.
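
As a rough illustration of this filtering step, the sketch below reads an annotation in the Pascal VOC XML layout and keeps only the boxes of cars whose truncated flag is 0. The sample XML is hand-written for the example and the function name untruncated_car_boxes is hypothetical; this is not the code from the repository.

```python
import xml.etree.ElementTree as ET

# Hand-written annotation in the Pascal VOC XML layout (illustrative values only).
SAMPLE_ANNOTATION = """
<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>car</name>
    <truncated>0</truncated>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>car</name>
    <truncated>1</truncated>
    <bndbox><xmin>1</xmin><ymin>10</ymin><xmax>60</xmax><ymax>90</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <truncated>0</truncated>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>
"""


def untruncated_car_boxes(xml_text):
    """Return [xmin, ymin, xmax, ymax] for every car not marked as truncated."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        if obj.findtext("name") != "car":
            continue  # skip other categories
        if obj.findtext("truncated", default="0").strip() != "0":
            continue  # skip truncated cars
        bndbox = obj.find("bndbox")
        boxes.append([int(bndbox.findtext(tag))
                      for tag in ("xmin", "ymin", "xmax", "ymax")])
    return boxes


print(untruncated_car_boxes(SAMPLE_ANNOTATION))  # [[48, 240, 195, 371]]
```

Only the first object survives: the second car is truncated and the third object is not a car.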

Exemplary images from Pascal VOC 2007 dataset

We manage our experiment using Neptune. It’s a pretty handy tool:

  • We track the tuning in real time. In particular, we can preview the currently estimated bounding boxes.
  • We can change model hyperparameters on the fly.
  • We can easily integrate Neptune with TensorFlow and get all the charts, graphs and summary objects from the TensorFlow graph.
  • We store the executed experiments in an aesthetic list.

Network architecture

In our example we use the Fast R-CNN architecture.

The network has two inputs:

  1. Batch of images
  2. Batch of potential bounding boxes – RoI proposals
    In the Fast R-CNN model RoI proposals are generated via an external algorithm, for example selective search. In our example, we take ground truth bounding boxes from the Pascal annotations and generate more negative bounding boxes ourselves.
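
A minimal sketch of how such proposals could be generated from ground-truth boxes is shown below. It assumes 32 slightly jittered positives and 96 shifted negatives per image and a feature map stride of 16, the values discussed in the comments under this post. The function make_roi_proposals and its parameters are hypothetical names, and the sketch neither clips boxes to the image nor checks how much a negative overlaps a ground-truth box.

```python
import numpy as np


def make_roi_proposals(gt_boxes, num_pos=32, num_neg=96, stride=16, seed=0):
    """Build positive and negative RoIs in feature-map coordinates.

    gt_boxes: array-like of [x1, y1, x2, y2] boxes in image pixels.
    Returns (positives, negatives, index of the ground-truth box behind each positive).
    """
    rng = np.random.default_rng(seed)
    gt_boxes = np.asarray(gt_boxes, dtype=np.float64)

    # Positives: a random ground-truth box moved by a few pixels.
    pos_idx = rng.integers(0, len(gt_boxes), size=num_pos)
    jitter = rng.integers(-10, 10, size=(num_pos, 2))
    positives = gt_boxes[pos_idx] + np.tile(jitter, 2)  # same shift for both corners

    # Negatives: a random ground-truth box moved far away and stretched.
    neg_idx = rng.integers(0, len(gt_boxes), size=num_neg)
    shift = rng.integers(50, 300, size=(num_neg, 2))
    negatives = gt_boxes[neg_idx] + np.hstack([shift, shift + [100, 75]])

    # Divide by the stride to express the boxes on the feature map.
    return positives / stride, negatives / stride, pos_idx


positives, negatives, matched_gt = make_roi_proposals([[160, 160, 320, 320]])
print(positives.shape, negatives.shape)  # (32, 4) (96, 4)
```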

The network has two outputs:

  1. Batch of RoI proposals not classified as background (with corrected coordinates)
  2. Probabilities that each RoI proposal contains an object of each category

The network consists of three main parts:

  1. Deep convolutional neural network
    • Input: images
    • Output: feature map

    We use the popular VGG16 network pretrained on the ImageNet dataset.

  2. RoI pooling layer
    • Input: feature map, RoI proposals rescaled to feature map coordinates
    • Output: max-pooled RoI proposals
  3. Fully connected layer with RoI features
    • Input: max-pooled RoI proposals
    • Output: corrected RoI proposals, probabilities
Fast R-CNN architecture
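
To make the middle part more concrete, here is a small NumPy sketch of what an RoI pooling layer computes for a single proposal. It is a stand-in for illustration, not our custom TensorFlow operation: the function name roi_max_pool, the 2x2 output size, the inclusive box convention, and the stride of 16 used to map pixels to feature map cells are all assumptions of the example.

```python
import numpy as np


def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one RoI of an (H, W, C) feature map into an (out_h, out_w, C) grid.

    roi = (x1, y1, x2, y2) in feature-map coordinates, both corners inclusive.
    """
    x1, y1, x2, y2 = (int(v) for v in roi)
    region = feature_map[y1:y2 + 1, x1:x2 + 1]
    h, w, c = region.shape

    # Split rows and columns into out_h x out_w bins of nearly equal size.
    row_edges = np.linspace(0, h, out_h + 1).astype(int)
    col_edges = np.linspace(0, w, out_w + 1).astype(int)

    pooled = np.empty((out_h, out_w, c), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Every bin covers at least one cell, even for very small RoIs.
            r0, r1 = row_edges[i], max(row_edges[i + 1], row_edges[i] + 1)
            c0, c1 = col_edges[j], max(col_edges[j + 1], col_edges[j] + 1)
            pooled[i, j] = region[r0:r1, c0:c1].max(axis=(0, 1))
    return pooled


feature_map = np.arange(64).reshape(8, 8, 1)  # toy 8x8 map with one channel
image_box = np.array([0, 0, 63, 63])          # box in image pixels
roi = image_box // 16                         # -> [0, 0, 3, 3] on the feature map
print(roi_max_pool(feature_map, roi)[..., 0])
# [[ 9 11]
#  [25 27]]
```

Whatever the size of the proposal, the output has the same fixed shape, which is what lets the fully connected part accept proposals of arbitrary size.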

Note that our detection task can also be solved with the Faster R-CNN architecture, which works significantly faster :). However, implementing Faster R-CNN requires much more code, so we chose the simpler Fast R-CNN.

Loss function

We tune the network to minimize the loss given by

\[loss = \frac 1n\sum_{i=1}^n \frac 1{k_i} \sum_{j=1}^{k_i} loss_{ij}\]


where:

  • \(n\) is the number of images in a batch,
  • \(k_i\) is the number of RoI proposals for image \(i\),
  • \(loss_{ij}\) is the loss for RoI proposal \(j\) of image \(i\).
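
The averaging in this formula can be sketched in a few lines of NumPy. The function name batch_loss and the sample numbers are made up for the illustration; the per-proposal losses are simply assumed to be given.

```python
import numpy as np


def batch_loss(per_roi_losses):
    """per_roi_losses[i] holds the k_i values loss_ij for image i."""
    # Inner average: mean over the proposals of each image.
    per_image_means = [np.mean(losses) for losses in per_roi_losses]
    # Outer average: mean over the images in the batch.
    return float(np.mean(per_image_means))


# Two images: the first with 2 proposals, the second with 4.
losses = [[0.2, 0.4], [1.0, 2.0, 3.0, 6.0]]
print(batch_loss(losses))  # approximately (0.3 + 3.0) / 2 = 1.65
```

Because each image is averaged first, every image contributes equally to the batch loss no matter how many proposals it has. Averaging over all six proposals at once would give 2.1 instead.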

For a single RoI proposal, \(loss_{ij}\) is the sum of the classification and regression loss, where:

  • classification loss is the common cross entropy,
  • regression loss is a smooth L1 distance between the rescaled coordinates of an RoI proposal and the ground-truth box. The regression loss is computed only if the ground-truth box is not categorized as background; otherwise it’s defined as 0.
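
The loss of a single proposal could be sketched as follows. This is a simplified NumPy illustration rather than the code from our TensorFlow graph: the names smooth_l1 and roi_loss are hypothetical, background is assumed to be class 0, and smooth L1 uses the usual threshold of 1 from the Fast R-CNN paper.

```python
import numpy as np


def smooth_l1(diff):
    """Elementwise smooth L1: quadratic near zero, linear for large differences."""
    abs_diff = np.abs(diff)
    return np.where(abs_diff < 1.0, 0.5 * abs_diff ** 2, abs_diff - 0.5)


def roi_loss(class_probs, true_class, pred_box, gt_box, background_class=0):
    """Classification loss plus regression loss for one RoI proposal."""
    # Cross entropy for a single example: minus the log-probability of the true class.
    classification = -np.log(class_probs[true_class])
    if true_class == background_class:
        regression = 0.0  # no box to regress for background
    else:
        diff = np.asarray(pred_box, dtype=float) - np.asarray(gt_box, dtype=float)
        regression = smooth_l1(diff).sum()
    return float(classification + regression)


probs = np.array([0.5, 0.5])  # [background, car]
print(roi_loss(probs, 0, [0, 0, 1, 1], [5, 5, 5, 5]))      # background: only -log(0.5)
print(roi_loss(probs, 1, [0.5, 0, 0, 2.0], [0, 0, 0, 0]))  # car: -log(0.5) + 0.125 + 1.5
```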

Implementation details


To run the code we provide, you need the following software:


You can download our code from our GitHub repository. It consists of two folders with the following content:

  • main.py: The script to execute.
  • fast_rcnn.py: Builds the TensorFlow graph.
  • trainer.py: Preprocesses data and trains the network.
  • neptune_handler.py: Contains Neptune utilities.
  • config.yaml: Neptune configuration file.
  • get_data.py: Downloads images from Pascal VOC 2007 dataset.
  • vgg16-20160129.tfmodel.torrent: References to weights of the pretrained network.


When we run the main script, it first restores the VGG16 network with the pretrained weights. Then it adds the RoI pooling layer and the fully connected layer. Finally, it begins tuning the entire network using the provided images and RoI proposals. It also sends information to Neptune, so we can track the tuning progress in real time.

After cloning the repository, please download the file vgg16-20160129.tfmodel referred to by the torrent file vgg16-20160129.tfmodel.torrent and save it in the data directory. Also, please run the script to download the needed images:

Let’s test our RoI pooling in TensorFlow!

We run the script from the code folder by typing:

If we want to also use a non-default learning rate value or the number of epochs, we can add:

to the command at the end.

After a while, we can start observing the tuning progress in Neptune:

Tracking the network tuning in Neptune

Moreover, we can display the RoIs fitted to the cars by our network. We could simply load all the processed images, but that would consume a lot of resources. That’s why we decided to make this feature something you switch on with a simple Neptune action.

To do that, we can go to the Actions tab and click ‘RUN’ to start sending the images.

Turning on the image sending

After that, we can go to the Channels tab and expand the channels ‘region proposals for RoI pooling’ and ‘network detections’ by clicking the ‘+’ signs.

Expanding image channels

Now we can see the RoIs in real time!

RoI proposals preview in Neptune

We can click on the pictures to zoom in on them. If we want Neptune to stop sending new images, we go to the Actions tab and click ‘RUN’ again.

An exemplary NeptuneGo execution of our script can be found here.


We hope you enjoy our example of RoI pooling in TensorFlow and the experiment management features offered by Neptune. If you want to comment on our work, don’t hesitate to leave us feedback!


7 replies
  1. Atila Orhon says:

    Great stuff, thank you! Do you have any plans on implementing ROIalign from MaskRCNN paper? Kaiming He and his FAIR group did not release their code yet.

    • Blazej Osinski says:

      Hi Atila!
      Thanks for the comment. At the moment Fast and Faster RCNN are sufficient for our needs. We haven’t investigated MaskRCNN thoroughly yet.

      Have you encountered a problem where the performance of these algorithms wasn’t sufficient? What were the characteristics of the data?

  2. Fábio Uechi says:

    Hi, first of all thanks for the article and sample code.
    I’m trying to run the sample using my own dataset.
    I could not figure out the logic behind generate_positive_roi and generate_negative_roi methods (both in the Trainer class).

    Why 32 (positive) and 96 (negative)?
    Why divide by 16?

    def generate_positive_roi(self, gt_boxes):
        res = []
        num_ = []
        for j in range(32):
            epsilon1 = np.random.randint(-10, 10)
            epsilon2 = np.random.randint(-10, 10)
            num_box = np.random.randint(0, len(gt_boxes))
            bxes = gt_boxes[num_box] + np.asarray([epsilon1, epsilon2, epsilon1, epsilon2])
            bxes /= 16
        return np.asarray(res), np.asarray(num_)

    def generate_negative_roi(self, gt_boxes):
        res = []
        for j in range(96):
            epsilon1 = np.random.randint(50, 300)
            epsilon2 = np.random.randint(50, 300)
            num_box = np.random.randint(0, len(gt_boxes))
            bxes = gt_boxes[num_box] + np.asarray([epsilon1, epsilon2, epsilon1 + 100, epsilon2 + 75])
            bxes /= 16
        return np.asarray(res)

    • Krzysztof Dziedzic says:

      Hi Fabio,

      Thanks for the question.
      There are 32 positives and 96 negatives because this ratio of positives to negatives (1:3) was proposed by the authors of the Fast R-CNN paper (the reference is contained in the post).
      We need to divide the box coordinates by 16 because they are used at the RoI pooling layer. Before reaching this layer, the input image is downsampled by a factor of 16 (there are four max pooling layers before the RoI pooling layer, each halving the resolution, and 2^4 = 16), so all coordinates shrink by a factor of 16.
      Don’t hesitate to ask if you have any more questions.

      • Himanshu Rai says:

        So, extending the question “mapping from the input ROI to the feature map ROI done” from the previous post, all that needs to be done is to divide by a factor (16 here)? For example, if the original RoI coordinate was (16, 16), will it correspond to (1, 1) in the feature map? Thanks!

      • Karthik Suresh says:

        Hi, thank you for a great blog post. Can you explain why you are adding random epsilon 1 and epsilon 2 before dividing it by downsampling ratio (16)? Also, why are these epsilon values different for positive and negative samples?

        Thank you

