Quality assurance in manufacturing is demanding and expensive, yes, but also absolutely crucial. After all, selling flawed goods results in returns and disappointed customers. Harnessing the power of image recognition and deep learning may significantly reduce the cost of visual quality control while also boosting overall process efficiency.
Traditional quality control is time-consuming. It is manually performed by specialists testing the products for flaws. Yet the process is crucial for business, as product quality is the pillar a brand will stand on. It is also expensive. Electronics industry giant Flex claims that for every 1 dollar it spends creating a product, it lays out 100 more on resolving quality issues.
Since the inception of image recognition software, manufacturers have been able to incorporate IP cameras into the quality control process. Most implementations are based on complex systems of triggers. But with the conditions predefined by programmers, the cameras could spot only a limited number of flaws. While that technology may not yet have been worthy of the title of game changer, it brought the image recognition revolution one step closer.
Deep learning about perfection
Artificial intelligence may enhance a company’s ability to spot flawed products. Instead of embedding complex and lengthy lists of possible flaws into an algorithm, the algorithm learns the product’s features. With a vision of the perfect product, the software can easily spot imperfect ones.
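To make the idea concrete, here is a toy sketch of one common approach: train an autoencoder on images of flawless products only, then flag items whose reconstruction error is unusually high. This illustrates the general principle rather than any particular vendor’s system; the layer sizes and threshold are arbitrary assumptions.

import torch
import torch.nn as nn

# Toy autoencoder, trained elsewhere on defect-free product images only.
autoencoder = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 64, 128), nn.ReLU(),
    nn.Linear(128, 64 * 64), nn.Sigmoid(),
)

def looks_defective(image, threshold=0.02):
    # A product the model cannot reconstruct well deviates from the learned
    # "perfect product" and is flagged for human inspection.
    reconstruction = autoencoder(image).view_as(image)
    error = torch.mean((image - reconstruction) ** 2)
    return error.item() > threshold

print(looks_defective(torch.rand(1, 64 * 64)))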
A great example of how AI combined with vision systems can improve product quality is on display at Fujitsu’s Oyama factory. The Image Recognition System the company uses not only helps it ensure the production of parts of an optimal quality, but also supervises the assembly process. This dual role has markedly boosted the company’s efficiency.
As the company stated, its earlier solution lacked the flexibility today’s fast-moving world demands. Powering it up with AI allowed Fujitsu to quickly adapt the software to new products without the need for time-consuming recalibration. With the AI solution, Fujitsu reduced its development time by 80% while keeping part recognition rates above 97%.
As the solution proved successful, Fujitsu deployed it at all of its production sites.
Visual quality control is also making inroads into the agricultural produce packing arena. One company has recently introduced a high-performance fruit sorting machine that uses computer vision and machine learning to classify skin defects. The operator can teach the sorting platform to distinguish between different types of blemishes and sort the fruit into sophisticated pack grades. The solution combines hardware, software and operational optimization to reduce the complexity of the sorting process.
As automation becomes more widespread and manufacturing more complex, factories will need to employ AI. Self-learning machines ultimately allow the companies forward-thinking enough to use them to reduce operational costs while maintaining the highest quality possible.
However, an out-of-the-box solution is not always the best option. Limited flexibility and lower accuracy are the most significant obstacles companies face. Sometimes building an in-house team of machine learning experts is the best way to gain both the competence and the ability to tailor the right solutions for one’s business. But as building an internal team to design visual quality control is more than challenging, finding a reliable partner to provide that knowledge may be the best option.
Researchers from the deepsense.ai machine learning team, Piotr Miłoś, Błażej Osiński and Henryk Michalewski, together with Łukasz Kaiser from Google Brain’s TensorFlow team optimized infrastructure for reinforcement learning in the Tensor2Tensor project.
The idea behind the improvements was to develop an artificial intelligence capable of imagining and reasoning about the future. Instead of using precise and costly simulators or even more costly real-world data, the new AI spends most of its energy on imagining possible future events. The process of imagining is much less costly than gathering real data. At the same time, a properly trained imagination is a far cry from daydreaming. In fact, it makes it possible to precisely model reality and reason about it hundreds of times faster than would be possible using simulators.
The novelty of the Tensor2Tensor work consists in an implementation of Proximal Policy Optimization that is completely contained in the computation graph. This is the main technical factor behind the lightning-fast imagination.
In the second stage of the project, the researchers from deepsense.ai, the University of Warsaw and Google Brain are focusing on the end-to-end training of a reinforcement learning agent fully inside a computation graph.
One of the steps in the experiment is the implementation of the Proximal Policy Optimization algorithm entirely using TensorFlow atoms. The training will be run on Cloud Tensor Processing Units (TPUs), which are custom Google-designed chips for machine learning. Assuming that a game simulator can be represented as a neural network, we expect that the whole training process can then be kept in the memory of the Cloud TPU.
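For illustration, below is a minimal sketch of the PPO clipped surrogate objective written purely as TensorFlow graph operations. This is not the Tensor2Tensor code; the placeholder shapes and the tiny policy network are assumptions made only to keep the example self-contained.

import tensorflow as tf

# Rollout data produced elsewhere (placeholders with toy shapes).
observations = tf.placeholder(tf.float32, [None, 4])
actions = tf.placeholder(tf.int32, [None])
advantages = tf.placeholder(tf.float32, [None])
old_log_probs = tf.placeholder(tf.float32, [None])

# A tiny policy network standing in for the real agent.
logits = tf.layers.dense(observations, 2)
new_log_probs = -tf.nn.sparse_softmax_cross_entropy_with_logits(labels=actions,
                                                                logits=logits)

# PPO clipped surrogate objective, expressed entirely as graph ops.
epsilon = 0.2
ratio = tf.exp(new_log_probs - old_log_probs)
clipped = tf.clip_by_value(ratio, 1.0 - epsilon, 1.0 + epsilon)
ppo_loss = -tf.reduce_mean(tf.minimum(ratio * advantages, clipped * advantages))
train_op = tf.train.AdamOptimizer(1e-4).minimize(ppo_loss)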
Stay tuned for the results of our project!
Companies pay astonishing amounts of money to sponsor events and raise brand visibility. Calculating the ROI from such sponsorship can be augmented with machine learning-powered tools to deliver more accurate results.
Event sponsoring is a well-established marketing strategy to build brand awareness. Despite being one of the most recognizable brands in the automotive industry, Chevrolet pays $71.4 million each year to put its brand on Manchester United shirts.
How many people does your brand reach?
According to Eventmarketer’s study, 72% of consumers positively view brands that provide them with positive experiences, be it a great sports game or another cultural event, such as a music festival. Such events attract large numbers of viewers both directly and via media reports, allowing brands to get favorable positioning and work on their word-of-mouth recognition.
Sponsorship contracts often come at a steep price, so brand owners are naturally more than a little interested in finding out how effectively their outlays are working for them. However, it’s difficult to assess quantitatively just how great the brand exposure is in a given campaign. The information on brand exposure can further support demand forecasting efforts, as the company gains information on expected demand peaks that result from greater brand exposure in media coverage.
The current approach to computing such statistics has involved manually annotating broadcast material, which is tedious and expensive. To address these problems, we have developed an automated tool for logo detection and visibility analysis that provides both raw detection and a rich set of statistics.
We decided to break the problem down into two steps: logo detection with convolutional neural networks and an analytics module for computing summary statistics.
The main advantage of this approach is that swapping the analytics module for a different one is straightforward. This is essential when different types of statistics are called for, or even if the neural net is to be trained for a completely different task (we had plenty of fun modifying this system to spot and count coins – stay tuned for a future blog post on that).
Logo detection with deep learning
There are two principal approaches to object detection with convolutional neural networks: region-based methods and fully convolutional methods.
Region-based methods, such as R-CNN and its descendants, first identify image regions which are likely to contain objects (region proposals). They then extract these regions and process them individually with an image classifier. This process tends to be quite slow, but can be sped up to some extent with Fast R-CNN, where the image is processed by the convolutional network as a whole and then region representations are extracted from high-level feature maps. Faster R-CNN is a further improvement where region proposals are also computed from high-level CNN features, which accelerates the region proposal step.
Fully convolutional methods, such as SSD, do away with processing individual region proposals and instead aim to output class labels where the region proposal step would be. This approach can be much faster, since there is no need to extract and process region proposals individually. In order to make this work for objects with very different sizes, the SSD network has several detection layers attached to feature maps of different resolutions.
Since real-time video processing is one of the requirements of our system, we decided to go with the SSD method rather than Fast R-CNN. Our network also uses ResNet-50 as its convnet backbone, rather than the default VGG-16. This made it much less memory-hungry, while also helping to stabilize the training process.
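The sketch below illustrates the general shape of such a model: a ResNet-50 backbone with small convolutional detection heads attached to feature maps of two different resolutions. It is only a simplified illustration; the number of heads, anchors per cell, class count and layer choices in our production network differ.

import torch
import torch.nn as nn
import torchvision

class SSDSketch(nn.Module):
    # Simplified SSD-style detector: detection heads on two feature map scales.
    def __init__(self, num_classes=2, anchors_per_cell=6):
        super().__init__()
        resnet = torchvision.models.resnet50(pretrained=False)
        children = list(resnet.children())
        self.stem = nn.Sequential(*children[:-3])   # up to stride 16, 1024 channels
        self.stage5 = children[-3]                  # stride 32, 2048 channels
        out_channels = anchors_per_cell * (num_classes + 4)  # class scores + box offsets
        self.head16 = nn.Conv2d(1024, out_channels, kernel_size=3, padding=1)
        self.head32 = nn.Conv2d(2048, out_channels, kernel_size=3, padding=1)

    def forward(self, x):
        c4 = self.stem(x)
        c5 = self.stage5(c4)
        # Each head is responsible for objects of a different scale.
        return self.head16(c4), self.head32(c5)

small_scale_preds, large_scale_preds = SSDSketch()(torch.randn(1, 3, 300, 300))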
In the process of refining the SSD architecture for our requirements, we ran dozens of experiments. This was an iterative process with a large delay between the start and finish of an experiment (typically 1-2 days). In order to run numerous experiments in parallel, we used Neptune, our machine learning experiment manager. Neptune captures the values of the loss function and other statistics while an experiment is running, displaying them in a friendly web UI. Additionally, it can capture images via image channels and display them, which really helped us troubleshoot the different variations of the data augmentation we tested.
Logo detection analytics
The model we produced generates detections very well. However, when even a short video is analyzed, the raw output can span thousands of lines. To help humans analyze the results, we created software that translates these descriptions into a series of statistics, charts, rankings and visualizations that can be assembled into a concise report.
The statistics are calculated globally and per brand. Some of them, like brand display time, are meant to be displayed, but many are there to fuel the visual representation. Speaking of which, charts are really expressive for this task. Features include brand exposure size over time, heatmaps of a logo’s position on the screen and bar charts that let you easily compare various statistics across brands. Last but not least, we have a module for creating highlights – visualizations of the bounding boxes detected by the model. This module serves a double purpose: in addition to making the analysis easy to track, such visualizations are also a source of valuable information for data scientists tweaking the model.
We processed a short video featuring a competition between rivals Coca-Cola and Pepsi to see which brand received more exposure in quantitative terms. You can watch it on YouTube by following this link. Which logo has better visibility?
Below, you can compare your guesses with what our model reported:
Possible extensions
There are many business problems where object detection can be helpful. Here at deepsense.ai, we have worked on a number of them.
We developed a solution for Nielsen that extracts information about ingredients from photographs of FMCG products, using object detection networks to locate the list of ingredients in photographs of products. This made Nielsen’s data collection more efficient and automatic. In its bid to save the gravely endangered North Atlantic right whale, the NOAA used a related technique to spot whales in aerial photographs. Similar techniques are used when the reinforcement learning-based models behind autonomous vehicles learn to recognize road signs.
With logo detection technology, companies can evaluate a campaign’s ROI by analyzing any media coverage of a sponsored event. With the information on brand positioning in hand, it is easy to calculate the advertising equivalent value or determine the most impactful events to sponsor.
With further extrapolation, companies can monitor the context of media coverage and track whether their brand is shown with positive or negative information, providing even more knowledge for the marketing team.
We’re thrilled today to announce the latest version of Neptune: Machine Learning Lab. This release will allow data scientists using Neptune to take some giant steps forward. Here we take a quick look at each of them.
Cloud support
One of the biggest differences between Neptune 1.x and 2.x is that 2.x supports Google Cloud Platform. If you want to use NVIDIA® Tesla® K80 GPUs to train your deep learning models or Google’s infrastructure for your computations, you can just select your machine type and easily send your computations to the cloud. Of course, you can still run experiments on your own hardware as before. We currently support only GCP, but stay tuned as we will not only be bringing more clouds and GPUs into the Neptune support fold, but offering them at even better prices!
With cloud support, we are also changing our approach to managing data. Neptune uses shared storage to store data about each experiment, both the source code and the results (channel values, logs, output files, e.g. trained models). On top of that, you can upload any data to a project and use it in your experiments. As you execute your experiments, you’ve got all your sources at your fingertips in the /neptune directory, which is available on a fast drive for reading and writing. It is also your current working directory, just as if you were running on your local machine. Alongside this feature, Neptune still keeps your original sources so you can easily reproduce your experiments. For more details, please read the documentation.
Interactive Notebooks
Engineers love how interactive and easy to use Notebooks are, so it should come as no surprise that they’re among the most frequently used data science tools. Neptune now allows you to prototype faster and more easily using Jupyter Notebooks in the cloud, which are fully integrated with Neptune. You can choose from among many environments with different libraries (Keras, TensorFlow, PyTorch, etc.) and Neptune will save your code and outputs automatically.
New Leaderboard
Use Neptune’s new leaderboard to organize data even more easily.
You can change the width of all columns and reorder them by simply dragging and dropping their headings.
You can also edit the name, tags and notes directly in the table and display metadata including running time, worker type, environment, git hash, source code size and md5sum.
The experiments are now presented with their Short ID. This allows you to identify an experiment among those with identical names.
Sometimes you may want to see the same type of data throughout the entire project. You can now fix chosen columns on the left for quick reference as you scroll horizontally through the other sections of the table.
Parameters
Neptune comes with new, lightweight and yet more expressive parameters for experiments.
This means you no longer need to define parameters in configuration files. Instead, you just write them in the command line!
Let’s assume you have a script named main.py and you want to have 2 parameters: x=5 and y=foo. You need to pass them in the neptune send command:
neptune send -- '--x 5 --y foo'
Under the hood, Neptune will run python main.py --x 5 --y foo, so your parameters are placed in sys.argv. You can then parse these arguments using the library of your choice.
An example using argparse:
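A minimal sketch of such a main.py, matching the x and y parameters above, might look like this:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--x', type=int)
parser.add_argument('--y', type=str)
args = parser.parse_args()

print(args.x, args.y)  # prints: 5 foo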
Everybody who has watched ‘Minority Report’, Steven Spielberg’s movie based on Philip K. Dick’s short story, daydreams about crime forecasting in the real world. We have good news: machine learning algorithms can do just that!
In September 2016, the National Institute of Justice in the US announced the Real-Time Crime Forecasting Challenge. The goal was to predict future crimes in the city of Portland, OR. CodiLime, deepsense.ai’s parent company, took part in it, giving the job to our machine learning team. The results were revealed in August 2017: we did a great job and won eight out of 40 sub-competitions! In this post we describe the crime forecasting algorithms we used.
Competition rules
Fortunately, the NIJ didn’t ask contestants to carve names of forthcoming criminals and victims into wooden balls, as was the case in the movie. Instead, they wanted to know the hotspots – small areas with the greatest ‘intensity’ of future crimes.
Three different types of crimes were considered separately: burglary, car theft and street crimes (including assaults, robberies, shots fired). Additionally, all the crimes together were of interest as well.
The end of February 2017 was the deadline and five future timespans were involved:
The first week of March 2017,
The first two weeks of March 2017,
All of March 2017,
March and April 2017,
March, April and May 2017.
Thus, we had to make 4 x 5 = 20 individual crime forecasts for 20 type/time categories (e.g. ‘burglary, two weeks’).
Once May 2017 was over, in each of the 20 type/time categories our hotspot predictions were compared against the actual state of affairs in Portland using two independent metrics:
‘crime density’: number of crimes that occurred in hotspots divided by the total volume of hotspots,
‘prediction efficiency’: the number of crimes that occurred in hotspots divided by the number of crimes in the actual worst regions with the same total volume as our hotspots.
Hence, the competition consisted of 4 x 5 x 2 = 40 separate sub-competitions in total (e.g. ‘burglary, two weeks, crime density’). The winner took it all in each of them, and the ‘all’ was $15,000. So there was $600,000 in the pot – a good motivation to work!
To be clear, three independent clones of the Real-Time Crime Forecasting Challenge were run simultaneously. The one we took part in was intended for large businesses. Of the remaining two, one was run for small businesses and the other for students. Every clone had the same rules and goals, but its own contestants, winners and prizes.
Our solution
Data
In ‘Minority Report’, the Precrime Police unit got their crime forecasts from Precogs, three mutated humans who could see into the future. At deepsense.ai, our Precrime unit created the predictions based on the past.
The organizer delivered historical data covering all the crimes registered in Portland between March 2012 and February 2017. Almost 1,000,000 records were provided in total. Each of them contained the date and time, the place (accurate to within one foot!) and the type of crime committed.
Our first question was: since we have no Precogs on board, can we use anything other than historical data? What could affect future crimes but hadn’t left a trace on those that had already been committed? Well, in our opinion these could only be future events. But are they easier to predict than crimes themselves? For instance, one could page through local newspapers seeking sentences like ‘A new gin mill is going to be opened in March 2017. The crime rate will certainly rise there.’ However, such research requires a lot of work and there is no guarantee it would actually help. So we decided to squeeze as much as we could out of the historical data alone.
Blind contest
No leaderboard was run during the contest. We didn’t know how many competitors we had and how honed their crime forecasting skills were. The only thing we could do to win was improve our own results over and over.
The first attempts showed us that in each of 20 type/time categories the ‘crime density’ metric was maximized by a lot of small hotspots whereas the ‘prediction efficiency’ performed best for a small number of large hotspots. Hence it was clear that we couldn’t satisfy both metrics simultaneously. Since each metric formed an independent sub-competition with a separate prize, it was better to have a good score for one metric than mediocre results for both. So, for each of the 20 type/time categories we had to decide which metric to focus on in our further work.
Which metric to choose when the metrics are incomparable, scores between categories are incomparable and you don’t know other competitors’ results? We checked that under some reasonable assumptions the best strategy is to just toss a coin; and this is what we did, 20 times – once per type/time category.
Bad neighborhoods remain bad
The major rule we followed while building our models was rather pessimistic: ‘if many crimes have occurred somewhere, more are likely to happen.’ This principle may strike some as naive, but the longer we explored the data, the more confident we were that it worked.
Not every past crime is equally important. We took advantage of the aging and seasonality of data. We focused more on data from 2017 and 2016 than on older ones. Also, we boosted the significance of crimes committed in the same season as the forecasting time. For instance, to make predictions for March 2017 we took special care of data from preceding Marches.
Moreover, as we know, evil is prone to ‘radiate’. When a crime is committed, we can expect others to happen nearby. This is why we decided to ‘diffuse’ the data points. For those who like statistical jargon, we note that this technique is called kernel density estimation.
However, we didn’t set the ‘intensities’ of data aging, seasonality and diffusion by hand. They were adjusted by our algorithm automatically. How did it know how to do that, you ask? As always in machine learning, it just chose them to obtain the best results! For each of 20 type/time categories we separated the last period of historical data as a validation dataset (e.g. February 2017 for a forecasting of March 2017). The algorithm used all but validation data to check which parameters best predict crimes from the validation set. Then, ultimately, it took all the available data to prepare the final crime forecasting.
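As a rough illustration of the ‘diffusion’ idea (not our competition code), a Gaussian kernel density estimate can be fitted to past crime locations and evaluated on a grid of candidate cells; the bandwidth plays the role of the diffusion intensity that our algorithm tuned on the validation period. The coordinates and grid below are made up.

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
past_crimes = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(500, 2))  # toy x, y locations

# The bandwidth controls how far each past crime 'radiates' its influence.
kde = gaussian_kde(past_crimes.T, bw_method=0.2)

# Evaluate the estimated crime intensity on a regular grid of candidate cells.
xs, ys = np.meshgrid(np.linspace(-3, 3, 50), np.linspace(-3, 3, 50))
grid = np.vstack([xs.ravel(), ys.ravel()])
intensity = kde(grid)

# Keep the most intense cells as hotspot candidates (112 mimics our final map).
hotspot_cells = np.argsort(intensity)[-112:]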
Neptune
We must say that the Real-Time Crime Forecasting Challenge was also a logistic challenge. We had to manage and improve 40 models simultaneously. To do that we used our own machine learning lab called Neptune. We designed it for precisely this type of task: to easily store, compare and recreate a lot of experiments. To be honest, we can’t imagine how one would handle 40 models without using this tool.
Results
The results were announced in August 2017: in our large-business group we won 8 out of 40 sub-competitions, were the runner-up in 6 more and took third place in yet another 6. This is a big success, but there is something we are especially proud of. We compared crime forecasts from all three clones of the competition: large businesses, small businesses and students, and it turned out that our results would have given us the top place in the overall ranking! Our team finished with the best predictions in seven sub-competitions, three more than the runner-up managed.
Do you want to see one of our winning crime forecasts? Here it is:
The gray area is Portland, around 15 by 20 miles. The 56,000 black dots are all the crimes committed between March and May 2017. The hotspots we chose are blue, but you probably can’t see them, so let’s zoom in on the downtown:
We indicated 112 hotspots, 294 by 213 ft each. They appear to be placed randomly, but they are not; they lie optimally. This is why machine learning algorithms are so fun: it’s hard to deal with their outputs using common sense, but they work!
Needle in a haystack
The total number of crimes in Portland between March and May 2017 – 56,000 – is impressively big. At the opposite pole was another category: during the first week of March 2017 only 20 (twenty) burglaries were committed in the investigated area!
If you think that it is hard to hit 20 random events in a 150 mi² area using hotspots with a total volume of less than ¾ mi² (the organizer’s requirement), you are absolutely right. In our opinion it was a matter of luck. We did indeed hit one burglary, but it wasn’t enough to win this category.
But there was another way. Twenty crimes is so small a number that hypothetically any cheater could simply change history and ensure victory by arranging a burglary or two in fixed places. Of course we didn’t do that, and we think nobody did, since 20-25 is a typical number of weekly burglaries in Portland. Experienced data scientists wouldn’t try this hoax because they’d know that if they weren’t the only ones planning to do so, they wouldn’t benefit from this highly risky move. And, above all, they tend to spend their time doing data science rather than plotting fake crimes – being honest is usually the simpler way for us. However, in the ‘Minority Report’ universe a wooden ball would inform us about any bad intentions. In our world we just believe in people… or we can predict their behavior using machine learning algorithms!
Summary
If you’ve enjoyed our post or want to ask about anything related to crime forecasting (or maybe demand forecasting?), please leave us a reply!
Product recognition is a challenging area that offers great financial promise. Automatically detected product attributes in photos should be easy to monetize, e.g., as a basis for cross-selling and upselling.
However, product recognition is a tough task because the same product can be photographed from different angles, in different lighting, with varying levels of occlusion, etc. Also, different fine-grained product labels, such as ones in royal blue or turquoise, may prove difficult to distinguish visually. Fortunately, properly tuned convolutional neural networks can effectively resolve these problems.
In this post, we discuss our solution for the iMaterialist challenge announced by CVPR and Google and hosted on Kaggle in order to show our approach to product recognition.
The problem
Data and goal
The iMaterialist organizer provided us with hyperlinks to more than 50,000 pictures of shoes, dresses, pants and outerwear. Some tasks were attached to every picture and some labels were matched to every task. Here are some examples:
task              | labels
pants: color      | blue, blue jeans, denim blue, light blue, light, denim
pants: type       | jeans
pants: age        | adult
pants: decoration | men jeans
pants: gender     | men

task            | labels
shoe: color     | dark brown
shoe: up height | kneehigh
pants: color    | black
Our goal was to match a proper label to every task for every picture from the test set. From the machine learning perspective this was a multi-label classification problem.
There were 45 tasks in total (a dozen or so per clothing type) and we had to predict a label for all of them for every picture. However, tasks not attached to a particular test image were skipped during the evaluation. In fact, usually only a few tasks were relevant to a given picture.
Problems with data
There were two main problems with data:
We weren’t given the pictures themselves, but only the hyperlinks. Around 10% of them had expired, so our dataset was significantly smaller than the organizer had intended. Moreover, the hyperlinks were a potential source of a data leak. One could use text-classification techniques to take advantage of leaked features hidden in the hyperlinks, though we opted not to do that.
Some labels with the same meaning were treated by the organizer as different, for example “gray” and “grey”, “camo” and “camouflage”. This introduced noise in the training data and distorted the training itself. Also, we had no choice but to guess if a particular picture from the test set was labeled by the organizer as either “camo” or “camouflage”.
Evaluation
The evaluation score function was the average error over all test pictures and relevant tasks. A score value of 0 meant that all the relevant tasks for all the test pictures were properly labeled, while a score of 1 implied that no relevant task for any picture was labeled correctly. A random sample submission provided by the organizer yielded a score greater than 0.99. Hence we knew that a good result couldn’t be achieved by accident and we would need a model that could actually learn how to solve the problem.
Our solution
A bunch of convolutional neural networks
Our solution consisted of about 20 convolutional neural networks. We used the following architectures in several variants:
All of them were initialized with weights pretrained on the ImageNet dataset. Our models also differed in terms of the data preprocessing (cropping, normalizing, resizing, switching of color channels) and augmentation applied (random flips, rotations, color perturbations from Krizhevsky’s AlexNet paper). All the neural networks were implemented using the PyTorch framework.
Choosing the training loss function
Which loss function to choose for the training stage was one of the major problems we faced. 576 unique task/label pairs occurred in the training data, so the outputs of our networks were 576-dimensional. On the other hand, typically only a few labels were matched to a picture’s tasks. Therefore the ground truth vector was very sparse – only a few of its 576 coordinates were nonzero – so we struggled to choose the right training loss function.
Assume that \((z_1,\ldots,z_{576})\in\mathbb{R}^{576}\) is a model output and
\[y_i=\left\{\begin{array}{ll}1, & \text{if task/label pair }i\text{ matches the picture,}\\ 0, & \text{elsewhere,}\end{array}\right.\quad\text{for } i=1,2,\ldots,576.\]
As this was a multi-label classification problem, choosing the popular crossentropy loss function:
\[\sum_{i=1}^{576}-y_i\log p_i,\quad \text{where } p_i=\frac{\exp(z_i)}{\sum_{j=1}^{576}\exp(z_j)},\]
wouldn’t be a good idea. This loss function tries to distinguish only one class from others.
Also, for the ‘element-wise binary crossentropy’ loss function:
\[\sum_{i=1}^{576}-y_i\log q_i-(1-y_i)\log(1-q_i),\quad \text{where } q_i=\frac{1}{1+\exp(-z_i)},\]
the sparsity caused the models to end up constantly predicting no labels for any picture.
In our solution, we used the ‘weighted element-wise crossentropy’ given by:
\[\sum_{i=1}^{576}-\bigg(\frac{576}{\sum_{j=1}^{576}y_j}\bigg)\cdot y_i\log q_i-(1-y_i)\log(1-q_i),\quad \text{where } q_i=\frac{1}{1+\exp(-z_i)}.\]
This loss function focused the optimization on positive cases.
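A rough PyTorch sketch of such a weighted element-wise cross-entropy is shown below; the actual competition code differs in details such as numerical stabilization and how the weight is computed.

import torch

def weighted_elementwise_crossentropy(logits, targets):
    # logits, targets: tensors of shape (batch, 576); targets contain 0/1 entries.
    q = torch.sigmoid(logits)
    # Up-weight the few positive task/label pairs so the model does not collapse
    # to predicting no labels at all.
    pos_weight = 576.0 / targets.sum(dim=1, keepdim=True).clamp(min=1.0)
    loss = -(pos_weight * targets * torch.log(q + 1e-8)
             + (1.0 - targets) * torch.log(1.0 - q + 1e-8))
    return loss.sum(dim=1).mean()

loss = weighted_elementwise_crossentropy(torch.randn(4, 576),
                                         torch.randint(0, 2, (4, 576)).float())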
Ensembling
Predictions from particular networks were averaged, all with equal weights. Unfortunately, we didn’t have enough time to perform any more sophisticated ensembling techniques, like xgboost ensembling.
Other techniques tested
We also tested other approaches, though they proved less successful:
Training the triplet network and then training xgboost models on features extracted via embedding (different models for different tasks).
Mapping semantically equivalent labels like “gray” and “grey” to a common new label and remapping those to the original ones during postprocessing.
Neptune
We managed all of our experiments using Neptune, deepsense.ai’s Machine Learning Lab. Thanks to that, we were easily able to track the tuning of our models, compare them and recreate them.
Results
We achieved a score of 0.395, which means that we correctly predicted more than 60% of all the labels matched to relevant tasks.
We are pleased with this result, though we could have improved on it significantly if the competition had lasted longer than only one month.
Summary
Challenges like iMaterialist are a good opportunity to create product recognition models. The most important tools and tricks we used in this project were:
Playing with training loss functions. Choosing the proper training loss function was a real breakthrough as it boosted accuracy by over 20%.
A custom training-validation split. The organizer provided us with a ready-made training-validation split. However, we believed we could use more data for training so we prepared our own split with more training data while maintaining sufficient validation data.
Using the PyTorch framework instead of the more popular TensorFlow. TensorFlow doesn’t provide an official repository of pretrained models, whereas PyTorch does. Hence working in PyTorch was more time-efficient. Moreover, we determined empirically that, much to our surprise, the same architectures yielded better results when implemented in PyTorch than in TensorFlow.
We hope you have enjoyed this post and if you have any questions, please don’t hesitate to ask!
In this post, we provide an example of how to run a TensorFlow experiment on a Slurm cluster. Since TensorFlow doesn’t yet officially support this task, we developed a simple Python module for automating the configuration. It parses the environment variables set by Slurm and creates a TensorFlow cluster configuration based on them. We’re sharing this code along with a simple image recognition example on CIFAR-10. You can find it in our github repo.
But first, why do we even need distributed machine learning?
Distributed TensorFlow
When machine learning models are developed, training time is an important factor. Some experiments can take weeks or even months on a single machine. Shortening this time enables us to try out more approaches, test many similar models and use the best one. That’s why it’s useful to use multiple machines for faster training.
One of TensorFlow’s strongest points is that it’s designed to support distributed computation. To use multiple nodes, you just have to create and start a tf.train.Server and use a tf.train.MonitoredTrainingSession.
Between Graph Replication
In our example we’re going to be using a concept called ‘Between Graph Replication’. If you’ve ever run MPI jobs or used the ‘fork’ system call, you’ll be familiar with it.
In Distributed TensorFlow, Between Graph Replication means that when several processes are being run on different machines, each process (worker) runs the same code and constructs the same TensorFlow computational graph. However, each worker uses a discriminator (the worker’s I.D., for example) to execute instructions differently from the rest (e.g. process different batches of the training data).
This information is also used to make processes on some machines work as ‘Parameter Servers’. These jobs don’t actually run any computations – they’re only responsible for storing the weights of the model and sending them over the network to other processes.
Connections between tasks in a distributed TensorFlow job with 3 workers and 2 parameter servers.
Apart from the worker I.D. and the job type (normal worker or parameter server), TensorFlow also needs to know the network addresses of other workers performing the computations. All this information should be passed as configuration for the tf.train.Server. However, keeping track of it all in addition to starting multiple processes on multiple machines with different parameters can be really tedious. That’s why we have cluster managers, such as Slurm.
Slurm
Slurm is a workload manager for Linux used by many of the world’s fastest supercomputers. It provides the means for running computational jobs on multiple nodes, queuing the jobs until sufficient resources are available and monitoring jobs that have been submitted. For more information about Slurm, you can read the official documentation here.
When running a Slurm job you can discover other nodes taking part by examining environment variables:
SLURMD_NODENAME – name of the current node
SLURM_JOB_NODELIST – list of all nodes allocated to the job
SLURM_JOB_NUM_NODES – number of nodes the job is using
Our Python module parses these variables to make using distributed TensorFlow easier. With the tf_config_from_slurm function you can automate this process. Let’s see how it can be used to train a simple CIFAR-10 model on a CPU Slurm cluster.
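To give a feel for what the module does, here is a much simplified sketch of building a cluster spec from those variables. It assumes SLURM_JOB_NODELIST is a plain comma-separated hostname list and picks an arbitrary port; the real helper also expands Slurm’s compressed node-range syntax.

import os

nodes = os.environ['SLURM_JOB_NODELIST'].split(',')   # simplification: no node ranges
me = os.environ['SLURMD_NODENAME']
port = 22222  # arbitrary port chosen for this sketch

# The first node acts as the parameter server, the rest as workers.
cluster = {'ps': ['%s:%d' % (nodes[0], port)],
           'worker': ['%s:%d' % (n, port) for n in nodes[1:]]}
my_job_name = 'ps' if me == nodes[0] else 'worker'
my_task_index = 0 if my_job_name == 'ps' else nodes.index(me) - 1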
Distributed TensorFlow on Slurm
In this section we’re going to show you how to run TensorFlow experiments on Slurm. A complete example of training a convolutional neural network on the CIFAR-10 dataset can be found in our github repo, so you might want to take a look at it. Here we’ll just examine the most interesting parts.
Most of the code responsible for training the model comes from this TensorFlow tutorial. The modifications allow the code to be run in a distributed setting on the CIFAR-10 dataset. Let’s examine the changes one by one.
Starting the Server
import sys
import tensorflow as tf
from tensorflow_on_slurm import tf_config_from_slurm

cluster, my_job_name, my_task_index = tf_config_from_slurm(ps_number=1)
cluster_spec = tf.train.ClusterSpec(cluster)
server = tf.train.Server(server_or_cluster_def=cluster_spec,
                         job_name=my_job_name, task_index=my_task_index)

if my_job_name == 'ps':
    server.join()
    sys.exit(0)
Here we import our Slurm helper module and use it to create and start the tf.train.Server. The tf_config_from_slurm function returns the cluster spec necessary to create the server along with the task name and task index of the current job. The ‘ps_number’ parameter specifies how many parameter servers to set up (we use 1). All other nodes will be working as normal workers and everything gets passed to the tf.train.Server constructor.
Afterwards we immediately check whether the current job is a parameter server. Since all the work in a parameter server (ps) job is handled by the tf.train.Server (which is running in a separate thread), we can just call server.join() and not execute the rest of the script.
Placing the Variables on a parameter server
def weight_variable(shape):
    with tf.device("/job:ps/task:0"):
        initial = tf.truncated_normal(shape, stddev=0.1)
        return tf.Variable(initial)

def bias_variable(shape):
    with tf.device("/job:ps/task:0"):
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial)
These two functions are used when defining the model parameters. Note the “with tf.device(“/job:ps/task:0”)” statements telling TensorFlow that the variables should be placed on the parameter server, thus enabling them to be shared between the workers. The “0” index denotes the I.D. of the parameter server used to store the variable. Here we’re only using one server, so all the variables are placed on task “0”.
Instead of using the usual AdamOptimizer on its own, we wrap it with the SyncReplicasOptimizer. This enables us to prevent the application of stale gradients. In distributed training, network communication may introduce delays that make it harder to train the model.
In distributed settings we’re using the tf.train.MonitoredTrainingSession instead of the usual tf.Session. This ensures the variables are properly initialized. It also allows you to restore a previously saved model and control how the summaries and checkpoints are written to disk.
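Continuing from the snippets above, the optimizer and session setup can look roughly like this; the learning rate and the choice to aggregate gradients from all workers are assumptions for the sketch, not the exact values from our repository.

# Wrap the optimizer so gradients from the workers are aggregated before applying.
opt = tf.train.SyncReplicasOptimizer(tf.train.AdamOptimizer(1e-4),
                                     replicas_to_aggregate=len(cluster['worker']),
                                     total_num_replicas=len(cluster['worker']))
global_step = tf.train.get_or_create_global_step()
train_step = opt.minimize(cross_entropy, global_step=global_step)

is_chief = (my_task_index == 0)
sync_hook = opt.make_session_run_hook(is_chief)

# MonitoredTrainingSession initializes variables and handles checkpoints/summaries.
with tf.train.MonitoredTrainingSession(master=server.target,
                                       is_chief=is_chief,
                                       hooks=[sync_hook]) as sess:
    pass  # the training loop from the next section goes here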
Training
During the training, we split the batches between workers so everyone has their own unique batch subset to train on:
for i in range(max_epoch):
    batch = mnist.train.next_batch(batch_size)
    if i % len(cluster['worker']) != my_task_index:
        continue
    _, train_accuracy, xentropy = sess.run([train_step, accuracy, cross_entropy],
                                           feed_dict={x: batch[0], y_: batch[1],
                                                      keep_prob: 0.5})
Summary
We hope this example was helpful in your experiments with TensorFlow on Slurm clusters. If you’d like to reproduce it or use our Slurm helper module in your experiments, don’t hesitate to clone our github repo.
It all started with mathematics – rigorous thinking, science, technology. Today’s world is maths‑driven. Despite recent advances in deep learning, the way mathematics is done today is still much the same as it was 100 years ago. Isn’t it time for a change?
Introduction
Mathematics is at the core of science and technology. However, the growing amount of mathematical research makes it impossible for non-experts to fully use the developments made in pure mathematics. Research has become more complicated and more interdependent.
Moreover, it is often impossible for non-experts to verify correctness – knowledge is accepted as knowledge by a small group of experts (e.g. the problem with accepting Mochizuki’s proof of the abc conjecture – it is not understandable even to other experts).
Fig. 1: The graph on the left shows the growing number of submissions to arXiv – an Internet repository for scientific research. In 2012, mathematics accounted for approx. 20,000 submissions annually.
Automated reasoning
To address the issue mentioned above, researchers try to automate or semi‑automate:
Producing mathematics
Verifying existing mathematics
This domain of science is called automatic theorem proving and is a part of automated reasoning. The current approach to automation is:
Take a mathematical work (e.g. Feit‑Thompson theorem or proof of Kepler’s conjecture)
Rewrite it in Coq, Mizar or another Interactive Theorem Prover (language/program which understands logic behind mathematics and is able to check its correctness)
Verify
The downside to this approach is that it is purely manual work and quite a tedious process! One has to fill in the gaps, as the human way of writing mathematics differs from what Coq/Mizar accepts. Moreover, mathematical work builds on previous works, so one needs to lay down the foundations each time, at least to some extent (but have a look at, e.g., the Mizar Math Library).
Once in Coq/Mizar, there is a growing number of methods to prove new theorems:
Hammers and tactics (methods for automatic reasoning over large libraries)
Machine learning and deep learning
Here we concentrate on the last method of automated reasoning. Firstly, in order to use the power of machine learning and deep learning, one needs more data. Moreover to keep up with current mathematical research we need to translate LaTeX into Coq/Mizar much faster.
Building a dictionary
We need to automate translation of human‑written mathematical works in LaTeX to Coq/Mizar. We view it as an NLP problem of creating a dictionary between two languages. How can we build such a dictionary? We could build upon existing syntactic parsers (e.g. TensorFlow’s SyntaxNet) and enhance them with Types and variables, which we explain in an example:
Consider the sentence “Let $G$ be a group”. Then “G” is a variable of Type “group”.
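As a toy illustration of that Type/variable annotation (not a real parser), a single declaration pattern could be matched like this:

import re

def parse_declaration(sentence):
    # Matches patterns like "Let $G$ be a group" and extracts the variable and its Type.
    match = re.match(r"Let \$(\w+)\$ be an? ([\w\s]+)", sentence)
    if match:
        return {"variable": match.group(1), "type": match.group(2).strip()}
    return None

print(parse_declaration("Let $G$ be a group"))  # {'variable': 'G', 'type': 'group'}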
Once we have such a dictionary with at least some basic accuracy we can use it to translate LaTeX into Coq/Mizar sentence by sentence. Nevertheless we still need a good source of mathematics! Here is what we propose in the DeepAlgebra program:
Algebraic geometry is one of the pillars of modern mathematical research, which is rapidly developing and has a solid foundation (Grothendieck’s EGA/SGA, The Stacks Project). It is “abstract” hence easier to verify for computers than analytical parts of mathematics.
The Stacks Project is an open multi‑collaboration on foundations of algebraic geometry starting from scratch (category theory and algebra) up to the current research. It has a well‑organized structure (an easy‑to‑manage dependency graph) and is verified thoroughly for correctness.
The Stacks Project now consists of:
547,156 lines of code
16,738 tags (57 inactive tags)
2,691 sections
99 chapters
5,712 pages
162 slogans
Moreover, it has an API to query for:
Statements (also in LaTeX)
Data for graphs
Below we present a few screenshots.
Fig. 2: One of the lemmas in the Stacks Project. Each lemma has a unique tag (here 01WC), which never changes, even though the number of the lemma may change. Each lemma has a proof and we can access its dependency graphs:
Figs. 3 and 4: Two dependency graphs for Lemma 01WC, which show the structure of the proof together with all the lemmas, propositions and definitions which were used along the way.
Conclusion
Summing up, we propose to treat the Stacks Project as a source of data for NLP research and eventual translation into one of the Interactive Theorem Provers. The first step in the DeepAlgebra program is to build a dictionary (syntactic parser with Types/variables) and then test it on the Stacks Project. This way we would build an “ontology” of algebraic geometry. If that works out, we can verify, modify and test it on arXiv (Algebraic Geometry submissions). We will report on our progress in automated reasoning in future texts.
In the previous post we explained what region of interest pooling (RoI pooling for short) is. In this one, we present an example of applying RoI pooling in TensorFlow. We base it on our custom RoI pooling TensorFlow operation. We also use Neptune as a support in our experiment performance tracking.
Example overview
Our goal is to detect cars in the images. We’d like to construct a network that is able to automatically draw a box around every car.
In our example we deal with car images from the Pascal VOC 2007 dataset. For simplicity we choose only cars not marked as truncated.
We manage our experiment using Neptune. It’s a pretty handy tool:
We track the tuning in real time. In particular, we preview the currently estimated bounding boxes.
We can change model hyperparameters on the fly.
We can easily integrate Neptune with TensorFlow and get all the charts, graphs and summary objects from the TensorFlow graph.
We store the executed experiments in an aesthetic list.
Network architecture
In our example we use the Fast R-CNN architecture.
The network has two inputs:
Batch of images
Batch of potential bounding boxes – RoI proposals
In the Fast R-CNN model RoI proposals are generated via an external algorithm, for example selective search. In our example, we take ground truth bounding boxes from the Pascal annotations and generate more negative bounding boxes ourselves.
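A toy sketch of how such negative proposals can be sampled is shown below: random boxes are kept only if they barely overlap any ground-truth car. The IoU threshold and box sizes are illustrative assumptions, not the values from our repository.

import numpy as np

def iou(a, b):
    # Intersection over union of two boxes given as [x1, y1, x2, y2].
    ix1, iy1 = np.maximum(a[:2], b[:2])
    ix2, iy2 = np.minimum(a[2:], b[2:])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def sample_negative_rois(gt_boxes, img_w, img_h, n=64, max_iou=0.3):
    negatives = []
    while len(negatives) < n:
        x1 = np.random.uniform(0, img_w - 32)
        y1 = np.random.uniform(0, img_h - 32)
        x2 = np.random.uniform(x1 + 16, img_w)
        y2 = np.random.uniform(y1 + 16, img_h)
        box = np.array([x1, y1, x2, y2])
        if all(iou(box, gt) < max_iou for gt in gt_boxes):
            negatives.append(box)
    return np.stack(negatives)

rois = sample_negative_rois([np.array([50, 60, 200, 180])], img_w=500, img_h=375)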
The network has two outputs:
Batch of RoI proposals not classified as background (with corrected coordinates)
Probabilities that RoI proposals consist of objects of the consecutive categories
The network consists of three main parts:
Deep convolutional neural network
Input: images
Output: feature map
We use the popular VGG16 network pretrained on the ImageNet dataset.
RoI pooling layer
Input: feature map, RoI proposals rescaled to the feature map
Output: max-pooled RoI proposals
Fully connected layer with RoI features
Input: max-pooled RoI proposals
Output: corrected RoI proposals, probabilities
We note that our detection task can be also solved with the Faster R-CNN architecture, which works significantly faster :). However, the implementation of Faster R-CNN requires much more code to write, so we chose the simpler Fast R-CNN.
Loss function
We tune the network to minimize the loss given by
\(loss = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{k_i} \sum_{j=1}^{k_i} loss_{ij}\)
where:
\(n\) is the number of images in a batch,
\(k_i\) is the number of RoI proposals for image \(i\),
\(loss_{ij}\) is the loss for RoI proposal \(j\) in image \(i\).
For a single RoI proposal, \(loss_{ij}\) is the sum of the classification and regression loss, where:
classification loss is the common cross entropy,
regression loss is a smooth L1 distance between the rescaled coordinates of a RoI proposal and the ground-truth box. The regression loss is computed only if the ground-truth box is not categorized as background; otherwise it’s defined as 0 (a rough sketch follows below).
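For illustration, here is a rough TensorFlow sketch of the per-RoI loss just described; the tensor names and the two-class setup (background vs. car) are assumptions, and the implementation in our repository differs in details.

import tensorflow as tf

# Hypothetical tensors for a batch of RoI proposals (names are illustrative):
cls_logits = tf.placeholder(tf.float32, [None, 2])     # background vs. car scores
labels = tf.placeholder(tf.int32, [None])               # 0 denotes background
box_preds = tf.placeholder(tf.float32, [None, 4])       # predicted box coordinates
box_targets = tf.placeholder(tf.float32, [None, 4])     # rescaled ground-truth boxes

# Classification part: the common cross entropy.
cls_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                          logits=cls_logits)

# Regression part: smooth L1 distance between predicted and ground-truth boxes.
diff = tf.abs(box_preds - box_targets)
smooth_l1 = tf.where(diff < 1.0, 0.5 * tf.square(diff), diff - 0.5)
reg_loss = tf.reduce_sum(smooth_l1, axis=1)

# The regression term is zeroed out for RoIs labeled as background.
is_foreground = tf.cast(tf.greater(labels, 0), tf.float32)
roi_loss = tf.reduce_mean(cls_loss + is_foreground * reg_loss)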
You can download our code from our GitHub repository. It consists of two folders with the following content:
File                                 | Purpose
code/main.py                         | The script to execute.
code/fast_rcnn.py                    | Builds the TensorFlow graph.
code/trainer.py                      | Preprocesses data and trains the network.
code/neptune_handler.py              | Contains Neptune utilities.
code/config.yaml                     | Neptune configuration file.
code/get_data.py                     | Downloads images from the Pascal VOC 2007 dataset.
data/vgg16-20160129.tfmodel.torrent  | References to weights of the pretrained network.
Description
When we run main.py, the script trainer.py first restores the VGG16 network with the pretrained weights. Then it adds the RoI pooling layer and the fully connected layer. Finally, it begins tuning the entire network using the provided images and RoI proposals. It also sends information to Neptune, so we can track the tuning progress in real time.
After cloning the repository, please download the file vgg16-20160129.tfmodel referred to by the torrent file vgg16-20160129.tfmodel.torrent and save it in the data directory. Also, please run the script get_data.py to download needed images:
python get_data.py
Let’s test our RoI pooling in TensorFlow!
We run the script main.py from the code folder by typing:
neptune run -- \
    --im_folder $PWD/../data/images \
    --roidb $PWD/../data/roidb \
    --pretrained_path $PWD/../data/vgg16-20160129.tfmodel
If we want to also use a non-default learning rate value or the number of epochs, we can add:
--learning_rate 1e-03 --num_epochs 200
to the command at the end.
After a while, we can start observing the tuning progress in Neptune:
Moreover, we can display the RoIs fitted to the cars by our network. We could just load all the processed images, but this would take up a lot of resources. That’s why we decided to activate this feature with a simple Neptune action.
To do that, we can go to the Actions tab and click ‘RUN’ to start sending the images.
After that, we can go to the Channels tab and expand the channels ‘region proposals for RoI pooling’ and ‘network detections’ by clicking ‘+’ signs.
Now we can see the RoIs in real time!
We can click on the pictures to zoom them. If we want Neptune to stop sending new images, we go to the Actions tab and click ‘RUN’ again.
An exemplary NeptuneGo execution of our script can be found here.
We hope you enjoy our example of RoI pooling in TensorFlow and the experiment managing features offered by Neptune. If you want to comment on our work, don’t hesitate to leave us feedback!
References
R. Girshick, Fast R-CNN, IEEE International Conference on Computer Vision (ICCV), 2015.
At the end of April 2017, deepsense.ai released a new version of Neptune, the DevOps platform for data scientists. Neptune 1.5 introduces a range of new features and improvements, including support for Python 3, simplification of Neptune CLI, offline execution, compact view, improved channels and charts, and a number of improvements in the user experience.
Python 3.5 Support
One of the most upvoted tickets on our feedback channel has been the request to add support for Python 3. Well, we put your request on our roadmap and now, with version 1.5, you can run Neptune experiments using both Python 2.7 and 3.5.
We encourage you to stay active in our feedback forum and vote for features you need in your data science work. This is how you will influence where Neptune goes and how it develops – and ultimately make it more convenient for you.
Simplification of Neptune CLI
Until now, Neptune CLI’s commands were long and complex. With version 1.5, however, convenience has taken center stage as we’ve introduced a host of improvements and simplifications. Click over and have a look at the simplified CLI commands and configuration file in our documentation.
To see how this change could work for you, compare the commands for running our “Flower Species Prediction” example.
In version 1.4:
neptune run flower-species-prediction/main.py --config flower-species-prediction/config.yaml --storage-url /tmp/neptune-iris --paths-to-dump flower-species-prediction
In version 1.5:
neptune run
Offline Execution
Our users often run parts of their experiments using Jupyter Notebook, but the Neptune client library requires communication with our server. Thanks to offline execution, users can disable communication with the server and run their experiments without CLI. Read more about this convenient development here.
Compact View
To make comparing your experiments easier we have introduced a compact view of the experiments table. You can now display more experiment results on your screen, and draw conclusions even faster and more confidently.
Improved Channels and Charts
Neptune 1.5 comes with the new API for channels and charts. Thanks to the new API you will be able to send and display even more data points. Your charts will load faster and more seamlessly. We encourage you to give the improved channels and charts a go.
The Neptune Pipeline
We are already working on the next version of Neptune, which is slated for a May release and will focus on better displaying the experiments list.
We hope you will enjoy working with our DevOps platform for data scientists. Neptune 1.5 will help you manage and monitor your machine learning experiments even more conveniently.
Would you like to test drive Neptune? Visit NeptuneGo!, have a look around and run your first experiments.