Object recognition is commonly used for street and satellite photos, diagram analysis and text recognition. After a team including several deepsense.ai data scientists took first place in the National Museum in Warsaw’s HackArt hackathon with their design for a “Museum Treasures” game, the technique may soon be used to popularize art and culture, too.
In May 2018, the National Museum in Warsaw organized its HackArt hackathon. The task was to combine seemingly disparate fields: museology, art history and artificial intelligence. The goal of HackArt was to create tools – AI-based applications, bots and plug-ins – that could help solve challenges set by the museum.
The idea
Our focus was on the target group of parents and children, and on answering the following questions:
- How can we encourage families to visit the museum?
- How can we make the visit interesting for children?
- How can we build interest in the museum’s resources among children and their parents?
The execution
Artificial intelligence allows for the automation of many activities, especially tedious and time-consuming ones. Object recognition in street and satellite photos, diagram analysis, text recognition and analysis – there are countless applications. Now the automatic recognition of what appears in paintings hanging in a museum can be added to the list. Everyone participating in the hackathon had access to 200 photos of museum pieces. Our solution was to create a database of specific fragments of images – animals, trees, houses, feet, you name it – which could be used to create the various paths of a treasure hunt by category. The database would be extensive, and elements from new exhibitions could be added to it once they were digitized. The only limitation in selecting the fragments was the set of element categories that the object detection model we intended to use could recognize.
Image analysis
Take, for example, Antoni Brodowski’s painting “Paris in a Phrygian Cap”. How would the object detection model work in this case? In this image it will recognize, for example, a head, a hand, a cap and a human, together with the areas where these elements appear (the x, y coordinates of a bounding rectangle) and the probability p (the certainty with which the model found the element). A dictionary of fragments is thus created:
{category_1: image_id, detected fragment, (x, y), probability p}
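To make that structure concrete, here is a minimal sketch of how such a fragment dictionary could be assembled from raw detections. The field names and the `detections` input are illustrative assumptions for this post, not the exact format used in our repository.

```python
from collections import defaultdict

# Hypothetical raw detections: one entry per element found in an image.
detections = [
    {"image_id": "brodowski_parys.jpg", "name": "Human head",
     "box": (0.21, 0.35, 0.48, 0.61), "prob": 0.93},
    {"image_id": "brodowski_parys.jpg", "name": "Hat",
     "box": (0.18, 0.33, 0.30, 0.58), "prob": 0.87},
]

THRESH = 0.5  # reject detections the model is too uncertain about

# Group accepted fragments by category, so a treasure-hunt path
# can later be built by picking one fragment per category.
fragments_by_category = defaultdict(list)
for det in detections:
    if det["prob"] > THRESH:
        fragments_by_category[det["name"]].append(
            {"image_id": det["image_id"],
             "box": det["box"],
             "prob": det["prob"]})

print(dict(fragments_by_category))
```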
Object detection models (among many other kinds of models) can be found in open-source code repositories on GitHub. Depending on your needs and situation, a model can be trained from scratch using the available data, or a pre-trained, ready-made model can be used. We chose the latter approach, because training a model requires a set of photos and labels, where the labels define the category of each element and the coordinates of its occurrence in each photo. Because we lacked such labels, we went with a pre-trained object detection model from the TensorFlow Models repository (trained on Google’s Open Images dataset), which allows quick detection across 545 categories.
The code that finds objects in images from a set is located in the stared/hackart-you-in-artwork repository. Below is a fragment that detects objects in an image and saves the results to a JSON file. For each photo on the list to be processed, the code runs a neural network that recognizes the objects in the picture. As a result, we receive thousands of frame proposals, along with the class and the probability that this class occurs in the area in question. We rejected results that were too uncertain, and added the remaining ones to the dictionary for use in further stages of working with the data. The fragment also draws the results on the image and saves them to files, which lets us visually verify the results obtained.

```python
import json

import cv2
import numpy as np
from PIL import Image
from tqdm import tqdm
from object_detection.utils import visualization_utils as vis_util

# sess, image_tensor, the detection_* tensors, category_index,
# TEST_IMAGE_PATHS and THRESH are defined earlier in the script
# (see the setup sketch below).

def load_image_into_numpy_array(image):
    # Standard helper from the Object Detection API tutorial.
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)

results = dict()
for image_path in tqdm(list(TEST_IMAGE_PATHS)):
    base_name = image_path.split('/')[-1][:-4]
    image = Image.open(image_path)
    image_np = load_image_into_numpy_array(image)
    # The model expects a batch dimension: (1, height, width, 3).
    image_np_expanded = np.expand_dims(image_np, axis=0)
    (boxes, scores, classes, num) = sess.run(
        [detection_boxes, detection_scores, detection_classes, num_detections],
        feed_dict={image_tensor: image_np_expanded})
    image_np = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
    # Keep only detections the model is sufficiently sure about.
    objects = []
    for s, c, b in zip(scores[0], classes[0], boxes[0]):
        if s > THRESH:
            # The API returns normalized boxes as [ymin, xmin, ymax, xmax];
            # cast numpy floats to Python floats so json.dump can serialize them.
            objects.append({"prob": float(s),
                            "name": str(category_index[int(c)]['name']),
                            "ymin": float(b[0]), "xmin": float(b[1]),
                            "ymax": float(b[2]), "xmax": float(b[3])})
    results[image_path.split('/')[-1]] = objects
    # Visualization of the detection results, for manual inspection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        np.squeeze(boxes),
        np.squeeze(classes).astype(np.int32),
        np.squeeze(scores),
        category_index,
        min_score_thresh=THRESH,
        use_normalized_coordinates=True,
        line_thickness=8)
    cv2.imwrite('%s.jpg' % base_name, image_np)

print(results)
with open('oidv3.json', 'w') as f:
    json.dump(results, f)
```

Source: https://github.com/stared/hackart-you-in-artwork/blob/master/aux/scripts_karol/evaluate_on_images.py
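The fragment above assumes that a TensorFlow session, the detection tensors and the category index have already been created earlier in the script. For completeness, here is a minimal sketch of that setup using the TF1-style Object Detection API; the `PATH_TO_FROZEN_GRAPH` and `PATH_TO_LABELS` paths and the `THRESH` value are placeholders you would point at the downloaded model, its label map and your chosen confidence cut-off.

```python
import glob

import tensorflow as tf
from object_detection.utils import label_map_util

PATH_TO_FROZEN_GRAPH = 'model/frozen_inference_graph.pb'  # placeholder path
PATH_TO_LABELS = 'data/oid_label_map.pbtxt'               # placeholder path
TEST_IMAGE_PATHS = glob.glob('images/*.jpg')              # placeholder glob
THRESH = 0.3                                              # placeholder cut-off

# Load the frozen inference graph exported by the Object Detection API.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        od_graph_def.ParseFromString(fid.read())
        tf.import_graph_def(od_graph_def, name='')

sess = tf.Session(graph=detection_graph)

# Handles to the input image and the output tensors used in the loop above.
image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
num_detections = detection_graph.get_tensor_by_name('num_detections:0')

# Map numeric class ids to human-readable category names.
category_index = label_map_util.create_category_index_from_labelmap(
    PATH_TO_LABELS, use_display_name=True)
```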
Outline of how the game works
Preparing the dictionary of fragments and their categories was only part of the task. At the same time, the work of developing the application demo awaited us. We had to focus on basic functionalities to be able to build the skeleton of the solution. The application was intended to work automatically: after entering a new set of photos, a dictionary of categories is created (with additional filters to improve the quality of the items received), from which the user is then presented with 5 categories to choose from. During the hackathon we had a limited set of photos (which lowers the reliability of automatically generated elements) and time constraints, so we supervised some of the tasks manually. For example, we checked the quality of the generated elements and merged several categories into one: cat, dog, fish, … → animals (a minimal sketch of this filtering and merging step follows the list below). We built the demo as a web application using Vue.js. We made the following assumptions about the “Museum Treasures” game:
- It could be played in an “analog” version: downloading a PDF with fragments of images and information about which room they can be found in → a designated “path” through the rooms; in this case, the player doesn’t need a smartphone or tablet, which may be important for parents and school trips.
- It could also be played electronically: using a smartphone or tablet, with the same information as above.
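As promised above, here is a minimal sketch of the supervised filtering and merging step. The `MERGE_MAP` and the minimum-count filter are illustrative assumptions about how the automatically generated categories could be cleaned up, not the exact rules from our demo; `fragments_by_category` is the dictionary built in the earlier sketch.

```python
import random

# Hypothetical manual merge rules: several detected categories become one.
MERGE_MAP = {"Cat": "animals", "Dog": "animals", "Fish": "animals",
             "Tree": "plants", "Houseplant": "plants"}
MIN_FRAGMENTS = 3  # drop categories with too few fragments to build a path

def build_categories(fragments_by_category):
    """Merge raw detection categories and keep only the usable ones."""
    merged = {}
    for name, fragments in fragments_by_category.items():
        target = MERGE_MAP.get(name, name.lower())
        merged.setdefault(target, []).extend(fragments)
    return {name: frags for name, frags in merged.items()
            if len(frags) >= MIN_FRAGMENTS}

def propose_paths(categories, n=5):
    """Offer the player n categories; each category is one treasure-hunt path."""
    return random.sample(sorted(categories), min(n, len(categories)))
```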
Further development ideas
In its basic form, “Museum Treasures” has players look for images, but in further development verification, rewards and gamification could be added to enrich the experience. Defining the rules, goals and motivation of the game was very important and involved determining the age of the players: the challenges that await a five-year-old will differ from those a 12-year-old will take on. We believed the game could also be interesting for adults, as paths could likewise be created for mature users. We had several ideas for introducing these elements and developing them further; you can read a few of them below.
Verification
- Metadata: To confirm that a painting has been found, the player enters information about the painter, the year the work was created, etc. This type of data can easily be added to the dictionary, and questions can be generated based on it (see the sketch at the end of this section).
- Photos: A more advanced form of verification requires the participant to take a picture of the image or note down information. The application could be enriched with a module comparing the photographed image with the source fragment. This solution is much more technically complex, and on some occasions the photo may be of too poor a quality to be verified.
Clues
- Metadata: Hints about the painter.
- Generating descriptions with AI: Using algorithms to generate captions describing the context of what is found in the picture. This could be an interesting extra, though such captions don’t always work properly as clues.
Rewards and gamification
The prize for correctly finding the images could be stickers or other small gadgets, or a badge that could be shared on social media. The game could also be developed for group work, and ultimately schools could use it. One of the ideas also assumes gamification:
- two groups follow separate paths, with their times compared at the end,
- two groups follow paths that end up at the same place, thus allowing the players to meet at the end of the game (and talk about who came in first).
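To illustrate the metadata-based verification idea, here is a hypothetical sketch of generating a confirmation question from painting metadata. The `PAINTINGS` records and field names are invented for this example; in practice they would come from the museum’s digitized catalogue.

```python
import random

# Invented sample metadata; a real deployment would read the museum catalogue.
PAINTINGS = {
    "brodowski_parys.jpg": {"painter": "Antoni Brodowski", "year": 1813},
}

def verification_question(image_id):
    """Return a (question, answer) pair confirming the player found the work."""
    meta = PAINTINGS[image_id]
    field = random.choice(["painter", "year"])
    question = {
        "painter": "Who painted this work?",
        "year": "In what year was this work painted?",
    }[field]
    return question, meta[field]

q, a = verification_question("brodowski_parys.jpg")
print(q, "->", a)
```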