New Computer Vision Tool Accelerates Annotation of Digital Images and Video

To speed up the data annotation required to train deep neural networks (DNNs), Intel is conducting research to find better methods and deliver new tools.

Data scientists need annotated data to train the deep neural networks (DNNs) at the core of AI workflows. Obtaining annotated data, or annotating data yourself, is a challenging and time-consuming process. For example, Intel says that it took about 3,100 total hours for members of its own data annotation team to annotate the more than 769,000 objects used for just one of its algorithms.

To solve this challenge, Intel presented a new open source program called Computer Vision Annotation Tool (CVAT, pronounced “si-vi-eɪ-ti”) that accelerates the process of annotating digital images and videos for use in training computer vision algorithms.

Though there is a wealth of training data available on the Internet, it isn’t always possible to use online data to train a deep learning algorithm. For example, there may not be pre-annotated data available for new use cases. Even when pre-annotated training data does exist, it may be covered by license agreements that prevent its use in commercial products.

For these reasons, Intel decided to create and support an internal data annotation team.

CVAT was designed to provide users with a set of convenient instruments for annotating digital images and videos. CVAT supports supervised machine learning tasks pertaining to object detection, image classification, and image segmentation. It enables users to annotate images with four types of shapes: boxes, polygons (both generally and for segmentation tasks), polylines (which can be useful for annotating markings on roads), and points (e.g., for annotating face landmarks or pose estimation).
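To make the four shape types more concrete, here is a minimal Python sketch of how such annotations might be represented in memory. It is not CVAT's actual data model or export format: the `Shape` class, the `MIN_POINTS` table, and the `bounding_box` helper are hypothetical names chosen for illustration, and the minimum point counts are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

# Assumed minimum number of points per shape type (illustrative only,
# not taken from CVAT).
MIN_POINTS = {"box": 2, "polygon": 3, "polyline": 2, "points": 1}


@dataclass
class Shape:
    """One annotated shape on a single image or video frame."""
    kind: str            # "box", "polygon", "polyline", or "points"
    label: str           # class name, e.g. "car" or "lane_marking"
    points: List[Point]  # (x, y) pixel coordinates

    def __post_init__(self) -> None:
        if self.kind not in MIN_POINTS:
            raise ValueError(f"unknown shape kind: {self.kind}")
        # In this sketch a box is stored as two opposite corners.
        if self.kind == "box" and len(self.points) != 2:
            raise ValueError("a box needs exactly two corner points")
        if len(self.points) < MIN_POINTS[self.kind]:
            raise ValueError(f"{self.kind} needs more points")


def bounding_box(shape: Shape) -> Tuple[float, float, float, float]:
    """Return (xmin, ymin, xmax, ymax) enclosing any shape type."""
    xs = [x for x, _ in shape.points]
    ys = [y for _, y in shape.points]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    lane = Shape("polyline", "lane_marking", [(10, 200), (60, 150), (120, 90)])
    print(bounding_box(lane))
```

Storing every shape as a labeled list of points keeps the four types uniform, so a helper such as `bounding_box` can work on any of them without special cases.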

Additionally, CVAT provides features facilitating typical annotation tasks, such as a number of automation instruments (including the ability to copy and propagate objects, interpolation, and automatic annotation using the TensorFlow* Object Detection API), visual settings, shortcuts, filters, and more.
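As an illustration of why interpolation saves annotation time, the sketch below linearly interpolates a bounding box between two manually annotated keyframes, so the frames in between need no manual work. This is a generic linear interpolation written for this article, not CVAT's own implementation; the function name `interpolate_boxes` and the `(xmin, ymin, xmax, ymax)` box layout are assumptions.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def interpolate_boxes(start_frame: int, start_box: Box,
                      end_frame: int, end_box: Box) -> Dict[int, Box]:
    """Linearly interpolate a box for every frame between two keyframes.

    Both keyframes are included in the result unchanged.
    """
    if end_frame <= start_frame:
        raise ValueError("end_frame must come after start_frame")
    span = end_frame - start_frame
    boxes: Dict[int, Box] = {}
    for frame in range(start_frame, end_frame + 1):
        # t runs from 0.0 at the first keyframe to 1.0 at the second.
        t = (frame - start_frame) / span
        boxes[frame] = tuple(
            a + (b - a) * t for a, b in zip(start_box, end_box)
        )
    return boxes


if __name__ == "__main__":
    # The annotator draws the box on frames 0 and 4 only.
    track = interpolate_boxes(0, (0, 0, 10, 10), 4, (8, 4, 18, 14))
    for frame, box in track.items():
        print(frame, box)
```

In this example the annotator draws two boxes and receives five, one per frame; the longer the gap between keyframes, the greater the saving, at the cost of accuracy when the object does not move in a straight line.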

CVAT is accessible via a browser-based interface; following a simple deployment via Docker, no further installation is necessary. CVAT supports collaboration between teams as well as work by individuals. Users can create public tasks and split up work between other users. CVAT is also highly flexible, with support for many different annotation scenarios, a variety of optional tools, and the ability to be embedded into platforms such as Onepanel.

Like many early open-source projects, CVAT also has some known limitations. Its client has only been tested in Google Chrome* and may not perform well in other browsers. Though CVAT supports some automatic testing, all checks must be done manually, which can slow the development process. CVAT’s documentation is currently somewhat limited, which can impede participation in the tool’s development. Finally, CVAT can have performance issues in certain use cases due to the limitations of Chrome Sandbox. Despite these disadvantages, Intel says that CVAT should remain a useful tool for image annotation workflows.

Intel will use feedback from users to determine future directions for CVAT’s development. The company hopes to improve the tool’s user experience, feature set, stability, automation features, and ability to be integrated with other services, and to encourage members of the community to take an active part in CVAT’s development.

For a deeper dive into how CVAT works, visit Intel's post on the Intel Developer Zone.