Object detection refers to the ability of computer systems to locate desired types of objects in an image or scene.
For object detection, the training data is represented using either XML files or JSON files. Each representation has its pros and cons.
In this article, we will look at how one such dataset representation helps us with object detection.
We will discuss what the PASCAL VOC format is, the history behind it, and how we use it for object detection.
We will also build a simple dataset format validator using Python to verify if the dataset adheres to the rules of the PASCAL VOC format.
To follow along, the reader must have the following:
- A good understanding of how to work with machine learning datasets.
- A decent understanding of object detection.
- Good knowledge of Python.
- A code editor of your choice.
For a machine learning model to detect objects in an image, it must be trained with a dataset that holds all information about the objects present in that image.
The dataset that contains information about all objects present in an image is built using a process called annotation.
In the context of object detection, annotation helps us map an object to its respective label by drawing a rectangular box (called a bounding box) over the object.
Source: An example of annotation by becominghuman.ai
As you can see in the above image, each object is mapped to its respective label.
Each object-label mapping is represented with a rectangular box called “Bounding box”. Bounding boxes are a series of coordinates or values that represent the position of an object in an image.
The representation of bounding boxes might vary according to the dataset.
Let’s discuss more about bounding boxes in the upcoming sections.
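As a rough illustration of how bounding box representations can differ between datasets, the sketch below converts between a corner format, [xmin, ymin, xmax, ymax], and an [x, y, width, height] format of the kind used by COCO-style datasets. The function names and the example box are hypothetical and chosen only for this illustration.

```python
def corners_to_xywh(box):
    """Convert [xmin, ymin, xmax, ymax] to [x, y, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def xywh_to_corners(box):
    """Convert [x, y, width, height] back to [xmin, ymin, xmax, ymax]."""
    x, y, width, height = box
    return [x, y, x + width, y + height]


corner_box = [10, 20, 110, 220]  # arbitrary example box
xywh_box = corners_to_xywh(corner_box)
print(xywh_box)                   # [10, 20, 100, 200]
print(xywh_to_corners(xywh_box))  # [10, 20, 110, 220]
```

Both formats describe the same rectangle, so a dataset can be converted from one to the other without losing information.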
The PASCAL VOC dataset provides standardized images for object detection and segmentation problems.
These datasets are built using tools that follow standardized procedures for the evaluation and comparison of different methods.
In 2008, PASCAL VOC datasets were declared as the benchmark for object detection.
History behind PASCAL VOC
Pattern Analysis, Statistical Modelling, and Computational Learning (PASCAL) ran a series of challenges for object detection from 2005 to 2012 following a standardized file structure for holding these image annotations.
The PASCAL Visual Object Classes (VOC) challenge had two main components:
- A publicly available dataset with standardized evaluation software.
- An annual competition and a workshop.
The main objectives of this challenge were to find out the ability of models to perform:
- Classification: Check if an object is part of the image.
- Detection: Locate the position of the objects present in the image.
This series of challenges came to an end in 2012 with major enhancements and improvements to the dataset.
Now, PASCAL VOC provides standardized image datasets covering 20 object classes that are commonly used for tasks like object detection, semantic segmentation, and classification.
To understand more about PASCAL VOC, it is highly recommended to read this research paper.
PASCAL VOC taxonomy
Here is a sample of what the structure of the PASCAL VOC dataset looks like:
Source: Marmot dataset for table recognition
You can find the above sample dataset here.
As you can see in the above image, these object annotations are represented using the following fields:
- folder: The name of the parent folder that holds the dataset. This field helps us locate the annotated images within a directory. In the sample, this is the folder that contains the image file.
- filename: The name of the image file that the annotations refer to, given relative to the folder above.
- path: The absolute path where the image file is located.
- source: Specifies the original location of the file in a database. Since we do not use a database, its database field is set to Unknown by default.
- size: Specifies the width, height, and depth of the image. As you can see, the image is 793 pixels wide and 1123 pixels tall, with a depth of 3. In images, the depth field usually represents the number of color channels, i.e. 3 for RGB.
- segmented: Signifies whether the image contains annotations that are non-linear (irregular) in shape, commonly referred to as polygons. By default, the segmented value is set to 0 (linear shape).
Each annotated object is then described by an object element with the following fields:
- name: The name of the annotated label.
- pose: Specifies the orientation (viewpoint) of the object. By default, it is set to Unspecified, which means that no particular orientation was recorded.
- truncated: Tells if an object is fully visible (0) or only partially visible (1).
- difficult: Tells if an object is difficult to recognize in the image (0 for easy, 1 for difficult).
- bndbox: The coordinates that determine the location of the object. They are represented as [xmin, ymin, xmax, ymax], which are the (x, y) coordinates of the top-left and bottom-right corners of the box. Here, the bounding box values are [458, 710, 517, 785].
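To tie these fields together, here is a minimal sketch that parses a small annotation with Python's standard xml.etree.ElementTree module and reads the bounding box back. The size and bounding box values are the ones quoted above. The folder, filename, path, and label values are hypothetical placeholders, not the values from the sample dataset.

```python
import xml.etree.ElementTree as ET

# folder, filename, path, and name below are hypothetical placeholders
SAMPLE_XML = """
<annotation>
    <folder>images</folder>
    <filename>example.jpg</filename>
    <path>/data/images/example.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>793</width>
        <height>1123</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>example_label</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>458</xmin>
            <ymin>710</ymin>
            <xmax>517</xmax>
            <ymax>785</ymax>
        </bndbox>
    </object>
</annotation>
"""

root = ET.fromstring(SAMPLE_XML.strip())

# Read the four corner coordinates of the first object's bounding box
bndbox = root.find('object/bndbox')
box = [int(bndbox.findtext(tag)) for tag in ('xmin', 'ymin', 'xmax', 'ymax')]

# Width and height of the box follow from the corner coordinates
box_width = box[2] - box[0]
box_height = box[3] - box[1]

print(box)                    # [458, 710, 517, 785]
print(box_width, box_height)  # 59 75
```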
PASCAL VOC validator
Now that we understand the overall structure of a PASCAL VOC dataset, let’s dive into implementing a simple dataset validator using Python.
We will use two libraries for handling XML files:
- xmltodict: to work with XML files as we would with JSON files or dictionaries.
- xml.etree: used for parsing and creating XML data.
Import them as shown below:
    import xmltodict
    import xml.etree.ElementTree as ET
Here, we will be reading the dataset file by parsing it with an XML parser as shown:
    dataset_file = r'/sample.xml'  # The path to the XML file
    xml_tree = ET.parse(dataset_file)  # Parse the XML file
    root = xml_tree.getroot()  # Find the root element
To verify the validity of a PASCAL VOC dataset, we will be using assert statements in Python.
In simple words, assert is used to debug code by testing for a certain condition. If the condition is not met, it raises an AssertionError. We can also supply a custom error message to be shown when the assertion fails.
To learn more about assertions in Python, it is recommended to read this article.
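As a small illustration of this behaviour, the sketch below shows an assert statement with a custom message and how a failed assertion surfaces as an AssertionError. The helper name check_positive is hypothetical.

```python
def check_positive(value):
    # The text after the comma becomes the message of the AssertionError
    assert value > 0, f"Expected a positive value, got {value}"
    return value


print(check_positive(5))  # 5

try:
    check_positive(-1)
except AssertionError as error:
    message = str(error)
    print(message)  # Expected a positive value, got -1
```

Note that Python skips assert statements when it is run with the -O flag, so they suit quick checks like the ones in this article rather than validation that must always run.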
It is highly recommended to learn by keeping the sample of the PASCAL VOC dataset open in a new tab or window.
You can find the sample dataset here.
    # The root element must be 'annotation'; its 'verified' attribute is optional
    assert root.tag == 'annotation', "Root element of the XML file is not 'annotation'"

    # Image-level fields must be present and non-empty
    assert root.findtext('folder'), "XML file does not contain a 'folder' element"
    assert root.findtext('filename'), "XML file does not contain a 'filename' element"
    assert root.findtext('path'), "XML file does not contain a 'path' element"

    # 'source' must hold a single, non-empty 'database' element
    source = root.find('source')
    assert source is not None and len(source) == 1 and source.findtext('database'), "XML file does not contain a 'source' element with a 'database'"

    # 'size' must hold exactly three children: width, height, and depth
    size = root.find('size')
    assert size is not None and len(size) == 3, "XML file does not contain a valid 'size' element"
    assert size.findtext('width') and size.findtext('height') and size.findtext('depth'), "XML file does not contain either 'width', 'height', or 'depth' element"

    # 'segmented' is either '0' or a list of child elements
    segmented = root.find('segmented')
    assert segmented is not None and (segmented.text == '0' or len(segmented) > 0), "'segmented' element is neither 0 nor a list"

    # There must be at least one annotated object
    assert len(root.findall('object')) > 0, "XML file contains no 'object' element"
The code above does the following:
- Checks if the root element is annotation. Having the verified attribute set to yes is optional.
- Checks if the dataset contains a folder, filename, path, and source by verifying that each one is non-empty.
- Checks for the size object to contain width, height, and depth.
- Finally, it checks for the segmented parameter. It must contain either a value of 0 or a list of child elements.
A non-empty segmented list denotes that the object is not in linear shape. Therefore, the mask values for the polygon (non-linear) shape must be present to identify such objects. You can read more about this here.
Having covered all the meta-data about the image, let’s move on to validating each object.
Within the annotation key, there may be more than one object. Therefore, we loop through all the objects:
    required_objects = ['name', 'pose', 'truncated', 'difficult', 'bndbox']  # All possible meta-data about an object

    for obj in root.findall('object'):
        # 'name' and 'pose' must be present and non-empty
        assert obj.findtext(required_objects[0]), "Object does not contain a parameter 'name'"
        assert obj.findtext(required_objects[1]), "Object does not contain a parameter 'pose'"
        # 'truncated' and 'difficult' are binary flags
        assert obj.findtext(required_objects[2]) in ('0', '1'), "Object does not contain a valid parameter 'truncated'"
        assert obj.findtext(required_objects[3]) in ('0', '1'), "Object does not contain a valid parameter 'difficult'"
        assert len(obj.findall(required_objects[4])) > 0, "Object does not contain a parameter 'bndbox'"

        # A missing coordinate falls back to '0' and therefore fails the check
        for bbox in obj.findall(required_objects[4]):
            assert int(bbox.findtext('xmin', default='0')) > 0, "'xmin' value for the bounding box is missing"
            assert int(bbox.findtext('ymin', default='0')) > 0, "'ymin' value for the bounding box is missing"
            assert int(bbox.findtext('xmax', default='0')) > 0, "'xmax' value for the bounding box is missing"
            assert int(bbox.findtext('ymax', default='0')) > 0, "'ymax' value for the bounding box is missing"

    print('The dataset format is PASCAL VOC!')
The above code does the following:
- Declares a list required_objects containing all possible meta-data keys that are present within the object element.
- Loops through each object to check for the presence of the keys in required_objects.
- The possible values for truncated and difficult are binary. Therefore, we check if the extracted value is either 0 or 1.
If all the assertions pass, we can consider the dataset to be in the PASCAL VOC format.
The above code snippets help us validate and point out errors if we have missed out on any required key.
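One possible variation, sketched below, collects every problem instead of stopping at the first failed assertion, which can be handy when checking many files. The function name find_voc_errors and the inline sample are hypothetical, and the checks are a simplified subset of the ones above rather than a complete validator.

```python
import xml.etree.ElementTree as ET

# Paths are relative to the root 'annotation' element
REQUIRED_FIELDS = ['folder', 'filename', 'path', 'source/database',
                   'size/width', 'size/height', 'size/depth']
# Paths are relative to each 'object' element
REQUIRED_OBJECT_FIELDS = ['name', 'pose', 'truncated', 'difficult',
                          'bndbox/xmin', 'bndbox/ymin',
                          'bndbox/xmax', 'bndbox/ymax']


def find_voc_errors(root):
    """Return a list of human-readable problems found in a parsed annotation."""
    errors = []
    if root.tag != 'annotation':
        errors.append("root element is not 'annotation'")
    for field in REQUIRED_FIELDS:
        if not root.findtext(field):
            errors.append(f"missing or empty '{field}'")
    objects = root.findall('object')
    if not objects:
        errors.append("no 'object' element found")
    for index, obj in enumerate(objects):
        for field in REQUIRED_OBJECT_FIELDS:
            if not obj.findtext(field):
                errors.append(f"object {index}: missing or empty '{field}'")
    return errors


# Hypothetical annotation that lacks a 'path' element and a 'difficult' flag
incomplete = ET.fromstring(
    "<annotation><folder>images</folder><filename>a.jpg</filename>"
    "<source><database>Unknown</database></source>"
    "<size><width>10</width><height>10</height><depth>3</depth></size>"
    "<object><name>label</name><pose>Unspecified</pose>"
    "<truncated>0</truncated>"
    "<bndbox><xmin>1</xmin><ymin>1</ymin><xmax>5</xmax><ymax>5</ymax></bndbox>"
    "</object></annotation>"
)

errors = find_voc_errors(incomplete)
for problem in errors:
    print(problem)
```

Returning a list rather than raising makes it straightforward to report all issues for a file in one pass, at the cost of slightly less strict checks than the assertion-based version.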
The PASCAL VOC dataset is used for object detection and segmentation. Its representation as XML files helps us customize datasets easily while using a standardized format for representation.
To summarize, the reader learned:
- How annotations are used to train object detection models.
- What PASCAL VOC is and how it originated.
- The different meta-data parameters required for PASCAL VOC dataset representation.
- Finally, the reader implemented a simple Python validation script to verify that a dataset follows the PASCAL VOC format.
You can find the source code here.
Peer Review Contributions by: Wanja Mike