Page History: Datamatrix (2D-Barcode)
Page Revision: 2010/06/26 22:06
Principle of Datamatrix Barcodes
Datamatrix (or Data Matrix) is a high-density two-dimensional barcode that can encode up to 3116 numeric digits, or up to 1555 bytes drawn from the full range of 256 byte values. Compared with the PDF417 barcode symbology, Datamatrix belongs to a newer family of two-dimensional barcodes that makes better use of both dimensions and can therefore achieve a higher data capacity than PDF417 (roughly 3000 versus 2000 characters). The symbol is built on a square grid which has a finder pattern around its edges to allow a scanner to identify the barcode. The finder pattern makes it possible to read the barcode regardless of its physical orientation.
In the same way as other two-dimensional barcodes, the Datamatrix code includes error correction capability in order to be resilient to physical damage. Originally Data Matrix used an older convolutional error correction schema (ECC), but this was later replaced by Reed-Solomon error correction, which is much more efficient. The older versions are known as ECC 000 to ECC 140; they should be considered obsolete and should not be used in new applications.
The newer error correction schema (with Reed-Solomon codes) is known as ECC 200 schema and is the current and recommended schema. By default the library will use the newer schema but support also exists for legacy applications to use the older ECC schema.
The image shows an annotated Datamatrix where the finder and synchronization patterns have been highlighted.
Even though Datamatrix is primarily designed to handle the Western alphabet (ISO-8859/x code tables), it supports user-prepared Unicode characters through the "Extended Channel Interpretation" (ECI) mechanism. A description of the ECI standard is, however, out of scope for this manual, and the interested reader is referred to the official ECI standard document.
The Datamatrix standard has been adopted as a standard symbology by (among others) "The American National Standards Institute" (ANSI) and has been recommended for use by a number of industry standard associations (e.g. EIA, SEMI, AIAG, ATA).
Summary of features offered in the library
The following list summarizes the features that the library offers for
Datamatrix barcodes. Some of the terms used here assume familiarity with
Datamatrix barcodes. All terms are also described in the remainder of this
chapter.
- Supports both the new ECC 200 variant and the older ECC 140
- Output formats
- Supports all recommended encodation formats
  - ASCII
  - C40
  - BASE256
  - Text
  - X12
- Supports all specified symbol sizes
- Supports both auto and user selectable encodation
- Supports both auto and user selectable symbol size
- Supports user specified module size
- Supports custom color specification (foreground, background)
- Supports user specified quiet zone
- Supports easy handling of non-printable characters through the use of special escape sequences ("Tilde" processing)
- Supports concatenated symbols
- Symbols can be written directly to a file or sent back as an image to the browser
Limitation of the JpGraph Datamatrix implementation
This version of the library does not support the EDIFACT compaction standard due to the very specialized and limited use of this encodation schema.
Datamatrix standard
Datamatrix as a standard is fully described in the ISO/IEC 16022E
International Standard and is available for purchase from the
ISO Standard Organization.
Additional information about Data Matrix code is available in the following
United States patents: 4,939,354; 5,053,609; 5,124,536. See the US Patent Office
for full disclosures of these patents.
Structure of Data Matrix codes
Datamatrix is a two-dimensional symbology in the shape of a rectangle. The size and shape of the symbol are chosen either automatically or by the user. Usually the smallest size with enough data capacity to encode the given data is chosen. The symbol rectangle is built up of square dots, called modules, whose size is also user specified.
The Data Matrix symbol rectangle comes in two basic shapes.
- A square, ranging in size from 10x10 up to 144x144 modules, always with an even number of modules per side
- A rectangle, ranging in size from 8x18 up to 16x48 modules
Datamatrix - Square symbol shape
Datamatrix - Rectangle symbol shape
The maximum capacity for Data Matrix codes is up to 3116 numeric characters or up to 2335 alphanumeric characters or up to 1555 bytes of binary information.
The exact number of characters that can fit in a Data Matrix symbol depends on the actual encoding (or compaction) schema used. In short, compaction encodes ASCII characters more efficiently so that more data fits into a fixed number of bytes. For example, if only numeric data is to be encoded, then instead of using one byte to hold each digit, two digits are stored in a single byte, doubling the amount of data that can be stored in a given number of bytes.
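The digit-pair compaction described above can be illustrated with a minimal Python sketch of the ECC 200 ASCII encodation. It assumes the commonly documented codeword mapping (a pair of digits becomes 130 plus the pair's numeric value, any other 7-bit character becomes its ASCII value plus 1). The function name `encode_ascii` is hypothetical and is not part of the library.

```python
DIGITS = "0123456789"


def encode_ascii(text: str) -> list[int]:
    """Sketch of ECC 200 'ASCII' encodation for 7-bit input (illustrative only)."""
    codewords = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and all(ch in DIGITS for ch in pair):
            # Two digits share one codeword: values 130..229 represent "00".."99".
            codewords.append(130 + int(pair))
            i += 2
        else:
            value = ord(text[i])
            if value > 127:
                # Extended characters need an extra shift codeword; left out of this sketch.
                raise ValueError("only 7-bit ASCII is handled in this sketch")
            # Any other single character: ASCII value + 1.
            codewords.append(value + 1)
            i += 1
    return codewords


if __name__ == "__main__":
    print(encode_ascii("123456"))  # six digits, three codewords
    print(encode_ascii("A1"))      # no digit pair, two codewords
```

Six digits occupy three codewords here, whereas six letters would occupy six, which is where the doubling of numeric capacity comes from.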
To encode data into a Datamatrix symbol the following (principal) steps are taken.
1. The input string (which can contain any byte values in the range 0-255) is encoded using the selected encoding or encodings (it is possible to switch encoding midway through the string). The primary purpose of the encoding is to compress the data into a shorter form.
2. If needed, the data is padded to fill the capacity of the selected symbol size.
3. Once the string has been encoded (and possibly padded), a number of error correcting code words are added so that the data can be recovered even if part of the printed symbol has been destroyed (perhaps a corner has been torn off).
4. Finally, the encoded data and the error correcting words are placed in the symbol according to an algorithm specified in the standard. This is done by placing each bit of every data byte in a specific position in the data matrix symbol.
The above explanation is by necessity simplified. Readers interested in the specific details are referred to the official standard. It is also possible to review the code itself to understand the details.
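As an illustration of the padding step (step 2), the following Python sketch pads a list of codewords up to a symbol's data capacity. It assumes the ECC 200 convention of a first pad codeword of 129 followed by pads scrambled with the "253-state" randomising rule, as recalled from the standard. The function names are hypothetical, and this is not the library's implementation.

```python
PAD = 129  # ECC 200 pad codeword (assumption: value as recalled from the standard)


def randomized_pad(position: int) -> int:
    """Scramble a pad codeword based on its 1-based position in the codeword stream."""
    pseudo_random = (149 * position) % 253 + 1
    value = PAD + pseudo_random
    return value if value <= 254 else value - 254


def pad_codewords(codewords: list[int], capacity: int) -> list[int]:
    """Return a copy of `codewords` padded up to `capacity` data codewords."""
    if len(codewords) > capacity:
        raise ValueError("data does not fit in the selected symbol size")
    padded = list(codewords)
    if len(padded) < capacity:
        # The first pad codeword is written as-is.
        padded.append(PAD)
    while len(padded) < capacity:
        # Every following pad is randomised using its own position.
        padded.append(randomized_pad(len(padded) + 1))
    return padded


if __name__ == "__main__":
    # One data codeword padded to a capacity of three codewords.
    print(pad_codewords([66], 3))
```

Scrambling the pads avoids long runs of identical codewords, which would otherwise produce large regular areas in the printed symbol.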
Encodation efficiency
As explained in the previous section, several compaction schema are used to encode the data so that more data fits in a given symbol. Which schema achieves the greatest compression depends on the actual data. The standard specifies six different schema. The compaction efficiency of each is given in Table 26.1.
Depending on the application, the user of the library may choose a fixed encodation mode, but it is usually best to let the library automatically select the combination of encodation schema that gives the smallest possible symbol size.
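To give a feel for why automatic selection helps, the Python sketch below compares the number of codewords needed by two of the encodations (ASCII and BASE256) and picks the shorter. It is a deliberately simplified illustration: it never switches encodation mid-string and ignores C40, Text and X12. The overhead figures (one shift codeword per extended character in ASCII mode, one latch codeword plus a one- or two-codeword length field in BASE256 mode) are assumptions recalled from the ECC 200 specification, and the function names are hypothetical.

```python
def ascii_codeword_count(data: bytes) -> int:
    """Codewords needed in ASCII encodation (digit pairs share one codeword)."""
    count = 0
    i = 0
    while i < len(data):
        is_pair = (i + 1 < len(data)
                   and 48 <= data[i] <= 57 and 48 <= data[i + 1] <= 57)
        if is_pair:
            count += 1          # two digits in one codeword
            i += 2
        elif data[i] < 128:
            count += 1          # plain 7-bit character
            i += 1
        else:
            count += 2          # shift codeword + character codeword
            i += 1
    return count


def base256_codeword_count(data: bytes) -> int:
    """Codewords needed in BASE256 encodation (latch + length field + raw bytes)."""
    length_field = 1 if len(data) <= 249 else 2
    return 1 + length_field + len(data)


def choose_encodation(data: bytes) -> tuple[str, int]:
    """Pick the shorter of the two encodations; ties go to ASCII."""
    ascii_len = ascii_codeword_count(data)
    base256_len = base256_codeword_count(data)
    if ascii_len <= base256_len:
        return ("ASCII", ascii_len)
    return ("BASE256", base256_len)


if __name__ == "__main__":
    print(choose_encodation(b"123456"))
    print(choose_encodation(bytes([200] * 10)))
```

Purely numeric input favours ASCII encodation, while input dominated by byte values above 127 favours BASE256. A real encoder makes this decision piecewise along the string.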
Table : Datamatrix encodation efficiency
More on ECC Datamatrix subsets
As mentioned in the introduction, there are two main subsets of Datamatrix symbols. The first subset uses convolutional codes for error correction and was used for most of the initial installations of Datamatrix systems. These earlier versions are referenced as ECC-000 to ECC-140 (the number specifies the level of convolutional error correcting code).
This first subset will be commonly referred to as ECC-140 in the remainder of this manual.
The second subset is referenced as ECC-200 and uses Reed-Solomon error correction techniques. The two subsets have the following characteristics:
- ECC-000 to ECC-140 symbols all have an odd number of modules along each square side.
- ECC-200 symbols have an even number of modules on each side. ECC-200 can have non-square symbol sizes.
Hence the two subsets are auto-discriminating: a reader can tell which one is in use from the symbol dimensions alone. The maximum data capacity of an ECC-200 symbol is 3116 numeric digits, or 2335 alphanumeric characters, in the largest 144-module square symbol.
Even though the library supports the creation of both types of Datamatrix symbols, it is recommended that all new applications use the more modern ECC-200 subset. This is also the recommendation in the standard. ECC-140 should only be used in legacy systems where old equipment is in use that has not been upgraded to handle the modern ECC-200 subset.
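The parity rule stated above is enough to tell the two subsets apart from the symbol dimensions alone. A minimal Python sketch follows; the function name `ecc_family` is hypothetical, and the check covers only the odd/even property, not whether the size is one the standard actually defines.

```python
def ecc_family(rows: int, cols: int) -> str:
    """Classify a symbol by the parity of its side lengths (in modules)."""
    if rows % 2 == 1 and cols % 2 == 1 and rows == cols:
        # Odd number of modules per side, square only: legacy convolutional subset.
        return "ECC 000-140"
    if rows % 2 == 0 and cols % 2 == 0:
        # Even number of modules per side, square or rectangular.
        return "ECC 200"
    raise ValueError(f"{rows}x{cols} fits neither subset")


if __name__ == "__main__":
    print(ecc_family(21, 21))
    print(ecc_family(16, 48))
```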
Symbology Data capacity
As mentioned in the previous section, the actual data capacity depends on the symbol size. By default the library will select the smallest possible symbol size that can encode a given character string with the chosen encoding (possibly automatic). Table 2 below gives the maximum capacity for the three most common encoding schema for each symbol size, as well as the robustness of each symbol, specified as the number of errors (destroyed data) that can be recovered.
Table : Maximum data capacity for the different symbol sizes in ECC-200 Data Matrix
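The default behaviour of picking the smallest symbol that fits can be sketched in Python as a lookup over the square ECC-200 sizes. The data-codeword capacities below are recalled from the ECC 200 symbol attribute table and should be checked against the standard before being relied upon. The names `SQUARE_CAPACITY` and `smallest_square_symbol` are hypothetical and do not reflect the library's API.

```python
# Side length in modules -> number of data codewords (assumption: recalled values).
SQUARE_CAPACITY = {
    10: 3, 12: 5, 14: 8, 16: 12, 18: 18, 20: 22, 22: 30, 24: 36, 26: 44,
    32: 62, 36: 86, 40: 114, 44: 144, 48: 174, 52: 204,
    64: 280, 72: 368, 80: 456, 88: 576, 96: 696, 104: 816,
    120: 1050, 132: 1304, 144: 1558,
}


def smallest_square_symbol(codeword_count: int):
    """Return the smallest square side length that holds the codewords, or None."""
    for side in sorted(SQUARE_CAPACITY):
        if SQUARE_CAPACITY[side] >= codeword_count:
            return side
    return None  # does not fit in a single symbol


if __name__ == "__main__":
    # Three codewords (for example six digits in ASCII encodation).
    print(smallest_square_symbol(3))
    # 3116 digits compact to 1558 codewords, the capacity of the largest symbol.
    print(smallest_square_symbol(3116 // 2))
```

With two digits per codeword, the 1558 data codewords of the largest symbol correspond to the 3116-digit maximum quoted earlier.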