Thursday May 18, 2006 | cn=Directory Manager All about Directory Server |
A Quick Introduction to ASN.1 BER

Many network protocols are text-based, which has the advantage of being relatively easy to understand if you examine the network traffic, and in many cases you can even interact with the target server by simply telnetting to it and typing in the appropriate commands. However, there are disadvantages as well, including that they are generally more verbose and less efficient to parse than they need to be. On the other hand, other protocols use a binary encoding that is more compact and more efficient. LDAP falls into this category, and uses the ASN.1 (Abstract Syntax Notation One) mechanism, and more specifically the BER (Basic Encoding Rules) flavor of ASN.1. There are a number of other encoding rules (e.g., DER, PER, CER) that fall under the ASN.1 umbrella, but since LDAP uses BER that's what I'll focus on in this post. In general, when I talk about ASN.1, I mean BER.

I should first point out that this is a very cursory overview of ASN.1 and doesn't attempt to cover everything. I'm largely focusing on the subset of BER that is actually used by LDAP, and there are some obscure special cases that I'll not get into as well. For a much more in-depth reference, check out the excellent ASN.1 Complete reference book by John Larmouth, which is freely available in PDF form (although you do have to fill out a form to be able to download it), or you can buy the book in "dead tree" form.

I should also say that this discussion assumes that you have at least a basic understanding of binary and hexadecimal numbering systems. If you aren't familiar with that or need to brush up on it, then I'm sure you'll be able to find plenty of sites to help with that.

BER elements use a TLV structure, where TLV stands for "type, length, and value".
That is, each BER element has one or more bytes (in all cases I'm aware of in LDAP, it's only a single byte) that indicate the data type for the element, one or more bytes that indicate the length of the value, and the encoded value itself, which can be zero or more bytes and whose form depends on the data type. I'll expand on each of these in the next sections.

The BER Type

The BER type indicates the data type for the value of the element. There are lots of different data types available, but the most commonly used (at least in LDAP) include OCTET STRING (which can be either a text string or just some binary data), INTEGER, BOOLEAN, NULL, ENUMERATED (like an integer, but where each value has a special meaning), SEQUENCE (an ordered collection of other elements, kind of like an array), and SET (the same as a sequence, except that the order doesn't matter). There is also CHOICE, which has no type byte of its own; it just means that you can have one of a few different kinds of elements.

As I mentioned above, the BER type is usually only a single byte, and this byte has several pieces of data encoded in it. The two most significant bits (i.e., the two leftmost bits, since BER always uses big endian/network ordering) are used to indicate the class for the element. The possible class values are:

00: universal (the types defined by ASN.1 itself, such as INTEGER and OCTET STRING)
01: application (types whose meaning is defined by the application or protocol)
10: context-specific (types whose meaning depends on where the element appears)
11: private
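As a minimal sketch, the class bits and the rest of the type byte (the primitive/constructed bit and the five-bit type value, both described next) can be pulled apart with a few bit operations. The names `CLASS_NAMES` and `decode_type_byte` are made up for this illustration; only the bit layout comes from BER.

```python
# Hypothetical helper names; only the bit layout is taken from BER.
CLASS_NAMES = {
    0b00: "universal",
    0b01: "application",
    0b10: "context-specific",
    0b11: "private",
}


def decode_type_byte(type_byte: int) -> tuple[str, bool, int]:
    """Split a single-byte BER type into (class name, constructed?, type value)."""
    if not 0 <= type_byte <= 0xFF:
        raise ValueError("a type byte must be in the range 0..255")
    class_bits = type_byte >> 6            # two leftmost bits
    constructed = bool(type_byte & 0x20)   # third bit from the left
    type_value = type_byte & 0x1F          # final five bits
    if type_value == 0x1F:
        # "11111" announces a multi-byte type, which LDAP does not use.
        raise ValueError("multi-byte types are not handled in this sketch")
    return CLASS_NAMES[class_bits], constructed, type_value


print(decode_type_byte(0x30))  # ('universal', True, 16)
print(decode_type_byte(0x04))  # ('universal', False, 4)
```

Here 0x30 is the universal SEQUENCE type and 0x04 is the universal OCTET STRING type.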
The next bit (the third from the left) is the primitive/constructed bit. If it is set to zero (i.e., "off"), then the element is considered primitive and therefore the value is encoded in accordance with the rules of that data type. If it is set to one (i.e., "on"), then it means that the value is constructed from zero or more other ASN.1 elements that are concatenated together in their encoded forms. For example, if you look at the universal SEQUENCE type of 0x30, the binary encoding is "00110000" and the primitive/constructed bit is set to one, indicating that the value of the sequence is constructed from zero or more encoded elements.

The final five bits of the BER type byte are used to specify the value of that type, and they are treated as a simple integer value (where "00000" is zero, "00001" is one, "00010" is two, "00011" is three, etc.). The only special value is "11111", which means that the type value is larger than can fit in the five bits allowed, so multiple bytes will be required. Since this doesn't happen in LDAP we'll ignore it in this discussion.

The BER Length

The second component in the TLV structure of a BER element is the length. This specifies the size in bytes of the encoded value. For the most part, this uses a straightforward binary encoding of the integer value (e.g., if the encoded value is five bytes long, then the length is encoded as 00000101 binary, or 0x05 hex), but if the value is longer than 127 bytes then it is necessary to use multiple bytes to encode the length. In that case, the first byte has the leftmost bit set to one and the remaining seven bits specify the number of bytes that follow to hold the full length. For example, if there are 500 bytes in the value (hex 0x01F4), then the encoded length will actually consist of three bytes: 82 01 F4. Note that there is an alternate form for encoding the length called the indefinite form.
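In this form the length is not given up front: the length byte is 0x80 and the value continues until a two-byte end-of-contents marker (00 00). That lets a sender emit a value piece by piece, kind of like the chunked encoding that is available in HTTP 1.1. However, this form is not used in LDAP (as per RFC 2251 section 5.1), so it won't be discussed here any further.

The BER Value

The value is the heart of the BER element because it contains the actual data of the element. Because BER is a binary encoding, the encodings can take advantage of that to represent the data in a compact form. As such, each data type has its own encoded form. These encodings include:

NULL: the value is always empty, so the length is zero (the complete element is 05 00).
OCTET STRING: the value is simply the raw bytes of the string or binary data.
BOOLEAN: the value is a single byte, 00 for false and nonzero for true (LDAP requires FF for true).
INTEGER and ENUMERATED: the value is the two's complement representation of the number in big endian order, using as few bytes as possible (3 is 03, 128 is 00 80).
SEQUENCE and SET: the value is the concatenation of the complete encodings of the elements it contains. For example, a sequence holding the OCTET STRINGs "Hello" and "there" has the value 04 05 48 65 6C 6C 6F 04 05 74 68 65 72 65.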
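The length and value rules can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the comments (definite-form lengths, single-byte types); the function names are hypothetical and are not taken from any LDAP library.

```python
def encode_length(n: int) -> bytes:
    """Definite-form length: one byte up to 127, else 0x80|count plus big-endian bytes."""
    if n < 0:
        raise ValueError("length cannot be negative")
    if n <= 127:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_element(type_byte: int, value: bytes) -> bytes:
    """Concatenate type, length, and value (single-byte types only)."""
    return bytes([type_byte]) + encode_length(len(value)) + value


def integer_value(n: int) -> bytes:
    """Two's complement, big endian, in the fewest bytes that preserve the sign."""
    magnitude = n if n >= 0 else ~n
    size = magnitude.bit_length() // 8 + 1
    return n.to_bytes(size, "big", signed=True)


def boolean_value(flag: bool) -> bytes:
    """One byte; FF for true (the form LDAP requires), 00 for false."""
    return b"\xff" if flag else b"\x00"


# A SEQUENCE value is just the encoded child elements placed back to back.
hello = encode_element(0x04, b"Hello")
there = encode_element(0x04, b"there")
sequence = encode_element(0x30, hello + there)

print(encode_length(500).hex(" ").upper())  # 82 01 F4
print(sequence.hex(" ").upper())  # 30 0E 04 05 48 65 6C 6C 6F 04 05 74 68 65 72 65
```

A NULL element needs no helper of its own: `encode_element(0x05, b"")` gives the two bytes 05 00.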
BER Encoding Examples

Now that we've covered the basics of encoding the type, length, and value components of a BER element, we can put together some examples. The example above for encoding a SEQUENCE value actually had two complete BER elements concatenated together: the OCTET STRING representations of the strings "Hello" and "there". They are:

04 05 48 65 6C 6C 6F
04 05 74 68 65 72 65
In both of these cases, the first byte is the type (0x04, which is the universal primitive OCTET STRING type), and the second is the length (0x05, indicating that there are five bytes in the value). The remaining five bytes are the encoded representations of the strings "Hello" and "there".

Another simple example would be to encode the integer value 3. This time, though, let's use a context-specific type value of 5 rather than the universal INTEGER type. In this case, the encoding would be:

85 01 03

The type byte 0x85 is binary "10000101": class bits "10" (context-specific), a primitive/constructed bit of zero, and a type value of "00101" (five). The length is 0x01 and the value is the single byte 0x03.
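The two simple examples can be reproduced with a short sketch. It assumes values of at most 127 bytes so that the length fits in one byte, and the names `encode_short`, `UNIVERSAL_OCTET_STRING`, and `CONTEXT_SPECIFIC` are made up for this illustration.

```python
UNIVERSAL_OCTET_STRING = 0x04  # class 00, primitive, type value 4
CONTEXT_SPECIFIC = 0x80        # class bits 10, primitive, type value 0


def encode_short(type_byte: int, value: bytes) -> bytes:
    """Encode one element whose value fits the single-byte length form."""
    if len(value) > 127:
        raise ValueError("this sketch only handles values up to 127 bytes")
    return bytes([type_byte, len(value)]) + value


hello = encode_short(UNIVERSAL_OCTET_STRING, b"Hello")
there = encode_short(UNIVERSAL_OCTET_STRING, b"there")

# Integer 3 under a context-specific type value of 5: OR the type value
# into the low five bits of the class/primitive prefix.
custom_int = encode_short(CONTEXT_SPECIFIC | 5, (3).to_bytes(1, "big"))

print(hello.hex(" ").upper())       # 04 05 48 65 6C 6C 6F
print(there.hex(" ").upper())       # 04 05 74 68 65 72 65
print(custom_int.hex(" ").upper())  # 85 01 03
```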
Now let's go for a little more involved (and more practical) example. Let's encode an LDAP bind request protocol op as defined in RFC 2251 section 4.2. A simplified BNF representation of this element is as follows:
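In RFC 2251 the bind request is an [APPLICATION 0] SEQUENCE holding an INTEGER version (1 to 127), the bind DN as an OCTET STRING, and an authentication CHOICE whose simple form is a context-specific [0] OCTET STRING carrying the password. A minimal Python sketch of that structure might look like the following. The function names are hypothetical, SASL authentication is left out, and every value is assumed to fit the single-byte length form; the example values are the ones used in the walk-through that follows.

```python
def tlv(type_byte: int, value: bytes) -> bytes:
    """Encode one element; this sketch only supports values up to 127 bytes."""
    if len(value) > 127:
        raise ValueError("multi-byte lengths are left out of this sketch")
    return bytes([type_byte, len(value)]) + value


def encode_simple_bind(version: int, dn: str, password: str) -> bytes:
    """Build the bind request protocol op for simple authentication."""
    if not 1 <= version <= 127:
        raise ValueError("version must be in the range 1..127")
    body = (
        tlv(0x02, bytes([version]))            # version: universal INTEGER
        + tlv(0x04, dn.encode("utf-8"))        # name: universal OCTET STRING
        + tlv(0x80, password.encode("utf-8"))  # simple: context-specific [0], primitive
    )
    return tlv(0x60, body)                     # [APPLICATION 0], constructed


request = encode_simple_bind(3, "cn=test", "password")
print(request.hex(" ").upper())
```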
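In this case, we'll encode a bind request using simple authentication for the user "cn=test" with a password of "password". The complete encoding for this bind request protocol op is:

60 16 02 01 03 04 07 63 6E 3D 74 65 73 74 80 08 70 61 73 73 77 6F 72 64

That's a fairly long string of bytes, but let's break it down to make it simpler:

60: the type of the bind request protocol op (application class, constructed, type value 0)
16: the length of the bind request value (0x16, or 22 bytes)
02 01 03: the protocol version, a universal INTEGER with a length of one byte and a value of 3
04 07 63 6E 3D 74 65 73 74: the bind DN, a universal OCTET STRING with a length of seven bytes and a value of "cn=test"
80 08 70 61 73 73 77 6F 72 64: the simple authentication credentials, a context-specific primitive element with a type value of 0, a length of eight bytes, and a value of "password"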
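The same breakdown can be produced mechanically by walking the TLV structure. The following is a rough sketch rather than a full decoder: `read_element` and `walk` are hypothetical names, only single-byte types and definite-form lengths are handled, and constructed elements are simply descended into without interpreting what they mean in LDAP.

```python
def read_element(data: bytes, offset: int = 0) -> tuple[int, bytes, int]:
    """Return (type_byte, value, next_offset) for the element starting at offset."""
    type_byte = data[offset]
    first = data[offset + 1]
    offset += 2
    if first & 0x80:                 # multi-byte length form
        count = first & 0x7F
        if count == 0:
            raise ValueError("indefinite-form lengths are not used in LDAP")
        length = int.from_bytes(data[offset:offset + count], "big")
        offset += count
    else:                            # single-byte length form
        length = first
    value = data[offset:offset + length]
    if len(value) != length:
        raise ValueError("element is truncated")
    return type_byte, value, offset + length


def walk(data: bytes, depth: int = 0) -> list[str]:
    """Describe every element in data, recursing into constructed ones."""
    lines = []
    offset = 0
    while offset < len(data):
        type_byte, value, offset = read_element(data, offset)
        indent = "  " * depth
        if type_byte & 0x20:         # primitive/constructed bit is set
            lines.append(f"{indent}{type_byte:02X} len={len(value)} (constructed)")
            lines.extend(walk(value, depth + 1))
        else:
            shown = value.hex(" ").upper()
            lines.append(f"{indent}{type_byte:02X} len={len(value)} value={shown}")
    return lines


bind_request = bytes.fromhex(
    "60 16 02 01 03 04 07 63 6E 3D 74 65 73 74 80 08 70 61 73 73 77 6F 72 64"
)
print("\n".join(walk(bind_request)))
```

Running it prints one line for the outer bind request element followed by three indented lines for the version, the bind DN, and the password.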
I realize that was a pretty significant jump in complexity between my examples. However, if you can follow the explanation of the encoding of the bind request element, then you're well on your way to being able to debug LDAP protocol communication. For additional help, check out the LDAPDecoder tool provided as part of SLAMD (if you use the "-b" option, it will show you the raw bytes for the communication along with the decoded human-readable representation). You can also check out the code in the com.sun.slamd.asn1 package in the SLAMD source code for a Java implementation of a simple BER encoder/decoder (it's what the LDAPDecoder uses behind the scenes to translate between raw bytes and BER elements).
Posted by cn_equals_directory_manager (May 18 2006, 08:03:47 AM CDT)