Extensible indexes in OpenDS
The extensible index is a new index type introduced in OpenDS 2.0. Since there is not much documentation available, I will explain it here in detail. Before you dive in, I advise you to have a look at the wiki page on collation matching rules. A collation matching rule helps you perform an internationalized search. As explained on the wiki, you can perform equality, ordering (less than, less than or equal to, greater than, and greater than or equal to) and substring-based searches using these matching rules. Another good thing about them is that you can index the matching rules for faster searches.

I would also like to clarify that an extensible index is much more than a collation index; however, we have only collation-based indexes at the moment. For the sake of clarity: I will say "extensible index" even when I mean only this subset, the collation indexes. Interestingly, a collation index can contain all the kinds of indexes (except presence) that I have blogged about before. So you can configure it to use equality, substring or various kinds of ordering-based indexes.

Before you get confused, let us see how it works. Let us create an equality, substring and less-than-or-equal-to index for the attribute "cn".

    >>>> Configure the properties of the Local DB Index

            Property                        Value(s)
            -----------------------------------------------------------------------
        1)  attribute                       cn
        2)  index-entry-limit               4000
        3)  index-extensible-matching-rule  No extensible matching rules will be indexed.
        4)  index-type                      equality, extensible, ordering, presence, substring

        ?)  help
        f)  finish - apply any changes to the Local DB Index
        c)  cancel
        q)  quit

    Enter choice [f]: 3

    >>>> Configuring the "index-extensible-matching-rule" property

        The extensible matching rule in an extensible index.

        An extensible matching rule must be specified using either LOCALE or OID
        of the matching rule.

        Syntax:  LOCALE | OID - A Locale or an OID.

    Do you want to modify the "index-extensible-matching-rule" property?

        1)  Keep the default behavior: No extensible matching rules will be indexed.
        2)  Add one or more values

        ?)  help
        q)  quit

    Enter choice [1]: 2

    Enter a value for the "index-extensible-matching-rule" property [continue]: en.eq
    Enter another value for the "index-extensible-matching-rule" property [continue]: en.sub
    Enter another value for the "index-extensible-matching-rule" property [continue]: en.lte
    Enter another value for the "index-extensible-matching-rule" property [continue]:

    >>>> Configuring the "index-extensible-matching-rule" property (Continued)

    The "index-extensible-matching-rule" property has the following values:

        *)  en.eq
        *)  en.lte
        *)  en.sub

    Do you want to modify the "index-extensible-matching-rule" property?

        1)  Use these values
        2)  Add one or more values
        3)  Remove one or more values
        4)  Reset to the default behavior: No extensible matching rules will be indexed.
        5)  Revert changes

        ?)  help
        q)  quit

    Enter choice [1]:

    Press RETURN to continue

    >>>> Configure the properties of the Local DB Index

            Property                        Value(s)
            -----------------------------------------------------------------------
        1)  attribute                       cn
        2)  index-entry-limit               4000
        3)  index-extensible-matching-rule  en.eq, en.lte, en.sub
        4)  index-type                      equality, extensible, ordering, presence, substring

        ?)  help
        f)  finish - apply any changes to the Local DB Index
        c)  cancel
        q)  quit

    Enter choice [f]:

    The Local DB Index was modified successfully

    Press RETURN to continue

Now rebuild the index, because we already have some entries with a cn value:
    sin > bin/rebuild-index -b dc=example,dc=com -i cn
    [21/May/2009:12:49:57 -0500] category=BACKEND severity=INFORMATION msgID=9437595 msg=Local DB backend ds-cfg-backend-id=userRoot,cn=Backends,cn=config does not specify the number of lock tables: defaulting to 53
    [21/May/2009:12:49:57 -0500] category=BACKEND severity=INFORMATION msgID=9437594 msg=Local DB backend ds-cfg-backend-id=userRoot,cn=Backends,cn=config does not specify the number of cleaner threads: defaulting to 2 threads
    [21/May/2009:12:49:57 -0500] category=JEB severity=NOTICE msgID=8847510 msg=Due to changes in the configuration, index dc_example_dc_com_cn is currently operating in a degraded state and must be rebuilt before it can be used
    [21/May/2009:12:49:57 -0500] category=JEB severity=NOTICE msgID=8847497 msg=Rebuild of index(es) cn started with 3 total records to process
    [21/May/2009:12:49:57 -0500] category=JEB severity=NOTICE msgID=8847493 msg=Rebuild complete. Processed 3 records in 0 seconds (average rate 115.4/sec)

See the list of all the databases, including the extensible ones:

    sin > bin/dbtest list-database-containers -n userRoot -b "dc=example,dc=com"

    Database Name    Database Type  JE Database Name                   Entry Count
    ------------------------------------------------------------------------------
    dn2id            DN2ID          dc_example_dc_com_dn2id            3
    id2entry         ID2Entry       dc_example_dc_com_id2entry         3
    referral         DN2URI         dc_example_dc_com_referral         0
    id2children      Index          dc_example_dc_com_id2children      1
    id2subtree       Index          dc_example_dc_com_id2subtree       1
    state            State          dc_example_dc_com_state            23
    aci.presence     Index          dc_example_dc_com_aci.presence     0
    cn.equality      Index          dc_example_dc_com_cn.equality      2
    cn.presence      Index          dc_example_dc_com_cn.presence      1
    cn.substring     Index          dc_example_dc_com_cn.substring     7
    cn.ordering      Index          dc_example_dc_com_cn.ordering      2
    cn.en.shared     Index          dc_example_dc_com_cn.en.shared     2    ----> extensible index
    cn.en.substring  Index          dc_example_dc_com_cn.en.substring  7    ----> extensible index

An extensible (currently only collation-based) index database is named "attribute.locale.index_type". Note that an equality or ordering index creates a "shared" index database (cn.en.shared), because the content is the same in both. Let us dump the contents of each.

Using dbtest

    sin > bin/dbtest dump-database-container -n userRoot -b "dc=example,dc=com" -d cn.en.shared

    Indexed Value (10 bytes): STU
    Entry ID List (8 bytes): 2

    Indexed Value (12 bytes): hfXe
    Entry ID List (8 bytes): 3

    Total Records: 2
    Total / Average Key Size: 24 bytes / 12 bytes
    Total / Average Data Size: 16 bytes / 8 bytes

    sin > bin/dbtest dump-database-container -n userRoot -b "dc=example,dc=com" -d cn.en.substring

    Indexed Value (6 bytes): STU
    Entry ID List (8 bytes): 2

    Indexed Value (4 bytes): TU
    Entry ID List (8 bytes): 2

    Indexed Value (2 bytes): U
    Entry ID List (8 bytes): 2

    Indexed Value (4 bytes): Xe
    Entry ID List (8 bytes): 3

    Indexed Value (2 bytes): e
    Entry ID List (8 bytes): 3

    Indexed Value (6 bytes): fXe
    Entry ID List (8 bytes): 3

    Indexed Value (8 bytes): hfXe
    Entry ID List (8 bytes): 3

    Total Records: 7
    Total / Average Key Size: 34 bytes / 4 bytes
    Total / Average Data Size: 56 bytes / 8 bytes

Using DbDump

    sin > java com.sleepycat.je.util.DbDump -h db/userRoot/ -p -s dc_example_dc_com_cn.en.shared
    VERSION=3
    format=print
    type=btree
    dupsort=0
    HEADER=END
     \00S\00T\00U\00\00\00\00
     \00\00\00\00\00\00\00\02
     \00h\00f\00X\00e\00\00\00\00
     \00\00\00\00\00\00\00\03
    DATA=END

    sin > java com.sleepycat.je.util.DbDump -h db/userRoot/ -p -s dc_example_dc_com_cn.en.substring
    VERSION=3
    format=print
    type=btree
    dupsort=0
    HEADER=END
     \00S\00T\00U
     \00\00\00\00\00\00\00\02
     \00T\00U
     \00\00\00\00\00\00\00\02
     \00U
     \00\00\00\00\00\00\00\02
     \00X\00e
     \00\00\00\00\00\00\00\03
     \00e
     \00\00\00\00\00\00\00\03
     \00f\00X\00e
     \00\00\00\00\00\00\00\03
     \00h\00f\00X\00e
     \00\00\00\00\00\00\00\03
    DATA=END

If you look carefully, you will find that the contents of an equality index and an extensible equality index are different for the same value of cn (i.e. "user").
Like a normal equality index, the extensible equality index uses a normalized value as the key. However, the value is normalized according to the Unicode standard (NFKC), which requires java.text.Normalizer.
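
To make the normalization step a little more concrete, here is a minimal Java sketch. It is not the OpenDS code: the class name CollationKeySketch and its helper methods are hypothetical, and the choice of Locale.ENGLISH with PRIMARY strength is my assumption for illustration. It only shows how java.text.Normalizer applies NFKC and how the JDK's Collator can turn the normalized value into a locale-sensitive byte key. The bytes it prints may not match the dumps above.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of OpenDS.
final class CollationKeySketch {

    // Apply Unicode NFKC normalization using the JDK's java.text.Normalizer.
    static String normalize(String value) {
        return Normalizer.normalize(value, Normalizer.Form.NFKC);
    }

    // Build a locale-sensitive key for the normalized value.
    // Assumption: English locale at PRIMARY strength, so case differences
    // do not change the key. OpenDS may use different settings.
    static byte[] collationKey(String value) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.PRIMARY);
        CollationKey key = collator.getCollationKey(normalize(value));
        return key.toByteArray();
    }

    // Render a byte array as lowercase hex so the key is readable.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The last value is "user" written with full-width characters,
        // which NFKC folds to plain ASCII letters.
        String[] values = {"user", "USER", "\uFF55\uFF53\uFF45\uFF52"};
        for (String cn : values) {
            System.out.println(normalize(cn) + " -> " + toHex(collationKey(cn)));
        }
    }
}
```

Under these assumptions all three inputs produce the same key, which is the kind of matching an internationalized equality search needs.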
