Substring index in OpenDS
A substring index has the largest footprint in terms of database size. When a substring index is built for an attribute, several keys are generated from each value. For example, the value "user" produces the keys "user", "ser", "er" and "r". A substring filter (such as "u*s*e*r") typically has three parts:
subInitial -- the value before the first *, i.e. "u"
subAny -- the list of values between subInitial and subFinal, separated by *, i.e. ["s", "e"]
subFinal -- the value after the last *, i.e. "r"
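The three parts above can be illustrated with a minimal Python sketch that splits a filter pattern on the * character. The function name split_substring_filter is hypothetical and this is not how OpenDS itself parses filters; escaped characters inside the assertion value are ignored for simplicity.

```python
def split_substring_filter(pattern):
    """Split a substring assertion such as "u*s*e*r" into its three parts.

    Illustrative only: returns (subInitial, subAny, subFinal), where a
    missing subInitial or subFinal is reported as None.
    """
    parts = pattern.split("*")
    if len(parts) < 2:
        raise ValueError("a substring filter needs at least one '*'")
    sub_initial = parts[0] or None              # text before the first *
    sub_final = parts[-1] or None               # text after the last *
    sub_any = [p for p in parts[1:-1] if p]     # everything in between
    return sub_initial, sub_any, sub_final


if __name__ == "__main__":
    for pattern in ("u*s*e*r", "u*r", "*r", "u*"):
        print(pattern, "->", split_substring_filter(pattern))
```

Running it on the filters mentioned in this post shows which components each one supplies; "*r", for instance, carries only a subFinal.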
You are free to use any combination of these components, for example "u*r", "*r" or "u*". However, choose the filter carefully so that it excludes most of the entries while the list of EntryIDs is being built. If you are familiar with the index limit, you will know that once the size of the list exceeds this limit, the index is no longer used and the search may become costly. Here is what the database dump looks like for substring indexes:
Using dbtest
sin > bin/dbtest dump-database-container -n userRoot -b "dc=example,dc=com" -d cn.substring
Indexed Value (3 bytes): abc
Entry ID List (8 bytes): 2
Indexed Value (2 bytes): bc
Entry ID List (8 bytes): 2
Indexed Value (1 bytes): c
Entry ID List (8 bytes): 2
Indexed Value (2 bytes): er
Entry ID List (8 bytes): 3
Indexed Value (1 bytes): r
Entry ID List (8 bytes): 3
Indexed Value (3 bytes): ser
Entry ID List (8 bytes): 3
Indexed Value (4 bytes): user
Entry ID List (8 bytes): 3
Total Records: 7
Total / Average Key Size: 17 bytes / 2 bytes
Total / Average Data Size: 56 bytes / 8 bytes
Using dbdump
sin > java com.sleepycat.je.util.DbDump -h db/userRoot/ -p -s dc_example_dc_com_cn.substring
VERSION=3
format=print
type=btree
dupsort=0
HEADER=END
abc
\00\00\00\00\00\00\00\02
bc
\00\00\00\00\00\00\00\02
c
\00\00\00\00\00\00\00\02
er
\00\00\00\00\00\00\00\03
r
\00\00\00\00\00\00\00\03
ser
\00\00\00\00\00\00\00\03
user
\00\00\00\00\00\00\00\03
DATA=END
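The keys in both dumps are consistent with the rule described at the start of this post: every suffix of a value becomes a key. The Python sketch below rebuilds them for the two values "abc" (entry 2) and "user" (entry 3). The names substring_keys and build_index are hypothetical, not OpenDS code, and the max_len cap of 6 characters is an assumption about the configured substring length; it has no effect on values this short.

```python
def substring_keys(value, max_len=6):
    """Return the substring-index keys for one attribute value.

    One key is produced per starting position, truncated to max_len
    characters (max_len=6 is an assumed setting, not taken from the dump).
    """
    return [value[i:i + max_len] for i in range(len(value))]


def build_index(entries):
    """Map each key to the list of entry IDs whose value produced it."""
    index = {}
    for entry_id, value in entries:
        for key in substring_keys(value):
            index.setdefault(key, []).append(entry_id)
    return index


if __name__ == "__main__":
    index = build_index([(2, "abc"), (3, "user")])
    for key in sorted(index):       # sorted order mirrors the dump listing
        print(key, index[key])
```

Sorting the keys gives abc, bc, c, er, r, ser, user, the same seven records listed by dbtest and DbDump.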
