DWARF and relocations
To explain why relocation processing matters when reading DWARF info, a brief introduction to the DWARF2 format is necessary.
A little something about DWARF
DWARF is a widely used debugging data format. It is suitable for virtually any source language, and in Sun Studio it is used to represent debugging information for C, C++ and Fortran. There exist three versions of the standard: 1.1, 2.0 and the most recent, 3.0. Sun Studio compilers generate version 2.0, commonly referred to as DWARF2.
DWARF2 information is stored in a table of records (contained in the .debug_info ELF section) called Debug Information Entries (DIEs). Each DIE has
- a type (for example, DW_TAG_compile_unit),
- several attribute/value pairs that describe the entry (for example, DW_AT_language=DW_LANG_C_plus_plus).
The layout of each kind of DIE is described by an entry in a table of abbreviations (the .debug_abbrev section), for example:
DW_TAG_compile_unit DW_children_yes
    DW_AT_name      DW_FORM_string
    DW_AT_language  DW_FORM_data1
    ...
    DW_AT_low_pc    DW_FORM_addr
    DW_AT_high_pc   DW_FORM_addr
    ...
This entry describes a CU, which contains a name (string), a source language attribute (1 byte of data), the addresses of its beginning and end (4 or 8 bytes of data, depending on the memory model) and other fields.
So the type stored in a DIE is just an index (an abbreviation code) into that table of abbreviations:
.debug_info
    CU abbrev_offset=0
        DW_TAG_compile_unit (code=1) ----+
        ...                              |
        ...                              |
                                         | (points to debug_abbrev entry #1)
.debug_abbrev                            |
        DW_TAG_compile_unit (code=1) <---+
            DW_AT_producer DW_FORM_strp
            DW_AT_language DW_FORM_data1
            ...
        DW_TAG_variable (code=2)
            DW_AT_name DW_FORM_strp
            DW_AT_decl_file DW_FORM_data1
            ...
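The lookup pictured above can be sketched in Python. This is a minimal sketch, not dwarfdump's implementation: the function names are hypothetical, and the sample bytes are hand-assembled from the DWARF constant values (DW_TAG_compile_unit = 0x11, DW_TAG_variable = 0x34, DW_AT_producer = 0x25, DW_AT_language = 0x13, DW_AT_name = 0x03, DW_AT_decl_file = 0x3a, DW_FORM_strp = 0x0e, DW_FORM_data1 = 0x0b) rather than taken from a real object file.

```python
def read_uleb128(data, pos):
    """Decode one unsigned LEB128 value; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def parse_abbrev_table(data, offset=0):
    """Parse the abbreviation table that starts at `offset`.

    Returns {code: (tag, has_children, [(attribute, form), ...])}.
    """
    table = {}
    pos = offset
    while True:
        code, pos = read_uleb128(data, pos)
        if code == 0:               # a zero code terminates the table
            return table
        tag, pos = read_uleb128(data, pos)
        has_children = data[pos]
        pos += 1
        attrs = []
        while True:
            attr, pos = read_uleb128(data, pos)
            form, pos = read_uleb128(data, pos)
            if attr == 0 and form == 0:   # (0, 0) ends the attribute list
                break
            attrs.append((attr, form))
        table[code] = (tag, has_children, attrs)


# Hand-assembled sample: two abbreviations, as in the diagram.
SAMPLE_ABBREV = bytes([
    0x01, 0x11, 0x01,  0x25, 0x0E,  0x13, 0x0B,  0x00, 0x00,  # code 1
    0x02, 0x34, 0x00,  0x03, 0x0E,  0x3A, 0x0B,  0x00, 0x00,  # code 2
    0x00,                                                     # end of table
])

abbrevs = parse_abbrev_table(SAMPLE_ABBREV)
tag, has_children, attrs = abbrevs[1]   # a DIE with code=1 resolves to this entry
print(hex(tag), has_children, attrs)
```

A DIE only stores the code; everything else about its shape comes from the table entry the code selects.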
Relocations that affect DWARF data
When several object files are linked together with
$ ld -r file1.o file2.o -o combined.o
to produce the relocatable file combined.o, the linker glues all .debug_abbrev sections together into one section, which invalidates the abbreviation table indexes for all but the first object file.
For example, suppose the second file, file2.o, describes DW_TAG_variable at index 4 of its own .debug_abbrev table, while the first file, file1.o, has, say, DW_TAG_typedef at that index. For combined.o, dwarfdump would look up index 4 in the .debug_abbrev table expecting the description of a variable, while the entry it finds actually describes a typedef. There is no way to validate such a reference, and the results range from wrong data being printed to a crash.
To solve this problem, the debug info header of every compilation unit has an "abbrev offset" field, which points to the beginning of the part of the abbrev table that belongs to that compilation unit. This field is always 0 in .o files produced from one source file: since there is only one compilation unit, its abbreviation table starts at byte 0 of the .debug_abbrev section. When object files are linked together, the abbrev_offset field is updated by a corresponding relocation record.
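The effect of the abbrev offset can be illustrated with a toy model. This is only a sketch: the dictionary stands in for the combined .debug_abbrev section, the function name is made up, and the offset 59 is just an illustrative value for where the second file's table starts.

```python
# Hypothetical model of a combined .debug_abbrev section: each input file's
# abbreviation table is keyed by the byte offset at which it starts.
COMBINED_ABBREV = {
    0:  {1: "DW_TAG_compile_unit", 4: "DW_TAG_typedef"},    # from file1.o
    59: {1: "DW_TAG_compile_unit", 4: "DW_TAG_variable"},   # from file2.o
}


def lookup_tag(abbrev_offset, code):
    """Resolve an abbreviation code in the table that starts at abbrev_offset."""
    return COMBINED_ABBREV[abbrev_offset][code]


# A DIE that came from file2.o and uses abbreviation code 4:
stale = lookup_tag(0, 4)    # abbrev_offset still 0: lands in file1.o's table
fixed = lookup_tag(59, 4)   # abbrev_offset updated: lands in file2.o's table
print(stale, fixed)
```

With the stale offset the lookup succeeds but returns the wrong entry, which is exactly why the error cannot be detected at that point.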
When the linker is asked to generate an executable or a shared library, it applies relocations of this kind, and the resulting load object has the correct abbrev_offset for each CU. When the -r linker option is in effect, the linker is supposed to generate a file that keeps all relocations intact, so ld copies (updated versions of) the relocations from the input files into the output file.
Let's take a look at this relocation record. On Solaris, for a sparcv9 (64-bit) object file, it looks like this:
$ readelf -r file2.o
Relocation section '.rela.debug_info' at offset 0x4a8 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000e  000300000036 R_SPARC_UA64      0000000000000000 .debug_abbrev + 0
...
There are more relocation records, but only one refers to the .debug_abbrev section, which gives a good hint: after all, only one field in debug_info depends on knowing the "address" of the debug_abbrev section. A more thorough examination (or rather, a calculation) involving offset = 14 (0xe) leads to the same conclusion: this relocation record updates abbrev_offset.
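The calculation can be sketched as follows. Two things here are assumptions and not stated in the text: that the CU header uses the 64-bit DWARF layout (a 4-byte 0xffffffff escape, an 8-byte length and a 2-byte version ahead of the abbrev offset), which is one layout consistent with offset 14, and that the Info column can be split with the generic ELF64_R_SYM / ELF64_R_TYPE rule (on SPARC the low word can also carry extra type data, which this sketch ignores). The function names are made up for illustration.

```python
R_SPARC_UA64 = 54  # relocation type number from the SPARC ABI


def split_r_info(r_info):
    """Split a 64-bit r_info into (symbol index, relocation type)."""
    return r_info >> 32, r_info & 0xFFFFFFFF


def abbrev_offset_position(dwarf64):
    """Byte position of abbrev_offset within a CU header.

    Assumed field order: initial length, 2-byte version, abbrev offset.
    """
    initial_length = 4 + 8 if dwarf64 else 4   # escape + 8-byte length, or 4-byte length
    version = 2
    return initial_length + version


sym_index, rel_type = split_r_info(0x000300000036)   # Info column for file2.o
print(sym_index, rel_type == R_SPARC_UA64)
print(abbrev_offset_position(dwarf64=True))    # position under the 64-bit layout
print(abbrev_offset_position(dwarf64=False))   # position under the 32-bit layout
```

Under the assumed 64-bit layout the field sits at byte 14 of the header, matching the Offset column of the relocation record.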
After file1.o and file2.o are linked together with the -r linker option, combined.o has two relocation records against the .debug_abbrev section:
$ readelf -r combined.o | grep debug_abbrev
00000000000e  000c00000036 R_SPARC_UA64      0000000000000000 .debug_abbrev + 0
000000000136  000c00000036 R_SPARC_UA64      0000000000000000 .debug_abbrev + 3b
The first one is obviously for the first file, since its offset is so small, and the second one is intended to update the abbrev offset for the second file. Note that it is a RELA-type relocation, that is, a relocation with an addend, which in this case is 0x3b, or 59. This means that the abbreviation table of the second file starts at offset 59 (in bytes) in the .debug_abbrev section. It also means that the location this record is supposed to update probably contains zero (for REL-type relocations, it would contain the addend, 59 in this case).
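Applying these two records can be sketched in Python. This is a sketch under stated assumptions: the .debug_info contents are a zero-filled stand-in rather than real section data, the function name is hypothetical, and R_SPARC_UA64 is taken to store symbol value plus addend as an unaligned big-endian 64-bit word, as the SPARC ABI describes it.

```python
import struct


def apply_rela_ua64(section, r_offset, sym_value, r_addend):
    """Store S + A as an unaligned big-endian 64-bit word at r_offset."""
    value = (sym_value + r_addend) & 0xFFFFFFFFFFFFFFFF
    struct.pack_into(">Q", section, r_offset, value)


# Zero-filled stand-in for combined.o's .debug_info, long enough for both fields.
debug_info = bytearray(0x136 + 8)

# (offset, addend) pairs from the readelf output above. The .debug_abbrev
# section symbol has value 0 in a relocatable file.
for r_offset, r_addend in [(0x0E, 0x00), (0x136, 0x3B)]:
    apply_rela_ua64(debug_info, r_offset, 0, r_addend)

first_cu_abbrev_offset = struct.unpack_from(">Q", debug_info, 0x0E)[0]
second_cu_abbrev_offset = struct.unpack_from(">Q", debug_info, 0x136)[0]
print(first_cu_abbrev_offset, second_cu_abbrev_offset)
```

After the loop, the second CU's field holds 59 instead of the zero that was stored in the file.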
Here's how it looks from the DWARF point of view (combined.o):
.debug_info section
    CU file1.o, abbrev_offset=0
        DW_TAG_compile_unit (code=1) ---> (points to debug_abbrev entry #1)
        ...
        ...
    CU file2.o, abbrev_offset=59
        DW_TAG_compile_unit (code=1) ---> (points to debug_abbrev entry #59+1=60)
        ...
        ...
.debug_abbrev section
    1: DW_TAG_compile_unit (code=1) <-- part of table for file1.o starts from here
        DW_AT_producer DW_FORM_strp
        DW_AT_language DW_FORM_data1
        ...
    2: DW_TAG_variable (code=2)
        DW_AT_name DW_FORM_strp
        DW_AT_decl_file DW_FORM_data1
        ...
    ...
    60: DW_TAG_compile_unit (code=1) <-- part of table for file2.o starts from here
        DW_AT_producer DW_FORM_strp
        DW_AT_language DW_FORM_data1
        ...
On x86, the relocation record is of type REL, which means that the addend is stored in the location to be modified; in other words, in the abbrev_offset field. Therefore, on x86 the linker writes the correct offset into the debug info header, making the relocation entry for debug_abbrev redundant, at least for dwarfdump. This is why dwarfdump always works on x86 and sparcv8.
On SPARCv9 (as well as on x64), the relocation record is of type RELA, meaning that the addend is stored in the relocation entry itself. So when producing a relocatable object file (ld -r), the linker does not touch the abbrev_offset field in the section; instead, it changes the relocation record for the second compilation unit (file2.o) and puts the correct offset into that record. To obtain the right value of abbrev_offset, one has to perform the relocation first.
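From the point of view of a tool that reads a relocatable file, the difference between the two kinds of records can be sketched like this. The function and the tuple representation of a relocation are hypothetical, chosen only to show where the offset comes from in each case; they are not taken from dwarfdump or libdwarf.

```python
def effective_abbrev_offset(field_value, reloc=None):
    """Return the abbrev offset a reader should use for one CU.

    field_value is what is stored in the CU header; reloc is None or a
    (kind, sym_value, addend) tuple describing the record for that field.
    """
    if reloc is None:
        return field_value
    kind, sym_value, addend = reloc
    if kind == "REL":      # addend lives in the field itself
        return sym_value + field_value
    if kind == "RELA":     # addend lives in the relocation record
        return sym_value + addend
    raise ValueError("unknown relocation kind: %r" % (kind,))


# REL case: the field already holds 59, so ignoring the relocation
# gives the same answer as applying it.
rel_ignored = effective_abbrev_offset(59)
rel_applied = effective_abbrev_offset(59, ("REL", 0, None))

# RELA case: the field holds 0, so the relocation must be applied.
rela_ignored = effective_abbrev_offset(0)
rela_applied = effective_abbrev_offset(0, ("RELA", 0, 59))

print(rel_ignored, rel_applied, rela_ignored, rela_applied)
```

A reader that skips relocation processing gets the right offset only in the REL case; in the RELA case it silently falls back to the first file's table.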
Recent versions of dwarfdump have built-in relocation processing for x64, sparcv9 and MIPS.
References
- DWARF standard.
- David Anderson's page, the source of dwarfdump and libdwarf.