I may still be behind the latest Nevada build, but on my build 46 install of Solaris Nevada on my laptop, I decided to try out pgbench using the PostgreSQL binaries bundled in Solaris Nevada. My version of PostgreSQL is

 PostgreSQL 8.1.4 on i386-pc-solaris2.11, compiled by /opt/SUNWspro.40/SOS10/bin/cc -Xa

I created pgsample and started loading the database using pgbench. Zpool iostat reported peaks about 7MB/sec writes and visually it looked slow. So I opened the postgresql.conf file and added my favorite modifications to it:

wal_sync_method = fdatasync
wal_buffers = 128
checkpoint_segments = 128
bgwriter_lru_percent = 0
bgwriter_all_percent = 0

and the same load now feeled better with peaks at about 10.5MB/sec though I would say it is still slow since neither the CPU is saturated nor the IO is saturated. I used Brendan Gregg's hotuser utility which uses DTrace to sample the process to find out what's happening.

It reported the following (tail-end)

libc.so.1`strtol                                          160   3.2%
postgres`CopyFrom                                         200   4.0%
postgres`PageAddItem                                      250   5.0%
libc.so.1`memcpy                                          270   5.4%
libc.so.1`memset                                          300   6.0%
postgres`tas                                              430   8.7%
postgres`XLogInsert                                      1031  20.7%

So it seems if I improve the performance of XLogInsert, I might actually load the data faster with the inbuild postresql in Solaris Nevada. This is where Sun Studio 11 tools comes handy now that it is Free!!.

# PATH=$PATH:/opt/SUNWspro/bin
# umask 000
# dbx /usr/bin/postgres 2690
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.5' in your .dbxrc
Reading postgres
Reading ld.so.1
...
Attached to process 2690
stopped in _write at 0xd13464d5
0xd13464d5: _write+0x0015:      jae      _write+0x21    [ 0xd13464e1, .+0xc ]
(dbx) collector enable
(dbx) cont
Creating experiment database test.1.er ...
Reading libmp.so.2
Reading libmd.so.1
Reading libscf.so.1
Reading libuutil.so.1


execution completed, exit code is 0
(dbx) quit

Now that I have the experiment collected, time to do some reviews using er_print

# er_print test.1.er
(er_print) func
Functions sorted by metric: Exclusive User CPU Time

Excl.     Incl.      Name
User CPU  User CPU
  sec.      sec.
20.920    20.920     
 3.360     6.830     comparetup_index
 2.170     8.690     tuplesort_heap_siftup
 1.610     2.110     XLogInsert
 1.290     2.410     myFunctionCall2
 1.210     3.630     inlineApplySortFunction

Since I had continued the experiment till the end, it also collected information on building indexes at the end which shows up at the top. However XLogInsert still shows up out there which needs to be investigated.

Had it been a debug build, I could have just used the lines command to point to the linenumber in the source code file.

(er_print) lines
Objects sorted by metric: Exclusive User CPU Time

Excl.     Incl.      Name
User CPU  User CPU
  sec.      sec.
20.920    20.920     Total
 3.360     6.830     Function: comparetup_index, instructions from source file tuplesort.c
 2.170     8.690     Function: tuplesort_heap_siftup, instructions from source file tuplesort.c
 1.610     2.110     Function: XLogInsert, instructions without line numbers
 1.290     2.410     Function: myFunctionCall2, instructions from source file tuplesort.c
 1.210     3.630     Function: inlineApplySortFunction, instructions from source file tuplesort.c

Hence I used the disassembly feature to see if there are quick hotspots. Sun Studio tools marks the hotspots with "##" so it becomes easier to notice them.

(er_print) dis XLogInsert
Source file: (unknown)
Object file: /usr/bin/postgres
Load Object: /usr/bin/postgres

   Excl.     Incl.
   User CPU  User CPU
    sec.      sec.
                                
   0.010     0.010              [?]  80a5ab0:  pushl   %ebp
   0.        0.                 [?]  80a5ab1:  movl    %esp,%ebp
   0.        0.                 [?]  80a5ab3:  subl    $0x1c0,%esp
   0.        0.                 [?]  80a5ab9:  andl    $-0x10,%esp
.....

   0.        0.                 [?]  80a5bb8:  call    XLogCheckBuffer [ 0x80a64a1, .+0x8e9 ]

.....

   0.        0.                 [?]  80a5c3a:  testl   %eax,%eax
   0.        0.                 [?]  80a5c3c:  jbe     .+0x65 [ 0x80a5ca1 ]
   0.070     0.070              [?]  80a5c3e:  movl    %esi,%eax
   0.030     0.030              [?]  80a5c40:  shrl    $0x18,%eax
   0.060     0.060              [?]  80a5c43:  movzbl  (%ecx),%edi
   0.030     0.030              [?]  80a5c46:  xorl    %edi,%eax
   0.020     0.020              [?]  80a5c48:  movzbl  %al,%eax
   0.210     0.210              [?]  80a5c4b:  incl    %ecx
   0.        0.                 [?]  80a5c4c:  shll    $8,%esi
   0.        0.                 [?]  80a5c4f:  xorl    0x82499b0(,%eax,4),%esi
## 0.360     0.360              [?]  80a5c56:  movl    %edx,%eax
   0.        0.                 [?]  80a5c58:  decl    %eax
   0.        0.                 [?]  80a5c59:  movl    %edx,%edi
   0.050     0.050              [?]  80a5c5b:  movl    %eax,%edx
   0.        0.                 [?]  80a5c5d:  testl   %edi,%edi


.....
  0.        0.                 [?]  80a5dfc:  call    elog_start [ 0x81f97ab, .+0x1539af ]
.....



   0.        0.                 [?]  80a6499:  popl    %esi
   0.        0.                 [?]  80a649a:  popl    %ebx
   0.        0.                 [?]  80a649b:  movl    %ebp,%esp
   0.        0.                 [?]  80a649d:  popl    %ebp
   0.        0.                 [?]  80a649e:  ret     $4
(er_print)

So based on the disassembly it seems somewhere between the call to XLogCheckBuffer and elog_start there is a bottleneck.

Well maybe one of the Postgresql hacker can use this information and figure out and modify it so that it is no longer a bottleneck. It looks like a loop and could use some loop optimizations.

Similarly we can use it to debug and find hotspots in other functions also

(er_print) dis comparetup_index
Source file: tuplesort.c
Object file: /usr/bin/postgres
Load Object: /usr/bin/postgres

   Excl.     Incl.
   User CPU  User CPU
    sec.      sec.
                                Function: comparetup_index
   0.130     0.130              [?]  820ef9d:  pushl   %ebp
   0.        0.                 [?]  820ef9e:  movl    %esp,%ebp
   0.030     0.030              [?]  820efa0:  subl    $0x20,%esp
   0.        0.                 [?]  820efa3:  andl    $-0x10,%esp
   0.        0.                 [?]  820efa6:  pushl   %ebx


........

   0.        0.                 [?]  820f15b:  call    nocache_index_getattr [ 0x8080ec8, .-0x18e293 ]
   0.        0.                 [?]  820f160:  addl    $0x10,%esp
   0.110     0.110              [?]  820f163:  movsbl  -4(%ebp),%ecx
   0.        0.                 [?]  820f167:  pushl   %ecx
   0.        0.                 [?]  820f168:  pushl   %eax
   0.030     0.030              [?]  820f169:  movsbl  -8(%ebp),%eax
   0.        0.                 [?]  820f16d:  pushl   %eax
   0.        0.                 [?]  820f16e:  movl    -0x10(%ebp),%eax
   0.040     0.040              [?]  820f171:  pushl   %eax
   0.        0.                 [?]  820f172:  pushl   $2
   0.030     0.030              [?]  820f174:  leal    0xc(%edi),%eax
   0.        0.                 [?]  820f177:  pushl   %eax
## 0.        3.490              [?]  820f178:  call    inlineApplySortFunction [ 0x820e9e0, .-0x798 ]
   0.150     0.150              [?]  820f17d:  addl    $0x18,%esp
   0.        0.                 [?]  820f180:  testl   %eax,%eax
   0.110     0.110              [?]  820f182:  jne     .+0xe0 [ 0x820f262 ]
   0.        0.                 [?]  820f188:  cmpb    $0,-8(%ebp)
   0.        0.                 [?]  820f18c:  setne   %al
...

Looks like index building also has a bottleneck in inlineApplySortFunction.

Comments:

Well, a quick Google search for XlogInsert immediately finds a thread with the subject "Cost of XLogInsert CRC calculations" :)

In other words: it looks like XLogInsert calculates a CRC of the data being inserted. Yeah, that's gonna cost.

Posted by Nico on September 18, 2006 at 05:04 PM EDT #

Post a Comment:
  • HTML Syntax: NOT allowed

This blog copyright 2009 by Jignesh Shah