Dhanaraj M

Tuesday Aug 25, 2009

Solaris - Tickless Kernel Architecture

To save power, a CPU can be kept in a low-power state whenever it is idle.
A tickless kernel architecture makes this possible: the clock cyclic fires
only when an event is due, rather than periodically on every tick.
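
As a rough illustration of why this saves power, the following Python sketch counts how often an idle CPU would be woken under each scheme. It is a toy model, not Solaris code; the function names and the 100 Hz tick rate are assumptions made for the example.

```python
def periodic_wakeups(duration_s, hz=100):
    """A periodic clock wakes the CPU on every tick, whether or not work is due."""
    return int(duration_s * hz)

def event_wakeups(duration_s, event_times_s):
    """A tickless clock wakes the CPU only when a scheduled event expires."""
    return sum(1 for t in event_times_s if 0 <= t < duration_s)

# An idle CPU with three pending timers over a 10-second window.
timers = [1.5, 4.0, 9.25]
print(periodic_wakeups(10))       # 1000
print(event_wakeups(10, timers))  # 3
```

With nothing else to do, the CPU in the second case can stay in its low-power state between the three expirations.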

For more details about the project, see the following CRs:
CR 6567390 clock efficiency optimizations ('tickless' clock)
CR 6875377 clock() service decomposition

This project is divided into several tasks:
# Callout / Timeout Re-implementation
    - Integrated into snv_103, tracked by CR 6565503 (discussed in my previous blog)
# clock() decoupled lbolt / lbolt64
# Event based historical load average implementation
# Software PLL time adjustment / NTP, tod migration
# Tick processing, Tick accounting
    - CR 6860423 thread tick accounting and time slice enforcement needs to be tickless

Monday Aug 24, 2009

How scalable is Solaris?

Recently, we have started seeing customers run multi-threaded applications that use
CPUs and disks heavily. Increasing the number of CPUs helps to speed up processing significantly.
How is the OS going to support this? Madhavan T. Venkataraman contributed two scalability projects to
the Solaris community, and the details are given below.

In Solaris, clock() performs the following bookkeeping and accounting activities on every tick:
   - Lbolt updates
   - Load average computations
   - User thread Tick accounting
   - Callouts
   - Miscellaneous activities

Tick accounting needs to be made scalable (CR 6619224)

Every tick, the following tasks are performed as part of tick accounting:
 - The user thread running on a CPU gets charged with one tick
 - CPU time usage by the user thread is accounted for
 - The time quantum used by the thread is updated
 - Dispatching decisions are made using this
 - LWP interval timers (virtual/profiling timers) are processed

The clock() handler walks the CPU list and performs tick accounting for each CPU.
As the number of CPUs increases, the tick accounting loop takes longer.
Since only one CPU does this work, tick accounting is single-threaded.

In this project, the CPUs are grouped into sets and a cross-trap is sent to one of the CPUs in each set.
Hence, tick accounting is scheduled on multiple CPUs by the clock() handler.
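
The idea can be sketched in a few lines of Python. This is only a model of the partitioning, not the kernel code: the set size, the function names, and the choice of the first CPU in each set as the one that receives the cross-trap are all assumptions made for the example.

```python
def partition_cpus(cpu_ids, set_size):
    """Split the CPU list into consecutive sets of at most set_size CPUs."""
    return [cpu_ids[i:i + set_size] for i in range(0, len(cpu_ids), set_size)]

def schedule_tick_accounting(cpu_ids, set_size):
    """Map each handler CPU (here: the first CPU of its set) to the CPUs it accounts for."""
    return {cpu_set[0]: cpu_set for cpu_set in partition_cpus(cpu_ids, set_size)}

# Eight CPUs in sets of four: two CPUs share the accounting work.
plan = schedule_tick_accounting(list(range(8)), set_size=4)
print(plan)  # {0: [0, 1, 2, 3], 4: [4, 5, 6, 7]}
```

Each handler walks only its own set, so the per-tick work on any one CPU stays bounded by the set size rather than growing with the total CPU count.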

Callout processing is single threaded, throttling applications that rely on scalable callouts (CR 6565503)

Callout() types are
1) Real-time callout()
     Consumers -> Interval-timers/sleep()/nanosleep()/timed-wait
2) Normal callout()
     Consumers -> Error conditions at Networking Layer/drivers

Callouts are currently processed in a single-threaded manner on the clock CPU.
The more CPUs in the system, the more severe the problem becomes.
This causes mutex contention, throttling of applications, and the
clock CPU being consumed wholly by system activity.

This project implements per-CPU callout tables and cyclics. Furthermore, the implementation
is event-based, as opposed to polling for expired callouts on every tick.
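
A minimal Python sketch of the design, assuming a min-heap per CPU, is shown below. The class and method names are hypothetical and the real kernel data structures differ; the point is only that each CPU keeps its own table and asks "when is my next expiry?" instead of scanning on every tick.

```python
import heapq

class CalloutTable:
    """One table per CPU: a min-heap of (expiry, sequence, callback)."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so callbacks are never compared

    def timeout(self, expiry, callback):
        heapq.heappush(self._heap, (expiry, self._seq, callback))
        self._seq += 1

    def next_expiry(self):
        """When the timer should fire next, or None if nothing is pending."""
        return self._heap[0][0] if self._heap else None

    def expire(self, now):
        """Run every callout whose expiry is <= now; return how many ran."""
        ran = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, callback = heapq.heappop(self._heap)
            callback()
            ran += 1
        return ran

tables = {cpu: CalloutTable() for cpu in range(2)}
fired = []
tables[0].timeout(30, lambda: fired.append("a"))
tables[0].timeout(10, lambda: fired.append("b"))
tables[1].timeout(20, lambda: fired.append("c"))

# Each CPU programs its timer for its own earliest expiry instead of polling.
print(tables[0].next_expiry())  # 10
tables[0].expire(10)
print(fired)                    # ['b']
```

Because the tables are independent, callouts on different CPUs do not contend for a single lock, and a CPU with an empty table has no reason to wake up at all.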

Saturday Jul 18, 2009

An Interesting debugging D-Script

In Solaris, segmap/KPM/VPM are the available interfaces to cache file system pages.
The segment driver in the VM layer plays a major role in improving UFS read/write performance.

The KPM address is mapped once at boot time and never gets unmapped or remapped.
This avoids hat_unload() calls and reduces the number of cross-calls.

CR 6836343 is a bug in the segmap_release() code: the KPM address gets unmapped by mistake.
We came up with a D-Trace script to capture the thread/stack that unmaps the KPM address.

# dtrace -w -n 'hat_unload:entry /args[1] >= `kpm_vbase && args[1] <= (`kpm_vbase + `kpm_size)/ { panic(); }'

Friday Jul 17, 2009

VPM is enabled for SPARC platforms

The following CR has been integrated into SNV_113 and S10U8:
6811473 VPM interfaces should provide multiple pagelength mappings per request
With it, we see the following changes:

1. VPM is enabled by default for SPARC platform also
2. Multi page length support improves the sendfilev performance

SPARC:

By default VPM is enabled.

1. To enable segmap/KPM, add the following in /etc/system
   set vpm_enable = 0
2. To enable segmap, add the following in /etc/system
   set kpm_enable = 0

x86:

By default segmap is enabled.

x64:

By default VPM is enabled.

1. To enable segmap/KPM, add the following in /etc/system
   set vpm_enable = 0
2. To enable segmap, add the following in /etc/system
   set kpm_desired = 0

NOTE: Up to S10U7, VPM was supported on the x64 platform, and
      'set kpm_desired = 0' was used to disable VPM.

Friday Feb 27, 2009

Fair Share Scheduler (FSS) does not honor CPU shares to zones

When 1000 CPU shares are assigned to the global zone and 1 share to non-global zones,
a non-global zone process 'cpuHog' gets 60-70% of the CPU, as reported by prstat (BugID - 6708094).
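
To put those numbers in perspective, here is a minimal sketch of the share arithmetic in Python. It assumes a single non-global zone and that both zones have CPU-bound work competing for the same CPUs (shares only matter under contention); the function name is made up for this example.

```python
def expected_cpu_fraction(shares, name):
    """Fraction of CPU a zone is entitled to when every zone is busy."""
    total = sum(shares.values())
    return shares[name] / total

shares = {"global": 1000, "nonglobal": 1}
frac = expected_cpu_fraction(shares, "nonglobal")
print(f"{frac:.4%}")  # 0.0999%
```

An entitlement of about 0.1% against an observed 60-70% shows how far the behavior is from what the shares ask for.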

The current implementation may not scale to this extent. The solution is to tune (reduce) the QUANTUM value.

1. Redirect the output to a file
# dispadmin -c FSS -g -r 1000 > fss.conf

2. Edit/reduce QUANTUM value to 10
# vi fss.conf
 #
 # Fair Share Scheduler Configuration
 #
 RES=1000
 #
 # Time Quantum
 #
 QUANTUM=10

3. Set the new QUANTUM value
# dispadmin -c FSS -s fss.conf

Wednesday Jul 02, 2008

Tips - Vnode Page Mapping for AMD64 machines

VPM has been part of the SNV code and is going to ship with the S10U6 release.
Some important tips follow.

VPM - enable/disable:

VPM is enabled by default for amd64 machines. In order to disable VPM
(and use seg_map), set "kpm_desired = 0" in /etc/system.
("vpm_enable = 0" has no effect - Refer: 6431013)

Performance tips:

1. The default value of maxcontig is 128. Setting a higher value of maxcontig gives
better performance.

2. The default value of ufs_HW is 16MB. This value can also be tuned (increased) to
achieve better performance. However, increasing ufs_HW too far can result in
disk saturation. (The optimum value varies between systems and disk speeds and can
be determined by running performance tests.)

Friday Feb 15, 2008

pgAdmin3 - Failed to start (BadAlloc problem)

Recently I heard from a customer that pgAdmin fails to start with the following message:

     BadAlloc (insufficient resources for operation)

	(Details: serial 172 error_code 11 request_code 131 minor_code 5)
(Note to programmers: normally, X errors are reported asynchronously;
that is, you will receive the error a while after causing it.
To debug your program, run it with the --sync command line
option to change this behavior. You can then get a meaningful
backtrace from your debugger if you break on the gdk_x_error() function.)

I root-caused the problem. Here is the summary.

An Xserver fix was integrated that affects both Xsun and Xorg (sparc/x86). It went into SNV_79a & S10U5-05.
Hence, the bad patches 119059-36 (sparc) & 119060-35 (x86) cause the BadAlloc problem.

This is fixed in SNV_84 & S10U5-09. Another solution is to upgrade to patch level 119059-40 (sparc) & 119060-40 (x86) or later.

Another workaround for X86:

1. To work around the described issues for the Xorg(1) server, the affected X Server extensions may be disabled.

    $ /usr/X11/bin/Xorg -extension MIT-SHM

2. The X Server extensions may also be disabled by editing the xorg.conf(4) file. For example, to disable the MIT-SHM extension, the following lines may be added to the xorg.conf(4) file:

    Section "Extensions"
        Option "MIT-SHM" "disable"
    EndSection

Saturday Dec 22, 2007

pgAdmin3 integrated into Solaris

pgAdmin III is the most popular and feature rich open source administration and development platform for PostgreSQL. This has been integrated into Solaris. This is part of OpenSolaris/Solaris from SXDE (Solaris) b79 and S10U5. Alternatively, the Solaris packages for Sparc and Intel machines can be downloaded using Solaris package download link (including the recent release of pgAdmin III v1.8.0).

Tuesday Nov 27, 2007

Common issues/solutions during Solaris installation on Laptops

This blog will help to resolve issues during Solaris installation on Laptops.

1. Partitioning the hard disk
    - Any software (Belenix/Gparted/etc) can be used
    - If Windows refuses to repartition its drive, the drive might have errors. You might need to run "chkdsk /f" from the command line (or boot into recovery mode and run CHKDSK).
    - The type of Solaris partition must be "unallocated"
    - Go to the command line and mark that partition as a Solaris partition (fdisk -l.... etc)
    - If any other drives need to be shared with the Solaris partition, assign them the partition type "FAT32"

2. Installing SXDE
    - If the RAM size is >= 1GB, choose Solaris-Express from the menu
    - Otherwise choose "Solaris"  and "Text and Console session" in the first and second menu respectively

3. Network detection - issue
    - After the installation, if the network is not detected, run "svcadm enable nwam"

4. Mount other drives using vfstab
    - Add an entry for each of the drives in the "/etc/vfstab" file

Example: 
    #device         device          mount           FS      fsck    mount   mount
    #to mount       to fsck         point           type    pass    at boot options
    /dev/dsk/c1d0p0:1       -       /win_c  pcfs    no      yes     -
    /dev/dsk/c1d0p0:2       -       /win_d  pcfs    no      yes     -

5. Audio driver - issue
    - Download & install this package
    - Download OSS driver

6. References
    - Mount issues

Wednesday Aug 29, 2007

Memory leak problem - An interesting solution

I came across a memory leak problem. It is interesting that Solaris 10 provides a very good mechanism with the help of libumem.so library and mdb debugger to solve this problem. The following steps help to identify memory leaks.

  • $ export LD_PRELOAD=libumem.so
  • $ export UMEM_DEBUG=default
  • $ mdb ./myTest
  • > ::sysbp _exit
  • > ::run
  • > ::findleaks

If you find the following error message

  • "mdb: invalid command '::findleaks': unknown dcmd name"

then the workaround is

  • > ::load libumem.so

Saturday May 26, 2007

OpenGrok setup - source code search and cross reference

This tool helps developers search, cross-reference, and navigate source code. The following steps explain how to set it up.

1. Download OpenGrok package - OSOLopengrok-0.4.pkg (same for Both sparc/x86)
    > Download link
    > pkgadd -d OSOLopengrok-0.4.pkg

2. Download Exuberant Ctags - ctags-5.6.tar.gz
    > Download link
    > Untar/configure/gmake/gmake install

3. Download Apache Tomcat - apache-tomcat-6.0.13.zip
    > Download link 

4. Configuration steps
    > cp  /opt/OSOLopengrok/source.war   apache-tomcat-6.0.13/webapps/
    > export JAVA_HOME=/usr/java
    > mkdir INDEX
    > java -jar /opt/OSOLopengrok/opengrok.jar
       - Generate Indexes by giving the source path
    > vi apache-tomcat-6.0.13/webapps/source/WEB-INF/web.xml
        - Assign DATA_ROOT and SRC_ROOT dir variables
    > ./apache-tomcat-6.0.13/bin/startup.sh
    > http://localhost:8080/source/xref/
        - Use the above URL to navigate the source code

5. For changing the source directory
    > ./apache-tomcat-6.0.13/bin/shutdown.sh
    > rm -rf INDEX/*
    > Repeat step-4

6. Whenever the machine is restarted
    > ./apache-tomcat-6.0.13/bin/startup.sh

Wednesday Apr 11, 2007

pgAdmin packages for Solaris available

pgAdmin III is the most popular and feature rich open source administration and development platform for PostgreSQL. The current release is pgAdmin III v1.6.3 and the Solaris packages for Sparc and Intel machines can be downloaded using Solaris package download link.

Saturday Jan 20, 2007

Exploring other possibilities with D-Trace in PostgreSQL

"User-level D-Trace probes in PostgreSQL" is an article written by Robert Lor. It gives a good overview of how to debug/analyse PostgreSQL code using D-Trace (Community help documents). Here, I would like to add some points that are not covered there.

D-Trace scripts provide useful information, including the process ID, thread-local variables, and PC location, and an easy way to analyze performance issues in terms of elapsed time and counts.

Building --enable-dtrace:

The user-level D-Trace probes have been integrated into PostgreSQL-8.2. They work only on Solaris Express (Solaris Nevada build 55 and above). The work-around on Solaris-10 is to remove static from the following functions: StartTransaction(), CommitTransaction(), and AbortTransaction().

Playing with probes:

View the existing probes and their details (provider, module, function, probe name)

> dtrace -l | grep postgres

Add additional arguments into the probes like start-transaction(int, int) and view them in the d-script using the following.

printf("ARG0:%d, ARG1:%d", arg0, arg1); 

New probes can be declared in src/backend/utils/probes.d and they can be inserted at the required location with necessary arguments.

Tracing PostgreSQL code without D-Trace probes:

When the PostgreSQL server is started, the main process waits for connection requests from clients. As soon as it receives a client's connection request, a new process is forked and it takes over the subsequent requests from that particular client. In addition to the main process and the N child/client processes, there are two active processes that write the logs.

To trace the control flow, including the user-defined functions and system calls, a script can be written. For tracing at the time a connection is made with a client, use the main process ID as processID (and fork_process as functionName). For tracing the query execution module, use the process ID of that particular client's process (and pg_parse_query as functionName). Here, functionName plays an important role because tracing starts only after control reaches that function.

> dtrace -s myScript.d -p <processID>

pid$target::functionName:entry
{ self->trace = 1; }
pid$target::functionName:return
/self->trace/
{ self->trace = 0; }
pid$target:::entry,
pid$target:::return
/self->trace/
{}  

The following script helps to track only user defined functions.

> dtrace -F -s  myScript.d <processID>

pid$1:a.out::entry
{}
pid$1:a.out::return
{}

Monday Jan 08, 2007

Autoconf/Automake - Life becomes easy

"Configure/make/make install" is the usual way to compile and install packages. However, writing the configure script and makefiles by hand is not an easy job. The task becomes much easier once the developer knows the basics of automake/autoconf.

configure.in and Makefile.am are the files the developer creates. Automake and autoconf must be installed. Their main aim is to simplify the developer's task and produce the configure and makefile scripts systematically.

autoconf -> generates the configure file from configure.in/configure.ac
automake -> generates Makefile.in from Makefile.am
./configure -> generates the Makefile from Makefile.in (it also checks for the necessary packages, compilers, etc., and then sets the compilers/flags, install location, and include/library locations)
