Saurabh Mishra's Weblog

20080813 Wednesday August 13, 2008

4-Day Backpack in Glacier National Park, Montana

A 4-day, 3-night backpack in Glacier National Park turned out to be one of the best backpacks I have done in the US. The route was: Appekuny Falls Trailhead -> Poia Lake Campground -> Many Glacier -> Swiftcurrent Pass -> Granite Park Campground -> Logan Pass, done over 4 days of backpacking, with a 10-mile round-trip day hike to Iceberg Lake in the Many Glacier area. The four of us, including an 11-year-old kid, successfully backpacked through one of the most beautiful parts of Glacier National Park. I had to use the park and private shuttles to get back to Many Glacier since we didn't end the backpack where we started. We didn't see a grizzly bear, but I managed to see a black bear. I really liked the meadows, which reminded me of backpacks in the Indian Himalayas.

Statistics: 28.7 miles of backpacking and 10 miles of day hiking.

(2008-08-13 19:06:06.0) Permalink

20080727 Sunday July 27, 2008

Clouds Rest and Half Dome, Yosemite National Park

The Half Dome hike is one of the most famous hikes in the Bay Area, not just because of the cables but also because you get a bird's-eye view of Yosemite Valley and the other domes. We, a group of 11 people, started the hike at 5.45 am from Tenaya Lake. The distance was long and we were a little apprehensive about reaching the cables in time, mainly to avoid the rush. Without thinking twice, we mostly sped towards Half Dome, with a stop at Clouds Rest peak (9,900 feet). Clouds Rest has fantastic views of Yosemite's high country, and walking along a narrow cliff made it more fun. We ended up reaching the cables at 11.00 am and were happy to see not-so-busy traffic on them. I picked up a pair of gloves and within half an hour we were at the top of Half Dome. We rested and had lunch up there, with the usual photography session. I was not so surprised to see quite a few people still making their way to the cables at around 1.30 pm. On the way back, we took the Valley Trail to Yosemite Valley, which took us past Vernal and Nevada Falls. I always had a not-so-good impression of Yosemite National Park, but now I do believe that it's a nice park indeed.

Stats: 21.2 miles from Tenaya Lake to Yosemite Valley via Clouds Rest and Half Dome. Half Dome is around 13 miles from Tenaya Lake. We were also a little lucky to see a brown bear next to the creek.


(2008-07-27 20:27:31.0) Permalink Comments [1]

20080707 Monday July 07, 2008

Backpack in Redwood State Park

Photographs of a 16-mile round-trip backpack in Redwood State Park. We started our hike from the visitor center of Prairie Creek Redwoods State Park and followed the West Ridge Trail, camping at the Ossagon Trail wilderness campground. On the way back we took the Coastal Trail and had a rather scary encounter with a wild elk; luckily it was a female and didn't attack the four of us.

(2008-07-07 20:49:30.0) Permalink

20080624 Tuesday June 24, 2008

Summit climb to Mt. Whitney

On 22nd June 2008, the four of us summited Mt. Whitney, the tallest mountain in the lower 48 states of the United States (US). It is part of the eastern Sierra Nevada mountain range, on the boundary of Sequoia National Park.

The day hike, 22 miles round-trip, started at 1.40 am because rangers had predicted thunderstorms. We somehow got up at 1.00 am but hadn't really slept: the thought that we were supposed to climb Mt. Whitney in a day, with 6,000 feet of altitude gain and loss, kept us awake the whole night. To prepare, we did a practice hike near Mt. Whitney on 21st June, reaching 11,000 feet over 8 miles round-trip. That turned out to be a good warm-up.

Since we got an early start, I was expecting to reach the summit by 8-9 am, but I was struggling. I have yet to figure out why I struggled at high altitude, which delayed the summit by more than two hours. It's not that I have never hiked at higher altitudes, but it seems I'm slowly losing momentum. I finally reached the top of Mt. Whitney at 10.35 am, which was below my expectations and disappointed me a lot. Doing the hike in a day is very, very strenuous and it took everything out of me. On a strenuous hike like this you must have a lot of patience, since the climb can take longer than you expect.

Anyway... time to share photographs; the camera added an extra 4 pounds to the pack.

(2008-06-24 21:38:59.0) Permalink

20080608 Sunday June 08, 2008

Costa Rica Adventure...

The six of us went to Costa Rica, a small country in Central America famous for its rain forests and wildlife. This trip was so spectacular that it compelled me to write a small diary about it. After lots of confusion and asking around, I first settled on the places to visit in Costa Rica, in a specific order: Rincon de la Vieja National Park, Monteverde Cloud Forest Reserve, and Tortuguero National Park. Arenal Volcano National Park was not in the plan, but we got two more days in Costa Rica because heavy rains and a tropical storm stranded us near Manuel Antonio National Park for two nights. Everything in Costa Rica was good, especially the food. In fact, we also got to eat sweet mangoes.

We spent the first two days in Rincon de la Vieja National Park, with one full day reserved for hikes in the national park and the other for the canyon zipline, steam sauna, mud bath, hot springs and white-water river tubing. The most spectacular parts were the tubing and the zipline over the canyon, with a bit of rock climbing and lots of other adventure. The only bad part of the zipline was that it didn't last long (only an hour). The place where we stayed, and which organized all this, was Hacienda Guachipelin.

The next place on our agenda was Monteverde Cloud Forest Reserve, famous for its cloud and rain forest and its wildlife. Unfortunately we didn't get to see much wildlife here other than birds, which were hiding from us most of the time. The museum and hanging bridge at Selvatura Park were great though.

We were a little disappointed with the Monteverde visit since we didn't spot any frogs, and we were hoping that Tortuguero National Park would not disappoint us. We started our journey at 6.30 am from San Jose to a small town called Caño Blanco (a 3-hour bus ride). The bus route goes through Braulio Carrillo National Park, and the boat ride from the dock at Caño Blanco takes just one hour and 45 minutes. Guess what... the journey to Tortuguero National Park did not disappoint us at all. The boat ride and the wildlife we saw (mainly birds) were great. We reached the resort, Laguna Lodge, at around 1.30 pm and were welcomed by the staff. After lunch we went to Tortuguero Village and the museum there. We also went to the beach, which here is on the Caribbean side (Atlantic Ocean). The most interesting part of the whole trip was the boat safari planned for the next morning. In fact, we saw most of the wildlife in this part of Costa Rica. We got up early, had breakfast and started the safari, spotting caimans, howler and spider monkeys, birds, and frogs, including poisonous ones. We also went for a hike which gave us a bird's-eye view of the Tortuguero canals. Unfortunately we couldn't spot turtles because May is not their breeding season. Of the whole trip, we enjoyed our two nights and three days at Laguna Lodge the most.

We had only one day left before we were supposed to return to the US. After a lot of confusion, we rented a car and drove to Manuel Antonio National Park at night. The drive on Costa Rica's single-lane highways reminded me of Indian road conditions and the fun I used to have back home. Anyway, without knowing that tropical storm Alma (Hurricane Class-I) was on the way and beginning to strengthen, we drove through the night while it was raining. When we got up the next morning, we immediately decided that we should head back because the rain and wind had picked up. By 9.30 am we were ready and hoped that the rain would slow down, but it didn't; it kept strengthening. We decided to start anyway, but we had hardly gone 22 km before the police stopped us and asked us to turn back, since the roads were flooding and flood water had started spreading onto dry land and into houses. The devastation caused by the storm looked very bad. We decided to stay at Quepos (a small town before the national park). We knew that we would miss our flight the next day, and locals told us it might be two days before the roads opened again. The next day the weather suddenly cleared up and we could start early in the morning to make alternative arrangements. The return flight we got was on 2nd June, and it was only 30th May. So we had two more days, and we decided to go to Arenal (La Fortuna town) by local public bus from San Jose. Without knowing Spanish, we managed to find the bus station and board the right bus, but the bus stopped everywhere: whenever someone waved a hand at it, it stopped to let them board. Anyway, we got to Arenal after 5 hours of slow and steady public bus ride. Since we needed a car, I rented a manual 4x4 SUV for some off-roading to the Arenal Observatory and Arenal Volcano National Park. We couldn't see the active lava in the evening, missing it by an hour, which disappointed us a lot. Anyway, the trip to Arenal was unplanned but was beautiful in every respect.

And yeah... I forgot to mention that I was stopped for speeding in Costa Rica... I was doing 80 km/h when the posted speed limit was 60 km/h. The cop was very understanding and after repeated pleading (sorry sir), I was lucky to get away with just a warning...

Let's look at the photographs I took...

Fauna of Costa Rica

Flora of Costa Rica



Rain Forests of Costa Rica



People and Landscape of Costa Rica

(2008-06-08 09:51:03.0) Permalink

20080425 Friday April 25, 2008

Photographs of a recent backpack and trip
22-mile backpack to Sykes Hot Springs in Big Sur


A trip to Kings Canyon National Park.
(2008-04-25 19:42:36.0) Permalink Comments [1]

20080419 Saturday April 19, 2008

Install-Time-Update (ITU) and Driver Binding in Solaris

If you have ever wondered how to create install-time driver updates for Solaris 10 and Nevada, you may want to read this blog entry, as it involves a few tricks here and there. There are two ways to make your device work with Solaris. The install-time update (aka ITU, DU or ITU diskette) is only required when the disk drive behind your driver will become the Solaris boot drive. For all other cases, you should be able to generate a package and run the pkgadd(1m) command to install the driver package on a running Solaris system.

ITU Method

In order to install Solaris onto a bootable drive supported by your driver, you can use an Install Time Update (ITU). The ITU must contain your driver (both 32-bit and 64-bit binaries) and the PCI-IDs of the devices your driver supports.

How to construct an ITU

  • Make sure you have Solaris 10 and Nevada binaries of your driver for both the 32-bit and 64-bit operating system, along with the your_driver.conf (driver configuration) file. You can get the pkg_drv(1m) command by installing the SUNWpkgd package from this link.

    In order to create an ITU for Solaris 10 and Nevada, create two directories and run pkg_drv(1m) in each.

For Solaris 10

# mkdir -p /var/tmp/your_driver.5.10
# cd /var/tmp/your_driver.5.10

Copy your driver and the your_driver.conf file into the current directory.

# mkdir -p kernel/drv/amd64
# cp <32-bit binary of your driver> .
# cp <32-bit binary of your driver> kernel/drv/your_driver
# cp <64-bit binary of your driver> kernel/drv/amd64
# cp your_driver.conf .
# pkg_drv -i '"pciVVVV,DDDD.SSSS.ssss"' -o `pwd`/PKG -c scsi -r 5.10 your_driver

VVVV = Vendor-id
DDDD = Device-id
SSSS = Subsystem-vendor-id
ssss = Subsystem-device-id
PKG = your_driver.
'-c scsi' sets the device class; in this example we are dealing with a disk drive.

The output of pkg_drv(1m) will resemble the following:

input file: drv=your_driver
input file: conf=your_driver.conf
WARNING: pkg_drv: pkg/driver name exists in /etc/driver_aliases
Suggested Package Naming Conventions: 8 characters, with the first capitalized characters uniquely specifying the company (e.g. stock market ticker). The remaining characters specify the driver (e.g. SUNWcadd for a CAD driver from Sun Microsystems). The driver name must be unique across all Solaris platforms and releases.

## Building pkgmap from package prototype file.
## Processing pkginfo file.
## Attempting to volumize 8 entries in pkgmap.
part 1 -- 276 blocks, 29 entries
## Packaging one part.
/tmp/12546/PKG/pkgmap
/tmp/12546/PKG/pkginfo
/tmp/12546/PKG/reloc/boot/solaris/devicedb/master
/tmp/12546/PKG/install/copyright
/tmp/12546/PKG/install/depend
/tmp/12546/PKG/install/i.master
/tmp/12546/PKG/reloc/kernel/drv/your_driver
/tmp/12546/PKG/reloc/kernel/drv/your_driver.conf
/tmp/12546/PKG/install/postinstall
/tmp/12546/PKG/install/postremove
/tmp/12546/PKG/install/r.master
## Validating control scripts.
## Packaging complete.
output pkg: See package directory PKG in /tmp/12546
pkg_drv: 2 warnings 0 errors


bash-3.2# find /tmp/12546
/tmp/12546
/tmp/12546/PKG
/tmp/12546/PKG/pkgmap
/tmp/12546/PKG/pkginfo
/tmp/12546/PKG/reloc
/tmp/12546/PKG/reloc/boot
/tmp/12546/PKG/reloc/boot/solaris
/tmp/12546/PKG/reloc/boot/solaris/devicedb
/tmp/12546/PKG/reloc/boot/solaris/devicedb/master
/tmp/12546/PKG/reloc/kernel
/tmp/12546/PKG/reloc/kernel/drv
/tmp/12546/PKG/reloc/kernel/drv/your_driver
/tmp/12546/PKG/reloc/kernel/drv/your_driver.conf
/tmp/12546/PKG/install
/tmp/12546/PKG/install/copyright
/tmp/12546/PKG/install/depend
/tmp/12546/PKG/install/i.master
/tmp/12546/PKG/install/postinstall
/tmp/12546/PKG/install/postremove
/tmp/12546/PKG/install/r.master

Copy the following files from '/tmp/12546' (pkginfo is included because the prototype file below lists it):

# cd /var/tmp/your_driver.5.10
# cp /tmp/12546/PKG/pkgmap .
# cp /tmp/12546/PKG/pkginfo .
# cp /tmp/12546/PKG/install/postinstall .
# cp /tmp/12546/PKG/install/postremove .
# cp /tmp/12546/PKG/install/copyright .

You can run the 'pkgproto' command or write a prototype file manually:

bash-3.2# cat > prototype
i copyright
i postremove
i postinstall
i pkginfo
d none kernel 0755 root sys
d none kernel/drv 0755 root sys
d none kernel/drv/amd64 0755 root sys
f none kernel/drv/amd64/your_driver 0644 root sys
f none kernel/drv/your_driver 0644 root sys
f none kernel/drv/your_driver.conf 0644 root sys

Make sure you include both the 32-bit and 64-bit binaries of your driver. Once this is done, we build the package again so that it includes the 64-bit binary of the driver.

# cd /var/tmp/your_driver.5.10
# pkgmk -r . -d /tmp

This will create the 'PKG' directory under /tmp, and that's where the package is. For example:

bash-3.2# pkgmk -r . -d /tmp
## Building pkgmap from package prototype file.
## Processing pkginfo file.
## Attempting to volumize 6 entries in pkgmap.
part 1 -- 444 blocks, 23 entries
## Packaging one part.
/tmp/PKG/pkgmap
/tmp/PKG/pkginfo
/tmp/PKG/install/copyright
/tmp/PKG/reloc/kernel/drv/amd64/your_driver
/tmp/PKG/reloc/kernel/drv/your_driver
/tmp/PKG/reloc/kernel/drv/your_driver.conf
/tmp/PKG/install/postinstall
/tmp/PKG/install/postremove
## Validating control scripts.
## Packaging complete.
bash-3.2#

Do the following to repack the package into the DU (Diskette) layout:

# cd /tmp
# find PKG -print | cpio -o > /tmp/pkg_of_your_driver
# compress /tmp/pkg_of_your_driver
# cd /var/tmp/your_driver.5.10/PKG
# cp /tmp/pkg_of_your_driver.Z PKG/DU/sol_210/i86pc/Product/your_driver.Z

For Solaris Nevada

Repeat the same steps as for Solaris 10, except for the following:

  • Create a new directory '/var/tmp/your_driver.5.11' since you are working on Solaris Nevada. Make sure the pkg_drv(1m) command is run with '-r 5.11'.

  • When copying your_driver.Z into the DU, make sure you change 'sol_210' to 'sol_211' in the path 'PKG/DU/sol_210/i86pc/Product/your_driver.Z'.

Once you have created the ITUs for Solaris 10 and Nevada, bundle them onto one DVD/CD (or into one ISO file). In the directories '/var/tmp/your_driver.5.11' and '/var/tmp/your_driver.5.10', you will find a directory called 'PKG'. You must copy the files under both 'PKG' directories into one directory in order to bundle them together.

# mkdir -p /var/tmp/YOUR_DRIVER-DU
# cd /var/tmp/YOUR_DRIVER-DU
# cp -rf /var/tmp/your_driver.5.11/PKG/* .
# cp -rf /var/tmp/your_driver.5.10/PKG/* .


Run the following command to make an ISO file from the directory /var/tmp/YOUR_DRIVER-DU:

# mkisofs -o your_driver.iso -r /var/tmp/YOUR_DRIVER-DU

This will create an ISO file 'your_driver.iso'. A DVD/CD can then be burned by running the following command:

# cdrw -i /var/tmp/YOUR_DRIVER-DU/your_driver.iso

In order to install Solaris on the boot drive, boot the Solaris installer DVD and choose option '5' (Apply Driver Updates). Follow the instructions when prompted.

The other way is to bundle the device driver into the Solaris bootable media itself, or into a network installation image. Follow the instructions described at this link, which explain how to pack and unpack the Solaris miniroot in order to make changes to the Solaris bootable media.

Driver Binding in Solaris

Driver binding in Solaris is not so easy to understand. Solaris binds a driver based on precedence. The precedence list is maintained in the 'compatible' property of the device. The two functions responsible for creating the 'compatible' property and for finding the correct driver binding are add_compatible() and ddi_compatible_driver_major(), respectively.

add_compatible() creates the 'compatible' property used for driver binding, in the order described below. For a PCI card, the precedence is as follows:

 *   pciVVVV,DDDD.SSSS.ssss.RR   (0)
 *   pciVVVV,DDDD.SSSS.ssss      (1)
 *   pciSSSS,ssss                (2)
 *   pciVVVV,DDDD.RR             (3)
 *   pciVVVV,DDDD                (4)
 *   pciclass,CCSSPP             (5)
 *   pciclass,CCSS               (6)

For a PCI Express card, the precedence looks like this:

 *   pciexVVVV,DDDD.SSSS.ssss.RR   (0)
 *   pciexVVVV,DDDD.SSSS.ssss      (1)
 *   pciexVVVV,DDDD.RR             (2)
 *   pciexVVVV,DDDD                (3)
 *   pciexclass,CCSSPP             (4)
 *   pciexclass,CCSS               (5)
 *   pciVVVV,DDDD.SSSS.ssss.RR     (6)
 *   pciVVVV,DDDD.SSSS.ssss        (7)
 *   pciSSSS,ssss                  (8)
 *   pciVVVV,DDDD.RR               (9)
 *   pciVVVV,DDDD                  (10)
 *   pciclass,CCSSPP               (11)
 *   pciclass,CCSS                 (12)

RR = Revision number
CC = Class code (in CCSSPP, SS is the sub-class and PP the programming interface)
(0) = the highest precedence
(12) = the lowest precedence
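The PCI precedence list above can be sketched as a small user-space C function. This is only an illustration, not the kernel's add_compatible(): the struct pci_ids container and the build_pci_compatible() name are made up for this example, and the unpadded hexadecimal formatting is an assumption based on the devi_compat_names value "pci1002,4379.1025.10a.80" shown later in this blog.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NCOMPAT    7    /* the seven PCI forms listed above */
#define COMPAT_LEN 64   /* generous buffer for one name */

/* Hypothetical container for the IDs read from PCI config space. */
struct pci_ids {
	unsigned vendor;     /* VVVV */
	unsigned device;     /* DDDD */
	unsigned subvendor;  /* SSSS */
	unsigned subsystem;  /* ssss */
	unsigned revision;   /* RR */
	unsigned classcode;  /* CCSSPP: class, sub-class, programming interface */
};

/* Fill out[] in precedence order: out[0] is the most specific name. */
void
build_pci_compatible(const struct pci_ids *id, char out[NCOMPAT][COMPAT_LEN])
{
	assert(id != NULL && out != NULL);

	snprintf(out[0], COMPAT_LEN, "pci%x,%x.%x.%x.%x", id->vendor,
	    id->device, id->subvendor, id->subsystem, id->revision);
	snprintf(out[1], COMPAT_LEN, "pci%x,%x.%x.%x", id->vendor,
	    id->device, id->subvendor, id->subsystem);
	snprintf(out[2], COMPAT_LEN, "pci%x,%x", id->subvendor, id->subsystem);
	snprintf(out[3], COMPAT_LEN, "pci%x,%x.%x", id->vendor, id->device,
	    id->revision);
	snprintf(out[4], COMPAT_LEN, "pci%x,%x", id->vendor, id->device);
	/* Class forms: zero padding here is an assumption of this sketch. */
	snprintf(out[5], COMPAT_LEN, "pciclass,%06x", id->classcode);
	snprintf(out[6], COMPAT_LEN, "pciclass,%04x", id->classcode >> 8);
}
```

Calling build_pci_compatible() with vendor 0x1002, device 0x4379, subsystem vendor 0x1025, subsystem id 0x10a and revision 0x80 yields "pci1002,4379.1025.10a.80" as the first entry. A PCI Express variant would, under the same assumptions, put the "pciex" forms in front of these seven.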

You can see the 'compatible' property by running the 'prtconf -vp' command. If Solaris fails to find a binding using the 'compatible' property, it then tries the 'nodename', which is constructed from the Subsystem-vendor-id (SSSS) and Subsystem-device-id (ssss) of the device. The PCI-IDs we have been looking at here come from the PCI config space of the device.

Device drivers and device firmware must make sure that proper PCI-IDs are chosen to avoid conflicts with existing PCI-IDs. If your device is a PCI Express card, then you must add PCI-IDs of the form 'pciexVVVV,DDDD.SSSS' to /etc/driver_aliases, either directly or via the add_drv(1m) or pkg_drv(1m) command.

(2008-04-19 13:24:57.0) Permalink

20080319 Wednesday March 19, 2008

Desert Wildflowers
These photographs were taken at Joshua Tree National Park and Anza Borrego State Park in Southern California.

(2008-03-19 20:32:26.0) Permalink

20080306 Thursday March 06, 2008

Solaris APIC implementation with respect to MSI/MSI-x interrupts
Here's some basic information on the APIC before we dive into the Solaris details; if you want more detail on the APIC, you can refer to this Wiki. The Solaris details are based on Solaris Nevada Build 84.

What's Local APIC 

The Local APIC (LAPIC) is part of the CPU chip. It (a) contains the mechanism for generating and accepting interrupts, (b) contains a timer, (c) manages all external interrupts for the processor and (d) accepts and generates inter-processor interrupts (IPIs).

What's IOAPIC

This is a separate chip that is wired to the local APIC so that it can forward interrupts to the appropriate CPU (and to local APIC). 

What's Local APIC Table 

Interrupt vectors are numbered 0x00 through 0xFF in the APIC, and 0x00...0x1F are reserved for exceptions. The vectors in the range 0x20...0xFF are available for programming interrupts in the APIC. Like the local APIC, the IOAPIC assigns a priority to an interrupt based on its vector number: it uses the top 4 bits of the vector number as the priority and ignores the lower 4 bits. For example, if the vector number is 0x3F then the priority is 0x3. In Solaris, the priority mask is represented by APIC_IPL_MASK (0xF0) and the vector mask by APIC_VECTOR_MASK (0x0F).

Since the vector range 0x00...0x1F can't be used, Solaris defines APIC_BASE_VECT (0x20) as the base vector and APIC_MAX_VECTOR (0xFF) as the highest vector in the local APIC. APIC_AVAIL_VECTOR is calculated with this formula:

APIC_MAX_VECTOR + 1 - APIC_BASE_VECT, which translates to (0xFF + 1 - 0x20), i.e. 224 vectors in decimal.

Note that vectors are grouped into 16 priority groups and each group has 0x10 vectors. The 16 vectors in a group share the same priority.
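The arithmetic above can be checked with a few lines of user-space C. The macro names follow the ones quoted in this blog; the values of APIC_VECTOR_PER_IPL (0x10) and APIC_IPL_SHIFT (4) are assumptions inferred from the text, and vector_priority() and vector_slot() are helper names invented for this sketch.

```c
#include <assert.h>

#define APIC_BASE_VECT      0x20
#define APIC_MAX_VECTOR     0xFF
#define APIC_AVAIL_VECTOR   (APIC_MAX_VECTOR + 1 - APIC_BASE_VECT)
#define APIC_VECTOR_PER_IPL 0x10   /* assumed: 16 vectors per priority group */
#define APIC_IPL_MASK       0xF0
#define APIC_VECTOR_MASK    0x0F
#define APIC_IPL_SHIFT      4      /* assumed: priority lives in the top nibble */

/* Priority class of a vector: the top 4 bits. */
unsigned
vector_priority(unsigned vector)
{
	assert(vector <= APIC_MAX_VECTOR);
	return ((vector & APIC_IPL_MASK) >> APIC_IPL_SHIFT);
}

/* Position of a vector inside its priority group: the low 4 bits. */
unsigned
vector_slot(unsigned vector)
{
	assert(vector <= APIC_MAX_VECTOR);
	return (vector & APIC_VECTOR_MASK);
}
```

For vector 0x3F this gives priority 0x3 and slot 0xF. APIC_AVAIL_VECTOR evaluates to 224, so 224 / 0x10 = 14 of the 16 priority groups are usable for device interrupts; the two groups covering 0x00...0x1F are the reserved ones.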

APIC Data Structures in Solaris

Here is the big picture on how the various APIC data structures are related to each other. These data structures are described below :-




apic_irq_table[] - Holds all IRQ entries. Each entry is of type apic_irq_t and the total size of the table is APIC_MAX_VECTOR + 1. Note that an IRQ has no meaning with respect to MSI/MSI-x.

A typical apic_irq_t entry in apic_irq_table[] looks like this:

> ::interrupts
IRQ  Vect IPL Bus    Trg Type   CPU Share APIC/INT# ISR(s)
22   0x61 6   PCI    Lvl Fixed  1   2     0x0/0x16  bge_intr, ata_intr

> apic_irq_table+(0t22*8)/J
apic_irq_table+0xb0:            fffffffec10d7f38

> fffffffec10d7f38::print apic_irq_t
{
    airq_mps_intr_index = 0xfffd
    airq_intin_no = 0x16                 // set since it's FIXED type interrupt.
    airq_ioapicindex = 0
    airq_dip = 0xfffffffec01fd9c0    // dev info
    airq_major = 0xca
    airq_rdt_entry = 0xa061
    airq_cpu = 0x1
    airq_temp_cpu = 0x1
    airq_vector = 0x61    // note that it matches with ::interrupts output
    airq_share = 0x2       // two interrupts are sharing the same IRQ and vector
    airq_share_id = 0
    airq_ipl = 0x6         // IPL
    airq_iflag = {
        intr_po = 0x3
        intr_el = 0x3
        bustype = 0xd
    }
    airq_origirq = 0xa
    airq_busy = 0
    airq_next = 0
}
> 0xfffffffec01fd9c0::print 'struct dev_info' ! grep name
    devi_binding_name = 0xfffffffec01fcf88 "pci-ide"
    devi_node_name = 0xfffffffec01fcf88 "pci-ide"
    devi_compat_names = 0xfffffffec0206940 "pci1002,4379.1025.10a.80"
    devi_rebinding_name = 0
>

apic_ipltopri[] - This array maps a Solaris IPL to an APIC priority. For example:

> apic_ipltopri::print
[ 0x10, 0x20, 0x20, 0x20, 0x30, 0x50, 0x70, 0x80, 0x80, 0x80, 0x90, 0xa0, 0xb0, 0xc0, 0xd0,
0xf0, 0 ]
>

Note the order of priority assignment: higher vector numbers are assigned to higher IPLs. Also note that 0x20 is given to indexes 1, 2 and 3, which means that IPLs 1, 2 and 3 share the same vector range 0x20...0x2F.

apic_ipltopri[] is declared as:

uchar_t apic_ipltopri[MAXIPL + 1];      /* unix ipl to apic pri */

apic_vectortoipl[] - This array is a bit complex. The main purpose of this array is to initialize apic_ipltopri[] array.

apic_init()
{
        [.]
        apic_ipltopri[0] = APIC_VECTOR_PER_IPL; /* leave 0 for idle */
        for (i = 0; i < (APIC_AVAIL_VECTOR / APIC_VECTOR_PER_IPL); i++) {
                if ((i < ((APIC_AVAIL_VECTOR / APIC_VECTOR_PER_IPL) - 1)) &&
                    (apic_vectortoipl[i + 1] == apic_vectortoipl[i]))
                        /* get to highest vector at the same ipl */
                        continue;
                for (; j <= apic_vectortoipl[i]; j++) {
                        apic_ipltopri[j] = (i << APIC_IPL_SHIFT) +
                            APIC_BASE_VECT;
                }
        }

        [.]

}

uchar_t apic_vectortoipl[APIC_AVAIL_VECTOR / APIC_VECTOR_PER_IPL] = {
        3, 4, 5, 5, 6, 6, 9, 10, 11, 12, 13, 14, 15, 15
};

Note that IPL 5 owns two vector groups, 0x40...0x5F (or 0x20...0x3F once the base vector is subtracted for optimization), which is why vector indexes 2 and 3 both hold IPL 5. Similarly, vector indexes 4 and 5 hold IPL 6 (0x60...0x7F, or 0x40...0x5F after the subtraction).

 *      IPL             Vector range.           as passed to intr_enter
 *      0               none.
 *      1,2,3           0x20-0x2f               0x0-0xf
 *      4               0x30-0x3f               0x10-0x1f
 *      5               0x40-0x5f               0x20-0x3f
 *      6               0x60-0x7f               0x40-0x5f
 *      7,8,9           0x80-0x8f               0x60-0x6f
 *      10              0x90-0x9f               0x70-0x7f
 *      11              0xa0-0xaf               0x80-0x8f
 *      ...             ...
 *      15              0xe0-0xef               0xc0-0xcf
 *      15              0xf0-0xff               0xd0-0xdf
 */
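To see how the loop in apic_init() turns apic_vectortoipl[] into apic_ipltopri[], the excerpt can be replayed in user space. This is a sketch under a few assumptions: MAXIPL is taken to be 16 (the mdb output above shows 17 entries), j starts at 1, APIC_IPL_SHIFT is 4, and build_ipltopri() is a name invented here. The final slot is simply left at zero in this sketch.

```c
#include <assert.h>

#define MAXIPL              16     /* assumed from the 17 entries shown by mdb */
#define APIC_BASE_VECT      0x20
#define APIC_MAX_VECTOR     0xFF
#define APIC_AVAIL_VECTOR   (APIC_MAX_VECTOR + 1 - APIC_BASE_VECT)
#define APIC_VECTOR_PER_IPL 0x10
#define APIC_IPL_SHIFT      4      /* assumed */

/* One entry per group of 16 vectors, lowest group (0x20...0x2F) first. */
const unsigned char apic_vectortoipl[APIC_AVAIL_VECTOR / APIC_VECTOR_PER_IPL] = {
	3, 4, 5, 5, 6, 6, 9, 10, 11, 12, 13, 14, 15, 15
};

unsigned char apic_ipltopri[MAXIPL + 1];   /* unix ipl to apic pri */

void
build_ipltopri(void)
{
	int i, j = 1;
	int ngroups = APIC_AVAIL_VECTOR / APIC_VECTOR_PER_IPL;

	apic_ipltopri[0] = APIC_VECTOR_PER_IPL;   /* leave 0 for idle */
	for (i = 0; i < ngroups; i++) {
		/* skip ahead to the highest vector group at the same IPL */
		if (i < ngroups - 1 &&
		    apic_vectortoipl[i + 1] == apic_vectortoipl[i])
			continue;
		/* every IPL up to this group's IPL gets this group's base */
		for (; j <= apic_vectortoipl[i]; j++)
			apic_ipltopri[j] = (unsigned char)
			    ((i << APIC_IPL_SHIFT) + APIC_BASE_VECT);
	}
}
```

After build_ipltopri() runs, apic_ipltopri[] holds 0x10, 0x20, 0x20, 0x20, 0x30, 0x50, 0x70, 0x80, 0x80, 0x80, 0x90, 0xa0, 0xb0, 0xc0, 0xd0, 0xf0, 0, which is the same sequence printed by 'apic_ipltopri::print' earlier. IPLs 5 and 6 land on 0x50 and 0x70 because the loop picks the higher of their two vector groups.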

apic_vector_to_irq[] - This array gives the IRQ number for a vector number. If an element contains APIC_RESV_IRQ (0xFE), the vector is free and can be allocated. The apic_navail_vector() function checks this array to figure out how many vectors are available.

Here is an example of how IPL is mapped to vector priority in Solaris:
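The free-vector check can be pictured with a short user-space sketch. It is not the kernel's apic_navail_vector(); count_free_vectors() and its parameters are invented for this illustration, and only the APIC_RESV_IRQ (0xFE) marker comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

#define APIC_RESV_IRQ 0xFE   /* marks a vector that is not yet allocated */

/*
 * Count the free vectors in the inclusive range [lowest, highest].
 * vector_to_irq plays the role of apic_vector_to_irq[] and is expected
 * to have 256 entries, one per vector.
 */
int
count_free_vectors(const unsigned char *vector_to_irq, int lowest, int highest)
{
	int v, navail = 0;

	assert(vector_to_irq != NULL);
	assert(lowest >= 0 && highest <= 0xFF && lowest <= highest);

	for (v = lowest; v <= highest; v++) {
		if (vector_to_irq[v] == APIC_RESV_IRQ)
			navail++;
	}
	return (navail);
}
```

With a table where only vector 0x61 is taken (by IRQ 22, as in the ::interrupts output), scanning the IPL 6 range 0x60...0x7F reports 31 free vectors out of 32.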

Let's say we got a network interrupt at IPL 6 (ath - wifi interrupt) with vector number 0x60 (see the ::interrupts output above). Solaris will now block all interrupts at and below IPL 6, which is done by the apic_intr_enter() function. The caller of this function actually subtracts 0x20 (APIC_BASE_VECT) from the vector number before passing it in. This is done for optimization, but let's come to the point: the apic_ipls[] array is used to get the IPL, which is then used to program the APIC register. So we first get nipl as

         nipl = apic_ipls[vector];      // vector is 0x40 not 0x60 as mentioned above and nipl will be 0x6
        *vectorp = irq = apic_vector_to_irq[vector + APIC_BASE_VECT];      // This is done to get actual vector and irq.

and then this statement blocks all interrupts at and below that vector priority (or IPL):

        apicadr[APIC_TASK_REG] = apic_ipltopri[nipl];

So we write 0x70 to the APIC task priority register to block interrupts. Note that Solaris uses the range 0x60...0x7F for IPL 6:

*      IPL          Vector range.           as passed to apic_intr_enter()
*      6               0x60-0x7f               0x40-0x5f

and it does not matter whether you write 0x70 or 0x7F, as both do the same work, which is to block interrupts at IPL 6 and below.
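The lookup described above can be traced with a small user-space sketch. Note the assumptions: the real apic_ipls[] array is filled in by the kernel elsewhere, so this sketch approximates it from the vector-range table; the ipltopri[] values are copied from the mdb output earlier in this entry; and task_priority_for() is a name made up for the example.

```c
#include <assert.h>

#define APIC_BASE_VECT 0x20
#define APIC_MAX_VECTOR 0xFF
#define APIC_IPL_SHIFT 4   /* assumed: 16 vectors per priority group */

/* Vector group -> IPL, as in apic_vectortoipl[]. */
const unsigned char vectortoipl[14] = {
	3, 4, 5, 5, 6, 6, 9, 10, 11, 12, 13, 14, 15, 15
};

/* IPL -> APIC priority, copied from the 'apic_ipltopri::print' output. */
const unsigned char ipltopri[17] = {
	0x10, 0x20, 0x20, 0x20, 0x30, 0x50, 0x70, 0x80, 0x80,
	0x80, 0x90, 0xa0, 0xb0, 0xc0, 0xd0, 0xf0, 0
};

/*
 * Given the hardware vector, return the value that would be written to
 * the task priority register to block that IPL and everything below it.
 */
unsigned char
task_priority_for(unsigned hw_vector)
{
	unsigned vector, nipl;

	assert(hw_vector >= APIC_BASE_VECT && hw_vector <= APIC_MAX_VECTOR);

	vector = hw_vector - APIC_BASE_VECT;            /* done by the caller */
	nipl = vectortoipl[vector >> APIC_IPL_SHIFT];   /* stands in for apic_ipls[vector] */
	return (ipltopri[nipl]);
}
```

For hardware vector 0x60 the function works with 0x40, finds IPL 6 and returns 0x70, matching the walk-through above. Vector 0x61 from the ::interrupts output gives the same result because it sits in the same priority group.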

Solaris x86 Interrupt Handling 

Now that we have glanced through the data structures involved, let's look at how Solaris x86 handles interrupts. I prefer to describe interrupt handling before describing how interrupts are allocated, because I find interrupt handling easier to understand.

Let's first go through how Solaris x86 is designed in terms of psm ops. For example, PCI Express has its own psm_ops, which is apic_ops, and PCI has its own psm_ops, which is uppc_ops. In fact, xVM (the Xen-based hypervisor) has its own psm_ops called xen_psm_ops. psm_install() is responsible for installing a psm in the Solaris x86 world.

apic_probe_common() is what gets called when psm_install() jumps into psm_probe() for each psm_ops. apic_probe_common() does many things, one of which is mapping 'apicadr[]' (you have seen this before; I referred to it when setting the APIC priority, i.e. the task register). The apic_cpus[] array also gets initialized via ACPI, i.e. acpi_probe(), because the ACPI tables have all the information such as the local APIC CPU id, version etc.

Now let's see what happens when the local APIC generates an interrupt. The interrupt could come from the IOAPIC or be an MSI/MSI-x generated interrupt (an in-band message). Solaris calls cmnint() or _interrupt(). These are the same and call do_interrupt() once regs is set up. do_interrupt() first sets the PIL so that the CPU does not get any interrupt at or below that PIL. Raising the priority of the CPU is done through the setlvl function pointer. This pointer gets set to the appropriate psm_ops's psm_intr_enter, which in our case is apic_intr_enter(). Next comes dispatching the interrupt, which is done by calling switch_sp_and_call() once the stack of the interrupt thread is set up. Recall that Solaris handles interrupts in thread context if the PIL is at or below LOCK_LEVEL (0xa). High-level interrupts (above LOCK_LEVEL, up to 0xf) are handled on the current thread's stack.

switch_sp_and_call() can dispatch three types of interrupts: (a) software interrupts, (b) high-level interrupts and (c) normal device interrupts.

In our example we have been looking at a wifi interrupt, which is case (c) and maps to the dispatch_hardint() routine. dispatch_hardint() calls av_dispatch_autovect() after enabling interrupts. Now that we have reached av_dispatch_autovect(), I must explain the autovect[] array. If you remember add_avintr(), which is responsible for registering a hardware interrupt handler, you can skip this part. autovect[] has MAX_VECT (256) elements and each element is of type 'struct av_head'. The first pointer in 'struct av_head' points to a 'struct autovec', which holds all the information about an interrupt handler: the handler itself, the arguments passed to it, its priority level etc. Note that more than one interrupt handler can share the same vector; they are linked through 'av_link' in 'struct autovec'. For example:

> ::interrupts
IRQ  Vect IPL Bus    Trg Type   CPU Share APIC/INT# ISR(s)
22   0x61 6   PCI    Lvl Fixed  1   2     0x0/0x16  bge_intr, ata_intr

> ::sizeof 'struct av_head'
sizeof (struct av_head) = 0x10

> autovect+(0x10*0t22)=J                 // Take the IRQ and index into autovect[] array.
                fffffffffbc52ba0

> fffffffffbc52ba0::print 'struct av_head'
{
    avh_link = 0xfffffffec50d2cc0
    avh_hi_pri = 0x6        // take a look at bge_intr() and its priority below
    avh_lo_pri = 0x5        // take a look at ata_intr() and its priority below
}

> 0xfffffffec50d2cc0::print 'struct autovec'
{
    av_link = 0xfffffffec10d2f40
    av_vector = bge_intr
    av_intarg1 = 0xfffffffec50d5000
    av_intarg2 = 0
    av_ticksp = 0xfffffffec506ae20
    av_prilevel = 0x6
    av_intr_id = 0xfffffffec537a078
    av_dip = 0xfffffffec01f8400
}

> 0xfffffffec10d2f40::print 'struct autovec'
{
    av_link = 0
    av_vector = ata_intr
    av_intarg1 = 0xfffffffec00bc8c0
    av_intarg2 = 0
    av_ticksp = 0xfffffffec0528898
    av_prilevel = 0x5
    av_intr_id = 0xfffffffec10cbe78
    av_dip = 0xfffffffec01fd9c0
}
>
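The chain shown in the mdb output above (bge_intr followed by ata_intr on one autovect[] entry) can be modelled in user space. This is a simplified sketch, not the kernel's av_dispatch_autovect(): the structures keep only a few of the fields printed above, and dispatch_chain(), demo_claimed() and demo_unclaimed() are hypothetical names used to show how handlers sharing a vector are walked through av_link.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned (*intr_handler_t)(void *arg1, void *arg2);

/* Trimmed-down versions of the structures printed by mdb above. */
struct autovec {
	struct autovec *av_link;      /* next handler sharing this entry */
	intr_handler_t  av_vector;    /* the interrupt handler itself */
	void           *av_intarg1;
	void           *av_intarg2;
	unsigned        av_prilevel;
};

struct av_head {
	struct autovec *avh_link;     /* first handler in the chain */
	unsigned short  avh_hi_pri;
	unsigned short  avh_lo_pri;
};

/* Stand-in handlers: each bumps a call counter passed as arg1. */
unsigned
demo_claimed(void *arg1, void *arg2)
{
	(void) arg2;
	(*(int *)arg1)++;
	return (1);   /* "this interrupt was mine" */
}

unsigned
demo_unclaimed(void *arg1, void *arg2)
{
	(void) arg2;
	(*(int *)arg1)++;
	return (0);   /* "not my device" */
}

/* Call every handler on the chain; return how many claimed the interrupt. */
int
dispatch_chain(const struct av_head *head)
{
	const struct autovec *av;
	int claimed = 0;

	assert(head != NULL);
	for (av = head->avh_link; av != NULL; av = av->av_link) {
		if (av->av_vector == NULL)
			continue;
		if (av->av_vector(av->av_intarg1, av->av_intarg2) != 0)
			claimed++;
	}
	return (claimed);
}
```

Building a two-element chain that mirrors the output above (a priority 6 handler linked to a priority 5 handler) and passing its head to dispatch_chain() calls both handlers in order, which is why every driver sharing a level-triggered line has to check whether its own device raised the interrupt.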

Here's the dispatch path we have been discussing, observed with DTrace:

bash-3.00# dtrace -n av_dispatch_autovect:entry'/`autovect[args[0]].avh_link->av_vector/{@[args[0]]=count(); printf("%a, %x", `autovect[args[0]].avh_link->av_vector, args[0])}'

  1   2391       av_dispatch_autovect:entry ath`ath_intr, 13
  1   2391       av_dispatch_autovect:entry ath`ath_intr, 13
  1   2391       av_dispatch_autovect:entry ath`ath_intr, 13
  1   2391       av_dispatch_autovect:entry ath`ath_intr, 13
 

There is a very interesting blog by Anish at this link on APIC and Solaris x86 interrupt handling.
 

How does the Solaris APIC implementation allocate interrupts

Now that we have looked at how the APIC is structured in Solaris x86 and how interrupts are handled, let's look at how interrupts are allocated. There are three types of interrupts, DDI_INTR_TYPE_FIXED, DDI_INTR_TYPE_MSI and DDI_INTR_TYPE_MSIX, listed in the order in which they evolved. The Solaris DDI routine ddi_intr_get_supported_types() can be called to retrieve the types of interrupt supported by the bus.

In the case of MSI, apic_alloc_msi_vectors() gets called, and in the case of MSI-x, apic_alloc_msix_vectors() gets called, to allocate the appropriate number of interrupt vectors. Note that MSI supports 32 vectors per device function and MSI-x supports 2048 vectors per device function; however, on Solaris x86 we only support 2 MSI-x interrupt vectors per device (which is why I was studying APIC and MSI-x). On SPARC, Solaris supports far more MSI-x interrupts, configured through the #msix-request property in DDI. The hard limit is determined by the i_ddi_get_msix_alloc_limit() function; however, even on SPARC it seems we limit it to 8.

msix_alloc_limit = MAX(DDI_MAX_MSIX_ALLOC, ddi_msix_alloc_limit);

/* Default number of MSI-X resources to allocate */
#define DDI_DEFAULT_MSIX_ALLOC  2

/* Maximum number of MSI-X resources to allocate */
#define DDI_MAX_MSIX_ALLOC      8

These limits will change when the Interrupt Resource Management (IRM) framework is integrated in Solaris.

Anyway, let's get back to the topic. Depending upon the interrupt type and the bus intr ops, Solaris will jump to the appropriate interrupt ops. In our case, we will get into pci_common_intr_ops() from ddi_intr_alloc(9F) to allocate the interrupts with cmd DDI_INTROP_ALLOC. We will not get into FIXED type interrupts as they are hard wired via IOAPIC and fairly easy (I suppose).  It's the psm_intr_ops which gets into action with cmd PSM_INTR_OP_ALLOC_VECTORS and we land up in apic_intr_ops().

apic_intr_ops
{
        [.]
        case PSM_INTR_OP_ALLOC_VECTORS:
                if (hdlp->ih_type == DDI_INTR_TYPE_MSI)
                        *result = apic_alloc_msi_vectors(dip, hdlp->ih_inum,
                            hdlp->ih_scratch1, hdlp->ih_pri,
                            (int)(uintptr_t)hdlp->ih_scratch2);
                else
                        *result = apic_alloc_msix_vectors(dip, hdlp->ih_inum,
                            hdlp->ih_scratch1, hdlp->ih_pri,
                            (int)(uintptr_t)hdlp->ih_scratch2);
                break;
                [.]
}


apic_alloc_msi_vectors() - This function allocates 'count' number of vectors for the device. 'count' has to be a power of 2 and the priority is passed by the caller. The first thing this function does is check whether we have enough vectors available at that priority to satisfy the request; this is done by the routine apic_navail_vector(). We then search for contiguous vectors, and the value returned by apic_find_multi_vectors() is our starting point. It seems MSI has this constraint that only contiguous vectors can be given. I don't know why.
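
A plausible reason for the contiguity constraint is that an MSI function has a single message data value and tells its vectors apart by changing only the low-order bits, which would also mean the starting vector has to be aligned to 'count'. Under that assumption, the search for a starting point can be sketched as below. This is only an illustration of the idea, not the body of apic_find_multi_vectors(); the in_use[] table, NVECTORS and find_contiguous_vectors() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define NVECTORS 256    /* assumed size of the vector space */

/*
 * Find 'count' contiguous free vectors within [lowest, highest], with the
 * first vector aligned to 'count'. Returns the starting vector, or -1 if
 * no such run exists. in_use[v] is true when vector v is already taken.
 */
static int
find_contiguous_vectors(const bool in_use[NVECTORS], int lowest, int highest,
    int count)
{
    int start, i;

    /* 'count' has to be a power of 2, as for MSI. */
    assert(count > 0 && (count & (count - 1)) == 0);
    assert(lowest >= 0 && highest < NVECTORS && lowest <= highest);

    /* Round 'lowest' up to the next multiple of 'count'. */
    start = (lowest + count - 1) & ~(count - 1);

    for (; start + count - 1 <= highest; start += count) {
        for (i = 0; i < count; i++) {
            if (in_use[start + i])
                break;
        }
        if (i == count)
            return start;   /* whole aligned run is free */
    }
    return -1;
}
```

With vector 0x60 already taken and a range of 0x60 to 0x7f, a request for 4 vectors would start at 0x64, while a request for 32 vectors would fail because the only aligned run in that range begins at 0x60.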

The next step is to check whether we have enough IRQs in the apic_irq_table[]. This is done by the function apic_check_free_irqs().  If we succeed in finding enough IRQ entries in the table, apic_alloc_msi_vectors() proceeds to allocate the IRQ, which is done by apic_allocate_irq(). The IRQ number returned by this function is what is finally used to index into the autovect[] table for the appropriate vector. We will go into autovect[] again soon but for now let's see how we select the CPU. The CPU for this IRQ is selected by apic_bind_intr() for the first interrupt in 'count' number of vectors, and subsequent vectors are bound to the same CPU. These steps are done in a loop 'count' times.

Now that we have set up the IRQ in the apic_irq_table[] with priority, vector, target CPU etc., we are set to enable the interrupt. BTW, all this is mostly done in the driver's attach(9E) entry point, usually in two phases within attach(9E) -- (i) add interrupts by allocating them (ii) enable interrupts.

apic_alloc_msix_vectors() - This function does similar work to what is done for MSI interrupts, except that for each request in 'count' we allocate the vector (apart from allocating the IRQ entry in the apic_irq_table[]) and bind the interrupt to a CPU by calling apic_bind_intr(). MSI-x does not have the limitation of contiguous vectors that MSI has. Vector allocation is done by the routine apic_allocate_vector(), which returns a free vector by walking the apic_vector_to_irq[] table and looking for an APIC_RESV_IRQ slot. The range is determined by the priority passed to it. For example, if the priority passed is 6, then the range would be

        highest = apic_ipltopri[ipl] + APIC_VECTOR_MASK;
        lowest = apic_ipltopri[ipl - 1] + APIC_VECTOR_PER_IPL;

        if (highest < lowest) /* Both ipl and ipl - 1 map to same pri */
                lowest -= APIC_VECTOR_PER_IPL;

highest is 0x7f (0x70 + 0x0f) and lowest would be 0x60 (0x50+0x10) and this matches with our observation in the beginning of the blog.
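
The range computation above can be tried out in user space. The sketch below assumes APIC_VECTOR_MASK is 0x0f and APIC_VECTOR_PER_IPL is 0x10 (the values used in the arithmetic above) and takes the two apic_ipltopri[] entries as plain arguments; struct vec_range and vector_range() are names invented for this example.

```c
#include <assert.h>

#define APIC_VECTOR_PER_IPL 0x10    /* assumed, from the arithmetic above */
#define APIC_VECTOR_MASK    0x0f    /* assumed, from the arithmetic above */

struct vec_range {
    int lowest;
    int highest;
};

/*
 * pri_ipl stands in for apic_ipltopri[ipl] and pri_ipl_below for
 * apic_ipltopri[ipl - 1].
 */
static struct vec_range
vector_range(int pri_ipl, int pri_ipl_below)
{
    struct vec_range r;

    /* Assumption: a higher IPL never maps to a lower APIC priority. */
    assert(pri_ipl >= pri_ipl_below);

    r.highest = pri_ipl + APIC_VECTOR_MASK;
    r.lowest = pri_ipl_below + APIC_VECTOR_PER_IPL;

    if (r.highest < r.lowest)   /* both ipl and ipl - 1 map to same pri */
        r.lowest -= APIC_VECTOR_PER_IPL;

    return r;
}
```

Calling vector_range(0x70, 0x50) gives lowest = 0x60 and highest = 0x7f, the same range as worked out by hand. When both IPLs map to the same priority, for example vector_range(0x50, 0x50), the correction step kicks in and the range becomes 0x50 to 0x5f.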

A typical flow of this dance is as follows :-

  1  22557    apic_alloc_msix_vectors:entry name pciex8086,10a7, inum : 0, count : 2, pri :6
              pcplusmp`apic_intr_ops+0x114
              npe`pci_common_intr_ops+0x8f1
              npe`npe_intr_ops+0x21
              unix`i_ddi_intr_ops+0x54
              unix`i_ddi_intr_ops+0x54
              genunix`ddi_intr_alloc+0x263
              igb`igb_alloc_intrs_msix+0x134
              igb`igb_alloc_intrs+0x64
              igb`igb_attach+0xcb
              genunix`devi_attach+0x87

  1  22485         apic_navail_vector:entry name : pciex8086,10a7, pri 6
  1  22486        apic_navail_vector:return                31
  1  22547          apic_allocate_irq:entry        72
  1  22419         apic_find_free_irq:entry start :72, end : 253
  1  22417          apic_find_io_intr:entry        72
  1  22548         apic_allocate_irq:return                72
  1  22479       apic_allocate_vector:entry ipl : 6, irq: 72, pri: 1
  1  22480      apic_allocate_vector:return                96
  1  22473             apic_bind_intr:entry name : pciex8086,10a7, irq  72
  1  22474            apic_bind_intr:return                 0

Now let's talk about how a driver enables interrupts once they are allocated. Interrupts can be enabled as a block (more than one at once, with ddi_intr_block_enable(9F)) or by explicitly calling ddi_intr_enable(9F) for each interrupt; here we will discuss ddi_intr_enable(9F). Once again we will end up in pci_common_intr_ops() and call pci_enable_intr(), which mainly does two things :-

-  Translate the interrupt if needed. This is done by apic_introp_xlate(). If the interrupt is MSI or MSI-x, we call apic_setup_irq_table() if the IRQ entry in the apic_irq_table[] is not yet set up. In our example, we have already done this, so apic_introp_xlate() just returns the IRQ number from 'apic_vector_to_irq[airqp->airq_vector]'. airqp is an entry in the apic_irq_table[] which gets assigned by calling apic_find_irq().

-  Add the interrupt handler by calling add_avintr(). We have already touched on this routine in this blog, but it is worth mentioning at what point in the life cycle of setting up interrupts we bind an interrupt handler (ISR or Interrupt Service Routine) to a vector. The main task of add_avintr() is to insert the 'autovec' at the appropriate index, which it does by calling insert_av(). The other, and most important, task is to program the interrupt, which is done by addspl(). addspl() is another function pointer from the family of setlvl, setspl etc. In the APIC case, it will be apic_addspl(), which is just a wrapper over apic_addspl_common(). There are four arguments passed to it :-

apic_addspl_common(int irqno, int ipl, int min_ipl, int max_ipl)

We first get the pointer from apic_irq_table[] by indexing with irqno, and check whether we need to upgrade the vector or just check the IPL in case this interrupt needs to be shared.  Eventually we will land up in apic_setup_io_intr(), which does the main task. In fact, it is apic_rebind() that binds an interrupt to a CPU, and apic_rebind() is called from apic_setup_io_intr(). Since we are discussing MSI/MSI-x, once apic_rebind() has done its sanity checks it will call apic_pci_msi_enable_vector(). The following statements are what we use to program the interrupt :-

        /* MSI Address */
        msi_addr = (MSI_ADDR_HDR | (target_apic_id << MSI_ADDR_DEST_SHIFT));
        msi_addr |= ((MSI_ADDR_RH_FIXED << MSI_ADDR_RH_SHIFT) |
            (MSI_ADDR_DM_PHYSICAL << MSI_ADDR_DM_SHIFT));

        /* MSI Data: MSI is edge triggered according to spec */
        msi_data = ((MSI_DATA_TM_EDGE << MSI_DATA_TM_SHIFT) | vector);

apic_pci_msi_enable_mode() is also called from apic_rebind() to enable the interrupt once it's programmed. That's how per-vector masking is controlled I suppose.
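
To see what the MSI address and data words end up looking like, here is a user-space sketch of the same bit packing. The constant values below follow the x86 MSI message layout as I understand it (0xFEE in the top address bits, destination APIC ID starting at bit 12, RH at bit 3, DM at bit 2, trigger mode at bit 15 of the data word); treat them as assumptions rather than a copy of the Solaris headers. msi_address() and msi_data() are helper names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, based on the x86 MSI message layout. */
#define MSI_ADDR_HDR            0xfee00000u
#define MSI_ADDR_DEST_SHIFT     12
#define MSI_ADDR_RH_FIXED       0x0u
#define MSI_ADDR_RH_SHIFT       3
#define MSI_ADDR_DM_PHYSICAL    0x0u
#define MSI_ADDR_DM_SHIFT       2
#define MSI_DATA_TM_EDGE        0x0u
#define MSI_DATA_TM_SHIFT       15

/* Build the MSI address word that targets one local APIC. */
static uint64_t
msi_address(uint8_t target_apic_id)
{
    uint64_t msi_addr;

    msi_addr = MSI_ADDR_HDR |
        ((uint64_t)target_apic_id << MSI_ADDR_DEST_SHIFT);
    msi_addr |= ((uint64_t)MSI_ADDR_RH_FIXED << MSI_ADDR_RH_SHIFT) |
        ((uint64_t)MSI_ADDR_DM_PHYSICAL << MSI_ADDR_DM_SHIFT);
    return msi_addr;
}

/* Build the MSI data word: edge triggered, vector in the low 8 bits. */
static uint32_t
msi_data(uint8_t vector)
{
    /* Vectors 0 to 15 are not valid for the local APIC. */
    assert(vector >= 0x10);
    return ((uint32_t)MSI_DATA_TM_EDGE << MSI_DATA_TM_SHIFT) | vector;
}
```

With these values, an interrupt bound to local APIC ID 1 with vector 0x60 (the vector 96 returned in the trace above) would be programmed with address 0xfee01000 and data 0x60.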

Since we are touching on how we bind an interrupt to a CPU, I should also mention how Solaris selects the CPU to bind an interrupt to. The routine apic_bind_intr() is responsible for doing this, and the decision is based on the value of the tunable 'apic_intr_policy'. You can define three types of policy -- (a) INTR_ROUND_ROBIN_WITH_AFFINITY - round robin and affinity based policy which returns the same CPU for the same dip (or device). This is the default policy. (b) INTR_LOWEST_PRIORITY - I don't know because it's not implemented and (c) INTR_ROUND_ROBIN - select the CPU in round-robin fashion using the 'apic_next_bind_cpu' global variable. Choosing between INTR_ROUND_ROBIN_WITH_AFFINITY and INTR_ROUND_ROBIN may not be easy, but I think the decision should be based on throughput vs locality awareness.
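
The difference between the two implemented policies is easier to see in a toy model. The sketch below is not apic_bind_intr(); it is a user-space approximation in which a device is identified by an opaque pointer (standing in for the dip), struct binder and bind_intr() are hypothetical, and INTR_LOWEST_PRIORITY simply falls back to round robin because I don't know how it is meant to behave.

```c
#include <assert.h>
#include <stddef.h>

enum intr_policy {
    INTR_ROUND_ROBIN_WITH_AFFINITY,
    INTR_LOWEST_PRIORITY,
    INTR_ROUND_ROBIN
};

#define MAX_DEVS 16     /* arbitrary limit for this sketch */

struct binder {
    enum intr_policy policy;
    int ncpus;
    int next_bind_cpu;              /* plays the role of apic_next_bind_cpu */
    const void *dev[MAX_DEVS];      /* devices already bound (stand-in for dip) */
    int dev_cpu[MAX_DEVS];          /* CPU chosen for each of those devices */
    int ndevs;
};

/* Hand out CPUs in round-robin order. */
static int
next_cpu(struct binder *b)
{
    int cpu = b->next_bind_cpu;

    b->next_bind_cpu = (cpu + 1) % b->ncpus;
    return cpu;
}

/* Pick a CPU for one interrupt of the device identified by 'dip'. */
static int
bind_intr(struct binder *b, const void *dip)
{
    int i, cpu;

    assert(b != NULL && b->ncpus > 0);

    if (b->policy != INTR_ROUND_ROBIN_WITH_AFFINITY)
        return next_cpu(b);

    /* Affinity: reuse the CPU already chosen for this device. */
    for (i = 0; i < b->ndevs; i++) {
        if (b->dev[i] == dip)
            return b->dev_cpu[i];
    }

    cpu = next_cpu(b);
    if (b->ndevs < MAX_DEVS) {
        b->dev[b->ndevs] = dip;
        b->dev_cpu[b->ndevs] = cpu;
        b->ndevs++;
    }
    return cpu;
}
```

With the affinity policy, two interrupts from the same device land on the same CPU, which favours locality. With plain round robin they are spread across CPUs, which may help throughput when one device generates most of the interrupt load.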

(2008-03-06 11:50:32.0) Permalink

20080304 Tuesday March 04, 2008

Photographs of Point Reyes Backpack

(2008-03-04 18:08:14.0) Permalink

20080224 Sunday February 24, 2008

Photographs of Crater Lake and Lava Beds


(2008-02-24 15:57:30.0) Permalink

20080131 Thursday January 31, 2008

Photographs from Hawaii (Tropical Paradise in US)

Kauai (Waimea Canyon, Wailua Falls)



Kauai (Na Pali Coast and Kalalau Trail).



Big Island (Mauna Kea).



Volcano National Park and Beaches (Big Island).

(2008-01-31 21:55:24.0) Permalink

20080111 Friday January 11, 2008

xVM experience so far

I recently configured xVM on Solaris with HVM (hardware-assisted virtual machine) and PV (paravirtualized) guest (domU) domains. I could easily install Solaris 10 Update 5 as an HVM domU, boot it, configure the network interface and assign an IP address. The plan is to have multiple domUs as a testbed running Solaris 10 and Solaris Nevada. This would cut down on machines, and sanity checks can be done quickly as I don't have to install/boot the OS every time. I can easily run functional tests, if not performance benchmarks. The performance of Solaris 10 as an HVM domain is not as good as Solaris Nevada (PV domU), especially when there is more than one VCPU, but I guess it's being worked on. I think the performance would drastically improve when we have PV (paravirtualized) drivers for Solaris 10. I'll soon experiment with installing xVM on my laptop and configuring Windows XP as an HVM domain.

Here's a small demo describing my experience so far with xVM :-

For installing the Solaris PV domU, I used this sample script.

bash-3.2# cat snv.1.py
name = 'solaris-pv'
memory = '1024'
vcpus = 4
# for installation
disk = [ 'file:/var/tmp/solarisdvd.iso,6:cdrom,r', 'phy:/dev/zvol/dsk/snv-pool/vol,0,w' ]
on_poweroff = 'restart'
on_reboot = 'restart'
on_crash = 'preserve'

In 'disk', you will see 'file' and 'phy', which specify what kind of media it is. Once you have specified the location in 'disk', you also specify the type of access, like read (r) or write (w).

Once you run '#xm create script.py', you will see the OS installation screen. Once the installation completed, I used a similar script but removed the solarisdvd entry from 'disk' (mentioned in the .py file).

name = 'solaris-pv'
memory = '1024'
vcpus = 4
disk = [ 'phy:/dev/zvol/dsk/snv-pool/vol,0,w' ]
on_poweroff = 'destroy'
on_reboot = 'restart'
on_crash = 'preserve'
vif = [ 'mac=0:14:4f:2:12:35, ip=10.5.63.98, bridge=nge1' ]

With the 'vif' property you can specify which network interface you want. You can also set the 'config/default-nic' property in the xvm/xend service if you want to override the NIC. Finally, once you have booted the guest domain, you will see the interface as rtls0. You can run 'dladm show-dev' to see whether the network interface is really configured, and run ifconfig(1M) to plumb the interface.

You can see the resources of each domain as follows.

bash-3.2# xm list
Name          ID   Mem  VCPUs  State    Time(s)
Domain-0       0  4973      4  r-----    4019.6
S10U5HVM       8  2056      1  r-----      40.8
solaris-pv    10  1024      1  r-----       5.0



I also found the following links to be very helpful as I learnt how to configure domU.
Write-up from Chris Beal
Write-up from mbrowarski

(2008-01-11 16:43:43.0) Permalink

20080105 Saturday January 05, 2008

Photographs of winter-break trips (21st Dec 2007 to 2nd January 2008)
Antelope Slot Canyon, Page Arizona


Arches National Park, Utah.



Las Vegas.



Bryce Canyon National Park, Utah.



Capitol Reef National Park, Utah.



Zion National Park, Utah.



South Coastal Oregon, Oregon.



(2008-01-05 14:41:54.0)