Friday July 22, 2005

Detecting data/file corruption

Sometimes I get escalations that go along the lines of '...I moved this application data from machine fred to machine bob and now the application won't read it. What's happened?'
Debugging the problem from the application down is probably going to be quite long-winded. So my first action is to verify that the file is actually the same on both machines, i.e. did it get corrupted in the transfer? If it did, then we can forget the application layer and concentrate on the method of transfer. It seems obvious when you think of it, but in the heat of the moment the simplest things get forgotten. What follows are some examples of how to use standard Solaris tools to detect data corruption.

For a long time we've had binaries that generate a checksum of a file, which is a simple way to tell whether the source and destination copies are the same. There are sum, cksum and, new in Solaris 10, digest. We also have cmp, which does a byte-for-byte comparison of two files.

Examples

All of these tools can be used on regular files and raw devices.

!!Copy a raw disk slice to an image file using dd.

# dd if=/dev/rdsk/c0t0d0s3 of=/var/tmp/c0t0d0s3.img bs=1024k
41+1 records in
41+1 records out

!!Now we can use the comparison tools; they should all come back identical or
clean.  Remember cmp gives no output for a matching pair of files.  For sum and
cksum, the first column is the checksum and the second column is the size (in
512-byte blocks for sum, in bytes for cksum).

# cmp /dev/rdsk/c0t0d0s3 /var/tmp/c0t0d0s3.img

# sum  /dev/rdsk/c0t0d0s3 /var/tmp/c0t0d0s3.img
28918 85050 /dev/rdsk/c0t0d0s3
28918 85050 /var/tmp/c0t0d0s3.img

# cksum  /dev/rdsk/c0t0d0s3 /var/tmp/c0t0d0s3.img
3185788260      43545600        /dev/rdsk/c0t0d0s3
3185788260      43545600        /var/tmp/c0t0d0s3.img

# digest -a md5  /dev/rdsk/c0t0d0s3 /var/tmp/c0t0d0s3.img
(/dev/rdsk/c0t0d0s3) = 0616a55e0a4e30ecf49c974f23a56255
(/var/tmp/c0t0d0s3.img) = 0616a55e0a4e30ecf49c974f23a56255
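
The same check can be scripted. What follows is only a minimal sketch: the function name same_content is made up for illustration, and it relies on the POSIX behaviour of cksum, which prints just the checksum and byte count when it reads standard input.

```shell
#!/bin/sh
# same_content FILE1 FILE2
# Hypothetical helper: succeeds (exit 0) when both files have the same
# cksum checksum and the same length in bytes, fails (exit 1) otherwise.
same_content() {
    # Reading from stdin keeps the file name out of the output, so the
    # two results can be compared as plain strings.
    a=$(cksum < "$1") || return 2
    b=$(cksum < "$2") || return 2
    [ "$a" = "$b" ]
}
```

It could be used as: same_content /dev/rdsk/c0t0d0s3 /var/tmp/c0t0d0s3.img && echo identical. A matching checksum is strong evidence rather than proof; cmp remains the definitive byte-for-byte check.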

To show what happens when a file is corrupted we will write a single byte to the
front of the file, which currently starts with all zeros.

The current contents of the first 10 bytes of the file (offsets are in octal):
# od -x -N 10 /var/tmp/c0t0d0s3.img
0000000 0000 0000 0000 0000 0000
0000012

Now we write the first byte of /etc/hosts (any file would do) to the front of
the image file, to simulate corruption.
# dd if=/etc/hosts of=/var/tmp/c0t0d0s3.img bs=1 count=1 conv=notrunc

We now see that the file has changed by one byte.
# od -x -N 10 /var/tmp/c0t0d0s3.img
0000000 3100 0000 0000 0000 0000
0000012
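
If you want to try this without touching a real image, the same one-byte overwrite can be rehearsed on a scratch file. This is a sketch under my own assumptions: the scratch file name and the 4 KB size are arbitrary, and the byte written is a literal '1' rather than the first byte of /etc/hosts.

```shell
#!/bin/sh
# Build a small all-zero scratch file and overwrite its first byte in
# place, mirroring the dd ... conv=notrunc step on a safe target.
# The file name and the 4 KB size are arbitrary choices for this sketch.
img=${TMPDIR:-/tmp}/scratch.$$.img

dd if=/dev/zero of="$img" bs=1024 count=4 2>/dev/null
size_before=$(wc -c < "$img")

# conv=notrunc stops dd from truncating the output file, so only the
# first byte is replaced and the length stays the same.
printf '1' | dd of="$img" bs=1 count=1 conv=notrunc 2>/dev/null
size_after=$(wc -c < "$img")

# Read the first byte back as two hex digits (31 is ASCII '1').
first_byte=$(od -An -tx1 -N1 "$img" | tr -d ' \n')
echo "first byte: $first_byte, size before: $size_before, after: $size_after"

rm -f "$img"
```

Without conv=notrunc, dd would truncate the output file to the single byte it wrote, which would show up as a size change rather than as silent corruption.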

!!Now we will re-run the comparison commands to see what is shown for a
corrupted file.

# cmp /dev/rdsk/c0t0d0s3 /var/tmp/c0t0d0s3.img
/dev/rdsk/c0t0d0s3 /var/tmp/c0t0d0s3.img differ: char 1, line 1

# sum /dev/rdsk/c0t0d0s3 /var/tmp/c0t0d0s3.img
28918 85050 /dev/rdsk/c0t0d0s3
28967 85050 /var/tmp/c0t0d0s3.img

# cksum /dev/rdsk/c0t0d0s3 /var/tmp/c0t0d0s3.img
3185788260      43545600        /dev/rdsk/c0t0d0s3
1666608083      43545600        /var/tmp/c0t0d0s3.img

Again, note that for cksum and sum the second column is identical in the
original and corrupt versions, since we have not changed the file length.
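
By default cmp stops at the first difference. To gauge how much of a copy is damaged, POSIX cmp also accepts -l, which lists every differing byte. A small sketch of how that could be used (the function name diff_bytes is illustrative only):

```shell
#!/bin/sh
# diff_bytes FILE1 FILE2
# Hypothetical helper: prints how many byte positions differ between two
# files. cmp -l writes one line per differing byte (decimal offset, then
# the two byte values in octal), so counting lines counts differences.
# If one file is shorter, only the common prefix is compared.
diff_bytes() {
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}
```

For the image corrupted above I would expect diff_bytes /dev/rdsk/c0t0d0s3 /var/tmp/c0t0d0s3.img to report a single differing byte, but I have not run it on that system.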

Timings, comparing two identical files on a filesystem.  Single-disk Ultra 10 running Solaris 10.  The
timings are dominated by waiting for I/O.

# timex cmp /dev/dsk/c0t0d0s3 c0t0d0s3.img.bak

real       12.83
user        4.86
sys         1.31

# timex sum /dev/dsk/c0t0d0s3 c0t0d0s3.img.bak
28918 85050 /dev/dsk/c0t0d0s3
28918 85050 c0t0d0s3.img.bak

real       15.17
user        3.89
sys         1.15

# timex cksum /dev/dsk/c0t0d0s3 c0t0d0s3.img.bak
3185788260      43545600        /dev/dsk/c0t0d0s3
3185788260      43545600        c0t0d0s3.img.bak

real       14.57
user        2.73
sys         1.33

# timex digest -a md5  /dev/dsk/c0t0d0s3 c0t0d0s3.img.bak
(/dev/dsk/c0t0d0s3) = 0616a55e0a4e30ecf49c974f23a56255
(c0t0d0s3.img.bak) = 0616a55e0a4e30ecf49c974f23a56255

real       15.82
user        4.07
sys         1.68
( Jul 22 2005, 11:15:35 AM BST )
Tuesday July 05, 2005

A simple way to increase shared memory in Solaris 10

Short Version

To increase the shared memory available to a given user on Solaris 10:
  • Find out which project the user is in, using id -p.
  • As root, use prctl to raise the limit (e.g. to 200 MB) for the project ID returned by id -p.
arches $ id -p
uid=90712(garyli) gid=10(staff) projid=10(group.staff)
arches $ su
Password: 
# prctl -n project.max-shm-memory -r -v 200mb -i project 10
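
The two steps can be glued together in a script. The sketch below is a dry run only: the helper names projid_of and shm_limit_cmd are my own, and it prints the prctl command instead of running it, so nothing changes until you paste the result into a root shell.

```shell
#!/bin/sh
# projid_of "ID_OUTPUT"
# Hypothetical helper: extracts the numeric project ID from a line in the
# format printed by Solaris `id -p`, e.g.
#   uid=90712(garyli) gid=10(staff) projid=10(group.staff)
projid_of() {
    printf '%s\n' "$1" | sed -n 's/.*projid=\([0-9][0-9]*\).*/\1/p'
}

# shm_limit_cmd PROJID SIZE
# Prints (does not run) the prctl command for that project, so it can be
# reviewed before being executed as root.
shm_limit_cmd() {
    echo "prctl -n project.max-shm-memory -r -v $2 -i project $1"
}

# Sample input copied from the transcript; on a live Solaris 10 system
# this would be line=$(id -p) instead.
line='uid=90712(garyli) gid=10(staff) projid=10(group.staff)'
shm_limit_cmd "$(projid_of "$line")" 200mb
```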

Long Version

By default, the maximum amount of shared memory that the processes in a project can use is around 25% of physical memory. If you try to create a shared memory segment that would exceed the allowable limit, you will see an error in the messages file, and the shmget system call will fail with EINVAL.
Jul  4 17:51:53 arches genunix: [ID 883052 kern.notice] privileged rctl project.max-shm-memory (value 195078144) exceeded by project 10
For instance, on arches we have only 512 MB of memory:
SunOS arches 5.10 s10_43 sun4u sparc SUNW,Ultra-5_10
arches $ prtdiag | head
System Configuration:  Sun Microsystems  sun4u Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 300MHz)
System clock frequency: 100 MHz
Memory size: 512 Megabytes
And we can see what the default shared memory limit will be by using prctl:
arches $ prctl -n project.max-shm-memory  -i project 10        

25758:  prctl -n project.max-shm-memory -i project 10
project.max-shm-memory                   [ no-basic deny ]
                    128100352 privileged deny           
         18446744073709551615 system     deny           [ max ]
	 
arches $ bc
128100352/(1024*1024)
122
	 
In the above case we have a maximum of 128100352 bytes (122 MB) which we can allocate using shmget()/shmat().
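
The arithmetic can also be done in the shell itself. This is a small sketch with made-up helper names (bytes_to_mb, fits_limit); the limit value is the privileged value from the prctl output, and a megabyte is taken as 1024*1024 bytes.

```shell
#!/bin/sh
# Whole megabytes in a byte count (integer division, like the bc run).
bytes_to_mb() {
    echo $(( $1 / 1024 / 1024 ))
}

# fits_limit SIZE_MB LIMIT_BYTES
# Succeeds when a request of SIZE_MB megabytes is within LIMIT_BYTES.
fits_limit() {
    [ $(( $1 * 1024 * 1024 )) -le "$2" ]
}

limit=128100352           # privileged value reported by prctl
bytes_to_mb "$limit"      # prints 122
fits_limit 122 "$limit" && echo "122 MB fits"
fits_limit 123 "$limit" || echo "123 MB exceeds the limit"
```

122 MB is 127926272 bytes, which is under the limit, while 123 MB is 128974848 bytes, which is over it.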

We can now demonstrate that this is the case by trying to allocate first 122 MB, then 123 MB of shared memory. The program shm_var takes a single value as its input, which is the size in MB of the shared memory segment that we want to create.

arches $ ./shm_var 122
Attempting attach of 122  Mb shm base address = F7000000 shmid = 5 shmat time = 1 sec

arches $ ./shm_var 123
Attempting attach of 123 Mb shm base address = FFFFFFFF shmid = FFFFFFFF shmat time = 0 sec
In the above example, shmat() fails because shmget() had already failed to create the segment we asked for and returned -1 as the segment ID. Using truss, we see shmget() fail:
shmget(25851, 128974971, 0777|IPC_CREAT)        Err#22 EINVAL
Changing all this is actually quite simple, and can be done on the fly. IMHO the prctl command doesn't do us any favours with what looks to me like an overly complex syntax. However, here's a cook-book approach.

Firstly, because the shared memory resource is controlled on a per-project basis, we need to know which project to adjust. In the simple case, this is the project that the user belongs to. So, for an Oracle install, su to oracle and issue id -p. Unless you have changed things manually, the project will be 3 (default). In the example below, however, my project is based on my group ID, so my project ID is 10. You can find your own project ID by issuing id -p:

arches $ id -p
uid=90712(garyli) gid=10(staff) projid=10(group.staff)
Then, as root, we issue the magic prctl command to raise the value:
# prctl -n project.max-shm-memory -r -v 200mb -i project 10
We can now allocate 200 MB, but NOT 201 MB:
roxy $ ./shm_var 200
Attempting attach of 200 Mb shm base address = F2800000 shmid = 2 shmat time = 1 sec

roxy $ ./shm_var 201
Attempting attach of 201 Mb shm base address = FFFFFFFF shmid = FFFFFFFF shmat time = 0 sec
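Interestingly, the new limit is cumulative: it caps the total shared memory across all of the project's segments, and so does away with the confusing shmmax, shmseg, etc. Here two 100 MB segments use up the 200 MB limit, so even a further 1 MB request fails: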
roxy $ ./shm_var 100
Attempting attach of 100 Mb shm base address = F8C00000 shmid = 3 shmat time = 0 sec
^Croxy $ ./shm_var 100
Attempting attach of 100 Mb shm base address = F8C00000 shmid = 4 shmat time = 1 sec
^Croxy $ ./shm_var 1
Attempting attach of 1 Mb shm base address = FFFFFFFF shmid = FFFFFFFF shmat time = 0 sec
Note that in the above test the segments were not removed between runs of shm_var (they persist after the process exits until removed with shmctl() IPC_RMID or ipcrm), and so in ipcs -a we see 200 MB of shared memory across two segments:
IPC status from <running system> as of Mon Jul  4 17:59:50 BST 2005
T         ID      KEY        MODE        OWNER    GROUP  CREATOR   CGROUP CBYTES  QNUM QBYTES LSPID LRPID   STIME    RTIME    CTIME 
Message Queues:
T         ID      KEY        MODE        OWNER    GROUP  CREATOR   CGROUP NATTCH      SEGSZ  CPID  LPID   ATIME    DTIME    CTIME 
Shared Memory:
m          4   0x2f8b     --rw-rw-rw-   garyli    staff   garyli    staff      0  104857600 12171 12171 17:58:16 17:58:18 17:58:15
m          3   0x2f86     --rw-rw-rw-   garyli    staff   garyli    staff      0  104857600 12166 12166 17:58:07 17:58:12 17:58:07
m          1   0x43cb9a88 --rw-r-----   oracle      dba   oracle      dba      6   46235648   791   809 13:25:22 13:25:53 13:25:14
T         ID      KEY        MODE        OWNER    GROUP  CREATOR   CGROUP NSEMS   OTIME    CTIME 
Semaphores:
s          5   0xe49024ec --ra-r-----   oracle      dba   oracle      dba    39 17:56:57 13:25:14
s          1   0x71000b51 --ra-ra-ra-     root     root     root     root     1 13:18:56 13:18:34
s          0   0x187cf    --ra-ra-ra-     root      sys     root      sys     1 13:17:55 13:17:54
roxy $ 
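
To total the segments per owner without adding them up by eye, the ipcs output can be fed through awk. This is a sketch that assumes the Solaris column layout shown in this transcript (segment lines start with "m", OWNER is field 5, SEGSZ is field 10); the function name shm_total is hypothetical. On Solaris the old /usr/bin/awk may not accept -v, so nawk or /usr/xpg4/bin/awk is probably the safer choice there.

```shell
#!/bin/sh
# shm_total OWNER
# Hypothetical helper: reads Solaris `ipcs -a` style output on stdin and
# prints the total SEGSZ, in bytes, of the shared memory segments owned
# by OWNER. Lines for message queues, semaphores and headers are skipped
# because their first field is not "m".
shm_total() {
    awk -v owner="$1" '
        $1 == "m" && $5 == owner { total += $10 }
        END { printf "%d\n", total }
    '
}
```

Used as ipcs -a | shm_total garyli, the two segments listed here add up to 209715200 bytes, which is exactly the 200 MB limit and explains why even a 1 MB request was refused.
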
Notice also that Oracle has 46Mb that is not affected by our allocation (or vice versa), because the oracle user runs in a different project with its own limit:
# su - oracle
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
-bash-3.00$ id -p
uid=101(oracle) gid=1001(dba) projid=11(user.oracle)
-bash-3.00$ prctl -n project.max-shm-memory -i project user.oracle
project: 11: user.oracle
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
project.max-shm-memory
        privileged       186MB      -   deny                                 -
        system          16.0EB    max   deny     
Notice also I used user.oracle, rather than the project ID, although -i project 11 would have achieved the same thing.
( Jul 05 2005, 11:00:36 AM BST )