Troubleshooting the Amazon EC2 OpenSolaris re-bundling process
Wednesday, April 15, 2009
This entry is based on a customer escalation; I hope it will help, or at least inspire you to some extent.
This entry is part of the 'OpenSolaris on Amazon EC2' workshop.
Target::
The customer wants to regularly back up a running Amazon EC2 OpenSolaris instance while the applications are running.
Issue::
Re-bundling takes a long time and looks frozen (and it occupies a lot of resources).
Monitor the ZFS cloning process - (rebundle.sh) script
Note: The rebundle script uses the ZFS mirror cloning mechanism implemented at the system level; as such, it has a built-in "share resources" capability.
Log in over another connection and start this script while rebundle.sh runs.
Paste it into bash; some $ and \ characters are escaped with ONE extra \.
cat >/tmp/mon.ksh <<EOF
#!/bin/ksh
# Monitor the ZFS mirror resilver on rpool during re-bundling.
echo "ZFS Cloning started"
echo "Waiting for the clone process to really start"
# Phase 1: spin until zpool status reports a resilver in progress.
while true
do
    zpool status rpool | grep "resilver in progress" >/dev/null
    if [ \$? -eq 0 ]
    then
        break
    else
        print -n -e "\b-"
        sleep 1
        print -n -e "\b\\\"
        sleep 1
        print -n -e "\b|"
        sleep 1
        print -n -e "\b/"
        sleep 1
    fi
done
# Phase 2: show the elapsed time until the resilver is no longer reported.
while true
do
    zpool status rpool | grep "resilver in progress" >/dev/null
    if [ \$? -ne 0 ]
    then
        break
    else
        status=\$(zpool status rpool | grep "resilver in progress" | gsed -e 's/ scrub: resilver in progress for/Elapsed/g')
        print -n -e "\r \$status "
        print -n -e "\b-"
        sleep 1
        print -n -e "\b\\\"
        sleep 1
        print -n -e "\b|"
        sleep 1
        print -n -e "\b/"
        sleep 1
    fi
done
echo "ZFS Cloning ended"
exit 0
EOF
chmod 0755 /tmp/mon.ksh
/tmp/mon.ksh
Monitor and control resource usage during the ec2-bundle-image command
- Try adding -v (or --verbose) to the parameters.
- The ec2-bundle-image command is written in Ruby and simply uses pipes with gzip, so it is possible that for some reason there is not enough CPU or memory to process it.
MEM: Try running a couple of "sync; sleep 10" commands so that the ZFS cache releases memory after cloning. Monitor memory, check that you have swap configured, and make sure you are not swapping while the ec2-bundle-image command runs.
Keep in mind that the EC2 tools were designed on Linux, where /tmp is on disk by default; on OpenSolaris, /tmp is in memory by default, backed by swap.
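The "sync; sleep 10" advice above can be wrapped in a small helper. This is only a sketch: the function name flush_and_wait and its default values are made up for illustration and are not part of rebundle.sh or the EC2 tools.

```shell
# Hypothetical helper: repeat "sync; sleep N" a few times before bundling,
# to give the ZFS cache time to release memory after cloning.
flush_and_wait() {
    rounds=${1:-3}     # number of sync/sleep rounds (assumed default)
    pause=${2:-10}     # seconds to sleep after each sync (assumed default)
    done_rounds=0
    while [ "$done_rounds" -lt "$rounds" ]; do
        sync
        sleep "$pause"
        done_rounds=$((done_rounds + 1))
    done
    echo "completed $done_rounds sync rounds"
}
```

For example, run "flush_and_wait 3 10" and then look at "swap -s" and "vmstat 5" in a second terminal before and during the ec2-bundle-image run.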
CPU: Try exporting the GZIP variable with level 1 compression; this will lower the CPU load (undocumented by Amazon, but it can help isolate the issue).
(export GZIP='-1'; ec2-bundle-image ...)
Of course, from a CPU point of view, you can also try running the AMI on the second, more powerful 32-bit instance profile, which has 2 virtual CPUs.
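To get a feeling for what level 1 costs in bundle size, you can compare the compressed sizes on a sample file first. This is a rough sketch: compare_gzip_levels is a hypothetical name, and it passes the level explicitly on the command line rather than through the GZIP variable.

```shell
# Hypothetical helper: compare gzip output size at level 1 and at the default level.
compare_gzip_levels() {
    sample=$1
    # wc -c counts the compressed bytes; tr strips padding some wc versions add.
    fast_size=$(gzip -1 -c "$sample" | wc -c | tr -d ' ')
    default_size=$(gzip -c "$sample" | wc -c | tr -d ' ')
    echo "level1=$fast_size default=$default_size"
}
```

Prefixing each gzip call with "time" on a representative chunk of the image would show the CPU side of the trade-off.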
What the ec2-bundle-image pipe does
cat /opt/ec2/lib/ec2/amitools/bundle.rb
# Bundle the AMI procedure:
# The image file is tarred - to maintain sparseness, gzipped
# for compression and then encrypted with AES in CBC mode for
# confidentiality.
# To minimize disk I/O the file is read from disk once and
# piped via several processes. The tee is used to allow a
# digest of the file to be calculated without having to re-read it
# from disk.
pipeline.concat([
  ['tar', "#{openssl} sha1 < #{digest_pipe} & " + tar.expand],
  ['tee', "tee #{digest_pipe}"],
  ['gzip', 'gzip'],
  ['encrypt', "#{openssl} enc -e -aes-128-cbc -K #{key} -iv #{iv} > #{bundled_file_path}"]
])
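The same idea can be tried by hand in a shell, which helps when you want to test each stage in isolation. The following is a minimal sketch, not the code the EC2 tools run: the function name bundle_sketch, its argument order and the ".sha1" output file are assumptions for illustration, the key and IV are expected as hex strings, and the -S flag assumes GNU tar (gtar on OpenSolaris).

```shell
# Sketch of the bundle pipeline: tar -> tee (digest through a FIFO) -> gzip -> openssl enc.
bundle_sketch() {
    src_dir=$1; image=$2; out_file=$3; enc_key=$4; enc_iv=$5
    fifo=$(mktemp -u "${TMPDIR:-/tmp}/digest-pipe.XXXXXX")
    mkfifo "$fifo" || return 1
    # Background reader: SHA-1 of the uncompressed tar stream, fed by tee below.
    openssl sha1 < "$fifo" > "$out_file.sha1" &
    digest_pid=$!
    # The image is read from disk once; tee copies the stream into the FIFO.
    tar -c -h -S -C "$src_dir" -f - "$image" \
        | tee "$fifo" \
        | gzip \
        | openssl enc -e -aes-128-cbc -K "$enc_key" -iv "$enc_iv" > "$out_file"
    wait "$digest_pid"
    rm -f "$fifo"
}
```

Called as "bundle_sketch /mnt my.img /mnt/my.img.enc KEYHEX IVHEX", it writes the encrypted bundle and a digest file next to it; removing one stage at a time shows which one is the bottleneck.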
1. You can inspect the Ruby bundle pipe with Solaris commands
1.1 Get the rebundle process PID
# ps -ef | grep "ec2-bundle-image -c"
root 6205 2146 0 14:49:52 pts/1 0:00 /bin/bash /opt/ec2/bin/ec2-bundle-image -c
1.2 See the command tree
# ptree 6205
6205 /bin/bash /opt/ec2/bin/ec2-bundle-image -c /mnt/keys/cert-GW---------------------CF
6206 ruby -I /opt/ec2/lib /opt/ec2/lib/ec2/amitools/bundleimage.rb -c /mnt/keys/cert
6210 /bin/bash -c /usr/sfw/bin/openssl sha1 < /tmp/ec2-bundle-image-digest-pipe & /u
6211 /usr/sfw/bin/openssl sha1
6212 /usr/sfw/bin/gtar -c -h -S -C /mnt Glassfish_2008.11_32_1.0.img
6213 tee /tmp/ec2-bundle-image-digest-pipe
6214 gzip
6215 /usr/sfw/bin/openssl enc -e -aes-128-cbc -K 48--keep-it-really-sercert--65
1.3 You can see the whole constructed pipe as the arguments of the process right below ruby:
# pargs 6210
2. You can try to monitor or even limit the bandwidth of the pipeline itself by using pv (Pipe Viewer)
Note: For this you need to patch the Amazon EC2 Ruby library; this is an unsupported hack, for debugging purposes only.
# pkg set-authority -O http://pkg.opensolaris.org/contrib contrib
# pkg install pv
# pv -V
pv 1.1.4 - Copyright(C) 2008 Andrew Wood <andrew.wood@ivarch.com>
# cp /opt/ec2/lib/ec2/amitools/bundle.rb /opt/ec2/lib/ec2/amitools/bundle.rb.org
# vim /opt/ec2/lib/ec2/amitools/bundle.rb
['tar', "#{openssl} sha1 < #{digest_pipe} & " + tar.expand],
['pv','pv -q -L 500k'],
['tee', "tee #{digest_pipe}"],
Hint: If you want to use pv just for progress monitoring, use:
['pv','pv -N rebundle -Wpteb -s 10485761024 -B 500000 -f'],
You will see:
Rebundle: 753MB 0:02:36 [==> ] 7% ETA 0:31:56
# start ec2-bundle-image command
# Get pv pid
ps -ef | grep pv
root 6794 6791 0 15:38:46 pts/3 0:00 pv -N rebundle -Wptebf -s 10485761024 -B 68157440
Now you can limit the bandwidth with pv -R PID -L bandwidth
# pv -R 6794 -L 100K
You can also play with the -B buffer size parameter.
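The -s value in the progress hint is hard-coded, so the percentage and ETA are only right for an image of that size. A small helper can print the pv stage with the size of your own image filled in. This is a sketch: pv_stage_for is a hypothetical name, the -B option is left at pv's default, and the size used is the apparent file size, so for a sparse image the real tar stream may be smaller and the ETA pessimistic.

```shell
# Hypothetical helper: print a pv pipeline stage for bundle.rb, sized for one image.
pv_stage_for() {
    image_path=$1
    # Field 5 of "ls -ln" is the file size in bytes; -L follows symlinks.
    image_size=$(ls -lnL "$image_path" | awk '{print $5}')
    printf "['pv','pv -N rebundle -Wpteb -s %s -f'],\n" "$image_size"
}
```

Run it against the image you are about to bundle and paste the printed line into bundle.rb between the tar and tee stages.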










