Wednesday Apr 23, 2008
This is the last (but not the least!) design pattern in our series of parallel programming design patterns: the ring pattern.
This pattern can be applied to problems that can be modeled as a ring of processes communicating with each other in a circular fashion. The requirement in such applications is that a set of data is to be repeatedly operated upon by a fixed set of operations. This pattern can be considered an extension of the pipeline pattern in which the output of the last process goes as input to the first process, so the data keeps rotating through the initial set of processes.
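To make the rotation concrete, here is a minimal sketch in plain Python. It is not MPI code and not the plugin's sample; the function name `ring_rotate` and the example operations are made up for illustration. Each position in the list stands for one process: it applies its own fixed operation to the block it currently holds, then hands the block to its right-hand neighbour, with the last process wrapping around to the first.

```python
def ring_rotate(blocks, operations, rounds):
    """Simulate a ring of len(operations) processes (hypothetical helper).

    held[i] is the data block currently sitting at process i.
    In each round every process applies its own operation, then passes
    its block to process (i + 1) % n.
    """
    n = len(operations)
    held = list(blocks)
    for _ in range(rounds):
        # Every process works on the block it holds right now.
        held = [operations[i](held[i]) for i in range(n)]
        # Process i receives from process i - 1; process 0 receives from the last one.
        held = [held[(i - 1) % n] for i in range(n)]
    return held


ops = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(ring_rotate([10, 20, 30], ops, rounds=3))  # [19, 38, 56]
```

After three rounds on three processes every block has visited every process once and is back where it started. In a real MPI program the hand-off would be a send to the next rank and a receive from the previous rank.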
Now that we have seen five parallel programming design patterns in previous posts, we should get our hands dirty with MPI code. Next time we will see an example MPI program for matrix-vector multiplication (for very large matrices).
Wednesday Apr 02, 2008
As promised, we will see the divide and conquer pattern implemented in the MPI Plugin for Netbeans.
This pattern is employed in solving many sequential problems where a problem can be split into a number of smaller problems which can be solved independently. The intermediate solutions are merged to get the final solution. The subproblems are generally independent of each other, and splitting continues until they cannot usefully be subdivided any further. Also, if program correctness is independent of whether the subproblems are solved sequentially or concurrently, a hybrid system can be designed that sometimes solves them sequentially and sometimes concurrently, based on which approach is likely to be more efficient. With this strategy, the subproblems can be solved directly, or they can in turn be solved using the same divide-and-conquer strategy, leading to an overall recursive program structure. In summary, a program following this pattern should have recursive process creation, a base-case solving mechanism, and a step that merges the results. The recursion depth and the base-case problem size may need to be tuned.
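The three ingredients (recursive splitting, a base case, and a merge step) plus the hybrid sequential/concurrent idea can be sketched as follows. This is an illustration in plain Python using threads, not MPI and not the plugin's sample; `dc_sort`, `depth` and `base_size` are hypothetical names chosen here. Python threads do not give CPU parallelism, so the sketch shows the program structure only.

```python
import threading


def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def dc_sort(data, depth=2, base_size=4):
    """Divide-and-conquer sort; base_size must be at least 1.

    While depth > 0 the left half is solved in a new thread (concurrent
    branch); below that the halves are solved one after the other.
    """
    if len(data) <= base_size:
        return sorted(data)  # base case: small enough to solve directly
    mid = len(data) // 2
    if depth > 0:
        box = {}
        worker = threading.Thread(
            target=lambda: box.update(left=dc_sort(data[:mid], depth - 1, base_size))
        )
        worker.start()
        right = dc_sort(data[mid:], depth - 1, base_size)
        worker.join()
        left = box["left"]
    else:
        left = dc_sort(data[:mid], 0, base_size)
        right = dc_sort(data[mid:], 0, base_size)
    return merge(left, right)


print(dc_sort([5, 3, 9, 1, 7, 2, 8, 6, 4, 0], depth=2, base_size=2))
```

Here `depth` and `base_size` are the two tuning knobs mentioned above: `depth` limits how many levels create new workers, and `base_size` decides when to stop splitting.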
The final pattern of our series will be the ring pattern. For more information, please see MPI Plugin design patterns.
Tuesday Apr 01, 2008
The pipeline pattern implemented in the MPI Plugin can be applied to problems that can be modeled as a set of data flowing through multiple sets of computations.
The computations are ordered and independent, and can also be seen as a series of time-step operations. In a sequential execution, the output of the first computation step serves as input to the second step, and so on for all the steps. Parallelism is introduced by overlapping the steps across successive time units. The first-step component starts operating as soon as input is available, and its output is passed to the second-step component. During the next time unit, the first-step component is free to accept more input, and it does so, making its output available to the second-step component in the next iteration. In that iteration the second-step component passes its output on to the third-step component and accepts the output just produced by the first-step component. This cycle continues until all input is exhausted and all sets of operations have been applied to the ordered data. Note that the computation steps must be roughly equal in size so that the time steps are of equal length; only then can substantial parallelism be achieved.
The next pattern we will see in this series is the Divide and Conquer pattern. Stay tuned!!
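The overlap described above can be sketched with one thread per stage and a queue between neighbouring stages. This is plain Python for illustration, not MPI and not the plugin's sample; `stage`, `run_pipeline` and `SENTINEL` are names invented here. While a later stage works on item k, an earlier stage is already free to work on item k + 1.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the input stream


def stage(func, inbox, outbox):
    """One pipeline step: take an item, apply func, pass the result on."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # tell the next stage that input is exhausted
            return
        outbox.put(func(item))


def run_pipeline(funcs, items):
    """Push items through the ordered steps in funcs and collect the outputs."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(SENTINEL)
    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


print(run_pipeline([lambda x: x + 1, lambda x: x * x, str], [1, 2, 3]))  # ['4', '9', '16']
```

Because each queue is first-in first-out and each stage is a single thread, the outputs come out in the same order as the inputs.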
Monday Mar 31, 2008
This is the second pattern implemented in the MPI Plugin, called the Master Worker pattern.
This pattern is used for the class of problems that need the same set of operations performed over multiple data sets. The operations are generally independent of each other and can be performed concurrently. Parallelism is achieved by dividing the computations among the available processes, with each worker process running the same operations. There is generally a master process (also called a managerial component) which is responsible for distributing the work among the worker processes and then collating the data as the computation completes. The distribution of data among the worker processes can generally be done in any order, but it is important to preserve the order of the processed data. The responsibility of each worker task is to perform the computation repeatedly on the multiple sets of data given to it by the master process. The decisive factors for choosing this pattern include (but are not limited to) load balancing, data integrity and data distribution.
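A small sketch of the idea, again in plain Python with threads rather than MPI, and with invented names (`master_worker`, `n_workers`): the master tags every data set with its index before handing it out, workers pick up whatever is next, and the master uses the index to put results back in the original order no matter which worker finished first.

```python
import queue
import threading


def master_worker(func, datasets, n_workers=3):
    """Apply func to every data set using n_workers workers; keep input order."""
    tasks = queue.Queue()
    results = queue.Queue()

    # Master: hand out the work, tagging each data set with its position.
    for index, data in enumerate(datasets):
        tasks.put((index, data))
    for _ in range(n_workers):
        tasks.put(None)  # one stop marker per worker

    def worker():
        while True:
            task = tasks.get()
            if task is None:
                return
            index, data = task
            results.put((index, func(data)))

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # Master: collate the results back into the original order.
    ordered = [None] * len(datasets)
    while not results.empty():
        index, value = results.get()
        ordered[index] = value
    return ordered


print(master_worker(sum, [[1, 2], [3, 4], [5]]))  # [3, 7, 5]
```

Pulling tasks from a shared queue also gives simple load balancing: a worker that finishes early just takes the next task.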
In the next post, we will have a look at the Pipeline pattern.
Thursday Mar 06, 2008
Recently we released the MPI Development environment for Netbeans IDE, and this series is a consolidated summary of the Parallel Programming Patterns implemented in the Plugin. The first pattern which we will see is the SPMD (Single Program Multiple Data) pattern.
This is a technique used to achieve data-level parallelism. It is one of the dominant styles of parallel programming: all processors run the same program, though each has its own data. The SPMD pattern exploits data parallelism in applications where a large mass of data of a uniform type needs the same instruction performed on it. The data is divided among processes, which operate on it independently. The example provided in the MPI Netbeans plugin shows the following:
- An array of elements is created on the main process, which is then distributed amongst the other processes.
- All processes independently process the data which is sent to them.
- If the main process wants, it can collect the data from the other processes for some final processing, etc.
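The three steps in the list can be sketched in a few lines. This is a sequential simulation in plain Python, not the plugin's C/C++ sample; `scatter`, `program` and `run_spmd` are hypothetical names, and squaring is just a stand-in for "the same instruction". In MPI the distribute and collect steps would typically be done with calls such as MPI_Scatter and MPI_Gather.

```python
def scatter(data, n_procs):
    """Split data into n_procs contiguous chunks of nearly equal size."""
    base, extra = divmod(len(data), n_procs)
    chunks, start = [], 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def program(rank, local_data):
    """The single program every process runs, each on its own chunk."""
    return [x * x for x in local_data]


def run_spmd(data, n_procs):
    chunks = scatter(data, n_procs)  # main process distributes the array
    partial = [program(rank, chunk) for rank, chunk in enumerate(chunks)]
    return [x for part in partial for x in part]  # main process collects


print(run_spmd(list(range(10)), 4))
```

The loop over ranks stands in for processes that would run at the same time; the point is that `program` is identical everywhere and only the data differs.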
For more details please refer to the MPI Plugin Download page and its Development guide. This is the link to the Parallel Programming Patterns documentation.
Tuesday Aug 14, 2007
Recently we released an MPI plugin for Netbeans.
The purpose of this plugin is to allow application developers to use the
Netbeans platform to develop, test, and debug MPI applications for the Sun
Grid Compute Utility. This plugin includes an early access version of
the new MPI Development Plugin for NetBeans(tm) IDE, which is targeted
at C/C++ developers who are working with MPI applications that can be
modeled as a set of independent, compute-bound tasks. The software is
published under the GNU General Public License.
The MPI Development Plugin for NetBeans(tm) IDE project offers the following in its current early access state:
- MPI programming model to simplify the design and development of C/C++ MPI applications.
- Netbeans IDE framework built in features enhanced to support the efficient execution of C/C++ MPI applications on the Sun Grid Compute Utility.
- MPI testing plug-in for the NetBeans IDE to ease local development and testing of C/C++ MPI applications.
- Pre built collection of Sample MPI applications for illustrating
effective use of Parallel Programming Patterns to build C/C++ MPI
applications for Sun Grid.
More in this series:
In the next posts, look out for Parallel Programming Patterns and related examples for this plugin, which we have developed.
Monday May 14, 2007
GNU Linear Programming Kit
The GLPK (GNU Linear Programming Kit) package is intended for solving large-scale linear programming (LP), mixed integer programming (MIP), and other related problems. It is a set of routines written in ANSI C and organized in the form of a callable library.
How to utilize GLPK on Sun Grid?
Detailed steps are available on https://gnu-glpk.dev.java.net/. As usual, for running the GNU Linear Programming Kit on Sun Grid, you would need an account on http://www.network.com.
Sample data and example files
Example data files are available on the developer page of GLPK.
Resources:
GNU Linear Programming Kit download:
http://www.gnu.org/software/glpk/
Running GNU GLPK on Sun Grid:
https://gnu-glpk.dev.java.net/
Sunday May 13, 2007
Calculix
CalculiX is a software package used to solve field problems using the finite element method. With CalculiX, finite element models can be built, calculated and post-processed. The pre- and post-processor is an interactive 3D tool using the OpenGL API. The solver is able to do linear and non-linear calculations. Static, dynamic and thermal solutions are available.
Why Calculix on Sun Grid?
Calculix is an ideal choice for running on Sun Grid as it is a compute-intensive application. Making it available as a service would tremendously benefit scientists, mathematical solvers, etc. You only need to have your input files ready, without worrying about any other aspects of running Calculix.
How to run Calculix on Sun Grid?
Detailed steps are available on https://calculix.dev.java.net/. For running Calculix on Sun Grid, you would need an account on http://www.network.com. If you don't have an account on Sun Grid, you can request an account here. Now that Sun Grid is available in 24 countries, Calculix has become much more accessible to end users in these countries.
Sample data and example files
To get a head start in running Calculix, example data files are available on the developer page of Calculix.
Resources:
Calculix download:
http://www.dhondt.de/
Calculix home page:
http://www.calculix.de/
Running Calculix on Sun Grid:
https://calculix.dev.java.net
Friday May 11, 2007
The Sun Grid compute utility has added a bunch of new features, making it more flexible and powerful. The new release of Sun Grid has the following capabilities:
International access
Before this release Sun Grid was available only in the United States, but it's now available in 24 countries across the globe: United States, Australia, Austria, Belgium, Canada, China, Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, India, Ireland, Italy, Japan, New Zealand, Poland, Portugal, Singapore, Spain, Sweden, and the United Kingdom. This is a significant feature addition considering the legal implications.
Internet access (bidirectional)
Previously, a job running on Sun Grid was helpless if it needed to access information or any service outside network.com. Now that Internet access has been added to the feature list, an application running in the Sun Grid compute environment can access information and services in the outside world. While the use of this feature may not be immediately evident, it has opened up a host of opportunities for creating innovative applications. For example:
- A bioinformatics application can access any public database
- Results of your application can be delivered to a specific destination
- An application can be hosted in the form of a mashup to offer a myriad of services
- and the possibilities are endless...
Security is easy to achieve by pairing Internet access with a tunnel technology such as ssh.
Job submission API
Although in limited beta release, this feature provides an API for submitting and managing jobs programmatically. Network.com jobs can also be accessed via a command line interface. As an alternative to the web-based interface, the Job submission API gives the end user the much desired flexibility of a programmatic interface. This feature has to be requested from Sun Grid Customer Care (only possible if you have an account on Sun Grid Compute Utility).
Resources:
What's new on Sun Grid
http://www.sun.com/service/sungrid/whatsnew.jsp
Jim Parkinson's blog
http://blogs.sun.com/jpblog
Network.com news
http://network.com/news.html
Saturday Mar 17, 2007
Recently, while reading an eBook on Beowulf clusters, this struck me:
which is the more expensive affair, reading an eBook or owning a
printed book? Let us run through some quick calculations for a book of
300 pages.
Reading an eBook:
Assuming you have a standard PC (including a 17" monitor, cable modem,
etc.), it will typically draw about 330 watts (for more details refer to
"How much electricity do computers use?"). Now if you have a reading
speed of 100 words per minute (fairly average for reading
technical texts), it would take (1000 words per page
/ 100 words per minute) = 10 minutes to read a page of 1000 words.
If the book has 300 pages, then you will take
300 pages per book * 10 minutes per page = 3000 minutes per
book, or 50 hrs, to complete the book.
So the cost of reading the book once would be (50
hrs * 330 watts / 1000) * 3.40 Rs per kWh =
about 56 Rs, and the
cost repeats for each subsequent read. The costs we have neglected (as
we don't have to pay those from our pockets) are the costs of the
servers (electricity, data center maintenance, etc.) which host the
book, the electricity costs of intermediate routers, proxy servers, etc.
Adding those would increase the cost manyfold, and my guess is
that they would make the original cost negligible.
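The arithmetic above can be checked, and replayed with your own numbers, with a few lines of Python. The function name `reading_cost_rs` is made up here; the inputs are exactly the assumptions from the text (300 pages, 1000 words per page, 100 words per minute, 330 watts, 3.40 Rs per kWh).

```python
def reading_cost_rs(pages, words_per_page, words_per_minute, watts, rs_per_kwh):
    """Return (hours of reading, energy in kWh, electricity cost in Rs)."""
    minutes = pages * words_per_page / words_per_minute
    hours = minutes / 60
    kwh = hours * watts / 1000  # watts * hours = watt-hours; / 1000 gives kWh
    return hours, kwh, kwh * rs_per_kwh


hours, kwh, cost = reading_cost_rs(300, 1000, 100, 330, 3.40)
print(f"{hours:.0f} h, {kwh:.1f} kWh, Rs {cost:.2f}")  # 50 h, 16.5 kWh, Rs 56.10
```

Halving the power draw or doubling the reading speed halves the cost per read.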
Owning a physical book
A good technical book of 300 pages would cost anywhere between
Rs 300 to 500. But
this would afford us multiple readings at no extra cost. Some more
facts which come to my mind :
- Often eBooks are not legal copies, whereas we are assured of the
originality of hard copies of books.
- Often one hard copy of a book is read by multiple readers, at
no extra cost. Online books can also be shared easily, but each
reading of an online book brings recurring electricity
costs.
- I find reading online books a big strain on the eyes. I haven't ever
successfully completed an online book !!
- An advantage of online books is more flexibility in the organization
of contents (we can separate out important contents and take prints if
necessary). We have no such flexibility with hard copies of books.
- We can read books in bed but not eBooks :)
Wednesday Mar 14, 2007
Are you an open source developer who has created a cool application and wants to give it enhanced visibility? Are you a research scientist lacking the infrastructure and service-provider know-how to run complex applications? Welcome to Sun Grid! Additional features were announced for Network.com, adding muscle to the already cool pay-per-use utility offering, which enables end users to tap into high performance computing (HPC), enterprise applications and infrastructure for complex computations as a service.
From creating your own application for Sun Grid to publishing the application for other end users (making some bounty in the process if you choose to do so), Sun Grid also enables you to instantly access popular ISV and open source applications on a pay-per-use basis. You can choose an already existing application in the Catalog, or you can create and publish your own application. (A how-to is available here.)
Tuesday Mar 13, 2007
What is ClustalW?
ClustalW is a general purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. There are three main steps for achieving alignment: pairwise alignment, guide-tree generation and progressive alignment.
Why ClustalW?
ClustalW-mpi is an ideal choice for running on Sun Grid as it has inherent support for MPI. In the MPI version of ClustalW, both the pairwise and progressive alignment stages are parallelized.
How to run ClustalW on Sun Grid?
Detailed steps are available on https://clustal-w.dev.java.net/. To run ClustalW on Sun Grid, you would need an account on http://www.network.com. If you don't have an account on Sun Grid, you can request an account here.
Sample data and example files
To get a head start in running ClustalW, example data files are available on the developer page of ClustalW.
Links:
ClustalW download: ftp://ftp.ebi.ac.uk/pub/software/unix/clustalw/
ClustalW MPI home page: http://packages.debian.org/unstable/science/clustalw-mpi
Running ClustalW on Sun Grid: https://clustal-w.dev.java.net
Monday Dec 04, 2006
Well, that is slightly old news, but what better news to start a blog :). For readers who are unaware, have a look here. Sun has released its key Java implementations - Java Platform Standard Edition (Java SE), Java Platform Micro Edition (Java ME), and Java Platform Enterprise Edition (Java EE) - under the GNU General Public License version 2 (GPLv2), the same license as GNU/Linux. Read more here.
More about the Grid Engine @ FOSS.in in my next post !! Stay tuned !!