Ted H. Kim's Weblog

Musings of a Random Dude


Tuesday, September 14, 2004

What can you do with InfiniBand?

Okay, so it's time to post again on the InfiniBand topic. You might be wondering what InfiniBand (IB) can really be used for. In theory, InfiniBand could be used for a lot of things, including I/O and clustering. But it is far more interesting to look at the cases where folks have actually gone ahead and developed Upper Level Protocols (ULPs) to use it.

For general networking, IP is king. So naturally, folks have implemented the Internet Protocol on InfiniBand, or IPonIB. Once you have done that, your whole IP-based stack (including TCP and whatever is above that) can ride on top. So for compatibility with existing IP infrastructure, this item is a must-have. But IPonIB does not really use IB to its full advantage for performance. For one thing, it does not use RDMA, and oftentimes deficiencies in the network stack prevent the full bandwidth potential from being used.
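To make the compatibility point concrete, here is a minimal sketch of an ordinary TCP round trip in Python. Nothing in it knows about InfiniBand. The assumption is that with IPonIB configured, the only change would be the address you bind and connect to (the one assigned to the IB interface), so loopback stands in for it here. The names echo_once and round_trip are made up for this illustration.

```python
import socket
import threading


def echo_once(server_sock):
    """Accept one connection and echo back whatever arrives until EOF."""
    conn, _ = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def round_trip(host, payload):
    """Send payload to a one-shot echo server bound on host; return the reply."""
    with socket.create_server((host, 0)) as server:  # port 0: let the OS pick
        port = server.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        with socket.create_connection((host, port)) as client:
            client.sendall(payload)
            client.shutdown(socket.SHUT_WR)  # signal EOF so the server finishes
            chunks = []
            while True:
                chunk = client.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
        worker.join()
    return b"".join(chunks)


if __name__ == "__main__":
    # Loopback stands in for the address assigned to an IPonIB interface.
    print(round_trip("127.0.0.1", b"hello over IP"))
```

The application code is the same whichever link carries the packets, which is exactly why IPonIB is attractive for compatibility and also why it gains nothing from RDMA.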

For reliable, connection-oriented services, most people think in terms of the socket interface on top of TCP/IP. InfiniBand has an equivalent called the "Sockets Direct Protocol" or SDP. SDP is described in Annex A4 of Volume 1, version 1.1, of the specification available from the InfiniBand Trade Association. SDP is not 100% compatible with all socket options, but it does support the most common ones. The benefit of SDP is that it can exploit the reliability of IB's hardware-implemented reliable connected transport type, so that software overhead for reliability is avoided. Further, SDP can use RDMA, though the socket interface must be extended to include such features as asynchronous operation. An effort to standardize these extensions is ongoing in the Socket API Extensions Work Group of the Interconnect Software Consortium (ICSC).

Then there are various existing protocols which have been mapped onto IB and retooled to use RDMA to their advantage. For file access there is NFS over RDMA and the "Direct Access File System" or DAFS. My understanding is that NFS and DAFS are merging into a common protocol being developed in the IETF NFS v4 working group.

For block storage, there is the SCSI RDMA Protocol or SRP and its successor SRP-2. Since much of block storage is actually Fibre Channel, there is also an InfiniBand mapping for the Fibre Channel HBA API or FC-HBA to support storage management. Unfortunately, I have heard there are some hitches with SRP-2 going forward. Another obvious thing to do for block storage is to map iSCSI to IB, especially the RDMA version called iSCSI Extensions for RDMA or iSER being standardized in the IP Storage Working Group. The main problem is that a mapping of iSCSI or iSER onto InfiniBand is not standardized anywhere. Also, there are some issues which need to be resolved with how you map the iSNS or SLP name services to IB. You can use companion IPonIB-based services or do a more direct mapping to IB identifiers, but something has to be done given the basic assumption of IP-based naming for iSCSI.

There are always non-standard, proprietary uses of IB as well, but these are by their very nature vendor specific. Still, you might expect some IB-native I/O device drivers to use this approach.

Another way to use InfiniBand is in an OS Bypass mode. In this type of architecture, the resources of an InfiniBand adapter are mapped directly into an application process. The application can talk directly to the hardware without any OS overhead. Obviously, you still have to involve the OS in the setup process, or you lose all your protections and policy. But once things are set up, this arrangement allows for the fastest possible access to the hardware. There are two APIs designed to enable this: uDAPL from the DAT Collaborative and ITAPI from the ICSC Interconnect Transport API Working Group. An OS Bypass arrangement is attractive to HPTC people who want to implement the Message Passing Interface or MPI on IB.
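The split between a setup path that goes through the OS and a data path that does not can be pictured with a toy model. This is only a sketch in Python: ToyKernel and ToyQueuePair are names invented for this illustration, not the uDAPL or ITAPI interfaces, and the "hardware" is just a method call. The one thing it tries to show is that the kernel call count stays at one no matter how many sends are posted.

```python
from collections import deque


class ToyKernel:
    """Stands in for the OS: only setup goes through here (hypothetical)."""

    def __init__(self):
        self.calls = 0  # how many times the application entered the "OS"

    def create_queue_pair(self, owner, buffer_size):
        """Setup path: apply policy, then hand back adapter resources."""
        self.calls += 1
        if buffer_size <= 0:
            raise ValueError("buffer_size must be positive")
        return ToyQueuePair(owner, bytearray(buffer_size))


class ToyQueuePair:
    """Adapter resources mapped into one process (toy model)."""

    def __init__(self, owner, registered):
        self.owner = owner
        self.registered = registered  # memory the adapter may touch
        self.send_queue = deque()
        self.completions = deque()

    def post_send(self, offset, length):
        """Data path: queue a work request without entering the kernel."""
        if offset < 0 or length < 0 or offset + length > len(self.registered):
            raise ValueError("request falls outside registered memory")
        self.send_queue.append((offset, length))

    def process(self, wire):
        """Pretend to be the hardware: drain the send queue onto wire."""
        while self.send_queue:
            offset, length = self.send_queue.popleft()
            wire.append(bytes(self.registered[offset:offset + length]))
            self.completions.append(length)

    def poll_completion(self):
        """Data path: check for finished work, again with no kernel call."""
        return self.completions.popleft() if self.completions else None


if __name__ == "__main__":
    kernel = ToyKernel()
    qp = kernel.create_queue_pair("app", 64)  # the only trip through the OS
    qp.registered[0:5] = b"hello"
    wire = []
    for _ in range(3):
        qp.post_send(0, 5)  # direct to the (pretend) adapter
    qp.process(wire)
    print(kernel.calls, wire, qp.poll_completion())
```

The bounds check in post_send is a stand-in for the protection that setup establishes: the application can only point the adapter at memory that was handed over during setup.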

Both uDAPL and ITAPI essentially provide a one-to-one mapping to IB objects in an OS Bypass mode. The drawback is that this programming model is a specialized one, and many applications might not want to spend the effort to port to it. Therefore, there is a tradeoff between squeezing every bit of performance from IB and the learning curve and porting cost. In some cases, it's better to choose the SDP/Extended Sockets approach to have a model closer to sockets. Except for the connection setup, these APIs do not really dictate a ULP per se. So it's certainly possible that any or all of the previously mentioned ULPs could be done in an OS Bypass manner. It's just a matter of what makes sense in the overall system.

(2004-09-14 13:17:56.0) Permalink
