My previous post outlined the idea only briefly; I will try to flesh it out a
bit more in this post.
The main areas of focus are:
- What are these 'named lists'? How do you use them? What are their
required properties?
- When would it be a good idea to use this proposal? When not?
- What can be optimised this way?
- Some examples of how to implement it, goddamnit!
Hopefully this post will address my thoughts in the same order in which
I list them above.
Named lists
Named lists are a construct loosely based on
XEP 33.
That is, the idea came to me through that spec, but they don't really
share much with it, except that I believe XEP 33 could logically
include this as an extension.
In this proposal, the main properties of named lists
are:
- Each named list has a 'sender jid' and a list of 'n' recipient
jids.
- Named lists can be removed at any point of time by the hosting
server, so remote servers cannot make assumptions about a list's
existence.
- For an xml stanza received for a named list, the hosting server
generates 'n' stanzas, with 'from' replaced by the 'sender jid' and 'to'
being the ith recipient in the list, and then processes
each of these stanzas as though they were received individually from
the remote server.
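The third property, the fan-out on the hosting server, can be sketched roughly as below. This is only an illustration using Python's standard `xml.etree.ElementTree`; the function name `expand_for_list` is made up for this sketch and is not part of the proposal.

```python
import copy
import xml.etree.ElementTree as ET


def expand_for_list(inner_xml, sender_jid, recipients):
    """Return one addressed copy of the stanza per recipient in the list."""
    template = ET.fromstring(inner_xml)
    stanzas = []
    for jid in recipients:
        stanza = copy.deepcopy(template)
        # Overwrite any addressing: the list's sender jid and the i-th recipient.
        stanza.set("from", sender_jid)
        stanza.set("to", jid)
        stanzas.append(stanza)
    return stanzas


# Hypothetical two-member list, reusing the jids from this post.
copies = expand_for_list(
    "<presence type='probe'/>",
    "userA00@domainA/resource",
    ["userB00@domainB", "userB01@domainB"],
)
for stanza in copies:
    print(ET.tostring(stanza, encoding="unicode"))
```

Each generated element would then be handed to the normal inbound routing path, exactly as if it had arrived over the S2S stream on its own.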
Now, given these, we have the following cases:
- serverA sends a stanza to named list 'routing_list' on serverB and the
list exists: the stanzas are delivered. Ideal case, maximum savings!
- serverA sends a stanza to named list 'routing_list' on serverB and the
list does not exist: an error is returned. serverA either creates a new
list or falls back on the more traditional approach of stanza delivery
(suppose it finds that serverB is a bit too aggressive in list cleanup
and decides against using lists, etc).
In both of these cases, serverA should receive some sort of
acknowledgement that the stanza was either received or an error was
encountered.
In XMPP, the request-response pattern is modelled using 'iq' stanzas, so
my current thought is to use that for this purpose.
The general structure will be:
- serverA sending a stanza to serverB.
<iq from='domainA' to='named_list@domainB' type='set' id='someId'>
<A single xml stanza to be delivered to the list without from
and to attributes. />
</iq>
- Failure at serverB to deliver this stanza.
If this fails for whatever reason, we will have:
<iq from='named_list@domainB' to='domainA' type='error' id='someId'
/>
The reason is not important; what matters is that it failed and that
the sending server should retry.
It could be because the list was removed, or there was some other error.
Bottom line: it can't be delivered, so retry.
Typically, this will trigger serverA to either create a new list and
try against that list name, or stop using lists altogether and fall back on
'traditional' methods (implementation detail).
Note: Here, I have removed the constraint on error responses that I
placed yesterday, namely that the response MUST contain the actual
stanza to be delivered within it.
This means that, until the iq response comes back from serverB, serverA
will need to hold on to that stanza for retransmission purposes.
The reason I removed this MUST requirement is that we can't enforce
constraints on the contained xml stanza returned in the error response,
nor can we validate it in case the remote
server is misbehaving for whatever reason.
This will also reduce the S2S traffic payload.
- Successful delivery at serverB.
<iq from='named_list@domainB' to='domainA' type='result' id='someId'
/>
This indicates that the delivery was successful.
Note that it is delivery to the list that was successful; that is, the
named list existed when the stanza was received and serverB could process
the stanza for that list.
There might be errors while processing the actual generated stanzas
later on; we are not concerned about those now.
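Since the error response no longer carries the original stanza, serverA has to keep it around until the iq response arrives. A minimal sketch of that bookkeeping follows, assuming responses are matched purely on the iq 'id'. The class and method names (`PendingListSends`, `wrap`, `on_response`) are invented for illustration.

```python
import xml.etree.ElementTree as ET


class PendingListSends:
    """Holds stanzas sent to a named list until the iq response arrives."""

    def __init__(self):
        self._pending = {}  # iq id -> inner stanza XML kept for retransmission

    def wrap(self, iq_id, list_jid, own_domain, inner_xml):
        """Build the iq addressed to the list and remember the inner stanza."""
        iq = ET.Element("iq", {"from": own_domain, "to": list_jid,
                               "type": "set", "id": iq_id})
        iq.append(ET.fromstring(inner_xml))
        self._pending[iq_id] = inner_xml
        return ET.tostring(iq, encoding="unicode")

    def on_response(self, response_xml):
        """Return None on success, or the held stanza if it must be retried."""
        response = ET.fromstring(response_xml)
        inner_xml = self._pending.pop(response.get("id"), None)
        if response.get("type") == "result":
            return None
        return inner_xml


tracker = PendingListSends()
tracker.wrap("someId2", "named_list@domainB", "domainA", "<presence/>")
tracker.wrap("someId4", "named_list@domainB", "domainA",
             "<presence><show>away</show></presence>")

ok = tracker.on_response(
    "<iq from='named_list@domainB' to='domainA' type='result' id='someId2'/>")
retry = tracker.on_response(
    "<iq from='named_list@domainB' to='domainA' type='error' id='someId4'/>")
```

Here `ok` is `None` because the first delivery was acknowledged, while `retry` holds the stanza that still needs to be delivered some other way.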
Now that we have established how serverA uses a named list on serverB,
let us look at how it can create one.
As stated initially, list creation is a simple enough stanza.
<iq from='domainA' to='domainB' type='set' id='someId'>
<create_list xmlns='sun:im:namedlist'
sender='userA00@domainA'>
<j>userB00@domainB</j>
<j>userB01@domainB</j>
<j>userB02@domainB</j>
<j>userB03@domainB</j>
<j>userB04@domainB</j>
<j>userB05@domainB</j>
<j>userB06@domainB</j>
<j>userB07@domainB</j>
<j>userB08@domainB</j>
<j>userB09@domainB</j>
</create_list>
</iq>
That is, we specify the sender JID and the list of jids that form the
list.
The response will either be a success (<iq from='domainB'
to='domainA' type='result' id='someId'
name='named_list@domainB'/> ) or an error (<iq from='domainB'
to='domainA' type='error' id='someId' /> ). On error, don't retry but
fall back on 'traditional methods' for delivery, like the methods currently
defined by XEP 33 or directed delivery.
Note:
- The name of the list is assigned by serverB, not serverA, and serverA
MUST NOT associate any meaning with it other than as a jid
(that is, no attempts to encode/decode info from the node/resource, etc:
both are opaque to serverA).
- serverB CANNOT control the participants of a list: it MUST
either create a list with all jids specified by serverA or return an error
(invalid jid, access denied to a jid, other policy constraints,
etc).
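The two notes above (an opaque, server-assigned name and all-or-nothing creation) could be handled on serverB along these lines. This is a sketch only: `handle_create_list`, the `is_allowed` policy callback and the use of a random UUID for the node part are assumptions made for the example, not something the proposal prescribes.

```python
import uuid
import xml.etree.ElementTree as ET

NS = "sun:im:namedlist"


def handle_create_list(iq_xml, local_domain, is_allowed, lists):
    """Create the list with every requested jid, or reject the whole request."""
    iq = ET.fromstring(iq_xml)
    request = iq.find(f"{{{NS}}}create_list")
    reply = ET.Element("iq", {"from": local_domain, "to": iq.get("from"),
                              "id": iq.get("id")})
    jids = []
    if request is not None:
        jids = [(j.text or "").strip() for j in request.findall(f"{{{NS}}}j")]
    if not jids or not all(is_allowed(jid) for jid in jids):
        reply.set("type", "error")  # all-or-nothing: never create a partial list
        return reply
    # The name is chosen by serverB and carries no meaning for serverA.
    name = f"{uuid.uuid4().hex}@{local_domain}"
    lists[name] = (request.get("sender"), jids)
    reply.set("type", "result")
    reply.set("name", name)
    return reply


lists = {}
request = (
    "<iq from='domainA' to='domainB' type='set' id='someId1'>"
    "<create_list xmlns='sun:im:namedlist' sender='userA00@domainA'>"
    "<j>userB00@domainB</j><j>userB01@domainB</j>"
    "</create_list></iq>"
)
accepted = handle_create_list(
    request, "domainB", lambda jid: jid.endswith("@domainB"), lists)
rejected = handle_create_list(
    request, "domainB", lambda jid: jid != "userB01@domainB", lists)
```

The second call is rejected as a whole because one jid fails the (hypothetical) policy check, so no list with a reduced membership is ever created.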
In case the endpoint for whom serverA was maintaining this routing info
on serverB does not need it anymore, then serverA could request serverB
to remove this list.
<iq from='domainA' to='domainB' type='set' id='someId'>
<remove_list xmlns='sun:im:namedlist'
sender='userA00@domainA' />
</iq>
The response to this stanza is not really relevant to serverA: it will
be an error if the list was already removed, or a result if removal
succeeds. In either case, there is nothing serverA can do or must
attempt to do: both responses essentially mean that the named list is
no longer present on serverB.
It is a MUST requirement that serverB periodically remove 'old' lists
after some internal timeout, so even if serverA 'forgets' to remove a
list, serverB MUST do its cleanup.
Note that, even if serverA does not request an explicit list removal,
serverB is free to
kick a named list out at any point of time without notifying serverA
(as part of its cleanup).
It is expected that lists are removed only after a reasonable
timeout, but this is still purely at the discretion of the list hosting
server.
Similarly, serverB could refuse creation of a list without giving any reason,
so serverA MUST have an alternate mechanism to deliver stanzas (the current,
traditional approach).
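One way serverB might implement the cleanup is a registry with an idle timeout, sketched below. The proposal only says 'old' lists are removed after some internal timeout; treating 'old' as 'not used for a while', and refreshing the timer on every use, is an assumption of this sketch, as are the names `ListRegistry`, `lookup` and `purge`.

```python
import time


class ListRegistry:
    """Named lists with an idle timeout; removal never notifies the creator."""

    def __init__(self, idle_timeout_seconds):
        self._timeout = idle_timeout_seconds
        self._lists = {}  # name -> [sender_jid, recipients, last_used]

    def add(self, name, sender_jid, recipients, now=None):
        now = time.monotonic() if now is None else now
        self._lists[name] = [sender_jid, list(recipients), now]

    def lookup(self, name, now=None):
        """Return (sender_jid, recipients), or None if the list is gone."""
        now = time.monotonic() if now is None else now
        self.purge(now)
        entry = self._lists.get(name)
        if entry is None:
            return None  # the caller answers with an iq of type 'error'
        entry[2] = now  # using a list keeps it alive
        return entry[0], entry[1]

    def purge(self, now=None):
        """Silently drop every list that has been idle for too long."""
        now = time.monotonic() if now is None else now
        expired = [name for name, entry in self._lists.items()
                   if now - entry[2] >= self._timeout]
        for name in expired:
            del self._lists[name]
        return expired


# Explicit 'now' values make the example deterministic.
registry = ListRegistry(idle_timeout_seconds=300)
registry.add("named_list@domainB", "userA00@domainA", ["userB00@domainB"], now=0)
fresh = registry.lookup("named_list@domainB", now=100)
stale = registry.lookup("named_list@domainB", now=1000)
```

The second lookup returns `None`, which is all serverA ever learns about the removal: its next send to the list simply comes back as an error.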
- How do you advertise this?
As of now, my thought is to advertise this as a stream feature.
The way I look at it, this is a basic enhancement to stream routing,
so servers will exhibit this as a stream feature, and named lists MUST be
enabled only if this stream feature is successfully enabled.
Of course, it need not be enabled in both directions: serverB might
expose and allow it (so serverA can use it) but not vice versa.
When to use?
A few things are obvious:
- Use this approach when the number of recipients on serverB is above
some minimum (an implementation detail of serverA, but obviously more
than 1).
- Use it when you expect to use the list frequently enough, or
at least enough times to justify the cost of list creation.
- Presence broadcasts at the start of a session would be a good use case:
you have at least two stanzas to be sent, one directed
presence and a probe, so the cost is 'recovered'.
- Lists can, and MUST, go out of scope. List hosting
implementations MUST NOT depend on list creators to remove a list
explicitly, and removals MUST NOT be notified to the list creator.
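The first two points can be turned into a back-of-the-envelope check on serverA. The cost model here is a crude assumption made for illustration: creating a list is taken to cost about one unit per jid, each send to the list one unit, and each traditional broadcast one unit per recipient. A real implementation would tune this.

```python
def should_use_list(recipients, expected_sends, min_recipients=2):
    """Rough check of whether a named list pays for itself (assumed cost model)."""
    if recipients < min_recipients:
        return False
    direct_cost = recipients * expected_sends  # one stanza per recipient per send
    list_cost = recipients + expected_sends    # create_list, then one iq per send
    return list_cost < direct_cost


# Session start: a presence broadcast plus a probe to 10 contacts on serverB.
session_start = should_use_list(recipients=10, expected_sends=2)
# A single one-off broadcast does not recover the creation cost in this model.
one_off = should_use_list(recipients=10, expected_sends=1)
# A single recipient never justifies a list.
single = should_use_list(recipients=1, expected_sends=50)
```

Under these assumptions the session-start case comes out in favour of a list, which matches the intuition that two stanzas are enough to 'recover' the cost.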
What can be optimised?
A rough list would be:
- Presence information: both broadcasts and probes.
- Multicasting messages: in xmpp, this would typically mean MUC.
- All other use cases mentioned in XEP 33 which can recur.
The MUC use case can become tricky and is implementation dependent, but
the basic idea is that the number of messages sent should be higher
than the number of list (re)creations (when users on a remote server join or leave).
It also requires tighter coupling between the server and the
MUC component.
An example.
Let us consider the same example as yesterday, but this time we look
at the packets too!
When userA00 comes online, serverA does not have a named list
associated with userA00's contacts on serverB who should receive his
presence updates.
Hence, serverA creates that first.
serverA:>
<iq from='domainA' to='domainB' type='set' id='someId1'>
<create_list xmlns='sun:im:namedlist'
sender='userA00@domainA/resource'>
<j>userB00@domainB</j>
<j>userB01@domainB</j>
<j>userB02@domainB</j>
<j>userB03@domainB</j>
<j>userB04@domainB</j>
<j>userB05@domainB</j>
<j>userB06@domainB</j>
<j>userB07@domainB</j>
<j>userB08@domainB</j>
<j>userB09@domainB</j>
</create_list>
</iq>
serverB:>
<iq from='domainB' to='domainA' type='result' id='someId1'
name='named_list@domainB' />
Now serverA will send the directed presence and probe to serverB.
serverA:>
<iq from='domainA' to='named_list@domainB' type='set'
id='someId2'>
<presence />
</iq>
<iq from='domainA' to='named_list@domainB' type='set'
id='someId3'>
<presence type='probe'/>
</iq>
serverB:>
<iq from='named_list@domainB' to='domainA' type='result'
id='someId2'
/>
<iq from='named_list@domainB' to='domainA' type='result'
id='someId3'
/>
Here I am assuming that the users who are subscribed to
userA00's presence and the users to whom userA00 has subscribed are the same,
which was a constraint in our scenario.
In case they are not the same (for example, when privacy rules are
applied), you will end up creating two lists.
For each of the stanzas dispatched to the list, serverB ends up
creating these stanzas and processes them as though serverA had
sent them directly.
<presence from='userA00@domainA/resource' to='userB00@domainB'/>
<presence from='userA00@domainA/resource' to='userB01@domainB'/>
<presence from='userA00@domainA/resource' to='userB02@domainB'/>
<presence from='userA00@domainA/resource' to='userB03@domainB'/>
<presence from='userA00@domainA/resource' to='userB04@domainB'/>
<presence from='userA00@domainA/resource' to='userB05@domainB'/>
<presence from='userA00@domainA/resource' to='userB06@domainB'/>
<presence from='userA00@domainA/resource' to='userB07@domainB'/>
<presence from='userA00@domainA/resource' to='userB08@domainB'/>
<presence from='userA00@domainA/resource' to='userB09@domainB'/>
Similarly for the probe.
serverB now responds to serverA for the probe requests as though
each had been sent individually by serverA.
Let us consider a subsequent presence push, by which time serverB has
already removed the list.
serverA:>
<iq from='domainA' to='named_list@domainB' type='set'
id='someId4'>
<presence xml:lang='en'>
<show>away</show>
<status>be right back</status>
</presence>
</iq>
(Note again: no from or to!)
serverB:>
<iq from='named_list@domainB' to='domainA' type='error' id='someId4'
/>
serverA can now either fall back on the current approach, sending out the
stanza individually to the recipients (serverA always knows who the
recipients, i.e. the participants in the list, are!),
or it can recreate the list as above and retry.
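The two options can also be combined into one policy: try the list, recreate it once on error, and fall back to individual delivery if that fails too. The sketch below assumes synchronous, hypothetical callbacks (`send_to_list` returns True on an iq 'result', `create_list` returns the new list name or None on error); a real server would do this asynchronously.

```python
def deliver(stanza, recipients, list_name, send_to_list, create_list, send_direct):
    """Try the named list, recreate it once on error, then fall back."""
    if list_name is not None and send_to_list(list_name, stanza):
        return list_name
    new_name = create_list(recipients)  # None if serverB refuses to create it
    if new_name is not None and send_to_list(new_name, stanza):
        return new_name
    # Traditional approach: serverA always knows the recipients itself.
    for jid in recipients:
        send_direct(jid, stanza)
    return None


# Fake serverB for illustration: only lists in live_lists accept stanzas.
live_lists = set()
sent_direct = []


def fake_create(recipients):
    live_lists.add("new_list@domainB")
    return "new_list@domainB"


recreated = deliver(
    "<presence/>", ["userB00@domainB"], "named_list@domainB",
    lambda name, stanza: name in live_lists,
    fake_create,
    lambda jid, stanza: sent_direct.append(jid),
)

fell_back = deliver(
    "<presence/>", ["userB00@domainB", "userB01@domainB"], "gone@domainB",
    lambda name, stanza: False,   # every send to a list fails
    lambda recipients: None,      # and serverB refuses to create a new one
    lambda jid, stanza: sent_direct.append(jid),
)
```

The return value is the list name serverA should remember for the next send, or `None` once it has given up on lists for this set of recipients.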
Hope this clarifies the proposal a bit more ....
Updates:
- The careful reader will notice that the way I am encapsulating a stanza to be sent to a list can result in a schema violation. To solve it? Have a wrapper element 'x' in a custom namespace 'ns': this element just gets discarded and is present only to be conformant with the schema. The mashup above is illustrative, not normative or formal.

- I do mention it in this post, but let me put it explicitly here: if the presence-out and presence-in lists are different (privacy policy, asymmetrical rosters, etc), you just end up creating different lists, and if the overhead is deemed high, just don't create a list! There is nothing forcing a server to use this approach in all cases! It should be noted, though, that these are slightly towards the corner use cases, so the benefit to a server hosting a large number of users using it in a 'normal' way will be high enough.