arnaudq's blog

Friday Mar 13, 2009

URI encoding in DAV:href element

The DAV:href XML element, defined in the WebDAV specification, is used in many request/response payloads and DAV properties. It is important to note that the element value is a URI as defined in RFC 3986.

As such, this value must be percent-encoded if it contains characters outside the allowed range. When clients and servers misinterpret this requirement, things can get messy. Here is an illustration:

If a calendar client knows about the principal collection of a user, it can retrieve the CalDAV calendar-home-set property of the user:

PROPFIND /dav/principals/johnd/ HTTP/1.1
host: dav.example.com
content-type: text/xml
content-length: xxx
depth: 0

<?xml version="1.0" encoding="utf-8"?>
<D:propfind xmlns:C="urn:ietf:params:xml:ns:caldav" xmlns:D="DAV:">
 <D:prop>
  <C:calendar-home-set/>
 </D:prop>
</D:propfind>

In this case, the calendar home was identified internally as '/dav/home/John.Doe@example.com/'. Our server was sending the following response:

HTTP/1.1 207 Multi-Status
Date: xxx
Content-Type: application/xml; charset="utf-8"
Content-Length: xxxx

<?xml version="1.0" encoding="UTF-8"?>
<D:multistatus xmlns:D="DAV:" xmlns:C="urn:ietf:params:xml:ns:caldav">
  <D:response>
    <D:href>/dav/principals/johnd/</D:href>
    <D:propstat>
      <D:prop>
        <C:calendar-home-set><D:href>/dav/home/John.Doe%40example.com/</D:href></C:calendar-home-set>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>

(Once percent-encoded, the '@' character becomes '%40'.)

Using the returned calendar-home-set, the client would then list the calendars under this calendar home by issuing a Depth: 1 PROPFIND request. But it would re-encode the returned URI, resulting in:

PROPFIND /dav/home/John.Doe%2540example.com/ HTTP/1.1

The client encoded '%' as '%25', so the server interpreted the request as targeting a collection identified internally as /dav/home/John.Doe%40example.com/, which of course did not exist.
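The round trip can be reproduced in a few lines of Python. This is only a sketch using the standard urllib.parse module; the variable names are made up for the illustration, and quote(..., safe="/") is an assumption standing in for whatever encoder the server and client actually used.

```python
from urllib.parse import quote, unquote

# Internal identifier of the calendar home, as stored by the server.
internal_path = "/dav/home/John.Doe@example.com/"

# Server side: percent-encode everything except '/' (so '@' becomes '%40').
href = quote(internal_path, safe="/")

# Buggy client: treats the href as raw text and encodes it a second time,
# which turns the '%' of '%40' into '%25'.
double_encoded = quote(href, safe="/")

# Server side: the request target is decoded exactly once.
resolved = unquote(double_encoded)

print(href)            # /dav/home/John.Doe%40example.com/
print(double_encoded)  # /dav/home/John.Doe%2540example.com/
print(resolved)        # /dav/home/John.Doe%40example.com/
```

After one decoding pass the server is left with a literal '%40' in the name, which is not the same string as the internal path.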

As it turned out, the '@' character is a reserved character, but it is one of the reserved characters allowed unencoded in the path component of a URI (RFC 3986 permits ':' and '@', along with the sub-delimiters, in path segments). So our server probably should not have encoded it in the first place.

But given that the URI was already encoded, the client should have either sent it as is (i.e. without decoding or re-encoding it):
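For what it is worth, here is a minimal sketch of a path encoder that leaves '@' alone. It follows the pchar rule of RFC 3986 (unreserved, sub-delims, ':' and '@' may appear literally in a path segment). The encode_path helper and the constant names are hypothetical, not something our server actually ships.

```python
from urllib.parse import quote

# RFC 3986: pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
SUB_DELIMS = "!$&'()*+,;="
PCHAR_SAFE = SUB_DELIMS + ":@"

def encode_path(path):
    """Percent-encode each segment of a raw path, keeping pchar characters literal."""
    # Split first so that '/' keeps its role as the segment separator.
    return "/".join(quote(segment, safe=PCHAR_SAFE) for segment in path.split("/"))

print(encode_path("/dav/home/John.Doe@example.com/"))  # /dav/home/John.Doe@example.com/
print(encode_path("/dav/home/John Doe/"))              # /dav/home/John%20Doe/
```

Encoding '@' as '%40' is still legal, so clients have to cope with both forms either way.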

PROPFIND /dav/home/John.Doe%40example.com/ HTTP/1.1

or with the '@' character decoded:

PROPFIND /dav/home/John.Doe@example.com/ HTTP/1.1
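Both request lines identify the same collection, which a quick check can confirm. The sketch below decodes each path segment exactly once before comparing; path_key is a hypothetical helper, and decoding per segment (rather than the whole path at once) is my own choice so that an encoded '/' (%2F) is not confused with a segment separator.

```python
from urllib.parse import unquote

def path_key(path):
    """Return a comparison key: each path segment decoded exactly once."""
    return tuple(unquote(segment) for segment in path.split("/"))

encoded = "/dav/home/John.Doe%40example.com/"
literal = "/dav/home/John.Doe@example.com/"
double_encoded = "/dav/home/John.Doe%2540example.com/"

print(path_key(encoded) == path_key(literal))         # True
print(path_key(encoded) == path_key(double_encoded))  # False
```

The double-encoded form decodes to a different name, which is why the server could not find the collection.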

It looks like there are other cases of clients and servers misinterpreting those URIs.

Comments:

Clients (just like servers) need to decode the URL components to perform proper matching in their caches.

There is not just 'one' consistent encoding for HTTP URLs, anything _can_ be percent escaped. For example some servers encode non-ASCII chars, some don't.

Sample:
/user/John%20Doe/
vs
/user/%4Aohn%20Doe/

Represent the same thing per spec (unless I'm utterly wrong).

Obviously the client must be careful not to encode things twice ;-)

Posted by Helge on March 16, 2009 at 05:05 PM CET #
