Collected thoughts and musings George's Sun Blog

Wednesday Jul 29, 2009

One of the projects I've been working on recently is adding metadata support to the Apache Avro project, and I wanted to post an update on that progress.

Avro is a multi-language serialization/deserialization library and RPC middleware framework that is a Hadoop subproject.  Avro will eventually replace Hadoop's internal RPC protocols and on-disk data structure representations.  A key advantage of using Avro will be that clients written in different languages can interact with Hadoop, and that the schemas for protocols and on-disk data structures can evolve over time while older versions remain supported.

The specification for Avro now includes metadata support (https://issues.apache.org/jira/browse/AVRO-67).  This metadata takes the form of a map from Utf8 string keys to raw byte values.  In Java, this looks like:

Map<Utf8,ByteBuffer> meta;
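
To make the shape of that map concrete, here is a minimal sketch of storing and reading back one entry.  It assumes plain String keys as a stand-in for Avro's Utf8 so that it runs without the Avro jar; the class name, the helper methods, and the "trace-id" key are made up for illustration and are not part of the Avro API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Illustration only: String keys stand in for Avro's Utf8 keys.
class MetaMapSketch {
    // Encode a string value as bytes for storage in the metadata map.
    static ByteBuffer encode(String value) {
        return ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8));
    }

    // Decode from a duplicate so the stored buffer's position is untouched
    // and the same entry can be read more than once.
    static String decode(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> meta = new HashMap<>();
        meta.put("trace-id", encode("abc123")); // hypothetical key and value
        System.out.println(decode(meta.get("trace-id")));
    }
}
```

Because the values are just bytes, each consumer of the map has to agree on how a given key's value is encoded; the sketch uses UTF-8 text only because it is easy to print.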

One metadata map is sent with the RPC connection handshake, and another is sent with each individual RPC call.  The per-call map is cleared before each call is sent.  To set, query, or manipulate the metadata, there is going to be a "plugin" architecture: you write a plugin, and it gets access to this map.  Note that all plugins share a single map per request or handshake.
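
Since the real API is still being worked out, the following is only a rough sketch of the idea described above: a per-call map that is cleared before each call and then handed to every registered plugin in turn.  The MetaPlugin interface, the PluginChainSketch class, and all method names are hypothetical and are not the interface under discussion in AVRO-76; String keys again stand in for Utf8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plugin hook; NOT the actual Avro plugin interface.
interface MetaPlugin {
    void beforeCall(Map<String, ByteBuffer> meta);
}

class PluginChainSketch {
    private final List<MetaPlugin> plugins = new ArrayList<>();
    // One map shared by every plugin for a given call.
    private final Map<String, ByteBuffer> callMeta = new HashMap<>();

    void register(MetaPlugin plugin) {
        plugins.add(plugin);
    }

    // Clear the per-call map, then let each plugin read or modify it in
    // registration order. Later plugins see what earlier ones wrote.
    Map<String, ByteBuffer> prepareCall() {
        callMeta.clear();
        for (MetaPlugin plugin : plugins) {
            plugin.beforeCall(callMeta);
        }
        return callMeta;
    }

    static ByteBuffer bytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        PluginChainSketch chain = new PluginChainSketch();
        // A tracing-style plugin adds an id (made-up key and value).
        chain.register(m -> m.put("trace-id", bytes("abc123")));
        // An accounting-style plugin can observe the shared map.
        chain.register(m -> m.put("saw-trace", bytes(String.valueOf(m.containsKey("trace-id")))));

        Map<String, ByteBuffer> sent = chain.prepareCall();
        System.out.println("entries sent with call: " + sent.size());
    }
}
```

One consequence of the shared map is that plugins can overwrite each other's entries, so in practice they would probably want to agree on distinct key names.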

What can metadata be used for?

  • Authentication credentials and setup
  • Authorization of clients
  • Encryption
  • Compression
  • Tracing (e.g., X-Trace)
  • Accounting

Currently, I'm working on the implementation of this API (see https://issues.apache.org/jira/browse/AVRO-76).  You can see the slow but steady evolution of the API at that link.

What kind of functionality do you need from a metadata layer in RPC?  Interested in implementing any of the uses listed above?  I'd love to hear from you.
