The Sun BabelFish Blog
Don't panic!
Duck Typing done right
Dynamic languages such as Python, Ruby and Groovy make a big deal of their flexibility. You can add new methods to classes, extend them, and so on at run time, and do all kinds of funky stuff. You can even treat an object as being of a certain type by looking at its methods. This is called Duck Typing: "If it quacks like a duck and swims like a duck, then it's a duck", goes the well-known saying. The main criticism of Duck Typing has been that what is gained in flexibility is lost in precision: it may be good for small projects, but it does not scale. I want to show here both that the criticism is correct, and how to overcome it.
Let us look at Duck Typing a little more closely. If something is a bird that quacks like a duck and swims like a duck, then why not indeed treat it like a duck? Well, one reason that immediately comes to mind is that in nature there are always weird exceptions. It may be difficult to see the survival advantage of looking like a duck, as opposed to, say, looking like a lion, but one should never be surprised at the surprising nature of nature.
Anyway, that's not the type of problem people working with duck typing ever have. How come? Well it's simple: they usually limit the interactions of their objects to a certain context, where the objects being dealt with are such that if any one of them quacks like a duck, then it is a duck. And so here we in essence have the reason for the criticism: In order for duck typing to work, one has to limit the context, one has to limit the objects manipulated by the program, in such a way that the duck typing falls out right. Enlarge the context, and at some point you will find objects that don't fit the presuppositions of your code. So: for simple semantic reasons, those programs won't scale. The more the code is mixed and meshed with other code, the more likely it is that an exception will turn up. The context in which the duck typing works is a hidden assumption, usually held in the head of the small group of developers working on the code.
A slightly different way of coming to the same conclusion is to realize that these programming languages don't really analyze the sound of quacking ducks. Nor do they look at objects and try to classify the way these are swimming. What they do is look at the names of the methods attached to an object, and then do a simple string comparison. If an object has the swim method, they will assume that swim stands for the same type of thing that ducks do. Now of course it is well established that natural language is ambiguous and hence very context dependent. The method names gain their meaning from their association with English words, which are ambiguous. There may for example be a method named swim, where those letters stand for the acronym "See What I Mean". That method may return a link to some page on the web that describes the subject of the method in more detail, and have no relation to water activities. Calling that method in the expectation of some duck-like swimming will lead to unexpected results.
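To make the point concrete, here is a minimal Python sketch. The classes Duck and HelpTopic, the function take_a_dip and the example.org link are all invented for illustration; the only thing the caller checks is that an attribute named swim exists.

```python
class Duck:
    def swim(self):
        # What the caller expects: a water activity.
        return "paddling across the pond"


class HelpTopic:
    """Hypothetical class where swim abbreviates "See What I Mean"."""

    def swim(self):
        # Returns a link to a page about the topic; nothing to do with water.
        return "http://example.org/topics/42"


def take_a_dip(thing):
    # Duck typing: the only test is whether a method with this name exists.
    if hasattr(thing, "swim"):
        return thing.swim()
    raise TypeError("object cannot swim")


# Both calls succeed, but only the first result means what the caller assumed.
results = [take_a_dip(Duck()), take_a_dip(HelpTopic())]
print(results)
```

No error is raised for the second object: the name matched, so the mismatch in meaning only shows up later, wherever the returned value gets used.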
But once more, this is not a problem duck typing programs usually have. Programmers developing in those languages will be careful to limit the execution of the program so that it only deals with objects where swim stands for the things ducks do. But it does not take much for that presupposition to fail. Extend the context somewhat by loading some foreign code, and at some point these presuppositions will break down and nasty, difficult-to-locate bugs will surface. Once again, the criticism that duck typing is not scalable is perfectly valid.
So what is the solution? Well it requires one very simple step: one has to use identifiers that are context free. If you can use identifiers for swimming that are universal, then they will always mean the same thing, and so the problem of ambiguity will never surface. Universal identifiers? Oh yes, we have those: they are called URIs.
Here is an example. Let us

- name the class of ducks:

    <http://a.com/Duck> a owl:Class;
        rdfs:subClassOf <http://a.com/Bird>;
        rdfs:comment "The class of ducks, those living things that waddle around in ponds" .

- name the relation <http://a.com/swimming>, which relates a thing to the time it is swimming:

    <http://a.com/swimming> a owl:DatatypeProperty;
        rdfs:domain <http://a.com/Animal>;
        rdfs:range xsd:dateTime .

- name the relation <http://a.com/quacking>, which relates a thing to the time it is quacking (like a duck):

    <http://a.com/quacking> a owl:DatatypeProperty;
        rdfs:domain <http://a.com/Duck>;
        rdfs:range xsd:dateTime .

- state that a duck is an animal:

    <http://a.com/Duck> rdfs:subClassOf <http://a.com/Animal> .
If you now come across the statement

    :d1 <http://a.com/quacking> "2007-05-25T16:43:02"^^xsd:dateTime .

then you know that :d1 is a duck (or that the relation is false, but that is another matter), and this will be true whatever the context you find the relation in. You know this because the URL http://a.com/quacking always refers to the same relation, and that relation was defined as linking ducks to times.

Furthermore, notice how you may conclude many more things from the above statement. Perhaps you have an ontology of animals written in OWL that states that ducks are among those animals that always have two parents. Given that, you would be able to conclude that :d1 has two parents, even if you don't know which they are. Animals are physical beings, you may discover by clicking on the http://a.com/Animal URL, and in particular physical things that always have a location. It would therefore be quite correct to query for the location of :d1.

You can get to know a lot of things with just one simple statement. In fact, with the semantic web, what that single statement tells you gets richer and richer the more you know. The wider the context of your knowledge, the more you know when someone tells you something, since you can use inferencing to deduce all the things you have not been told. The more things you know, the easier it is to make inferences (see Metcalfe's law).
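The first inference above, that :d1 is a duck and therefore also a bird and an animal, can be sketched in a few lines of plain Python. This is only an illustration of two RDFS rules (rdfs:domain and rdfs:subClassOf) over a set of tuples, not a real reasoner; the function name infer and the use of the string ":d1" in place of a full URI are assumptions made for the example. The OWL inference about two parents is not covered.

```python
A = "http://a.com/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
RDFS_DOMAIN = RDFS + "domain"
RDFS_SUBCLASS = RDFS + "subClassOf"

triples = {
    (A + "quacking", RDFS_DOMAIN, A + "Duck"),
    (A + "Duck", RDFS_SUBCLASS, A + "Bird"),
    (A + "Duck", RDFS_SUBCLASS, A + "Animal"),
    # The single statement we were told about :d1.
    (":d1", A + "quacking", "2007-05-25T16:43:02"),
}


def infer(facts):
    """Apply the domain and subclass rules until nothing new is derived."""
    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            # Domain rule: if p has domain C and (s p o) holds, then s is a C.
            for p2, rel, c in facts:
                if rel == RDFS_DOMAIN and p2 == p:
                    new.add((s, RDF_TYPE, c))
            # Subclass rule: if s is a C and C is a subclass of D, then s is a D.
            if p == RDF_TYPE:
                for c, rel, d in facts:
                    if rel == RDFS_SUBCLASS and c == o:
                        new.add((s, RDF_TYPE, d))
        if new <= facts:
            return facts
        facts |= new


types_of_d1 = sorted(
    o for s, p, o in infer(triples) if s == ":d1" and p == RDF_TYPE
)
print(types_of_d1)
```

Nothing in the input says what :d1 is; its three types are derived purely from the definition attached to the http://a.com/quacking URI.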
In conclusion, duck typing is done right on the semantic web. You don't have to know everything about something to work with what you have, and the more you know the more you can do with the information given to you. You can have duck typing and scale.
Posted at 01:54AM May 26, 2007 by Henry Story in Java | Comments[20]
Posted by Kevin on May 26, 2007 at 11:53 AM CEST #
Posted by Perl Defender on May 26, 2007 at 03:46 PM CEST #
Posted by 67.176.49.253 on May 26, 2007 at 05:30 PM CEST #
"So what is the solution? Well it requires one very simple step: one has to use identifiers that are context free. If you can use identifiers for swimming that are universal, then they will always mean the same thing, and so the problem of ambiguity will never surface."
I don't agree with this - just because the identifier is a URI doesn't mean that people will use it to mean the 'same thing'. Meaning isn't discrete - it's a continuum that varies with context of communication, and I don't think having identifiers with low risk of collision changes this.
Posted by Phil Dawes on May 26, 2007 at 05:30 PM CEST #
You state the need as being when you: "Extend the context somewhat by loading some foreign code". All foreign code would need to be understood and appropriate adapter methods used if signatures didn't match.
Say I have a class method that takes an object and calls that object's talk() method, expecting a string, and I have used this extensively in my application. If I were to re-use a body of code where that information lives elsewhere in the objects, then I would have to add a new talk() method to those objects that returned the appropriate data. The original duck-typing method still would not have to explicitly check the type of the object passed to it.
The almost canonical use of duck typing in Python is the acceptance of file-like objects in the standard library. This allows actual file objects to be replaced by instances of StringIO, which lets a string look like a file by mimicking many of the file class's methods.
- Paddy.
Posted by paddy3118 on May 26, 2007 at 07:27 PM CEST #
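The file-like object pattern described in the comment above can be sketched as follows. This is a minimal illustration in current Python, where the class lives at io.StringIO (at the time of the comment it was in the StringIO module); the function count_lines is a made-up example, not a standard library call.

```python
import io


def count_lines(stream):
    # No isinstance check: anything that can be iterated line by line will do.
    return sum(1 for _ in stream)


# An in-memory string standing in for a file opened from disk.
fake_file = io.StringIO("quack\nquack\nwaddle\n")
line_count = count_lines(fake_file)
print(line_count)
```

The same function would accept the object returned by open(), since both support iteration over lines.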
Posted by Ricky Clarkson on May 26, 2007 at 08:42 PM CEST #
Posted by Jon Olson on May 26, 2007 at 08:58 PM CEST #
Posted by Eric Biesterfeld on May 26, 2007 at 11:11 PM CEST #
Posted by David Avraamides on May 27, 2007 at 12:25 AM CEST #
Posted by Henry Story on May 27, 2007 at 12:53 AM CEST #
You've essentially reinvented Lisp packages. From my reddit comment:
See, in Common Lisp a package owns its symbols: foo:bar is a different symbol from baz:bar. Furthermore, a package can use symbols from another package: quux might use foo, making quux:bar the same symbol as foo:bar. But since foo:bar/quux:bar on the one hand and baz:bar on the other are different symbols, calling a function named with the one symbol won't ever mistakenly call a function named with the other.
This is essentially no different from naming all functions, classes and objects with URL contexts, save that foo:bar is somewhat more attractive than <http://www.foo.com/names/bar>. Packages aren't URLs, but they are unique identifiers (and since package names are themselves Lisp symbols, and symbols can be any string, one could use a URL as a package name if one wished: |http://www.foo.com/names/bar| is a valid Lisp package name, albeit an ugly one).
Posted by Bob Uhl on May 27, 2007 at 07:26 AM CEST #
Posted by Kiriai on May 27, 2007 at 08:12 AM CEST #
I don't see why URIs would be thought to be ugly though. They are well understood, and with namespaces become very readable. I usually write foaf:knows, rather than "http://xmlns.com/foaf/0.1/knows" . I used the full URIs without namespaces in the examples to emphasize the point.
URIs have the advantage of being well standardized, widely understood and language independent, and they have been very successful in creating the largest information space known to man: the web we know today.
Posted by Henry Story on May 27, 2007 at 08:13 AM CEST #
Posted by Henry Story on May 29, 2007 at 04:34 PM CEST #
Posted by 24.85.147.243 on May 31, 2007 at 06:53 AM CEST #
Posted by riffraff on June 20, 2007 at 07:26 PM CEST #
I wrote a more detailed critique of this that I would like mentioned in the comments. The critique is at http://paddy3118.blogspot.com/2008/05/duck-typing-done-right-is-wrong.html entitled "Duck Typing Done Right Is Wrong!"
- Paddy.
Posted by Paddy3118 on May 24, 2008 at 08:36 AM CEST #
But what dynamic languages do with duck typing (very successfully and in many large systems), 'other' languages do with interfaces.
Just as a programmer who violates the contract inherent in duck typing by passing in an object that quacks like a duck but isn't a duck, a programmer can also violate the interface contract by 'implementing' the interface incorrectly.
Posted by Michael Foord on May 25, 2008 at 02:17 PM CEST #
> Just as a programmer who violates the contract inherent in duck typing by passing in an object that quacks like a duck but isn't a duck, a programmer can also violate the interface contract by 'implementing' the interface incorrectly.
With interfaces the responsibility is on the interface implementer to follow the contract of the interface. So if something breaks it should be quite clear who was wrong: the interface designer for being unclear in his specification, or the implementor with his broken class.
Now if the methods added in duck typing had a global namespace, then the problem would be much reduced, because the global namespace would make it very clear which 'quack' was meant. Was it
- info.animals.duck.quack
- gov.us.nasa.bomb.quack ( Quick Attack )
and that would much reduce the danger of calling the wrong method on an object.
This is exactly what the semantic web gives us. It allows us to name things with Uniform Resource Identifiers, thereby making it both easy to access information about the thing named and easy to clearly distinguish the things named.
Posted by Henry Story on May 25, 2008 at 07:35 PM CEST #
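The namespacing idea in the comment above can be sketched in Python. The two qualified names come from the comment; the Thing class, its supports and call methods, and make_it_quack are hypothetical helpers invented for this illustration, not an existing library or language feature.

```python
# Fully qualified names taken from the two examples in the comment.
DUCK_QUACK = "info.animals.duck.quack"
BOMB_QUACK = "gov.us.nasa.bomb.quack"


class Thing:
    """Object whose behaviours are keyed by globally unique names."""

    def __init__(self, behaviours):
        self._behaviours = dict(behaviours)

    def supports(self, name):
        return name in self._behaviours

    def call(self, name):
        return self._behaviours[name]()


duck = Thing({DUCK_QUACK: lambda: "Quack!"})
bomb = Thing({BOMB_QUACK: lambda: "launching quick attack"})


def make_it_quack(thing):
    # Still duck typing, but the check is made on an unambiguous name.
    if not thing.supports(DUCK_QUACK):
        raise TypeError("not something that quacks like a duck")
    return thing.call(DUCK_QUACK)


sound = make_it_quack(duck)
bomb_accepted = bomb.supports(DUCK_QUACK)
print(sound, bomb_accepted)
```

With a bare method name quack, both objects would have passed the check; with qualified names, the second one is rejected before anything is called.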
Hi Henry,
I can see how the semantic web could help us pin down any foreign IP we bring to a project. Knowing we have gov.us.nasa.bomb.quack/revision/6.3.1 might help us to easily convey any problems we have with external IP back to the vendor, and aid us in keeping track of what constitutes our system, but this has very little to do with duck typing, or its scalability. You have to know what that external IP _does_, whether you use duck typing or not.
P.S. What happens in the semantic web if companies get bought out or amalgamated? Often products are renamed, or subsumed into larger packages.
- Paddy.
Posted by Paddy3118 on May 26, 2008 at 10:56 AM CEST #