Thursday October 05, 2006
[Disclaimer: This is my summary of Jim's keynote from the last JCM. It is not a transcript; any errors are probably mine! Thanks to Jerome Bernard for videotaping Jim's presentation! The presentations and videos from the JCM can be found here]
Jim is thinking about things that cause a quantum jump in complexity. You are OK, and then something changes in a big way.
The basic jumps in complexity occur at these points (I add abbreviations here to refer to them later on):
Sequential (SEQ) – life is good, life is easy.
Multi-threaded (MT) – takes retooling and a competent programmer to think about MT
Multi-Process (MP) – For everyone other than kernel developers, this came before MT
Multiple Machines (MM) on the same network. Not the same as multi-process, but some people think it is
Multiple Untrusted Machines (MMU) – Essentially the web
All of the above cause discontinuities in the programming model. As you move through each stage, you lose something:
SEQ -> MT – you lose ordering (multiple things can happen at once). This is hard, as we naturally think sequentially.
MT -> MP – you lose the single context (i.e. a shared context that we can rely on). Global state is used all the time in development (think anything static).
MP -> MM – you lose state. The global state of your “system” is a fiction. There is no consistent state in an interesting distributed system (Jim references Lamport's work on this). Distributed OS projects attempt to introduce global state – they have largely failed.
MM -> MMU – you lose trust. You are in the difficult position of not knowing who you can trust.
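The first loss in the list above (SEQ -> MT, losing ordering) can be illustrated with a small Java sketch. This is my own example, not something from Jim's talk; the class and method names (OrderingDemo, runWorkers, sequenceOf) are made up for illustration. Two threads append to a shared log: each thread's own entries stay in order, but how the two interleave is not guaranteed and may differ between runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class OrderingDemo {

    // Starts two workers that append "A0..A(n-1)" and "B0..B(n-1)" to a shared log,
    // waits for both to finish, and returns the log in the order entries arrived.
    public static List<String> runWorkers(int n) {
        ConcurrentLinkedQueue<String> log = new ConcurrentLinkedQueue<>();
        Thread a = new Thread(() -> { for (int i = 0; i < n; i++) log.add("A" + i); });
        Thread b = new Thread(() -> { for (int i = 0; i < n; i++) log.add("B" + i); });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return new ArrayList<>(log);
    }

    // Extracts, in log order, the sequence numbers written by one worker ('A' or 'B').
    public static List<Integer> sequenceOf(List<String> log, char worker) {
        List<Integer> out = new ArrayList<>();
        for (String entry : log) {
            if (entry.charAt(0) == worker) {
                out.add(Integer.parseInt(entry.substring(1)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The interleaving of A and B entries may differ from run to run.
        System.out.println(runWorkers(5));
    }
}
```

Each worker is still sequential when looked at alone; what is lost is any single global order you can reason about across the two.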
But you also gain some things as you move through the discontinuities (otherwise why would you do it?)
SEQ -> MT – you gain parallelism
MT -> MP – you gain isolation (gives you safety)
MP -> MM – you gain independent failure (parts of your system can survive if things fail)
MM -> MMU – you gain scale (web scale, Internet scale). Use someone else's resources (or allow someone to use ours).
The Platform
The platform – allows us to do the work. The model the developer sees:
SEQ – A batch OS is fine.
MT – language extensions are needed for correctness (to ensure the system won't re-order things underneath you)
MP – need communication mechanism between the processes.
MM – It is not clear what the platform is. Attempts include RPC (invented at PARC), CORBA, Jini, XML/SOAP. It is not clear we have figured this out. We know what it is not. Grids are trying to be the platform.
Most grids are an attempt to do batch on a large scale (scientific): scheduling jobs. You use the individual OS in any way you wish, so a grid is not a platform, but a way of aggregating platforms. How do you give a programming abstraction? Jini is a good attempt – but only the beginning. What is outside the grid vs. inside? Outside is untrusted; inside is trusted (mutual trust).
There are two discontinuities here (inside and outside), and we are conflating them. Trying to solve all the problems at once won't work. We need two solutions.
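The MT entry in the platform list above mentions language extensions that stop the system from re-ordering things underneath you. In Java, one such language-level feature is the volatile keyword. The sketch below is my own illustration (the class VolatileDemo and its methods are hypothetical names, not from the talk): a writer thread stores a value and then sets a volatile flag, and the reader is guaranteed by the Java memory model to see the value once it sees the flag.

```java
public class VolatileDemo {

    private int payload;             // plain field, written before the flag
    private volatile boolean ready;  // volatile write/read gives a happens-before edge

    // Writer side: store the data first, then raise the flag.
    public void publish(int value) {
        payload = value;  // ordinary write
        ready = true;     // volatile write; the write above cannot be moved after it
    }

    // Reader side: spin until the flag is visible, then read the data.
    public int awaitPayload() {
        while (!ready) {      // volatile read
            Thread.yield();   // be polite while spinning
        }
        return payload;       // sees the value written before ready was set
    }

    // Runs one writer thread against the calling thread as reader.
    public static int demo(int value) {
        VolatileDemo box = new VolatileDemo();
        Thread writer = new Thread(() -> box.publish(value));
        writer.start();
        int seen = box.awaitPayload();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo(42));
    }
}
```

Without volatile on the flag, the reader would have no such guarantee: it might spin forever or observe a stale payload. That is the kind of correctness the platform has to provide at the MT level.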
Jini 1.0 – built for the first discontinuity: MM, but it assumes full trust. Built on mobile code.
Ad-hoc organization – changes over time. When a service enters into the system, you could trust it.
Failure handled – but not a failure of trust.
Types – the way in which you identify things
Jini 2.0 – Multiple Untrusted Machines. Adds security. This is hard to do with mobile code. Adds a lot of complexity (e.g. proxy verification). Perhaps too hard.
Configuration – deployment control. Deployment errors cause a lot of failures in distributed systems. A lot of the complexity is making services reliable for all possible deployments. Maybe we could communicate the reliability needs in the language. Needs to be part of the platform.
Program vs. deploy – we are trained to keep them separate – but this probably makes life harder. A famous disclaimer: “We will put management in later” (just like “we will put security in later”).
Services – the things we assume are always there (but sometimes they might not be!). Example: persistence. A file system may not always be there in all environments. What would a good persistence service look like?
Containers – a container is a function converting one type to another type. It must be able to import things and give bindings, and it yields another container. Containers are type functions.
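One possible reading of "containers are type functions" can be sketched in Java. This is only my interpretation of the idea, not a design Jim presented, and the Container class with its of, bind, map and instantiate methods is entirely hypothetical. The sketch shows the three properties mentioned above: a container imports bindings, supplying a binding yields another container, and converting the result type yields another container.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public final class Container<T> {

    private final Map<String, Object> bindings;             // what has been imported so far
    private final Function<Map<String, Object>, T> body;    // builds a T from the bindings

    private Container(Map<String, Object> bindings, Function<Map<String, Object>, T> body) {
        this.bindings = bindings;
        this.body = body;
    }

    // Creates a container with no bindings yet.
    public static <T> Container<T> of(Function<Map<String, Object>, T> body) {
        return new Container<>(Collections.<String, Object>emptyMap(), body);
    }

    // Giving a binding yields another container; the original is left unchanged.
    public Container<T> bind(String name, Object value) {
        Map<String, Object> next = new HashMap<>(bindings);
        next.put(name, value);
        return new Container<>(Collections.unmodifiableMap(next), body);
    }

    // Converting the result type from T to U also yields another container.
    public <U> Container<U> map(Function<? super T, ? extends U> f) {
        return new Container<>(bindings, env -> f.apply(body.apply(env)));
    }

    // Produces the contained value using the current bindings.
    public T instantiate() {
        return body.apply(bindings);
    }
}
```

A usage sketch: Container.<String>of(env -> "hello " + env.get("name")).bind("name", "jini").map(String::length) is a Container<Integer>, built by composing containers, without the original container ever being modified.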
Virtual machines – the JVM is an example. The VM is the important thing – not necessarily the JVM. A universal binary allows us to move code and data. Jini is a Java VM thing. Other VMs might be interesting (e.g. ones that fix classloader problems?). The abstraction is the important thing.
Feeling of déjà vu for Jim. Back when all kinds of different O/Ss existed, there were lots of “Why mine is better than yours” debates. What arose was UNIX and the “other” (i.e. there was a lot of convergence about what an O/S looks like). The model of the platform emerged after lots of experimentation and discussion. Are we at the same stage as 30 years ago with O/S? A breakthrough may be close.