David Coldrick's Weblog

     
 
New Version of H2 Database Released
From the website,

The main features of H2 are:
  • Very fast, free for everybody, source code is included
  • Written [in] Java; can be compiled with GCJ (Linux)
  • Embedded, Server and Cluster modes
  • JDBC and (partial) ODBC API; Web Client application
Very quick, especially for embedded use. I believe it still does table-level locking, though: see the roadmap and FAQ.





@ 08:06 AM EST
 
 
 
 
Trackback URL: http://blogs.sun.com/coldrick/entry/new_version_of_h2_database
Comments:

Java DB, Sun's distribution of Apache Derby, can also be run in embedded mode.

As of JDK 6, it is being shipped with the JDK for developer use!

It has high-grade transaction capabilities and very good performance and security features.

More work is being done on performance, security and more as we speak.

Java DB 10.2.2.0 has recently been released, and you can find out more about Java DB here!

Posted by M. Mortazavi on January 06, 2007 at 11:23 AM EST #

You're quick, Masood! :-) Be interesting to see how the performance improvements go, given Thomas' claims that H2 is significantly faster. Size of download is dramatically different, as well. Good to have diversity!

Posted by David Coldrick on January 06, 2007 at 04:17 PM EST #

Derby Version 10.2.1.6 was used for the test. Derby is clearly the slowest embedded database in this test. This seems to be a structural problem, because all operations are really slow. It will not be easy for the developers of Derby to improve the performance to a reasonable level. http://www.h2database.com/html/frame.html?performance.html&main

Posted by 84.202.209.63 on January 06, 2007 at 09:07 PM EST #

no comment.

Posted by 201.22.134.233 on January 08, 2007 at 09:33 PM EST #

Apologies to anonymous at 84.202.209.63: your comment got spam-bucketed. Should be fixed now. Regards, David

Posted by David Coldrick on January 09, 2007 at 05:24 AM EST #

Performance may be an important factor for a database system, but it is all relative to the application's requirements and needs. If one RDBMS performs faster than another in a particular context, that does not mean the slower one is a slow database system per se. There are other cases where Derby performs better than H2 or HSQLDB. For instance, if your server or embedded application has concurrent connections accessing the database, Derby will very likely have less contention accessing rows, as it supports row-level locking (by default). One can always demonstrate that one database system is slower than another, depending on the type of performance tests (i.e. benchmarks) being defined and run. If you run TPC-B with Derby, it will perform a lot better than H2, for instance. Does that make Derby a faster database? Again, it all depends on the application and the context in which the database is being used.
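A toy Java sketch of the contention point above, assuming a hypothetical lock manager (LockManager, TableLockManager and RowLockManager are illustrative names, not Derby or H2 classes): transaction A holds row 1 while transaction B, on another thread, tries to update row 2. Real engines queue waiters, detect deadlocks and may escalate locks; here tryLock() simply reports whether B could proceed without waiting.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class LockGranularity {

    /** Hypothetical lock manager: one lock request per row update. */
    interface LockManager {
        boolean tryLockRow(int rowId);
        void unlockRow(int rowId);
    }

    /** Table-level locking: every row maps to the same lock. */
    static final class TableLockManager implements LockManager {
        private final ReentrantLock tableLock = new ReentrantLock();
        public boolean tryLockRow(int rowId) { return tableLock.tryLock(); }
        public void unlockRow(int rowId) { tableLock.unlock(); }
    }

    /** Row-level locking: each row gets its own lock, created on demand. */
    static final class RowLockManager implements LockManager {
        private final ConcurrentHashMap<Integer, ReentrantLock> locks = new ConcurrentHashMap<>();
        public boolean tryLockRow(int rowId) {
            return locks.computeIfAbsent(rowId, id -> new ReentrantLock()).tryLock();
        }
        public void unlockRow(int rowId) { locks.get(rowId).unlock(); }
    }

    /**
     * "Transaction A" (this thread) holds row 1 while "transaction B" (another
     * thread) tries to lock row 2. Returns true if B gets its lock without waiting.
     */
    static boolean secondWriterProceeds(LockManager mgr) {
        if (!mgr.tryLockRow(1)) throw new IllegalStateException("row 1 unexpectedly locked");
        boolean[] acquired = new boolean[1];
        Thread b = new Thread(() -> {
            acquired[0] = mgr.tryLockRow(2);
            if (acquired[0]) mgr.unlockRow(2);
        });
        b.start();
        try {
            b.join(); // join() also makes acquired[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            mgr.unlockRow(1);
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        System.out.println("table-level: B proceeds = " + secondWriterProceeds(new TableLockManager()));
        System.out.println("row-level:   B proceeds = " + secondWriterProceeds(new RowLockManager()));
    }
}
```

With table-level locking B is blocked even though it touches a different row; with row-level locking it goes ahead. How much that matters depends on how many connections write concurrently.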

Durability is an important aspect of a database system; I would even say it is fundamental. Derby is fully ACID-compliant, whereas H2 is not: H2 does *not* guarantee that all committed transactions will survive a power failure or an application crash. I don't think losing committed transactions is a viable option for a database system.

The downloadable size is not really a big problem these days. You can even compress derby.jar (~2MB) down to ~600k with Java 5+ Pack200, and both the Java Plug-in and Java Web Start support Pack200-compressed JARs downloaded to the client.

Finally, an open source product such as Apache Derby has a strong community of many developers, including some from IBM and Sun, contributing source code changes (i.e. fixes / enhancements) back to the product.

It is good to have diversity and it is also good to know what you're dealing with when it comes to database systems ;-)

Posted by Francois Orsini on January 09, 2007 at 05:41 AM EST #

Of course performance is relative. There are situations where a tractor is faster than a car. It would be good to have a test case where Derby is faster, tell me if you have one.

Currently H2 doesn't support row locks. But I don't think that most embedded applications need it. Anyway, the trend is towards multi version concurrency control (MVCC), and that's the next big thing that will be implemented in H2 (however this will take some time).

Durability: The default isolation level of Derby is read committed, right? As far as I understand it, full ACID compliance requires the serializable isolation level (see Isolation in Wikipedia). I'm not sure supporting 'full ACID' compliance by default would make sense. I have implemented and run a durability test with various databases and the file system (a simple power-off test using two computers), and things don't look good. The problem is that even if the database tries to flush to disk on each commit, the operating system and/or hard disk does not always do so. For details, see ACID. If you really want to enforce flushing to disk, you need to wait at least 0.1 seconds per transaction, and even Derby doesn't do that by default. That means even Derby does *not* guarantee that all committed transactions will survive a power failure or an application crash. If you have other results using common hardware / default settings and this test, or if you find a faster way, please tell me! Hopefully the next generation of hard drives (with integrated flash memory) will be better... But if you need 'no single point of failure', you need clustering / mirroring anyway. H2 supports clustering; Derby does not.
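The verification side of a power-off test like the one described can be sketched as follows. This is a hypothetical outline, not the test shipped with H2: it assumes the writer machine reports each committed transaction id (1, 2, 3, ... in commit order) to a second machine, and that after power is restored the surviving ids are read back from the database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DurabilityCheck {

    /**
     * Hypothetical check: the listener machine recorded the highest transaction id
     * the writer reported as committed. Every id up to that value that is missing
     * from the recovered database is a lost commit.
     */
    static List<Long> lostCommits(long lastAcknowledgedId, Set<Long> recoveredIds) {
        List<Long> lost = new ArrayList<>();
        for (long id = 1; id <= lastAcknowledgedId; id++) {
            if (!recoveredIds.contains(id)) {
                lost.add(id);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        // The writer acknowledged ids 1..5, but only 1..3 survived the power cut.
        Set<Long> recovered = new TreeSet<>(Arrays.asList(1L, 2L, 3L));
        System.out.println(lostCommits(5, recovered)); // prints [4, 5]
    }
}
```

An empty result for every run is what "durable" means in this test; any non-empty list is a committed transaction that did not survive.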

Download size: I think David was talking about size of product download (16 MB for Derby versus 3 MB for H2) not about the jar file size (2.2 MB for Derby versus 1 MB for H2). By the way, the H2 jar also contains the Console web application and web server, and other tools. And debugging info (line numbers) is switched on in H2, and switched off in Derby. But I agree the jar file size is not the most important factor.

Community: Yes, Derby has more developers (4, according to Ohloh; not sure if this is correct), but that doesn't necessarily mean a better product. Development of Derby started in 1996 or earlier, while H2 started in 2004 (it has now been online for one year). H2 is a very young product and currently doesn't have professional support from a bigger company. This will be available in the future when there is demand. You could also say Derby has a liability (a big, old, slow code base). Anyway, H2 also has quite a big community, given how young it is. Of course, Derby has the advantage of the Apache name ('branding'), but this doesn't mean it's better (there are many failed Apache projects).

But only time can tell which database is more successful.

Posted by Thomas Mueller on January 09, 2007 at 07:58 PM EST #

Ok trying again... the complete comment is here.

Posted by Thomas Mueller on January 09, 2007 at 11:54 PM EST #

How annoying! Thomas, I've unbucketed all your comments, and requested that you be removed from whatever !@#$$% blacklist our spam filter has you on. Apologies, David

Posted by David Coldrick on January 10, 2007 at 05:49 AM EST #

> Of course performance is relative. There are situations where a tractor is faster than a car.
> It would be good to have a test case where Derby is faster, tell me if you have one.
>


I gave a specific database server stress-test scenario when I mentioned TPC-B (this is just one particular case). A tractor does not compete in an F1 race, and that is where your analogy is flawed, because Derby actually performs more than decently in that context. As I said, embedded applications are one particular facet of today's applications, but they do not represent all of the applications out there. Everything is relative to the particular tests one defines and runs (yours, in this case), and that does not represent how a database performs in other contexts (embedded or not).
> Currently H2 doesn't support row locks. But I don't think that most embedded applications need it.
> Anyway, the trend is towards multi version concurrency control (MVCC), and that's the next big
> thing that will be implemented in H2 (however this will take some time).
>


Good to hear this. Row-level locking has been implemented in Derby since the first incarnation of Cloudscape in '96. MVCC is good, but not for applications doing intense updates and writes. The reason is pretty obvious compared with a lock-based concurrency scheme, and that is why "some" databases support both approaches.

> Durability: The default isolation level of Derby is read committed, right?
> As far as I understand it, for fully-ACID compliant the isolation level should
> be serialized (see Isolation in Wikipedia). I'm not sure if supporting 'full ACID'
> compliance by default would make sense. I have implemented and run a durability test
> with various databases and the file system (a simple power-off test using two computers),
> and things don't look good. The problem is, even if the database tries to flush to disk
> for each commit, the operating system and/or hard disk does not always do that.
> For details see ACID. If you really want to enforce flushing to disk, you need to wait
> at least 0.1 seconds per transaction, and even Derby doesn't do that by default.
> That means, even Derby does *not* guarantee that all committed transactions will survive
> a power failure or an application crash. If you have other results using common
> hardware / default settings and this test, or if you find a way that is faster,
> please tell me! Hopefully the next generation hard drives (with integrated flash memory)
> will be better... But if you need 'no single point of failure' then you anyway need
> clustering / mirroring. H2 support clustering, Derby does not.
>


The golden rule is that you should not rely on the file system for write operations unless you have some means to force-flush and check I/O completions. That is why Unix raw devices were made available almost 20 years ago: so that one could bypass the file system, use asynchronous I/O at the kernel level to retrieve the status (completion) of a particular I/O, and make sure it made it to disk. There are techniques such as write-through, where you don't rely on the file system buffer to handle write operations at all (as it is bypassed), but rather expect each write I/O to be written to disk every time you request it. It is a binary operation: either it works or it doesn't, and you get an I/O error if an I/O has not completed to disk. Relying on the file system and some UPS hardware is OK, _but_ that is NOT what you usually find in every embedded device or client desktop. You can't expect everyone to have a UPS to work around a database system that loses committed transactions and therefore does not handle ACID durability as it should and as is expected. I've worked at many database companies dealing with critical-level applications, and if I had told customers that they could lose committed transactions due to an application or system crash, I don't think those database companies would have been as successful as they have been. Some things, such as not losing committed transactions, have to be handled at the database level, and that is what durability is all about. Today, Derby will not lose transactions that have been committed, whether you have a UPS or not.
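In Java, one way to ask for write-through behaviour without raw devices is to open the file with StandardOpenOption.SYNC, which the JDK documents as requiring every update to the file's content or metadata to be written synchronously to the underlying storage device. A minimal sketch, with hypothetical class and method names; whether a drive with its own write cache enabled truly honours the request depends on the OS and hardware, which is exactly what is disputed in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WriteThroughLog {

    /**
     * Appends one record through a channel opened with SYNC, so write() is required
     * to reach the storage device before returning; a failed write surfaces as an
     * IOException instead of sitting silently in a buffer.
     */
    static void appendDurably(Path file, String record) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND, StandardOpenOption.SYNC)) {
            ByteBuffer buf = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }

    /** Writes the records to a temporary file, reads them back, and deletes the file. */
    static String roundTrip(String... records) {
        try {
            Path file = Files.createTempFile("write-through", ".log");
            try {
                for (String r : records) {
                    appendDurably(file, r);
                }
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(roundTrip("commit 1\n", "commit 2\n"));
    }
}
```

Opening the channel for each record keeps the sketch short; a real log would keep one channel open and write many records through it.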

> Download size: I think David was talking about size of product download (16 MB for
> Derby versus 3 MB for H2) not about the jar file size (2.2 MB for Derby versus 1 MB
> for H2). By the way, the H2 jar also contains the Console web application and web server,
> and other tools. And debugging info (line numbers) is switched on in H2, and switched off
> in Derby. But I agree the jar file size is not the most important factor.
>


Download size is irrelevant in today's world except for web applications, and in that case one does NOT have to download the whole product. For embedded applications, it is basically one JAR file, and whether it is H2 or Derby, the size is not really an issue (as I mentioned in an earlier comment).

> Community: Yes, Derby has more developers (4, according to Ohloh, not sure if
> this is correct), but that doesn't necessarily mean a better product.
> Development of Derby started in 1996 or earlier, while H2 started in 2004
> (it is now one year online). H2 is a very young product, and currently doesn't
> have professional support from a bigger company. This will be available in the
> future when there is demand. You could also say Derby has a liability (big, old,
> slow code base). Anyway, H2 also has quite a big community, given how young it is.
> But of course Derby has the advantage the Apache name ('branding'), but this doesn't
> mean it's better (there are many failed Apache projects).
>


Derby has 30+ contributors. What you saw in Ohloh are the top committers for 2007 (it's a new year, eh), which is why it shows 4. Last year 23 committers checked in code, so I'll let you do the math as to how many contributors there could be: not every contributor is a committer to the project. That's how Apache works, as do a lot of other open source projects. Derby has developers from Sun (Java DB) and IBM (Cloudscape), as well as other independent contributors and companies. Derby is _not_ big: the engine footprint (2MB) is not big compared to some other databases out there and is more than adequate for a lot of today's embedded applications. Apache is not just about branding; it has been, and continues to be, a set of communities for many successful projects with defined rules and guidelines. At the end of the day, it is all about open source projects, and quite a few of them have made lots of noise in the past several years and still continue to do so.

> But only time can tell which database is more successful.
>



Again, I was not bashing H2, if that is how it came across. I clearly said that one has to know what type of database one is dealing with before claiming it is faster for *all* use-case scenarios out there.

Posted by Francois Orsini on January 10, 2007 at 10:59 AM EST #

So now Francois gets spam-bucketed: we're nothing if not unbiased :-( Maybe it's the topic . . .

Posted by David Coldrick on January 10, 2007 at 01:30 PM EST #

> I mentioned TPC-B

OK. The performance test (open source by the way) used by H2 currently uses algorithms similar to TPC-A and TPC-C, I will add one that is similar to TPC-B when I have time. I'm quite sure that Derby is not that much faster using this benchmark. I'm still waiting for a benchmark where Derby is nearly as fast as H2.

> MVCC is good but not for applications which are doing intense updates and writes

You mean concurrent connections updating the same rows again and again. I don't think this is such a big problem. All the newer engines are based on MVCC (for example MySQL Falcon), and MVCC is being added to the older ones. But supporting both locking and MVCC does make sense, of course.
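A minimal in-memory sketch of the MVCC behaviour being debated, assuming snapshot reads plus a first-committer-wins rule for write conflicts (one common policy, not necessarily what H2 or Falcon implement; MvccSketch and its methods are hypothetical). Readers never block and keep seeing their snapshot, while a second writer to the same row fails and would have to retry, which is the cost for update-heavy workloads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MvccSketch {

    /** One committed version of a row. */
    static final class Version {
        final long commitTs;
        final String value;
        Version(long commitTs, String value) { this.commitTs = commitTs; this.value = value; }
    }

    private final Map<String, List<Version>> versions = new HashMap<>();
    private long clock = 0; // logical commit timestamp

    /** A transaction's snapshot is the latest commit timestamp when it starts. */
    public synchronized long beginSnapshot() {
        return clock;
    }

    /** Returns the newest version committed at or before the snapshot, or null. */
    public synchronized String read(String key, long snapshotTs) {
        List<Version> list = versions.getOrDefault(key, Collections.emptyList());
        for (int i = list.size() - 1; i >= 0; i--) {
            if (list.get(i).commitTs <= snapshotTs) return list.get(i).value;
        }
        return null;
    }

    /**
     * First-committer-wins: the write fails if another transaction committed this
     * key after our snapshot (the caller would roll back and retry). Older versions
     * are kept so that readers with older snapshots are never blocked.
     */
    public synchronized boolean tryCommit(String key, String value, long snapshotTs) {
        List<Version> list = versions.computeIfAbsent(key, k -> new ArrayList<>());
        if (!list.isEmpty() && list.get(list.size() - 1).commitTs > snapshotTs) {
            return false;
        }
        list.add(new Version(++clock, value));
        return true;
    }

    public static void main(String[] args) {
        MvccSketch db = new MvccSketch();
        db.tryCommit("row1", "v1", db.beginSnapshot());
        long reader = db.beginSnapshot();
        long writerA = db.beginSnapshot();
        long writerB = db.beginSnapshot();
        System.out.println(db.tryCommit("row1", "A", writerA));  // true
        System.out.println(db.tryCommit("row1", "B", writerB));  // false: write conflict
        System.out.println(db.read("row1", reader));             // v1: snapshot is stable
        System.out.println(db.read("row1", db.beginSnapshot())); // A
    }
}
```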

> Derby will not loose transactions that have been committed

Well, unfortunately, this is not what I have found: Derby did lose transactions sometimes. You can test it yourself; the source code and documentation to do that are included in H2. See http://www.h2database.com/html/advanced.html#acid and 'Your Hard Drive Lies to You' (http://hardware.slashdot.org/article.pl?sid=05/05/13/0529252&tid=198&tid=128). Also, when calling FileDescriptor.sync() or FileChannel.force() after each file operation, only around 30 file operations per second can be made. And Derby does not call those functions for each commit.
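To reproduce the kind of measurement described here, the following sketch rewrites one byte at the same file position and makes each write durable in one of three ways: FileChannel.force(true), FileDescriptor.sync(), or opening the file in RandomAccessFile's "rwd" mode. It is an assumption-laden micro-benchmark, not H2's test code (SyncRate and Mode are hypothetical names): results depend heavily on the OS, file system and drive write cache, and if the temporary directory is RAM-backed (tmpfs, for example) the numbers say nothing about disks.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SyncRate {

    /** How each one-byte write is made durable (hypothetical labels for this sketch). */
    enum Mode { CHANNEL_FORCE, FD_SYNC, RWD_OPEN }

    /** Rewrites byte 0 of the file {@code ops} times and returns operations per second. */
    static double opsPerSecond(Path file, Mode mode, int ops) throws IOException {
        String openMode = (mode == Mode.RWD_OPEN) ? "rwd" : "rw";
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), openMode)) {
            long start = System.nanoTime();
            for (int i = 0; i < ops; i++) {
                raf.seek(0);
                raf.write(i); // same position every time
                switch (mode) {
                    case CHANNEL_FORCE: raf.getChannel().force(true); break; // FileChannel.force
                    case FD_SYNC:       raf.getFD().sync();          break; // FileDescriptor.sync
                    case RWD_OPEN:                                   break; // "rwd" made write() synchronous
                }
            }
            long elapsed = Math.max(1L, System.nanoTime() - start);
            return ops / (elapsed / 1e9);
        }
    }

    /** Runs all three modes on one temporary file; returns rates in enum order. */
    static double[] measureAll(int ops) {
        try {
            Path file = Files.createTempFile("sync-rate", ".bin");
            try {
                double[] rates = new double[Mode.values().length];
                for (Mode m : Mode.values()) {
                    rates[m.ordinal()] = opsPerSecond(file, m, ops);
                }
                return rates;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        double[] rates = measureAll(200);
        for (Mode m : Mode.values()) {
            System.out.printf("%-13s %8.0f ops/s%n", m, rates[m.ordinal()]);
        }
    }
}
```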

> Derby is _not_ big - The footprint is not big (2MB) for the engine compared to some other databases out there

Well, compared to other Java databases, Derby is by far the biggest, right? I'm not sure if there is really so much more functionality than in H2...

> and whether it is H2 or Derby, the size is not really an issue (as I mentioned in some earlier thread)

It is a problem for some people. For those where even the size of H2 is a problem, I usually recommend PointBase Micro (50 KB jar file size).

Posted by Thomas Mueller on January 10, 2007 at 10:36 PM EST #

As someone without a vested interest in either, I can safely say that h2 is a far superior product. It's easier to embed and far more pleasant to work with in unit tests. Derby is an odd product in that it was born to compete with 'real' databases, but then evolved into a more lightweight embedded solution without fully shedding its initial aspirations. As a result, I've always found working with it to be uncomfortable and awkward (documentation never quite matches reality, installation isn't as smooth as it is for any other pure Java app, etc. etc.).

Posted by Hani Suleiman on January 11, 2007 at 11:28 AM EST #

Hani,

I don't know when you last used Derby, but the latest documentation set is very complete, with a lot of samples as well as individual guides for the various contexts you need information about. In fact, Derby documentation has always been quite complete. Is it perfect? No, but neither are the documentation sets of many products or open source projects out there, and it is certainly well above average. http://db.apache.org/derby/manuals/index.html#latest The community is very active and always keen to help users and/or developers.

I'm not sure how you can (especially without any vested interest, as you mentioned) state that H2 is a far superior product, as Derby is obviously a more mature database system (first version in '97), which is expected since H2 is fairly recent. As far as embedding goes, both products require a JDBC driver class and a URL to connect to the engine, and both have their core database engine packaged as one JAR file.
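For illustration, embedding either engine through plain JDBC looks much the same. The URL forms below (jdbc:h2:./name and jdbc:derby:name;create=true) are standard embedded URLs for each engine; the class, helper and database names are hypothetical, and the matching engine JAR must be on the classpath for a connection attempt to succeed.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class EmbeddedConnect {

    /** Embedded H2 database stored relative to the working directory. */
    static String h2EmbeddedUrl(String name) {
        return "jdbc:h2:./" + name;
    }

    /** Embedded Derby database; create=true creates it on first connect. */
    static String derbyEmbeddedUrl(String name) {
        return "jdbc:derby:" + name + ";create=true";
    }

    /** Opens and closes a connection; returns false if no driver or engine is available. */
    static boolean tryConnect(String url) {
        try (Connection conn = DriverManager.getConnection(url)) {
            return conn.isValid(2);
        } catch (SQLException e) {
            System.out.println("Could not connect to " + url + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // JDBC 4 (Java 6+) drivers that ship a service entry register themselves;
        // older code first called Class.forName("org.h2.Driver") or
        // Class.forName("org.apache.derby.jdbc.EmbeddedDriver").
        for (String url : new String[] { h2EmbeddedUrl("h2demo"), derbyEmbeddedUrl("derbydemo") }) {
            System.out.println(url + " -> " + tryConnect(url));
        }
    }
}
```

On a successful run, the Derby URL creates a derbydemo directory and the H2 URL an h2demo database file in the working directory.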
Derby installation is documented under: http://db.apache.org/derby/docs/dev/getstart/getstart-single.html#cgsinstallingderby and Java DB (based on Apache Derby) is bundled as part of Sun JDK 6.

Maybe what you're asking for is some kind of installer, but once the distribution archive is extracted to disk, you just need to set two environment variables and a third, optional one.

Regards,

Posted by Francois Orsini on January 24, 2007 at 07:54 AM EST #

> Also, using FileDescriptor.sync() or FileChannel.force()
> after each file operation, only around 30 file operations per second can be made.
Only with a single user. Derby will happily support many more transactions per second when multiple threads are executing transactions. This report shows up to 500/sec when disk caching is disabled.
Derby Performance
> And Derby does not call those functions for each commit.
Yes, it does, with optimizations so that a single sync() can satisfy the commit of multiple transactions.
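The optimization described here is usually called group commit. A hypothetical sketch, not Derby's log code: each transaction appends its record, and the first committer that finds its record unflushed and no sync in progress becomes the leader, calls FileChannel.force(true) once, and thereby makes every record appended so far durable; the other committers just wait for that sync to finish.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class GroupCommitLog implements AutoCloseable {
    static final int RECORD_SIZE = 16;

    private final FileChannel channel;
    private long appendedSeq = 0;     // last record written to the OS (guarded by this)
    private long flushedSeq = 0;      // last record known to be forced (guarded by this)
    private boolean flushing = false; // true while a leader is inside force()
    final AtomicInteger syncCount = new AtomicInteger();

    GroupCommitLog(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    /** Writes a record (not yet durable) and returns its sequence number. */
    synchronized long append(byte[] record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(record);
        while (buf.hasRemaining()) channel.write(buf);
        return ++appendedSeq;
    }

    /** Returns once record {@code seq} is forced; one force() can cover many commits. */
    void commit(long seq) throws IOException, InterruptedException {
        long target;
        synchronized (this) {
            while (true) {
                if (flushedSeq >= seq) return; // a leader's sync already covered us
                if (!flushing) break;          // nobody is syncing: become the leader
                wait();                        // wait for the current sync, then re-check
            }
            flushing = true;
            target = appendedSeq;              // every record appended so far joins this batch
        }
        boolean ok = false;
        try {
            channel.force(true);               // one sync for the whole batch
            syncCount.incrementAndGet();
            ok = true;
        } finally {
            synchronized (this) {
                if (ok) flushedSeq = Math.max(flushedSeq, target);
                flushing = false;
                notifyAll();
            }
        }
    }

    @Override
    public void close() throws IOException { channel.close(); }

    /** Commits one record from each of {@code threads} threads; returns {syncs, fileBytes}. */
    static long[] runDemo(int threads) {
        try {
            Path file = Files.createTempFile("group-commit", ".log");
            try (GroupCommitLog log = new GroupCommitLog(file)) {
                List<Thread> workers = new ArrayList<>();
                List<Exception> errors = Collections.synchronizedList(new ArrayList<>());
                for (int i = 0; i < threads; i++) {
                    Thread t = new Thread(() -> {
                        try {
                            log.commit(log.append(new byte[RECORD_SIZE]));
                        } catch (IOException | InterruptedException e) { errors.add(e); }
                    });
                    workers.add(t);
                    t.start();
                }
                for (Thread t : workers) t.join();
                if (!errors.isEmpty()) throw new IllegalStateException(errors.get(0));
                return new long[] { log.syncCount.get(), Files.size(file) };
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = runDemo(8);
        System.out.println("8 commits, " + r[0] + " force() calls, " + r[1] + " bytes");
    }
}
```

How many syncs runDemo(8) needs (somewhere between 1 and 8) depends on thread timing; under load, more commits share each force() call.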

Posted by Dan Debrunner on January 26, 2007 at 04:18 AM EST #

Hi Dan,

Thanks for your comment! Using the write cache does not reduce the probability of recovery for H2. In the presentation, there is a scary statement: 'The write cache reduces probability of successful recovery after power failure' [for Derby]. Why is that? If this is true, then things don't look good for Derby, because even when calling FileDescriptor.sync()/FileChannel.force() (this is called 'fsync'), data is not always flushed to disk. See also: 'Your Hard Drive Lies to You' http://hardware.slashdot.org/article.pl?sid=05/05/13/0529252&tid=198&tid=128

Or you can re-run the power-off test that is included in H2; see also: http://www.h2database.com/html/advanced.html#acid (Durability).

If this sounds bad, it gets worse. Today I ran a test using FileDescriptor.sync(), FileChannel.force(), and Derby. When writing one byte and calling FileDescriptor.sync() or FileChannel.force(), I get about 50 operations per second. When using Derby, I get about 500 operations per second. That can only mean that Derby is not actually calling one of those functions. It looks like Derby uses RandomAccessFile(.., "rwd") instead (this uses the O_SYNC flag when opening the file). This is not the same: fsync also writes through the hardware write cache, and O_SYNC does not. With a hard drive running at 7200 rpm, you basically have an upper limit of about 7200/60 = 120 synchronous physical writes per second to the same position on the drive. This is also the result when using fsync. But instead of talking theory, I suggest you actually run the test. The source code is included in H2, as is the source code of the H2 benchmarks.
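The rotational limit mentioned here is plain arithmetic, under the simplifying assumption that each synchronous write to the same position must wait one full platter revolution (seek and controller overhead ignored):

$$
\frac{7200\ \text{rev/min}}{60\ \text{s/min}} = 120\ \text{rev/s}
\;\Rightarrow\;
t_{\text{rev}} = \frac{1}{120}\ \text{s} \approx 8.3\ \text{ms}
\;\Rightarrow\;
\text{same-position synchronous writes} \lesssim 120\ \text{per second}
$$

By that reasoning, a sustained rate well above this for one-byte rewrites of the same position suggests the writes are being acknowledged from a cache rather than from the platter.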

Unfortunately, the source code of the benchmark used in the presentation you refer to is closed source. I asked, and I was told it cannot be released (for now). May I ask why you refer to a closed source benchmark? Is there no open source benchmark where Derby is faster than any competitor, ten years after Cloudscape / Derby was started?

Thomas

Posted by Thomas Müller on January 27, 2007 at 10:40 AM EST #

It's a very nice example. I'm playing with it now, but I couldn't make a @ManyToOne relationship work. I think I need something like "Set<Person> persons" here. Is this possible with Groovy 1.1, since it doesn't support generics yet? Greetings, Ivan

Posted by Ivan Dolvich on April 19, 2007 at 06:26 AM EST #

Just had to add some fuel to the fire. I discovered h2 at the end of 2007 and agree that:

1) Setup is a breeze compared to Derby, assuming you don't use the Derby that is now packaged alongside GlassFish with the JDK. For anyone who doubts this, it's a simple download; try it yourself. The real kicker for h2 is the JavaScript SQL client that fires up; Derby 'ain't got nothin' on it'.

2) In performance testing, Derby was a dog for me. I spent a week benchmarking 6 different databases, and after reading all the forums and applying tweaks, etc., Derby just wasn't up to the same speed as MySQL (!), McKoi, HSQLDB, or h2 (Postgres was slowest for me). Durability was not a concern for me; I wanted database setup and teardown to be as quick as possible for some functional testing using Selenium.

Whoever speculated that Derby has yet to shed its Cloudscape heritage probably had it right.

Posted by Daniel Juliano on June 30, 2008 at 02:41 PM EST #

Hello David, I'm going to be a little off topic on this blog; it was the only way I found to send you a message.
I'm looking for a David Coldrick in Switzerland, and your name popped up first in my search engine. I'm from Uganda... I once studied at Muyenga High School.
Please do write back if you are familiar with any of this.
Regards,
Edger Serungogi

Posted by Edger Serungogi on October 18, 2008 at 03:50 PM EST #
