Julius Stroffek's Weblog


http://blogs.sun.com/julo/date/20071203 Monday December 03, 2007

Database World 2007, Zlin, Czech Republic

The Czech database conference called 'Database World' was held on 29th November at Tomas Bata University in Zlin. The audience consisted mostly of people from the Faculty of Management and Economics of the local university, together with delegates from companies interested in databases from the user's point of view. The lectures were therefore not very technical and tried to explain only the ideas behind the presented features. A couple of companies also presented features of their products that use databases.

The motto for this year was "Data are important and information is more important", which was meant to raise the issues related to processing unstructured data. Some of the presenters ignored the motto and presented unrelated features.

 

I'll try to highlight a couple of talks:


New features in Oracle Database 11g
David Krch, Oracle Czech

It was an overview of some new features of the new Oracle Database 11g, including:

  • Automatic capture and replay of workloads - you can capture a workload on a production database and replay it on your test/development environment to test various changes in configuration, etc.
  • Fault diagnosability and "self repairs". This feature is expected to decrease the amount of work administrators have to do in case of a failure. In some cases (like corrupted data blocks) the planner will try to generate another plan that is expected not to expose the error. Sounds a bit like black magic. ;-)
  • Total recall - keeping an unchangeable history of certain records. This is something that PostgreSQL has had for a couple of years, called PITR = Point In Time Recovery. AFAIK it is not a heavily used feature.
  • New storage for LOB objects called Secure Files - it was mentioned that it performs better than any OS file system. It is also possible to deduplicate the data, so that if the value being inserted into the column already exists in a different row, only one copy of the data is stored. This is probably implemented by hashing the documents and looking the hash up in an index of hash values, followed by a comparison of the whole documents. Since an index lookup is needed for every inserted entry, I do not think that an insert with deduplication could be faster than a file system. However, read operations might be much faster than a file system, since it is easier to keep the column unfragmented: the whole column value is usually written at once, and when it is replaced the whole content changes. In that case you can allocate a new non-fragmented block for the column. I do not know how it is implemented in Oracle.
  • New optimizer features - correlated column statistics, expression statistics.
  • OLAP cube as a materialized view.
  • Automatic creation of partitions for interval partitioning. I do not know Oracle very well, but I thought that this feature was already there.
  • Reference partitioning - allows partitioning based on the values in the referenced tables. I ran into the need for this feature a couple of weeks ago and was also thinking about such an implementation for PostgreSQL.
  • Oracle Application Express - a web based spreadsheet-like application creation framework.
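
To make my guess about the Secure Files deduplication more concrete, here is a minimal Python sketch of the idea: hash the document, look the hash up in an index, and compare the whole documents before sharing a copy. This is only my assumption of how such a thing could work, not how Oracle implements it; the class `DedupStore` and all of its method names are made up for illustration.

```python
import hashlib


class DedupStore:
    """Toy LOB store that keeps identical values only once (hypothetical)."""

    def __init__(self):
        self._blobs = {}    # blob_id -> stored content
        self._by_hash = {}  # digest -> list of blob_ids with that digest
        self._rows = {}     # row key -> blob_id the row points to

    def insert(self, row_key, content: bytes) -> int:
        # Step 1: hash the document and look the hash up in the index.
        digest = hashlib.sha256(content).digest()
        candidates = self._by_hash.setdefault(digest, [])
        # Step 2: compare whole documents, since equal hashes alone
        # do not prove that the contents are equal.
        for blob_id in candidates:
            if self._blobs[blob_id] == content:
                break
        else:
            # No identical document stored yet: keep a new copy.
            blob_id = len(self._blobs)
            self._blobs[blob_id] = content
            candidates.append(blob_id)
        self._rows[row_key] = blob_id
        return blob_id

    def read(self, row_key) -> bytes:
        return self._blobs[self._rows[row_key]]

    def stored_copies(self) -> int:
        return len(self._blobs)


store = DedupStore()
store.insert("row1", b"annual report")
store.insert("row2", b"annual report")  # same content, shares the copy
store.insert("row3", b"something else")
print(store.stored_copies())
```

Every insert pays for the hash computation and the index lookup, which is why I doubt that inserts with deduplication can beat a plain file system write.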

My impression is that Oracle brings a bunch of new features and tries to prove that it is a leader in the database market.

 

Why is the new version slower?
Martin Schyna, ABRA Software

The presentation tried to describe how to proceed when you upgrade to or install a new version of some software and you think that it is slower.

 

High Availability with new Informix Dynamic Server
Jan Musil, IBM Czech Republic

The presentation dealt with a high availability setup of the new version of Informix Dynamic Server. A couple of slides explained the different replication options: HDR - High Availability Data Replication, SDS - Shared Disk Secondary and RSS - Remote Standalone Secondary. Jan set up the environment and killed the servers one by one to show that the others could change their role and act as primary databases. However, no automatic failover was shown; manual setup changes were made on the machines to change the role of the secondary servers. I would be interested in how these things work automatically.

 

Data are important and information is more important
Pavel Císař

Pavel tried to think about the definitions of data and information and to infer more precisely which of the two is more important and why. The result was that there is no difference between data and information, and thus data are just as important as information. He finished his talk with a couple of suggestions on how to make money with information/data processing.



This is a personal weblog, I do not speak for my employer.