Julius Stroffek's Weblog

http://blogs.sun.com/julo/date/20080214 Thursday February 14, 2008

Prague PostgreSQL Developers' Day 2008

Yesterday, Prague was hosting its first PostgreSQL developer conference. It was organized by the people around the Czech database portal - http://www.dbsvet.cz with support from companies CZ.Nic and Sun Microsystems. The conference was hosted by Czech Technical University, Faculty of Electrical Engineering. Peter Eisentraut, the PostgreSQL core team member was the guest of the conference.

At about 60 people interested in PostgreSQL from the both user and developer point of view attended the conference. A bunch of them were students from Faculty of Electrical Engineering or Faculty of Mathematics and Physics at Charles University.

The conference was followed by the unformal discussions at the pub in the evening as usual. We went to the pub where Peter was staying called "U Medvidku". Couple of guys here using PostGIS stressed the value of geographical data functionality that is offered by this extension and presented the opinion that this should become a part of the core distribution. However, this has some open issues, especially the licensing of PostGIS is GNU GPL and not BSD License as PostgreSQL. The idea was that it should not be much work to re-implement this from scratch as a contrib module. So there are no objections, start with a proposal and go ahead!

We all agreed that it was a great event and we all believe that we will meet again at the same conference next year.

 

I'll try to summarize the talks presented at the conference...

 

What's going on (PostgreSQL 8.3)
Peter Eisentraut

Peter presented an overview of new features introduced in new PostgreSQL version 8.3 released just a week before the conference.

 

Stored Procedures and SQL/PSM Language
Pavel Stehule

Pavel presented an overview of approaches to stored procedures/functions in different databases and ANSI SQL with a highlight to status in PostgreSQL. Pavel has done a lot work on support of different procedural languages in PostgreSQL and he is interested in SQL/PSM language support for PostgreSQL. He listed a couple of the features he would like to work on.

He was arguing against the standard three tier application architecture and introduced the idea of open database where all the functionality on the data is implemented on the database in stored procedures. The user then can access the data in the same way as the application does and do not need to rely on the proprietary communication protocol to the application server.

Personally, I liked the idea of the open database, however application server gives a bunch of other possibilities for the developers. You have a separated workload on the database and application server and scale each of these separately, you can also make your application portable across different database systems from simple embedded database like SQLite or JavaDB through MySQL to enterprise level databases like PostgreSQL or even proprietary databases like Oracle.

I think that it would be necessary to have a good standard of procedural language implemented among the various databases and have a bunch of libraries available to provide the functionality that is already available in different languages. Yes, of course I know you can implement stored procedures in whatever you like for PostgreSQL but then you are binding your application to PostgreSQL only and can not use an embedded database. The most likely, it's to late to achieve just the two tier application world.

 

PostgreSQL and XML
Peter Eisentraut

Peter presented a new XML functionality he was working on in a new version. He presented xml data type, storage, features for xml creation, XPath, XLabel and related features and possibilities to create a functional indexes on related functions. The current status of XML support contains just the basic functionality and has a lot of space for other features and improvements based and the feedback obtained from the users. So, feel free to provide your experience and suggestions.

 

Full-Text Search in PostgreSQL
Julius Stroffek

I gave a talk about the full text search feature in PostgreSQL. I presented related data types (tsvector, tsquery), operator, functions, ranking, database objects offered for full-text search (parsers, templates, dictionaries and configurations) and the way of how to provide your own full-text search configuration for the different language or tune the existing ones. Since there is no support for Czech language now in PostgreSQL, people were mostly interested in having the Czech dictionary for full-text search. I showed a way how to create a Czech dictionary from the ispell dictionary easily, however this approach has couple of issues especially with finding a root form for irregular words.

 

Data Integrity Checks in PostgreSQL
Robert Mach

Rober was working on analysis and algorithms for integrity checks for PostgreSQL as his thesis at Faculty of Electrical Engineering. He was talking about the functions a the way of how to deal with invalid blocks and/or checksums and possible repairs. There was a short discussion about the integration to PostgreSQL afterwards.

Robert just finished his final exams the day before the conference. Congratulations, Robert! ;-)

 

PostgreSQL on Solaris
Zdenek Kotala

Zdenek was talking about the integration of PostgreSQL to Solaris, problems that were caused during the integration of 8.1 into Solaris 10 U2. He presented Solaris features like ZFS, DTrace, RBAC that users can take an advantage of when using PostgreSQL on Solaris.

 

Query Processing in PostgreSQL
Tomas Kovarik

Tomas gave the overall overview of PostgreSQL architecture and the way how the queries are handled internally. He went through the parser, rewriter, trigger system, planner, join order optimization, handling of outer joins, executor, etc.

 

Registration Systems for .cz Domains and PostgreSQL
Jaromir Talir

Jaromir is working on the domain registration software for Czech domain registry. He was explaining the architecture of the project and the way they are using PostgreSQL as a database backend. He explained couple of problems they were dealing with the database. They started to develop the project two years ago and their system is running in production for about half a year now. They had to move the data from Informix database to PostgreSQL during the migration.

http://blogs.sun.com/julo/date/20071203 Monday December 03, 2007

Database World 2007, Zlin, Czech Republic

The Czech database conference called 'Database World' was held on 29th November at Tomas Bata University in Zlin. The audience was formed mostly by people from Faculty of Management and Economics of the local university mixed together with delegates of companies interested in databases from the user point of view. The lectures were therefore not so technically oriented and were trying to explain only the ideas behind the presented features. There were also couple of companies presenting features of their products using databases.

Motto for this year was "Data are important and information is more important" which was meant to bring up the issues related to processing of unstructured data. Some of the presenters did not take care about the motto and were presenting unrelated features.

 

I'll try to bring out couple of talks:


New features in Oracle Database 11g
David Krch, Oracle Czech

It was an overview presentation of some new features of a new Oracle Database 11g including:

  • Automatic capture and replay of workloads - you can capture a workload on a production database and replay it on your test/development environment to test various changes in configuration, etc.
  • Fault diagnosability and "self repairs". It is expected that this feature will decrease the amount of work the administrators have to do in case of the failure. In some cases (like corrupted data blocks) planner will try to generate another plan that will be expected not expose the error. Sounds a bit like a black magic. ;-)
  • Total recall - keeping unchangeable history of certain records. This is something that PostgreSQL has for couple of years called PITR = Point In Time Recovery. AFAIK it is not a heavily used feature.
  • New storage for LOB objects called Secure Files - it was mentioned that it performs better than any file system in OS. It is also possible to deduplicate the data so that if the value being inserted to the column already exist in different row only one copy of the data is stored. Which is probably implemented by hashing the documents and a look up to the index composed from the hash values followed by the comparison of whole documents. Since there is a need for index lookup for every inserted entry, I do not think that the insert operation with deduplication could be faster than file system. However, read operations might be much faster then file system, since it is possible to better maintain the column fragmentation due the fact that the whole columns is usually written at once and if it is replaced the whole content is changed. In this case you can allocate a new non-fragmented block for the column. I do not know how it is implemented in Oracle.
  • New optimizer features - correlated column statistics, expression statistics.
  • OLAP cube as a materialized view.
  • Automatic creation of partitions for interval partitioning. I do not know Oracle quite well but I thought that this feature was already there.
  • Reference partitioning - allow a partitioning based on the values in the referenced tables. This is the feature I met with with the need a couple of weeks before and I was also thinking about such an implementation for PostgreSQL.
  • Oracle Application Express - a web based spreadsheet-like application creation framework.

My impression is that Oracle brings a bunch of new features and tries to proves that it is a leader in the database market.

 

Why that new version is slower?
Martin Schyna, ABRA Software

The presentation tried to describe how to proceed in cases where you will perform and upgrade or install a new version of the software and you think that it is slower.

 

High Availability with new Informix Dynamic Server
Jan Musil, IBM Czech Republic

The presentation was dealing with a high availability setup of a new version of Informix Dynamic Server. There were couple of slides explaining different replication possibilities HDR - High Availability Data Replication, SDS - Shared Disk Secondary and RSS - Remote Standalone Secondary. Jan set up the environment and was killing up the servers one by one to show that the others could change their role and act as primary databases. However, no automatic fail over was shown and manual setup changes on the machines were made to change the role of secondary servers. I would be interested how these thinks work automatically.

 

Data are important and information is more important
Pavel Císař

Pavel tried to think about the definitions of data and information and to more exactly infer what of those two is more important and why. The result was that there is no difference between data and information and thus data are as same as important as information. He finished his talk with a couple of suggestions on how to make money with information/data processing.


Valid HTML! Valid CSS!

This is a personal weblog, I do not speak for my employer.