Florian Reuter's Weblog

Friday, December 16, 2005

The community and me / The OpenBib project.

Bruce D'Arcus contacted me for help with the OpenBib project. To help Bruce find programmers who will contribute code to the project, I will post some first steps here. My goal is not to do all the programming myself (if I did, everybody with an idea would contact me to do the work :-)) but to teach people the basics, so that they can do the programming themselves.

In this column I would like to show how the citation data can be read and stored in a DOM instance. So let's hope you have mastered the CWS creation... We will go in medias res:

Let's be more precise about what we want to do. Consider, for example, this OpenDocument fragment, which includes citation data:

<bib:citation>
    <bib:citation-source>
        <bib:biblio-ref bib:linked="token" bib:citation-style="string"?>
        <bib:detail bib:begin="string"
                         bib:end="string"?
                         bib:units="pages|chapters|lines|paragraphs|figures|sections|formulas"/>?
        <bib:caption bib:position="before|after">paragraph-content*</bib:caption>?
    </bib:citation-source>
    <bib:citation-body>paragraph-content*</bib:citation-body>?
</bib:citation>

On loading we want, as a first step, to store the information in a DOM tree. I know that there is a lot more to do, but let's just start with this task.

You can find a blueprint for storing data in a DOM instance in the file http://www.go-oo.org/lxr/source/xml/xmloff/source/xforms/XFormsInstanceContext.cxx#111. Obviously a DomBuilderContext already exists (thanks dvo). OK, everything is there; the question is where to create the context.
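To make the idea of building a DOM tree from parser events concrete, here is a minimal, self-contained C++ sketch. It is not the DomBuilderContext from xmloff: Node, TreeBuilder and all of their members are hypothetical names invented for illustration, and only the name getTree() echoes the accessor used in the case statement further down.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal DOM node (not the UNO DOM API).
struct Node {
    std::string name;   // qualified element name; empty for text nodes
    std::string text;   // character data; only used by text nodes
    std::vector<std::pair<std::string, std::string>> attributes;
    std::vector<std::unique_ptr<Node>> children;
    Node* parent = nullptr;
};

// Hypothetical builder: receives SAX-style events and grows a tree.
class TreeBuilder {
public:
    TreeBuilder() : root_(new Node()), current_(root_.get()) {}

    // Opens a new element below the current one and descends into it.
    void startElement(const std::string& name,
                      std::vector<std::pair<std::string, std::string>> attrs = {}) {
        std::unique_ptr<Node> child(new Node());
        child->name = name;
        child->attributes = std::move(attrs);
        child->parent = current_;
        Node* raw = child.get();
        current_->children.push_back(std::move(child));
        current_ = raw;
    }

    // Appends a text node to the current element.
    void characters(const std::string& data) {
        std::unique_ptr<Node> child(new Node());
        child->text = data;
        child->parent = current_;
        current_->children.push_back(std::move(child));
    }

    // Closes the current element and climbs back to its parent.
    void endElement() {
        assert(current_->parent != nullptr && "endElement without startElement");
        current_ = current_->parent;
    }

    // The synthetic document node that owns everything built so far.
    const Node& getTree() const { return *root_; }

private:
    std::unique_ptr<Node> root_;
    Node* current_;
};
```

Feeding it startElement("bib:citation"), the nested child events and the matching endElement() calls leaves a tree under getTree() that mirrors the fragment above.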

Your debugger will tell you that http://www.go-oo.org/lxr/source/xml/xmloff/source/text/txtparai.cxx#1322 is the right place to create the child context.

OK; then we need to insert a case statement like in http://www.go-oo.org/lxr/source/xml/xmloff/source/text/txtparai.cxx#1322:

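       case XML_TOK_TEXT_CITATION:
       {
            // The braces give pInstance its own scope, so that later case
            // labels do not jump past its initialization.
            DomBuilderContext* pInstance = new DomBuilderContext( GetImport(), nPrefix, rLocalName );
            // the resulting tree will be stored in: pInstance->getTree();
            pContext = pInstance;
            break;
       }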

This won't compile, since we have not yet defined what XML_TOK_TEXT_CITATION is. Looking e.g. at XML_TOK_TEXT_SPAN (http://www.go-oo.org/lxr/ident?i=XML_TOK_TEXT_SPAN) we can figure out what to do: we add XML_TOK_TEXT_CITATION to the end of http://www.go-oo.org/lxr/ident?i=XMLTextPElemTokens (but before XML_TOK_TEXT_P_ELEM_END) and then we add

{ XML_NAMESPACE_BIB, XML_CITATION, XML_TOK_TEXT_CITATION },
to http://www.go-oo.org/lxr/source/xml/xmloff/source/text/txtimp.cxx#245. Having done so we need to define XML_CITATION and XML_NAMESPACE_BIB. I'm sure you will figure out what to do...
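The interplay of the token enum, the token map entry and the case statement can be sketched in a self-contained way. Everything in this sketch is a simplified stand-in: the enum values, TokenMapEntry, lookupToken and describeChild are hypothetical names, and using the end marker as the "no match" value is an assumption of the sketch, not a statement about how xmloff does it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in namespace keys (the post's XML_NAMESPACE_BIB would be one of them).
enum NamespaceKey : std::uint16_t { NS_TEXT, NS_BIB };

// Stand-in for the token enum: new tokens go before the end marker.
enum TextPElemToken {
    TOK_TEXT_SPAN,
    TOK_TEXT_CITATION,     // the newly added token
    TOK_TEXT_P_ELEM_END    // end marker; this sketch also uses it for "no match"
};

static_assert(TOK_TEXT_CITATION < TOK_TEXT_P_ELEM_END,
              "new tokens must be added before the end marker");

// One row of the token map: (namespace, local name) -> token.
struct TokenMapEntry {
    NamespaceKey prefix;
    const char* localName;
    TextPElemToken token;
};

static const TokenMapEntry kTextPElemTokenMap[] = {
    { NS_TEXT, "span",     TOK_TEXT_SPAN },
    { NS_BIB,  "citation", TOK_TEXT_CITATION },  // the newly added row
};

// Linear lookup; returns the end marker when nothing matches.
TextPElemToken lookupToken(NamespaceKey prefix, const std::string& localName) {
    for (const TokenMapEntry& entry : kTextPElemTokenMap) {
        if (entry.prefix == prefix && localName == entry.localName)
            return entry.token;
    }
    return TOK_TEXT_P_ELEM_END;
}

// Mirrors the switch in the import code: the token decides which kind of
// child context would be created (represented here by a description only).
std::string describeChild(NamespaceKey prefix, const std::string& localName) {
    switch (lookupToken(prefix, localName)) {
    case TOK_TEXT_SPAN:
        return "span context";
    case TOK_TEXT_CITATION:
        return "DOM builder context";
    default:
        return "default context";
    }
}
```

The point of the sketch is that all three pieces must agree: without the enum value the case label does not compile, and without the map row the case is never reached.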

Now we have the complete citation information stored in a DOM instance. Isn't that great AND easy? Stay tuned for the next steps.

Hope you had as much fun as I had.

Florian

( Dec 16 2005, 09:25:50 AM PST )

Thursday, October 06, 2005

Loading a document

The code for loading a document in OpenOffice.org Writer can be found in sal_Bool SfxObjectShell::DoLoad( SfxMedium *pMed ) in the file sfx2/source/doc/objstor.cxx.

( Oct 06 2005, 07:27:26 AM PDT )

Tuesday, October 04, 2005

OpenDocument and meta data

I'm new to RDF. There have been a lot of discussions in the OpenDocument TC about meta data in OpenDocument. I'd like to highlight what I have understood from the discussion:

  1. Stefano Mazzocchi wrote (forwarded by Bruce D'Arcus):

    "NOTE: I'm not suggesting that, all I'm saying is: choose your battles. A syntax battle is not worth fighting. A model battle is not worth fighting either. The unique identification of symbols is the only one worth fighting for."

    I love this statement and I totally agree with it.

  2. Restricted RDF/RDFX/XMP/Constrained RDF

    Restricted RDF/RDFX/XMP/Constrained RDF looks like a syntax/model battle to me. My understanding is that as long as we provide/specify a way to generate/derive triples from the OpenDocument, all will be fine. I read with interest about the GRDDL approach as a way to specify how to derive triples from OpenDocuments.

  3. XMP

    I'd like clarification of the following statement about XMP (posted by Duane Nickull):

    "XMP does not "wrap around" RDF. XMP is expressed in a small subset of RDF. All valid XMP can be expressed in RDF. There is a lot of RDF that is not valid XMP."

    Does RDF in that case stand for RDF/triples or RDF/XML representation? Can every set of RDF/triples be expressed in XMP?

  4. unique identification of symbols/Making Statements about parts of the Content

    Very interesting. Current OpenDocument meta data allows statements to be made about the current document, e.g. ("", dc:title, "Sample Document") or ("", dc:author, "Mr. X"), using <office:meta><dc:title>Sample Document</dc:title><dc:author>Mr. X</dc:author></office:meta> and an appropriate mapping. Other interesting subjects may be parts of the document or of its content. This would require a URI naming convention for parts of the document, i.e. there must be a way to uniquely reference a part of the document/content.

  5. Generic RDF store

    In order to store RDF statements about various subjects (i.e. generic RDF triples) in OpenDocument, you need a place and a way to store generic RDF statements. E.g. in the bibliographic project run by Bruce D'Arcus, an RDF store would be very helpful for storing bibliographic information.

    Here the syntax/model question seems to matter a great deal. More directly, the question arises whether to use a generic RDF/XML section, a restricted RDF/XML section, an XMP section, or an application-specific (e.g. bibliographic) XML dialect plus an appropriate mapping.
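The "appropriate mapping" from item 4 can be illustrated with a small C++ sketch. It assumes the simplest possible mapping, where every child element of office:meta becomes one triple whose subject is the document itself, written "" as in the example. Triple and metaToTriples are hypothetical names, and this is only one of many conceivable mappings, not something the OpenDocument TC has specified.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One RDF-style statement: (subject, predicate, object).
struct Triple {
    std::string subject;
    std::string predicate;
    std::string object;
};

// Assumed mapping: each (element name, text content) pair found under
// <office:meta> yields one triple about the given subject. The default
// subject "" stands for the current document, as in the post's example.
std::vector<Triple> metaToTriples(
        const std::vector<std::pair<std::string, std::string>>& metaElements,
        const std::string& subject = "") {
    std::vector<Triple> triples;
    triples.reserve(metaElements.size());
    for (const auto& element : metaElements) {
        // element.first is the qualified name, element.second the text.
        triples.push_back({subject, element.first, element.second});
    }
    return triples;
}
```

Passing the pairs (dc:title, Sample Document) and (dc:author, Mr. X) yields exactly the two triples quoted in item 4. Making statements about parts of a document would only require passing a different subject, which is where the URI naming convention comes in.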

I'd like to get feedback,

Florian

( Oct 04 2005, 06:31:22 AM PDT )

Sunday, October 02, 2005

Flat OpenDocument

As promised in my talk at the OpenOffice.org conference, I will make the ident.xsl XSL(T) script available.

To install the Flat OpenDocument filter, open an empty OpenOffice.org/StarOffice Writer document, go to Tools/XSLT Filter Settings... and click on New.... Fill out the requested information as shown in the figures below:

[Figure: General tab page]

[Figure: Transformation tab page]

Finally you can test the Flat OpenDocument filter by selecting Test XSL(T) Filter... and Current document, which will show you the Flat OpenDocument representation of the current document.

( Oct 02 2005, 06:28:58 AM PDT )

Friday, March 11, 2005

Real Life Bugs!

For everybody interested in *real life bugs*, consider OpenOffice bug #i35653#. The story is as follows: the attached RTF bug-doc contains the sub-string BEGINŽEND, but Writer only shows BEGIN and discards the suffix ŽEND.

The reason is a *real life bug*: the Czech character Ž has the Unicode number 381 (U+017D), which is correctly returned by the GetNextChar() function. In the source there is a line

       nSlash = (sal_Char)GetNextChar();

which casts the Unicode character 381 to a sal_Char, which is an 8-bit character. Unfortunately 381 cast to 8 bits is 125 (381 - 256), and 125 is the ASCII code for "}".

Here comes the real life characteristic of the bug: "}" is a special RTF token which causes the RTF reader to stop. There are so many Unicode characters. Why, for God's sake, must the Czech Ž cast to 8 bits be the special "}" RTF token?

And the moral is: *Beware of casts!*

( Mar 11 2005, 10:14:21 AM PST )
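The truncation in the Real Life Bugs post above is easy to reproduce in isolation. The following C++ sketch uses stand-in types rather than the real sal_Unicode and sal_Char: Unicode16, Char8, narrowTo8Bit and fitsIn8Bit are hypothetical names, and Char8 is deliberately unsigned so that the narrowing is well defined on every compiler.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the types in the post (assumptions, not the sal/ typedefs).
using Unicode16 = std::uint16_t;  // a 16-bit code unit, as GetNextChar() would return
using Char8 = unsigned char;      // an 8-bit character

// Narrowing to 8 bits keeps only the low byte, i.e. the value modulo 256.
constexpr Char8 narrowTo8Bit(Unicode16 c) {
    return static_cast<Char8>(c);
}

// A guard that would have exposed the problem: only values up to 0xFF
// survive the narrowing unchanged.
constexpr bool fitsIn8Bit(Unicode16 c) {
    return c <= 0xFF;
}

// U+017D (decimal 381) is the Czech letter Z with caron.
static_assert(narrowTo8Bit(0x017D) == 125, "381 modulo 256 is 125");
static_assert(narrowTo8Bit(0x017D) == '}', "125 is the ASCII code of '}'");
static_assert(!fitsIn8Bit(0x017D), "381 does not fit into 8 bits");
```

Because the checks are static_asserts, the coincidence is verified at compile time. A reader that tested fitsIn8Bit() before narrowing could treat the character as ordinary text instead of mistaking it for a closing brace.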

