Florian Reuter's Weblog

Friday December 16, 2005
The community and me / The OpenBib project.
Bruce D'Arcus contacted me for help with the OpenBib project. To help Bruce find programmers willing to contribute code to the project, I will post some first steps here. My goal is not to do all the programming myself (if I did, everybody with an idea would contact me to do the work :-)) but to teach people some basics, so that they can do the programming themselves.
In this column I would like to show how the citation data can be read and stored in a DOM instance.
So let's hope you mastered the CWS creation... We will go in medias res:
Let's be more precise about what we want to do. Consider, for example, the following OpenDocument fragment, which includes citation data:
<bib:citation>
<bib:citation-source>
<bib:biblio-ref bib:linked="token" bib:citation-style="string"?>
<bib:detail bib:begin="string"
bib:end="string"?
bib:units="pages|chapters|lines|paragraphs|figures|sections|formulas"/>?
<bib:caption bib:position="before|after">paragraph-content*</bib:caption>?
</bib:citation-source>
<bib:citation-body>paragraph-content*</bib:citation-body>?
</bib:citation>
On loading, we first want to store the information in a DOM tree. I know that there is a lot more to do, but let's just start with this task.
You can find a blueprint for storing data in a DOM instance in the file http://www.go-oo.org/lxr/source/xml/xmloff/source/xforms/XFormsInstanceContext.cxx#111. As you can see, a DomBuilderContext already exists (thanks dvo). OK, everything is there. The question is where to create the context.
Your debugger will tell you that http://www.go-oo.org/lxr/source/xml/xmloff/source/text/txtparai.cxx#1322 is the right place to create the child context.
OK; then we need to insert a case statement like in http://www.go-oo.org/lxr/source/xml/xmloff/source/text/txtparai.cxx#1322:
case XML_TOK_TEXT_CITATION:
{
    // build a DOM tree from the bib:citation element and its children
    DomBuilderContext* pInstance = new DomBuilderContext( GetImport(), nPrefix, rLocalName );
    // the resulting tree will be stored in: pInstance->getTree();
    pContext = pInstance;
    break;
}
This won't compile, since we have not yet defined what XML_TOK_TEXT_CITATION is. Looking, for example, at XML_TOK_TEXT_SPAN (http://www.go-oo.org/lxr/ident?i=XML_TOK_TEXT_SPAN), we can figure out what to do: we add XML_TOK_TEXT_CITATION to the end of http://www.go-oo.org/lxr/ident?i=XMLTextPElemTokens (but before XML_TOK_TEXT_P_ELEM_END), and then we add
{ XML_NAMESPACE_BIB, XML_CITATION, XML_TOK_TEXT_CITATION },
to http://www.go-oo.org/lxr/source/xml/xmloff/source/text/txtimp.cxx#245. Having done so we need to define XML_CITATION and XML_NAMESPACE_BIB. I'm sure you will figure out what to do...
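To make the token map idea more concrete, here is a minimal, self-contained sketch of the pattern: an enum of tokens with an end marker, a table mapping (namespace, local name) pairs to tokens, and a lookup. All names in it (NamespaceKey, ElemToken, TokenMapEntry, aElemTokenMap, lookupToken) are hypothetical stand-ins invented for illustration; they are not the real xmloff declarations, which you should look up via the links above.

```cpp
#include <cstring>

// Hypothetical stand-ins; these are NOT the real xmloff declarations.
enum NamespaceKey { NS_TEXT, NS_BIB };

// Element tokens; new tokens are added before the end marker.
enum ElemToken
{
    TOK_TEXT_SPAN,
    TOK_TEXT_CITATION,  // the newly added token
    TOK_ELEM_END        // end marker, also returned here for "no match"
};

// One row of the token map: (namespace, local name) -> token.
struct TokenMapEntry
{
    NamespaceKey nPrefix;
    const char*  pLocalName;
    ElemToken    nToken;
};

const TokenMapEntry aElemTokenMap[] =
{
    { NS_TEXT, "span",     TOK_TEXT_SPAN },
    { NS_BIB,  "citation", TOK_TEXT_CITATION }  // the newly added row
};

// Linear lookup of the token for a qualified element name.
ElemToken lookupToken( NamespaceKey nPrefix, const char* pLocalName )
{
    for ( const TokenMapEntry& rEntry : aElemTokenMap )
    {
        if ( rEntry.nPrefix == nPrefix &&
             std::strcmp( rEntry.pLocalName, pLocalName ) == 0 )
            return rEntry.nToken;
    }
    return TOK_ELEM_END;
}
```

In this sketch, lookupToken( NS_BIB, "citation" ) yields TOK_TEXT_CITATION, which is the value a switch statement like the one above would then dispatch on.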
Now we have the complete citation information stored in a DOM instance. Isn't that great AND easy? Stay tuned for the next steps.
Hope you had as much fun as I had.
Florian
( Dec 16 2005, 09:25:50 AM PST )

Thursday October 06, 2005
Loading a document
The code for loading a document in OpenOffice.org Writer can be found in sal_Bool SfxObjectShell::DoLoad( SfxMedium *pMed ) in the file sfx2/source/doc/objstor.cxx.
( Oct 06 2005, 07:27:26 AM PDT )

Tuesday October 04, 2005
OpenDocument and meta data
I'm new to RDF. There have been a lot of discussions in the OpenDocument TC about meta data in OpenDocument. I'd like to highlight what I have understood from the discussion:
- Stefano Mazzocchi wrote (forwarded by Bruce D'Arcus)
"NOTE: I'm not suggesting that, all I'm saying is: choose your battles. A syntax battle is not worth fighting. A model battle is not worth fighting either. The unique identification of symbols is the only one worth fighting for."
I love this statement and I totally agree with it.
-
Restricted RDF/RDFX/XMP/Constrained RDF
Restricted RDF/RDFX/XMP/Constrained RDF looks like a syntax/model battle to me. My understanding is that as long as we provide and specify a way to generate or derive triples from an OpenDocument file, all will be fine. I read with interest about the GRDDL approach as a way to specify how to derive triples from OpenDocument files.
-
XMP
I'd like some clarification of the following statement about XMP (posted by Duane Nickull):
"XMP does not "wrap around" RDF. XMP is expressed in a small subset of RDF. All valid XMP can be expressed in RDF. There is a lot of RDF that is not valid XMP."
Does RDF in that case stand for RDF triples or for the RDF/XML representation? Can every set of RDF triples be expressed in XMP?
-
unique identification of symbols/Making Statements about parts of the Content
Very interesting. Current OpenDocument meta data allows statements to be made about the current document, e.g. ("", dc:title, "Sample Document") or ("", dc:author, "Mr. X"), using <office:meta><dc:title>Sample Document</dc:title><dc:author>Mr. X</dc:author></office:meta> and an appropriate mapping.
Other interesting subjects may be parts of the document or of its content. This would require a URI naming convention for parts of the document, i.e. there must be a way to uniquely reference a part of the document or content.
-
Generic RDF store
In order to store RDF statements about various subjects (i.e. generic RDF triples) in OpenDocument, you need a place and a way to store generic RDF statements. For example, in the bibliographic project run by Bruce D'Arcus, an RDF store would be very helpful for storing bibliographic information.
Here the syntax/model question does seem to matter a great deal. More directly, the question arises whether to use a generic RDF/XML section, a restricted RDF/XML section, an XMP section, or an application-specific (e.g. bibliographic) XML dialect plus an appropriate mapping.
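To illustrate what "an appropriate mapping" from meta data to triples could look like, here is a minimal sketch. It assumes the meta data has already been read into a list of (element name, text) pairs; the names Triple, MetaEntry and deriveTriples are hypothetical and invented for this example, and the empty string as subject simply follows the ("", dc:title, "Sample Document") notation used above. Passing a different subject stands in for a statement about a part of the document, for which a URI naming convention would still have to be defined.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
struct Triple
{
    std::string subject;
    std::string predicate;
    std::string object;
};

// One meta data entry: (qualified element name, text content).
using MetaEntry = std::pair<std::string, std::string>;

// Map every meta data entry to one triple about the given subject.
// The default subject "" denotes the current document.
std::vector<Triple> deriveTriples( const std::vector<MetaEntry>& rMeta,
                                   const std::string& rSubject = "" )
{
    std::vector<Triple> aTriples;
    for ( const MetaEntry& rEntry : rMeta )
        aTriples.push_back( Triple{ rSubject, rEntry.first, rEntry.second } );
    return aTriples;
}
```

Called with the entries ("dc:title", "Sample Document") and ("dc:author", "Mr. X"), this yields the two triples from the example above.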
I'd like to get feedback,
Florian
( Oct 04 2005, 06:31:22 AM PDT )

Sunday October 02, 2005
Flat OpenDocument
As promised in my talk at the OpenOffice.org conference, I will make the ident.xsl XSL(T) script available.
To install the Flat OpenDocument filter open an empty OpenOffice.org/StarOffice Writer document; go to Tools/XSLT Filter Settings... and click on New.... Fill out the requested information as shown in the figures below:
Finally you can test the Flat OpenDocument filter by selecting Test XSL(T) Filter... and Current document, which will show you the Flat OpenDocument representation of the current document.
( Oct 02 2005, 06:28:58 AM PDT )

Friday March 11, 2005
Real Life Bugs!
For everybody interested in *real life bugs*: consider OpenOffice bug #i35653#. The story is as follows:
The attached RTF bug-doc contains the substring BEGINŽEND, but Writer only shows BEGIN and discards the suffix ŽEND.
The reason is a *real life bug*:
The Czech character Ž has the Unicode code point 381 (U+017D), which is correctly returned by the GetNextChar() function. In the source there is a line
nSlash = (sal_Char)GetNextChar();
which casts the Unicode character 381 to a sal_Char, which is an 8-bit character. Unfortunately, 381 cast to 8 bits is 125 (381 - 256 = 125), and 125 is the ASCII code for "}". Here comes the real life characteristic of the bug: "}" is a special RTF token which causes the RTF reader to stop.
There are so many Unicode characters. Why, for God's sake, must the Czech Ž, of all characters, turn into the special "}" RTF token when cast to 8 bits?
And the moral is: *Beware of casts!*
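The arithmetic is easy to reproduce outside of the RTF reader. The following is a minimal sketch, not the actual Writer code: Utf16Char and NarrowChar are hypothetical stand-ins for the 16-bit value returned by GetNextChar() and for the 8-bit sal_Char, and the helper functions are invented for illustration. NarrowChar is an unsigned 8-bit type here so that the narrowing is well defined everywhere.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the types involved in the bug.
using Utf16Char  = std::uint16_t;  // 16-bit character value
using NarrowChar = unsigned char;  // 8-bit character

// The buggy pattern: the cast silently keeps only the low 8 bits.
NarrowChar truncatingCast( Utf16Char c )
{
    return static_cast<NarrowChar>( c );
}

// True if the value can be narrowed to 8 bits without losing information.
bool fitsInEightBits( Utf16Char c )
{
    return c <= 0xFF;
}

// A more careful pattern: narrow only after checking the range.
NarrowChar checkedNarrow( Utf16Char c )
{
    assert( fitsInEightBits( c ) );  // fires in debug builds for values like 381
    return static_cast<NarrowChar>( c );
}
```

With these definitions, truncatingCast( 381 ) returns 125, which compares equal to '}', while fitsInEightBits( 381 ) is false. A range check like this before the cast would have flagged the Ž instead of turning it into an RTF token.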
( Mar 11 2005, 10:14:21 AM PST )