Jan (Hanz) Jancura's Blog

Thursday Mar 08, 2007

Language Embedding (in GLF)

Over the last month I have been working mainly on language embedding, so I have two screenshots to show. Language embedding is not easy and it raises many interesting issues, but that is a long story.

The first picture shows HTML with JavaScript and CSS inside a PHP file:



You can see that a block of PHP code can split a single token of another language (here JavaScript). This means that if you want to parse a language embedded inside PHP correctly, you have to preprocess the PHP file and join all the blocks of inner code.
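
The preprocessing step can be sketched in a few lines of Python. This is only an illustration, not GLF code: it assumes PHP blocks are delimited by `<?php` and `?>`, and the names `join_outer_text` and `offset_map` are made up for this example. It cuts the PHP blocks out, joins what is left, and remembers where each piece came from so that tokens can be mapped back to the original file.

```python
import re

# Assumption: PHP blocks are delimited by "<?php" ... "?>" (short open tags are ignored).
PHP_BLOCK = re.compile(r"<\?php.*?\?>", re.DOTALL)


def join_outer_text(source):
    """Cut PHP blocks out of source and join the remaining text.

    Returns the joined text and a list of (joined_offset, source_offset, length)
    triples, so a position in the joined text can be traced back to the file.
    """
    pieces = []
    offset_map = []
    joined_len = 0
    pos = 0
    for match in PHP_BLOCK.finditer(source):
        chunk = source[pos:match.start()]
        if chunk:
            offset_map.append((joined_len, pos, len(chunk)))
            pieces.append(chunk)
            joined_len += len(chunk)
        pos = match.end()
    tail = source[pos:]
    if tail:
        offset_map.append((joined_len, pos, len(tail)))
        pieces.append(tail)
    return "".join(pieces), offset_map


# The PHP block sits in the middle of one JavaScript string token.
source = '<script>var title = "Hello <?php echo $name; ?>!";</script>'
joined, offset_map = join_outer_text(source)
print(joined)
print(offset_map)
```

After joining, the JavaScript string literal is one piece again, so an ordinary JavaScript tokenizer can handle it.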

The current version of GLF (Generic Languages Framework) supports three types of embedding:

1) Token-based embedding: one token of the outer language can be reparsed by another language's parser, for example CSS in HTML. In this case the definitions of both languages could (theoretically) be merged together, so you could define one common tokenizer and one grammar for both languages.

HTML tokens, written as <token type, token text>:
<tag,"<style>">
<text,"a {\n  color: #454545;\n}">
<end_tag,"</style>">
The HTML text token in this example is replaced by several CSS tokens. Tokens of the outer language are not broken, so you can parse the outer language without any constraints.
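
Here is a minimal sketch of token-based embedding in Python. It is not how GLF implements it. It assumes the CSS sits in a `<style>` element, and the names `tokenize_css` and `embed_css` and the tiny CSS token set are invented for this example. The outer token stream stays intact; only the text token is swapped for the inner language's tokens.

```python
import re

# A deliberately tiny CSS tokenizer; real CSS needs many more token types.
CSS_TOKEN = re.compile(r"""
    (?P<whitespace>\s+)
  | (?P<color>\#[0-9a-fA-F]{3,8})
  | (?P<identifier>[A-Za-z_-][A-Za-z0-9_-]*)
  | (?P<operator>[{}:;,])
""", re.VERBOSE)


def tokenize_css(text):
    """Split CSS text into (type, text) tokens; unknown characters become errors."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = CSS_TOKEN.match(text, pos)
        if match is None:
            tokens.append(("css_error", text[pos]))
            pos += 1
            continue
        tokens.append(("css_" + match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def embed_css(html_tokens):
    """Replace each text token between <style> and </style> with CSS tokens."""
    result = []
    inside_style = False
    for token_type, text in html_tokens:
        if token_type == "tag" and text == "<style>":
            inside_style = True
        elif token_type == "end_tag" and text == "</style>":
            inside_style = False
        if inside_style and token_type == "text":
            result.extend(tokenize_css(text))
        else:
            result.append((token_type, text))
    return result


html_tokens = [
    ("tag", "<style>"),
    ("text", "a {\n  color: #454545;\n}"),
    ("end_tag", "</style>"),
]
embedded = embed_css(html_tokens)
for token in embedded:
    print(token)
```

Because the CSS tokens cover exactly the text of the token they replace, the HTML parser can still treat that span as one unit.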

2) Preprocessor-like (template) embedding: the outer language contains blocks of another language, and the borders between the languages are recognized by a preprocessor (PHP, Velocity, FreeMarker, EJS, RHTML, ...). An EJS (Phobos) example:


There are two kinds of JavaScript in this example: server-side and client-side. The output of server-side JavaScript can generate blocks of client-side JavaScript code :-)
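
To make the "borders recognized by a preprocessor" idea concrete, here is a rough Python sketch. It assumes EJS-style `<%` ... `%>` delimiters for the server-side code. The template text and the name `split_template` are made up for illustration and are not taken from Phobos or GLF.

```python
import re

# Assumption: "<%" ... "%>" marks server-side code, as in EJS-style templates.
SERVER_BLOCK = re.compile(r"<%(.*?)%>", re.DOTALL)


def split_template(source):
    """Split a template into ("outer", text) and ("server", code) segments."""
    segments = []
    pos = 0
    for match in SERVER_BLOCK.finditer(source):
        if match.start() > pos:
            segments.append(("outer", source[pos:match.start()]))
        segments.append(("server", match.group(1)))
        pos = match.end()
    if pos < len(source):
        segments.append(("outer", source[pos:]))
    return segments


# Hypothetical template: server-side code inside a client-side script.
template = '<script>var user = "<% print(name); %>";</script>'
segments = split_template(template)
for language, text in segments:
    print(language, repr(text))
```

Each "server" segment would go to the server-side JavaScript parser, while the "outer" segments hold the HTML with its client-side JavaScript.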

3) The last possibility is importing the definition of one GLF-based language into another one, or into some state of the outer language's tokenizer. So you can import the definition of the HTML language into the PHP language definition.

So, the GLF definition of PHP looks like:
IMPORT:html {
    mimeType:"text/html2";
    state: "DEFAULT";
}

IMPORT:php {
    mimeType:"text/x-php2";
    start:( "" );
    background_color:"#EEEEBB";
}
The first import loads the definition of HTML (with all the languages embedded in HTML) into the default state of the PHP tokenizer. The second import defines the "template" language (blocks between "" should be parsed by the php2 language). The "text/x-php2" language defines the structure of plain PHP.
