Bistro!
Alexis Moussine-Pouchkine's Weblog
public enum Topic { Java, GlassFish, Tools, Sun, InFrenchInZeText, SDPY }

Thursday, April 21, 2005

Trying to KISS
I'm sure a lot of you have had to parse HTML using all sorts of String manipulations, most likely in a web app. It is a very typical problem with no easy solution given the nature of hand-written HTML (rarely a well-formed document to start with). You end up doing things like:

while ( (stringIndex = result.toString().toLowerCase().indexOf(myTags[j])) != -1 ) {
   ...
}


which is really buggy by nature.
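To make the "buggy by nature" point concrete, here is a minimal sketch of what an indexOf-based scan can and cannot see. The class and method names (NaiveTagSearch, seemsToContain) are hypothetical and only serve the illustration:

```java
class NaiveTagSearch {
    // Hypothetical helper: reports whether the lower-cased markup contains
    // the given tag text anywhere, which is all an indexOf scan can tell.
    static boolean seemsToContain(String html, String tag) {
        return html.toLowerCase().indexOf(tag) != -1;
    }

    public static void main(String[] args) {
        // The only "<br>" here sits inside a comment, yet the scan reports a match.
        System.out.println(seemsToContain("<!-- no <br> here --><p>text</p>", "<br>"));
        // A real line break written as "<BR />" is missed entirely.
        System.out.println(seemsToContain("line one<BR />line two", "<br>"));
    }
}
```

The scan knows nothing about comments, attributes, or spelling variants of a tag, so every special case ends up as one more string test.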

Consider a use case where you need to trim some HTML content. One of the biggest issues is that you must not drop closing tags, otherwise the rest of the page inherits the style of a tag left open by the trimmed content (a blog entry embedded in a page comes to mind). At the same time you don't want raw HTML tags (such as <br>) to show up as text after the trimming has occurred. You also don't want to search indefinitely for closing tags that do not exist.
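A quick sketch of the unclosed-tag problem, assuming the trimming is done with a plain substring on the markup. NaiveCut and cut are hypothetical names used only for this illustration:

```java
class NaiveCut {
    // Hypothetical helper: cuts the raw markup after n characters,
    // with no knowledge of where tags start or end.
    static String cut(String html, int n) {
        return html.substring(0, Math.min(n, html.length()));
    }

    public static void main(String[] args) {
        String entry = "<p>Hello <b>world</b></p>";
        // Prints "<p>Hello <b>wo": both <p> and <b> are left open.
        System.out.println(cut(entry, 14));
    }
}
```

Anything rendered after that fragment would stay bold until some later tag happens to close it.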

So while you could use this or that HTML parser, you could also use the one that comes with Swing, which has been part of every JVM since 2000, and write something much safer using a single Swing component:

// Constructor
// ('jEditorPane' and 'entries' are fields of the class, declared elsewhere)
public SilentSwingHTMLTrim() {
   jEditorPane = new javax.swing.JEditorPane();
   jEditorPane.setContentType("text/html");
   for (int i = 0; i < entries.length; i++) {
     // trim after 200 characters
     String truncatedEntry = doTrim(entries[i], 200);
     System.out.println(truncatedEntry);
   }
}

// A few lines to do the job
private String doTrim(String entry, int trimSize) {
   jEditorPane.setText(entry);
   // Select 'trimSize' characters from the rendered HTML content
   jEditorPane.select(0, trimSize);
   String result = jEditorPane.getSelectedText();
   // getSelectedText() returns null when nothing is selected (e.g. empty content)
   if (result == null)
     return "";
   // A full-length selection means the content was cut, so mark it
   if (result.length() == trimSize)
     result += "...";
   // return the selected text
   return result;
}


Of course you never call setVisible(true) on the editor pane, and all you get back is plain text (no links, no formatting, etc.). This is a very basic use of the parser. More advanced use is described here. Another limitation is that Swing's HTML parser only supports HTML 3.2.
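For a step beyond the editor pane, here is a minimal sketch that drives Swing's parser directly through ParserDelegator and HTMLEditorKit.ParserCallback, collecting only the text content. The class and method names (HtmlTextTrim, trim) are hypothetical, this is not necessarily the approach the linked article takes, and no separator is inserted between blocks of text:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class HtmlTextTrim {
    // Hypothetical helper: returns the text content of 'html', cut to
    // 'trimSize' characters with "..." appended when something was cut.
    static String trim(String html, int trimSize) {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called for each run of text; tags never reach this method
                text.append(data);
            }
        };
        try {
            // 'true' tells the parser to ignore any charset declared in the markup
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (text.length() > trimSize) {
            return text.substring(0, trimSize) + "...";
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(trim("<p>Hello <b>world</b></p>", 100));
        System.out.println(trim("<p>abcdefghij</p>", 4));
    }
}
```

No component is created at all here, so the trade-off is the same as above (plain text only, HTML 3.2 parsing) with a little more code and a little more control.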

So this is no silver bullet, just an attempt to keep it simple, stupid.


Maybe this would work pretty nicely for spell-checking HTML source code...

(Apr. 21, 2005, 10:03:37 AM CEST)
