Prasanna
Sunday Mar 16, 2008
Shell script: Matching the first occurrence of a string or a pattern in a file and exit
I was looking for a way to solve the problem above and, lazy as ever, googled for a solution, but couldn't find what I wanted (though there were while-loop solutions). If you need a one-line shell command that matches the first occurrence of a string (or a pattern) in the input and then exits, the following could be useful.
Input file: input.txt
Start
Line: 1
Line: 2
Line: 3
Line: 4
Line: 5
End
To print 1 (the second field of the first matching line, Line: 1) from the file above and then exit, use the grep and awk combination below, although it may not be the most efficient solution.
cat input.txt | grep "Line:" | awk '{ print $2; exit }'
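For comparison, here is a rough Python sketch of the same "first match, then stop" idea. The function name first_match_field and the sample list are made up for illustration; the field argument is 1-based to mirror awk's $2.

```python
import re
from typing import Iterable, Optional


def first_match_field(lines: Iterable[str], pattern: str, field: int = 2) -> Optional[str]:
    """Return one whitespace-separated field from the first line matching pattern.

    Returns None when no line matches, and "" when the matching line has
    fewer fields than requested (awk prints an empty string in that case).
    """
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            parts = line.split()
            # Stop at the first match instead of scanning the rest of the input.
            return parts[field - 1] if len(parts) >= field else ""
    return None


if __name__ == "__main__":
    sample = ["Start", "Line: 1", "Line: 2", "Line: 3", "End"]
    print(first_match_field(sample, r"Line:"))  # prints 1
```

Because the loop returns on the first hit, the rest of the input is never read, which is the same effect the exit in the awk script has.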
Posted at 01:57PM Mar 16, 2008 by prasanna in Useful Code | Comments[4]
Thursday Sep 20, 2007
OpenOffice Parser: Extracting text from OpenOffice documents
With OpenDocument formats gaining widespread acceptance, the lack of a simple text extractor for OpenOffice documents was my main motivation for developing this one. The code below extracts text from OpenOffice documents (odt, odp, etc.). I have used the JDOM XML APIs to make processing the OpenOffice XML easier. Hope this makes life a bit easier.
/*
* OpenOfficeParser.java
*
* Created on September 12, 2007, 4:24 PM
*
* To change this template, choose Tools | Template Manager
* and open the template in the editor.
*/
/**
*
* @author prasanna
*/
import java.io.InputStream;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.Text;
import org.jdom.input.SAXBuilder;
import java.util.zip.ZipFile;
import java.util.zip.ZipEntry;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;
public class OpenOfficeParser {
    StringBuffer TextBuffer;
    /** Creates a new instance of OpenOfficeParser */
    public OpenOfficeParser() {}
    //Process text elements recursively
    public void processElement(Object o) {
        if (o instanceof Element) {
            Element e = (Element) o;
            String elementName = e.getQualifiedName();
            if (elementName.startsWith("text")) {
                if (elementName.equals("text:tab")) // add tab for text:tab
                    TextBuffer.append("\t");
                else if (elementName.equals("text:s")) // add space for text:s
                    TextBuffer.append(" ");
                else {
                    List children = e.getContent();
                    Iterator iterator = children.iterator();
                    while (iterator.hasNext()) {
                        Object child = iterator.next();
                        //If Child is a Text Node, then append the text
                        if (child instanceof Text) {
                            Text t = (Text) child;
                            TextBuffer.append(t.getValue());
                        }
                        else
                            processElement(child); // Recursively process the child element
                    }
                }
                if (elementName.equals("text:p"))
                    TextBuffer.append("\n");
            }
            else {
                List non_text_list = e.getContent();
                Iterator it = non_text_list.iterator();
                while (it.hasNext()) {
                    Object non_text_child = it.next();
                    processElement(non_text_child);
                }
            }
        }
    }
    public String getText(String fileName) throws Exception {
        TextBuffer = new StringBuffer();
        //Unzip the openOffice Document
        ZipFile zipFile = new ZipFile(fileName);
        Enumeration entries = zipFile.entries();
        ZipEntry entry;
        while(entries.hasMoreElements()) {
            entry = (ZipEntry) entries.nextElement();
            if (entry.getName().equals("content.xml")) {
                TextBuffer = new StringBuffer();
                SAXBuilder sax = new SAXBuilder();
                Document doc = sax.build(zipFile.getInputStream(entry));
                Element rootElement = doc.getRootElement();
                processElement(rootElement);
                break;
            }
        }
        System.out.println("The text extracted from the OpenOffice document = " + TextBuffer.toString());
        return TextBuffer.toString();
    }
    public static void main(String args[]) throws Exception
    {
        new OpenOfficeParser().getText("OpenDocumentFile.odt");
    }
}
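The same walk over content.xml can be sketched in Python with only the standard library. This is an illustration under assumptions, not a port of the class above: the names extract_text and _collect are made up, and the sketch matches elements by the OpenDocument text namespace URI rather than by the "text" prefix. Like the Java version, it emits a single space for text:s.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace URI that the "text:" prefix is bound to in OpenDocument files.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
TAB = "{%s}tab" % TEXT_NS
SPACE = "{%s}s" % TEXT_NS
PARAGRAPH = "{%s}p" % TEXT_NS


def _collect(elem, out):
    """Append the text found under elem to the list out, recursively."""
    if elem.tag == TAB:
        out.append("\t")
        return
    if elem.tag == SPACE:
        out.append(" ")
        return
    in_text_ns = elem.tag.startswith("{" + TEXT_NS + "}")
    # Character data is kept only when it sits directly inside a text:* element.
    if in_text_ns and elem.text:
        out.append(elem.text)
    for child in elem:
        _collect(child, out)
        if in_text_ns and child.tail:
            out.append(child.tail)
    if elem.tag == PARAGRAPH:
        out.append("\n")


def extract_text(odf_file):
    """Return the text of an OpenDocument file given a path or a file object."""
    with zipfile.ZipFile(odf_file) as archive:
        with archive.open("content.xml") as content:
            root = ET.parse(content).getroot()
    out = []
    _collect(root, out)
    return "".join(out)
```

Calling extract_text("OpenDocumentFile.odt") would mirror the main method above, assuming such a file exists in the working directory.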
Posted at 02:01AM Sep 20, 2007 by prasanna in Useful Code | Comments[7]
Sunday Sep 09, 2007
Division of two numbers without using division operator
I had been looking for an efficient solution to this problem for some time and came up with this. The logic is simple: left shift the divisor (multiply it by 2) until one more shift would make it exceed the dividend, doubling a partial quotient along the way. Then repeat the routine with the difference between the dividend and the shifted divisor as the new dividend, together with the original divisor, until the dividend is less than the divisor or the difference is zero. It's similar to the way binary search is used to find an element in a sorted list. If that sounds confusing, walk through the recursive procedure in Python below.
#Division of two numbers without using division operator
dividend = int(raw_input("Enter the dividend:"))
divisor = int(raw_input("Enter the divisor:"))
tempdivisor = divisor
remainder = 0
def division (dividend, divisor):
    global remainder
    quotient = 1
    if divisor == dividend:
        remainder = 0
        return 1
    elif dividend < divisor:
        remainder = dividend
        return 0
    while divisor <= dividend:
        #Here divisor < dividend, therefore left shift (multiply by 2) divisor and quotient
        divisor = divisor << 1
        quotient = quotient << 1
    #We have reached the point where divisor > dividend, therefore divide divisor and quotient by two
    divisor = divisor >> 1
    quotient = quotient >> 1
    #Call division recursively for the difference to get the exact quotient
    quotient = quotient + division(dividend - divisor, tempdivisor)
    return quotient
print "%s / %s: quotient = %s" % (dividend, tempdivisor, division(dividend, divisor))
print "%s / %s: remainder = %s" % (dividend, tempdivisor, remainder)
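A variation on the same shift-and-subtract idea, sketched in Python 3. It differs from the script above in ways that are my own choices rather than part of the original: it loops instead of recursing, returns the remainder instead of storing it in a global, and rejects a zero or negative divisor (which would otherwise loop forever). The name divide is made up for illustration.

```python
from typing import Tuple


def divide(dividend: int, divisor: int) -> Tuple[int, int]:
    """Return (quotient, remainder) using only shifts, additions and subtractions.

    Assumes dividend >= 0 and divisor > 0.
    """
    if divisor <= 0 or dividend < 0:
        raise ValueError("this sketch assumes dividend >= 0 and divisor > 0")
    quotient = 0
    while dividend >= divisor:
        shifted, multiple = divisor, 1
        # Double the divisor while the doubled value still fits in the dividend.
        while (shifted << 1) <= dividend:
            shifted <<= 1
            multiple <<= 1
        # Take out the largest divisor * 2**k found, then repeat on what is left.
        dividend -= shifted
        quotient += multiple
    return quotient, dividend


if __name__ == "__main__":
    print(divide(100, 7))  # prints (14, 2)
```

For 100 / 7 the loop subtracts 56 (7 * 8), then 28 (7 * 4), then 14 (7 * 2), which gives a quotient of 8 + 4 + 2 = 14 and leaves a remainder of 2.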
Posted at 07:00PM Sep 09, 2007 by prasanna in Useful Code | Comments[4]
Monday Aug 27, 2007
Java Regular Expressions: Validating HTTP GET URIs, fetching GET Parameters and values
I was wondering whether Java regular expressions could check the validity of the query part of an HTTP GET URI and, if it is valid, fetch all the GET parameters and their values. Fortunately, after some time hacking around with Java REs, I found an easy way to do this, though I am not sure it is efficient.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class URIMatcher {
    public static void main (String args[]) {
        String query = "Param1=1&Param2=23&Param3=3335&Param4=hello&Param5=&Param6=world";
        // Whole-string check: zero or more "name=value&" pairs followed by a final "name=value"
        Pattern ValidURI = Pattern.compile("(?:([a-zA-Z0-9]+)=([^=&]*)&)*([a-zA-Z0-9]+)=([^=&]*)");
        // Single pair: group 1 is the parameter name, group 2 is its (possibly empty) value
        Pattern getValues = Pattern.compile("([a-zA-Z0-9]+)=([^=&]*)&*");
        Matcher ValidURIMatch = ValidURI.matcher(query);
        Matcher getParams = getValues.matcher(query);
        if (ValidURIMatch.matches()) {
            while(getParams.find())
                System.out.println("Name = " + getParams.group(1) + " Value = " + getParams.group(2));
        } else {
            System.out.println("URI is not valid");
        }
    }
}
The first pattern accepts only a valid URI (inputs such as Param1=& or Param1=hello&Param2 are invalid and are rejected). From a valid URI, the second pattern fetches all the GET parameters and their values. For the example above, the output is:
Name = Param1 Value = 1
Name = Param2 Value = 23
Name = Param3 Value = 3335
Name = Param4 Value = hello
Name = Param5 Value =
Name = Param6 Value = world
Regular Expressions are really powerful indeed!
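The same two-pattern approach can be sketched in Python with the re module. The names VALID_QUERY, PAIR and parse_query are made up for illustration, and the patterns are the ones above with the unused capture groups and the trailing &* dropped.

```python
import re

# Whole-string check: zero or more "name=value&" pairs, then a final "name=value".
VALID_QUERY = re.compile(r"(?:[a-zA-Z0-9]+=[^=&]*&)*[a-zA-Z0-9]+=[^=&]*")
# One pair: group 1 is the parameter name, group 2 is its (possibly empty) value.
PAIR = re.compile(r"([a-zA-Z0-9]+)=([^=&]*)")


def parse_query(query):
    """Return a list of (name, value) tuples, or None if the query is invalid."""
    if VALID_QUERY.fullmatch(query) is None:
        return None
    return [(m.group(1), m.group(2)) for m in PAIR.finditer(query)]


if __name__ == "__main__":
    print(parse_query("Param4=hello&Param5=&Param6=world"))
    print(parse_query("Param1=hello&Param2"))  # prints None
```

Here fullmatch plays the role of Matcher.matches() and finditer plays the role of the find() loop. Neither version decodes percent-escapes; in Python, urllib.parse.parse_qsl from the standard library does that, though it does not perform this kind of strict validation by default.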
Posted at 10:53PM Aug 27, 2007 by prasanna in Useful Code | Comments[1]

