Note to self
Code, cooking, catman
Friday, January 26, 2007
Java: find nested balanced tags
Problem:
I needed to strip some divs out of an HTML page, and Java's regex can't do it, at least not when there is an arbitrary number of nested, balanced elements.
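To see the problem, here is a small sketch of what a reluctant regex does with a nested div. The class name, method name and sample HTML are made up for illustration; only java.util.regex is assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedDivRegexDemo {
    // Returns the first match of a reluctant "<div ...>...</div>" pattern, or null.
    static String firstDivMatch(String html) {
        Pattern p = Pattern.compile("<div[^>]*>.*?</div>", Pattern.DOTALL);
        Matcher m = p.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<div id=\"outer\">a<div>b</div>c</div>tail";
        // The match stops at the first </div>, which closes the INNER div,
        // so the outer div is cut short and "c</div>" is left behind.
        System.out.println(firstDivMatch(html));
    }
}
```

The pattern has no way to count how many divs are still open, so it cannot tell which closing tag belongs to the outer element.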
Solution:
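import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Removes the first <div class="clazz" id="id"> element, including any nested divs.
private String removeDivContents(String id, String clazz, String text) {
    // The opening tag must match exactly: class first, then id, double quotes.
    String div = "<div class=\"" + clazz.toLowerCase()
            + "\" id=\"" + id.toLowerCase() + "\">";
    String textl = text.toLowerCase(); // search a lowercased copy, cut from the original
    int index = textl.indexOf(div);
    if (index == -1) {
        return text; // no such div, nothing to remove
    }
    int loc = index + div.length();
    int end = findBalancedClosingTag(loc, textl, "<div", "</div>", 1);
    return text.substring(0, index) + text.substring(end);
}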
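// Returns the index just past the end tag that balances unClosedStarts open tags.
private int findBalancedClosingTag(int start, String textl, String startTag,
        String endTag, int unClosedStarts) {
    int index = textl.indexOf(endTag, start);
    if (index == -1) {
        return start; // no closing tag left: the input is unbalanced, give up here
    }
    // Quote the tag so it is matched literally instead of being read as a regex.
    Pattern p = Pattern.compile(Pattern.quote(startTag));
    Matcher m = p.matcher(textl.substring(start, index));
    unClosedStarts--; // the end tag we just found closes ONE start tag
    while (m.find()) {
        unClosedStarts++; // every start tag before it opens another level
    }
    if (unClosedStarts == 0) {
        return index + endTag.length();
    } else {
        return findBalancedClosingTag(index + endTag.length(), textl,
                startTag, endTag, unClosedStarts);
    }
}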
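The same search can also be written as a loop with a depth counter, which avoids both the regex and the recursion. This is only a sketch: the names BalancedTagDemo, findBalancedEnd and removeFirst are made up for this example, and unlike the code above it returns -1 for unbalanced input and leaves the text untouched in that case.

```java
import java.util.Locale;

class BalancedTagDemo {
    // Returns the index just past the end tag that balances `depth` open tags,
    // scanning from `from`; returns -1 if the tags never balance.
    static int findBalancedEnd(String text, int from, String startTag,
                               String endTag, int depth) {
        int pos = from;
        while (depth > 0) {
            int nextStart = text.indexOf(startTag, pos);
            int nextEnd = text.indexOf(endTag, pos);
            if (nextEnd == -1) {
                return -1; // ran out of closing tags: unbalanced input
            }
            if (nextStart != -1 && nextStart < nextEnd) {
                depth++; // a nested start tag comes first
                pos = nextStart + startTag.length();
            } else {
                depth--; // a closing tag comes first
                pos = nextEnd + endTag.length();
            }
        }
        return pos;
    }

    // Hypothetical helper: removes the first div whose opening tag equals openTag
    // (case-insensitively), together with everything nested inside it.
    static String removeFirst(String html, String openTag) {
        String lower = html.toLowerCase(Locale.ROOT);
        int index = lower.indexOf(openTag.toLowerCase(Locale.ROOT));
        if (index == -1) {
            return html;
        }
        int end = findBalancedEnd(lower, index + openTag.length(), "<div", "</div>", 1);
        if (end == -1) {
            return html; // unbalanced: leave the text alone
        }
        return html.substring(0, index) + html.substring(end);
    }

    public static void main(String[] args) {
        String html = "<p>x</p><div class=\"ad\" id=\"top\">a<div>b</div>c</div><p>y</p>";
        // prints <p>x</p><p>y</p>
        System.out.println(removeFirst(html, "<div class=\"ad\" id=\"top\">"));
    }
}
```

Like the original, it still matches the opening tag as an exact string, so a div with extra attributes or a different attribute order will not be found.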
Use at your own risk.