I see nice regexes on the Java forums from time to time, so I am going to add them all here for later use!
Tuesday, April 15, 2008
Wednesday, December 05, 2007
regex to split a document into words
Say you wanted to split a document into words, but words like aren't and shouldn't should not be split on the apostrophe, and numbers like 10,004,333 should remain intact, while all other punctuation should be removed from the resulting word array.
A good way to do this is to use Scanner's findWithinHorizon with a regular expression. This way you don't need to read the entire document into memory before processing it.
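The original snippet for this post is not shown here, so the following is only a minimal sketch of the idea, not the forum's exact regex. The class name WordSplitter, the method words and the pattern itself are my own assumptions: a word is either a run of digits with comma-separated groups, or a run of letters with optional apostrophe parts.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
final class WordSplitter {
    // Assumed pattern: digits with comma groups (10,004,333),
    // or letters with apostrophe parts (aren't, shouldn't).
    private static final Pattern WORD =
            Pattern.compile("\\d+(?:,\\d+)*|[A-Za-z]+(?:'[A-Za-z]+)*");

    // Pulls words out of the source one match at a time.
    static List<String> words(Readable source) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            String word;
            // A horizon of 0 means "no bound"; null means no further match.
            while ((word = scanner.findWithinHorizon(WORD, 0)) != null) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "They aren't here; we paid 10,004,333 dollars!";
        System.out.println(words(new StringReader(text)));
    }
}
```

For a real document you would pass a FileReader (or any other Readable) instead of the StringReader used in main.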
Tuesday, October 16, 2007
extract links from a webpage
This is useful if you decide to make a web crawler and don't want to bother with an HTML parser. You have to read the body of the page into a string, and then use this regex to extract all the links.
Supported link types: <img src=... and <a href=...
An absolute URL is in the form of "http://something.com/blah".
A relative URL is in the form of "/something/path.blee".
Now you can figure out what to do with these...
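The regex from the original post did not survive on this page, so here is a rough sketch of what such an extractor could look like. The class LinkExtractor, its methods and the pattern are assumptions of mine, and the pattern only covers the two link types listed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
final class LinkExtractor {
    // Assumed pattern: an <a ...> or <img ...> tag, then href= or src=,
    // then the URL with or without quotes (captured as group 1).
    private static final Pattern LINK = Pattern.compile(
            "<(?:a|img)\\b[^>]*?\\b(?:href|src)\\s*=\\s*[\"']?([^\"'\\s>]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns every href/src value found in the page body, in order.
    static List<String> links(String html) {
        List<String> found = new ArrayList<>();
        Matcher matcher = LINK.matcher(html);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    // Absolute URLs start with a scheme; anything else is treated as relative.
    static boolean isAbsolute(String url) {
        return url.matches("(?i)https?://.*");
    }

    public static void main(String[] args) {
        String page = "<a href=\"http://something.com/blah\">x</a>"
                + " <img src='/something/path.blee'>";
        for (String url : links(page)) {
            System.out.println((isAbsolute(url) ? "absolute: " : "relative: ") + url);
        }
    }
}
```

A regex like this will miss or mangle unusual markup (links inside comments, attributes such as data-href), which is the price of skipping the HTML parser.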
Monday, October 08, 2007
Remove HTML tags regex
When using an HTML parser is too much work, you may want to use a small regex to remove all the HTML tags.
This is Java:
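The snippet itself is missing from this page, so what follows is a minimal sketch rather than the original code. The class TagStripper and the pattern <[^>]*> are my assumptions; the pattern simply treats everything between a < and the next > as a tag.

```java
// Hypothetical helper, named for illustration only.
final class TagStripper {
    // Removes anything that looks like <...>; the text between tags is kept.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello <b>world</b>!</p>"));
    }
}
```

Note that this leaves entities like &amp; untouched and also keeps the contents of script and style elements, so it only suits simple pages.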
Wednesday, February 07, 2007
JavaScript: matching with regex
There seem to be 3 ways to use regular expressions in JavaScript:
1.
var val = document.getElementById("some_id").value;
val = val.replace(/^\s+|\s+$/g, ""); //trim using regex; replace returns a new string, so assign it back
2.
var alpha = /^[A-Za-z]+$/ ;
var val = document.getElementById("some_id").value;
if(alpha.test(val)){
//val matches alpha
}
3.
var val = document.getElementById("some_id").value;
var alpha = '((?:[a-z][a-z]+))'; // Word 1
var p = new RegExp(alpha, "i"); // flags are passed as a string
var m = p.exec(val); // exec returns null when nothing matches
if (m !== null)
{
var word = m[1];
//word is the capture group 1
}
If anyone knows a better way, feel free to share.