Showing posts with label regex. Show all posts

Tuesday, April 15, 2008

common regex cookbook

I see nice regexes on the Java forums from time to time, so I am going to collect them all here for reuse!

Wednesday, December 05, 2007

regex to split a document into words

Say you wanted to split a document into words, but words like aren't and shouldn't should not be split on the apostrophe, and numbers like 10,004,333 should remain intact, while all other punctuation should be removed from the resulting word array.

A good way to do this is to use Scanner's findWithinHorizon with a regular expression. This way you don't need to read the entire document into memory before processing it.
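Here is a minimal sketch of that approach. The class name, method name, and the exact pattern are my own assumptions: a word is taken to be either letters with optional internal apostrophes, or digits with optional internal commas. Passing a horizon of 0 tells findWithinHorizon to search without a limit, so the Scanner only buffers as much input as it needs to find the next match.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class WordSplitter {
    // Assumed definition of a "word": letters with optional internal
    // apostrophes (aren't), or digits with optional internal commas (10,004,333).
    private static final Pattern WORD =
            Pattern.compile("\\p{L}+(?:'\\p{L}+)*|\\d+(?:,\\d+)*");

    // Pulls words out of the source one match at a time.
    static List<String> words(Readable source) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            String word;
            // Horizon 0 means "no limit": keep searching until the input ends.
            // findWithinHorizon returns null once no further match exists.
            while ((word = scanner.findWithinHorizon(WORD, 0)) != null) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "They aren't here; we counted 10,004,333 items (roughly).";
        System.out.println(words(new StringReader(text)));
    }
}
```

For a real document you would pass a FileReader (or any other Readable) instead of the StringReader, and probably handle each word inside the loop rather than collecting them all in a list.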

Tuesday, October 16, 2007

extract links from a webpage

This is useful if you decide to make a web crawler and don't want to bother with an HTML parser. You have to read the body of the page into a string, and then use this regex to extract all the links.

Supported link types: <img src=... and <a href=...
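A minimal sketch of such an extractor, assuming the attribute values are always quoted (single or double quotes). The class name, method name, and the pattern itself are illustrative guesses, not a robust HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Matches <a ... href="..."> and <img ... src="...">.
    // Group 1 is the opening quote, group 2 is the URL up to the matching quote.
    private static final Pattern LINK = Pattern.compile(
            "<(?:a\\b[^>]*?\\bhref|img\\b[^>]*?\\bsrc)\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns every href/src value found in the page body, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://something.com/blah\">x</a>"
                + "<img src='/something/path.blee'>";
        System.out.println(extractLinks(html));
    }
}
```

Unquoted attribute values and links built by scripts are not picked up, which is usually acceptable for a quick crawler.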


An absolute URL is in the form of "http://something.com/blah".
A relative URL is in the form of "/something/path.blee".

Now you can figure out what to do with these...
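One thing a crawler typically does with them is turn relative URLs into absolute ones before fetching. A small sketch using java.net.URI, where the class name, method name, and the page address are made up for illustration:

```java
import java.net.URI;

class LinkResolver {
    // Resolves a link against the URL of the page it was found on.
    // Absolute links come back unchanged; relative ones are completed.
    static URI toAbsolute(URI pageUrl, String link) {
        return pageUrl.resolve(link);
    }

    public static void main(String[] args) {
        // Hypothetical page address, used only for this example.
        URI page = URI.create("http://something.com/index.html");
        System.out.println(toAbsolute(page, "/something/path.blee"));
        System.out.println(toAbsolute(page, "http://something.com/blah"));
    }
}
```

Note that URI.resolve(String) throws IllegalArgumentException for strings that are not valid URIs, so links scraped from real pages may need a try/catch around the call.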

Monday, October 08, 2007

Remove HTML tags regex

When using an HTML parser is too much work, you may want to use a small regex to remove all the HTML tags.

This is Java:
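A minimal sketch of the idea, assuming a tag is anything between a < and the next >. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class TagStripper {
    // Assumption: a tag is "<", then anything that is not ">", then ">".
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Removes every tag and keeps the text between them.
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello <b>world</b></p>"));
    }
}
```

This leaves entities such as &amp; untouched, keeps the text inside script and style elements, and gets confused by a literal > inside an attribute value, so it is only good enough for quick cleanup jobs.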

Wednesday, February 07, 2007

Javascript: matching with regex

There seem to be 3 ways to use regular expressions in JavaScript:
1.
var val = document.getElementById("some_id").value;
val = val.replace(/^\s+|\s+$/g, ""); // trim using regex; replace() returns a new string, so assign it
2.
var alpha = /^[A-Za-z]+$/;
var val = document.getElementById("some_id").value;
if (alpha.test(val)) {
  // val matches alpha
}
3.
var val = document.getElementById("some_id").value;
var alpha = '((?:[a-z][a-z]+))'; // Word 1
var p = new RegExp(alpha, "i"); // flags are passed as a string
var m = p.exec(val);
if (m !== null) { // exec() returns null when there is no match
  var word = m[1];
  // word is capture group 1
}
If anyone knows a better way, feel free to share.