This is useful if you decide to make a web crawler and don't want to bother with an HTML parser. Read the body of the page into a string, then use this regex to extract all the links.
Supported link types: <img src=... and <a href=...
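
As a rough sketch of the kind of pattern described here (not necessarily the exact regex this post refers to), the following assumes Python's `re` module and attribute values wrapped in single or double quotes. The names `LINK_RE` and `extract_links` are made up for illustration.

```python
import re

# Hypothetical pattern: captures the src of <img> tags and the href of <a> tags.
# Assumes the attribute value is quoted with " or '; unquoted values are skipped.
LINK_RE = re.compile(
    r"""<(?:img\b[^>]*?\ssrc|a\b[^>]*?\shref)\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)

def extract_links(html):
    """Return every matching src/href value in the page body, in document order."""
    return LINK_RE.findall(html)
```

Calling `extract_links(body)` on the page string returns a plain list of URL strings. A regex like this will also pick up links inside HTML comments or scripts, which is the trade-off for skipping a real parser.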
An absolute URL is in the form of "http://something.com/blah"
A relative URL is in the form of "/something/path.blee"
Now you can figure out what to do with these...
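
One thing a crawler usually wants to do with these is turn relative URLs into absolute ones before fetching them. A minimal sketch, assuming you know the URL of the page the links came from and using the standard library's `urllib.parse`; the helper names `is_absolute` and `resolve` are hypothetical.

```python
from urllib.parse import urljoin, urlparse

def is_absolute(url):
    """True if the URL carries both a scheme and a host, e.g. http://something.com/blah."""
    parts = urlparse(url)
    return bool(parts.scheme and parts.netloc)

def resolve(page_url, link):
    """Resolve a link against the page it was found on.

    Absolute links come back unchanged; relative ones are joined to page_url.
    """
    return urljoin(page_url, link)
```

For example, resolving "/something/path.blee" against "http://something.com/blah" gives "http://something.com/something/path.blee".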