Say you want to split a document into words, but words like aren't shouldn't be split on the ' and numbers like 10,004,333 should remain intact, while all other punctuation is removed from the resulting word array.
A good way to do this is to use Scanner's findWithinHorizon with a regular expression. This way you don't need to read the entire document into memory before processing it.
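Here is a minimal sketch of that idea in Java. The class name WordSplitter, the method name words, and the regular expression itself are my own assumptions for illustration. The pattern treats a token as either digits with comma-separated groups or a run of letters and digits with apostrophes allowed only between them; everything else is skipped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and the pattern are illustrative only.
class WordSplitter {
    // First alternative: digits with comma-separated groups, e.g. 10,004,333.
    // Second alternative: letters/digits with internal apostrophes, e.g. aren't.
    private static final Pattern TOKEN =
        Pattern.compile("\\d+(?:,\\d+)*|[\\p{L}\\p{N}]+(?:'[\\p{L}\\p{N}]+)*");

    static List<String> words(Readable source) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            String token;
            // A horizon of 0 means the search is not bounded; the scanner
            // ignores its delimiters and reads more input as it needs it.
            while ((token = scanner.findWithinHorizon(TOKEN, 0)) != null) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "They aren't here; 10,004,333 items (really)!";
        System.out.println(words(new StringReader(text)));
    }
}
```

To process a file instead of a string, pass a reader such as java.nio.file.Files.newBufferedReader(path) to words. Collecting into a list is only for the demo; handling each token inside the loop keeps memory use low for large documents.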
Which language would this be? Java? Can I do the same in C#?