Regular Expression not working in scraper?
This is a very common problem for beginners who try to write a web crawler / spider / scraper. The content is fetched, but the regex does not match anything. :(
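But the problem is usually not with the regular expression. The fetched page contains line breaks, and by default "." in a regex does not match a newline, so a pattern that spans more than one line fails. You just need to add the following two lines after you fetch the content of a web page: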
content = content.replace("\n", "")
content = content.replace("\r", "")
Now the regex should work, if everything else is OK!
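Here is a minimal sketch of the effect, assuming Python's re module. The sample content string and the title pattern are made up for illustration. It also tries re.DOTALL, an alternative to the two-line fix that lets "." match newlines without changing the content.

```python
import re

# Hypothetical page content; a real scraper would fetch this over HTTP.
content = "<title>\r\nMy Page\r\n</title>"
pattern = r"<title>(.*?)</title>"

# '.' does not match '\n' by default, so the multi-line title is missed.
before = re.search(pattern, content)

# The two-line fix: strip the line breaks, then match again.
cleaned = content.replace("\n", "")
cleaned = cleaned.replace("\r", "")
after = re.search(pattern, cleaned)

# Alternative (an assumption, not the fix described above):
# leave the content alone and let '.' match newlines too.
dotall = re.search(pattern, content, re.DOTALL)

print(before)                   # no match on the raw content
print(after.group(1))           # title text after stripping line breaks
print(dotall.group(1).strip())  # same text, whitespace trimmed afterwards
```

Stripping the line breaks changes the content itself, which is fine for simple extraction. If the line breaks matter later, for example inside a pre block, the flag keeps them intact.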
Comments
Besides this, it's good to use domxml to parse the content. It won't fail.