Jul 13
Web Scraping is Hard Because Sites Are Not Valid
Scraping the web is hard. Matt Cutts says so: http://www.mattcutts.com/blog/the-web-is-a-fuzz-test-patch-your-browser-and-y...

I've found this to be true. A couple of implications:

It's hard to build a web crawler that can suck information out of pages reliably, because so many pages are not valid HTML.

Validation doesn't matter for ranking because Google doesn't penalize for it. And if Google doesn't care, you shouldn't either.
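To make the first implication concrete, here is a minimal sketch of the kind of tolerance a scraper needs. It is not from Matt's post; it assumes Python and its standard-library `html.parser`, and the `LinkCollector` class, the `extract_links` helper, and the `BROKEN_PAGE` markup are all made up for illustration. The idea is to react only to the start tags you care about and never rely on tags being closed or nested correctly.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect href values from <a> start tags.

    It only looks at start tags, so missing end tags and bad nesting
    in the page do not matter.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return every href found in the given markup, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up invalid markup: unclosed <p>, <b> and <a> tags, an unquoted
# attribute value, uppercase tag names, and no </body> or </html>.
BROKEN_PAGE = """
<html><body>
<p>First paragraph <b>bold
<a href=one.html>one
<p>Second <a href="/two">two</a>
<A HREF='/three'>three
"""

if __name__ == "__main__":
    print(extract_links(BROKEN_PAGE))
```

Even this only covers the easy cases. Real pages break in far more creative ways, which is probably why scrapers tend to lean on parsers built to recover the way browsers do.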
