Web Scraping is Hard Because Sites Are Not Valid

Scraping the web is hard.

Matt Cutts says so:
http://www.mattcutts.com/blog/the-web-is-a-fuzz-test-patch-your-browser-and-y...

I've found this to be true.

A couple of implications:

It's hard to build a web crawler that can suck information out of pages reliably.
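
Here's a rough sketch of what coping with invalid markup can look like, assuming Python and its standard-library html.parser, which keeps going when tags are unclosed or misnested. The LinkCollector class, the extract_links function, and the BROKEN_PAGE sample are all made up for illustration; a real crawler would need a lot more than this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags without requiring valid HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return every link target found in a (possibly invalid) page."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical sample page: unclosed tags, an unquoted attribute,
# misnested inline elements, and no closing </body> or </html>.
BROKEN_PAGE = """
<html><body>
<p>Unclosed paragraph
<a href="/first">first link
<div><a href=second.html>unquoted attribute</a>
<b><i>misnested</b></i>
<a href='/third'>third</a>
"""

if __name__ == "__main__":
    for link in extract_links(BROKEN_PAGE):
        print(link)
```

The parser never checks whether the page is valid; it just reports the start tags it can recognize, which is about the best a scraper can hope for on real-world pages.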

Validation doesn't matter because Google doesn't penalize sites for invalid markup. And if Google doesn't care, you shouldn't either.
