Sunday, July 13, 2008
Scraping the web is hard.

Matt Cutts says so:
http://www.mattcutts.com/blog/the-web-is-a-fuzz-test-patch-your-browser-and-your-web-server/

I've found this to be true.

A couple of implications:

It's hard to build a web crawler that can suck information out of pages reliably.

HTML validation doesn't matter for search ranking, because Google doesn't penalize pages for invalid markup. And if Google doesn't care, you shouldn't either.
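
To make the crawler point concrete, here is a minimal sketch of pulling links out of sloppy markup with Python's standard html.parser module, which tries to keep going instead of failing on broken HTML. The LinkExtractor class, the extract_links helper, and the MESSY_PAGE sample are all made up for illustration. Real pages break in far more ways than this one does.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags, tolerating broken markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Hypothetical helper: return every href found in the given HTML."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical messy page: unclosed tags, an unquoted attribute, mixed case.
MESSY_PAGE = """
<html><body>
<p>First paragraph with no closing tag
<A HREF=http://example.com/one>one
<a href="http://example.com/two">two</a>
<div><b>unclosed bold
</body>
"""

links = extract_links(MESSY_PAGE)
print(links)
```

Even on this small sample, none of the markup would pass a validator, yet both links are still recoverable. That is roughly the situation a crawler is in on most of the web.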


Sunday, July 13, 2008 7:54:28 PM (Central Standard Time, UTC-06:00)