That is, the stereotypical blog website has pages listing article teasers in reverse chronological order, with index pages at URLs like http://example.com/path/to/index?page=## . The first script requests all of the ?page=## pages, saving the data for each teaser in a YAML file.
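For concreteness, here is a minimal sketch of how that index-scraping step could work. To be clear, this is not the code from articlescraper: the index URL, the CSS selectors, the field names, and the teasers.yaml file name are all assumptions for illustration, using Python with requests, BeautifulSoup, and PyYAML.

```python
# Hypothetical sketch only -- the URL, selectors, and field names below
# are illustrative assumptions, not the actual articlescraper configuration.
import requests
import yaml
from bs4 import BeautifulSoup

INDEX_URL = "http://example.com/path/to/index?page={}"

teasers = []
page = 1
while True:
    resp = requests.get(INDEX_URL.format(page))
    if resp.status_code != 200:
        break
    soup = BeautifulSoup(resp.text, "html.parser")
    entries = soup.select("article.teaser")   # assumed teaser selector
    if not entries:
        break   # an empty index page means we've walked off the end
    for entry in entries:
        link = entry.select_one("h2 a")       # assumed title/link selector
        summary = entry.select_one("p.summary")
        teasers.append({
            "title": link.get_text(strip=True),
            "url": link["href"],
            "summary": summary.get_text(strip=True) if summary else "",
        })
    page += 1

# One YAML file describing every teaser found across all index pages.
with open("teasers.yaml", "w") as f:
    yaml.safe_dump(teasers, f)
```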
The second script reads that YAML file and retrieves each article. It saves the article data in another YAML file, and also detects all the images in each article, downloading them as well.
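Again as an illustrative sketch rather than the real thing, the second step might look something like this. The articles.yaml and images/ names and the article-body selector are assumptions, not the actual articlescraper behavior.

```python
# Hypothetical sketch of the second step; selectors and file names are
# assumptions for illustration only.
import os
import requests
import yaml
from urllib.parse import urljoin, urlparse
from bs4 import BeautifulSoup

with open("teasers.yaml") as f:
    teasers = yaml.safe_load(f)

os.makedirs("images", exist_ok=True)
articles = []
for teaser in teasers:
    resp = requests.get(teaser["url"])
    soup = BeautifulSoup(resp.text, "html.parser")
    body = soup.select_one("div.article-body")   # assumed body selector
    if body is None:
        continue

    # Detect every image in the article body and download it.
    image_files = []
    for img in body.select("img"):
        src = img.get("src")
        if not src:
            continue
        src = urljoin(teaser["url"], src)
        fname = os.path.join("images", os.path.basename(urlparse(src).path))
        with open(fname, "wb") as out:
            out.write(requests.get(src).content)
        image_files.append(fname)

    articles.append({
        "title": teaser["title"],
        "url": teaser["url"],
        "html": str(body),
        "images": image_files,
    })

# A second YAML file holding the full article data.
with open("articles.yaml", "w") as f:
    yaml.safe_dump(articles, f)
```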
The documentation shows how to configure the scripts for Examiner.com, but they should be useful for other website technologies as well. I have a Blogger blog that I want to convert to something else, perhaps folding it into my WordPress blog, and that should be easy to accomplish just by adjusting the selectors in the configuration file.
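To give a feel for what "adjusting the selectors" means, here is a guess at what such a configuration could look like for a generic Blogger theme. The real format is documented in the repository; every key and selector value below is made up for illustration.

```yaml
# Illustrative only -- not the actual articlescraper configuration format.
# Selector values are guesses for a generic Blogger theme.
indexUrl: "http://example.blogspot.com/?page={page}"
selectors:
  teaser: "div.post-outer"
  teaserTitle: "h3.post-title a"
  teaserSummary: "div.post-summary"
  articleBody: "div.post-body.entry-content"
  articleImages: "div.post-body img"
```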
See here: https://github.com/robogeek/articlescraper
NOTE FOR EXAMINER AUTHORS: We own the copyright to our work posted on Examiner. Examiner never asserted ownership over our articles. That means we are fully within our rights to download our articles and resurrect them elsewhere.
Every so often a service will suddenly go belly-up, leaving the users of that service scrambling to save their files. At least one photo-sharing site did this, and in the process vaporized a bunch of pictures. In some cases those were the only copies of those pictures. Think of the proud father whose only picture of his child disappeared in a puff of smoke just because a business suddenly shut down. That's what is happening to us on Examiner.com right now.
You have until July 10 to retrieve your articles, or they'll be gone forever.