WWW::PDAScraper

WWW::PDAScraper is a Perl class for scraping PDA-friendly content from websites.

WWW::PDAScraper Ranking & Summary


  • License: Perl Artistic License
  • Price: FREE
  • Publisher Name: John Horner
  • Publisher web site: http://search.cpan.org/~codyp/WWW-PDAScraper-0.1/PDAScraper.pm


WWW::PDAScraper Description

WWW::PDAScraper is a Perl class for scraping PDA-friendly content from websites.

Synopsis

    use WWW::PDAScraper;
    my $scraper = WWW::PDAScraper->new( qw( NewScientist Yahoo::Entertainment ) );
    $scraper->scrape();

or

    use WWW::PDAScraper;
    my $scraper = WWW::PDAScraper->new;
    $scraper->scrape( qw( NewScientist Yahoo::Entertainment ) );

or

    perl -MWWW::PDAScraper -e "scrape qw( NewScientist Yahoo::Entertainment )"

Having written various kludgey scripts to download PDA-friendly content from various websites, I decided to try to write a generalised solution which would

* parse out the section of a news page which contains the links we want
* munge those links into the URL for the print-friendly version, if possible
* download those pages and make an index page for them

Moving the pages to your PDA is outside the scope of the module: the open-source browser and "distiller" Plucker, from http://plkr.org/, is recommended. Just get it to read the index.html file from disk with a depth of 1, using a URL like file:///path/to/index.html

The Sub-modules

WWW::PDAScraper takes its rules for scraping a particular website from a second module. For example, WWW::PDAScraper::Yahoo::Entertainment::TV contains the rules for scraping the Yahoo TV News website:

    package WWW::PDAScraper::Yahoo::Entertainment::TV;
    # WWW::PDAScraper.pm rules for scraping the
    # Yahoo TV website
    sub config {
        return {
            name       => 'Yahoo TV',
            start_from => 'http://news.yahoo.com/i/763',
            chunk_spec => ,
            url_regex  =>
        };
    }
    1;

A more or less random selection of modules is included, as well as a full set for Yahoo, to demonstrate a logical set of modules in categories.

Creating a new sub-module ought to be relatively simple; see the template provided, WWW::PDAScraper::Template.pm. You need name, start_from, then either chunk_spec or url_spec, then optionally a url_regex for transformation into the print-friendly URL.

Then either move your new module to the same location as the other ones on your system, or make sure it is available to your script with a line like

    use lib '/path/to/local/modules/PDAScraper/'

Requirements:

· Perl
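
The three steps listed in the description (cut out the section of the page holding the links, munge the links into print-friendly URLs, build an index page) can be sketched in core Perl roughly as follows. This is an illustration only, not the module's own code: the rule keys chunk_regex and the [pattern, replacement] form of url_regex are assumptions made for this sketch, the example.com URLs and page are invented, and the download step is left out so the sketch runs without a network connection.

```perl
use strict;
use warnings;

# Hypothetical rule set, loosely modelled on a sub-module's config() hash.
# chunk_regex and the pair form of url_regex are assumptions for this
# sketch, not the formats WWW::PDAScraper itself documents.
my $rules = {
    name        => 'Example News',
    chunk_regex => qr{<div id="stories">(.*?)</div>}s,
    url_regex   => [ qr{/story/}, '/print/' ],
};

# Step 1: cut out the part of the page that holds the story links.
sub extract_chunk {
    my ( $html, $rules ) = @_;
    return $html =~ $rules->{chunk_regex} ? $1 : '';
}

# Step 2: collect the links and rewrite each URL to its print-friendly form.
sub munge_links {
    my ( $chunk, $rules ) = @_;
    my ( $pattern, $replacement ) = @{ $rules->{url_regex} };
    my @links;
    while ( $chunk =~ m{<a\s+href="([^"]+)"[^>]*>(.*?)</a>}gs ) {
        my ( $url, $text ) = ( $1, $2 );
        $url =~ s/$pattern/$replacement/;
        push @links, { url => $url, text => $text };
    }
    return @links;
}

# Step 3: build a simple index page listing the rewritten links.
# (A real run would download each page first and link to the local copies.)
sub build_index {
    my ( $title, @links ) = @_;
    my $items = join '',
        map { qq{<li><a href="$_->{url}">$_->{text}</a></li>\n} } @links;
    return "<html><head><title>$title</title></head><body>\n"
         . "<ul>\n$items</ul>\n</body></html>\n";
}

# Invented sample page: one navigation block and one block of stories.
my $page = <<'HTML';
<html><body>
<div id="nav"><a href="/home">Home</a></div>
<div id="stories">
<a href="http://example.com/story/1">First story</a>
<a href="http://example.com/story/2">Second story</a>
</div>
</body></html>
HTML

my @links = munge_links( extract_chunk( $page, $rules ), $rules );
print build_index( $rules->{name}, @links );
```

Keeping the site-specific parts in a plain hash, as the sub-modules do, means the three functions stay generic and only the rules change from site to site.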

