dilbertone Posted February 26, 2011 Share Posted February 26, 2011 good evening dear community! Howdy, at the moment i am debugging some lines of code... purpose: i want to process multiple webpages, kind of like a web spider/crawler might. I have some bits - but now i need to have some improved spider-logic. See the target-url http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50 This page has got more than 6000 results! Well how do i get all the results? I use the module LWP::simple and i need to have some improved arguments that i can use in order to get all the 6150 records Attempt: Here are the first 5 page URLs: http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=0 http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=50 http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=100 http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=150 http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=200 We can see that the "s" attribute in the URL starts at 0 for page 1, then increases by 50 for each page there after. We can use this information to create a loop: #!/usr/bin/perl use warnings; use strict; use LWP::Simple; use HTML::TableExtract; use Text::CSV; my @cols = qw( rownum number name phone type website ); my @fields = qw( rownum number name street postal town phone fax type website ); my $i_first = "0"; my $i_last = "6100"; my $i_interval = "50"; for (my $i = $i_first; $i <= $i_last; $i += $i_interval) { my $html = get("http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=$i"); $html =~ tr/r//d; # strip the carriage returns $html =~ s/ / /g; # expand the spaces my $te = new HTML::TableExtract(); $te->parse($html); my $csv = Text::CSV->new({ binary => 1 }); foreach my $ts ($te->table_states) { foreach my $row ($ts->rows) { #trim leading/trailing whitespace from base fields s/^s+//, s/\s+$// for @$row; #load the fields into the hash using a "hash slice" my %h; @h{@cols} = @$row; #derive some fields from base fields, again using a hash slice @h{qw/name street postal town/} = split /n+/, $h{name}; @h{qw/phone fax/} = split /n+/, $h{phone}; #trim leading/trailing whitespace from derived fields s/^s+//, s/\s+$// for @h{qw/name street postal town/}; $csv->combine(@h{@fields}); print $csv->string, "\n"; } } } i tested the code and get the following results: .- see below - the error message shown in the command line... btw: here the lines 57 and 58: #trim leading/trailing whitespace from derived fields s/^s+//, s/\s+$// for @h{qw/name street postal town/}; what do you think? Sta�e PLZ Ot",,,Telefo,Fax,Schulat,Webseite Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. "lfd. N.",Schul-numme,Schul,"ame Sta�e PLZ Ot",,,Telefo,Fax,Schulat,Webseite Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. "lfd. N.",Schul-numme,Schul,"ame Sta�e PLZ Ot",,,Telefo,Fax,Schulat,Webseite Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. "lfd. N.",Schul-numme,Schul,"ame Sta�e PLZ Ot",,,Telefo,Fax,Schulat,Webseite Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. "lfd. N.",Schul-numme,Schul,"ame Link to comment https://forums.phpfreaks.com/topic/228914-how-to-fetch-a-page-with-a-parser-live-demo/ Share on other sites More sharing options...
Recommended Posts
Archived
This topic is now archived and is closed to further replies.