dilbertone Posted February 25, 2011

Good day, dear community!

I am currently ironing out a little parser script. I have some of the pieces, but now I need some improved spider logic. See the target URL:

http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50

This page has more than 6000 results. How do I get all of them? I tried several things, but none of them helped; I always got bad results. I have good CSV data, but unfortunately no spider logic. I use the module LWP::Simple, and I need some improved arguments that I can use in order to get all 6150 records.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use LWP::Simple;
    use HTML::TableExtract;
    use Text::CSV;

    my $html = get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50';
    $html =~ tr/\r//d;        # strip the carriage returns
    $html =~ s/&nbsp;/ /g;    # expand the spaces

    my $te = new HTML::TableExtract();
    $te->parse($html);

    my @cols   = qw( rownum number name phone type website );
    my @fields = qw( rownum number name street postal town phone fax type website );

    my $csv = Text::CSV->new({ binary => 1 });

    foreach my $ts ($te->table_states) {
        foreach my $row ($ts->rows) {
            # trim leading/trailing whitespace from base fields
            s/^\s+//, s/\s+$// for @$row;

            # load the fields into the hash using a "hash slice"
            my %h;
            @h{@cols} = @$row;

            # derive some fields from base fields, again using a hash slice
            @h{qw/name street postal town/} = split /\n+/, $h{name};
            @h{qw/phone fax/}               = split /\n+/, $h{phone};

            # trim leading/trailing whitespace from derived fields
            s/^\s+//, s/\s+$// for @h{qw/name street postal town/};

            $csv->combine(@h{@fields});
            print $csv->string, "\n";
        }
    }

With this I get good CSV output, but unfortunately no spider logic. How can I add the spider logic here? I need some help and would love to hear from you.
dilbertone Posted February 25, 2011 (Author)

Good evening, here I am again!

I ran into trouble. I guess I made some mistakes while adding code to the script mentioned above:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use LWP::Simple;
    use HTML::TableExtract;
    use Text::CSV;

    my $i_first = "0";
    my $i_last = "6100";
    my $i_interval = "50";

    for (my $i = $i_first; $i <= $i_last; $i += $i_interval) {
        my $pageurl = "http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=$i";
        #process pageurl
    }

    my $html = get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50';
    $html =~ tr/\r//d;        # strip the carriage returns
    $html =~ s/&nbsp;/ /g;    # expand the spaces

    my $te = new HTML::TableExtract();
    $te->parse($html);

    my @cols   = qw( rownum number name phone type website );
    my @fields = qw( rownum number name street postal town phone fax type website );

    my $csv = Text::CSV->new({ binary => 1 });

    foreach my $ts ($te->table_states) {
        foreach my $row ($ts->rows) {
            trim leading/trailing whitespace from base fields
            s/^\s+//, s/\s+$// for @$row;

            load the fields into the hash using a "hash slice"
            my %h;
            @h{@cols} = @$row;

            derive some fields from base fields, again using a hash slice
            @h{qw/name street postal town/} = split /\n+/, $h{name};
            @h{qw/phone fax/}               = split /\n+/, $h{phone};

            trim leading/trailing whitespace from derived fields
            s/^\s+//, s/\s+$// for @h{qw/name street postal town/};

            $csv->combine(@h{@fields});
            print $csv->string, "\n";
        }
    }

Something went wrong, and I guess the mistake is here:

    for (my $i = $i_first; $i <= $i_last; $i += $i_interval) {
        my $pageurl = "http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=$i";
        #process pageurl
    }

    my $html = get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50';
    $html =~ tr/\r//d;        # strip the carriage returns
    $html =~ s/&nbsp;/ /g;    # expand the spaces

I have written the same fetching code twice, so I think I need to leave out one part, namely this one:

    my $html = get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50';
    $html =~ tr/\r//d;        # strip the carriage returns
    $html =~ s/&nbsp;/ /g;    # expand the spaces

What do you think about this? I get errors like these, and it looks very nasty:

    martin@suse-linux:~> cd perl
    martin@suse-linux:~/perl> perl bavaria_all_.pl
    Possible unintended interpolation of %h in string at bavaria_all_.pl line 52.
    Possible unintended interpolation of %h in string at bavaria_all_.pl line 52.
    Global symbol "%h" requires explicit package name at bavaria_all_.pl line 52.
    Global symbol "%h" requires explicit package name at bavaria_all_.pl line 52.
    syntax error at bavaria_all_.pl line 59, near "/,"
    Global symbol "%h" requires explicit package name at bavaria_all_.pl line 59.
    Global symbol "%h" requires explicit package name at bavaria_all_.pl line 60.
    Global symbol "%h" requires explicit package name at bavaria_all_.pl line 60.
    Substitution replacement not terminated at bavaria_all_.pl line 63.
    martin@suse-linux:~/perl>

I look forward to hearing from you!
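A note on the script above, offered as a sketch rather than a confirmed fix. The reported line numbers fall in the part of the script that holds the plain-English comments; if those lines have no leading `#`, Perl tries to parse them as code, which could explain the messages about `%h` and the unterminated substitution. Separately, the `for` loop only builds `$pageurl` and never fetches it, so the single `get` after the loop still reads one page. The version below moves the fetch and the parsing inside the loop. It assumes that `s` is the zero-based offset of the first record on a page (taken from the loop in the post, not verified against the site), and the helper names `page_urls`, `clean`, `derive_fields` and `crawl` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: 's' is the zero-based offset of the first record on a page,
# as in the loop from the post; this is not confirmed for the site.
my $base = 'http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50';

my @cols   = qw( rownum number name phone type website );
my @fields = qw( rownum number name street postal town phone fax type website );

# Build one URL per result page (hypothetical helper).
sub page_urls {
    my ($first, $last, $interval) = @_;
    my @urls;
    for (my $i = $first; $i <= $last; $i += $interval) {
        push @urls, "$base&s=$i";
    }
    return @urls;
}

# Trim leading/trailing whitespace; undef becomes an empty string.
sub clean {
    my ($s) = @_;
    $s = '' unless defined $s;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# Turn one table row (array ref) into the list of output fields.
sub derive_fields {
    my ($row) = @_;
    my %h;
    @h{@cols} = map { clean($row->[$_]) } 0 .. $#cols;
    # derive some fields from base fields, again using a hash slice
    @h{qw/name street postal town/} = split /\n+/, $h{name};
    @h{qw/phone fax/}               = split /\n+/, $h{phone};
    return map { clean($h{$_}) } @fields;
}

# Fetch every page and print its rows as CSV.
sub crawl {
    my ($first, $last, $interval) = @_;
    # Loaded here so the helpers above work without these modules installed.
    require LWP::Simple;
    require HTML::TableExtract;
    require Text::CSV;
    my $csv = Text::CSV->new({ binary => 1 });
    for my $url (page_urls($first, $last, $interval)) {
        my $html = LWP::Simple::get($url);
        unless (defined $html) {
            warn "could not fetch $url\n";
            next;
        }
        $html =~ tr/\r//d;        # strip the carriage returns
        $html =~ s/&nbsp;/ /g;    # expand the spaces
        my $te = HTML::TableExtract->new();
        $te->parse($html);
        for my $ts ($te->tables) {
            for my $row ($ts->rows) {
                $csv->combine(derive_fields($row));
                print $csv->string, "\n";
            }
        }
        sleep 1;    # pause between requests to go easy on the server
    }
}

# Runs only when the last offset is given on the command line.
crawl(0, $ARGV[0], 50) if @ARGV;
```

Saved under a hypothetical name such as `spider.pl`, it could be run as `perl spider.pl 6100 > schools.csv`, where 6100 is the last offset from the post. Because no table is selected by headers or depth, every table on each page is printed, so header rows may show up in the output and need filtering.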
Archived
This topic is now archived and is closed to further replies.