Jump to content

how to fetch a page with a parser [live demo]


dilbertone

Recommended Posts

good day, hello dear community!

 

 

i am currently ironing out a little parser-script. I have some bits

- but now i need to have some improved spider-logic. See the target-url http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50

 

This page has got more than 6000 results!

 

Well  how do i get all the results?

 

I tried out several things - but i dont helped. I allways got bad results.

See i have good csv-data - but unfortunatley no spider logic...  I need some bits to get there! How to get there!?

 

I use the module LWP::simple and i need to have some improved arguments that i can use in order to get all  the 6150 records

 



#!/usr/bin/perl
use warnings;
use strict;
use LWP::Simple;
use HTML::TableExtract;
use Text::CSV;

my $html= get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50';
$html =~ tr/r//d;     # strip the carriage returns
$html =~ s/ / /g; # expand the spaces

my $te = new HTML::TableExtract();
$te->parse($html);

my @cols = qw(
    rownum
    number
    name
    phone
    type
    website
);

my @fields = qw(
    rownum
    number
    name
    street
    postal
    town
    phone
    fax
    type
    website
);

my $csv = Text::CSV->new({ binary => 1 });

foreach my $ts ($te->table_states) {
    foreach my $row ($ts->rows) {

trim leading/trailing whitespace from base fields
        s/^s+//, s/\s+$// for @$row;

load the fields into the hash using a "hash slice"
        my %h;
        @h{@cols} = @$row;

derive some fields from base fields, again using a hash slice
        @h{qw/name street postal town/} = split /n+/, $h{name};
        @h{qw/phone fax/} = split /n+/, $h{phone};

trim leading/trailing whitespace from derived fields
        s/^s+//, s/\s+$// for @h{qw/name street postal town/};

        $csv->combine(@h{@fields});
        print $csv->string, "\n";
    }
} 

Well - with this i have a good csv-output:- but unfortunatley no spider logic.

 

 

How to add the spider-logic here... !?

 

well i need some help

 

Love to hear from you

 

 

 

 

 

 

good evening - here i am back again!!

 

i run into troubles...  i guess that i have made some mistakes while applying some code in the above mentioned script....

 

#!/usr/bin/perl
use warnings;
use strict;
use LWP::Simple;
use HTML::TableExtract;
use Text::CSV;

my $i_first = "0"; 
my $i_last = "6100"; 
my $i_interval = "50"; 

for (my $i = $i_first; $i <= $i_last; $i += $i_interval) { 
     my $pageurl = "http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=$i"; 
     #process pageurl 
}

my $html= get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50';
$html =~ tr/r//d;     # strip the carriage returns
$html =~ s/ / /g; # expand the spaces

my $te = new HTML::TableExtract();
$te->parse($html);

my @cols = qw(
    rownum
    number
    name
    phone
    type
    website
);

my @fields = qw(
    rownum
    number
    name
    street
    postal
    town
    phone
    fax
    type
    website
);

my $csv = Text::CSV->new({ binary => 1 });

foreach my $ts ($te->table_states) {
    foreach my $row ($ts->rows) {

trim leading/trailing whitespace from base fields
        s/^s+//, s/\s+$// for @$row;

load the fields into the hash using a "hash slice"
        my %h;
        @h{@cols} = @$row;

derive some fields from base fields, again using a hash slice
        @h{qw/name street postal town/} = split /n+/, $h{name};
        @h{qw/phone fax/} = split /n+/, $h{phone};

trim leading/trailing whitespace from derived fields
        s/^s+//, s/\s+$// for @h{qw/name street postal town/};

        $csv->combine(@h{@fields});
        print $csv->string, "\n";
    }
} 

 

 

there have been some issues - i have made a mistake

i guess that the error is here:

 

for (my $i = $i_first; $i <= $i_last; $i += $i_interval) { 
     my $pageurl = "http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=$i"; 
     #process pageurl 
}

my $html= get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50';
$html =~ tr/r//d;     # strip the carriage returns
$html =~ s/ / /g; # expand the spaces

 

i have written down some kind of double - code. I need to leave out one part ... this one here; What do you think about this!?

 

my $html= get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50';
$html =~ tr/r//d;     # strip the carriage returns
$html =~ s/ / /g; # expand the spaces

 

I get these kind of errors - it looks very very nasty!

 

martin@suse-linux:~> cd perl
martin@suse-linux:~/perl> perl bavaria_all_.pl
Possible unintended interpolation of %h in string at bavaria_all_.pl line 52.
Possible unintended interpolation of %h in string at bavaria_all_.pl line 52.
Global symbol "%h" requires explicit package name at bavaria_all_.pl line 52.
Global symbol "%h" requires explicit package name at bavaria_all_.pl line 52.
syntax error at bavaria_all_.pl line 59, near "/,"
Global symbol "%h" requires explicit package name at bavaria_all_.pl line 59.
Global symbol "%h" requires explicit package name at bavaria_all_.pl line 60.
Global symbol "%h" requires explicit package name at bavaria_all_.pl line 60.
Substitution replacement not terminated at bavaria_all_.pl line 63.
martin@suse-linux:~/perl> 

 

 

what do you think!?

 

i look forward to hear from you!

 

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.