dilbertone
Posted February 19, 2011

Hello dear all, hello all freaks of this great community,

One question regarding a parser. Note: it is a Perl parser, but believe me, I need some help with it, and I guess many experts here know the Perl bits well enough that this is no problem. Here we go!

Is there any chance to get some separators between the columns of the table? The parser script already runs nicely. I want to store the data in a MySQL database, so it would be great to have separators (commas, tabs, or something else). Tab-separated or comma-separated values are handy formats to work with.

Here is the data from the following site: http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=20

lfd. Nr. Schul- nummer Schulname Straße PLZ Ort Telefon Fax Schulart Webseite
1 0401 Mädchenrealschule Marienburg, Abenberg, der Diözese Eichstätt Marienburg 1 91183 Abenberg 09178/509210 Realschulen mrs-marienburg.homepage.t-online.de
2 6581 Volksschule Abenberg (Grundschule) Güssübelstr. 2 91183 Abenberg 09178/215 09178/905060 Volksschulen home.t-online.de/home/vs-abenberg
6 3074 Private Berufsschule zur sonderpäd. Förderung, Förderschwerpunkt Lernen, Abensberg Regensburger Straße 60 93326 Abensberg 09443/709191 09443/709193 Berufsschulen zur sonderpädog. Förderung www.berufsschule-abensberg.de

Well, I need to have those lines divided into at least three columns. Take the second record, for example:

name: Volksschule Abenberg (Grundschule)
street: Güssübelstr. 2
postal code and town: 91183 Abenberg
telephone and fax: 09178/215 09178/905060
type of school: Volksschulen
website: home.t-online.de/home/vs-abenberg

Or, even better, the postal code and town divided into two separate columns. Question: is this possible?

By the way, look at records 1 and 6 (here I only show the school names):

1 0401 Mädchenrealschule Marienburg, Abenberg,
6 3074 Private Berufsschule zur sonderpäd. Förderung, Förderschwerpunkt Lernen, Abensberg

Note that those have commas inside the name. Does this make it difficult to write a parser that creates CSV format? Any idea how to do this in Perl? If possible, it would be just great!

Many, many thanks for a hint regarding this little issue. Besides this, all is great and fascinating!

dilbertone

Here is the code:

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use LWP::Simple;    # exports getstore and is_success
use Cwd;
use POSIX qw(strftime);

my $te             = HTML::TableExtract->new;
my $total_records  = 0;
my $suchbegriffe   = "e";
my $treffer        = 50;
my $range          = 0;
my $url_to_process = "http://192.68.214.70/km/asps/schulsuche.asp?q=";
my $processdir     = "processing";
my $counter        = 50;
my $displaydate    = "";
my $percent        = 0;

workDir();
chdir $processdir or die "Cannot change to $processdir: $!";
processURL();
print "\nPress <enter> to continue\n";
<STDIN>;
$displaydate = strftime('%Y%m%d%H%M%S', localtime);
my $outfile = "webdata_for_${suchbegriffe}_$displaydate.txt";
open(OUTFILE, '>', $outfile) or die "Cannot open $outfile: $!";
processData();
close OUTFILE;
print "Finished processing $total_records records...\n";
print "Processed data saved to $ENV{HOME}/$processdir/$outfile\n";
unlink 'processing.html';
print "\n";
exit;

sub processURL {
    my $url = "$url_to_process$suchbegriffe&a=$treffer&s=$range";
    print "\nProcessing $url\n";
    # getstore returns an HTTP status code, so check it with is_success
    is_success(getstore($url, 'tempfile.html')) or die 'Unable to get page';
    open(my $fh, '<', 'tempfile.html') or die "Cannot open tempfile.html: $!";
    while (<$fh>) {
        # m{...} avoids having to escape the slash in </b>;
        # the sixth capture is the total number of hits
        if (m{^.*?(Treffer <b>)(\d+)( - )(\d+)(</b> \w+ \w+ <b>)(\d+).*}) {
            $total_records = $6;
            print "Total records to process is $total_records\n";
        }
    }
    close $fh;
    unlink 'tempfile.html';
}

sub processData {
    $| = 1;    # unbuffered output so the progress line updates in place
    while ($range <= $total_records) {
        my $url = "$url_to_process$suchbegriffe&a=$treffer&s=$range";
        is_success(getstore($url, 'processing.html')) or die 'Unable to get page';
        $te->parse_file('processing.html');
        my ($table) = $te->tables;
        for my $row ($table->rows) {
            cleanup(@$row);
            print OUTFILE "@$row\n";
        }
        print "Processed records $range to $counter\r";
        $counter = $counter + 50;
        $range   = $range + 50;
        $te      = HTML::TableExtract->new;
    }
}

sub cleanup {
    # Collapse runs of whitespace in each cell (modifies the row in place)
    for (@_) {
        next unless defined;
        s/\s+/ /g;
    }
}

sub workDir {
    # Use home directory to process data
    chdir or die "$!";
    if ( !-d $processdir ) {
        mkdir("$ENV{HOME}/$processdir", 0755) or die "Cannot make directory $processdir: $!";
    }
}
dilbertone
Posted February 19, 2011

Hi all, I need some ideas here. It is so frustrating to do the job without a script. I could do it manually, but that takes about 7 hours.
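One possible direction, sketched under assumptions and not tested against the live site: the separators get lost because print OUTFILE "@$row\n" interpolates the row with a single space between cells. HTML::TableExtract hands back one value per table cell, so joining the cells with a comma or tab keeps the columns apart. Commas inside a school name are not a problem for CSV as long as such fields are wrapped in double quotes. The helper names csv_line and split_plz_ort below are hypothetical and not part of the posted script. The second helper is only needed if postal code and town ever arrive in one cell; the table header lists PLZ and Ort as separate columns, so they may already be separate.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn one table row into a CSV line.
# Fields that contain a comma, a double quote, or a line break are wrapped
# in double quotes, and embedded double quotes are doubled.
# Undefined cells (empty table cells) become empty fields.
sub csv_line {
    my @fields = map { defined $_ ? $_ : '' } @_;
    for my $field (@fields) {
        if ($field =~ /[",\r\n]/) {
            $field =~ s/"/""/g;
            $field = qq{"$field"};
        }
    }
    return join(',', @fields);
}

# Hypothetical helper: split a combined "91183 Abenberg" cell.
# Assumes a five-digit postal code, whitespace, then the town name.
# Returns an empty list when the text does not match that shape.
sub split_plz_ort {
    my ($text) = @_;
    return unless defined $text;
    if ($text =~ /^\s*(\d{5})\s+(.+?)\s*$/) {
        return ($1, $2);
    }
    return;
}

# Example row taken from the data shown in the first post.
my @row = (
    '1', '0401',
    'Mädchenrealschule Marienburg, Abenberg, der Diözese Eichstätt',
    'Marienburg 1', '91183', 'Abenberg',
);
print csv_line(@row), "\n";

my ($plz, $ort) = split_plz_ort('91183 Abenberg');
print csv_line($plz, $ort), "\n";
```

In the posted script this would mean replacing print OUTFILE "@$row\n"; with print OUTFILE csv_line(@$row), "\n";. For tab-separated output, join("\t", @$row) should be enough as long as no cell contains a tab. The Text::CSV module on CPAN handles the same quoting rules and is probably the safer choice for anything beyond a quick export.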
Archived
This topic is now archived and is closed to further replies.