dilbertone

Members
  • Posts

    122
Everything posted by dilbertone

  1. Hello, many thanks for the answer! I want to run this task with cURL; that is my approach for now. I will come back with some lines of code in the next few days. If anybody can give a helping hand, I would be more than happy.
  2. Hello dear community, hello dear Andy,

     I want to parse a site called the Foundation Finder. My Perl knowledge is pretty small! I have tried various tutorials (WWW::Mechanize examples that I found on CPAN); not all of them work, and some are broken. Now I am trying a real-world task.

     The Foundation Finder task has several steps. Especially interesting for me as a PHP/Perl beginner is this Swiss government site: http://www.edi.admin.ch/esv/00475/00698/index.html?lang=de&webgrab_path=http://esv2000.edi.admin.ch/d/entry.asp?Id=3221 which has a dataset of more than 2,700 foundations. All the data are free to use, with no copyright limitations.

     I have been musing about a starting point: could I use a Perl module from CPAN and do the job with Perl? I guess that WWW::Mechanize or LWP could do a great job, or HTML::Parser. I am still wondering which is the best way to do it. I guess I am in front of a nice learning curve, and this task will give me some nice PHP or Perl lessons. Or can we do this with Python as well? I guess so!

     Can I do this with Mechanize? I would love to get a hint.

     Thanks, matz
  3. Hello dear community,

     I am currently working on an approach to parse some sites that contain data on foundations in Switzerland, with details like goals, contact e-mail and the like. See http://www.foundationfinder.ch/ which has a dataset of 790 foundations. All the data are free to use, with no copyright limitations.

     I have tried it with PHP Simple HTML DOM Parser, but I have seen that it is difficult to get all the data needed to get it up and running. Who wants to jump in and help create this scraper/parser? Please help me get up to speed with this approach. I would love to hear from you.

     Regards, Dilbertone
  4. Howdy myarro, interesting thing! I am eager to see what the full code will look like, since I am currently working on the same thing. Looking forward to hearing from you. Cheers, db1. BTW: have you ever tried to do it with cURL?
  5. Good evening dear community!

     Howdy, at the moment I am debugging some lines of code. Purpose: I want to process multiple web pages, much like a web spider/crawler would. I have some bits, but now I need some improved spider logic. See the target URL http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50

     This page has more than 6,000 results. How do I get all of them? I use the module LWP::Simple, and I need some improved arguments so that I can get all 6,150 records.

     Attempt: here are the first 5 page URLs:

         http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=0
         http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=50
         http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=100
         http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=150
         http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=200

     We can see that the "s" parameter in the URL starts at 0 for page 1 and then increases by 50 for each page thereafter. We can use this information to create a loop:

         #!/usr/bin/perl
         use warnings;
         use strict;
         use LWP::Simple;
         use HTML::TableExtract;
         use Text::CSV;

         my @cols   = qw( rownum number name phone type website );
         my @fields = qw( rownum number name street postal town phone fax type website );

         my $i_first    = "0";
         my $i_last     = "6100";
         my $i_interval = "50";

         for (my $i = $i_first; $i <= $i_last; $i += $i_interval) {
             my $html = get("http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=$i");
             $html =~ tr/r//d;   # strip the carriage returns
             $html =~ s/ / /g;   # expand the spaces

             my $te = new HTML::TableExtract();
             $te->parse($html);

             my $csv = Text::CSV->new({ binary => 1 });

             foreach my $ts ($te->table_states) {
                 foreach my $row ($ts->rows) {
                     #trim leading/trailing whitespace from base fields
                     s/^s+//, s/\s+$// for @$row;

                     #load the fields into the hash using a "hash slice"
                     my %h;
                     @h{@cols} = @$row;

                     #derive some fields from base fields, again using a hash slice
                     @h{qw/name street postal town/} = split /n+/, $h{name};
                     @h{qw/phone fax/} = split /n+/, $h{phone};

                     #trim leading/trailing whitespace from derived fields
                     s/^s+//, s/\s+$// for @h{qw/name street postal town/};

                     $csv->combine(@h{@fields});
                     print $csv->string, "\n";
                 }
             }
         }

     I tested the code and got the results below, including the error messages shown on the command line. BTW, here are lines 57 and 58 of my file:

         #trim leading/trailing whitespace from derived fields
         s/^s+//, s/\s+$// for @h{qw/name street postal town/};

     What do you think?

         Staße PLZ Ot",,,Telefo,Fax,Schulat,Webseite
         Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
         Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
         Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
         Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
         "lfd. N.",Schul-numme,Schul,"ame Staße PLZ Ot",,,Telefo,Fax,Schulat,Webseite
         Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
         Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
         Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
         Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
         "lfd. N.",Schul-numme,Schul,"ame Staße PLZ Ot",,,Telefo,Fax,Schulat,Webseite
         Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
         Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
         Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
         Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
         "lfd. N.",Schul-numme,Schul,"ame Staße PLZ Ot",,,Telefo,Fax,Schulat,Webseite
         Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
         Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
         Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
         Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
         "lfd. N.",Schul-numme,Schul,"ame
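A possible reading of the output above: every `r` is missing from the header row (`Schul-numme`, `Schulat`, `Ot`) and the name column is cut at the letter `n` (`Schul`, `ame ...`). That is what one would expect if the backslashes in `tr/\r//d`, `s/^\s+//` and `split /\n+/` had been lost when the script was copied, so that the patterns match plain letters instead of control characters. The sketch below uses Python rather than Perl, purely to illustrate the effect; the sample strings are assumptions modelled on the header row.

```python
import re

header = "Schul-nummer"
name_cell = "Schulname\nStraße\nPLZ Ort"

# Without the backslash, the pattern means the letter "r", not a carriage return.
stripped_letter = header.replace("r", "")
stripped_cr = header.replace("\r", "")

# Likewise, splitting on "n+" cuts at the letter n, not at line breaks.
split_letter = re.split(r"n+", name_cell)
split_newline = re.split(r"\n+", name_cell)

print(stripped_letter)
print(stripped_cr)
print(split_letter)
print(split_newline)
```

If this reading is right, it would also account for the warnings: the letter-based split yields only two pieces for the four fields `name street postal town`, leaving two of them undefined, and each undefined field passes through two substitutions on line 58, which gives four warnings per row.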
  6. Good evening, here I am back again! I ran into trouble. I guess that I made some mistakes while applying some code to the script mentioned above:

         #!/usr/bin/perl
         use warnings;
         use strict;
         use LWP::Simple;
         use HTML::TableExtract;
         use Text::CSV;

         my $i_first    = "0";
         my $i_last     = "6100";
         my $i_interval = "50";

         for (my $i = $i_first; $i <= $i_last; $i += $i_interval) {
             my $pageurl = "http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=$i";
             #process pageurl
         }

         my $html= get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50';
         $html =~ tr/r//d;   # strip the carriage returns
         $html =~ s/ / /g;   # expand the spaces

         my $te = new HTML::TableExtract();
         $te->parse($html);

         my @cols   = qw( rownum number name phone type website );
         my @fields = qw( rownum number name street postal town phone fax type website );

         my $csv = Text::CSV->new({ binary => 1 });

         foreach my $ts ($te->table_states) {
             foreach my $row ($ts->rows) {
                 trim leading/trailing whitespace from base fields
                 s/^s+//, s/\s+$// for @$row;

                 load the fields into the hash using a "hash slice"
                 my %h;
                 @h{@cols} = @$row;

                 derive some fields from base fields, again using a hash slice
                 @h{qw/name street postal town/} = split /n+/, $h{name};
                 @h{qw/phone fax/} = split /n+/, $h{phone};

                 trim leading/trailing whitespace from derived fields
                 s/^s+//, s/\s+$// for @h{qw/name street postal town/};

                 $csv->combine(@h{@fields});
                 print $csv->string, "\n";
             }
         }

     There have been some issues. I guess that the error is here:

         for (my $i = $i_first; $i <= $i_last; $i += $i_interval) {
             my $pageurl = "http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=$i";
             #process pageurl
         }

         my $html= get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50';
         $html =~ tr/r//d;   # strip the carriage returns
         $html =~ s/ / /g;   # expand the spaces

     I have written the fetching code twice, so I need to leave out one part, namely this one. What do you think?

         my $html= get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50';
         $html =~ tr/r//d;   # strip the carriage returns
         $html =~ s/ / /g;   # expand the spaces

     I get these errors, and they look very nasty:

         martin@suse-linux:~> cd perl
         martin@suse-linux:~/perl> perl bavaria_all_.pl
         Possible unintended interpolation of %h in string at bavaria_all_.pl line 52.
         Possible unintended interpolation of %h in string at bavaria_all_.pl line 52.
         Global symbol "%h" requires explicit package name at bavaria_all_.pl line 52.
         Global symbol "%h" requires explicit package name at bavaria_all_.pl line 52.
         syntax error at bavaria_all_.pl line 59, near "/,"
         Global symbol "%h" requires explicit package name at bavaria_all_.pl line 59.
         Global symbol "%h" requires explicit package name at bavaria_all_.pl line 60.
         Global symbol "%h" requires explicit package name at bavaria_all_.pl line 60.
         Substitution replacement not terminated at bavaria_all_.pl line 63.
         martin@suse-linux:~/perl>

     What do you think? I look forward to hearing from you!
  7. Good day, hello dear community!

     I am currently ironing out a little parser script. I have some bits, but now I need some improved spider logic. See the target URL http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50

     This page has more than 6,000 results. How do I get all of them? I tried out several things, but they did not help; I always got bad results. I have good CSV data, but unfortunately no spider logic. I use the module LWP::Simple, and I need some improved arguments so that I can get all 6,150 records.

         #!/usr/bin/perl
         use warnings;
         use strict;
         use LWP::Simple;
         use HTML::TableExtract;
         use Text::CSV;

         my $html= get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50';
         $html =~ tr/\r//d;       # strip the carriage returns
         $html =~ s/&nbsp;/ /g;   # expand the spaces

         my $te = new HTML::TableExtract();
         $te->parse($html);

         my @cols   = qw( rownum number name phone type website );
         my @fields = qw( rownum number name street postal town phone fax type website );

         my $csv = Text::CSV->new({ binary => 1 });

         foreach my $ts ($te->table_states) {
             foreach my $row ($ts->rows) {
                 # trim leading/trailing whitespace from base fields
                 s/^\s+//, s/\s+$// for @$row;

                 # load the fields into the hash using a "hash slice"
                 my %h;
                 @h{@cols} = @$row;

                 # derive some fields from base fields, again using a hash slice
                 @h{qw/name street postal town/} = split /\n+/, $h{name};
                 @h{qw/phone fax/} = split /\n+/, $h{phone};

                 # trim leading/trailing whitespace from derived fields
                 s/^\s+//, s/\s+$// for @h{qw/name street postal town/};

                 $csv->combine(@h{@fields});
                 print $csv->string, "\n";
             }
         }

     With this I get good CSV output, but unfortunately there is no spider logic. How do I add the spider logic here? I need some help and would love to hear from you.
  8. Hi dear Abracadaver, many, many thanks; I am very happy to hear from you. I am pretty sure that this can be done in PHP as well, and CSV-formatted output is also well known in PHP. But the best argument is that I am a big fan of this site. And yes, you have helped me for years: your code is a lifesaver! I know you from AutoTheme, and I have been a user of your site since the early beginning in 2003. So I would be glad if you could help me here.
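One way to think about the missing spider logic is to generate one URL per result page and then run the existing extraction on each page in turn. The sketch below is written in Python only to show the idea; it assumes the site pages through results with an offset parameter `s` that grows by the page size, and it takes the total of 6,150 records from the post. The function name `page_urls` is hypothetical, and no request is actually sent.

```python
BASE_URL = "http://192.68.214.70/km/asps/schulsuche.asp"

def page_urls(query="e", page_size=50, total=6150):
    """Yield one URL per result page; the offset s starts at 0 and grows by page_size."""
    for offset in range(0, total, page_size):
        yield f"{BASE_URL}?q={query}&a={page_size}&s={offset}"

urls = list(page_urls())
print(len(urls))   # number of pages to fetch
print(urls[0])
print(urls[-1])
```

In the Perl script, the equivalent would be to wrap the `get` call and the table extraction in a loop over these offsets, creating a fresh HTML::TableExtract object for each page.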
  9. Hello, good day dear community,

     I like this place. It is a great place for sharing ideas and knowledge. But by far the most impressive thing I have learned is how supportive this community is. I am overwhelmed by this experience; this forum has so many great folks.

     I have a little parser that parses a site with 6,150 records, but I need the result in CSV format. First of all, see the target site: http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=50&s=1750

     I need all the data, separated into the fields number, school number, school name, address (street, postal code, town), phone, fax, school type and website.

     I have a script, and I am very interested in what you think about it. Not all the fields are captured yet; I need more of them!

         #!/usr/bin/perl
         use strict;
         use HTML::TableExtract;
         use LWP::Simple;
         use Cwd;
         use POSIX qw(strftime);

         my $total_records  = 0;
         my $alpha          = "x";
         my $results        = 50;
         my $range          = 0;
         my $url_to_process = "http://192.68.214.70/km/asps/schulsuche.asp?q=";
         my $processdir     = "processing";
         my $counter        = 50;
         my $percent        = 0;

         workDir();
         chdir $processdir;
         processURL();
         print "\nPress <enter> to continue\n";
         <>;
         my $displaydate = strftime('%Y%m%d%H%M%S', localtime);
         open my $outfile, '>', "webdata_for_$alpha\_$displaydate.txt" or die 'Unable to create file';
         processData();
         close $outfile;
         print "Finished processing $total_records records...\n";
         print "Processed data saved to $ENV{HOME}/$processdir/webdata_for_$alpha\_$displaydate.txt\n";
         unlink 'processing.html';

         sub processURL() {
             print "\nProcessing $url_to_process$alpha&a=$results&s=$range\n";
             getstore("$url_to_process$alpha&a=$results&s=$range", 'tempfile.html') or die 'Unable to get page';
             while( <tempfile.html> ) {
                 open( FH, "$_" ) or die;
                 while( <FH> ) {
                     if( $_ =~ /^.*?(Treffer \<b\>)(\d+)( - )(\d+)(<\/b> \w+ \w+ \<b\>)(\d+).*/ ) {
                         $total_records = $6;
                         print "Total records to process is $total_records\n";
                     }
                 }
                 close FH;
             }
             unlink 'tempfile.html';
         }

         sub processData() {
             while ( $range <= $total_records) {
                 my $te = HTML::TableExtract->new(headers => [qw(lfd Schul Schulname Telefon Schulart Webseite)]);
                 getstore("$url_to_process$alpha&a=$results&s=$range", 'processing.html') or die 'Unable to get page';
                 $te->parse_file('processing.html');
                 my ($table) = $te->tables;
                 foreach my $ts ($te->table_states) {
                     foreach my $row ($ts->rows) {
                         cleanup(@$row);
                         # Add a table column delimiter in this case ||
                         print $outfile join("||", @$row)."\n";
                     }
                 }
                 $| = 1;
                 print "Processed records $range to $counter";
                 print "\r";
                 $counter = $counter + 50;
                 $range = $range + 50;
             }
         }

         sub cleanup() {
             for ( @_ ) {
                 s/\s+/ /g;
             }
         }

         sub workDir() {
             # Use home directory to process data
             chdir or die "$!";
             if ( ! -d $processdir ) {
                 mkdir ("$ENV{HOME}/$processdir", 0755) or die "Cannot make directory $processdir: $!";
             }
         }

     Output, with || being the delimiter:

         1||9752||Deutsche Schule Alamogordo USA Alamogorde - New Mexico || ||Deutschsprachige Auslandsschule||
         2||9931||Deutsche Schule der Borromäerinnen Alexandrien ET Alexandrien - Ägypten || ||Begegnungsschule (Auslandsschuldienst)||
         3||1940||Max-Keller-Schule, Berufsfachschule f.Musik Alt- ötting d.Berufsfachschule für Musik Altötting e.V. Kapellplatz 36 84503 Altötting ||08671/1735 08671/84363||Berufsfachschulen f. Musik|| www.max-keller-schule.de
         4||0006||Max-Reger-Gymnasium Amberg Kaiser-Wilhelm-Ring 7 92224 Amberg ||09621/4718-0 09621/4718-47||Gymnasien|| www.mrg-amberg.de

     My problem is that I need more fields. I need the following to be divided:

         name: Volksschule Abenberg (Grundschule)
         street: Güssübelstr. 2
         postal code and town: 91183 Abenberg
         fax and telephone: 09178/215 09178/905060
         type of school: Volksschulen
         website: home.t-online.de/home/vs-abenberg

     How do I add more fields? This obviously has to be done in this line, doesn't it?

         my $te = HTML::TableExtract->new(headers => [qw(lfd Schul Schulname Telefon Schulart Webseite)]);

     But how? I tried out several things, but they did not help; I always got bad results.

     BTW: I played around and tried another solution. Here I get good CSV data, but unfortunately there is no spider logic:

         #!/usr/bin/perl
         use warnings;
         use strict;
         use LWP::Simple;
         use HTML::TableExtract;
         use Text::CSV;

         my $html= get 'http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=50';
         $html =~ tr/\r//d;       # strip the carriage returns
         $html =~ s/&nbsp;/ /g;   # expand the spaces

         my $te = new HTML::TableExtract();
         $te->parse($html);

         my @cols   = qw( rownum number name phone type website );
         my @fields = qw( rownum number name street postal town phone fax type website );

         my $csv = Text::CSV->new({ binary => 1 });

         foreach my $ts ($te->table_states) {
             foreach my $row ($ts->rows) {
                 # trim leading/trailing whitespace from base fields
                 s/^\s+//, s/\s+$// for @$row;

                 # load the fields into the hash using a "hash slice"
                 my %h;
                 @h{@cols} = @$row;

                 # derive some fields from base fields, again using a hash slice
                 @h{qw/name street postal town/} = split /\n+/, $h{name};
                 @h{qw/phone fax/} = split /\n+/, $h{phone};

                 # trim leading/trailing whitespace from derived fields
                 s/^\s+//, s/\s+$// for @h{qw/name street postal town/};

                 $csv->combine(@h{@fields});
                 print $csv->string, "\n";
             }
         }

     How do I add the spider logic here? I look forward to any and all help!
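The extra fields probably cannot come from the `headers` argument alone, because the table itself has only six columns; the name and phone cells each hold several values that have to be split afterwards. The Python sketch below illustrates one way to do that split. It assumes the values inside a cell are separated by line breaks, that the last line of the name cell is a five-digit postal code followed by the town, and that the line before it is the street. The function `split_record` and the placeholder row and school numbers are hypothetical.

```python
import re

FIELDS = ["rownum", "number", "name", "street", "postal", "town",
          "phone", "fax", "type", "website"]

def split_record(row):
    """Turn one six-column table row into a dict with ten fields."""
    rownum, number, name_block, phone_block, school_type, website = row

    lines = [ln.strip() for ln in name_block.splitlines() if ln.strip()]
    street = postal = town = ""
    # Assumption: the last line looks like "91183 Abenberg".
    if lines and re.match(r"\d{5}\s+\S", lines[-1]):
        postal, town = lines.pop().split(None, 1)
        # Assumption: the line before the postal code is the street.
        if len(lines) > 1:
            street = lines.pop()
    name = " ".join(lines)  # whatever is left belongs to the school name

    phones = [p.strip() for p in phone_block.splitlines() if p.strip()]
    phone = phones[0] if phones else ""
    fax = phones[1] if len(phones) > 1 else ""

    values = [rownum, number, name, street, postal, town,
              phone, fax, school_type, website]
    return dict(zip(FIELDS, values))

sample_row = [
    "1", "0000",  # placeholder row number and school number
    "Volksschule Abenberg (Grundschule)\nGüssübelstr. 2\n91183 Abenberg",
    "09178/215\n09178/905060",
    "Volksschulen",
    "home.t-online.de/home/vs-abenberg",
]
record = split_record(sample_row)
print(record)
```

Records without a German-style postal code, such as the schools abroad in the output above, would keep everything in `name` under these assumptions, so they may need manual checking.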
  10. Hi all, I need some ideas here. It is so frustrating to do the job without a script. I can do it manually, but that takes about 7 hours.
  11. Hello dear all, hello all freaks of this great community,

     One question regarding a parser. Note that it is a Perl parser, but believe me, I need some help with it, and I guess that many experts here know Perl well enough that this is no problem.

     Here we go: is there any chance to get some separators between the columns of the table? The parser script already runs nicely. I want to store the data in a MySQL database, so it would be great to have some separators (commas, tabs or something else); tab-separated or comma-separated values are handy formats to work with.

     The data come from the following site: http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=20

     I need each record divided into at least three columns, or even better, with postal code and town in two separate columns. Question: is this possible?

     By the way, look at the names of the schools in the first records: some of them have commas inside the name. Does this make it difficult to create a parser that produces CSV format? Any idea how to do this in Perl? If it is possible, that would be just great!

     Many thanks for a hint regarding this little issue. Besides this, it is all great and fascinating! dilbertone

     Here is the code:

         #!/usr/bin/perl
         use strict;
         use warnings;
         use HTML::TableExtract;
         use LWP::Simple;
         use Cwd;
         use POSIX qw(strftime);

         my $te = HTML::TableExtract->new;
         my $total_records  = 0;
         my $suchbegriffe   = "e";
         my $treffer        = 50;
         my $range          = 0;
         my $url_to_process = "http://192.68.214.70/km/asps/schulsuche.asp?q=";
         my $processdir     = "processing";
         my $counter        = 50;
         my $displaydate    = "";
         my $percent        = 0;

         &workDir();
         chdir $processdir;
         &processURL();
         print "\nPress <enter> to continue\n";
         <>;
         $displaydate = strftime('%Y%m%d%H%M%S', localtime);
         open OUTFILE, ">webdata_for_$suchbegriffe\_$displaydate.txt";
         &processData();
         close OUTFILE;
         print "Finished processing $total_records records...\n";
         print "Processed data saved to $ENV{HOME}/$processdir/webdata_for_$suchbegriffe\_$displaydate.txt\n";
         unlink 'processing.html';
         die "\n";

         sub processURL() {
             print "\nProcessing $url_to_process$suchbegriffe&a=$treffer&s=$range\n";
             getstore("$url_to_process$suchbegriffe&a=$treffer&s=$range", 'tempfile.html') or die 'Unable to get page';
             while( <tempfile.html> ) {
                 open( FH, "$_" ) or die;
                 while( <FH> ) {
                     if( $_ =~ /^.*?(Treffer \<b\>)(\d+)( - )(\d+)(<\/b> \w+ \w+ \<b\>)(\d+).*/ ) {
                         $total_records = $6;
                         print "Total records to process is $total_records\n";
                     }
                 }
                 close FH;
             }
             unlink 'tempfile.html';
         }

         sub processData() {
             while ( $range <= $total_records) {
                 getstore("$url_to_process$suchbegriffe&a=$treffer&s=$range", 'processing.html') or die 'Unable to get page';
                 $te->parse_file('processing.html');
                 my ($table) = $te->tables;
                 for my $row ( $table->rows ) {
                     cleanup(@$row);
                     print OUTFILE "@$row\n";
                 }
                 $| = 1;
                 print "Processed records $range to $counter";
                 print "\r";
                 $counter = $counter + 50;
                 $range = $range + 50;
                 $te = HTML::TableExtract->new;
             }
         }

         sub cleanup() {
             for ( @_ ) {
                 s/\s+/ /g;
             }
         }

         sub workDir() {
             # Use home directory to process data
             chdir or die "$!";
             if ( ! -d $processdir ) {
                 mkdir ("$ENV{HOME}/$processdir", 0755) or die "Cannot make directory $processdir: $!";
             }
         }
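On the question of commas inside school names: a CSV writer normally handles this by quoting any field that contains the delimiter, so the names should not be a problem as long as a proper CSV library writes the file (Text::CSV plays that role in Perl). The Python sketch below shows the principle with the standard `csv` module; the two sample rows are adapted from the school listing and split by hand, and the helper name `to_delimited` is hypothetical.

```python
import csv
import io

rows = [
    ["3", "1940", "Max-Keller-Schule, Berufsfachschule f.Musik",
     "Kapellplatz 36", "84503", "Altötting"],
    ["4", "0006", "Max-Reger-Gymnasium Amberg",
     "Kaiser-Wilhelm-Ring 7", "92224", "Amberg"],
]

def to_delimited(rows, delimiter=","):
    """Serialise rows; fields containing the delimiter are quoted automatically."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = to_delimited(rows)
tsv_text = to_delimited(rows, delimiter="\t")

# Reading the CSV back restores the original fields, comma included.
parsed = list(csv.reader(io.StringIO(csv_text)))
print(csv_text)
```

With tab as the delimiter no quoting is needed for these rows, which may make the later MySQL import slightly simpler.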
  12. Hello TLG,

     Yes, I want to split that information into three cells or columns (in MySQL). BTW, here you have an overview of the dataset: http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=50&s=1750

     I have loaded the data of the online sheet into a Calc spreadsheet, and from there I imported it into MySQL. In only one column (the third one!) I have the full address with:

     1. name of the school
     2. name of the street
     3. postal code and town

     I guess that your code hits the point. I will take all the (almost 6,000) records and apply your code. I will have a closer look at what explode does exactly, but I am pretty sure that you have given exactly the right hint.

     Best regards, db1
  13. Hello dear The Little Guy,

     Many, many thanks for the hints! GREAT! Did I get you right: I take the information from the third column of this huge spreadsheet, which can be found here http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=50&s=1750 (or the information derived from this dataset in a Calc spreadsheet), and apply your code to the third (!) column of the spreadsheet? Note that I need the information not in one cell only but in three or four.

     Question: should I take the information from the huge table (note that it contains almost 6,000 records)?

     The Little Guy, I would love to hear from you again.

     Best regards, db1
  14. Good day dear community,

     I am in big trouble: I need some regex to solve a problem. Can you help me a bit? That would be great! I mused a lot about what to call the subject and finally came to: "Regex or explode to array: I need some help with a simple string!"

     I have a spreadsheet in Calc with some records. There is a column that contains information like the following:

         Ecole Saint-Exupery
         Rue Saint-Malo 24
         67544 Paris

     I need those lines divided into at least three columns:

         name: Ecole Saint-Exupery
         street: Rue Saint-Malo 24
         postal code and town: 67544 Paris

     Or even better, with postal code and town in two separate columns. Question: is this possible? Can (or should) I do this in Calc (OpenDocument format)? Do I need a regex and Perl, or can I solve this without a regex? Note that in the end I need to transfer the data into a MySQL database. I look forward to a tip.

     BTW: you can see all of this in a real-world live demo: http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=50&s=1750 (see the field "Schulname Straße PLZ Ort"). This field contains three things: the name, the street, and the postal code with the town. Can this be divided into parts? If you copy and paste the information and drop it into Calc, you get all of it in only one cell. How do I separate this information into three or even four cells?

     BTW, I tried to translate the information to hex code. See the following:

         Staatl. Realschule Grafenau
         Rachelweg 20
         94481 Grafenau

         00000000: 53 74 61 61 74 6C 2E 20 52 65 61 6C 73 63 68 75
         00000010: 6C 65 20 47 72 61 66 65 6E 61 75 20 0A 52 61 63
         00000020: 68 65 6C 77 65 67 20 32 30 0A 39 34 34 38 31 20
         00000030: 20 47 72 61 66 65 6E 61 75 20 20

     But I do not know if this helps here. Can you help me solve the problem? Do I need a regex? Many thanks in advance for any and all help!
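The hex dump does help: it contains the byte `0A` (a line feed) twice, once after the school name and once after the street, so the three parts of the cell are separated by line breaks. The Python sketch below decodes the dump from the post and splits the cell on those line breaks; the pattern for the last line assumes a five-digit postal code followed by the town, which fits this record but is an assumption for the dataset as a whole.

```python
import re

# Bytes copied from the hex dump in the post (offsets omitted).
HEX_DUMP = (
    "53 74 61 61 74 6C 2E 20 52 65 61 6C 73 63 68 75 "
    "6C 65 20 47 72 61 66 65 6E 61 75 20 0A 52 61 63 "
    "68 65 6C 77 65 67 20 32 30 0A 39 34 34 38 31 20 "
    "20 47 72 61 66 65 6E 61 75 20 20"
)

cell = bytes.fromhex(HEX_DUMP).decode("ascii")

# 0x0A is "\n": splitting on it yields name, street and "postal town".
name, street, postal_town = [part.strip() for part in cell.split("\n")]

# Assumption: a five-digit postal code, whitespace, then the town.
match = re.match(r"(\d{5})\s+(.+)", postal_town)
postal, town = match.groups()

print([name, street, postal, town])
```

The same split can be done in PHP with `explode("\n", $cell)`, or in Calc by treating the line break as the separator, so a regex is only needed for dividing the postal code from the town.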
  15. Hello Ignace,

     Many, many thanks for the idea; that sounds very good. BTW, since I already have the addresses in tab-separated format, what about this: I can create 10 different tables (or fewer, depending on the different formats) and load the files into the database using the LOAD DATA INFILE command (MySQL 5.1 Reference Manual, 12.2.6 LOAD DATA INFILE Syntax). After this I can use the commands you posted to create a new table with the new address book format. What do you think about this? I look forward to hearing from you.

     Best, db1

     See also: http://dev.mysql.com/doc/refman/5.1/en/load-data.html
  16. Hi there, hello BlueSkyIS,

     What if I want to migrate 10 address book databases into one? They look a bit different:

         Address book 1: name, adress, eMail, tel, Telefax, portrait
         Address book 2: name, Company, address, postalcode, Telefon, Fax, E-Mail, Internet
         Address book 3: name, address, tel, fax, email, homepage

     All ten look a bit different. How should I handle this migration of ten tables into one big database? I hope I was able to make clear what I want. If I have to be more precise, just let me know.

     Many thanks in advance. Regards, db1
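One common way to approach such a migration is to decide on a single target layout first and then describe, per source address book, which source field feeds which target column. The Python sketch below illustrates that idea for address books 1 and 3 as listed in the post. The target column names, the `FIELD_MAPS` table, the function `normalise` and the sample records are all hypothetical.

```python
# Hypothetical target layout for the merged address book.
TARGET_COLUMNS = ["name", "address", "phone", "fax", "email", "homepage", "portrait"]

# One mapping per source: source field name -> target column.
FIELD_MAPS = {
    "addressbook_1": {"name": "name", "adress": "address", "eMail": "email",
                      "tel": "phone", "Telefax": "fax", "portrait": "portrait"},
    "addressbook_3": {"name": "name", "address": "address", "tel": "phone",
                      "fax": "fax", "email": "email", "homepage": "homepage"},
}

def normalise(source, record):
    """Map one source record onto the target columns; missing columns stay None."""
    mapping = FIELD_MAPS[source]
    row = dict.fromkeys(TARGET_COLUMNS)
    for field, value in record.items():
        row[mapping[field]] = value
    return row

# Made-up sample records, only to show the shape of the result.
merged = [
    normalise("addressbook_1", {"name": "A. Example", "adress": "Main St 1",
                                "eMail": "a@example.org", "tel": "111",
                                "Telefax": "112", "portrait": "none"}),
    normalise("addressbook_3", {"name": "B. Example", "address": "Side St 2",
                                "tel": "221", "fax": "222",
                                "email": "b@example.org",
                                "homepage": "example.org"}),
]
print(merged)
```

In SQL the same mapping would typically become one `INSERT ... SELECT` statement per source table, with the source columns listed in target order.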
  17. Hi dear freaks,

     I want to create an address book with MySQL. At the moment I do not know how many fields I need, and I want to stay flexible for the next few days until I am sure how many fields I really need. I have found a dump on the internet that is already built for an address book: http://www.apachefriends.org/f/viewtopic.php?f=14&t=26305&start=0&sid=633a3f317b08dc6d8e555a81ed10538f&view=print

         # phpMyAdmin SQL Dump
         # version 2.5.7-pl1
         # http://www.phpmyadmin.net
         #
         # Host: localhost
         # Erstellungszeit: 04. September 2007 um 16:37
         # Server Version: 4.0.20
         # PHP-Version: 4.3.7
         #
         # Datenbank: `joels`
         #
         # --------------------------------------------------------
         #
         # Tabellenstruktur für Tabelle `address_book`
         #
         CREATE TABLE `address_book` (
           `address_book_id` int(11) NOT NULL auto_increment,
           `customers_id` int(11) NOT NULL default '0',
           `entry_gender` char(1) NOT NULL default '',
           `entry_company` varchar(32) default NULL,
           `entry_firstname` varchar(32) NOT NULL default '',
           `entry_lastname` varchar(32) NOT NULL default '',
           `entry_street_address` varchar(64) NOT NULL default '',
           `entry_suburb` varchar(32) default NULL,
           `entry_postcode` varchar(10) NOT NULL default '',
           `entry_city` varchar(32) NOT NULL default '',
           `entry_state` varchar(32) default NULL,
           `entry_country_id` int(11) NOT NULL default '0',
           `entry_zone_id` int(11) NOT NULL default '0',
           PRIMARY KEY (`address_book_id`),
           KEY `idx_address_book_customers_id` (`customers_id`)
         ) TYPE=MyISAM AUTO_INCREMENT=2 ;

     Can I use this, and can I easily add more fields later? I look forward to hearing from you.

     Regards, db1
  18. Hello BlueSkyIS,

     Many thanks for your answer! Hmm, how do I apply it? As I am new to PHP, I ask myself what the point of the function is. Does the $numbers loop go inside the function definition? I guess not: the loop goes outside the function definition, where the function is called multiple times.

         function multiload() {
             /* Inside, define the function. */
         }
         multiload(); /* <-- Outside, call the function. */

     I would love to hear from you.

     Best regards, db1
  19. Hello dear community,

     The following code is a solution that returns the labels and values in a formatted array, ready for input to MySQL. Very nice ;-)

         <?php
         $dom = new DOMDocument();
         @$dom->loadHTMLFile('http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1%5buid%5d=60119');
         $divElement = $dom->getElementById('wfqbeResults');
         $innerHTML = '';
         $children = $divElement->childNodes;
         foreach ($children as $child) {
             $innerHTML = $child->ownerDocument->saveXML( $child );
             $doc = new DOMDocument();
             $doc->loadHTML($innerHTML);
             //$divElementNew = $dom->getElementsByTagName('td');
             $divElementNew = $dom->getElementsByTagname('td');
             /*** the array to return ***/
             $out = array();
             foreach ($divElementNew as $item) {
                 /*** add node value to the out array ***/
                 $out[] = $item->nodeValue;
             }
             echo '<pre>';
             print_r($out);
             echo '</pre>';
         }
         ?>

     That bit of code works fine, and it performs an operation that I intend to call multiple times, so it makes sense to wrap it in a function. We can name it whatever we want; let us just name it "multiload". I tried to do this with the following code, but it does not run. I am still not sure where to put the uid: inside or outside the function?

         <?php
         function multiload ($uid) {
             /*...*/
             // $uid = '60119';
             $dom = new DOMDocument();
             $dom->loadHTMLFile('http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1%5buid%5d=' . $uid);
         }
         multiload ('60089');
         multiload ('60152');
         multiload ('60242');
         /*...*/
         $divElement = $dom->getElementById('wfqbeResults');
         $innerHTML = '';
         $children = $divElement->childNodes;
         foreach ($children as $child) {
             $innerHTML = $child->ownerDocument->saveXML( $child );
             $doc = new DOMDocument();
             $doc->loadHTML($innerHTML);
             //$divElementNew = $dom->getElementsByTagName('td');
             $divElementNew = $dom->getElementsByTagname('td');
             /*** the array to return ***/
             $out = array();
             foreach ($divElementNew as $item) {
                 /*** add node value to the out array ***/
                 $out[] = $item->nodeValue;
             }
             echo '<pre>';
             print_r($out);
             echo '</pre>';
         }
         ?>

     Where do I put the following lines?

         multiload('60089');
         multiload('60152');
         multiload('60242');
         /*...*/

     This is still repetitive, so we can put the numbers in an array, can't we? Then we can loop through the array:

         $numbers = array ('60089', '60152', '60242' /*...*/);
         foreach ($numbers as $number) {
             multiload($number);
         }

     But the question is how and where to put the loop. Can anybody give me a starting point? BTW, if I have to be more descriptive, just let me know; it is no problem to explain more.

     Greetings
  20. Hello dear friends, I found out the following about $dom->getElementById('floatbox'): in the original HTML, floatbox is not an id, it is a class. So I have to rewrite that lookup. I will try out this solution.
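A likely reason the second version does not run is scope: `$dom` is created inside `multiload`, so the code after the function calls never sees it. The usual arrangement is for the function to do the whole job for one uid (load, extract, return the array) and for the loop over the uids to sit outside the function. The Python sketch below shows that shape. The `fetch` parameter, the `fake_fetch` stand-in and its sample markup are hypothetical, so that the example runs without network access.

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collect the text of every <td> cell in a document."""
    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data

def multiload(uid, fetch):
    """Load one detail page and return its cell texts; all work stays inside."""
    collector = CellCollector()
    collector.feed(fetch(uid))
    return [cell.strip() for cell in collector.cells]

def fake_fetch(uid):
    # Stand-in for the real HTTP request (hypothetical sample markup).
    return f"<table><tr><td>Schulnummer</td><td>{uid}</td></tr></table>"

# The loop lives outside the function and calls it once per uid.
results = {uid: multiload(uid, fake_fetch) for uid in ["60089", "60152", "60242"]}
print(results)
```

Translated back to PHP, this would mean moving everything from `getElementById` down to the inner `foreach` into `multiload` and ending the function with `return $out;`.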
  21. Hello dear revraz, hello dear litebearer, good day,

     Many, many thanks to you both! Great to hear from you. The idea with an array has convinced me. BTW: this uses the dot operator, doesn't it? Is that one of two solutions for string or URL concatenation? Many thanks for the hint!

     BTW, this is an absolutely great forum; I love it! Many thanks for the ideas and hints. You are both very supportive, and it is great to have you here. Have a great holiday break and a merry Christmas.

     Greetings, Dilbertone

     Update: one last question. I integrate the loop solution with the array into my basic script ...

         <?php
         $dom = new DOMDocument();
         @$dom->loadHTMLFile('http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1%5buid%5d=60119');
         $divElement = $dom->getElementById('wfqbeResults');
         $innerHTML = '';
         $children = $divElement->childNodes;
         foreach ($children as $child) {
             $innerHTML = $child->ownerDocument->saveXML( $child );
             $doc = new DOMDocument();
             $doc->loadHTML($innerHTML);
             //$divElementNew = $dom->getElementsByTagName('td');
             $divElementNew = $dom->getElementsByTagname('td');
             /*** the array to return ***/
             $out = array();
             foreach ($divElementNew as $item) {
                 /*** add node value to the out array ***/
                 $out[] = $item->nodeValue;
             }
             echo '<pre>';
             print_r($out);
             echo '</pre>';
         }

     ... like so:

         <?php
         $orig_url     = "http://www.somesite.com?page=";
         $number_array = array ("123", "43567", "9287", "3323");

         for ($i = 0; $i < count($number_array); $i++) {
             $new_url = $orig_url . $number_array[$i];

             /* do something with the new url */
             $dom = new DOMDocument();
             @$dom->loadHTMLFile($new_url);
             $divElement = $dom->getElementById('wfqbeResults');
             $innerHTML = '';
             $children = $divElement->childNodes;
             foreach ($children as $child) {
                 $innerHTML = $child->ownerDocument->saveXML( $child );
                 $doc = new DOMDocument();
                 $doc->loadHTML($innerHTML);
                 //$divElementNew = $dom->getElementsByTagName('td');
                 $divElementNew = $dom->getElementsByTagname('td');
                 /*** the array to return ***/
                 $out = array();
                 foreach ($divElementNew as $item) {
                     /*** add node value to the out array ***/
                     $out[] = $item->nodeValue;
                 }
                 echo '<pre>';
                 print_r($out);
                 echo '</pre>';
             }
         }
  22. Good evening dear community. First of all: Feliz Navidad, I want to wish you a merry Christmas!

Today I'm trying to debug a little DOMDocument object in PHP. Ideally I would like DOMDocument to output an array-like format, so that I can store the data in a database.

My example: head over to the target URL http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=8880

I looked at the source code. I want to extract the data inside the following element: <div class="floatbox">

See the source code:

    <span class="grey"> <span style="font-size:x-small;">></span></span>
    <a class="navLink" href="http://dms-schule.bildung.hessen.de/suchen/index.html" title="Suchformulare zum hessischen schulischen Bildungssystem">suche</a>
    </div>
    </div>
    <!-- begin of text -->
    <h3>Siegfried-Pickert Schule</h3>
    <div class="floatbox">

See my approach. It is meant to return the labels and values in a formatted array, ready for input into MySQL:

    <?php
    $dom = new DOMDocument();
    @$dom->loadHTMLFile('http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=8880');
    $divElement = $dom->getElementById('floatbox');
    $innerHTML = '';
    $children = $divElement->childNodes;
    foreach ($children as $child) {
        $innerHTML = $child->ownerDocument->saveXML($child);
        $doc = new DOMDocument();
        $doc->loadHTML($innerHTML);
        //$divElementNew = $dom->getElementsByTagName('td');
        $divElementNew = $dom->getElementsByTagname('td');
        /*** the array to return ***/
        $out = array();
        foreach ($divElementNew as $item) {
            /*** add node value to the out array ***/
            $out[] = $item->nodeValue;
        }
        echo '<pre>';
        print_r($out);
        echo '</pre>';
    }

Well, duh: this outputs a lot of garbage. The code spits out a lot of HTML anyway. What can I do to get cleaner output? What is wrong with the idea of using this call: $dom->getElementById('floatbox'); Any idea? Any and all help will be greatly appreciated.

Season's greetings, db1
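A likely cause of the trouble is that `floatbox` is a class in the quoted source, not an id, so `getElementById('floatbox')` has nothing to find. A minimal sketch of selecting by class with `DOMXPath` follows; the function name `textByClass` and the sample HTML are hypothetical, and the real page may need a more specific query.

```php
<?php
/**
 * Return the trimmed text of every <div> whose class attribute
 * contains the given class name as a whole word.
 */
function textByClass(string $html, string $className): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy markup
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Pad with spaces so "floatbox" does not also match "floatbox2".
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $className
    );

    $out = [];
    foreach ($xpath->query($query) as $node) {
        $out[] = trim($node->textContent);
    }
    return $out;
}

// Hypothetical snippet modelled on the page structure quoted above.
$sampleHtml = '<html><body><h3>Siegfried-Pickert Schule</h3>'
    . '<div class="floatbox">Example address line</div>'
    . '<div class="other">ignored</div></body></html>';

print_r(textByClass($sampleHtml, 'floatbox'));
```

Using `textContent` returns only the text of each matching block, which avoids the raw HTML that `saveXML()` produces.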
  23. Hello dear community, good day! First of all: merry Christmas to all of you!

How can I combine (concatenate) a *divided* string in order to use the combined string in a loop where I run

    $dom = new DOMDocument();
    @$dom->loadHTMLFile('<- path to the file-> =60119');

and the following numbers? Note: they replace the ending!

    60299 64643 62958 63678 60419 60585 60749 60962

and so on.

Question: how do I combine the string (in fact the string is a URL) so that I am able to build the URLs automatically, and so that I am able to run all of that in a loop, e.g. with foreach (probably this is the right way to do it)?

I hope I was able to explain the question so that you understand it. If I have to be more descriptive, just let me know! Many thanks for a hint!

db1
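A minimal sketch of building the URLs in a `foreach` loop with the dot operator. The base URL here is a hypothetical placeholder, since the real path is elided in the post; only the list of numbers is taken from it.

```php
<?php
// Hypothetical base URL; everything up to and including the "=" stays fixed.
$baseUrl = 'http://www.example.com/detail.html?uid=';

// The numbers listed in the post; each one replaces the ending of the URL.
$ids = [60299, 64643, 62958, 63678, 60419, 60585, 60749, 60962];

$urls = [];
foreach ($ids as $id) {
    // The dot operator joins the two parts; "{$baseUrl}{$id}" would give the same string.
    $urls[] = $baseUrl . $id;
}

foreach ($urls as $url) {
    echo $url, PHP_EOL;
    // In the real script, @$dom->loadHTMLFile($url) would go here.
}
```

Collecting the URLs in an array first keeps the building step separate from the download step, which makes it easy to print and check the list before fetching anything.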
  24. Hello, good evening! I was able to run the test:

    suse-linux:~ # php -r "echo class_exists('DOMDocument') ? 'It exists' : 'It Does NOT exist';"
    It existssuse-linux:~ #

Well, I am glad. Now I can continue with the work on the parser.

@DJ Kat: I will continue with some tests on the parser script (the one you gave me in another thread, further below).
  25. Hi there, hello dear DJ Kat. Well, I am a bit confused! What to say? ;-) I want to run the DOMDocument code you suggested to me, so I am trying my best to get the Linux box up and running with everything I need. Let me know if I did something wrong. I am trying it on the shell.

Question: should I run this:

    $ php -r 'echo (class_exists("DOMDocument")) ? "It exists \n" : "It Does NOT exist \n";'

or this:

    -r 'echo (class_exists("DOMDocument")) ? "It exists \n" : "It Does NOT exist \n";'

or like so:

    -r 'echo (class_exists("DOMDocument")) ? "It exists \n" : "It Does NOT exist \n"

Hmm, I am a bit confused. I would love to hear from you!

db1
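The `-r` option belongs to the `php` command, so it only works when `php` comes first on the line. To sidestep shell quoting altogether, the same check can be saved in a file and run with `php check_dom.php`. This is just a sketch; the file name `check_dom.php` and the function name `domStatus` are hypothetical.

```php
<?php
/**
 * Report whether the DOMDocument class is available in this PHP build.
 */
function domStatus(): string
{
    return class_exists('DOMDocument')
        ? 'DOMDocument exists'
        : 'DOMDocument does NOT exist';
}

echo domStatus(), PHP_EOL;
// extension_loaded() asks the same question at the extension level.
echo 'dom extension loaded: ', extension_loaded('dom') ? 'yes' : 'no', PHP_EOL;
```

If the class is reported as missing, the DOM extension is probably not installed or not enabled for the command-line PHP.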