
parsing a larger number of locally based files...


dilbertone


Hello dear Community,

 

I have a large document. I need to parse it and output only this part: schule.php?schulnr=80287&lschb=

 

How do I parse this?

  <td>
<A HREF="schule.php?schulnr=80287&lschb=" target="_blank">
    <center><img border=0 height=16 width=15 src="sh_info.gif"></center></A>
        </td>

Love to hear from you

How large is large?  How do you think you would parse the stuff?

 

There are many ways to "parse" a document; for HTML you could use the DOM to get an object-based view of the file, or you could read in the whole file and do a quicky preg_match_all(), or if it's really huge you could read it line-by-line and test each line for matching links.
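
To make the DOM option concrete, here is a minimal sketch. It assumes the markup from the first post is representative of the whole file, and extractSchoolLinks() is a hypothetical helper name, not something from this thread. A real script would read the file from disk rather than use an inline string.

```php
<?php
// Sample markup copied from the question; wrapped in a table so it is a complete fragment.
$html = <<<'HTML'
<table><tr><td>
<A HREF="schule.php?schulnr=80287&lschb=" target="_blank">
<center><img border=0 height=16 width=15 src="sh_info.gif"></center></A>
</td></tr></table>
HTML;

/**
 * Return the href of every <a> whose target contains "schule.php?schulnr=".
 * Hypothetical helper, for illustration only.
 */
function extractSchoolLinks(string $html): array
{
    $doc = new DOMDocument();
    // Old-style HTML triggers parser warnings; collect them instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if (strpos($href, 'schule.php?schulnr=') !== false) {
            $links[] = $href;
        }
    }
    return $links;
}

print_r(extractSchoolLinks($html));
```

The DOM route costs more memory than a regex, but it does not care about attribute order, upper-case tag names, or line breaks inside the tag.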

Hello dear salathe

 

many thanks for the quick reply.

 

I see that you are a regex expert. Well, I will try to do the job according to your advice. I will try the preg_match_all() way.

 

By the way, I am also trying to parse these small example sites. They are not very complicated, but they seem to be good examples to learn a lot from.

 

 

http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=5459&lschb=

 

http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=675.635319953953&SchulAdresseMapDO=193975

 

Any idea how to do this quickly? Love to hear from you.

 

db1 :shy:

hello dear salathe,

 

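<?php

// Fetch the page source as a string.
$content = file_get_contents("http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=94468&lschb");

// Debug output: dump the raw HTML.
var_dump($content);

// Capture everything between <td> and </td>; "s" lets . span newlines, "i" ignores case.
$pattern = '/<td>(.*?)<\/td>/si';
preg_match_all($pattern, $content, $matches);

// Strip tags and surrounding whitespace from each captured cell.
foreach ($matches[1] as $match) {
    $match = strip_tags($match);
    $match = trim($match);
    var_dump($match);
}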
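
For the link asked about in the first post, a narrower pattern may be closer to the goal than grabbing whole table cells. This is only a sketch: it assumes every link has the form schule.php?schulnr=<digits> inside a double-quoted href, as in the sample markup, and it runs against an inline copy of that sample rather than the live page.

```php
<?php
// Sample markup copied from the first post.
$content = '<td>
<A HREF="schule.php?schulnr=80287&lschb=" target="_blank">
    <center><img border=0 height=16 width=15 src="sh_info.gif"></center></A>
        </td>';

// Capture the value of any href that starts with schule.php?schulnr=<digits>.
// The "i" flag lets HREF match in upper or lower case.
$pattern = '/href="(schule\.php\?schulnr=\d+[^"]*)"/i';
preg_match_all($pattern, $content, $matches);

// $matches[1] holds the captured link targets, one per match.
foreach ($matches[1] as $link) {
    echo $link, PHP_EOL; // prints schule.php?schulnr=80287&lschb=
}
```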

Hi Salathe,

 

Many, many thanks - great to hear from you! You're right - a question was missing.

 

hello dear salathe

 

Hi there. :)  You didn't ask anything in the last post, are you happy with the code that you've got or do you still have questions or things that you would like to talk through?

 

I want to apply the code above to this URL - is that possible? I guess the HTML is a bit invalid.

But apart from that - is it possible to apply the code to this new target URL?

 

...[...]...

 

i love to hear from you!

 

Regards

db-one!

 

 

 

Hello Salathe

 

If the "target URL" has the HTML structure and content that $pattern looks for, then yes. Otherwise, no.

 

Many, many thanks - no, it does not. But it has tables! So I have to redesign the regex a bit.

 

Can you give me some advice? Note that it also has tables!

 

 

 

hello dear salathe, good evening!

 

Many, many thanks for the reply. I am very happy to hear from you!

 

Oh, it has got tables! Then I'm sorry you'll have to start all over again!

 

Joking aside, your $pattern only looks for <td>...</td> so since the page uses tables, you should be OK (fingers crossed).
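
One caveat worth checking before relying on that: the pattern matches only a bare <td> tag, so cells written with attributes are skipped. The sketch below compares the thread's pattern with a looser variant. The sample row is hypothetical, since the markup of the new page was not posted here.

```php
<?php
// Hypothetical sample row; the real page markup was not posted in the thread.
$content = '<tr><td>plain cell</td><td class="info" align="left">cell with attributes</td></tr>';

// Pattern from the thread: matches only a bare <td> tag.
preg_match_all('/<td>(.*?)<\/td>/si', $content, $strict);

// Looser variant: \b ends the tag name, [^>]* skips any attributes.
preg_match_all('/<td\b[^>]*>(.*?)<\/td>/si', $content, $loose);

print_r($strict[1]); // only "plain cell"
print_r($loose[1]);  // "plain cell" and "cell with attributes"
```

If the page nests tables inside cells, neither pattern will pair the tags correctly, and the DOM approach is the safer choice.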

 

the page has tables ...

http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=675.635319953953&SchulAdresseMapDO=193975

 

Okay, I will try out the regex and see what it outputs!

 

Many thanks for your help!

regards db1

Archived

This topic is now archived and is closed to further replies.
