dilbertone Posted December 1, 2010

Hello dear community,

I have a large document. I need to parse it and output only this part:

    schule.php?schulnr=80287&lschb=

How do I parse it out of markup like this?

    <td>
    <A HREF="schule.php?schulnr=80287&lschb=" target="_blank">
    <center><img border=0 height=16 width=15 src="sh_info.gif"></center></A>
    </td>

Love to hear from you.

salathe Posted December 2, 2010

How large is large? How do you think you would parse it?

There are many ways to "parse" a document. For HTML you could use the DOM to get an object-based view of the file, or you could read in the whole file and do a quick preg_match_all(), or, if it's really huge, you could read it line by line and test each line for matching links.

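As a rough illustration of the DOM option mentioned above, here is a minimal sketch that pulls the link out of the markup from the first post. The helper name `extractLinks` and the surrounding `<table>` wrapper are assumptions made for this example, not something from the thread; a real page would be loaded from a file or URL instead of a string.

```php
<?php
// Sample markup from the first post, wrapped in a table so it is a complete fragment.
$html = <<<'HTML'
<table><tr><td>
<A HREF="schule.php?schulnr=80287&lschb=" target="_blank">
<center><img border=0 height=16 width=15 src="sh_info.gif"></center></A>
</td></tr></table>
HTML;

// Hypothetical helper: return every href that starts with the given prefix.
function extractLinks(string $html, string $prefix): array
{
    $doc = new DOMDocument();

    // Real-world HTML is often invalid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if (strpos($href, $prefix) === 0) {
            $links[] = $href;
        }
    }
    return $links;
}

$links = extractLinks($html, 'schule.php?');
print_r($links);
```

The DOM route tends to cope better with uppercase tags, attribute order, and sloppy markup than a hand-written regex, at the cost of loading the whole document into memory.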
dilbertone Posted December 7, 2010

Hello dear salathe,

many thanks for the quick reply. I see that you are a regex expert. I will try to do the job according to your advice and go the preg_match_all() way.

By the way, I am also trying to parse these small example sites. They are not very complicated, but they seem to be nice examples to learn a lot from:

http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=5459&lschb=
http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=675.635319953953&SchulAdresseMapDO=193975

Any idea how to do this quickly?

Love to hear from you.
db1

dilbertone Posted December 8, 2010

Hello dear salathe,

    <?php
    // Fetch the page and dump the raw HTML for inspection.
    $content = file_get_contents("http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=94468&lschb");
    var_dump($content);

    // Capture the contents of every bare <td>...</td> cell.
    $pattern = '/<td>(.*?)<\/td>/si';
    preg_match_all($pattern, $content, $matches);

    // Strip the inner markup and surrounding whitespace from each cell.
    foreach ($matches[1] as $match) {
        $match = strip_tags($match);
        $match = trim($match);
        var_dump($match);
    }

salathe Posted December 9, 2010

> hello dear salathe

Hi there. You didn't ask anything in the last post. Are you happy with the code that you've got, or do you still have questions or things that you would like to talk through?

dilbertone Posted December 10, 2010

Hi Salathe,

many, many thanks, great to hear from you! You're right, a question was missing.

>> hello dear salathe
> Hi there. You didn't ask anything in the last post. Are you happy with the code that you've got, or do you still have questions or things that you would like to talk through?

I want to apply the above-mentioned code to this URL. Is this possible? I guess the HTML is a bit invalid. But besides that, is it possible to apply the code to this new target URL?

    <?php
    $content = file_get_contents("http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=94468&lschb");
    var_dump($content);
    $pattern = '/<td>(.*?)<\/td>/si';
    preg_match_all($pattern, $content, $matches);
    foreach ($matches[1] as $match) {
        $match = strip_tags($match);
        $match = trim($match);
        var_dump($match);
    }

...[...]...

I'd love to hear from you!
Regards, db-one

salathe Posted December 10, 2010

If the "target URL" has the HTML structure and content that $pattern looks for, then yes. Otherwise, no.

dilbertone Posted December 10, 2010

Hello Salathe,

> If the "target URL" has the HTML structure and content that $pattern looks for, then yes. Otherwise, no.

Many, many thanks. No, it does not. But it has tables! So I have to redesign the regex a bit. Can you give me some advice? Note that it also has tables!

salathe Posted December 10, 2010

Oh, it has got tables! Then I'm sorry, you'll have to start all over again!

Joking aside, your $pattern only looks for <td>...</td>, so since the page uses tables, you should be OK (fingers crossed).

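One caveat worth sketching: because the pattern looks for a literal `<td>`, cells written with attributes (for example `<td class="...">`) are skipped. The snippet below compares the pattern from the thread with a relaxed variant. The sample markup, the helper `cellTexts`, and the relaxed pattern are all assumptions made for illustration; they are not taken from either school page.

```php
<?php
// Made-up markup: one bare cell and one cell with attributes.
$content = '<table><tr><td>Plain cell</td>'
         . '<td class="x" align="left"> <b>Styled</b> cell </td></tr></table>';

$strict  = '/<td>(.*?)<\/td>/si';         // pattern from the thread: bare <td> only
$relaxed = '/<td\b[^>]*>(.*?)<\/td>/si';  // assumption: also allow attributes on <td>

// Hypothetical helper: return the trimmed text of every cell the pattern captures.
function cellTexts(string $pattern, string $content): array
{
    preg_match_all($pattern, $content, $matches);
    return array_map(function ($cell) {
        return trim(strip_tags($cell));
    }, $matches[1]);
}

$strictCells  = cellTexts($strict, $content);   // ['Plain cell']
$relaxedCells = cellTexts($relaxed, $content);  // ['Plain cell', 'Styled cell']

print_r($strictCells);
print_r($relaxedCells);
```

Neither pattern handles nested tables reliably, since the lazy `(.*?)` stops at the first `</td>` it meets; for pages like that, the DOM approach is likely the safer choice.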
dilbertone Posted December 10, 2010

Hello dear salathe, good evening!

Many, many thanks for the reply. I am very happy to hear from you!

> Oh, it has got tables! Then I'm sorry, you'll have to start all over again!
> Joking aside, your $pattern only looks for <td>...</td>, so since the page uses tables, you should be OK (fingers crossed).

The page has tables:

http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=675.635319953953&SchulAdresseMapDO=193975

Okay, I will try out the regex and see what it outputs. Many thanks for your help!

Regards,
db1