dilbertone Posted November 14, 2010 Share Posted November 14, 2010 good day dear PHPFreaks - hello to everybody. i want to create a link parser. i have choosen to do it with Curl. I have some lines together now. Love to hear your review... Since i am new to programming i love to get some hints from experienced devs. Here some details: well since we have several hundred of resultpages derived from this one: http://www.educa.ch/dyn/79362.asp?action=search Note: i want to itterate over the resultpages - with a loop. http://www.educa.ch/dyn/79376.asp?id=1568 http://www.educa.ch/dyn/79376.asp?id=2149 i take this loop: for($i=1;$i<=$match[1];$i++) { $url = "http://www.example.com/page?page={$i}"; // access new sub-page, extract necessary data } what do you think? What about the Loop over the target-Urls? BTW: you see - there will be some pages empty. Note - the empty pages should be thrown away. I do not want to store "empty" stuff. well this is what i want to. And now i need to have a good parser-script. Note: this is a tree-part-job: 1. fetching the sub-pages 2. parsing them 3. storing the data in a mysql-db Well - the problem - some of the above mentioned pages are empty. so i need to find a solution to leave them aside - unless i do not want to populate my mysql-db with too much infos.. Btw- parsing should be a part that can be done with DomDocument - What do you think? I need to combine the first part with tthe second - can you give me some starting points and hints to get this. The fetching-job should be done with CuRL - and to process the data into a DomDocument-Parser-Job. No Problem here: But how to do the DOM-Document-Job ... i have installed FireBug into the FireFox... now i have the Xpaths for the sites: http://www.educa.ch/dyn/79376.asp?id=1187 http://www.educa.ch/dyn/79376.asp?id=2939 http://www.educa.ch/dyn/79376.asp?id=1515 http://www.educa.ch/dyn/79376.asp?id=1469 Altes Schulhaus Ossingen :: /html/body/div[2] Guntibachstrasse 10 :: /html/body/div[4] 8475 Ossingen :: /html/body/div[6] [email protected] :: /html/body/div[9]/a Tel:052 317 15 45 :: /html/body/div[11] Fax:052 317 04 42 :: /html/body/div[12] but how to appyl in the Simple DomDocument - i want to use this here: http://simplehtmldom.sourceforge.net/ look forward to a hint that gives me a starting point Link to comment https://forums.phpfreaks.com/topic/218625-domdocument-parser-i-need-a-starting-point/ Share on other sites More sharing options...
Recommended Posts
Archived
This topic is now archived and is closed to further replies.