dilbertone, posted November 6, 2010

I am new to PHP and want to learn something about it. I currently have a small project: I want to visit the links that this site presents:

http://www.educa.ch/dyn/79363.asp?action=search [search with the wildcard %]

I parse with a loop.

    <?php
    $data = file_get_contents('http://www.educa.ch/dyn/79363.asp?action=search');
    if ($data === false) {
        die('Could not fetch the search page');
    }
    // Capture N from the text "Page 1 of N results"
    $regex = '/Page 1 of (.+?) results/';
    if (preg_match($regex, $data, $match)) {
        var_dump($match);
        echo $match[1];
    }
    ?>

This is meant to get the following pages:

http://www.educa.ch/dyn/79376.asp?id=4438
http://www.educa.ch/dyn/79376.asp?id=2939

If we are looping over a set of values, then we need to supply it as an array. I would guess something like the loop below. As I am not sure which ids are filled with content, I have to loop from 1 to 10000 to make sure that I get all the data. What do you think?

    for ($i = 1; $i <= 10000; $i++) {
        // body of loop
    }

This follows the description at http://www.php.net/manual/en/control-structures.for.php
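As a rough sketch of the loop described in this post, the body could build each detail URL from the counter and keep only the pages that actually return content. The helper names `buildDetailUrl` and `fetchDetailPages`, the half-second pause, and the injectable `$fetch` callback are assumptions made for illustration; none of them come from the thread.

```php
<?php
// Hypothetical helper: builds the detail-page URL for one id.
function buildDetailUrl(int $id): string
{
    return 'http://www.educa.ch/dyn/79376.asp?id=' . $id;
}

// Hypothetical helper: requests ids $from..$to and returns the pages that loaded,
// keyed by id. $fetch can be swapped out so the loop can be tried offline.
function fetchDetailPages(int $from, int $to, ?callable $fetch = null, int $delayMicroseconds = 500000): array
{
    $fetch = $fetch ?? static function (string $url) {
        return @file_get_contents($url); // returns false on failure
    };

    $pages = [];
    for ($i = $from; $i <= $to; $i++) {
        $html = $fetch(buildDetailUrl($i));
        if ($html !== false && $html !== '') {
            $pages[$i] = $html; // keep only ids that returned something
        }
        if ($delayMicroseconds > 0) {
            usleep($delayMicroseconds); // assumed pause, to go easy on the remote server
        }
    }
    return $pages;
}
```

Calling `fetchDetailPages(1, 10000)` would walk the whole range; with the assumed pause that takes well over an hour, so trying a small range first is probably wise.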
joel24, posted November 7, 2010

Trying to read 10,000 pages from an external source and search them for content will take up a lot of resources and time. What exactly are you trying to get from those pages?
dilbertone (author), posted November 7, 2010

Hi joel24, thanks for writing.

joel24 wrote: "To try and read 10,000 pages from an external source and search those pages for content will take up a lot of resources and time. What exactly are you trying to get from those pages?"

See the pages. This is an open server, free for everybody to read and use: a governmental database run in Switzerland. The server provides addresses for schools. Have a closer look:

http://www.educa.ch/dyn/79376.asp?id=4438
http://www.educa.ch/dyn/79376.asp?id=2939

Nothing harmful. I want to read the addresses with PHP or Perl.
joel24, posted November 7, 2010

As you probably know, the 'Detail' link displays the address. That link is opened by a JavaScript onclick handler with a dynamic id at the end, which calls the page:

    <a href="#73" onclick="javascript: window.open('79376.asp?id=375','Detail','width=400,height=300,left=0,top=0');">Detail</a>

To lessen the server load, I would set up a database and then write a program that crawls educa.ch and uses regular expressions to pull each URL ('79376.asp?id=375', '79376.asp?id=324', etc.) out of the onclick handlers. It would then store the contents of those pages in the database, preferably sorted into corresponding fields: address, email, etc.

Then you would need to extract the address from each detail page. I am unsure how you would go about separating the address from the other content. A crafty regular expression may do the job. You could easily pull the email, as it is an anchor link with href='mailto:[email protected]'.

I'm not experienced enough with regular expressions, so you'll have to find someone who is. Good luck.
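The extraction step suggested in this post might look roughly like the sketch below. It assumes the ids always appear as `79376.asp?id=<digits>` and that emails sit in `href="mailto:..."` attributes; the function names `extractDetailIds` and `extractEmails` are made up for illustration, and pulling the postal address out of a detail page is left open, as it is in the post.

```php
<?php
// Hypothetical helper: collects the numeric ids found in onclick handlers
// such as window.open('79376.asp?id=375', ...), without duplicates.
function extractDetailIds(string $html): array
{
    preg_match_all('/79376\.asp\?id=(\d+)/', $html, $matches);
    return array_values(array_unique(array_map('intval', $matches[1])));
}

// Hypothetical helper: collects addresses from mailto: links, stopping at the
// closing quote or at a "?" that would start parameters such as ?subject=.
function extractEmails(string $html): array
{
    preg_match_all('/href=["\']mailto:([^"\'?]+)/i', $html, $matches);
    return array_values(array_unique($matches[1]));
}
```

Each id returned by `extractDetailIds` can then be appended to `79376.asp?id=` to request the detail page, and the values from `extractEmails` stored in the email field of the database.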
dilbertone (author), posted November 7, 2010

Hello joel24, many thanks for the reply.

Regex is one solution. I am currently reading some docs that cover DOMDocument, which is probably a solution for the parsing job. For the fetching, I am thinking about using cURL. It is pretty powerful.

I will come back and report all my findings.

Regards
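A minimal sketch of how cURL and DOMDocument could be combined for this job, assuming PHP's curl and dom extensions are installed. The function names `fetchWithCurl` and `extractMailtoLinks` and the 15-second timeout are illustrative assumptions, not something the thread settled on.

```php
<?php
// Hypothetical helper: fetches one URL with cURL and returns the body,
// or null if the request failed or did not return HTTP 200.
function fetchWithCurl(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 15,   // assumed timeout in seconds
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return ($body !== false && $status === 200) ? $body : null;
}

// Hypothetical helper: parses HTML with DOMDocument and returns the addresses
// of all mailto: links, in document order.
function extractMailtoLinks(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $emails = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (stripos($href, 'mailto:') === 0) {
            $emails[] = substr($href, strlen('mailto:'));
        }
    }
    return $emails;
}
```

A detail page would then be handled with something like `extractMailtoLinks(fetchWithCurl($url) ?? '')`. Compared with a regular expression, the DOM approach tends to cope better with attribute order and quoting differences in the markup.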
Archived
This topic is now archived and is closed to further replies.