dilbertone Posted May 17, 2011

There is a German database that collects data on all German foundations, see: http://www.suche.stiftungen.org/index.php?strg=87_124&baseID=129

It lists every foundation in Germany: 8074 different foundations. You get the full result set if you enter % as a wildcard in the search field. But if we do that, we hit some kind of overflow: 350 results is the limit, and no more can be shown.

So the question is: how can we create a spider that runs across the site and queries it step by step, so that we get all 8074 results?

The way to get through this database is to search combinations of letters, e.g. "ac", and select "search only titles". Then go through every pair of letters. If you still get too many results for a particular pair, use three letters: aca, acb, ...

Can I do this with file_get_contents() or with cURL (e.g. multi-cURL)? I want to write a little script that does this task automatically.

Regarding the destination database, it's all going into SQLite, which we believe can handle large enough data sets without any problems. We can download the database as a file too. For its capabilities, see: http://www.sqlite.org/limits.html

The question is how to create a first approach to the parser. Can anybody assist?
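The letter-combination idea from the post can be sketched roughly as below. This is only a minimal sketch under assumptions: `crawlPrefixes` and the `$search` callback are hypothetical names, the callback stands in for the real HTTP request and is assumed to return rows that each carry a unique `id`, and a search that returns exactly the limit (350 in the post) is treated as truncated. It starts with single letters and only goes to pairs and triples where a search appears truncated.

```php
<?php
/**
 * Collect results for a set of search terms, extending a term by one
 * letter whenever the search appears to hit the display limit.
 *
 * $search   callable(string): array  hypothetical; wraps the real request
 * $alphabet letters used to build the search terms
 * $limit    maximum number of hits the site shows (350 in the post)
 * $maxDepth longest search term to try before giving up on a branch
 */
function crawlPrefixes(callable $search, array $alphabet, int $limit = 350, int $maxDepth = 4): array
{
    $found = [];
    $queue = [];
    foreach ($alphabet as $letter) {
        $queue[] = [$letter, 1];          // [search term, number of letters]
    }

    while ($queue) {
        [$prefix, $depth] = array_shift($queue);
        $rows = $search($prefix);

        // Always keep what came back; keying by id drops duplicates.
        foreach ($rows as $row) {
            $found[$row['id']] = $row;
        }

        // A full page probably means the list was cut off: narrow the search.
        if (count($rows) >= $limit && $depth < $maxDepth) {
            foreach ($alphabet as $letter) {
                $queue[] = [$prefix . $letter, $depth + 1];
            }
        }
    }
    return $found;
}
```

Called with something like `crawlPrefixes($search, range('a', 'z'))`, this issues one request per search term, so a pause between requests would be polite. Titles that continue with a character outside the alphabet (digits, umlauts, spaces) can still be missed when a term is truncated, so the alphabet may need extending.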
The Letter E Posted May 17, 2011

In theory, I would use cURL to return the requested HTML from the search pages, then write a script that searches for the HTML tags that enclose the required info and adds that info to the database. I don't think you can use file_get_contents() on a remote site, but I'm not sure on that one.

E
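A rough sketch of that fetch, parse, store pipeline could look like this. The function names are made up for illustration, and the XPath expression is a pure assumption: it pretends every hit is a link inside an element with class "result", so the real markup of the result page has to be checked first. The table layout is likewise just a placeholder.

```php
<?php
// Fetch one page with cURL; returns the HTML body or throws on failure.
function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_TIMEOUT        => 30,     // seconds
    ]);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL failed: ' . curl_error($ch));
    }
    return $html;
}

// Pull the link texts out of a result page.
// ASSUMPTION: hits are <a> elements inside something with class "result".
function extractNames(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);   // hint the encoding
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $names = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[contains(@class, "result")]//a') as $link) {
        $names[] = trim($link->textContent);
    }
    return $names;
}

// Store the names in SQLite; INSERT OR IGNORE skips names already present.
function saveNames(PDO $db, array $names): void
{
    $db->exec('CREATE TABLE IF NOT EXISTS foundations (name TEXT PRIMARY KEY)');
    $insert = $db->prepare('INSERT OR IGNORE INTO foundations (name) VALUES (?)');
    foreach ($names as $name) {
        $insert->execute([$name]);
    }
}
```

Usage would be along the lines of `saveNames(new PDO('sqlite:foundations.db'), extractNames(fetchPage($url)));`. Using the name as the key is a simplification; if the site exposes an ID per foundation, that would be the safer key.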
anupamsaha Posted May 18, 2011

You can do it both ways. But for security reasons, opening a remote URL through file_get_contents() is disabled on many servers (the allow_url_fopen setting). So cURL is the ideal way to go.
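Whether file_get_contents() may open remote URLs on a given server can be checked at runtime rather than guessed. A small sketch, with hypothetical helper names, that prefers cURL and falls back to file_get_contents() only when `allow_url_fopen` is on:

```php
<?php
// Decide which transport to use. Kept as a pure function so the choice
// can be checked without touching the network.
function pickTransport(bool $hasCurl, bool $urlFopenEnabled): ?string
{
    if ($hasCurl) {
        return 'curl';
    }
    if ($urlFopenEnabled) {
        return 'file_get_contents';
    }
    return null;   // no way to fetch remote URLs on this installation
}

// Inspect the running PHP installation.
function detectTransport(): ?string
{
    return pickTransport(
        function_exists('curl_init'),
        // ini_get() returns a string such as "1", "" or "Off"
        filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)
    );
}

// The file_get_contents() variant, with a timeout set via a stream context.
function fetchWithFileGetContents(string $url): string
{
    $context = stream_context_create(['http' => ['timeout' => 30]]);
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $body;
}
```

Running `var_dump(detectTransport());` on the target server shows which option is actually available there.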