file_get_contents or Curl - which one to take?


dilbertone

 

There is a German database that collects data on all German foundations, see: http://www.suche.stiftungen.org/index.php?strg=87_124&baseID=129

 

Here we find all foundations in Germany: 8074 different foundations.

 

You get the full results if you enter % as a wildcard in the search field.

 

But if we do that, we hit a limit: at most 350 results are shown, and there is no way to display more. So the question is: how can we create a spider that works through the site step by step until we have all 8074 results?

 

The way to get through this database is to search for combinations of letters, e.g. "ac", and to select "search only titles".

Then go through every pair of letters. If you still get too many results for a particular pair, use three letters:

aca, acb, ...
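
To make the idea concrete, here is a rough sketch of the pair-then-triple refinement in PHP. It only decides which search terms to use; `$countResults` is a hypothetical callback that would return the number of hits for a term (in a real run it would query the site). The plain a-z alphabet and the `$maxLength` cut-off are assumptions as well: German titles also contain umlauts, digits and spaces, so some entries could be missed, and the same foundation will show up under several terms and needs de-duplicating later.

```php
<?php
// Sketch only: pick search terms so that each one stays under the display cap.
// $countResults is a hypothetical callback returning the hit count for a term.

const RESULT_CAP = 350; // display limit mentioned in the post

function collectSearchTerms(callable $countResults, int $maxLength = 4): array
{
    $letters = range('a', 'z'); // assumption: plain a-z only
    $queue = [];

    // Start with every pair of letters: aa, ab, ..., zz.
    foreach ($letters as $first) {
        foreach ($letters as $second) {
            $queue[] = $first . $second;
        }
    }

    $usable = [];
    while ($queue) {
        $term = array_shift($queue);
        $count = $countResults($term);

        if ($count === 0) {
            continue; // nothing to fetch for this term
        }
        if ($count >= RESULT_CAP && strlen($term) < $maxLength) {
            // Probably truncated: try the longer terms instead (aca, acb, ...).
            foreach ($letters as $next) {
                $queue[] = $term . $next;
            }
            continue;
        }
        // Under the cap, or at maximum length (may still be truncated).
        $usable[] = $term;
    }

    return $usable;
}
```

Each term in the returned list would then be sent to the search form once, and the result pages parsed.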

 

Can I do this with file_get_contents or with cURL (e.g. multi cURL)?

 

Well, I want to write a little script that does this task automatically.

 

Regarding the destination database, it's all going into SQLite, which we believe can handle data sets of this size without any problems. We can download the database as a file too. For capabilities, see here: http://www.sqlite.org/limits.html
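
For the storage side, something like the following could work. It is only a sketch, assuming PHP's PDO with the pdo_sqlite extension is available; the table name `foundations` and the columns `detail_url` and `name` are made up for illustration. Using the detail URL as the primary key means a foundation found under several search terms is stored only once.

```php
<?php
// Sketch only: store scraped rows in SQLite through PDO (needs pdo_sqlite).
// Table and column names are hypothetical.

function openDatabase(string $path): PDO
{
    $db = new PDO('sqlite:' . $path);
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec(
        'CREATE TABLE IF NOT EXISTS foundations (
            detail_url TEXT PRIMARY KEY,
            name       TEXT NOT NULL
        )'
    );
    return $db;
}

function saveFoundations(PDO $db, array $rows): void
{
    // INSERT OR IGNORE skips rows whose detail_url is already stored.
    $stmt = $db->prepare(
        'INSERT OR IGNORE INTO foundations (detail_url, name) VALUES (?, ?)'
    );
    $db->beginTransaction();
    foreach ($rows as $row) {
        $stmt->execute([$row['detail_url'], $row['name']]);
    }
    $db->commit();
}
```

Calling `openDatabase('foundations.sqlite')` would create the file on first use; `openDatabase(':memory:')` is handy for trying things out.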

 

 

The question is how to create a first version of the parser. Can anybody assist?

 

 

In theory...

 

I would use cURL to fetch the HTML of the search pages, then write a script that looks for the HTML tags enclosing the required info and adds that info to the database.
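
Roughly like this, as a sketch rather than a finished scraper. `fetchPage` uses the standard cURL functions; `extractLinks` uses DOMDocument and DOMXPath instead of hand-written tag searching. I have not looked at the site's markup, so the XPath expression `//a[@href]` and the assumption that each result is a plain link are guesses you would need to adjust, as are the array keys `name` and `detail_url`.

```php
<?php
// Sketch only: fetch a page with cURL and pull links out of the returned HTML.
// The XPath query and the "results are <a> links" idea are assumptions.

function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 30,   // give up after 30 seconds
    ]);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $html;
}

function extractLinks(string $html, string $xpathQuery = '//a[@href]'): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query($xpathQuery) as $node) {
        $rows[] = [
            'name'       => trim($node->textContent),
            'detail_url' => $node->getAttribute('href'),
        ];
    }
    return $rows;
}
```

Usage would be along the lines of `extractLinks(fetchPage($searchUrl))`, where `$searchUrl` is whatever URL the search form produces for a given term. If umlauts come out garbled, the page's character set probably needs handling before `loadHTML`.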

 

I'm not sure file_get_contents works on a remote site; as far as I know it only does when allow_url_fopen is enabled in php.ini.
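
If you want to try file_get_contents anyway, a quick check could look like the sketch below. It reads the `allow_url_fopen` setting, which controls whether PHP may open http:// URLs with its file functions, and falls back to an exception if it is off. The function names and the 30 second timeout are my own choices for illustration.

```php
<?php
// Sketch only: fetch a remote page with file_get_contents if URL wrappers are on.

function remoteReadsEnabled(): bool
{
    // allow_url_fopen must be on for file_get_contents to open http:// URLs.
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

function fetchWithFileGetContents(string $url, int $timeoutSeconds = 30): string
{
    if (!remoteReadsEnabled()) {
        throw new RuntimeException('allow_url_fopen is disabled; use cURL instead.');
    }
    // The "http" context options only apply to http:// and https:// URLs.
    $context = stream_context_create([
        'http' => ['method' => 'GET', 'timeout' => $timeoutSeconds],
    ]);
    $html = @file_get_contents($url, false, $context);
    if ($html === false) {
        throw new RuntimeException('Could not read ' . $url);
    }
    return $html;
}
```

For a handful of requests this is simpler than cURL; for many requests, or if you need cookies and fine control over headers, cURL is probably the more comfortable choice.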

 

E
