
file_get_contents or cURL - which one to use?


dilbertone


 

There is a German database that collects data on all German foundations; see: http://www.suche.stiftungen.org/index.php?strg=87_124&baseID=129

 

It lists all foundations in Germany: 8074 different foundations.

 

You get the full result set if you enter % as a wildcard in the search field.

 

But if we do that, the result list overflows: the site shows at most 350 results and no more. So the question is: how can we create a spider that queries the site step by step so that we end up with all 8074 results?

 

One way to get through this database is to search for combinations of letters, e.g. "ac", with the search restricted to titles only.

Then go through every pair of letters. If a particular pair still returns too many results, use three letters:

aca, acb,...
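
The letter-pair idea could be sketched roughly like this. It is only a sketch under assumptions: `letterPairs`, `refine` and `planQueries` are made-up names, and `$countResults` is a callback you would have to write yourself that reports how many hits the site shows for a term. I am also assuming a pair that reaches the 350 limit is the signal to go to three letters. Titles containing umlauts or digits would need a larger alphabet than a-z.

```php
<?php
/** Return all two-letter combinations: "aa", "ab", ..., "zz". */
function letterPairs(): array
{
    $pairs = [];
    foreach (range('a', 'z') as $first) {
        foreach (range('a', 'z') as $second) {
            $pairs[] = $first . $second;
        }
    }
    return $pairs;
}

/** Extend a term by one letter: "ac" becomes "aca", "acb", ..., "acz". */
function refine(string $term): array
{
    return array_map(fn($letter) => $term . $letter, range('a', 'z'));
}

/**
 * Decide which search terms to query.
 * $countResults is a caller-supplied callback (hypothetical) returning the
 * number of hits for a term; $limit is the 350-result cap from the thread.
 */
function planQueries(callable $countResults, int $limit = 350): array
{
    $plan = [];
    foreach (letterPairs() as $pair) {
        $count = $countResults($pair);
        if ($count === 0) {
            continue;                      // nothing to collect for this pair
        }
        if ($count >= $limit) {
            // The list is probably cut off, so split the pair into 3-letter terms.
            $plan = array_merge($plan, refine($pair));
        } else {
            $plan[] = $pair;
        }
    }
    return $plan;
}
```

If a three-letter term still hits the limit, the same refinement step could be applied once more.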

 

Can I do this with file_get_contents or with cURL (e.g. multi-cURL)?
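
For the cURL route, a minimal sketch of fetching one search page might look like the following. The function names are made up, and the `search` parameter is only a placeholder: I do not know the real field name of the search form, or whether it is submitted via GET or POST. PHP also has the `curl_multi_*` functions for parallel requests, but fetching one page at a time is simpler and easier on the site.

```php
<?php
/** Fetch one page with cURL and return its body, or null on failure. */
function fetchPage(string $url, int $timeoutSeconds = 30): ?string
{
    $handle = curl_init($url);
    curl_setopt_array($handle, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        CURLOPT_USERAGENT      => 'foundation-spider-sketch/0.1', // made-up name
    ]);
    $body   = curl_exec($handle);
    $status = curl_getinfo($handle, CURLINFO_HTTP_CODE);

    return ($body !== false && $status === 200) ? $body : null;
}

/**
 * Build the URL for one search term.
 * "search" is a placeholder parameter name, NOT the site's real field name.
 */
function buildSearchUrl(string $baseUrl, string $term): string
{
    $separator = (strpos($baseUrl, '?') === false) ? '?' : '&';
    return $baseUrl . $separator . http_build_query(['search' => $term]);
}
```

Usage would be something like `fetchPage(buildSearchUrl($baseUrl, 'ac'))`, with a short `sleep()` between requests.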

 

I want to write a small script that automates this task.

 

As for the destination database, everything is going into SQLite, which we believe can handle data sets of this size without any problems. We can download the database as a file too. For its capabilities, see: http://www.sqlite.org/limits.html
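
Storing the results in SQLite could be sketched with PDO as below. The table name, the column and both function names are assumptions for illustration. Because overlapping searches will return the same foundation more than once, the sketch uses a UNIQUE column with `INSERT OR IGNORE` to skip duplicates. If the site exposes a proper ID per foundation, that would be a better key than the name.

```php
<?php
/** Open (or create) the SQLite file and make sure the table exists. */
function openDatabase(string $path): PDO
{
    $db = new PDO('sqlite:' . $path);
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    // Hypothetical schema: one row per foundation name.
    $db->exec('CREATE TABLE IF NOT EXISTS foundations (name TEXT NOT NULL UNIQUE)');
    return $db;
}

/** Insert the given names, silently skipping ones that are already stored. */
function saveFoundations(PDO $db, array $names): void
{
    $insert = $db->prepare('INSERT OR IGNORE INTO foundations (name) VALUES (:name)');
    $db->beginTransaction();              // one transaction per batch is much faster
    foreach ($names as $name) {
        $insert->execute([':name' => $name]);
    }
    $db->commit();
}
```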

 

 

The question is: how do I build a first version of the parser? Can anybody assist?

 

 


In theory...

 

I would use cURL to fetch the HTML of the search pages, then write a script that looks for the HTML tags enclosing the required info and adds that info to the database.
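
The "search for the html tags" step could be sketched with PHP's DOM extension instead of string matching. This is only a sketch: `extractTexts` is a made-up helper, and the XPath expression you pass in depends on the real markup of the result page, which I have not looked at. The page's character encoding may also need attention if names with umlauts come out garbled.

```php
<?php
/**
 * Return the trimmed text of every element matching an XPath expression.
 * The expression is an assumption the caller must adapt to the real page.
 */
function extractTexts(string $html, string $xpathExpression): array
{
    $document = new DOMDocument();
    $previous = libxml_use_internal_errors(true);  // real-world HTML is rarely valid
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($document))->query($xpathExpression);
    if ($nodes === false) {
        return [];                                 // invalid XPath expression
    }

    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}
```

For example, if each hit happened to be an `<li class="hit">` element, the call would be `extractTexts($html, '//li[@class="hit"]')`.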

 

I don't think you can do file_get_contents from a remote site, but I'm not sure on that one.
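
On that point: as far as I know, file_get_contents can read http:// URLs as long as the `allow_url_fopen` setting is enabled in php.ini, so it is worth checking before ruling it out. A minimal sketch, with made-up function names:

```php
<?php
/** Report whether file_get_contents() may open URLs on this PHP install. */
function canFetchRemoteFiles(): bool
{
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

/** Fetch a URL with file_get_contents(); null when disabled or on failure. */
function fetchWithFileGetContents(string $url, int $timeoutSeconds = 30): ?string
{
    if (!canFetchRemoteFiles()) {
        return null;
    }
    // The stream context carries HTTP options such as the timeout.
    $context = stream_context_create(['http' => ['timeout' => $timeoutSeconds]]);
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}
```

cURL still gives more control over things like redirects, cookies and parallel requests, so it is a reasonable choice for a spider either way.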

 

E
