
[SOLVED] php scraping/crawling problem


andreidumitrescu


Hi,

 

I have a page with real estate ads which I need to get data from.

I used to take the link of a search results page and crawl the results, but this site is somewhat different. It's done in ASP, and even when I make a search and the results do appear, the link doesn't change, so I can't get a link to take the data from.

In the past I crawled this kind of site using C# in Visual Studio and its WebBrowser control, but now I have to use PHP exclusively. Is there a solution for this?

Regards,

Andrei

https://forums.phpfreaks.com/topic/174551-solved-php-scrapingcrawling-problem/

Well, it seems to me that your issue is that the website you are trying to crawl uses the POST method for its search form, probably to limit the kind of crawling you are attempting.

You can check the HTML source to confirm that the form does in fact use the POST method (look for method="post" on the <form> tag).

I believe PHP's cURL functions are what you'll need to crawl this site.

Basically, your PHP script has to send the POST request, with the required fields, to the ASP script in question.

 

From there, your script has to store whatever data you want to pull from the site.
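A minimal sketch of that POST request in PHP might look like the following. It assumes the cURL extension is enabled and PHP 8 or later; the function names `build_post_options` and `post_search` are hypothetical, and the real URL and field names have to be taken from the site's form.

```php
<?php
// Minimal sketch: POST a search form with cURL and return the result page's HTML.
// Function names are hypothetical; the URL and fields come from the real form.

function build_post_options(array $fields): array
{
    return [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields), // url-encoded, like a browser form
        CURLOPT_RETURNTRANSFER => true,  // return the HTML instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow a redirect after the POST, if any
        CURLOPT_TIMEOUT        => 30,    // give up after 30 seconds
    ];
}

function post_search(string $url, array $fields): string|false
{
    $ch = curl_init($url);
    curl_setopt_array($ch, build_post_options($fields));
    $html = curl_exec($ch);              // false on failure
    if ($html === false) {
        error_log('cURL error: ' . curl_error($ch));
    }
    // In PHP 8+ the handle is an object and is freed when $ch goes out of scope
    return $html;
}
```

Something like `post_search('http://www.example.com/search.asp', ['city' => 'Bucharest'])` (placeholder URL and field) would return the results page as a string, ready to be searched for the data you want.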

Hope this helps,

Handy PHP

Hi,

Thanks for the response. I've checked the source code, and the form does indeed use the POST method.

I'm a bit confused about how this will work. Is it possible to fill in the textboxes and then send the POST request from PHP? I know how this is done in C#, and what you are saying is pretty similar.

Thanks,

Andrei

  • Generally, we only have to worry about two form submission methods: POST and GET.
     
    In fact, the GET method is what produces those URL query strings you are familiar with, so for now we will think of the two methods like this:
    POST sends the data in the body of the request, so it is hidden from the average bystander (though not encrypted; only HTTPS does that).
    GET sends the data in a URL query string, for the user or anyone around to see.
     
    Obviously, we wouldn't submit something like a password using the GET method.
    Many search forms use the GET method, which puts the search terms into the URL query string; that is why a results link can usually be copied and crawled. With POST, the URL stays the same.
     
    So a textbox that is filled out has its contents sent to the script using whatever method is specified in the form tag.
     
    Using cURL, your server can send either a POST or a GET request to the target website's script. To the target website it looks much like an ordinary form submission, but it is your script that reads the search results. It is up to you to tell your script where and what to search for, and then how to process the data returned.
    For testing, just output whatever the server returned to your browser, to check that your script successfully connected to the other server and submitted the search request.
     
    My guess is that you want your script to go to the other website with a list of search parameters you want to cycle through and save the returned data to your website for your use.
    Likely, you need one function to get a list of links to items you want and another function to read each of those items.
    You have to know exactly what the search form submits and exactly how the search results are returned to be able to extract the links you need. Then you need to know exactly how the item detail page is laid out to extract the data you want.
     
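To find out exactly what the form submits, one option is to parse the form itself. The sketch below uses PHP's DOMDocument rather than regular expressions (a swapped-in choice); `describe_form` is a hypothetical helper, and it only looks at the first <form> on the page.

```php
<?php
// Sketch: read a form's method, action and field names (with default values)
// from the page HTML, so you know what to send back. Hypothetical helper.

function describe_form(string $html): array
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // real-world HTML is rarely valid; silence warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $form = $doc->getElementsByTagName('form')->item(0); // first form only
    if ($form === null) {
        return [];
    }

    $fields = [];
    // Fields are grouped by tag (inputs, then selects, then textareas)
    foreach (['input', 'select', 'textarea'] as $tag) {
        foreach ($form->getElementsByTagName($tag) as $el) {
            $name = $el->getAttribute('name');
            if ($name !== '') {
                // Keep the default value, e.g. hidden inputs the server expects back
                $fields[$name] = $el->getAttribute('value');
            }
        }
    }

    return [
        'method' => strtoupper($form->getAttribute('method') ?: 'GET'), // forms default to GET
        'action' => $form->getAttribute('action'),
        'fields' => $fields,
    ];
}
```

You would then fill in the textboxes by overwriting entries, e.g. `$form['fields']['city'] = 'Bucharest';` (hypothetical field name), and send `http_build_query($form['fields'])` either as the POST body (CURLOPT_POSTFIELDS) or appended after `?` to the action URL for GET. A relative action has to be resolved against the page's URL first.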

  1. Use cURL to connect to the website and submit the search parameters you have. Cycling through an array of different search parameters would automate more of the task.
  2. Using regular expressions, extract the links to the items returned.
  3. Save the extracted links in a database, a file, or an array.
  4. Using cURL, visit each link and use regular expressions to extract the data you want.
  5. Save the extracted data in a file, a database, or an array.
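Steps 2 to 4 could be sketched with preg_match_all and preg_match as below. The function names and the markup they expect (links containing details.asp, an <h1> title, a <span class="price">) are hypothetical placeholders; the real patterns depend on how the site's pages are laid out.

```php
<?php
// Sketch of steps 2-4 with regular expressions. The expected markup is
// hypothetical: copy the real patterns from the site's HTML source.

function extract_links(string $html, string $base): array
{
    // Step 2: collect double-quoted href values that point at detail pages
    preg_match_all('/href="([^"]*details\.asp[^"]*)"/i', $html, $m);

    $links = [];
    foreach ($m[1] as $href) {
        $href = html_entity_decode($href);   // "&amp;" in the HTML becomes "&"
        // Naive join: assumes relative links are relative to the site root
        $links[] = preg_match('#^https?://#i', $href)
            ? $href
            : rtrim($base, '/') . '/' . ltrim($href, '/');
    }

    // Step 3: keep each link once, in an array
    return array_values(array_unique($links));
}

function extract_item(string $html): array
{
    // Step 4: pull the wanted fields from one detail page; null if not found
    $item = ['title' => null, 'price' => null];
    if (preg_match('#<h1[^>]*>(.*?)</h1>#is', $html, $m)) {
        $item['title'] = trim(strip_tags($m[1]));
    }
    if (preg_match('#<span class="price">(.*?)</span>#is', $html, $m)) {
        $item['price'] = trim(strip_tags($m[1]));
    }
    return $item;
}
```

In the crawl loop, each URL from `extract_links()` would be fetched with cURL and its HTML passed to `extract_item()`, collecting the results in an array (or writing them to a file or database). Regular expressions break easily when the markup changes, so test them against a saved copy of the pages first.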

 

Hopefully, that will point you in the right direction.

I suggest reading this example for curl_setopt, as it shows exactly how to use cURL to connect to another server from your own.

 

Good Luck,

Handy PHP


 

Archived

This topic is now archived and is closed to further replies.
