Web Scraping


olly79

Hi all,

 

I wonder if someone can assist me with the following.

 

I have hundreds of URLs for which I need to fetch the <title> tag and dump it, along with the URL, into Excel for further manipulation.

 

I read a post about using cURL; however, I wonder if someone could help me with coding such an application.

 

I also need to understand how to apply the code, that is, instructions on what to do with it, e.g. load cURL.php onto a web server, go to such-and-such a URL and it will load, and so on.

 

Any help is much appreciated.


Excel can read from a CSV source

 

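<?php
// Writes excel.csv to the current working directory: one row per URL with its <title>.
$file = fopen('excel.csv', 'w');
fputcsv($file, array('Title', 'URL'));

$urls = array(); // fill this with your URLs
$doc = new DOMDocument();

libxml_use_internal_errors(true); // suppress warnings from malformed HTML
foreach ($urls as $url)
{
   if ($doc->loadHTMLFile($url))
   {
      $xpath = new DOMXPath($doc);
      // item(0) is the first <title> node, or null if the page has none
      $node = $xpath->query('//title')->item(0);
      $title = $node ? trim($node->nodeValue) : '';

      fputcsv($file, array($title, $url));
   }
}
fclose($file);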
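
Since the original question mentioned cURL, here is a minimal sketch of the same idea split into two small functions: one that downloads a page with cURL and one that pulls the <title> out of an HTML string. The names fetchHtml and extractTitle, and the 15-second timeout, are assumptions made for illustration; they are not part of the code posted above, and the cURL extension has to be enabled for fetchHtml to work.

```php
<?php
/**
 * Download a page with cURL. Returns the HTML, or null on failure.
 * (Hypothetical helper; requires the PHP cURL extension.)
 */
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 15,   // assumed limit, in seconds
    ]);
    $html = curl_exec($ch);

    return (is_string($html) && $html !== '') ? $html : null;
}

/**
 * Return the trimmed text of the first <title> element, or null if none.
 * (Hypothetical helper.)
 */
function extractTitle(string $html): ?string
{
    if (trim($html) === '') {
        return null;
    }

    // Collect parser warnings internally so malformed HTML stays quiet.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $node = $doc->getElementsByTagName('title')->item(0);

    return $node === null ? null : trim($node->nodeValue);
}

// Possible usage inside the loop above:
//   $html  = fetchHtml($url);
//   $title = $html === null ? '' : (extractTitle($html) ?? '');
```

Keeping the download and the parsing separate means the parsing part can be tried on a saved HTML file without going online.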


Hi there,

 

Thanks for that.

 

Can you assist me with how I load that into Excel?

 

I'm on Excel 2007 and not sure how to do it.

 

Thanks

 

If you have Excel installed, the .csv extension will be recognized automatically and the file will show an Excel icon. Double-click it to open it.


Hi,

 

Just to clarify: I have the URLs I want to crawl; however, I don't have the <title> tags, so the script needs to look at http://www.somedomain.com and return its <title> tag. It then needs to go to the next URL in my spreadsheet, find that <title> tag, and so on.

 

I want to insert, next to each URL, the associated <title> tag that the script has returned.

 

I hope that helps.


Hi,

 

I don't understand what you mean. Sorry, I'm not familiar with Excel/PHP.

 

Can you explain in simple terms what I need to do, please?

 

Thanks

 

If you have Excel installed on your system, it will auto-recognize the file when you browse to it and you will be able to double-click it to open it in Excel.


I don't follow your process.

 

1) What do I save the code you have supplied as?

2) Should the excel.csv in the code be the Excel spreadsheet with all my URLs?

3) Do I open a new Excel workbook, open the code file that I have saved as you suggested, and it will work?

 

Thanks


1) What do I save the code you have supplied as?

 

The data is saved to excel.csv in the directory the .php file is executed from.

 

2) Should the excel.csv in the code be the Excel spreadsheet with all my URLs?

 

Where do your URLs come from? A spreadsheet, a database, an array, ...?

 

3) Do I open a new Excel workbook, open the code file that I have saved as you suggested, and it will work?

 

Run the .php file, then find excel.csv and double-click it to open it in Excel.


Okay, I hope this helps to clarify matters further, as I really need to get this up and running. I would appreciate a little more help with how to execute what you are saying, because it's not making sense to me. You are perhaps assuming that I'm further on than I am, so it needs to be simple steps, please.

 

1) You supplied me with the following code:

 

$file = fopen('excel.csv', 'w');
fputcsv($file, array('Title', 'URL'));

$urls = array(); // fill this with your URLs
$doc = new DOMDocument();

libxml_use_internal_errors(true);
foreach ($urls as $url)
{
   if ($doc->loadHTMLFile($url))
   {
      $xpath = new DOMXPath($doc);
      $node = $xpath->query('//title')->item(0);
      $title = $node ? trim($node->nodeValue) : '';

      fputcsv($file, array($title, $url));
   }
}
fclose($file);

 

So what do I do with this? If I'm to save it in Excel, where does it go, e.g. in a macro? If so, how do I do this, please? Or are you suggesting that this file is saved as excel.csv and stored on my desktop?

 

2) All the URLs that I need to scrape for the <title> tag are in another Excel spreadsheet on my desktop.

 

3) You mention running the .php file. What is this and how do I do that? I'm not following.

 

Thanks again for your help. Please break down each step in simple terms.
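
As a rough sketch of how the pieces asked about here could fit together: PHP code does not go inside Excel or a macro. It is saved as its own .php file and run by PHP, either on a web server or from a command line. The sketch below assumes the spreadsheet of URLs has first been saved from Excel in CSV format with one URL per row in the first column. The file names urls.csv and titles.csv and the function names readUrlColumn and writeTitles are hypothetical, chosen only for this example.

```php
<?php
/**
 * Read the first column of a CSV file, skipping blank cells.
 * (Hypothetical helper.)
 */
function readUrlColumn(string $path): array
{
    $urls = [];
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return $urls;
    }
    // Delimiter, enclosure and escape are passed explicitly.
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $cell = trim((string) ($row[0] ?? ''));
        if ($cell !== '') {
            $urls[] = $cell;
        }
    }
    fclose($handle);

    return $urls;
}

/**
 * Fetch each URL and write one "URL, Title" row per URL to $outPath.
 * Pages that cannot be loaded get an empty title.
 * (Hypothetical helper.)
 */
function writeTitles(array $urls, string $outPath): void
{
    libxml_use_internal_errors(true); // keep malformed HTML from printing warnings
    $out = fopen($outPath, 'w');
    fputcsv($out, ['URL', 'Title'], ',', '"', '\\');

    foreach ($urls as $url) {
        $doc = new DOMDocument();
        $title = '';
        if ($doc->loadHTMLFile($url)) {
            $node = $doc->getElementsByTagName('title')->item(0);
            $title = $node ? trim($node->nodeValue) : '';
        }
        fputcsv($out, [$url, $title], ',', '"', '\\');
    }
    fclose($out);
}

// Usage, with both files in the folder the script is run from:
//   writeTitles(readUrlColumn('urls.csv'), 'titles.csv');
```

With the usage line uncommented and the file saved as, say, titles.php, running php titles.php from that folder would produce titles.csv, which opens in Excel with each URL next to its title.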
