olly79 Posted June 13, 2010
Hi all, I wonder if someone can assist me with the following. I have hundreds of URLs whose <title> tags I need to retrieve and dump, along with each URL, into Excel for further manipulation. I read a post about using cURL; however, I wonder if someone could help me with coding such an application. I also need to understand how to apply the code, that is, instructions on what to do with it, e.g. load cURL.php onto a web server, go to such-and-such a URL and it will load, and so on. Any help much appreciated.
Link to comment https://forums.phpfreaks.com/topic/204671-web-scrapping/
ignace Posted June 13, 2010
Excel can read from a CSV source:

<?php
$file = fopen('excel.csv', 'w');
fputcsv($file, array('Title', 'URL')); // header row
$urls = array(); // fill this with your URLs
$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings from malformed HTML
foreach ($urls as $url) {
    if ($doc->loadHTMLFile($url)) {
        $xpath = new DOMXPath($doc);
        $node = $xpath->query('//title')->item(0); // first <title> node, if any
        $title = $node ? $node->nodeValue : ''; // empty when the page has no <title>
        fputcsv($file, array($title, $url));
    }
}
fclose($file);
olly79 Posted June 13, 2010
Hi there, thanks for that. Can you assist me with how I load that into Excel? I'm on Excel 2007 and not sure how to do it. Thanks
ignace Posted June 13, 2010
olly79 wrote: "Can you assist me with how I load that into Excel? I'm on Excel 2007 and not sure how to do it."
If you have Excel installed, the .csv extension is recognized automatically and the file shows an Excel icon. Double-click the file to open it.
olly79 Posted June 13, 2010
Hi, I don't understand what you mean. Sorry, I'm not familiar with Excel/PHP. Can you explain in simple terms what I need to do, please? Thanks
olly79 Posted June 13, 2010
Hi, just to clarify: I have the URLs I want to crawl, but I don't have their <title> tags. The script needs to look at http://www.somedomain.com and return its <title> tag, then move on to the next URL in my spreadsheet, find that <title> tag, and so on. I want each <title> the script returns to be inserted next to its associated URL. I hope that helps.
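The loop described above (visit each URL, read its <title>, record it next to the URL) might look roughly like the sketch below. The function names `extractTitle` and `titlesForUrls` are hypothetical, and fetching with `file_get_contents` assumes the server has `allow_url_fopen` enabled; treat it as an illustration rather than a finished crawler.

```php
<?php
// Hypothetical helper: pull the <title> text out of an HTML string.
function extractTitle(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input, so bail out early
    }
    $previous = libxml_use_internal_errors(true); // keep messy HTML from emitting warnings
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    $node = $doc->getElementsByTagName('title')->item(0);
    return $node === null ? null : trim($node->textContent);
}

// Hypothetical helper: pair every URL with its title (null when the page
// cannot be fetched or has no <title>).
function titlesForUrls(array $urls): array
{
    $rows = array();
    foreach ($urls as $url) {
        $html = @file_get_contents($url); // assumption: allow_url_fopen is on
        $title = ($html === false) ? null : extractTitle($html);
        $rows[] = array($url, $title);
    }
    return $rows;
}
```

Each returned row is `array($url, $title)`, so it can be passed straight to `fputcsv` to get the title next to its URL.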
ignace Posted June 14, 2010
olly79 wrote: "I don't understand what you mean. Can you explain in simple terms what I need to do, please?"
If you have Excel installed on your system, it will recognize the .csv file when you browse to it, and you can double-click the file to open it in Excel.
olly79 Posted June 14, 2010
I don't follow your process.
1) What do I save the code you have supplied as?
2) Should the excel.csv named in the code be the Excel spreadsheet with all my URLs?
3) Do I open a new Excel workbook, open the code file that I have saved as you suggested, and it will work?
Thanks
ignace Posted June 14, 2010
olly79 wrote: "1) What do I save the code you have supplied as?"
The data is saved to excel.csv in the same directory the .php file executes in.
olly79 wrote: "2) Should the excel.csv named in the code be the Excel spreadsheet with all my URLs?"
Where do your URLs come from? A spreadsheet, a database, an array, ...?
olly79 wrote: "3) Do I open a new Excel workbook, open the code file that I have saved as you suggested, and it will work?"
Run the .php file, then find excel.csv and double-click it to open it in Excel.
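A relative name such as 'excel.csv' is resolved against the current working directory, which is usually, but not always, the folder holding the script. A minimal sketch that pins the output next to the script instead is shown here; `pathBesideScript` is a hypothetical helper name, and the explicit delimiter, enclosure, and escape arguments simply spell out `fputcsv`'s long-standing defaults.

```php
<?php
// Hypothetical helper: build a path that sits next to this script,
// independent of the current working directory.
function pathBesideScript(string $name): string
{
    return __DIR__ . DIRECTORY_SEPARATOR . $name;
}

$outputPath = pathBesideScript('excel.csv');
$file = @fopen($outputPath, 'w');
if ($file === false) {
    echo "Could not open $outputPath for writing\n";
} else {
    fputcsv($file, array('Title', 'URL'), ',', '"', '\\'); // header row only
    fclose($file);
    echo "Wrote $outputPath\n";
}
```

Assuming the script is saved under a name like titles.php (a made-up name), it can be run from a command prompt with `php titles.php`, or placed on a web server and opened in a browser; either way the message shows where the CSV ended up.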
olly79 Posted June 14, 2010
Okay, I hope this helps to clarify matters further. I really need to get this up and running and would appreciate a little more help with how to execute what you are describing, as it's not making sense to me. You are perhaps assuming that I'm further along than I am, so I need simple steps, please.

1) You supplied me with the following code:

<?php
$file = fopen('excel.csv', 'w');
fputcsv($file, array('Title', 'URL')); // header row
$urls = array(); // fill this with your URLs
$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings from malformed HTML
foreach ($urls as $url) {
    if ($doc->loadHTMLFile($url)) {
        $xpath = new DOMXPath($doc);
        $node = $xpath->query('//title')->item(0); // first <title> node, if any
        $title = $node ? $node->nodeValue : ''; // empty when the page has no <title>
        fputcsv($file, array($title, $url));
    }
}
fclose($file);

What do I do with this? If I'm to save it in Excel, where does it go, i.e. in a macro? If so, how do I do that? Or are you suggesting that this file is saved as excel.csv and stored on my desktop?
2) All the URLs that I need to scrape for the <title> tag are in another Excel spreadsheet on my desktop.
3) You mention running the .php file. What is this and how do I do it? I'm not following.

Thanks again for your help. I need you to break down each step in simple terms, please.
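Since the URLs live in a separate spreadsheet, one possible route is to save that sheet from Excel as a CSV file and read its first column into the `$urls` array. The sketch below assumes the URLs sit in the first column and start with http:// or https://; the function name `readUrlsFromCsv` and the file name urls.csv are hypothetical.

```php
<?php
// Hypothetical helper: read the first column of a CSV file into a list of URLs.
function readUrlsFromCsv(string $path): array
{
    $urls = array();
    if (!is_readable($path)) {
        return $urls; // missing or unreadable file: return an empty list
    }
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return $urls;
    }
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        // fgetcsv() yields array(null) for a blank line, so guard the first cell.
        $cell = isset($row[0]) ? $row[0] : '';
        $cell = trim(preg_replace('/^\xEF\xBB\xBF/', '', $cell)); // drop a UTF-8 BOM if present
        if (preg_match('#^https?://#i', $cell)) {
            $urls[] = $cell; // header rows and empty cells are skipped
        }
    }
    fclose($handle);
    return $urls;
}

// Example: $urls = readUrlsFromCsv(__DIR__ . '/urls.csv');
```

The resulting array can replace the empty `$urls = array();` line in the supplied script.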
l4nc3r Posted June 14, 2010
Instead of having him do it for you, why don't you RTFM?
http://www.php.net/
http://www.w3schools.com/php/default.asp
http://spreadsheets.about.com/od/excel101/a/Excel_beg_guide.htm
Excellent answers, ignace. Of course, if you have money, I'd be glad to do it for you.
olly79 Posted June 14, 2010
Thanks for your unhelpful response. Much appreciated!!! If ignace doesn't want to assist further, I will leave it at that.