Miteshsach86 Posted October 7, 2010

Hi fellow developers,

I'm having a real problem at the moment. I'm trying to capture everything between the <body></body> tags using the following code, but it does not print anything:

    $lines = file("http://www.bbc.co.uk/");
    foreach ($lines as $line_num => $line) {
        $thecontent .= htmlspecialchars($line) . "<br />\n";
    }
    preg_match('/<body.*?>(.*?)<\/body >/', $thecontent, $htmltext);
    $moretext = $htmltext[1];
    echo $moretext;

When I place a print($thecontent); into the code, the entire HTML for the website does display, but I want to capture only the HTML between the body tags. I've tried everything but I just can't get this to work.

I would appreciate anyone's help, and I'd like to thank you in advance.

M

joel24 Posted October 7, 2010

Rather than having that foreach loop and individually adding each line to a variable, you can use file_get_contents(), which will put the contents into a single string. Something like this may work... I'm not all too skilled with regular expressions, though; someone else may help you with those.

    // haven't tested this yet, but good luck
    $pageHtml = file_get_contents("http://www.bbc.co.uk/");
    // get position of <body> tag
    $openingTag = stripos($pageHtml, "<body>");
    // position of </body> tag
    $closingTag = stripos($pageHtml, "</body>");
    // length of the body content: closing position minus the end of the opening tag (+6 for the characters in <body>)
    $length = $closingTag - ($openingTag + 6);
    // start just after <body> and take $length characters (PHP variable names are case-sensitive)
    $bodyHtml = substr($pageHtml, $openingTag + 6, $length);
    echo $bodyHtml;

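The position-based approach above searches for the literal string "<body>", which would miss an opening tag that carries attributes (for example <body class="home">). A minimal sketch of one way around that follows. The function name body_between_tags and the sample markup are invented for illustration and do not come from the thread. It assumes PHP 7.1 or later for the type declarations, and it does not handle a ">" inside an attribute value.

```php
<?php
// Hypothetical helper (not from the thread): slice out the body contents
// using string positions, allowing for attributes on the opening tag.
function body_between_tags(string $html): ?string
{
    $open = stripos($html, '<body');        // start of "<body", any letter case
    if ($open === false) {
        return null;                        // no opening tag found
    }

    $start = strpos($html, '>', $open);     // the ">" that ends the opening tag
    $close = stripos($html, '</body>');     // start of the closing tag
    if ($start === false || $close === false || $close <= $start) {
        return null;                        // malformed or missing closing tag
    }

    // Take everything after the opening tag's ">" up to "</body>".
    return substr($html, $start + 1, $close - $start - 1);
}

// Sample input invented for this example.
echo body_between_tags('<html><body id="x"><p>Hi</p></body></html>'), "\n";
```

Returning null when either tag is missing makes a failed fetch or unexpected markup easier to tell apart from an empty body.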
Miteshsach86 Posted October 7, 2010 (Author)

Hi Joel,

Thanks for your reply. I added those changes to my code, but unfortunately that displays nothing at all. I've also posted below another method which I tried, and that displays an empty Array as well. I really need to use preg_match for what I'm doing. The new code is:

    $thecontent = file_get_contents("http://www.bbc.co.uk/");
    preg_match('/<body.*?>(.*?)<\/body >/', $thecontent, $htmltext);
    $moretext = $htmltext[1];
    echo $moretext;

I've searched everywhere for this problem but can't seem to figure it out. Please help.

M

anups Posted October 7, 2010

    <?php
    $lines = file_get_contents("http://www.bbc.co.uk/");
    preg_match("~<body[^>]*>(.*?)</body>~si", $lines, $output);
    print_r($output);
    ?>

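To see how the pattern in the reply above differs from the one in the original question, both can be run against a small string without any network access. This is only a sketch: the sample markup is invented for illustration and is not the BBC page. It also shows why escaping the page with htmlspecialchars() before matching, as the first post does, leaves no literal "<body" for any pattern to find.

```php
<?php
// Sample markup invented for this illustration; it is not the BBC page.
$html = "<html>\n<BODY class=\"home\">\n<p>Hello</p>\n</body>\n</html>";

// Pattern from the original question: it is case-sensitive, "." does not
// cross newlines without the "s" modifier, and "</body >" needs a literal space.
$original = preg_match('/<body.*?>(.*?)<\/body >/', $html, $m1);

// Pattern from the reply above: [^>]* skips any attributes, "s" lets "."
// match newlines, and "i" makes the match case-insensitive.
$revised = preg_match('~<body[^>]*>(.*?)</body>~si', $html, $m2);

// After htmlspecialchars() every "<" becomes "&lt;", so neither pattern can match.
$escaped = preg_match('~<body[^>]*>(.*?)</body>~si', htmlspecialchars($html), $m3);

var_dump($original, $revised, $escaped); // int(0), int(1), int(0)
echo trim($m2[1]), "\n";                 // <p>Hello</p>
```

If the revised pattern still produces an empty array on a live page, it is worth checking whether the fetch returned any HTML at all before looking further at the regular expression.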
Miteshsach86 Posted October 7, 2010 (Author)

Hi Anup,

Thanks for your response... unfortunately I've tried that as well. It's not working and I'm getting the same problem: it shows an empty Array ( ).

joel24 Posted October 7, 2010

Maybe try cURL? (demo here)

And if you insist on using file() or file_get_contents(), this is written on PHP.net regarding the file() function (link):

    Tip: A URL can be used as a filename with this function if the fopen wrappers have been enabled. See fopen() for more details on how to specify the filename. See the List of Supported Protocols/Wrappers for links to information about what abilities the various wrappers have, notes on their usage, and information on any predefined variables they may provide.

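The two suggestions above can be combined in one small sketch: use file_get_contents() when the URL wrappers are enabled (the allow_url_fopen setting), and fall back to cURL otherwise. The function name fetch_page and the 10-second timeout are assumptions made for this example, not something taken from the thread or the linked demo. It assumes PHP 7.1 or later and returns null when nothing could be fetched.

```php
<?php
// Hypothetical helper (not from the thread): fetch a URL with
// file_get_contents() if URL wrappers are enabled, otherwise with cURL.
function fetch_page(string $url): ?string
{
    // file() and file_get_contents() only accept URLs when allow_url_fopen is on.
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        $html = @file_get_contents($url);   // "@" suppresses the warning on failure
        if ($html !== false) {
            return $html;
        }
    }

    // The cURL extension may not be installed; give up if it is missing.
    if (!function_exists('curl_init')) {
        return null;
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // assumed timeout, in seconds
    $html = curl_exec($ch);

    return $html === false ? null : $html;
}

// Example use (needs network access, so it is left commented out):
// $page = fetch_page("http://www.bbc.co.uk/");
// if ($page === null) { echo "fetch failed\n"; }
```

Checking for null before running preg_match() separates a failed download from a pattern that does not match.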
Miteshsach86 Posted October 8, 2010 (Author)

Thanks for all your help, Joel24! Much appreciated.