Regular Expression returning empty Array ( )

Miteshsach86 · October 7, 2010

Hi fellow developers,

I'm having a real problem at the moment. I'm trying to capture everything between the <body></body> tags using the following code, but it does not print anything:

$lines = file("http://www.bbc.co.uk/");

foreach ($lines as $line_num => $line) {

$thecontent .= htmlspecialchars($line) . "<br />\n";

}

preg_match('/<body.*?>(.*?)<\/body >/', $thecontent, $htmltext);

$moretext = $htmltext[1];

echo $moretext;

When I place a "print($thecontent);" into the code, the entire HTML for whatever website is fetched does display, but I want to capture only the HTML between the body tags. I've tried everything but I just can't get this to work. :shrug:

I would appreciate anyone's help and I'd like to thank you in advance.

M

joel24 · October 7, 2010

Rather than using that foreach loop to add each line to a variable individually, you can use file_get_contents(), which puts the contents into a single string.

Something like this may work... I'm not too skilled with regular expressions, though, so someone else may be able to help you there.

//haven't tested this yet, but good luck

$pageHtml = file_get_contents("http://www.bbc.co.uk/");

//get position of <body> tag (only matches a bare <body> with no attributes)
$openingTag = stripos($pageHtml, "<body>");

//position of </body> tag
$closingTag = stripos($pageHtml, "</body>");

//body content starts right after the opening tag (+6 for <body> tag characters)
$start = $openingTag + 6;

//length of body content is the closing tag position minus the start position
$length = $closingTag - $start;

//cut out everything between the two tags
$bodyHTML = substr($pageHtml, $start, $length);

echo $bodyHTML;

Miteshsach86 · October 7, 2010

Hi Joel,

Thanks for your reply. I added those changes to my code, but unfortunately that displays nothing at all. I've also posted another method which I tried, and that displays an empty Array as well. I really need to use preg_match for what I'm doing. The new code is:

$thecontent = file_get_contents("http://www.bbc.co.uk/");

preg_match('/<body.*?>(.*?)<\/body >/', $thecontent, $htmltext);

$moretext = $htmltext[1];

echo $moretext;

I've searched everywhere for this problem but can't seem to figure it out. Please help :confused:

M

anups · October 7, 2010


<?php
$lines = file_get_contents("http://www.bbc.co.uk/");
preg_match("~<body[^>]*>(.*?)</body>~si", $lines, $output);
print_r($output); 
?>
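
To see why this pattern behaves differently from the one in the first post, here is a minimal sketch that runs both against a small inline HTML string instead of a live page. The `$html` sample and the `extractBody` helper are made up for illustration and are not part of anyone's code in this thread. The original pattern has a stray space in `<\/body >`, has no `s` modifier (so `.` stops at newlines), and in the first post was applied after htmlspecialchars() had already turned every `<` into `&lt;`.

```php
<?php
// Hypothetical sample page, standing in for the fetched HTML.
$html = "<html>\n<BODY class=\"home\">\n<p>Hello</p>\n</BODY>\n</html>";

// Hypothetical helper: returns the captured body contents, or null if nothing matched.
function extractBody(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[1] : null;
}

// Pattern from the first post: case-sensitive, space in "</body >", no "s" modifier.
$original = '/<body.*?>(.*?)<\/body >/';

// Pattern from the reply above: "s" lets "." span newlines, "i" ignores case,
// and [^>]* skips any attributes on the opening tag.
$fixed = '~<body[^>]*>(.*?)</body>~si';

var_dump(extractBody($original, $html));                // NULL
var_dump(extractBody($fixed, $html));                   // the text between the body tags
var_dump(extractBody($fixed, htmlspecialchars($html))); // NULL: the tags are now &lt;body ...&gt;
```

If the second pattern still yields an empty array on the real page, the pattern is probably not the problem and the fetched string itself is worth inspecting.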

Miteshsach86 · October 7, 2010

Hi Anup,

Thanks for your response... unfortunately I've tried that as well... it's not working

I'm getting the same problem... shows empty Array ( ) :confused:

joel24 · October 7, 2010

maybe try cURL?

demo here
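
A rough sketch of what the cURL route might look like, assuming the cURL extension is installed. The `fetchWithCurl` helper name, the 10 second timeout, and the choice to follow redirects are assumptions made for illustration, not something taken from the linked demo.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the body, or null on failure.
function fetchWithCurl(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds (arbitrary choice)

    $response = curl_exec($ch);
    if ($response === false) {
        // Log the reason so a failed fetch is not mistaken for a regex problem.
        error_log('cURL error: ' . curl_error($ch));
        return null;
    }
    return $response;
}

$pageHtml = fetchWithCurl("http://www.bbc.co.uk/");

if ($pageHtml === null) {
    echo "Could not fetch the page\n";
} elseif (preg_match('~<body[^>]*>(.*?)</body>~si', $pageHtml, $output)) {
    echo $output[1];
} else {
    echo "No <body> element found\n";
}
```

Separating the "could not fetch" case from the "no match" case makes it easier to tell which step is actually producing the empty result.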

And if you insist on using file() or file_get_contents(), this is what PHP.net says about the file() function:

Tip

A URL can be used as a filename with this function if the fopen wrappers have been enabled. See fopen() for more details on how to specify the filename. See the List of Supported Protocols/Wrappers for links to information about what abilities the various wrappers have, notes on their usage, and information on any predefined variables they may provide.
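
One way to check whether the tip above applies is to look at the `allow_url_fopen` setting and at the return value of file_get_contents() before running any regular expression. This is only a sketch under the assumption that the fetch itself is what fails; the messages and variable names are made up for illustration.

```php
<?php
// allow_url_fopen controls whether file() and file_get_contents() may open URLs.
$wrappersEnabled = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);

if (!$wrappersEnabled) {
    echo "allow_url_fopen is off, so file_get_contents() cannot open URLs\n";
} else {
    // "@" suppresses the warning so the failure can be handled via the return value.
    $pageHtml = @file_get_contents("http://www.bbc.co.uk/");

    if ($pageHtml === false) {
        echo "The request failed, so there is nothing for preg_match to search\n";
    } else {
        echo strlen($pageHtml) . " bytes fetched\n";
    }
}
```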

Miteshsach86 · October 8, 2010

Thanks for all your help Joel24!

Much appreciated
