read contents of html <head> or <body>

snorky · December 3, 2009

I wrote (like 10,000,000 others) a script to recursively read and list the contents of directories. However, when I print the results to the screen I want to include the contents of the <title></title> block and/or the first nn words in the <body></body> (somewhat like the results from a search engine).

How do I read specific, limited content from an HTML file?

zq29 · December 3, 2009

Use preg_match() with a regular expression.

Psycho · December 3, 2009

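<?php

function getHTMLTitleAndBody($fileName, $maxChars = 30)
{
    //Read the HTML content; give up if the file cannot be read
    $content = file_get_contents($fileName);
    if ($content === false) {
        return null;
    }

    //Create the result object explicitly (assigning to an undefined $result fails in PHP 8)
    $result = new stdClass();
    $result->title = '';
    $result->body  = '';

    //Extract the title (non-greedy, so the match stops at the first </title>)
    if (preg_match("/<title[^>]*>(.*?)<\/title>/is", $content, $titleMatch)) {
        $result->title = trim($titleMatch[1]);
    }

    //Extract whole words from the body, up to $maxChars characters
    if (preg_match("/<body[^>]*>(.*?)<\/body>/is", $content, $bodyMatch)) {
        //Strip tags and collapse whitespace so line breaks in the source do not cut the excerpt short
        $text  = trim(preg_replace('/\s+/', ' ', strip_tags($bodyMatch[1])));
        //Keep the first wrapped line (array_shift() needs a variable, so index the array instead)
        $lines = explode("\n", wordwrap($text, $maxChars));
        $result->body = $lines[0];
    }

    return $result;
}

$file = "temp.htm";
$results = getHTMLTitleAndBody($file);

if ($results !== null) {
    echo "<b>Title:</b> {$results->title}<br><br>\n";
    echo "<b>Body:</b> {$results->body}";
}

?>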
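
The function above trims the body excerpt by character count, because wordwrap() works on characters. The original question asked for the first nn words, so here is a rough sketch of a word-based variant. The helper name firstBodyWords() and its $maxWords parameter are made up for illustration and are not part of any library. It still relies on regular expressions, so it will only cope with reasonably well-formed HTML.

```php
<?php

// Hypothetical helper (name and parameters are assumptions, not from the thread):
// return the first $maxWords words of the text inside <body>, with tags stripped.
function firstBodyWords($html, $maxWords = 20)
{
    // Fall back to the whole document when there is no <body> element.
    $body = preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $m) ? $m[1] : $html;

    // Drop script/style blocks first; strip_tags() would keep their text.
    $body = preg_replace('/<(script|style)\b[^>]*>.*?<\/\1>/is', ' ', $body);

    // Put a space before every tag so adjacent elements do not fuse into
    // one word, then strip the tags and decode entities such as &amp;.
    $text = strip_tags(str_replace('<', ' <', $body));
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Split on runs of whitespace and keep the first $maxWords words.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_slice($words, 0, $maxWords));
}

$html = '<html><head><title>Demo</title></head>'
      . '<body><h1>Hello</h1><p>This is a small test page.</p></body></html>';

echo firstBodyWords($html, 4), "\n"; // prints "Hello This is a"
```

Because the entities are decoded, run the result through htmlspecialchars() before echoing it into a page. For HTML that is too messy for regular expressions, PHP's DOMDocument::loadHTML() is probably the more robust route.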
