
read contents of html <head> or <body>


snorky


I wrote (like 10,000,000 others) a script to recursively read and list the contents of directories. However, when I print the results to the screen I want to include the contents of the <title></title> block and/or the first nn words in the <body></body> (somewhat like the results from a search engine).

 

How do I read specific, limited content from an HTML file?


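<?php

function getHTMLTitleAndBody($fileName)
{
    // Create the result object up front; assigning properties to an
    // undefined variable raises a warning or error in current PHP versions
    $result = new stdClass();
    $result->title = '';
    $result->body  = '';

    // Read the HTML content
    $content = file_get_contents($fileName);
    if ($content === false) {
        return $result;
    }

    // Extract the title (non-greedy, so the match stops at the first </title>)
    if (preg_match("/<title[^>]*>(.*?)<\/title>/is", $content, $titleMatch)) {
        $result->title = trim($titleMatch[1]);
    }

    // Extract text from the body, cut at a word boundary to roughly 30 characters
    if (preg_match("/<body[^>]*>(.*?)<\/body>/is", $content, $bodyMatch)) {
        // Collapse whitespace first so line breaks in the markup don't end the excerpt early
        $text  = trim(preg_replace('/\s+/', ' ', strip_tags($bodyMatch[1])));
        $lines = explode("\n", wordwrap($text, 30));
        $result->body = $lines[0];
    }

    return $result;
}

$file = "temp.htm";
$results = getHTMLTitleAndBody($file);

echo "<b>Title:</b> {$results->title}<br><br>\n";
echo "<b>Body:</b> {$results->body}";

?>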
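
The question asks for the first nn words, while the snippet above cuts the body by character count. A word-based variant could look something like the sketch below. It swaps the regular expressions for PHP's DOM extension (DOMDocument and DOMXPath), which tends to cope better with messy markup. The function name summarizeHtml and its $maxWords parameter are made up for illustration, it takes the HTML as a string rather than a file name, and character-encoding handling is not addressed.

```php
<?php

// Hypothetical helper (not from the original post): returns the <title> text
// and the first $maxWords words of the <body> text from an HTML string.
function summarizeHtml(string $html, int $maxWords = 20): array
{
    $summary = ['title' => '', 'body' => ''];
    if (trim($html) === '') {
        return $summary; // nothing to parse
    }

    $doc = new DOMDocument();
    // Real-world HTML is often malformed; collect parser warnings quietly
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titleNode = $doc->getElementsByTagName('title')->item(0);
    if ($titleNode !== null) {
        $summary['title'] = trim($titleNode->textContent);
    }

    // Collect the body's text nodes, skipping script and style contents
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//body//text()[not(ancestor::script or ancestor::style)]');
    $pieces = [];
    foreach ($nodes as $node) {
        $pieces[] = $node->nodeValue;
    }

    // Join with spaces so text from adjacent elements does not run together,
    // then split on any run of whitespace and keep the first $maxWords words
    $words = preg_split('/\s+/', implode(' ', $pieces), -1, PREG_SPLIT_NO_EMPTY);
    $summary['body'] = implode(' ', array_slice($words, 0, $maxWords));

    return $summary;
}

// Usage with an inline sample document
$summary = summarizeHtml(
    '<html><head><title>Demo page</title></head>'
    . '<body><h1>Hello</h1><p>one two three four five</p></body></html>',
    4
);
echo $summary['title'], "\n";
echo $summary['body'], "\n";
```

To use it with the directory listing, pass it file_get_contents($file) for each HTML file. One trade-off of joining text nodes with spaces: a word split across inline tags (for example part of it wrapped in <b>) gets counted as two words.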
