snorky Posted December 3, 2009
Like 10,000,000 others, I wrote a script to recursively read and list the contents of directories. When I print the results to the screen, however, I also want to show the contents of each file's <title></title> block and/or the first nn words of its <body></body>, somewhat like the results from a search engine. How do I read specific, limited content from an HTML file?
zq29 Posted December 3, 2009
Use preg_match() with a regular expression (RegEx).
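A minimal sketch of that suggestion, assuming the HTML is already in a string: the sample markup and the variable names below are made up for illustration. The pattern uses a non-greedy capture group so that matching stops at the first closing tag.

```php
<?php
// Hypothetical sample input; in practice this would come from file_get_contents().
$html = "<html><head><title>My Page</title></head><body><p>Hello world</p></body></html>";

// i = case-insensitive, s = let "." match newlines; (.*?) is non-greedy.
$title = '';
if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
    $title = trim($m[1]);
}

echo $title, "\n"; // prints "My Page"
```

A regular expression like this is usually adequate for well-behaved pages, but it is not a full HTML parser and may misfire on unusual markup.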
Psycho Posted December 3, 2009
<?php
function getHTMLTitleAndBody($fileName)
{
    // Create the result object explicitly; assigning properties to an undefined variable is an error on PHP 8
    $result = new stdClass();
    $result->title = '';
    $result->body = '';

    // Read the HTML content
    $content = file_get_contents($fileName);
    if ($content === false) {
        return $result;
    }

    // Extract the title (non-greedy, so matching stops at the first </title>)
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $content, $titleMatch)) {
        $result->title = trim($titleMatch[1]);
    }

    // Extract text from the body, up to about 30 characters, breaking on a word boundary
    if (preg_match('/<body[^>]*>(.*?)<\/body>/is', $content, $bodyMatch)) {
        // Strip tags first, then collapse whitespace so the first line is never empty
        $text = trim(preg_replace('/\s+/', ' ', strip_tags($bodyMatch[1])));
        // explode() returns a value, so take element 0 instead of passing it to array_shift() by reference
        $lines = explode("\n", wordwrap($text, 30));
        $result->body = $lines[0];
    }

    return $result;
}

$file = "temp.htm";
$results = getHTMLTitleAndBody($file);
echo "<b>Title:</b> {$results->title}<br><br>\n";
echo "<b>Body:</b> {$results->body}";
?>
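The function above cuts the body text by character count (wordwrap() at 30), whereas the original question asked for the first nn words. A possible variant is sketched below; firstWords() and its $limit parameter are hypothetical names introduced here, not part of the posted code. It assumes the body markup has already been extracted into a string.

```php
<?php
// Hypothetical helper: return the first $limit words of an HTML fragment as plain text.
function firstWords($html, $limit = 20)
{
    // Remove tags, decode entities such as &amp;, then collapse runs of whitespace
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    $text = trim(preg_replace('/\s+/', ' ', $text));
    if ($text === '') {
        return '';
    }

    $words = explode(' ', $text);
    if (count($words) <= $limit) {
        return $text; // short enough, nothing to cut
    }

    // Keep the first $limit words and mark the truncation
    return implode(' ', array_slice($words, 0, $limit)) . '...';
}

echo firstWords("<p>The quick <b>brown</b> fox jumps over the lazy dog</p>", 4), "\n";
// prints "The quick brown fox..."
```

Note that strip_tags() keeps the text inside <script> and <style> elements and joins words that were separated only by tags, so pages with inline scripts or tightly packed markup may need extra cleanup first.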