snorky Posted December 3, 2009
Like 10,000,000 others, I wrote a script that recursively reads and lists the contents of directories. When I print the results to the screen, I also want to include the contents of the <title></title> block and/or the first nn words of the <body></body>, somewhat like the results from a search engine. How do I read specific, limited content from an HTML file?
zq29 Posted December 3, 2009
Use preg_match() with a regular expression (RegEx).
Psycho Posted December 3, 2009
<?php
function getHTMLTitleAndBody($fileName)
{
    //Start with empty values so the caller always gets both properties
    $result = new stdClass();
    $result->title = '';
    $result->body  = '';

    //Read the HTML content
    $content = file_get_contents($fileName);
    if ($content === false) {
        return $result;
    }

    //Extract the title (non-greedy, so the match stops at the first </title>)
    if (preg_match("/<title[^>]*>(.*?)<\/title>/is", $content, $titleMatch)) {
        $result->title = trim($titleMatch[1]);
    }

    //Extract words from body up to 30 characters, breaking on a word boundary
    if (preg_match("/<body[^>]*>(.*?)<\/body>/is", $content, $bodyMatch)) {
        //Collapse whitespace so line breaks in the HTML source do not cut the text short
        $text  = trim(preg_replace('/\s+/', ' ', strip_tags($bodyMatch[1])));
        $lines = explode("\n", wordwrap($text, 30));
        $result->body = $lines[0];
    }

    return $result;
}

$file = "temp.htm";
$results = getHTMLTitleAndBody($file);

echo "<b>Title:</b> {$results->title}<br><br>\n";
echo "<b>Body:</b> {$results->body}";
?>
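The regex approach above limits the body by characters, while the original question asked for the first nn words. As an alternative, here is a minimal sketch that parses the HTML with PHP's DOM extension and counts words instead. It assumes the DOM extension is enabled; the function name getTitleAndSnippet and its $maxWords parameter are hypothetical and not from this thread. Treat it as a starting point rather than a definitive implementation.

```php
<?php
// Hypothetical helper (not from the thread): returns the <title> text and
// the first $maxWords words of the <body> text, using DOMDocument.
function getTitleAndSnippet(string $html, int $maxWords = 20): array
{
    $empty = ['title' => '', 'body' => ''];
    if (trim($html) === '') {
        return $empty; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is often invalid; buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return $empty;
    }

    $titleNode = $doc->getElementsByTagName('title')->item(0);
    $bodyNode  = $doc->getElementsByTagName('body')->item(0);

    $title = $titleNode ? trim($titleNode->textContent) : '';

    // Note: textContent also includes the text of <script> and <style> elements.
    $text = $bodyNode ? $bodyNode->textContent : '';

    // Split on runs of whitespace, drop empty pieces, keep the first $maxWords words.
    $words   = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $snippet = implode(' ', array_slice($words, 0, $maxWords));

    return ['title' => $title, 'body' => $snippet];
}

// Example call, assuming temp.htm exists and is readable:
// $result = getTitleAndSnippet((string) file_get_contents('temp.htm'), 25);
// echo "<b>Title:</b> {$result['title']}<br><br>\n";
// echo "<b>Body:</b> {$result['body']}";
```

A parser copes with attributes, odd spacing, and unclosed tags more gracefully than a regex, at the cost of loading the whole document. If the files contain non-ASCII text without a charset declaration, loadHTML() may need an encoding hint to decode them correctly.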