cordoprod Posted August 16, 2009

Hi!

I need to parse some HTML and I'm not quite sure how to do it. I use cURL to fetch the HTML; I already know how to do that and have all the code for it. But that gives me the source of the whole page, and I don't want all of it. Check out this source:

```html
<div>
  <div class="bransjenavn">
    <div class="sort">
      <form name="form" action="">
        <select onchange="MM_jumpMenu('parent',this,0)" name="jumpMenu">
          <option value="/gs/companyList.c?bc=0&q=elektronikk">Standard sortering</option>
          <option value="/gs/companyList.c?bc=0&q=elektronikk&sort=2">Sorter alfabetisk</option>
          <option value="/gs/companyList.c?bc=0&q=elektronikk&sort=4">Sorter etter omtaler</option>
        </select>
      </form>
      <div class="treffKart">
        <a href="/kart/#tab%3Dyellow%26autozoom%3Dtrue%26id%3Dc_Z001UNJL%26id%3Dc_Z0HPLR6L">Vis treff i kart</a>
      </div>
    </div>
    <h1>Treff i firmanavn: <span> 2 av 254 treff - <a href="/gs/companyList.c?bc=0&q=elektronikk"> Vis alle </a> </span> </h1>
  </div>
</div>
```

All I want from that code is "Treff i firmanavn" (Norwegian for "hits in company name"). Is it possible to strip out all the other code? Be aware that there will be more headings like this one, since "Treff i firmanavn" is a category, and that the snippet above is not the whole source code of the page.

Here is my function, which just outputs the whole page:

```php
function getCategory($q) {
    $url = "http://www.gulesider.no/gs/categoryList.c?q=$q";
    $html = curlGet($url);
    $start = strpos($html, "<h1>");
    $end = strpos($html, ":");
    $html = substr($html, $start, $end-$start);
    preg_match_all('/' . preg_quote($start, '/') . '([^\.)]+)'. preg_quote($end, '/').'/i', $html, $matches);
    return $matches[1];
}
```
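For reference, the function above has two problems worth noting. The second strpos() call searches from the beginning of the page, so it returns the first colon anywhere in the source (which can be in an http:// URL before the <h1>), not the one after the heading. And preg_quote() is given the numeric offsets, so the pattern looks for those numbers as literal text. Below is a minimal sketch of the same strpos()/substr() approach with both issues fixed, looping so that every category heading is collected. It assumes the page source is already in a string (for example from the poster's curlGet() helper, which is not shown) and that each heading is written exactly as <h1>, as in the sample; getCategoriesFromHtml() is a hypothetical name.

```php
<?php
// Hypothetical helper (not from the thread): collect the text between each
// "<h1>" and the first ":" that follows it inside the same heading.
// Assumes $html already holds the page source and that headings are written
// exactly as "<h1>" (no attributes), as in the sample HTML.
function getCategoriesFromHtml(string $html): array
{
    $categories = [];
    $offset = 0;
    while (($start = strpos($html, '<h1>', $offset)) !== false) {
        $start += strlen('<h1>');               // skip past the tag itself
        $colon = strpos($html, ':', $start);    // first colon AFTER this <h1>
        $close = strpos($html, '</h1>', $start);
        if ($colon === false || ($close !== false && $colon > $close)) {
            $offset = $start;                   // no colon in this heading; try the next one
            continue;
        }
        $categories[] = trim(substr($html, $start, $colon - $start));
        $offset = $colon;                       // resume searching after this category
    }
    return $categories;
}
```

Called on the sample HTML from the question, this returns ['Treff i firmanavn']. Passing an offset as the third argument to strpos() is what keeps each search anchored after the previous match.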
DEVILofDARKNESS Posted August 16, 2009

Maybe you should try splitting the code twice: first discard all the text up to <h1>, then discard all the text after </h1>.
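The split-twice idea can be sketched in PHP with explode(). This is a minimal illustration rather than code from the thread: categoriesBySplitting() is a hypothetical name, and it assumes each heading is written exactly as <h1>...</h1> with the category name before the first colon, as in the sample HTML. Splitting on <h1> without a limit yields one piece per heading, so several categories are handled in one pass.

```php
<?php
// Hypothetical helper sketching the "split twice" suggestion with explode().
// Assumption: headings are written exactly as "<h1>...</h1>" (no attributes)
// and the category name is the text before the first colon in the heading.
function categoriesBySplitting(string $html): array
{
    $categories = [];
    // Split 1: everything before the first "<h1>" ends up in $pieces[0].
    $pieces = explode('<h1>', $html);
    array_shift($pieces); // discard the text preceding the first <h1>
    foreach ($pieces as $piece) {
        // Split 2: keep only what comes before "</h1>".
        $heading = explode('</h1>', $piece, 2)[0];
        // The category name is the part before the colon, if there is one.
        $parts = explode(':', $heading, 2);
        if (count($parts) === 2) {
            $categories[] = trim($parts[0]);
        }
    }
    return $categories;
}
```

For the sample in the first post this gives ['Treff i firmanavn']; headings without a colon are simply skipped.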
cordoprod Posted August 16, 2009 (Author)

Could you please show me some code?
PatrickMc Posted August 31, 2009

> All I want from that code is "Treff i firmanavn".

Cordoprod, are you allowed to use anything in addition to PHP? If so, the following server-side biterscripting script will work.

```
# Get source into a str variable. Source may be a
# document on the internet, or in a local file. Use
# correct URL (starting with http:// ) or
# file path instead of "source" below.
var str source ; cat "source" > $source

# Extract and remove portion up to <h1...>.
stex -c -r "^<h1&\>^]" $source > null

# Extract and output portion up to colon.
stex "]^:^" $source
```
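In a PHP-only setup, the two stex steps above (skip past <h1...>, then take the text up to the colon) map fairly directly onto one regular expression. This is a rough sketch, not an exact translation of the biterscripting semantics: categoriesByRegex() is a hypothetical name, and the pattern assumes the category text contains no ":" or "<" characters. Unlike an exact "<h1>" match, it also accepts an <h1> tag that carries attributes.

```php
<?php
// Hypothetical helper: rough PHP counterpart to the two stex steps.
// Assumption: the category text contains no ':' or '<' characters.
function categoriesByRegex(string $html): array
{
    // <h1\b[^>]*>        the opening tag, with or without attributes
    // \s*([^:<]+?)\s*:   the category text, stopping at the first colon;
    //                    [^:<] keeps the match from running past the heading
    preg_match_all('/<h1\b[^>]*>\s*([^:<]+?)\s*:/i', $html, $matches);
    return $matches[1]; // capture group 1: one entry per matching heading
}
```

Regular expressions are brittle against real-world HTML, so if the markup varies, PHP's DOMDocument with DOMXPath (querying //h1) is the more robust route.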
This topic is now archived and is closed to further replies.