
Parse HTML code


cordoprod


Hi! I need to parse some HTML and I'm not quite sure how to do it.

 

I use cURL to fetch the HTML; that part works and I already have the code for it.

But that gives me the source of the whole page, which I don't want.

 

Check this source out:

<div>
    <div class="bransjenavn">
        <div class="sort">
            <form name="form" action="">
                <select onchange="MM_jumpMenu('parent',this,0)" name="jumpMenu">
                    <option value="/gs/companyList.c?bc=0&q=elektronikk">Standard sortering</option>
                    <option value="/gs/companyList.c?bc=0&q=elektronikk&sort=2">Sorter alfabetisk</option>
                    <option value="/gs/companyList.c?bc=0&q=elektronikk&sort=4">Sorter etter omtaler</option>
                </select>
            </form>
            <div class="treffKart">
                <a href="/kart/#tab%3Dyellow%26autozoom%3Dtrue%26id%3Dc_Z001UNJL%26id%3Dc_Z0HPLR6L">Vis treff i kart</a>
            </div>
        </div>
        <h1>Treff i firmanavn:
            <span>
                2 av 254 treff -
                <a href="/gs/companyList.c?bc=0&q=elektronikk">
                    Vis alle
                </a>
            </span>
        </h1>
    </div>
</div>

 

All I want from that code is "Treff i firmanavn".

Is it possible to remove all the other code? (Be aware that there will be more of these: "Treff i firmanavn" is one category among several. Also, this is not the whole source code of the page.)

 

Here is my function, which currently just outputs the whole page:

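function getCategory($q) {
    // curlGet() is the existing helper that returns the page source as a string.
    $url = "http://www.gulesider.no/gs/categoryList.c?q=" . urlencode($q);
    $html = curlGet($url);

    // Capture the text between each <h1> tag and the first colon after it.
    // (strpos() returns integer offsets, so they cannot be used as regex parts.)
    preg_match_all('/<h1[^>]*>\s*([^:<]+):/i', $html, $matches);

    // One trimmed category name per <h1> found, e.g. "Treff i firmanavn".
    return array_map('trim', $matches[1]);
}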
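
An alternative to string matching is to let PHP's DOM extension parse the markup and then read the text of each <h1> element. The following is only a minimal sketch: extractHeadings() is a hypothetical helper name, the sample markup is a trimmed copy of the snippet above, and the page is assumed to be UTF-8 encoded.

```php
<?php
// Hypothetical helper: returns the text before the first colon of every <h1>.
function extractHeadings(string $html): array
{
    $doc = new DOMDocument();

    // Real pages are rarely valid HTML; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Assumption: the input is UTF-8. The prefix tells libxml which encoding to use.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $headings = [];
    foreach ($doc->getElementsByTagName('h1') as $h1) {
        // textContent also contains the nested <span> text, so cut at the colon.
        $text = $h1->textContent;
        $colon = strpos($text, ':');
        if ($colon !== false) {
            $text = substr($text, 0, $colon);
        }
        $text = trim($text);
        if ($text !== '') {
            $headings[] = $text;
        }
    }
    return $headings;
}

$sample = '<div class="bransjenavn"><h1>Treff i firmanavn:
    <span>2 av 254 treff - <a href="/gs/companyList.c?bc=0&q=elektronikk">Vis alle</a></span>
</h1></div>';

print_r(extractHeadings($sample));
```

Because every <h1> is visited, this should return one entry per category when the page contains several of them.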


  • 2 weeks later...

 

All I want from that code is "Treff i firmanavn".

 

Cordoprod, are you allowed to use anything in addition to PHP? If so, the following server-side biterscripting script will work.

 

 

# Get source into a str variable. Source may be a
# document on the internet, or in a local file. Use
# correct URL (starting with http:// ) or
# file path instead of "source" below.
var str source ; cat "source" > $source

# Extract and remove portion up to <h1...>.
stex -c -r "^<h1&\>^]" $source > null

# Extract and output portion up to colon .
stex "]^:^" $source
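
For anyone who has to stay in PHP, the same two steps described in the script comments (skip everything up to the <h1> tag, then keep the text up to the colon) can be sketched with strpos() and substr(). This is an illustration only, not a translation of the script: firstHeadingBeforeColon() is a hypothetical name and it handles just the first <h1> in the input. The point to note is that the colon search starts after the tag, so colons earlier in the page are not matched.

```php
<?php
// Hypothetical helper: text between the first <h1 ...> tag and the next colon.
function firstHeadingBeforeColon(string $html): ?string
{
    // Step 1: find the opening <h1 ...> tag and the ">" that ends it.
    $open = stripos($html, '<h1');
    if ($open === false) {
        return null;
    }
    $tagEnd = strpos($html, '>', $open);
    if ($tagEnd === false) {
        return null;
    }

    // Step 2: find the first colon AFTER the tag, not the first colon in the page.
    $colon = strpos($html, ':', $tagEnd + 1);
    if ($colon === false) {
        return null;
    }

    return trim(substr($html, $tagEnd + 1, $colon - $tagEnd - 1));
}

// The link contains a colon before the heading; it is skipped on purpose.
$page = '<a href="http://www.gulesider.no/">Home</a>'
      . '<h1>Treff i firmanavn: <span>2 av 254 treff</span></h1>';

var_dump(firstHeadingBeforeColon($page));
```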

 

 

 
