
Count links on a page, sort them and grade them.


guymclarenza


This function is not working, and I also need some additional info from it. Assume $content is populated. I want to count outbound links (links going to another website) and internal links (those staying on the website but going to another page). I also want to list the links by outbound and internal, with link text and URL. It would also be nice if I could pass or fail the link text at the same time: if the link text is "click here" or something equally non-descriptive, it would be marked with a red cross, or with a green tick if it is good.

Why is this not working, and how do I make the necessary changes to the script to add the functionality?

function checkLinksForDescriptiveText($content) {
    // Example: Check links for descriptive link text
    preg_match_all('/<a [^>]*href=["\']([^"\']+)["\'][^>]*>([^<]+)<\/a>/i', $content, $links);

    $total_links = count($links[1]);
    $descriptive_links = 0;

    foreach ($links[2] as $linkText) {
        $linkText = strip_tags($linkText);
        if (strlen($linkText) > 0 && strlen($linkText) < 50) {
            $descriptive_links++;
        }
    }

    return [$total_links, $descriptive_links];
}



 

 


  • 2 weeks later...

This function parses the input using an HTML 4 parser. The parsing rules of HTML 5, which is what modern web browsers use, are different. Depending on the input this might result in a different DOM structure. Therefore this function cannot be safely used for sanitizing HTML.

As an example, some HTML elements will implicitly close a parent element when encountered. The rules for automatically closing parent elements differ between HTML 4 and HTML 5 and thus the resulting DOM structure that DOMDocument sees might be different from the DOM structure a web browser sees, possibly allowing an attacker to break the resulting HTML.

 

Seems like that is not a fantastic option


On 11/20/2023 at 5:40 AM, guymclarenza said:

This function parses the input using an HTML 4 parser. The parsing rules of HTML 5, which is what modern web browsers use, are different. Depending on the input this might result in a different DOM structure. Therefore this function cannot be safely used for sanitizing HTML.

 

What function are you talking about?
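
If the function in question is DOMDocument::loadHTML (an assumption, the post does not say), note that the quoted warning is about sanitizing HTML. Counting and listing links is a different job, so a DOM-based approach may still be workable here. One likely reason the regex in the first post misses links is that `([^<]+)` cannot match link text containing nested tags such as `<strong>` or `<img>`, and the length check does not really test whether the text is descriptive. Below is a rough sketch of the requested behaviour, not a definitive implementation. The names analyseLinks(), normaliseHost(), the $siteHost parameter and the $generic phrase list are all made up for illustration, and treating "www." as the same site is an assumption.

```php
<?php
// Sketch only: analyseLinks(), normaliseHost(), $siteHost and $generic are hypothetical names.

function normaliseHost(string $host): string
{
    // Assumption: "www.example.com" and "example.com" count as the same site.
    return preg_replace('/^www\./', '', strtolower($host));
}

function analyseLinks(string $content, string $siteHost): array
{
    $result = ['internal' => [], 'outbound' => []];
    if (trim($content) === '') {
        return $result; // nothing to parse
    }

    // Assumed list of non-descriptive phrases; extend to taste.
    $generic = ['click here', 'here', 'read more', 'more', 'link'];

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep markup warnings quiet
    $dom->loadHTML('<?xml encoding="UTF-8">' . $content); // prefix hints UTF-8 input
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue; // no target, or a jump within the same page
        }
        $scheme = parse_url($href, PHP_URL_SCHEME);
        if (is_string($scheme) && !in_array(strtolower($scheme), ['http', 'https'], true)) {
            continue; // mailto:, tel:, javascript: and similar
        }
        $host = parse_url($href, PHP_URL_HOST);
        // A URL without a host is relative, so it is treated as staying on the site.
        $isInternal = !is_string($host)
            || normaliseHost($host) === normaliseHost($siteHost);

        // textContent includes text inside nested tags such as <strong>.
        $text = trim(preg_replace('/\s+/', ' ', $a->textContent));
        $ok = $text !== '' && !in_array(strtolower($text), $generic, true);

        $result[$isInternal ? 'internal' : 'outbound'][] =
            ['url' => $href, 'text' => $text, 'ok' => $ok];
    }
    return $result;
}

// Usage with made-up sample content and host.
$content = '<p><a href="/about">About our team</a> '
         . '<a href="https://other.example/x">click here</a></p>';
$report = analyseLinks($content, 'example.com');
foreach (['internal', 'outbound'] as $type) {
    printf("%s links: %d\n", $type, count($report[$type]));
    foreach ($report[$type] as $link) {
        printf("  %s %s (%s)\n", $link['ok'] ? '✔' : '✘', $link['text'], $link['url']);
    }
}
```

The tick and cross here are plain characters; in a page you would swap them for whatever red cross / green tick markup you use. The pass/fail rule is deliberately simple (empty text or an exact match against the phrase list fails), so adjust it to your own idea of "descriptive".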