You have bigger problems than the conversion:

  • Your pattern was always wrong, because the [[:graph:]]+ part is greedy and also matches quotes and brackets, so it may consume the next anchor or link as well. If the code seemingly “worked” in the past, that's only because the graph class doesn't include whitespace, so a space between two tags happened to stop the match. That is pure luck. Try input without whitespace to see the pattern fail miserably.
  • There are no security measures whatsoever. If the input comes from users or can be manipulated, you're wide open to cross-site scripting attacks. Anchor elements are particularly nasty in this regard, because simple HTML-escaping isn't enough; people can still inject code with javascript: or data: URLs.
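
To make both points concrete, here is a minimal sketch. The input string and the stricter pattern in $strict are my own illustrations, not something from the original code, and the stricter pattern only fixes the matching problem, not the security problem.

```php
<?php

// Pattern for [anchor="..."] as quoted in this thread.
$greedy = '/\[anchor="([[:graph:]]+)"\]/';

// Two tags back to back, no whitespace in between.
$input = '[anchor="top"][anchor="bottom"]';

// [[:graph:]] also matches `"` and `]`, so the group runs to the last `"]`.
$greedyResult = preg_replace($greedy, '<a name="\\1"></a>', $input);

// Hypothetical stricter pattern: the name may not contain quotes or whitespace.
$strict = '/\[anchor="([^"\s]+)"\]/';
$strictResult = preg_replace($strict, '<a name="\\1"></a>', $input);

// HTML-escaping leaves a javascript: URL untouched, so it is no defense here.
$escaped = htmlspecialchars('javascript:alert(1)', ENT_QUOTES, 'UTF-8');

echo $greedyResult, "\n";  // <a name="top"][anchor="bottom"></a>
echo $strictResult, "\n";  // <a name="top"></a><a name="bottom"></a>
echo $escaped, "\n";       // javascript:alert(1)
```

The first output line shows the two tags merged into one broken element; the last shows why escaping alone does not make a link target safe.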

Inventing your own language and trying to parse it with regex gymnastics is rarely a good idea. Use a standard markup language like Markdown and a proper parser.

 

Parsedown looks OK. Unfortunately, its authors haven't thought about unsafe URLs either, so you need to extend the class a bit:

<?php

require_once '/path/to/parsedown/or/autoloader';



class SafeMarkdown extends Parsedown
{
    protected $allowedURLSchemes;

    public function __construct($allowedURLSchemes = ['http', 'https', 'mailto'])
    {
        // disable embedded HTML markup by default
        $this->setMarkupEscaped(true);

        // only accept specific URL schemes to prevent XSS attacks through javascript: or data: URIs
        $this->allowedURLSchemes = $allowedURLSchemes;
    }

    protected function inlineLink($excerpt)
    {
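        $linkData = parent::inlineLink($excerpt);

        // the parent returns null when the excerpt is not a valid link
        if ($linkData === null)
        {
            return null;
        }

        // only allow specific URL schemes
        $href = $linkData['element']['attributes']['href'];
        $url = parse_url($href);

        if ($url === false)
        {
            // report the raw href; $url is false here and would print as an empty string
            throw new RuntimeException('Malformed URL while parsing link: '.$href);
        }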

        if (isset($url['scheme']) && !in_array(strtolower($url['scheme']), $this->allowedURLSchemes, true))
        {
            throw new RuntimeException('Unexpected URL scheme while parsing link: '.$url['scheme'].' (allowed: '.implode(', ', $this->allowedURLSchemes).')');
        }

        return $linkData;
    }
}

Usage:

<?php

require_once '/path/to/class/or/autoloader';



$markdownParser = new SafeMarkdown();

echo $markdownParser->text("[I'm an inline-style link](https://www.google.com)");

// test unsafe URL scheme (this call throws a RuntimeException)
echo $markdownParser->text("[I'm an inline-style link](javascript:alert('XSS'))");
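
If you want to try the scheme check without installing Parsedown, the same logic can be pulled into a standalone function. This is only a sketch: hasAllowedScheme() is a hypothetical helper name, it returns a boolean instead of throwing, and it does not try to cover every browser quirk around URL parsing.

```php
<?php

/**
 * Hypothetical helper mirroring the check in SafeMarkdown::inlineLink().
 * Returns true for scheme-less (relative) URLs and for allowed schemes.
 */
function hasAllowedScheme(string $href, array $allowed = ['http', 'https', 'mailto']): bool
{
    $url = parse_url($href);

    // parse_url() returns false for seriously malformed URLs
    if ($url === false)
    {
        return false;
    }

    // no scheme means a relative URL, which the class above also lets through
    if (!isset($url['scheme']))
    {
        return true;
    }

    // compare case-insensitively, since "JavaScript:" works just as well in browsers
    return in_array(strtolower($url['scheme']), $allowed, true);
}

var_dump(hasAllowedScheme('https://www.google.com'));   // bool(true)
var_dump(hasAllowedScheme("javascript:alert('XSS')"));  // bool(false)
var_dump(hasAllowedScheme('JavaScript:alert(1)'));      // bool(false)
var_dump(hasAllowedScheme('/relative/path'));           // bool(true)
```

Whether a rejected URL should raise an exception, as in the class above, or just be dropped from the output is a design choice; throwing is the louder option and makes bad input hard to miss.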

What have you tried so far? Do you know what those regexes are doing now? Have you learned about PCRE? Have you checked the documentation?

 

Is the following conversion correct?

 

$output = preg_replace('/\[anchor="([[:graph:]]+)"\]/', '<a name="\\1"></a>', $output);

 

$output = preg_replace('/\[link="([[:graph:]]+)"\]/', '<a href="\\1">', $output);
