OK, so I've got a difficult regex problem (well, difficult for me anyway).

 

I'm trying to parse some HTML from a forum.

 

The pattern successfully pulls out usernames and post counts, but I also need it to pull out the username of the person who edited a post, if it exists.

 

Is there a way to tell regex (preg) to "keep going over any text and newlines until we see the Edited By name, or stop if we find our original pattern first"?

 

Here is the pattern that successfully pulls out usernames and post counts:

&which=boards">([a-zA-Z-_]*)<\/a> \s*.*?alt="(\d*) posts"

 

Now here's what I tried, to make it also find the block of HTML containing the Edited By text, which may or may not exist:

&which=boards">([a-zA-Z-_]*)<\/a> \s*.*?alt="(\d*) posts"[.*\s*]*(?:<span(?:.*)>Edited By:<\/span>\s*<a(?:.*)>([a-zA-Z-_]*)<\/a>)?

 

It never matches the Edited By text. Any ideas?

 

Note that the number of lines between where it finds the post count and the Edited By text can change.

 

Huge kudos to anyone who can help me
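
A note on why the second pattern probably never reaches the Edited By block: inside square brackets, `.` and `*` are literal characters, so `[.*\s*]*` only matches runs of dots, asterisks and whitespace, not "any text". It stops at the first letter after `posts"`, and the optional group then matches nothing. Also, without the `s` modifier, `.` does not cross newlines. A minimal sketch of both points (the test strings are made up for illustration, not taken from the real page):

```php
<?php
// Inside [...] the dot and star are literal, so this class accepts only
// dots, asterisks and whitespace.
$class = '/^[.*\s*]*$/';

$onlyListedChars = preg_match($class, ".*. \n*");     // nothing but listed characters
$attributeText   = preg_match($class, ' border="0"'); // contains letters and quotes

// Hypothetical gap between the post count and the Edited By text.
$gap = "posts\" border=\"0\">\n<br/>\nEdited By:";

$withoutS = preg_match('/posts".*?Edited By:/', $gap);  // . stops at newlines
$withS    = preg_match('/posts".*?Edited By:/s', $gap); // s lets . match newlines

printf("class on dots/stars/space: %d\n", $onlyListedChars);
printf("class on attribute text:   %d\n", $attributeText);
printf("lazy dot without s:        %d\n", $withoutS);
printf("lazy dot with s:           %d\n", $withS);
```

So a lazy `.*?` (or `.+?`) combined with the `s` modifier is the usual way to say "skip over anything, including line breaks".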

[attachment deleted by admin]

Try:

 

'~&which=boards">([a-z_-]*)</a> .+?alt="([0-9]+) posts"(?:.+?<span[^>]*>Edited By:</span>\s*<[^>]+>([a-z_-]*)</a>)?~is'

 

Probably not the most elegant way to do it, but I hope it at least works ^^

 

Edit: Actually, I'm afraid my pattern captures the wrong "Edited By" user name, when there's no edit. Ugh. Bed time.
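
One possible way around that wrong-capture problem is to guard the gap with a negative lookahead, so the lazy match is not allowed to run past the start of the next post's user link. This is only a sketch: the sample markup, the `?act=profile` query string and the names Alice, Carol and Mod_Bob are made up, and the pattern drops the literal space after `</a>` so it does not depend on trailing whitespace.

```php
<?php
// Hypothetical markup, modeled on the forum HTML described in this thread.
// Alice's post is unedited; Carol's post was edited by Mod_Bob.
$html = <<<'EOD'
<a href="?act=profile&which=boards">Alice</a>
<img alt="12 posts" border="0">
<br/>
<a href="?act=profile&which=boards">Carol</a>
<img alt="34 posts" border="0">
<br/>
<span style="font-weight:bold;">Edited By:</span>
<a href="?act=profile&which=boards">Mod_Bob</a>
EOD;

$head = '&which=boards">([a-z_-]*)</a>.+?alt="([0-9]+) posts"';
$edit = '<span[^>]*>Edited By:</span>\s*<a[^>]*>([a-z_-]*)</a>';

// Unguarded gap: .+? may run past the next post's user link.
$naive = '~' . $head . '(?:.+?' . $edit . ')?~is';

// Guarded gap: no consumed character may begin the next user link.
$guarded = '~' . $head . '(?:(?:(?!&which=boards">).)+?' . $edit . ')?~is';

preg_match_all($naive, $html, $naiveMatches, PREG_SET_ORDER);
preg_match_all($guarded, $html, $guardedMatches, PREG_SET_ORDER);

foreach ($guardedMatches as $m) {
    $editor = isset($m[3]) ? ", edited by {$m[3]}" : '';
    echo $m[1], ' (', $m[2], ' posts)', $editor, "\n";
}
```

On this sample the unguarded pattern returns a single match that credits Mod_Bob's edit to Alice and swallows Carol's post, while the guarded one returns Alice with no editor and Carol edited by Mod_Bob. The guard assumes that `&which=boards">` only ever appears in user links, which would need checking against the real page.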

Hmmm, is this along the lines of what you are looking for?

 

Example:

$html = <<<EOD
<a href="&which=boards">RandomUsername</a> 
<span><a alt="16609 posts" border="0" hspace="0" vspace="0" align="absmiddle"></span>
<br/>

<span style="font-weight:bold;">Edited By:</span>
<a href="&which=boards">RandomUsername</a>

<a href="&which=boards">AnotherRandomUser</a> 
<span><a alt="9719 posts" border="0" hspace="0" vspace="0" align="absmiddle"></span>
<br/>

<a href="&which=boards">SomeGuy</a> 
<span><a alt="16609 posts" border="0" hspace="0" vspace="0" align="absmiddle"></span>
<br/>
EOD;

preg_match_all('#href="&which=boards">(.+?)</a> \R+<span><a alt="(\d+) posts[^>]+></span>\R+<br/>(?:\R+<span.*?>Edited By:</span>\R+<a href="&which=boards">(.+?)</a>)?#si', $html, $matches, PREG_SET_ORDER);
$total = count($matches);
for ($a = 0 ; $a < $total ; $a++) {
    unset($matches[$a][0]); // get rid of index 0 (the full text matched for each set)
}

echo '<pre>'.print_r($matches, true);

 

Output:

Array
(
    [0] => Array
        (
            [1] => RandomUsername
            [2] => 16609
            [3] => RandomUsername
        )

    [1] => Array
        (
            [1] => AnotherRandomUser
            [2] => 9719
        )

    [2] => Array
        (
            [1] => SomeGuy
            [2] => 16609
        )

)
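
If the numeric indexes get awkward to work with, the rows could also be mapped to keyed records, which avoids the `unset()` loop and makes the optional editor explicit. A small sketch, assuming rows in the shape that `preg_match_all()` returns with `PREG_SET_ORDER`; the `$rows` data and the key names `user`, `posts` and `edited_by` are made up for illustration:

```php
<?php
// Hypothetical rows: index 0 is the full match, index 3 exists only when
// the optional Edited By group took part in the match.
$rows = [
    ['<full match>', 'RandomUsername', '16609', 'RandomUsername'],
    ['<full match>', 'AnotherRandomUser', '9719'],
];

$posts = array_map(function (array $m) {
    return [
        'user'      => $m[1],
        'posts'     => (int) $m[2],
        'edited_by' => isset($m[3]) ? $m[3] : null, // null when nobody edited
    ];
}, $rows);

print_r($posts);
```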

 

In either case, for future reference, Prismatic: instead of including screenshots, please copy and paste the relevant portion of the sample code into your post. That saves others from having to retype the sample into an IDE to test things out.
