
preg match all question


blacknight


I'm working on a site for a WoW guild and trying to get some info from a page.

 

<div class="s_ach_stat">10<img src="../images/achievements/tiny_shield.gif">
        	(01-13-2009)
        </div>
<span>Explore Storm Peaks</span><span class="achv_desc">Explore Storm Peaks, revealing the covered areas of the world map.</span>
</div>

is exactly how it appears in the page source. My problem is that the () in the statement are throwing it off.

I can get

<span>Explore Storm Peaks</span><span class="achv_desc">Explore Storm Peaks, revealing the covered areas of the world map.</span>

to parse out to an array

 

but (01-13-2009) still causes me some issues. Any ideas, guys?


Well, you don't explain what kind of data you want stored in the array.

Show us an example (based off that code chunk you provided) what you want out of it.. perhaps even showing what you have coded so far can also shed some light on the matter.

 

From the regex rules page:

 

So you got a problem with regex? Great, here's some guidelines on asking:

1. Describe your problem or background that is important to it.

2. Give sample data (the input data, the haystack...etc..)

3. Give the expected output/matches from your sample data.

4. Give the actual output that you have if you've attempted something already.

5. Provide code if necessary, if your problem concerns it.

6. We assume it's PHP regex related, as this is phpfreaks, but you still need to specify (if not obvious) whether you are talking about the POSIX (ereg) or PCRE (preg) flavor.

7. Be patient/grateful and don't demand things. Regex questions may take longer than expected to be answered, they're tougher sometimes.


OK, I am pulling

 

<div class="s_ach_stat">10<img src="../images/achievements/tiny_shield.gif">
        (01-13-2009)
        </div>
<span>Explore Storm Peaks</span><span class="achv_desc">Explore Storm Peaks, revealing the covered areas of the world map.</span>

 

 

from a web page. In this example I want to pull "10", "01-13-2009", "Explore Storm Peaks" and "Explore Storm Peaks, revealing the covered areas of the world map." from the text

 

using

 

preg_match_all('|<span>(.+?)</span><span class="achv_desc">(.+?)</span>|', $text, $match3);

 

I can get

Array
(
    [0] => Array
        (
            [0] => Explore Howling FjordExplore Howling Fjord, revealing the covered areas of the world map.
            [1] => Explore Borean TundraExplore Borean Tundra, revealing the covered areas of the world map.
            [2] => Explore DragonblightExplore Dragonblight, revealing the covered areas of the world map.
            [3] => Explore Storm PeaksExplore Storm Peaks, revealing the covered areas of the world map.
            [4] => The Green Hills of StranglethornComplete all of Hemet Nesingwary quests in Stranglethorn Vale up to and including The Green Hills of Stranglethorn and Big Game Hunter.
        )

    [1] => Array
        (
            [0] => Explore Howling Fjord
            [1] => Explore Borean Tundra
            [2] => Explore Dragonblight
            [3] => Explore Storm Peaks
            [4] => The Green Hills of Stranglethorn
        )

    [2] => Array
        (
            [0] => Explore Howling Fjord, revealing the covered areas of the world map.
            [1] => Explore Borean Tundra, revealing the covered areas of the world map.
            [2] => Explore Dragonblight, revealing the covered areas of the world map.
            [3] => Explore Storm Peaks, revealing the covered areas of the world map.
            [4] => Complete all of Hemet Nesingwary quests in Stranglethorn Vale up to and including The Green Hills of Stranglethorn and Big Game Hunter.
        )

)

 

but I can't get the "10" or the "01-13-2009". One issue is the () in the statement. I hope this is clearer.


Well, here is what I came up with (not sure if this is what you are looking for):

 

$str = '<div class="s_ach_stat">10<img src="../images/achievements/tiny_shield.gif">
              (01-13-2009)
           </div>
<span>Explore Storm Peaks</span><span class="achv_desc">Explore Storm Peaks, revealing the covered areas of the world map.</span>';
$str = preg_replace('#\s{2,}#', '', $str);
$arr = preg_split('#<[^>]+>#', $str, -1, PREG_SPLIT_NO_EMPTY);
for($i = 0, $total = count($arr); $i < $total; $i++){
   $arr[$i] = trim($arr[$i], '()');
}
echo '<pre>'.print_r($arr, true);

 

Output:

Array
(
    [0] => 10
    [1] => 01-13-2009
    [2] => Explore Storm Peaks
    [3] => Explore Storm Peaks, revealing the covered areas of the world map.
)


This is very close to what I need. It is grabbing the right info from the statement, but there are 5 statements like this and the info is never the same.

I used

preg_match_all('|class="s_ach_stat">(?<points>\d+)|', $text, $match4);

echo '<pre>';
print_r($match4);

to get the points, but the date, which is in brackets, is still giving me issues.
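
One way to pick up the bracketed date is to escape the parentheses, so PCRE treats them as literal characters instead of a capture group. A minimal sketch, assuming every block follows the layout in the sample; the second block's values and the variable names ($text, $matches) are made up for illustration:

```php
<?php
// Sample input: the first block is from the thread, the second is a made-up
// block added only to show that several blocks are matched.
$text = <<<'HTML'
<div class="s_ach_stat">10<img src="../images/achievements/tiny_shield.gif">
        (01-13-2009)
        </div>
<div class="s_ach_stat">25<img src="../images/achievements/tiny_shield.gif">
        (02-01-2009)
        </div>
HTML;

// \( and \) match literal parentheses; the unescaped (?<name>...) groups capture.
$pattern = '~class="s_ach_stat">(?<points>\d+)<img[^>]*>\s*\((?<date>\d{2}-\d{2}-\d{4})\)~';
preg_match_all($pattern, $text, $matches);

print_r($matches['points']); // points, in page order
print_r($matches['date']);   // dates, at the same indexes as the points
```

Because both named groups come from the same match, index 0 of the points array and index 0 of the date array always belong to the same block.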


What was wrong with nrg's solution? Below is my simplification: the expression is condensed and the looping is by reference, making the appearance a little more kind.

 

<pre>
<?php
$str = <<<STR
<div class="s_ach_stat">10<img src="../images/achievements/tiny_shield.gif">
              (01-13-2009)
           </div>
<span>Explore Storm Peaks</span><span class="achv_desc">Explore Storm Peaks, revealing the covered areas of the world map.</span>
STR;
$pieces = preg_split('#\s*<[^>]*>\s*#', $str, -1, PREG_SPLIT_NO_EMPTY);
foreach ($pieces as &$piece) {
	$piece = trim($piece, '()');
}
print_r($pieces);
?>
</pre>


 

Yep, even better [again! lol]

 

As for the pattern, this one is indeed simpler (I really tend to overthink things sometimes).

I realized afterwards about using a foreach and passing the value by reference instead of the for loop.. I even tried out:

$pieces = array_map(create_function('$val', 'return(trim($val, "()"));'), $pieces);

But in a timed test, the foreach with values passed by reference was the fastest... D'oh!

 

So basically, to the OP: Effigy's method is the fastest / sleekest. Go with that one. Not sure why you're so hung up on this bracket stuff. Sometimes the solution is not pure regex, but a combination of regex and other functionality...
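
For anyone trying this on a newer PHP: create_function() was deprecated in PHP 7.2 and removed in PHP 8.0, so the array_map variant needs an anonymous function instead. Also, a foreach by reference leaves $piece bound to the last element after the loop, so it is safer to unset it. A small sketch (the sample array stands in for the split result before trimming):

```php
<?php
$pieces = ['10', '(01-13-2009)', 'Explore Storm Peaks'];

// Variant 1: array_map with an anonymous function (replaces create_function).
$mapped = array_map(function ($val) {
    return trim($val, '()');
}, $pieces);

// Variant 2: foreach by reference, modifying the array in place.
foreach ($pieces as &$piece) {
    $piece = trim($piece, '()');
}
unset($piece); // break the reference so later writes to $piece don't touch the array

print_r($mapped);
print_r($pieces);
```

Both variants end up with the same values; which one is faster may well depend on the PHP version and the size of the array.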


On second thought...

 

<pre>
<?php
   $str = <<<STR
<div class="s_ach_stat">10<img src="../images/achievements/tiny_shield.gif">
              (01-13-2009)
           </div>
<span>Explore Storm Peaks</span><span class="achv_desc">Explore Storm Peaks, revealing the covered areas of the world map.</span>
STR;
   $pieces = preg_split('#[()]|\s*<[^>]*>\s*#', $str, -1, PREG_SPLIT_NO_EMPTY);
   print_r($pieces);
?>
</pre>


Actually Effigy, would you believe your previous solution is faster?

 

$str = <<<STR
<div class="s_ach_stat">10<img src="../images/achievements/tiny_shield.gif">
              (01-13-2009)
           </div>
<span>Explore Storm Peaks</span><span class="achv_desc">Explore Storm Peaks, revealing the covered areas of the world map.</span>
STR;
$loop = 100;

$time_start = microtime(true);
for($i = 0; $i < $loop; $i++){
   $pieces = preg_split('#[()]|\s*<[^>]*>\s*#', $str, -1, PREG_SPLIT_NO_EMPTY);
}
$time_end = microtime(true);
$elapsed_time = round($time_end-$time_start, 4);
echo '<pre>'.print_r($pieces, true);
echo 'Using: #[()]|\s*<[^>]*>\s*#<br />Time: ' . $elapsed_time . '<br />';

$time_start = microtime(true);
for($i = 0; $i < $loop; $i++){
   $pieces = preg_split('#\s*<[^>]*>\s*#', $str, -1, PREG_SPLIT_NO_EMPTY);
   foreach ($pieces as &$piece) {
      $piece = trim($piece, '()');
   }
}
$time_end = microtime(true);
$elapsed_time = round($time_end-$time_start, 4);
echo '<pre>'.print_r($pieces, true);
echo 'Using: #\s*<[^>]*>\s*# with foreach and values by reference<br />Time: ' . $elapsed_time . '<br />';

 

output:

Array
(
    [0] => 10
    [1] => 01-13-2009
    [2] => Explore Storm Peaks
    [3] => Explore Storm Peaks, revealing the covered areas of the world map.
)
Using: #[()]|\s*<[^>]*>\s*#
Time: 0.0092

Array
(
    [0] => 10
    [1] => 01-13-2009
    [2] => Explore Storm Peaks
    [3] => Explore Storm Peaks, revealing the covered areas of the world map.
)
Using: #\s*<[^>]*>\s*# with foreach and values by reference
Time: 0.0068

 

I looped it 100 times, and even with the foreach trim functionality included within the loop, it's a quicker solution. Taking the foreach out of the timed loop made that solution faster still.

 

Granted, we don't know how much code the OP has to sift through, and perhaps as the array list gets larger, maybe your newest solution would start to close the gap?

 

I may be splitting hairs here.. but I think that since trim is faster than regex, I would personally side with your previous solution (but perhaps in reality, the speed might be negligible and thus might not matter much [if at all]).


try

<?php
$str = '<div class="s_ach_stat">10<img src="../images/achievements/tiny_shield.gif">
              (01-13-2009)
           </div>
<span>Explore Storm Peaks</span><span class="achv_desc">Explore Storm Peaks, revealing the covered areas of the world map.</span>';
$pattern = '/<div class="s_ach_stat">(\d+)<img[^>]+>\s*\(([^\)]*)\)\s*<\/div>\s+<span>([^<]*)<\/span><span[^>]*>([^<]*)</s';
preg_match($pattern, $str, $arr);
echo '<pre>',print_r($arr, true),'</pre>';
?>
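
Since the page has several of these blocks, the same idea can be run with preg_match_all and PREG_SET_ORDER so that each achievement ends up as one row. A sketch only: the second block in $str is invented test data, and the group names (points, date, title, desc) are just labels chosen for this example:

```php
<?php
$str = <<<'HTML'
<div class="s_ach_stat">10<img src="../images/achievements/tiny_shield.gif">
              (01-13-2009)
           </div>
<span>Explore Storm Peaks</span><span class="achv_desc">Explore Storm Peaks, revealing the covered areas of the world map.</span>
<div class="s_ach_stat">10<img src="../images/achievements/tiny_shield.gif">
              (01-20-2009)
           </div>
<span>Explore Dragonblight</span><span class="achv_desc">Explore Dragonblight, revealing the covered areas of the world map.</span>
HTML;

// Same structure as the pattern above, with ~ as delimiter and named groups.
$pattern = '~<div class="s_ach_stat">(?<points>\d+)<img[^>]+>\s*\((?<date>[^)]*)\)\s*</div>\s*'
         . '<span>(?<title>[^<]*)</span><span[^>]*>(?<desc>[^<]*)<~';

// PREG_SET_ORDER groups the captures per match instead of per group.
preg_match_all($pattern, $str, $rows, PREG_SET_ORDER);

foreach ($rows as $row) {
    echo $row['points'], ' | ', $row['date'], ' | ', $row['title'], ' | ', $row['desc'], "\n";
}
```

With PREG_SET_ORDER, $rows[0] holds everything for the first achievement, $rows[1] everything for the second, and so on, which is usually easier to store than four parallel arrays.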


  • 1 month later...

Wow, a lot of help. I've since solved this issue and moved on to another one: I'm trying to repair a script for the WoW roster community called armory sync, and I am getting more issues with regex.

 

Information is stored in an HTML array that looks like this:

 

<div class="(VAL1)">
<div class="rep-lbg">
<div class="rep-lr">
<div class="rep-ll">
<ul>
<li class="faction-name">
<a class="staticTip" href="(VAL3)" onMouseOver=" setTipText('Click here to learn more about this faction');" target="_blank">(VAL4)</a>
</li>
<li class="faction-bar">
<a class="rep-data">(VAL5)</a>
<div class="bar-color" style=" width: 12%"></div>
</li>
<li class="faction-level">
<p class="rep-icon">(VAL6)</p>
</li>
</ul>
</div>
</div>
</div>
</div>

 

I have tried to get this to work with all the code posted here for me and got nothing. There are 10-35 groups of lines like this in the array, so I need everything to match indexes in arrays so I can store the data. Any help would be awesome, guys.
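
A possible starting point, offered as a sketch rather than a tested fix for the script: match each block with preg_match_all and PREG_SET_ORDER, so the values of one faction stay together under one index. The sample values substituted for the (VALn) placeholders are made up, and the group names val1 to val6 simply mirror those placeholders:

```php
<?php
// The block layout from the post above, with the (VALn) placeholders left in.
$template = <<<'HTML'
<div class="(VAL1)">
<div class="rep-lbg">
<div class="rep-lr">
<div class="rep-ll">
<ul>
<li class="faction-name">
<a class="staticTip" href="(VAL3)" onMouseOver=" setTipText('Click here to learn more about this faction');" target="_blank">(VAL4)</a>
</li>
<li class="faction-bar">
<a class="rep-data">(VAL5)</a>
<div class="bar-color" style=" width: 12%"></div>
</li>
<li class="faction-level">
<p class="rep-icon">(VAL6)</p>
</li>
</ul>
</div>
</div>
</div>
</div>
HTML;

// Made-up values, only there to give the pattern something to match.
$html = strtr($template, [
    '(VAL1)' => 'rep-a', '(VAL3)' => 'http://example.com/a',
    '(VAL4)' => 'Faction A', '(VAL5)' => '100/3000', '(VAL6)' => 'Level A',
]) . "\n" . strtr($template, [
    '(VAL1)' => 'rep-b', '(VAL3)' => 'http://example.com/b',
    '(VAL4)' => 'Faction B', '(VAL5)' => '200/6000', '(VAL6)' => 'Level B',
]);

// The s modifier lets .*? run across line breaks between the captured parts.
$pattern = '~<div class="(?<val1>[^"]*)">\s*<div class="rep-lbg">.*?'
         . '<a class="staticTip" href="(?<val3>[^"]*)"[^>]*>(?<val4>[^<]*)</a>.*?'
         . '<a class="rep-data">(?<val5>[^<]*)</a>.*?'
         . '<p class="rep-icon">(?<val6>[^<]*)</p>~s';

preg_match_all($pattern, $html, $factions, PREG_SET_ORDER);

foreach ($factions as $i => $f) {
    echo $i, ': ', $f['val1'], ' | ', $f['val3'], ' | ', $f['val4'], ' | ', $f['val5'], ' | ', $f['val6'], "\n";
}
```

If the real markup differs from the pasted layout (extra attributes, different nesting), the pattern will need adjusting; for anything less regular, an HTML parser is likely to hold up better than a regex.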

