RegEx Loop

Helminthophobe · March 26, 2008

Is it possible to create a loop with RegEx when looking for information? I'm sure my terminology is a bit off since I am absolutely new to RegEx so I'll give an example. I've built a script that digs through the source code of another site looking for data (see the bottom of the post for a preview of the code). I'm having trouble pulling the data from the following bit of source code (some source code missing in the example):

<img id="ctl00_mainContent_rptWeapons_ctl00_imgWeapon" class="weapon" src="/images/halo3stats/weapons/e2b3837c-c27f-4497-a07d-8e59f153cff6.gif" style="border-width:0px;" />
     <div class="num">99  (33.00%)</div></div>
<img id="ctl00_mainContent_rptWeapons_ctl01_imgWeapon" class="weapon" src="/images/halo3stats/weapons/5f8fbbf9-6267-4257-9a2d-24f8c2e5441d.gif" style="border-width:0px;" />
     <div class="num">71  (23.67%)</div></div>
<img id="ctl00_mainContent_rptWeapons_ctl02_imgWeapon" class="weapon" src="/images/halo3stats/weapons/fdb4005f-45a4-472a-8646-9763ebc75aad.gif" style="border-width:0px;" />
     <div class="num">45  (15.00%)</div></div>

Is it possible to build a loop that finds the following and saves each result in a different variable every time the pattern is found? There is no set number of times the pattern may be found. It will be different each time. It may show up 20 times for one user and only 5 for another.

<img id=\"(.+?)\" class=\"weapon\" src=\"(.+?)\" style=\"border-width:0px;\" \/>\s+<div class=\"num\">(.+?)<\/div><\/div>
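For what it's worth, a single preg_match only returns the first match. It does accept a starting offset, though, so a "regex loop" can be written by hand. The sketch below is illustrative only. It runs against the three sample entries above, hard-coded instead of fetched with curl. It uses a single-quoted pattern so the double quotes need no escaping, and it stores each match as one element of an array rather than in separate variables. The `$results` array and its keys are hypothetical names.

```php
<?php
// Sketch: a hand-written "regex loop" built on preg_match's $offset argument.
// $html is the sample markup from the post, hard-coded instead of fetched with curl.
$html = <<<'HTML'
<img id="ctl00_mainContent_rptWeapons_ctl00_imgWeapon" class="weapon" src="/images/halo3stats/weapons/e2b3837c-c27f-4497-a07d-8e59f153cff6.gif" style="border-width:0px;" />
     <div class="num">99  (33.00%)</div></div>
<img id="ctl00_mainContent_rptWeapons_ctl01_imgWeapon" class="weapon" src="/images/halo3stats/weapons/5f8fbbf9-6267-4257-9a2d-24f8c2e5441d.gif" style="border-width:0px;" />
     <div class="num">71  (23.67%)</div></div>
<img id="ctl00_mainContent_rptWeapons_ctl02_imgWeapon" class="weapon" src="/images/halo3stats/weapons/fdb4005f-45a4-472a-8646-9763ebc75aad.gif" style="border-width:0px;" />
     <div class="num">45  (15.00%)</div></div>
HTML;

// Single quotes mean the double quotes in the HTML need no escaping.
$pattern = '#<img id="(.+?)" class="weapon" src="(.+?)" style="border-width:0px;" />\s+<div class="num">(.+?)</div></div>#';

$results = array(); // one array element per match instead of one variable per match
$offset  = 0;

// PREG_OFFSET_CAPTURE turns each capture into array(text, byte offset).
while (preg_match($pattern, $html, $m, PREG_OFFSET_CAPTURE, $offset)) {
    $results[] = array(
        'id'    => $m[1][0],
        'image' => $m[2][0],
        'num'   => $m[3][0],
    );
    // Resume searching just past the end of the current full match.
    $offset = $m[0][1] + strlen($m[0][0]);
}

foreach ($results as $r) {
    echo $r['image'], ' => ', $r['num'], "\n";
}
```

Because the loop runs until preg_match stops finding anything, it handles 5 matches or 20 without changes.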

This is the script I am using now to find the other data that doesn't require a loop or anything. The URL contains the data for $tag.

<?php
// $tag holds the gamertag, already URL-encoded (spaces as %20)
$ch = curl_init();
$timeout = 5;
curl_setopt ($ch, CURLOPT_URL, 'http://www.bungie.net/stats/halo3/CareerStats.aspx?player=' . $tag . '&social=true&map=0');
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);         // return the page as a string instead of printing it
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);  // stop trying to connect after $timeout seconds
$in1 = curl_exec($ch);
curl_close($ch);

// Each preg_match puts the full match in [0] and the captured value in [1]
preg_match("/Kills :<\/td>\s+<td class=\"values\">(.+?)<\/td>/",$in1, $social_stats_kills);
preg_match("/Deaths :<\/td>\s+<td class=\"values\">(.+?)<\/td>/",$in1, $social_stats_deaths);
preg_match("/K\/D Ratio :<\/td>\s+<td class=\"values\">(.+?)<\/td>/",$in1, $social_stats_kdr);

$h3gamertag = str_replace("%20"," ", $tag);  // readable gamertag for display
// Keep only the captured values
$social_stats_kills = $social_stats_kills[1];
$social_stats_deaths = $social_stats_deaths[1];
$social_stats_kdr = $social_stats_kdr[1];

I hope I made sense. Thank you in advance for any help that is provided.

effigy · March 26, 2008

Use preg_match_all.
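Here is a minimal sketch of that suggestion, assuming the same three sample entries from the first post, hard-coded rather than fetched. preg_match_all returns the number of matches, so a for loop can walk however many there are.

```php
<?php
// Sketch: preg_match_all collects every match in one call.
// $html is the sample markup from the first post, hard-coded for illustration.
$html = <<<'HTML'
<img id="ctl00_mainContent_rptWeapons_ctl00_imgWeapon" class="weapon" src="/images/halo3stats/weapons/e2b3837c-c27f-4497-a07d-8e59f153cff6.gif" style="border-width:0px;" />
     <div class="num">99  (33.00%)</div></div>
<img id="ctl00_mainContent_rptWeapons_ctl01_imgWeapon" class="weapon" src="/images/halo3stats/weapons/5f8fbbf9-6267-4257-9a2d-24f8c2e5441d.gif" style="border-width:0px;" />
     <div class="num">71  (23.67%)</div></div>
<img id="ctl00_mainContent_rptWeapons_ctl02_imgWeapon" class="weapon" src="/images/halo3stats/weapons/fdb4005f-45a4-472a-8646-9763ebc75aad.gif" style="border-width:0px;" />
     <div class="num">45  (15.00%)</div></div>
HTML;

$pattern = '#<img id="(.+?)" class="weapon" src="(.+?)" style="border-width:0px;" />\s+<div class="num">(.+?)</div></div>#';

// Returns the number of full matches. With the default PREG_PATTERN_ORDER,
// $matches[1] holds every id, $matches[2] every src and $matches[3] every count.
$found = preg_match_all($pattern, $html, $matches);

for ($i = 0; $i < $found; $i++) {
    echo $matches[2][$i], ' => ', $matches[3][$i], "\n";
}
```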

Helminthophobe · March 26, 2008

I still have trouble understanding how to work with arrays, and from what I understand preg_match_all saves the data in an array. How would I output the data using the code I posted in the original post?

Thank you for your help so far. It's much appreciated.

effigy · March 26, 2008

Per the docs:

If no order flag is given, PREG_PATTERN_ORDER is assumed.

PREG_PATTERN_ORDER

Orders results so that $matches[0] is an array of full pattern matches, $matches[1] is an array of strings matched by the first parenthesized subpattern, and so on.
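To see the difference between the two orderings, here is a small sketch on a made-up subject string, not taken from the stats page. PREG_SET_ORDER is often easier to loop over because each match's groups stay together in one array.

```php
<?php
// Sketch contrasting the two orderings on a made-up subject string.
$subject = 'a=1 b=2';
$pattern = '/(\w)=(\d)/';

preg_match_all($pattern, $subject, $byPattern);              // PREG_PATTERN_ORDER (default)
preg_match_all($pattern, $subject, $bySet, PREG_SET_ORDER);

// Pattern order: one array per group, e.g. $byPattern[1] is array('a', 'b').
// Set order: one array per match, e.g. $bySet[1] is array('b=2', 'b', '2').
foreach ($bySet as $set) {
    echo $set[1], ' -> ', $set[2], "\n";
}
```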

The easiest way to get used to arrays is to dump them with print_r inside <pre> tags so you can see what you're working with, e.g.:

<pre>
<?php
print_r($array);
?>
</pre>
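One caveat applies when the captures contain HTML, as they do here: the browser renders the matched tags instead of showing them. The sketch below escapes the dump first. It uses a shortened, made-up line based on the first post's markup.

```php
<?php
// Sketch: inspecting preg_match_all results whose captures contain HTML.
$html = '<div class="num">99  (33.00%)</div><div class="num">71  (23.67%)</div>';
preg_match_all('#<div class="num">(.+?)</div>#', $html, $matches);

// print_r(..., true) returns the dump as a string instead of echoing it,
// and htmlspecialchars() stops the browser from rendering the captured tags.
$dump = htmlspecialchars(print_r($matches, true));
echo "<pre>\n", $dump, "</pre>\n";
```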

Helminthophobe · March 27, 2008

I had to wait until I got home to fiddle with this. I was able to figure out how to display the content after playing with it for a while. Thank you for the link and assistance, effigy.

Helminthophobe · March 27, 2008

I'm still having a little trouble it seems.

The following is the source code I am working with (some parts missing that aren't important):

class="weapon" src="/images/halo3stats/weapons/0be8dc88-acc4-405d-9b82-1e0d8a4ca2f0.gif" style="border-width:0px;" />
     <div class="num">9,318  (26.71%)</div></div>
class="weapon" src="/images/halo3stats/weapons/0be8dc88-acc4-405d-9b82-1e0d8a4ca2f0.gif" style="border-width:0px;" />
     <div class="num">4,720  (13.53%)</div></div>
class="weapon" src="/images/halo3stats/weapons/0be8dc88-acc4-405d-9b82-1e0d8a4ca2f0.gif" style="border-width:0px;" />
     <div class="num">3,896  (11.17%)</div></div>
class="weapon" src="/images/halo3stats/weapons/0be8dc88-acc4-405d-9b82-1e0d8a4ca2f0.gif" style="border-width:0px;" />
     <div class="num">3,460  (9.92%)</div></div>

The following is my new code:

<?php
// Gamertags can contain spaces; encode them for the query string
$tag = str_replace(" ","%20",$tag);

$ch = curl_init();
$timeout = 5;
curl_setopt ($ch, CURLOPT_URL, 'http://www.bungie.net/stats/halo3/CareerStats.aspx?player=' . $tag . '&social=true&map=0');
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$in1 = curl_exec($ch);
curl_close($ch);

// Group 1 captures the image path, group 2 the text inside <div class="num">
preg_match_all("#class=\"weapon\" src=\"(.+?)\" style=\"border-width:0px;\" \/>\s+<div class=\"num\">(.+?)<\/div><\/div>#",$in1, $weapon_data);

echo "<img src=\"http://www.bungie.net" . $weapon_data[1][0] . "\"><br>" . $weapon_data[2][0] . "<br><br>\n";
echo "<img src=\"http://www.bungie.net" . $weapon_data[1][1] . "\"><br>" . $weapon_data[2][1] . "<br><br>\n";
echo "<img src=\"http://www.bungie.net" . $weapon_data[1][2] . "\"><br>" . $weapon_data[2][2] . "<br><br>\n";
echo "<img src=\"http://www.bungie.net" . $weapon_data[1][3] . "\"><br>" . $weapon_data[2][3] . "<br><br>\n";

?>
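The four hard-coded echo lines only cover the first four weapons. Below is a sketch of a loop that prints one entry per match, whatever the count. It assumes $in1 holds the fetched page. Here $in1 is replaced by two entries from the sample above, typed with ordinary spaces. Wrapping the values in htmlspecialchars() is an added precaution, not part of the original code.

```php
<?php
// Sketch: emit one <img> per match however many weapons the page lists.
// $in1 is a stand-in for the curl result; the markup is copied from the post.
$in1 = <<<'HTML'
class="weapon" src="/images/halo3stats/weapons/0be8dc88-acc4-405d-9b82-1e0d8a4ca2f0.gif" style="border-width:0px;" />
     <div class="num">9,318  (26.71%)</div></div>
class="weapon" src="/images/halo3stats/weapons/0be8dc88-acc4-405d-9b82-1e0d8a4ca2f0.gif" style="border-width:0px;" />
     <div class="num">4,720  (13.53%)</div></div>
HTML;

$pattern = '#class="weapon" src="(.+?)" style="border-width:0px;" />\s+<div class="num">(.+?)</div></div>#';
preg_match_all($pattern, $in1, $weapon_data);

$out = '';
foreach ($weapon_data[1] as $i => $src) {
    // $weapon_data[2][$i] is the count captured alongside this image
    $out .= '<img src="http://www.bungie.net' . htmlspecialchars($src) . '"><br>'
          . htmlspecialchars($weapon_data[2][$i]) . "<br><br>\n";
}
echo $out;
```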

It works perfectly except for the output of $weapon_data[2][0], which looks like this:

9,318Â Â (26.71%)

So I decided to separate the "9,318" and the "26.71%". I used the following:

preg_match_all("#class=\"weapon\" src=\"(.+?)\" style=\"border-width:0px;\" \/>\s+<div class=\"num\">([\,\d]+)\s\s\(([\.\d]+)\%\)<\/div><\/div>#",$in1, $weapon_data);

It doesn't find anything. I tested ([\,\d]+)\s\s\(([\.\d]+)\%\) with the PHP Live Regex Tester, and it worked when matching just 9,318 (26.71%). Any suggestions on a solution? I'm stumped.
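The Â Â in the output suggests a likely explanation. The separators are probably not ordinary spaces but UTF-8 no-break spaces (U+00A0, bytes C2 A0). In a pattern without the /u modifier, \s works byte by byte and does not match that two-byte sequence. The online tester probably passed because the test string was typed with regular spaces. The sketch below works under that assumption and accepts either kind of space. The sample string is a reconstruction with the bytes written out, not a copy of the live page.

```php
<?php
// Sketch, assuming the separators are UTF-8 no-break spaces (bytes C2 A0).
$in1 = "class=\"weapon\" src=\"/images/halo3stats/weapons/0be8dc88-acc4-405d-9b82-1e0d8a4ca2f0.gif\" style=\"border-width:0px;\" />\n"
     . "     <div class=\"num\">9,318\xC2\xA0\xC2\xA0(26.71%)</div></div>\n";

// (?:\s|\xC2\xA0)+ accepts ordinary whitespace or the C2 A0 byte pair.
$pattern = '#class="weapon" src="(.+?)" style="border-width:0px;" />\s+'
         . '<div class="num">([,\d]+)(?:\s|\xC2\xA0)+\(([.\d]+)%\)</div></div>#';

preg_match_all($pattern, $in1, $weapon_data);
echo $weapon_data[2][0], ' | ', $weapon_data[3][0], "\n";
```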

effigy · March 27, 2008

What character set is the page using? (Check the META tag.)

Helminthophobe · March 27, 2008

Is this what you mean?

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

charset=utf-8?

effigy · March 27, 2008

Yes. You have two options: (1) Use UTF-8 also; or (2) convert the UTF-8 into whatever character set you're using.
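Here is a sketch of option (1): declare UTF-8 for your own page, with both an HTTP header and a matching META tag, so the scraped bytes display as they were received. The count string is a made-up stand-in for a scraped value.

```php
<?php
// Sketch of option (1): label your own output as UTF-8.
// header() must run before any output is sent.
header('Content-Type: text/html; charset=utf-8');

$count = "9,318\xC2\xA0\xC2\xA0(26.71%)";  // bytes as they arrive from the UTF-8 page
$page  = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />' . "\n"
       . '<p>' . htmlspecialchars($count, ENT_QUOTES, 'UTF-8') . "</p>\n";
echo $page;
```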

Helminthophobe · March 27, 2008

I'm thinking option 1 will be the easiest, but how would I go about option 2?

I really, really appreciate the help you've given me. I've been really excited about the results I've been getting from this little project. You've been a huge help!

effigy · March 27, 2008

iconv
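A sketch of option (2) uses PHP's iconv() function. The target charset ISO-8859-1 is an assumption; use whatever charset your page actually declares. U+00A0 exists in ISO-8859-1, so this particular string converts without transliteration.

```php
<?php
// Sketch of option (2): convert the UTF-8 text to the page's own charset.
// ISO-8859-1 is an assumed target; substitute the charset your page declares.
$utf8   = "9,318\xC2\xA0\xC2\xA0(26.71%)";      // two U+00A0 no-break spaces
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);  // each U+00A0 becomes the single byte A0

if ($latin1 === false) {
    // iconv returns false when the conversion fails
    die("conversion failed\n");
}
echo $latin1, "\n";
```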
