Helminthophobe Posted March 26, 2008
Is it possible to create a loop with RegEx when looking for information? My terminology may be a bit off since I am completely new to RegEx, so I'll give an example. I've built a script that digs through the source code of another site looking for data (see the bottom of the post for a preview of the code). I'm having trouble pulling the data from the following bit of source code (some source code is missing in the example):

<img id="ctl00_mainContent_rptWeapons_ctl00_imgWeapon" class="weapon" src="/images/halo3stats/weapons/e2b3837c-c27f-4497-a07d-8e59f153cff6.gif" style="border-width:0px;" />
<div class="num">99 (33.00%)</div></div>
<img id="ctl00_mainContent_rptWeapons_ctl01_imgWeapon" class="weapon" src="/images/halo3stats/weapons/5f8fbbf9-6267-4257-9a2d-24f8c2e5441d.gif" style="border-width:0px;" />
<div class="num">71 (23.67%)</div></div>
<img id="ctl00_mainContent_rptWeapons_ctl02_imgWeapon" class="weapon" src="/images/halo3stats/weapons/fdb4005f-45a4-472a-8646-9763ebc75aad.gif" style="border-width:0px;" />
<div class="num">45 (15.00%)</div></div>

Is it possible to build a loop that finds the following pattern and saves each result in a different variable every time it is found? The pattern does not appear a set number of times. It may show up 20 times for one user and only 5 for another.

<img id=\"(.+?)\" class=\"weapon\" src=\"(.+?)\" style=\"border-width:0px;\" \/>\s+<div class=\"num\">(.+?)<\/div><\/div>

This is the script I am using now to find the other data, which doesn't require a loop. The URL contains the data for $tag.

$ch = curl_init();
$timeout = 5; // connection timeout in seconds
curl_setopt($ch, CURLOPT_URL, 'http://www.bungie.net/stats/halo3/CareerStats.aspx?player=' . $tag . '&social=true&map=0');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the page as a string instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$in1 = curl_exec($ch);
curl_close($ch);

// Each preg_match stores the captured value at index 1 of its result array.
preg_match("/Kills :<\/td>\s+<td class=\"values\">(.+?)<\/td>/", $in1, $social_stats_kills);
preg_match("/Deaths :<\/td>\s+<td class=\"values\">(.+?)<\/td>/", $in1, $social_stats_deaths);
preg_match("/K\/D Ratio :<\/td>\s+<td class=\"values\">(.+?)<\/td>/", $in1, $social_stats_kdr);

$h3gamertag = str_replace("%20", " ", $tag);
$social_stats_kills = $social_stats_kills[1];
$social_stats_deaths = $social_stats_deaths[1];
$social_stats_kdr = $social_stats_kdr[1];

I hope I made sense. Thank you in advance for any help.
effigy Posted March 26, 2008
Use preg_match_all.
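A minimal sketch of what that could look like for the weapon markup in the first post, assuming the HTML has already been fetched into a string. The sample `$html` and the names `$pattern` and `$weapons` are illustrative and not from the original script. `PREG_SET_ORDER` is used here so each match arrives as its own row, which keeps the loop the same no matter how many weapons a player has.

```php
<?php
// Hypothetical sample standing in for the fetched page source.
$html = <<<HTML
<img id="ctl00_mainContent_rptWeapons_ctl00_imgWeapon" class="weapon" src="/images/halo3stats/weapons/e2b3837c-c27f-4497-a07d-8e59f153cff6.gif" style="border-width:0px;" />
<div class="num">99 (33.00%)</div></div>
<img id="ctl00_mainContent_rptWeapons_ctl01_imgWeapon" class="weapon" src="/images/halo3stats/weapons/5f8fbbf9-6267-4257-9a2d-24f8c2e5441d.gif" style="border-width:0px;" />
<div class="num">71 (23.67%)</div></div>
HTML;

// Same idea as the pattern in the question; '#' delimiters avoid escaping slashes.
$pattern = '#<img id="(.+?)" class="weapon" src="(.+?)" style="border-width:0px;" />\s+<div class="num">(.+?)</div></div>#';

// With PREG_SET_ORDER, $matches[n] holds all groups of the n-th match.
$count = preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

// Collect one row per weapon instead of one variable per weapon.
$weapons = [];
foreach ($matches as $m) {
    $weapons[] = ['id' => $m[1], 'src' => $m[2], 'num' => $m[3]];
}

foreach ($weapons as $w) {
    echo $w['src'], ' => ', $w['num'], "\n";
}
```

An array of rows stands in for "a different variable every time": `$weapons[0]`, `$weapons[1]`, and so on grow with the number of matches.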
Helminthophobe (Author) Posted March 26, 2008
I still have trouble understanding how to work with arrays, and from what I understand preg_match_all saves the data in an array. How would I output the data using the code I posted in the original post? Thank you for your help so far. It's much appreciated.
effigy Posted March 26, 2008
Per the docs:

If no order flag is given, PREG_PATTERN_ORDER is assumed.

PREG_PATTERN_ORDER: Orders results so that $matches[0] is an array of full pattern matches, $matches[1] is an array of strings matched by the first parenthesized subpattern, and so on.

The easiest way to get used to arrays is to use pre and print_r to see what you're working with, e.g.:

<pre>
<?php
print_r($array);
?>
</pre>
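To make the quoted layout concrete, here is a small sketch on made-up input (the `$text` string and its pattern are placeholders, not from the thread). It shows how the default `PREG_PATTERN_ORDER` groups results by subpattern, and how an index loop walks the parallel arrays.

```php
<?php
// Made-up input: two "name=value" pairs.
$text = 'a=1 b=2';

// No flag given, so PREG_PATTERN_ORDER applies.
preg_match_all('/(\w)=(\d)/', $text, $matches);

// $matches[0] holds the full matches, $matches[1] the first group,
// $matches[2] the second group.
print_r($matches);

// The arrays run in parallel: index $i in each one belongs to the same match.
$pairs = [];
for ($i = 0; $i < count($matches[0]); $i++) {
    $pairs[$matches[1][$i]] = $matches[2][$i];
}
print_r($pairs);
```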
Helminthophobe (Author) Posted March 27, 2008
I had to wait until I got home to fiddle with this. I was able to figure out how to display the content after playing with it for a while. Thank you for the link and assistance, effigy.
Helminthophobe (Author) Posted March 27, 2008
I'm still having a little trouble, it seems. The following is the source code I am working with (some unimportant parts are omitted):

class="weapon" src="/images/halo3stats/weapons/0be8dc88-acc4-405d-9b82-1e0d8a4ca2f0.gif" style="border-width:0px;" />
<div class="num">9,318 (26.71%)</div></div>
class="weapon" src="/images/halo3stats/weapons/0be8dc88-acc4-405d-9b82-1e0d8a4ca2f0.gif" style="border-width:0px;" />
<div class="num">4,720 (13.53%)</div></div>
class="weapon" src="/images/halo3stats/weapons/0be8dc88-acc4-405d-9b82-1e0d8a4ca2f0.gif" style="border-width:0px;" />
<div class="num">3,896 (11.17%)</div></div>
class="weapon" src="/images/halo3stats/weapons/0be8dc88-acc4-405d-9b82-1e0d8a4ca2f0.gif" style="border-width:0px;" />
<div class="num">3,460 (9.92%)</div></div>

The following is my new code:

<?php
$tag = str_replace(" ", "%20", $tag);
$ch = curl_init();
$timeout = 5; // connection timeout in seconds
curl_setopt($ch, CURLOPT_URL, 'http://www.bungie.net/stats/halo3/CareerStats.aspx?player=' . $tag . '&social=true&map=0');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$in1 = curl_exec($ch);
curl_close($ch);

// $weapon_data[1] holds the image paths, $weapon_data[2] the contents of the "num" div.
preg_match_all("#class=\"weapon\" src=\"(.+?)\" style=\"border-width:0px;\" \/>\s+<div class=\"num\">(.+?)<\/div><\/div>#", $in1, $weapon_data);

echo "<img src=\"http://www.bungie.net" . $weapon_data[1][0] . "\"><br>" . $weapon_data[2][0] . "<br><br>\n";
echo "<img src=\"http://www.bungie.net" . $weapon_data[1][1] . "\"><br>" . $weapon_data[2][1] . "<br><br>\n";
echo "<img src=\"http://www.bungie.net" . $weapon_data[1][2] . "\"><br>" . $weapon_data[2][2] . "<br><br>\n";
echo "<img src=\"http://www.bungie.net" . $weapon_data[1][3] . "\"><br>" . $weapon_data[2][3] . "<br><br>\n";
?>

It works perfectly with the exception of the output from $weapon_data[2][0]. This is the output of $weapon_data[2][0]:

9,318Â Â (26.71%)

So I decided to separate the "9,318" and the "26.71%". I used the following:

preg_match_all("#class=\"weapon\" src=\"(.+?)\" style=\"border-width:0px;\" \/>\s+<div class=\"num\">([\,\d]+)\s\s\(([\.\d]+)\%\)<\/div><\/div>#", $in1, $weapon_data);

It doesn't find anything. I tested ([\,\d]+)\s\s\(([\.\d]+)\%\) with the PHP Live Regex Tester and it worked when just looking for 9,318 (26.71%). Any suggestions on a solution? I'm stumped.
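One possible explanation, offered as an assumption rather than something confirmed in the thread: the `Â` characters are what the two bytes of a UTF-8 non-breaking space (0xC2 0xA0) typically look like when displayed as Latin-1, and without the `u` modifier `\s` typically does not match those bytes. The sketch below uses a made-up `$html` sample with two non-breaking spaces between the count and the percentage, and matches them explicitly as `\xC2\xA0`. The variable names are illustrative.

```php
<?php
// Hypothetical sample: "\xC2\xA0" is the UTF-8 byte sequence for a non-breaking space.
$html = '<div class="num">9,318' . "\xC2\xA0\xC2\xA0" . '(26.71%)</div></div>';

// The approach from the post: two \s in a row between the number and the percentage.
$plain = '#<div class="num">([,\d]+)\s\s\(([.\d]+)%\)</div></div>#';

// Alternative: accept ordinary whitespace or the non-breaking-space bytes, in any mix.
$nbspAware = '#<div class="num">([,\d]+)(?:\xC2\xA0|\s)+\(([.\d]+)%\)</div></div>#';

$plainHits = preg_match_all($plain, $html, $ignored);
$hits = preg_match_all($nbspAware, $html, $weapon_data);

echo "plain pattern matches: $plainHits\n";
echo "nbsp-aware pattern matches: $hits\n";
echo $weapon_data[1][0], ' / ', $weapon_data[2][0], "%\n";
```

If the real page mixes regular spaces with non-breaking ones, the `(?:\xC2\xA0|\s)+` group should still cover it, since it accepts either kind in any order.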
effigy Posted March 27, 2008
What character set is the page using? (Check the META tag.)
Helminthophobe (Author) Posted March 27, 2008
Is this what you mean?

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

So the character set is charset=utf-8?
effigy Posted March 27, 2008
Yes. You have two options: (1) Use UTF-8 also; or (2) convert the UTF-8 into whatever character set you're using.
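For option (1), a rough sketch of what "use UTF-8 also" could mean in practice: declare UTF-8 in both the HTTP header and the markup of the page that echoes the scraped values, so the browser decodes the bytes the same way the source page does. The helper name `render_stat` and the sample value are hypothetical.

```php
<?php
// Tell the browser the output is UTF-8 (skipped if output has already started).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Hypothetical helper: escape a scraped value for HTML, treating it as UTF-8.
function render_stat($value)
{
    return '<p>' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '</p>';
}

// Sample value containing two UTF-8 non-breaking spaces ("\xC2\xA0" each).
$stat = "9,318\xC2\xA0\xC2\xA0(26.71%)";

// Repeat the declaration in the markup, mirroring the META tag of the source page.
$page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />' . "\n"
      . render_stat($stat) . "\n";
echo $page;
```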
Helminthophobe (Author) Posted March 27, 2008
I'm thinking option 1 will be the easiest, but how would I go about option 2? I really appreciate the help you've given me. I've been really excited about the results I've been getting from this little project. You've been a huge help!
effigy Posted March 27, 2008
iconv
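A small sketch of option (2) with `iconv`, assuming the target character set is ISO-8859-1 (the thread does not say which one the script's own page uses). The sample string is made up.

```php
<?php
// Sample value as fetched: UTF-8, with two non-breaking spaces ("\xC2\xA0" each).
$utf8 = "9,318\xC2\xA0\xC2\xA0(26.71%)";

// Convert to ISO-8859-1 (an assumed target). iconv() returns false on failure.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);

if ($latin1 === false) {
    echo "conversion failed\n";
} else {
    // Each two-byte UTF-8 non-breaking space becomes the single byte 0xA0.
    echo bin2hex($latin1), "\n";
}
```

Characters with no ISO-8859-1 equivalent make a plain conversion fail. Appending `//TRANSLIT` or `//IGNORE` to the target character set changes that behaviour.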