Modernvox Posted February 9, 2010

preg_match_all('/<a href="([^"]+)">([^<]+)<\/a><font size="-1">([^"]+)<\/font>/s', $html,$posts,PREG_SET_ORDER);

Here are a few of the target links:

<p><a href="http://southcoast.craigslist.org/muc/1564255288.html">Drummer looking for weeknight gigs</a> - <font size="-1"> (New Bedford)</font></p>
<p><a href="http://southcoast.craigslist.org/muc/1564167149.html">Pagan Musicians</a> - </p>
<p><a href="http://southcoast.craigslist.org/muc/1564061446.html">Seeking 5th member</a> - <font size="-1"> (RI/Southern, MA)</font></p>
<p><a href="http://southcoast.craigslist.org/muc/1563926651.html">Gigging cover band in search for new lead guitarist </a> - <font size="-1"> ((south shore))</font></p>
<p><a href="http://southcoast.craigslist.org/muc/1563506659.html">Acoustic Guitarist Wanted</a> - <font size="-1"> (New Bedford/Fall River/Providence/East Bay area)</font></p>
<p><a href="http://southcoast.craigslist.org/muc/1563233552.html">Need Help Writing Raps?</a> - <font size="-1"> (Fall River, Ma)</font> <span class="p"> pic</span></p>
<h4>Wed Jan 20</h4>
<p><a href="http://southcoast.craigslist.org/muc/1562404109.html">drums and guitar looking for bass w/ vocals</a> - <font size="-1"> (taunton)</font></p>
<p><a href="http://southcoast.craigslist.org/muc/1562389093.html">wack ass egyptians need guitarist</a> - <font size="-1"> (quincy/whitman)</font></p>
<p><a href="http://southcoast.craigslist.org/muc/1561458375.html">Looking for a few good men - Bass/Baritones</a> - <font size="-1"> (Fall River Area)</font> <span class="p"> pic</span></p>
<h4>Tue Jan 19</h4>
<p><a href="http://southcoast.craigslist.org/muc/1561104614.html">singer/guitarist looking</a> - </p>
<p><a href="http://southcoast.craigslist.org/muc/1560864071.html">south shore cover band needs bass</a> - <font size="-1"> (plymouth,ma)</font></p>
<p><a href="http://southcoast.craigslist.org/muc/1559645835.html">Looking for Rhythm Guitarist</a> - <font size="-1"> (Taunton, Ma)</font></p>
<h4>Mon Jan 18</h4>
<p><a href="http://southcoast.craigslist.org/muc/1558191492.html">Working Cover Rock Band Looking for GOOD Lead Singer</a> - <font size="-1"> (SE MA/RI)</font></p>
<p><a href="http://southcoast.craigslist.org/muc/1557842807.html">wanted: guitar player (christian)</a> - <font size="-1"> (Dartmouth)</font></p>

Here's my code, which worked fine up until yesterday. Now it only works when I strip the <font ...> part from the end of the pattern.

<?php
error_reporting(E_ALL);
ini_set("display_errors", 1);

$st = isset($_POST['submit']) ? $_POST['state'] : '';
$urls = array("http://" . $st . ".craigslist.org");

foreach ($urls as $url) {
    $html = file_get_contents("$url/muc/");
    preg_match_all('/<a href="([^"]+)">([^<]+)<\/a><font size="-1">([^"]+)<\/font>/s', $html, $posts, PREG_SET_ORDER);
    //echo "<pre>";print_r($posts);

    $i = 1;      // set start point
    $limit = 60; // set limit
    foreach ($posts as $post) {
        //print_r $post[0]; //HTML
        $post[2] = str_ireplace($url, "", $post[2]); // remove domain
        echo "<a href=\"$url{$post[1]}\" target=\"_blank\">{$post[2]}<font size=\"3\">{$post[3]}</font></a><br />";
        print "<BR />\n";
        if ($i == $limit) {
            break;
        }
        $i++;
    }
}
?>

When I remove <font size="-1">([^"]+)<\/font> it works, but it also displays some things I don't want. Before this, the regex worked perfectly.

Thanks in advance.
MadTechie Posted February 10, 2010

That's because they added a " - " between the closing </a> and the <font> tag. A simple regex update would be:

<a href="([^"]+)">([^<]+)</a> - (?:<font size="-1">([^"]+)</font>)?

The (?:...)? group makes the <font> part optional, so posts without a location still match. If you keep / as the pattern delimiter, escape the slashes in the closing tags as <\/a> and <\/font>.

Also, I have pointed this out before, but do you have permission to collect this data? If you don't, it would be unlawful!
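For illustration, here is a minimal sketch of that updated pattern run against a few lines of the sample markup from the question, with no network request. The helper name extractPosts and the array keys are hypothetical, "#" is used as the delimiter so the closing tags need no escaping, and [^<]+ is swapped in for [^"]+ in the location capture as an assumption (it simply stops the capture at the next tag).

```php
<?php
// Sample markup copied from the listing excerpts in the question.
$html = <<<'HTML'
<p><a href="http://southcoast.craigslist.org/muc/1564255288.html">Drummer looking for weeknight gigs</a> - <font size="-1"> (New Bedford)</font></p>
<p><a href="http://southcoast.craigslist.org/muc/1564167149.html">Pagan Musicians</a> - </p>
<p><a href="http://southcoast.craigslist.org/muc/1563233552.html">Need Help Writing Raps?</a> - <font size="-1"> (Fall River, Ma)</font> <span class="p"> pic</span></p>
HTML;

// "#" as delimiter, so the slashes in </a> and </font> stay unescaped.
// The trailing (?: ... )? group makes the location optional.
// Assumption: [^<]+ replaces [^"]+ so the capture ends at </font>.
$pattern = '#<a href="([^"]+)">([^<]+)</a> - (?:<font size="-1">([^<]+)</font>)?#';

// Hypothetical helper: returns one array per post with url, title, location.
function extractPosts(string $html, string $pattern): array
{
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $posts = [];
    foreach ($matches as $m) {
        $posts[] = [
            'url'      => $m[1],
            'title'    => trim($m[2]),
            // Group 3 is absent when the post has no <font> location.
            'location' => isset($m[3]) ? trim($m[3]) : '',
        ];
    }
    return $posts;
}

$posts = extractPosts($html, $pattern);
foreach ($posts as $post) {
    echo $post['url'], ' | ', $post['title'], ' | ', $post['location'], PHP_EOL;
}
```

With PREG_SET_ORDER, an optional group that does not take part in a match is left out of that match's array, which is why the sketch checks isset($m[3]) before reading the location.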