theprovider Posted October 24, 2007

OK, I have some experience with PHP and regex, but this is over my head. Here is an example of the original input string:

<td><font face="verdana,sans-serif" size=1> 153277</td>
<td> <a href="/url/"><font face="verdana,sans-serif" size=1 color=#000000>DATA< /a ></td>

I would like to scrap everything except '153277', '/url/', and 'DATA', and I would prefer these to be in separate strings. For example, $number = "153277", $url = "/url/", and $data = "DATA".

What would be the regex and PHP code to do this? I'm just completely lost. Forgive my newbieness.
Zane Posted October 25, 2007

$string = '<td><font face="verdana,sans-serif" size=1> 153277</td>
<td> <a href="/url/"><font face="verdana,sans-serif" size=1 color=#000000>DATA< /a ></td>';
$regs = array();
ereg("(\d*)</td>\n.*href=\"(.*)\".*#.*>(.*)<\s?/a", $string, $regs);
$number = $regs[1];
$url = $regs[2];
$data = $regs[3];
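For anyone reading this on a current PHP version: ereg() was deprecated in PHP 5.3 and removed in PHP 7.0, so the call above will not run there. Below is a minimal sketch of the same extraction with preg_match(), using the sample markup from the first post. The pattern and the $pattern variable are assumptions made for illustration, not Zane's original, and they are tuned only to this one sample.

```php
<?php
// Sample markup from the first post (note the odd "< /a >" closing tag).
$string = '<td><font face="verdana,sans-serif" size=1> 153277</td> '
        . '<td> <a href="/url/"><font face="verdana,sans-serif" size=1 color=#000000>DATA< /a ></td>';

// Assumed pattern: digits right before </td>, then the href value,
// then the text between the <font ...> tag and the closing </a>.
$pattern = '#(\d+)\s*</td>.*?href="([^"]*)".*?<font[^>]*>([^<]*)<\s*/a#s';

if (preg_match($pattern, $string, $m)) {
    $number = $m[1];       // first capture group: the number
    $url    = $m[2];       // second capture group: the link target
    $data   = trim($m[3]); // third capture group: the link text
    echo "$number | $url | $data\n";
}
```

The lazy quantifiers (.*?) and negated classes ([^"]*, [^<]*) keep each group from running past its own tag, which a greedy (.*) can do as soon as the input holds more than one row.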
theprovider Posted October 25, 2007 (Author)

$string = '<td><font face="verdana,sans-serif" size=1> 153277</td>
<td> <a href="/url/"><font face="verdana,sans-serif" size=1 color=#000000>DATA< /a ></td>';
$regs = array();
ereg("(\d*)</td>\n.*href=\"(.*)\".*#.*>(.*)<\s?/a", $string, $regs);
$number = $regs[1];
$url = $regs[2];
$data = $regs[3];

Thanks! That's exactly what I needed. However, in my idiocy, I asked the question incorrectly. Here is the problem: I have a full HTML document in my string, and I forgot to ask how to remove everything in the document except what is between the <td></td> tags. To illustrate:

What I want to remove:

<html><head><title>Blah blah</title></head><body>Bunch of crap
<table border=0 cellpadding=1 cellspacing=0 bgcolor=#FFFFFF width=270><tr><th>More crap</th></tr><tr><th>And some more</th></tr>

What I want to iterate through and store in variables:

<tr bgcolor=#333333>
<td><font face="verdana,sans-serif" size=2 color=E8E8E8> NUMBER</td>
<td><font face="verdana,sans-serif" size=2 color=E8E8E8> DATA</a></td>
</tr>
<tr bgcolor="#F4F4F4">
<td><font face="verdana,sans-serif" size=1> NUMBER2</td>
<td> <a href="/URL/"><font face="verdana,sans-serif" size=1 color=#000000>DATA2</a></td>
</tr>
<tr>
<td><font face="verdana,sans-serif" size=1> NUMBER3</td>
<td> <a href="/URL2/"><font face="verdana,sans-serif" size=1 color=#000000>DATA3</a></td>
</tr>

To be honest, I don't even need the URLs. (Notice that the first row has no URL associated with it.) I want to scrap everything except the table rows (the meat and potatoes), then iterate through each row and store each NUMBER and DATA. The number of rows will vary each time, and I need to associate each NUMBER with its DATA. I don't know if an array could do what I need, but I can readily use SQL if necessary. In fact, that might be preferable.

I'm sorry if I'm not being clear. I've been up for quite a while and I can't seem to formulate an intelligent question. :-\

If there is any more information I can provide to help you help me, don't hesitate to ask.
effigy Posted October 25, 2007

Do you really need to scrap (replace) the unwanted data, or do you want to extract (match) the desired data? The latter is easier and, I think, what you want to do.
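To make the difference concrete, here is a small sketch of both routes. The $html fragment and its <p> "junk" tags are made up for illustration and do not come from the real page.

```php
<?php
// Hypothetical fragment: two wanted cells surrounded by unwanted markup.
$html = '<p>junk</p><td>NUMBER</td><p>more junk</p><td>DATA</td>';

// Replace route: delete what is not wanted. Every kind of unwanted
// markup needs its own rule, so the list of patterns keeps growing.
$stripped = preg_replace('#<p>.*?</p>#s', '', $html);

// Match route: describe only the wanted cells and collect them.
preg_match_all('#<td[^>]*>(.*?)</td>#s', $html, $matches);
$cells = $matches[1]; // contents of the first capture group, in document order

print_r($cells);
```

With the match route, nothing outside the <td> cells has to be described at all, which is why it tends to be the easier of the two when the surrounding page is messy.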
theprovider Posted October 25, 2007 (Author)

I suppose either would work, right? I'll definitely take the easier route (who wouldn't?). The only important thing is that I can extract the NUMBER and DATA and store them in a database. I can do the SQL easily, but how would you recommend I extract the variables? I am at your mercy.
effigy Posted October 25, 2007

<pre>
<?php
$data = <<<DATA
<tr bgcolor=#333333>
<td><font face="verdana,sans-serif" size=2 color=E8E8E8> NUMBER</td>
<td><font face="verdana,sans-serif" size=2 color=E8E8E8> DATA</a></td>
</tr>
<tr bgcolor="#F4F4F4">
<td><font face="verdana,sans-serif" size=1> NUMBER2</td>
<td> <a href="/URL/"><font face="verdana,sans-serif" size=1 color=#000000>DATA2</a></td>
</tr>
<tr>
<td><font face="verdana,sans-serif" size=1> NUMBER3</td>
<td> <a href="/URL2/"><font face="verdana,sans-serif" size=1 color=#000000>DATA3</a></td>
</tr>
DATA;

// Capture the contents of every <td>...</td>; the s modifier lets . match newlines.
preg_match_all('#<td[^>]*>(.*?)</td>#s', $data, $matches);
// Drop the full-match array so $matches[0] holds only the captured cell contents.
array_shift($matches);
foreach ($matches[0] as &$match) {
    // Remove the <font> and <a> tags left inside each cell.
    $match = strip_tags($match);
    // Remove space characters (this also removes spaces inside a value).
    $match = str_replace(' ', '', $match);
}
unset($match); // break the reference left behind by the foreach
print_r($matches);
?>
</pre>
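The code above leaves one flat list of cells, while the question asked for each NUMBER to be associated with its DATA. A minimal sketch of that pairing step follows, assuming every row has exactly two cells (NUMBER first, then DATA), as in the sample rows. The $cells and $rows names are made up for illustration.

```php
<?php
// Flat list of cleaned cell values in document order, as the loop
// above would leave them in $matches[0] for the three sample rows.
$cells = array('NUMBER', 'DATA', 'NUMBER2', 'DATA2', 'NUMBER3', 'DATA3');

// Assumption: two cells per row, so consecutive pairs form one row each.
$rows = array();
foreach (array_chunk($cells, 2) as $pair) {
    if (count($pair) === 2) {       // skip a trailing, incomplete pair
        list($number, $data) = $pair;
        $rows[$number] = $data;     // NUMBER becomes the key, DATA the value
    }
}

print_r($rows);
```

Each $number => $data pair can then be handed to an INSERT statement. If the same NUMBER can appear twice, a list of pairs would be safer than an array keyed by NUMBER, since duplicate keys overwrite each other.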
theprovider Posted October 25, 2007 (Author)

Thanks! That did it (for the most part). There are still a few quirks, but they have nothing to do with your code. I'm now one step closer, and I'll keep fiddling with it before asking more questions. Thanks again, you're a lifesaver!