factoring2117, posted March 31, 2009:

I need to preg_match multiple values, but my code only grabs the first value and then stops.

Here is an example of the HTML:

    <Td><a href="list.php?a=add&id=274619213&g=1">me</a></td>
    <Td ><a href="list.php?a=add&id=463335839&g=1">me</a></td>
    <Td><a href="list.php?a=add&id=106690164&g=1">me</a></td>

I need to extract the id number from every line on the page, and there are hundreds of them. This is the code I have so far. I believe I need a for statement, but I don't know how to set it up.

    if (preg_match('#&id=(.+)&g=1#', $html, $matches)) {
        $id = $matches[1];
    }

Please help me figure this out. Thank you.
nrg_alpha, posted March 31, 2009:

Here is my example:

    $data = <<<HTML
    <Td><a href="list.php?a=add&id=274619213&g=1">me</a></td>
    <Td ><a href="list.php?a=add&id=463335839&g=1">me</a></td>
    <Td><a href="list.php?a=add&id=106690164&g=1">me</a></td>
    HTML;

    // preg_match_all() collects every match, not just the first one.
    preg_match_all('#<td[^>]*><a.+?id=(\d+).*?>.*?</td>#is', $data, $matches);
    echo '<pre>' . print_r($matches[1], true) . '</pre>';

Output:

    Array
    (
        [0] => 274619213
        [1] => 463335839
        [2] => 106690164
    )

I am making some assumptions:

a) I use the i modifier for case insensitivity, as there may be <td, <Td or <TD, and the s modifier in case some segments within the pattern fall on another line. Most likely you won't need the s, but I added it as a safeguard.

b) Since one of the examples is <Td > with a space, I used <td[^>]*> to match anything up to and including the >.

c) I am assuming that all ids are found within the a tag.

The solution I provided is a 'quick and dirty' way, which isn't necessarily bulletproof. But for the example you provided, assuming the pages have that sort of formatting, it should do the trick.

I think you could also use this pattern:

    #<td[^>]*><a[^>]+id=(\d+).*?>.*?</td>#is

The [^>]+ will match up to the last character before the first > of the opening a tag, then backtrack to find id=. I would wager this method is slower, but it might add an extra layer of assurance: it checks for id= inside the opening a tag and will not match some id somewhere else.

EDIT: actually, I'm not so sure about that last example / explanation, so just try the first one and see what it gives you.
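For comparison, here is a minimal sketch of how the original snippet from the question could be adapted with the same function. It assumes every link contains the literal sequence `&id=<digits>&g=1`, as in the sample; the `$html` and `$ids` variable names are illustrative only. The capture is narrowed from `(.+)` to `(\d+)` because a greedy `.+` could run past the id when several links share one line.

```php
<?php
// Sample markup copied from the question.
$html = <<<HTML
<Td><a href="list.php?a=add&id=274619213&g=1">me</a></td>
<Td ><a href="list.php?a=add&id=463335839&g=1">me</a></td>
<Td><a href="list.php?a=add&id=106690164&g=1">me</a></td>
HTML;

// preg_match() stops at the first hit; preg_match_all() collects every hit.
// $matches[0] holds the full matches, $matches[1] the first capture group.
$ids = [];
if (preg_match_all('#&id=(\d+)&g=1#', $html, $matches)) {
    $ids = $matches[1];
}

// No explicit counting loop is needed to find the ids; iterate the result instead.
foreach ($ids as $id) {
    echo $id, "\n";
}
```

This version is looser than the pattern above, since it does not check that the id sits inside a td/a pair, so it is only a reasonable fit when the page has no other `&id=...&g=1` sequences.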
factoring2117 (Author), posted March 31, 2009:

That works perfectly. Thank you.
nrg_alpha, posted March 31, 2009:

Another alternative (using DOMDocument / DOMXPath) could be:

    $data = <<<HTML
    <Td><a href="list.php?a=add&id=274619213&g=1">me</a></td>
    <Td ><a href="list.php?a=add&id=463335839&g=1">me</a></td>
    <Td><a href="list.php?a=add&id=106690164&g=1">me</a></td>
    HTML;

    $dom = new DOMDocument;
    @$dom->loadHTML($data); // @ suppresses parser warnings about loose markup
    $xpath = new DOMXPath($dom);
    $aTag = $xpath->query('//td/a[@href]'); // every <a> with an href directly inside a <td>
    foreach ($aTag as $val) {
        if (preg_match('#id=(\d+)#', $val->getAttribute('href'), $match)) {
            echo $match[1] . "<br />\n";
        }
    }

This would be a better alternative IMO. It feels more solid, with less room for mishaps.

For this to work on a site page, you would change:

    @$dom->loadHTML($data);

to:

    @$dom->loadHTMLFile('http://www.whateversite.whatever'); // insert the URL in question within the quotes
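As a possible variation on the DOM approach, the id can be read from the href with PHP's own URL helpers instead of a regex. This is only a sketch, not what the post above proposes: it assumes the id always arrives as an `id` query parameter made of digits, and the `$ids` and `$params` names are illustrative.

```php
<?php
// Sample markup copied from the thread.
$data = <<<HTML
<Td><a href="list.php?a=add&id=274619213&g=1">me</a></td>
<Td ><a href="list.php?a=add&id=463335839&g=1">me</a></td>
<Td><a href="list.php?a=add&id=106690164&g=1">me</a></td>
HTML;

$dom = new DOMDocument;
@$dom->loadHTML($data); // @ suppresses parser warnings, as in the post above
$xpath = new DOMXPath($dom);

$ids = [];
foreach ($xpath->query('//td/a[@href]') as $a) {
    // Pull out only the query-string part of the (relative) URL.
    $query = parse_url($a->getAttribute('href'), PHP_URL_QUERY);
    if (!is_string($query)) {
        continue; // no query string, or an href that could not be parsed
    }

    // Decode the query string into an associative array.
    parse_str($query, $params);

    // Keep the value only if it is a plain string of digits.
    if (isset($params['id']) && is_string($params['id']) && ctype_digit($params['id'])) {
        $ids[] = $params['id'];
    }
}

print_r($ids);
```

The trade-off is a few more lines in exchange for not depending on where `id=` appears inside the href, which also avoids matching a parameter such as `uid=` by accident.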