Preg Match Question Multiple Values

factoring2117 · March 31, 2009

I need to match multiple values with preg_match, but my code only grabs the first value and then stops.

Here is an example

HTML Code:

<Td><a href="list.php?a=add&id=274619213&g=1">me</a></td>
<Td ><a href="list.php?a=add&id=463335839&g=1">me</a></td>
<Td><a href="list.php?a=add&id=106690164&g=1">me</a></td>

I need to extract the id number from every line on the page, and there are hundreds of them. This is the code I have so far. I believe I need a for loop, but I don't know how to set it up.

if (preg_match('#&id=(.+)&g=1#', $html, $matches)) {
    $id = $matches[1];
}

Please help me figure this out.

Thank you.

nrg_alpha · March 31, 2009

Here is my example:

$data = <<<HTML
<Td><a href="list.php?a=add&id=274619213&g=1">me</a></td>
<Td ><a href="list.php?a=add&id=463335839&g=1">me</a></td>
<Td><a href="list.php?a=add&id=106690164&g=1">me</a></td>
HTML;

preg_match_all('#<td[^>]*><a.+?id=(\d+).*?>.*?</td>#is', $data, $matches);
echo '<pre>' . print_r($matches[1], true) . '</pre>';

output:

Array
(
    [0] => 274619213
    [1] => 463335839
    [2] => 106690164
)

I am making some assumptions...they are:

a) I use the i modifier (for case insensitivity, as there may be <td, <Td or <TD) and the s modifier in case some segments of the pattern span more than one line. Most likely you won't need the s, but I added it as a safeguard.

b) Since one of the examples is <Td > (with a space before the >), I used <td[^>]*> to match everything up to and including the >.

c) I am assuming that all ids are found within the a tag...

The solution I provided is a 'quick and dirty' way, which isn't necessarily bulletproof. But for the example you provided, assuming the pages have that sort of formatting, it should do the trick.
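For reference, the reason the original code only grabs the first value is that preg_match() stops after its first match, while preg_match_all() collects every match, so no for loop is needed. Here is a minimal sketch that keeps the original pattern's shape; swapping (.+) for (\d+) is my assumption that the ids are always digits, and $ids is just an illustrative name:

```php
<?php
// Sample markup copied from the question.
$html = <<<HTML
<Td><a href="list.php?a=add&id=274619213&g=1">me</a></td>
<Td ><a href="list.php?a=add&id=463335839&g=1">me</a></td>
<Td><a href="list.php?a=add&id=106690164&g=1">me</a></td>
HTML;

// preg_match_all() returns the number of full matches and fills $matches;
// $matches[1] holds what the first capture group caught in each match.
$count = preg_match_all('#&id=(\d+)&g=1#', $html, $matches);
$ids = $matches[1];

echo $count . " ids found\n";
print_r($ids);
```

On the three sample lines this finds 3 ids. It is looser than the pattern above because it does not check that the id sits inside a td/a pair.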

I think you could also use this pattern:

#<td[^>]*><a[^>]+id=(\d+).*?>.*?</td>#is

The [^>]+ first matches everything up to the last character before the > that closes the opening a tag, then backtracks until it finds id=. I would wager this method is slower, but it might add an extra layer of assurance: it checks that id= sits inside the opening a tag rather than matching some id elsewhere.

EDIT: Actually, I'm not so sure about that last example / explanation, so just try the first one and see what it gives you.
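One way to settle the doubt is to run both patterns over the same input and compare the captures. This is only a rough check against the three sample lines, not proof that the patterns behave alike on a real page; $first and $second are illustrative names:

```php
<?php
// Sample markup copied from the question.
$data = <<<HTML
<Td><a href="list.php?a=add&id=274619213&g=1">me</a></td>
<Td ><a href="list.php?a=add&id=463335839&g=1">me</a></td>
<Td><a href="list.php?a=add&id=106690164&g=1">me</a></td>
HTML;

// Lazy version: .+? expands one character at a time until id= is found.
preg_match_all('#<td[^>]*><a.+?id=(\d+).*?>.*?</td>#is', $data, $first);

// Negated-class version: [^>]+ cannot leave the opening a tag,
// so it backtracks inside the tag until id= is found.
preg_match_all('#<td[^>]*><a[^>]+id=(\d+).*?>.*?</td>#is', $data, $second);

print_r($first[1]);
print_r($second[1]);

// bool(true) when both patterns captured the same ids in the same order.
var_dump($first[1] === $second[1]);
```

On this sample both patterns capture the same three ids. They can differ on other markup: the first would happily run past the end of the a tag to find an id= further along, while the second would not.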

factoring2117 · March 31, 2009

That works perfectly. Thank you.

nrg_alpha · March 31, 2009

Another alternative (using DOMDocument / XPath()) could include:

$data = <<<HTML
<Td><a href="list.php?a=add&id=274619213&g=1">me</a></td>
<Td ><a href="list.php?a=add&id=463335839&g=1">me</a></td>
<Td><a href="list.php?a=add&id=106690164&g=1">me</a></td>
HTML;

$dom = new DOMDocument;
@$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
$aTag = $xpath->query('//td/a[@href]');
foreach ($aTag as $val) {
    if (preg_match('#id=(\d+)#', $val->getAttribute('href'), $match)) {
        echo $match[1] . "<br />\n";
    }
}

This would be a better alternative IMO. Feels more solid with less room for mishaps.

For this to work on a site page, you would change:

@$dom->loadHTML($data);

to:

@$dom->loadHTMLFile('http://www.whateversite.whatever'); // insert the URL in question within the quotes.
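As a variation on the DOMDocument loop above, the regex on the href could be swapped for PHP's own URL helpers, parse_url() and parse_str(). This is a sketch under the assumption that every link carries the id as a query parameter named id, as in the sample; $ids, $query and $params are illustrative names:

```php
<?php
// Sample markup copied from the question.
$data = <<<HTML
<Td><a href="list.php?a=add&id=274619213&g=1">me</a></td>
<Td ><a href="list.php?a=add&id=463335839&g=1">me</a></td>
<Td><a href="list.php?a=add&id=106690164&g=1">me</a></td>
HTML;

$dom = new DOMDocument;
@$dom->loadHTML($data); // @ hides warnings about the sloppy markup
$xpath = new DOMXPath($dom);

$ids = [];
foreach ($xpath->query('//td/a[@href]') as $a) {
    // Pull out only the query string part of the href.
    $query = parse_url($a->getAttribute('href'), PHP_URL_QUERY);
    if (!is_string($query)) {
        continue; // no query string on this link
    }

    // Split the query string into an associative array of parameters.
    parse_str($query, $params);

    // Keep the id only if it is present and made up of digits.
    if (isset($params['id']) && ctype_digit($params['id'])) {
        $ids[] = $params['id'];
    }
}

print_r($ids);
```

The trade-off is a few more lines in exchange for not caring where in the query string the id appears or what follows it.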
