
Preg Match Question Multiple Values


factoring2117


I need to preg_match multiple values, but my code only grabs the first value and then stops.

 

Here is an example

 

HTML Code:

<Td><a href="list.php?a=add&id=274619213&g=1">me</a></td>
<Td ><a href="list.php?a=add&id=463335839&g=1">me</a></td>
<Td><a href="list.php?a=add&id=106690164&g=1">me</a></td>

 

I need to extract the id number from every line on the page, and there are hundreds of them. This is the code I have so far. I believe I need a for statement, but I don't know how to set it up.

 

if (preg_match('#&id=(.+)&g=1#', $html, $matches)) {
$id = $matches[1];
}

 

Please help me figure this out.

 

Thank you.


Here is my example:

 

$data = <<<HTML
<Td><a href="list.php?a=add&id=274619213&g=1">me</a></td>
<Td ><a href="list.php?a=add&id=463335839&g=1">me</a></td>
<Td><a href="list.php?a=add&id=106690164&g=1">me</a></td>
HTML;

// The i modifier ignores tag case; s lets . match across line breaks.
preg_match_all('#<td[^>]*><a.+?id=(\d+).*?>.*?</td>#is', $data, $matches);
echo '<pre>' . print_r($matches[1], true) . '</pre>';

 

output:

Array
(
    [0] => 274619213
    [1] => 463335839
    [2] => 106690164
)

 

I am making some assumptions...they are:

a) I use the i modifier for case insensitivity (there may be <td, <Td or <TD), and the s modifier in case some segments of the pattern are on another line. Most likely you won't need the s, but I added it as a safeguard.

b) Since one of the examples is <Td > (with a space before the >), I used <td[^>]*> to match anything up to, and including, the >.

c) I am assuming that all ids are found within the a tag...

 

The solution I provided is a 'quick and dirty' way, which isn't necessarily bulletproof. But for the example you provided, assuming the pages have that sort of formatting, it should do the trick.
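
To show why the original attempt stops after one hit: preg_match() returns as soon as it finds the first match, while preg_match_all() keeps scanning the whole string. Below is a minimal sketch of that difference, assuming every id is a run of digits followed by &g=1 as in the sample. The variable names ($first, $all, $ids) are only illustrative.

```php
<?php
$html = <<<HTML
<Td><a href="list.php?a=add&id=274619213&g=1">me</a></td>
<Td ><a href="list.php?a=add&id=463335839&g=1">me</a></td>
<Td><a href="list.php?a=add&id=106690164&g=1">me</a></td>
HTML;

// preg_match() stops at the first match, so only one id comes back.
preg_match('#&id=(\d+)&g=1#', $html, $first);

// preg_match_all() collects every match; $all[1] holds the captured ids.
$count = preg_match_all('#&id=(\d+)&g=1#', $html, $all);
$ids = $all[1];

// No explicit for statement is needed to find the ids, only to use them.
foreach ($ids as $id) {
    echo $id, "\n";
}
```

The capture group uses \d+ rather than .+ because a greedy .+ can run past the id when several links end up on the same line.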

 

I think you could also use this pattern:

#<td[^>]*><a[^>]+id=(\d+).*?>.*?</td>#is

 

The [^>]+ will match up to the last character before the first > of the opening a tag, then backtrack to find id=. I would wager this method is slower, but it might add an extra layer of assurance that id= is found within the opening a tag, and not matched from some id somewhere else.

 

EDIT, actually, I'm not so sure about that last example / explanation, so just try the first one and see what it gives you.
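
One way to settle that doubt is to run both patterns against the same sample and compare the captured ids. This is only a quick check on the three rows from the question, not proof that the two patterns agree on every page. The names $lazy and $strict are made up for this sketch.

```php
<?php
$data = <<<HTML
<Td><a href="list.php?a=add&id=274619213&g=1">me</a></td>
<Td ><a href="list.php?a=add&id=463335839&g=1">me</a></td>
<Td><a href="list.php?a=add&id=106690164&g=1">me</a></td>
HTML;

// First pattern: lazy .+? walks forward until it reaches id=.
$lazy = '#<td[^>]*><a.+?id=(\d+).*?>.*?</td>#is';

// Second pattern: [^>]+ cannot leave the opening a tag, then backtracks to id=.
$strict = '#<td[^>]*><a[^>]+id=(\d+).*?>.*?</td>#is';

preg_match_all($lazy, $data, $a);
preg_match_all($strict, $data, $b);

// Compare the captured groups from both runs.
$same = ($a[1] === $b[1]);
echo $same ? "same ids\n" : "different ids\n";
print_r($b[1]);
```

On this sample both patterns should capture the same three ids. They could differ on markup where id= appears outside the opening a tag, which is the case the second pattern is meant to guard against.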

Another alternative (using DOMDocument / XPath()) could include:

 

$data = <<<HTML
<Td><a href="list.php?a=add&id=274619213&g=1">me</a></td>
<Td ><a href="list.php?a=add&id=463335839&g=1">me</a></td>
<Td><a href="list.php?a=add&id=106690164&g=1">me</a></td>
HTML;

$dom = new DOMDocument;
@$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
$aTag = $xpath->query('//td/a[@href]');
foreach ($aTag as $val) {
    if (preg_match('#id=(\d+)#', $val->getAttribute('href'), $match)) {
        echo $match[1] . "<br />\n";
    }
}

 

This would be a better alternative IMO. Feels more solid with less room for mishaps.
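
As a variation on the same idea, the regex on the href can be swapped for PHP's own URL helpers, parse_url() and parse_str(), so the id is read as a query parameter. This is a sketch and not the method from the post above. It assumes the links sit inside a table (the wrapper markup is added here for illustration) and it uses libxml_use_internal_errors() in place of the @ operator.

```php
<?php
// Sample rows wrapped in a table; the wrapper is an assumption about the real page.
$data = <<<HTML
<table>
<tr><Td><a href="list.php?a=add&id=274619213&g=1">me</a></td></tr>
<tr><Td ><a href="list.php?a=add&id=463335839&g=1">me</a></td></tr>
<tr><Td><a href="list.php?a=add&id=106690164&g=1">me</a></td></tr>
</table>
HTML;

// Collect parser warnings quietly instead of suppressing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($data);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$ids = [];
foreach ($xpath->query('//td/a[@href]') as $link) {
    // Pull the query string out of the href, e.g. "a=add&id=...&g=1".
    $query = parse_url($link->getAttribute('href'), PHP_URL_QUERY);
    if (!is_string($query)) {
        continue; // href without a query string
    }
    // Split the query string into an associative array of parameters.
    parse_str($query, $params);
    if (isset($params['id'])) {
        $ids[] = $params['id'];
    }
}

print_r($ids);
```

Reading id as a named parameter avoids matching the letters "id=" inside some other parameter name or value, at the cost of a few more lines.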

For this to work on a site page, you would change:

 

@$dom->loadHTML($data);

 

to:

 

@$dom->loadHTMLFile('http://www.whateversite.whatever'); // insert the URL in question within the quotes.

 

