[SOLVED] How do I match this?

daydreamer · October 5, 2009

Hi,

<th scope="row">

some words in here

<span class="thisisonecrazyclass"></span>

</th>

<td>get all in here</td>

"some words in here" will be the same all the time.

"get all in here" changes, and I want to capture it using preg_match.

This is what I am trying to do with no results:

<?php

preg_match("~some\swords\sin\shere[\n.\s]*<td>(.*)</td>~i", $xxx, $matches);

?>

Where am I going wrong?

Thanks.
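For reference, here is why the pattern above finds nothing: inside a character class the dot loses its special meaning, so `[\n.\s]*` matches only newlines, literal periods and whitespace, and it stalls at `<span`. The sketch below reproduces that against the sample markup and puts it next to an adjusted pattern. The adjusted pattern (a lazy `.*?` plus the `s` modifier so the dot can cross newlines) is an assumption about what was intended, not the expression the poster eventually used.

```php
<?php
// The sample markup from the question.
$xxx = <<<'HTML'
<th scope="row">

some words in here

<span class="thisisonecrazyclass"></span>

</th>

<td>get all in here</td>
HTML;

// Original attempt: inside [...] the dot is a literal ".", so [\n.\s]* can only
// consume whitespace and periods. It stops at "<span" and the match fails.
$original = preg_match("~some\swords\sin\shere[\n.\s]*<td>(.*)</td>~i", $xxx, $m1);

// Adjusted sketch: with the s modifier the dot also matches newlines, the lazy
// .*? stops at the first <td>, and (.*?) stops at the first </td>.
$fixed = preg_match('~some\swords\sin\shere.*?<td>(.*?)</td>~is', $xxx, $m2);

var_dump($original);   // int(0)
echo $m2[1], "\n";     // get all in here
```

The lazy quantifiers matter as soon as more than one `<td>` follows the label; a greedy `(.*)` with the `s` modifier would run on to the last `</td>` in the string.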

cags · October 5, 2009

Is there only going to be one set of <td> </td> tags in the source, or will there be multiple? If there are multiple, do you want all the values or just the first? I assume the information between "some words in here" and <td> will vary?

daydreamer · October 5, 2009

There will be multiple <td> </td> tags.

But I only need whatever is inside of them after the text "some words in here".

Yes, the information between these two will vary, but not a lot. The class of the span might change, or the span might not be there. "some words in here" will be the same.

cags · October 5, 2009

This should work, but I'm sure somebody could come up with a better solution...

preg_match("~some words in here.+?<td>(.+)?</td>~s", $src, $out);
echo $out[1];
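One thing to watch with the pattern above when there are several cells: `(.+)?` is greedy, so with the `s` modifier it keeps going to the last `</td>` in the source instead of the first one after the label. A minimal sketch on some hypothetical markup (two labelled rows plus an unrelated cell), comparing it with a lazy `(.*?)` and using `preg_match_all` to collect one value per occurrence of the label:

```php
<?php
// Hypothetical sample with two labelled rows and an extra cell in between.
$src = <<<'HTML'
<th scope="row">
some words in here
<span class="thisisonecrazyclass"></span>
</th>
<td>first value</td>
<td>unrelated cell</td>
<th scope="row">
some words in here
</th>
<td>second value</td>
HTML;

// Greedy (.+) with the s modifier backtracks from the end of the string,
// so the capture runs up to the LAST </td>.
preg_match('~some words in here.+?<td>(.+)?</td>~s', $src, $greedy);

// Lazy (.*?) stops at the first </td> after each label; preg_match_all
// returns one capture per occurrence of the label.
preg_match_all('~some words in here.*?<td>(.*?)</td>~s', $src, $lazy);

var_dump($greedy[1]);  // spans several cells, ending with "second value"
print_r($lazy[1]);     // holds "first value" and "second value"
```

Regex on HTML stays fragile either way (comments, attributes containing `<td>`, nested tables), which is the motivation for the DOM approach in the next reply.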

nrg_alpha · October 6, 2009

My take on it (using DOM / XPath):

Example:

$html = <<<EOF
<table>
<th scope="row">

some words in here

<span class="thisisonecrazyclass"></span>
</th>

<td>get all in here, because I'm 1st!</td>
<td>Some garbage...</td>
<th scope="row">

some words in here

<span class="thisisonecrazyclass"></span>

</th>

<td>Get it all in here too! 2nd, yo!</td>
<a href="blah">text</a>
<h2>this is a header</h2>
</table>
EOF;

$dom = new DOMDocument;
@$dom->loadHTML($html); // to apply this to a live site, change loadHTML to loadHTMLFile and pass the URL (in quotes) instead of $html
$xpath = new DOMXPath($dom);
$tdTag = $xpath->query('//th[@scope="row"]/text()[contains(.,"some words in here")]/../following-sibling::td[1]'); // change "some words in here" to the actual words in question

foreach ($tdTag as $val) {
    echo $val->nodeValue . "<br />\n";
}

Output:

get all in here, because I'm 1st!
Get it all in here too! 2nd, yo!

This all makes some assumptions:

a) A th tag precedes the desired td tag.

b) The th tag has the attribute scope with the value "row". If this is not required, you can simply delete the first predicate ([@scope="row"]) from the query.

Obviously, since 'some words in here' is going to be the same (and is thus used to determine which th is the right one), use the actual words in its place in the XPath query.

nrg_alpha · October 6, 2009

To elaborate on assumption a): the code fetches the first td tag it finds after the matching th tag, so there could be other tags between the th and that first td.
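A quick way to see that behaviour, using hypothetical markup where another `<th>` sits between the label and the cell: `following-sibling::td[1]` picks the first `<td>` among all following siblings, whereas `following-sibling::*[1][self::td]` (shown only for contrast, it is not part of the query above) matches only when the very next element is a `<td>`.

```php
<?php
// Hypothetical row: an unrelated <th> sits between the label and the target cell.
$html = <<<'HTML'
<table>
<tr><th scope="row">some words in here</th><th>unrelated header</th><td>target</td><td>later</td></tr>
</table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);  // buffer parser warnings instead of using @
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Selects the labelled <th> itself (the text node's parent).
$th = '//th[@scope="row"]/text()[contains(., "some words in here")]/..';

// td[1]: the first <td> among ALL following siblings, whatever comes between.
$firstTd = $xpath->query($th . '/following-sibling::td[1]');

// *[1][self::td]: only matches when the immediately following element is a <td>.
$adjacentTd = $xpath->query($th . '/following-sibling::*[1][self::td]');

echo $firstTd->length, ' ', trim($firstTd->item(0)->nodeValue), "\n"; // 1 target
echo $adjacentTd->length, "\n";                                      // 0
```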

daydreamer · October 8, 2009

Thanks for the suggestion, nrg_alpha. I'll have a look into the XPath way of getting data.

cags, that code didn't work, but I got it working by using a different expression.
