
Help with extracting innerHTML of an element


quuxbazer


Hi,

 

I just read a quick tutorial about regex but couldn't make what I wanted to work...sorry if this is something frequently asked.

 

So, I want to extract the HTML code between <td> and </td> in a string that I used cURL to get. I know what comes right before the <td> tag:

 

...
<strong>Speed:</strong>
"  Faster"
</div>
</td>
<td>STRING I WANT TO EXTRACT</td>
<td class="align-center" ......

 

I hope I'm being somewhat clear on this; any help would be much appreciated!

You ask and you shall receive!

 

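function getTD($string)
{
    $matchArray = array();
    // No ^/$ anchors, so the cell can appear anywhere in the string;
    // the "s" modifier lets "." match newlines inside the cell.
    $pattern = "~<td>(.*?)</td>~s";
    if (preg_match($pattern, $string, $matchArray)) {
        return $matchArray[1]; // group 1 is the text between the tags
    }
    return null; // no <td>...</td> found
}

 

Just run the function like so:

echo getTD("<td>this string</td>");
// prints: this string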


Hi Zurev,

 

Thanks for the response...but for my purposes, I need to find the correct <td> tag based on what comes right before it and right after. In the HTML document stored in the variable, there are a lot of <td> elements, and I need a specific one based on the pattern as I have shown in my first post.

 

Thanks!

 

By the way, why do you write (.*?) with the question mark? I know .* means 0 or more of any character, but why the ? afterwards?
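A context-based match like the one described in this post could be sketched roughly as follows. This is only a sketch: the function name getTdAfterSpeed and the sample markup are made up for illustration, and it assumes the wanted cell is a plain <td> that directly follows the cell containing "Speed:".

```php
<?php
// Hypothetical helper: returns the contents of the first plain <td>...</td>
// that directly follows the cell containing "Speed:", or null if not found.
function getTdAfterSpeed(string $html): ?string
{
    // "s" lets "." match newlines; each lazy ".*?" stops at the nearest match.
    $pattern = '~<strong>Speed:</strong>.*?</td>\s*<td>(.*?)</td>~s';
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[1]; // group 1 is the text between <td> and </td>
    }
    return null;
}

// Sample markup modelled on the snippet in the first post (assumed, not real data).
$html = <<<HTML
<td><div><strong>Speed:</strong>
"  Faster"
</div>
</td>
<td>STRING I WANT TO EXTRACT</td>
<td class="align-center">other</td>
HTML;

echo getTdAfterSpeed($html), "\n"; // STRING I WANT TO EXTRACT
```

If the surrounding markup varies from page to page, a real HTML parser is likely to be more robust than a pattern like this.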

As for the question mark: it makes the quantifier non-greedy (lazy). A plain .* matches as much as it can, while .*? matches as little as it can, so it stops at the first </td> it finds instead of running on to the last one. More on that here:

Greedy vs Non-Greedy

http://www.itworld.com/nl/perl/01112001

 

Simply put, when you're extracting text that sits between two delimiters, the ? is usually what you want.
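To make the greedy versus non-greedy difference concrete, here is a small sketch on a made-up row with two cells (the variable names are just for illustration):

```php
<?php
$row = '<td>first</td><td>second</td>';

// Greedy: ".*" grabs as much as possible, so it runs to the LAST </td>.
preg_match('~<td>(.*)</td>~', $row, $greedy);

// Lazy: ".*?" grabs as little as possible, so it stops at the FIRST </td>.
preg_match('~<td>(.*?)</td>~', $row, $lazy);

echo $greedy[1], "\n"; // first</td><td>second
echo $lazy[1], "\n";   // first
```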

 

So can you tell me more about what comes before and after and what the different variations are? Perhaps what code you have right now that you're stuck on?


Now that I think about it...a combination of the strpos and substr functions would probably work better for my case, since the "pattern" I'm looking for is always the same characters.

 

I just wanted to use regex because it seemed so cool! But I'm sure I'll have to look at it again when I'm validating the form data  :P

 

Thanks again for the input.
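For reference, the strpos/substr approach mentioned above might look something like this. It is a minimal sketch, not the code the poster ended up with: the helper name extractAfterMarker and the sample markup are hypothetical, and it assumes the wanted cell opens with a plain <td> (no attributes).

```php
<?php
// Hypothetical helper: returns the text between the first plain <td> after
// $marker and the next </td>, or null if any of the pieces is missing.
function extractAfterMarker(string $html, string $marker): ?string
{
    $markerPos = strpos($html, $marker);
    if ($markerPos === false) {
        return null;
    }
    // First plain "<td>" at or after the marker position.
    $open = strpos($html, '<td>', $markerPos);
    if ($open === false) {
        return null;
    }
    $start = $open + strlen('<td>');
    $end = strpos($html, '</td>', $start);
    if ($end === false) {
        return null;
    }
    return substr($html, $start, $end - $start);
}

// Sample markup modelled on the snippet in the first post (assumed, not real data).
$page = '<td><div><strong>Speed:</strong> "  Faster" </div></td>'
      . '<td>STRING I WANT TO EXTRACT</td><td class="align-center">x</td>';

echo extractAfterMarker($page, '<strong>Speed:</strong>'), "\n"; // STRING I WANT TO EXTRACT
```

Note the strict === false checks: strpos can legitimately return 0, which a loose comparison would treat as "not found".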
