
Retrieve data that's between tags


nick1


Greetings,

To start, let's say you have a script that returns a variable containing a random string that is never the same length and never contains the same information. It could look something like this:

[code]$results = '<td class="name1">some data1</td><td class="name1">more data</td><td class="name1">12345</td><td class="name1">4/9/2006</td><td class="name1">You get the picture</td>';[/code]

My question is this:
How do I retrieve only what is between the <td></td> tags?

For example:
[code]<td class="name1">some data1</td>
$a = 'some data1';[/code]

I only want the information between the <td></td> tags, nothing else.
I would probably want to place each piece of data between the <td></td> tags into an array so that I could later take each key => value and write the value to a database, into its proper column.
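A minimal sketch of the array-to-columns idea, assuming the cells always arrive in the same order. The table name `results` and the column names (`name`, `description`, `code`, `posted`, `notes`) are hypothetical placeholders. The SQL uses named placeholders so the values can be bound through a prepared statement instead of being pasted into the query.

```php
<?php
$results = '<td class="name1">some data1</td><td class="name1">more data</td>'
         . '<td class="name1">12345</td><td class="name1">4/9/2006</td>'
         . '<td class="name1">You get the picture</td>';

// Capture the text inside every <td ...>...</td> pair. The lazy (.*?) stops
// each match at the first closing tag; the s modifier lets cells span lines.
preg_match_all('%<td[^>]*>(.*?)</td>%s', $results, $matches);
$cells = $matches[1];

// Hypothetical column names, one per cell, in the order the cells appear.
$columns = ['name', 'description', 'code', 'posted', 'notes'];

// array_combine() needs both arrays to be the same length, so check first.
if (count($cells) !== count($columns)) {
    throw new RuntimeException('Unexpected number of cells: ' . count($cells));
}
$row = array_combine($columns, $cells); // column name => cell value

// Build an INSERT with named placeholders (:name, :description, ...).
$placeholders = array_map(function ($c) { return ':' . $c; }, $columns);
$sql = 'INSERT INTO results (' . implode(', ', $columns) . ') VALUES ('
     . implode(', ', $placeholders) . ')';

print_r($row);
echo $sql, "\n";
```

With PDO, something like `$pdo->prepare($sql)->execute($row);` would then bind each value to its column.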

Many thanks in advance,

*Nick*

This is something I have been fighting with for a few months: data harvesting. I have a friend who creates bots for a living, and he got me into it. I am trying to figure out how to get certain tags that have specific IDs or attributes, or just standard tags. The only way, he said, is to learn regular expressions. Once you learn how to find and retrieve data with regular expressions, the rest comes naturally: you have to learn how to find data and how to replace data with regular expressions, and then you can write something that checks for that on a page.
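For the "tags with specific attributes" case, here is a rough regex sketch. The helper name `extract_tag_contents` is hypothetical. It assumes the attribute value is quoted, contains no `>`, and that elements with the same tag name are not nested inside each other; real-world HTML often breaks those assumptions.

```php
<?php
// Hypothetical helper: return the inner text of every <$tag ...> element whose
// $attr attribute equals $value exactly (single- or double-quoted).
// Sketch only: assumes no '>' inside attribute values and no nested $tag.
function extract_tag_contents($html, $tag, $attr, $value)
{
    $t = preg_quote($tag, '%');
    $a = preg_quote($attr, '%');
    $v = preg_quote($value, '%');
    // <tag, other attributes, whitespace, attr="value" or attr='value'
    // (group 1 remembers the quote), rest of the opening tag, then capture
    // lazily (group 2) up to the first closing tag.
    $pattern = '%<' . $t . '\b[^>]*\s' . $a . '\s*=\s*(["\'])' . $v
             . '\1[^>]*>(.*?)</' . $t . '>%is';
    preg_match_all($pattern, $html, $m);
    return $m[2];
}

$html = '<td class="name1">a</td><td class="other">b</td>'
      . '<td id="x" class=\'name1\'>c</td>';

print_r(extract_tag_contents($html, 'td', 'class', 'name1')); // a, c
print_r(extract_tag_contents($html, 'td', 'id', 'x'));        // c
```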

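Why couldn't you just use something like this?

[code]$pieces = explode('</td>', $results);[/code]

That splits the string at every closing tag, but each piece still starts with its opening <td ...> tag, which you would have to strip. A more general approach is a function that returns whatever sits between two substrings: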

[code]
<?php
// Return the substring between the first occurrence of $start and the
// next occurrence of $end after it, or false if either one is missing.
function substring_between($haystack, $start, $end) {
    $start_position = strpos($haystack, $start);
    if ($start_position === false) {
        return false;
    }
    $start_position += strlen($start);
    // Look for $end only after $start, so an earlier $end is ignored.
    $end_position = strpos($haystack, $end, $start_position);
    if ($end_position === false) {
        return false;
    }
    return substr($haystack, $start_position, $end_position - $start_position);
}

// Use of this function to get the title of an HTML document.
// $filename must point to the HTML file you want to read.
$contents = file_get_contents($filename);

// Search the raw HTML: running htmlspecialchars() first would turn
// '<title>' into '&lt;title&gt;', and the search would never match.
$title = substring_between($contents, '<title>', '</title>');

?>
[/code]

Of course, that code would need to be adapted a little, but that is the basic gist of what you are trying to accomplish.
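For the explode() route, a minimal sketch of the remaining step, assuming every cell has the shape `<td ...>text</td>` with no `>` inside the opening tag's attribute values:

```php
<?php
$results = '<td class="name1">some data1</td><td class="name1">more data</td>'
         . '<td class="name1">12345</td><td class="name1">4/9/2006</td>'
         . '<td class="name1">You get the picture</td>';

// Split at each closing tag. Every piece still begins with its opening
// <td ...> tag, and the last piece is an empty string.
$pieces = explode('</td>', $results);

$cells = [];
foreach ($pieces as $piece) {
    // Drop everything up to and including the first '>' (the opening tag).
    $gt = strpos($piece, '>');
    if ($gt === false) {
        continue; // e.g. the empty string after the final </td>
    }
    $cells[] = substr($piece, $gt + 1);
}

print_r($cells);
```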

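Just found a snippet on regex.com for you:
[code]$m = array();
// No delimiters: PHP treats the leading '<' and trailing '>' as the delimiter
// pair, so the regex that actually runs is only the part in between. That is
// why the output below lists bare words (td, xxxxx, ABCD, /td), not whole tags.
$pattern = "</?(\w+)(\s+\w+=(\w+|\"[^\"]*\"|\'[^\']*\'))*>";
$text = "<td xxxxx>ABCD</td>";
preg_match_all($pattern, $text, $m);
echo '<pre>'; print_r($m);[/code]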

Prints:
[code]Array
(
    [0] => Array
        (
            [0] => td
            [1] => xxxxx
            [2] => ABCD
            [3] => /td
        )
[/code]
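A sketch of the same idea with explicit `%` delimiters, so `<` and `>` are matched literally. The simplified tag pattern is an assumption, not the regex.com original; `[^>]*` skips any attributes, including valueless ones like `xxxxx`.

```php
<?php
$text = '<td xxxxx>ABCD</td>';

// '%' is the delimiter, so '<' and '>' must appear in the subject.
// Group 1: optional '/' (closing tag), group 2: tag name.
preg_match_all('%<(/?)(\w+)[^>]*>%', $text, $m);

print_r($m[0]); // the complete tags: <td xxxxx>, </td>
print_r($m[2]); // just the tag names: td, td
```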

Ronald  :cool:

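[code]
<pre>
<?php
$results = <<<STR
<td class="name1">some data1</td><td class="name1">more data</td><td class="name1">12345</td><td class="name1">4/9/2006</td><td class="name1">You get the picture</td>
STR;

// Capture the text inside each <td ...>...</td> pair. The lazy quantifiers
// stop at the first '>' and the first '</td>'; (.*?) also allows empty cells.
preg_match_all('%<td.*?>(.*?)</td>%', $results, $matches);
// $matches[0] holds the full matches; drop it so only the captured text remains.
array_shift($matches);
print_r($matches);
?>
</pre>
[/code]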
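As an alternative to regular expressions, PHP's DOM extension can parse the markup and read each cell's text. A sketch, assuming the DOM extension is enabled and wrapping the bare cells in a hypothetical `<table><tr>` so the parser sees valid structure:

```php
<?php
$results = '<td class="name1">some data1</td><td class="name1">more data</td>'
         . '<td class="name1">12345</td><td class="name1">4/9/2006</td>'
         . '<td class="name1">You get the picture</td>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML('<table><tr>' . $results . '</tr></table>');
libxml_clear_errors();

$cells = [];
foreach ($doc->getElementsByTagName('td') as $td) {
    // Keep only cells whose class attribute is exactly "name1".
    if ($td->getAttribute('class') === 'name1') {
        $cells[] = $td->textContent;
    }
}

print_r($cells);
```

A parser copes with things a regex struggles with, such as '>' inside quoted attribute values, at the cost of depending on the extension being installed.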
