Ostensibly a simple question - but I can't figure it out!


matstars

I am trying to get any text that is contained between

 

"<a target*>" AND "</a>" 

 

* being a wild card

 

so that each of these would respond with The Quick Brown Fox:

 

1.  <a targetafadfa>The Quick Brown Fox</a>

2.  <a target>The Quick Brown Fox</a>

3.  <a target in the>The Quick Brown Fox</a>

 

 

Now - as I see it -

 

I would need to tell regex to do the following-

 

Look for

1. "<a target"

2. a subsequent ">"

3. Grab ANY text following such - until you see "</a>"

 

I just don't know how to write that in PHP RegEx.

 

I used txt2re.com but there is no "any text" - so it came up with below (in bold)

 

<?php

  # URL that generated this code:

  # http://txt2re.com/index-php.php3?s=%3Ca%20target%20test%3EThe%20Quick%3C/a%3E&-41&-47&-37&-2&-43&-6&-38

 

  $txt='<a target test>The Quick Brown Fox</a>';

 

  $re1='(<)'; # Any Single Character 1

  $re2='(a)'; # Any Single Character 2

  $re3='( )'; # Any Single Character 3

  $re4='(target)'; # Word 1

  $re5='( )'; # Any Single Character 4

  $re6='.*?'; # Non-greedy match on filler

  $re7='(>)'; # Any Single Character 5

  $re8='.*'; # Non-greedy match on filler

  $re9='(<\\/a>)'; # Tag 1

 

  if ($c=preg_match_all ("/".$re1.$re2.$re3.$re4.$re5.$re6.$re7.$re8.$re9."/is", $txt, $matches))

  {

      $c1=$matches[1][0];

      $c2=$matches[2][0];

      $c3=$matches[3][0];

      $word1=$matches[4][0];

      $c4=$matches[5][0];

      $c5=$matches[6][0];

      $tag1=$matches[7][0];

      print "($c1) ($c2) ($c3) ($word1) ($c4) ($c5) ($tag1) \n";

  }

 

  echo "<BR><BR><BR><BR>";

  echo var_dump($matches);

 

  #-----

  # Paste the code into a new php file. Then in Unix:

  # $ php x.php

  #-----

 

I added an echo var_dump of $matches to it, hoping to find what I am looking for somewhere in the $matches array, but that was to no avail because it outputs (in bold):

 

(<) (a) ( ) (target) ( ) (>) (</a>)

 

 

 

array(.8) { [0]=> array(1) { [0]=> string(38) "<a target test>The Quick Brown Fox</a>" } [1]=> array(1) { [0]=> string(1) "<" } [2]=> array(1) { [0]=> string(1) "a" } [3]=> array(1) { [0]=> string(1) " " } [4]=> array(1) { [0]=> string(6) "target" } [5]=> array(1) { [0]=> string(1) " " } [6]=> array(1) { [0]=> string(1) ">" } [7]=> array(1) { [0]=> string(4) "</a>" } }

 

in the echo var_dump I added a . before 8 to not have it give the emoticon :)

 

Any help?

 

-Mat

try:

<?php
$text = "<a targetafadfa>The Quick Brown Fox</a>
<br> <hr> some middle text<a target>The Quick Brown Fox</a><a target in the>The Quick Brown Fox</a>";

if(preg_match_all('/<a target.*?>(.+?)<\/a>/',$text,$matches)){
  print_r($matches[1]);
}

?>
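
For what it's worth, the pattern above lines up with the three steps in the original question. The txt2re code never showed the link text because its filler between ">" and "</a>" ($re8) was not wrapped in parentheses, so no group captured it. Here is a minimal sketch of that mapping; extractTargetLinkText is only a hypothetical helper name for illustration, not something from this thread:

```php
<?php
// Hypothetical helper (name is an assumption): returns the text of every
// <a target...> link found in $html.
function extractTargetLinkText(string $html): array
{
    $pattern = '/<a target'  // step 1: the literal opening "<a target"
             . '.*?>'        // step 2: as little as possible up to a subsequent ">"
             . '(.+?)'       // step 3: CAPTURE the text, lazily...
             . '<\/a>/';     // ...until the closing "</a>"

    preg_match_all($pattern, $html, $matches);

    // $matches[0] holds the full matches, $matches[1] holds capture group 1.
    return $matches[1];
}

$samples = [
    '<a targetafadfa>The Quick Brown Fox</a>',
    '<a target>The Quick Brown Fox</a>',
    '<a target in the>The Quick Brown Fox</a>',
];

foreach ($samples as $sample) {
    print_r(extractTargetLinkText($sample));
}
```

Each of the three sample strings from the question should come back as a one-element array containing the link text.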

Using negative lookahead assertions offers the additional flexibility of matching (X)HTML tags nested within the target <a> tags...

 

$str = <<<DATA
1.  this is a <a targetafadfa>The <em>Quick</em> Brown Fox</a> hyperlink.
2.  <a target>The <strong>Slow</strong> orange turtle</a>
3.  <a target in the>The Moderate-paced yellow bird</a>
DATA;

preg_match_all('#<a[^>]+>(?:(?!</a>).)+</a>#s', $str, $matches);
echo '<pre>';
print_r($matches[0]);
echo '</pre>';

 

Output (as rendered in a browser, where the tags in each match are interpreted rather than displayed):

Array
(
    [0] => The Quick Brown Fox
    [1] => The Slow orange turtle
    [2] => The Moderate-paced yellow bird
)
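
If the goal is the plain link text rather than the whole anchor, one possible variation (a sketch, not the exact code from this post) is to wrap the lookahead loop in a capture group and run the captures through strip_tags(). Restricting the opening tag to "<a target" and the $plainText variable name are my own assumptions here:

```php
<?php
$str = <<<DATA
1.  this is a <a targetafadfa>The <em>Quick</em> Brown Fox</a> hyperlink.
2.  <a target>The <strong>Slow</strong> orange turtle</a>
3.  <a target in the>The Moderate-paced yellow bird</a>
DATA;

// Group 1 wraps the loop "(?:(?!</a>).)+", which consumes one character at a
// time as long as that character does not begin the closing "</a>".
preg_match_all('#<a target[^>]*>((?:(?!</a>).)+)</a>#s', $str, $matches);

// $matches[1] still contains the nested <em>/<strong> markup;
// strip_tags() removes it from each captured string.
$plainText = array_map('strip_tags', $matches[1]);

print_r($plainText);
```

Here $matches[1] keeps the nested markup, in case it is needed, and $plainText holds the tag-free version.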

 

EDIT: Granted, my solution is not the wisest approach if there are no nested tags within the a tag.

Indeed rhodesa, yours works better :) Sometimes I overthink things lol

There can be one modification to squeeze an extra smidgen of speed / efficiency out of it:

 

#<a target[^>]*>(.+?)</a>#s

 

Negated character classes are faster than lazy quantifiers... but for menial tasks, we are probably splitting hairs here. :)
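
To illustrate, here is a small sketch comparing the two ways of matching the opening tag on sample strings from this thread. The idea is that [^>]* can only run up to the first ">", whereas .*? has to check after every character whether the rest of the pattern can match. The variable names are just for illustration, and this only checks that both patterns return the same captures; it makes no timing claims:

```php
<?php
$text = '<a targetafadfa>The Quick Brown Fox</a>'
      . '<br> <hr> some middle text'
      . '<a target in the>The Quick Brown Fox</a>';

$lazy    = '#<a target.*?>(.+?)</a>#s';   // lazy quantifier inside the opening tag
$negated = '#<a target[^>]*>(.+?)</a>#s'; // negated character class instead

preg_match_all($lazy, $text, $lazyMatches);
preg_match_all($negated, $text, $negatedMatches);

// Compare the captured link text from both patterns.
var_dump($lazyMatches[1] === $negatedMatches[1]);
print_r($negatedMatches[1]);
```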

 
