
[SOLVED] Please help, preg_match() is not working, probably my string pattern is off D:



Here is the relevant string that I am searching:

<font size="-1">This is some text, has a ., comma, and : and maybe ; or whatever is normal, some numbers too $44-billion on <b>...</b>

The prefix:

<b>

The suffix:

<b>...

 

Here is my try at it:

<?php
preg_match('/<b>[a-zA-Z0-9\.\:\,\$]<b>.../',$pointer->description,$result);
if($result){
echo $result[0];	
}
?>

 

Could anyone help me? It doesn't seem to be working.. :confused:

A couple of things:

 

  • The prefix does not appear to be in the string that you provided. The suffix is, but the prefix is not.
  • Your regular expression only looks for a single character (from the list you provide) between the prefix and suffix. You probably want to use a quantifier to ask for more than one matching character.
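
To illustrate the second point, here is a minimal sketch. The sample string and the `">` prefix are made up for the example and are not from the original post. Without a quantifier the character class consumes exactly one character; with `+` it consumes a run of them.

```php
<?php
// Hypothetical sample; the "> prefix is an assumption for illustration.
$sample = '<font size="-1">Some text, with $44-billion on <b>...</b>';

// No quantifier: exactly one character allowed between prefix and suffix.
$single = '/">[a-zA-Z0-9.:;,$\s-]<b>\.\.\./';

// With +: one or more characters allowed, captured in group 1.
$many = '/">([a-zA-Z0-9.:;,$\s-]+)<b>\.\.\./';

var_dump(preg_match($single, $sample));   // int(0)
var_dump(preg_match($many, $sample, $m)); // int(1)
echo $m[1], "\n";
```

Using `*` instead of `+` would also accept an empty run, which may or may not be what you want.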

 

 

In addition to salathe's post, the RegEx is also missing " " (space), ; (semicolon) and - (hyphen), and maybe a capture group.

Assuming the prefix is

">

then you probably want this

if (preg_match('/">([a-zA-Z0-9.:;,$\s-]*)<b>\.\.\./', $pointer->description,$result)) {
echo $result[1]; //OR 0
}

Ok, idk what is wrong, it is still not working!!

 

Here, I'll just post all the text that it is looking through:

<table border="0" cellpadding="2" cellspacing="7" style="vertical-align:top;"><tr><td width="80" align="center" valign="top"><font style="font-size:85%;font-family:arial,sans-serif"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.financialpost.com%2Fstory.html%3Fid%3D2175056&amp;usg=AFQjCNGpbl7oJlW5BPknQ8-LRoAgbXpbVQ"><img src="http://nt3.ggpht.com/news/tbn/k3LkzFV8cL9S_M/6.jpg" alt="" border="1" width="80" height="80" /><br /><font size="-2">Financial Post</font></a></font></td><td valign="top" class="j"><font style="font-size:85%;font-family:arial,sans-serif"><br /><div style="padding-top:0.8em;"><img alt="" height="1" width="1" /></div><div class="lh"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.thestar.com%2Fbusiness%2Farticle%2F720314--tsx-gains-as-gold-hits-new-record&amp;usg=AFQjCNHSRaWAgCNBtmfDW-2ItQHsLv2p4g"><b>TSX gains as gold hits new record</b></a><br /><font size="-1"><b><font color="#6f6f6f">Toronto Star</font></b></font><br /><font size="-1">The Toronto stock market maintained a solid lead Tuesday as gold stocks surged on record high bullion prices, railway stocks gained following major <b>...</b></font><br /><font size="-1"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.metronews.ca%2Ftoronto%2Fbusiness%2Farticle%2F346561--tsx-venture-exchange-closed-down-1-79-points-to-1-289-62&amp;usg=AFQjCNH1ZJZ9z8fDkdaZ0dO-ypU0_DINbA">TSX Venture Exchange closed down 1.79 points to 1289.62</a><font size="-1" color="#6f6f6f"><nobr>Metro Canada - Toronto</nobr></font></font><br /><font size="-1"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fca.reuters.com%2Farticle%2FbusinessNews%2FidCATRE5A12DD20091103&amp;usg=AFQjCNHvjk_ey0_WrleBO8G6zP69xiivYA">TSX erases early skid as gold miners rally</a><font size="-1" color="#6f6f6f"><nobr>Reuters Canada</nobr></font></font><br /><font size="-1"><a 
href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.reuters.com%2Farticle%2FcompanyNewsAndPR%2FidUSN0349369920091103&amp;usg=AFQjCNEB0Ewi0Kn6dWdyNl0fbUz43B8ltw">CANADA STOCKS-TSX falls on banks but railways ride higher</a><font size="-1" color="#6f6f6f"><nobr>Reuters</nobr></font></font><br /><font size="-1" class="p"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.thestar.com%2Fbusiness%2Farticle%2F720102--tsx-dips-amid-worry-over-u-s-rebound&amp;usg=AFQjCNEp6cor5UdtYJ__11o3Dut_3WGDuQ"><nobr>Toronto Star</nobr></a>&nbsp;-<a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.metronews.ca%2Ftoronto%2Fbusiness%2Farticle%2F346247--tsx-venture-exchange-is-up-8-36-points-to-1-299-77-at-noon-et&amp;usg=AFQjCNGP0_1zq2UfYngThvOHLlzZcmYX_Q"><nobr>Metro Canada - Toronto</nobr></a></font><br /><font class="p" size="-1"><a class="p" href="http://news.google.com/news/more?pz=1&amp;geo=toronto+on&amp;ncl=dQCMF26BgDnbrbMwT5xqg-b9aNKLM"><nobr><b>all 168 news articles&nbsp;&raquo;</b></nobr></a></font></div></font></td></tr></table>

as you can see, the prefix is not unique, but the suffix is.

 

Yea, it's from within the <description> tags of an RSS feed. I am trying to get out "just the nice text" with this technique. I tried the code you gave me and it's not working. Could you try it to see if maybe it's a problem on my end?

 

If, while you're at it, you could figure out how to get the image URL too, that would be awesome!!

For cleaning up, a simple solution would be this, with img tags allowed:

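<?php
// Single quotes, so the inner double quotes and the $ need no escaping
$X = '<font size="-1">This is some text, has a ., comma, and : and maybe ; or whatever is normal, some numbers too $44-billion on <b>...</b>';

// Decoding twice also covers doubly-encoded entities such as &amp;lt;
$X = html_entity_decode($X);
$X = html_entity_decode($X);
// Turn <br> and <br /> into newlines
$X = preg_replace('%<br\s*/?>%', "\n", $X);
// The allowed-tags argument must be written with angle brackets
$X = strip_tags($X, '<img>');

echo $X;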

That works, but it doesn't remove the other text. The anchor text of the links, which preg_match() would have eliminated, remains as plain text. I only want the main description, which as you can see is the longest stretch of clean text and is surrounded by the prefix and suffix. The suffix is unique within the whole section.

 

So could you check on the preg_match please? It seems only it can get me what I need.

 

*PS: if you want to see how the code looks unfiltered, save it to a text file as HTML and open it with Firefox. You will see the "description" that I want (black text).

 

thanks!

The last code I posted was just an idea. I don't know what you're trying to pull out; as you say, it's not pulling everything out, and

the prefix is not unique, but the suffix is

But without knowing exactly what you want, it's going to be guesswork!

Ok, let me recompile everything for you, properly this time.

 

Input String:

<table border="0" cellpadding="2" cellspacing="7" style="vertical-align:top;"><tr><td width="80" align="center" valign="top"><font style="font-size:85%;font-family:arial,sans-serif"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.financialpost.com%2Fstory.html%3Fid%3D2175056&amp;usg=AFQjCNGpbl7oJlW5BPknQ8-LRoAgbXpbVQ"><img src="http://nt3.ggpht.com/news/tbn/k3LkzFV8cL9S_M/6.jpg" alt="" border="1" width="80" height="80" /><br /><font size="-2">Financial Post</font></a></font></td><td valign="top" class="j"><font style="font-size:85%;font-family:arial,sans-serif"><br /><div style="padding-top:0.8em;"><img alt="" height="1" width="1" /></div><div class="lh"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.thestar.com%2Fbusiness%2Farticle%2F720314--tsx-gains-as-gold-hits-new-record&amp;usg=AFQjCNHSRaWAgCNBtmfDW-2ItQHsLv2p4g"><b>TSX gains as gold hits new record</b></a><br /><font size="-1"><b><font color="#6f6f6f">Toronto Star</font></b></font><br /><font size="-1">The Toronto stock market maintained a solid lead Tuesday as gold stocks surged on record high bullion prices, railway stocks gained following major <b>...</b></font><br /><font size="-1"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.metronews.ca%2Ftoronto%2Fbusiness%2Farticle%2F346561--tsx-venture-exchange-closed-down-1-79-points-to-1-289-62&amp;usg=AFQjCNH1ZJZ9z8fDkdaZ0dO-ypU0_DINbA">TSX Venture Exchange closed down 1.79 points to 1289.62</a><font size="-1" color="#6f6f6f"><nobr>Metro Canada - Toronto</nobr></font></font><br /><font size="-1"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fca.reuters.com%2Farticle%2FbusinessNews%2FidCATRE5A12DD20091103&amp;usg=AFQjCNHvjk_ey0_WrleBO8G6zP69xiivYA">TSX erases early skid as gold miners rally</a><font size="-1" color="#6f6f6f"><nobr>Reuters Canada</nobr></font></font><br /><font size="-1"><a 
href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.reuters.com%2Farticle%2FcompanyNewsAndPR%2FidUSN0349369920091103&amp;usg=AFQjCNEB0Ewi0Kn6dWdyNl0fbUz43B8ltw">CANADA STOCKS-TSX falls on banks but railways ride higher</a><font size="-1" color="#6f6f6f"><nobr>Reuters</nobr></font></font><br /><font size="-1" class="p"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.thestar.com%2Fbusiness%2Farticle%2F720102--tsx-dips-amid-worry-over-u-s-rebound&amp;usg=AFQjCNEp6cor5UdtYJ__11o3Dut_3WGDuQ"><nobr>Toronto Star</nobr></a>&nbsp;-<a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.metronews.ca%2Ftoronto%2Fbusiness%2Farticle%2F346247--tsx-venture-exchange-is-up-8-36-points-to-1-299-77-at-noon-et&amp;usg=AFQjCNGP0_1zq2UfYngThvOHLlzZcmYX_Q"><nobr>Metro Canada - Toronto</nobr></a></font><br /><font class="p" size="-1"><a class="p" href="http://news.google.com/news/more?pz=1&amp;geo=toronto+on&amp;ncl=dQCMF26BgDnbrbMwT5xqg-b9aNKLM"><nobr><b>all 168 news articles&nbsp;&raquo;</b></nobr></a></font></div></font></td></tr></table>

prefix:

&quot;&gt;

suffix:

<b>...

 

So, the intended output *should* be:

The Toronto stock market maintained a solid lead Tuesday as gold stocks surged on record high bullion prices, railway stocks gained following major

 

So could you please show me what the appropriate working "string pattern" would be? This is what I am trying, and it is not working:

<?php
if (preg_match('/">([a-zA-Z0-9.:;,$\s-]*)<b>\.\.\./', $text,$result)) {
echo $result[0];
}
?>

 

Thanks!

It is still not working.

Here maybe this will help, it is the rest of the code around the preg_match() attempt:

<?php
$url = 'http://news.google.com/news?pz=1&cf=all&ned=ca&hl=en&geo=toronto+on&output=rss';
$pointer = new SimpleXmlElement(file_get_contents($url));  
$count = 1;
foreach($pointer->channel->item as $entry) {
	echo "<a href='$entry->link' title='$entry->title'>" . $entry->title . "</a><br>";
	//Get description

	$description =  $entry->description;
	if (preg_match('/">([a-zA-Z0-9.:;,$\s-]*)<b>\.\.\./', $description ,$result)) {
		echo "<br><br>YESSSS:".$result[1];
	}else{
		echo "<br><br>NOOOOO".$description;
	}
	if($count>=$limit){
		break;
	}
	$count++;
}
?>

 

Any idea? Try it yourself, idk, it won't work for me.

Ahh that makes more sense now

<?php
   $url = 'http://news.google.com/news?pz=1&cf=all&ned=ca&hl=en&geo=toronto+on&output=rss';
   $pointer = new SimpleXmlElement(file_get_contents($url));  
   $count = 1;
   $limit = 5; //added
   
   foreach($pointer->channel->item as $entry) {
      echo "<a href='$entry->link' title='$entry->title'>" . $entry->title . "</a><br>";
      //Get description
      
      $description = (string)$entry->description; //UPDATED to get the string instead of the object
      //updated RegEx to HTML instead of encoded HTML
      if (preg_match('/">([a-zA-Z0-9.:;,$\s-]*)<b>\.\.\./', $description ,$result)) {
         echo "<br><br>YESSSS:".$result[1];
      }else{
         echo "<br><br>NOOOOO".$description;
      }
      if($count>=$limit){
         break;
      }
      $count++;
   }
?>
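
For anyone who wants to try the pattern without fetching the feed, here is a sketch that runs it against a shortened copy of the decoded description posted earlier in the thread. The second pattern is a looser variant of my own, not something from the posts above: instead of whitelisting characters, it accepts any run of text that contains no tag, which should be less likely to break on apostrophes or parentheses.

```php
<?php
// Shortened copy of the decoded description from earlier in the thread.
$description = '<font size="-1"><b><font color="#6f6f6f">Toronto Star</font></b></font><br />'
    . '<font size="-1">The Toronto stock market maintained a solid lead Tuesday as gold stocks '
    . 'surged on record high bullion prices, railway stocks gained following major <b>...</b></font>';

// Pattern from the post above: a whitelist of characters between "> and <b>...
$strict = '/">([a-zA-Z0-9.:;,$\s-]*)<b>\.\.\./';

// Looser variant (an assumption, not from the thread): any non-tag text before <b>...
$loose = '/>([^<>]*)<b>\.\.\./';

foreach ([$strict, $loose] as $pattern) {
    if (preg_match($pattern, $description, $result)) {
        // trim() drops the trailing space left before <b>...
        echo trim($result[1]), "\n";
    }
}
```

On this sample both patterns capture the same sentence, because only the main description is followed directly by `<b>...`.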

Ahh finally thank you so much!

Goes to show I shouldn't assume what is relevant and what is not...

 

Anyways, I finally succeeded at making my own news poster! Sick, thanks a lot!

 

EDIT: Actually, I need to ask you a couple more questions:

So the string that it was processing, was that the raw "source code", or "outputted code"?

Any advice on finding the <img> url in there? I think I could do it but I need to know if the string is raw HTML or outputted HTML. I didn't really understand that part of your comment.

Okay, when you have XML data the strings get encoded, so < becomes &lt; and > becomes &gt;

the reason is simple

Here is some invalid XML:

<DATA>
This is some text but if i had </DATA> in here it would mess up 
</DATA>

 

Now this will fail because it closes the DATA tag in the data contents (value)

 

so to stop that they encode it

valid XML

<DATA>
This is some text but if i had &lt;/DATA&gt; in here it would mess up 
</DATA>

 

Hope that makes sense. Now, when PHP reads the data, it decodes it back to what it should be, so while the XML had

&quot;&gt;

the value is

">

and that's what PHP gets
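
A small sketch of that decoding, using a made-up one-item feed rather than the real Google News one (the XML string below is hypothetical):

```php
<?php
// Hypothetical one-item feed; the description is entity-encoded, as in real RSS.
$xml = '<rss><channel><item>'
     . '<description>&lt;font size=&quot;-1&quot;&gt;Hello &lt;b&gt;...&lt;/b&gt;&lt;/font&gt;</description>'
     . '</item></channel></rss>';

$feed = new SimpleXMLElement($xml);

// Casting the node to string yields the decoded text content.
$description = (string) $feed->channel->item->description;

echo $description, "\n"; // <font size="-1">Hello <b>...</b></font>

var_dump(strpos($description, '&quot;') === false); // bool(true)
var_dump(strpos($description, '">') !== false);     // bool(true)
```

So a pattern written against the encoded source will not find anything in the value PHP hands you.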

 

So to sum up: you are looking for the HTML itself, i.e. <img

to grab the img URL you could use something like this (should work)

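$HTML = 'la lalalalalalal <img height="400px" src="I am a URL" width="10px" > lalalal ala';
// A backreference cannot be used inside a character class, so match lazily up to the same quote
if (preg_match('/<img[^>]*\bsrc\s*=\s*([\'"])(.*?)\1[^>]*>/i', $HTML, $reg)) {
    echo $reg[2]; // I am a URL
}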

 

EDIT: oow forgot a ; but fixed now

I am kinda tired so I may not have made sense!
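
If the regex turns out to be too brittle, an alternative (not what was used in this thread) is to let PHP's DOM extension parse the markup and read the attribute. This is only a sketch: `firstImageSrc` is a hypothetical helper name, and the sample URL is a placeholder. It skips images without a `src`, since the feed's description also contains a 1x1 `<img>` that has none.

```php
<?php
// Hypothetical helper: returns the src of the first <img> that has one, or null.
function firstImageSrc(string $html): ?string
{
    $doc = new DOMDocument();
    // Feed markup is rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            return $src;
        }
    }
    return null;
}

echo firstImageSrc('<td><img alt="" height="1" /><img src="http://example.com/6.jpg" alt="" /></td>'), "\n";
```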
