[SOLVED] Please help, preg_match() is not working, probably my string pattern is off D:

physaux · November 3, 2009

Here is the relevant string that I am searching:

<font size="-1">This is some text, has a ., comma, and : and maybe ; or whatever is normal, some numbers too $44-billion on <b>...</b>

The prefix:

<b>

The suffix:

<b>...

Here is my try at it:

<?php
preg_match('/<b>[a-zA-Z0-9\.\:\,\$]<b>.../',$pointer->description,$result);
if($result){
echo $result[0];	
}
?>

Could anyone help me? It doesn't seem to be working... :confused:

salathe · November 3, 2009

A couple of things:

The prefix does not appear to be in the string that you provided. The suffix is, but the prefix is not.
Your regular expression only looks for a single character (from the list you provide) between the prefix and suffix. You probably want to use a quantifier to ask for more than one matching character.
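To see the quantifier point in isolation, here is a minimal sketch. The sample string and the `$single`/`$many` names are made up for illustration, and it assumes a prefix of `">` and a suffix of `<b>...`:

```php
<?php
// Hypothetical sample: some text between a '">' prefix and a '<b>...' suffix.
$html = '<font size="-1">Some text, 44 here <b>...</b>';

// Without a quantifier the character class matches exactly ONE character,
// so this can only succeed if a single allowed character sits between prefix and suffix.
$single = preg_match('/">[a-zA-Z0-9.:;,$ -]<b>\.\.\./', $html);

// With + (one or more) the class can repeat across the whole run of text.
// The parentheses also capture that text into $m[1].
$many = preg_match('/">([a-zA-Z0-9.:;,$ -]+)<b>\.\.\./', $html, $m);

var_dump($single); // int(0)
var_dump($many);   // int(1)
echo $m[1], "\n";
```

Note that the dots in the suffix are escaped (`\.`); an unescaped `.` matches any character.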

MadTechie · November 3, 2009

In addition to salathe's post, the regex is also missing " " (space), ";" (semicolon) and "-" (hyphen),

(and maybe a capturing group)

Assuming the prefix is

">

then you probably want this:

if (preg_match('/">([a-zA-Z0-9.:;,$\s-]*)<b>\.\.\./', $pointer->description, $result)) {
    echo $result[1]; // captured text only; $result[0] is the whole match
}
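Here is a runnable version of that pattern, applied to the sample string from the first post. It assumes the description is already plain HTML (`<font size="-1">...`) rather than entity-encoded text such as `&lt;font ...&gt;`:

```php
<?php
// Sample from the first post, as decoded HTML (assumption: no &lt; / &quot; entities left).
$description = '<font size="-1">This is some text, has a ., comma, and : and maybe ; or whatever is normal, some numbers too $44-billion on <b>...</b>';

// Prefix '">', then a captured run of allowed characters (\s covers spaces),
// then the literal suffix '<b>...'. '<' is not in the class, so the match cannot cross a tag.
$pattern = '/">([a-zA-Z0-9.:;,$\s-]*)<b>\.\.\./';

if (preg_match($pattern, $description, $result)) {
    // $result[0] is the whole match (prefix and suffix included), $result[1] only the captured text.
    echo trim($result[1]), "\n";
}
```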

physaux · November 3, 2009

Ah, yes, the prefix was wrong; I just copy-pasted it incorrectly. Here is the proper one:

&quot;&gt;

I'm still reading through your replies, though. Thanks! (I might have more questions!)

physaux · November 4, 2009

OK, I don't know what's wrong. It is still not working!!

Here, I'll just post all the text that it is looking through:

<table border="0" cellpadding="2" cellspacing="7" style="vertical-align:top;"><tr><td width="80" align="center" valign="top"><font style="font-size:85%;font-family:arial,sans-serif"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.financialpost.com%2Fstory.html%3Fid%3D2175056&amp;usg=AFQjCNGpbl7oJlW5BPknQ8-LRoAgbXpbVQ"><img src="http://nt3.ggpht.com/news/tbn/k3LkzFV8cL9S_M/6.jpg" alt="" border="1" width="80" height="80" /><br /><font size="-2">Financial Post</font></a></font></td><td valign="top" class="j"><font style="font-size:85%;font-family:arial,sans-serif"><br /><div style="padding-top:0.8em;"><img alt="" height="1" width="1" /></div><div class="lh"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.thestar.com%2Fbusiness%2Farticle%2F720314--tsx-gains-as-gold-hits-new-record&amp;usg=AFQjCNHSRaWAgCNBtmfDW-2ItQHsLv2p4g"><b>TSX gains as gold hits new record</b></a><br /><font size="-1"><b><font color="#6f6f6f">Toronto Star</font></b></font><br /><font size="-1">The Toronto stock market maintained a solid lead Tuesday as gold stocks surged on record high bullion prices, railway stocks gained following major <b>...</b></font><br /><font size="-1"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.metronews.ca%2Ftoronto%2Fbusiness%2Farticle%2F346561--tsx-venture-exchange-closed-down-1-79-points-to-1-289-62&amp;usg=AFQjCNH1ZJZ9z8fDkdaZ0dO-ypU0_DINbA">TSX Venture Exchange closed down 1.79 points to 1289.62</a><font size="-1" color="#6f6f6f"><nobr>Metro Canada - Toronto</nobr></font></font><br /><font size="-1"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fca.reuters.com%2Farticle%2FbusinessNews%2FidCATRE5A12DD20091103&amp;usg=AFQjCNHvjk_ey0_WrleBO8G6zP69xiivYA">TSX erases early skid as gold miners rally</a><font size="-1" color="#6f6f6f"><nobr>Reuters Canada</nobr></font></font><br /><font size="-1"><a 
href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.reuters.com%2Farticle%2FcompanyNewsAndPR%2FidUSN0349369920091103&amp;usg=AFQjCNEB0Ewi0Kn6dWdyNl0fbUz43B8ltw">CANADA STOCKS-TSX falls on banks but railways ride higher</a><font size="-1" color="#6f6f6f"><nobr>Reuters</nobr></font></font><br /><font size="-1" class="p"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.thestar.com%2Fbusiness%2Farticle%2F720102--tsx-dips-amid-worry-over-u-s-rebound&amp;usg=AFQjCNEp6cor5UdtYJ__11o3Dut_3WGDuQ"><nobr>Toronto Star</nobr></a>&nbsp;-<a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.metronews.ca%2Ftoronto%2Fbusiness%2Farticle%2F346247--tsx-venture-exchange-is-up-8-36-points-to-1-299-77-at-noon-et&amp;usg=AFQjCNGP0_1zq2UfYngThvOHLlzZcmYX_Q"><nobr>Metro Canada - Toronto</nobr></a></font><br /><font class="p" size="-1"><a class="p" href="http://news.google.com/news/more?pz=1&amp;geo=toronto+on&amp;ncl=dQCMF26BgDnbrbMwT5xqg-b9aNKLM"><nobr><b>all 168 news articles&nbsp;&raquo;</b></nobr></a></font></div></font></td></tr></table>

As you can see, the prefix is not unique, but the suffix is.

Yeah, it's from within the <description> tags of an RSS feed. I am trying to get out "just the nice text" with this technique. I tried the code you gave me and it's not working. Could you try it to see if maybe it's a problem on my end?

If, while you're at it, you could figure out how to get the image URL too, that would be awesome!!

MadTechie · November 4, 2009

For cleaning up, a simple solution would be this (it keeps img tags):

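<?php
// Sample text in single quotes, so the embedded double quotes don't end the string
$X = '<font size="-1">This is some text, has a ., comma, and : and maybe ; or whatever is normal, some numbers too $44-billion on <b>...</b>';

// Decoded twice, presumably to handle double-encoded text (&amp;lt; -> &lt; -> <)
$X = html_entity_decode($X);
$X = html_entity_decode($X);
// Turn <br>, <br/> and <br /> into newlines before the tags are stripped
$X = preg_replace('%<br\s*/?>%', "\n", $X);
// Remove every tag except <img> (allowed tags are given as '<img>', not 'img')
$X = strip_tags($X, '<img>');

echo $X;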

physaux · November 4, 2009

That works, but it doesn't remove the other text. The anchor text of the links, which I wanted to eliminate (and which preg_match() would have eliminated), remains as plain text. I only want the main description, which, as you can see, is both the longest run of clean text and is surrounded by the prefix and suffix. The suffix is unique within the whole section.

So could you check the preg_match() please? It seems to be the only thing that can get me what I need.

P.S. If you want to see how the code looks unfiltered, just output it as HTML, save it to a text file and open it with Firefox. You will see the "description" that I want (the black text).

Thanks!

MadTechie · November 4, 2009

The last code I posted was just an idea; I don't know what you're trying to pull out. As you say, it's not pulling everything out, and

the prefix is not unique, but the suffix is

But without knowing exactly what you want, it's going to be guesswork!

physaux · November 4, 2009

OK, let me put everything together for you, properly this time.

Input String:

<table border="0" cellpadding="2" cellspacing="7" style="vertical-align:top;"><tr><td width="80" align="center" valign="top"><font style="font-size:85%;font-family:arial,sans-serif"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.financialpost.com%2Fstory.html%3Fid%3D2175056&amp;usg=AFQjCNGpbl7oJlW5BPknQ8-LRoAgbXpbVQ"><img src="http://nt3.ggpht.com/news/tbn/k3LkzFV8cL9S_M/6.jpg" alt="" border="1" width="80" height="80" /><br /><font size="-2">Financial Post</font></a></font></td><td valign="top" class="j"><font style="font-size:85%;font-family:arial,sans-serif"><br /><div style="padding-top:0.8em;"><img alt="" height="1" width="1" /></div><div class="lh"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.thestar.com%2Fbusiness%2Farticle%2F720314--tsx-gains-as-gold-hits-new-record&amp;usg=AFQjCNHSRaWAgCNBtmfDW-2ItQHsLv2p4g"><b>TSX gains as gold hits new record</b></a><br /><font size="-1"><b><font color="#6f6f6f">Toronto Star</font></b></font><br /><font size="-1">The Toronto stock market maintained a solid lead Tuesday as gold stocks surged on record high bullion prices, railway stocks gained following major <b>...</b></font><br /><font size="-1"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.metronews.ca%2Ftoronto%2Fbusiness%2Farticle%2F346561--tsx-venture-exchange-closed-down-1-79-points-to-1-289-62&amp;usg=AFQjCNH1ZJZ9z8fDkdaZ0dO-ypU0_DINbA">TSX Venture Exchange closed down 1.79 points to 1289.62</a><font size="-1" color="#6f6f6f"><nobr>Metro Canada - Toronto</nobr></font></font><br /><font size="-1"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fca.reuters.com%2Farticle%2FbusinessNews%2FidCATRE5A12DD20091103&amp;usg=AFQjCNHvjk_ey0_WrleBO8G6zP69xiivYA">TSX erases early skid as gold miners rally</a><font size="-1" color="#6f6f6f"><nobr>Reuters Canada</nobr></font></font><br /><font size="-1"><a 
href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.reuters.com%2Farticle%2FcompanyNewsAndPR%2FidUSN0349369920091103&amp;usg=AFQjCNEB0Ewi0Kn6dWdyNl0fbUz43B8ltw">CANADA STOCKS-TSX falls on banks but railways ride higher</a><font size="-1" color="#6f6f6f"><nobr>Reuters</nobr></font></font><br /><font size="-1" class="p"><a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.thestar.com%2Fbusiness%2Farticle%2F720102--tsx-dips-amid-worry-over-u-s-rebound&amp;usg=AFQjCNEp6cor5UdtYJ__11o3Dut_3WGDuQ"><nobr>Toronto Star</nobr></a>&nbsp;-<a href="http://news.google.com/news/url?fd=R&amp;sa=T&amp;url=http%3A%2F%2Fwww.metronews.ca%2Ftoronto%2Fbusiness%2Farticle%2F346247--tsx-venture-exchange-is-up-8-36-points-to-1-299-77-at-noon-et&amp;usg=AFQjCNGP0_1zq2UfYngThvOHLlzZcmYX_Q"><nobr>Metro Canada - Toronto</nobr></a></font><br /><font class="p" size="-1"><a class="p" href="http://news.google.com/news/more?pz=1&amp;geo=toronto+on&amp;ncl=dQCMF26BgDnbrbMwT5xqg-b9aNKLM"><nobr><b>all 168 news articles&nbsp;&raquo;</b></nobr></a></font></div></font></td></tr></table>

prefix:

&quot;&gt;

suffix:

<b>...

So, the intended output *should* be:

The Toronto stock market maintained a solid lead Tuesday as gold stocks surged on record high bullion prices, railway stocks gained following major

So could you please show me what the appropriate working "string pattern" would be? This is what I am trying, and it is not working:

<?php
if (preg_match('/">([a-zA-Z0-9.:;,$\s-]*)<b>\.\.\./', $text,$result)) {
echo $result[0];
}
?>

Thanks!

MadTechie · November 4, 2009

Okay, you want just the captured data, so change

echo $result[0];

to

echo $result[1];
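The difference between the two indexes is easiest to see on a shortened excerpt of the description HTML posted above. This sketch assumes the HTML is already decoded (not `&lt;`/`&quot;` entities); it also shows that, although `">` appears several times, only the occurrence directly followed by plain text and `<b>...` can match:

```php
<?php
// Shortened excerpt of the description HTML from the thread (decoded, not entity-encoded).
$text = '<font size="-1"><b><font color="#6f6f6f">Toronto Star</font></b></font><br />'
      . '<font size="-1">The Toronto stock market maintained a solid lead Tuesday as gold stocks surged '
      . 'on record high bullion prices, railway stocks gained following major <b>...</b></font>';

if (preg_match('/">([a-zA-Z0-9.:;,$\s-]*)<b>\.\.\./', $text, $result)) {
    // Full match: still contains the '">' prefix and the '<b>...' suffix.
    echo $result[0], "\n";
    // Group 1: just the description text (trim() drops the trailing space).
    echo trim($result[1]), "\n";
}
```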

physaux · November 4, 2009

It is still not working.

Maybe this will help; here is the rest of the code around the preg_match() attempt:

<?php
$url = 'http://news.google.com/news?pz=1&cf=all&ned=ca&hl=en&geo=toronto+on&output=rss';
$pointer = new SimpleXmlElement(file_get_contents($url));  
$count = 1;
foreach($pointer->channel->item as $entry) {
	echo "<a href='$entry->link' title='$entry->title'>" . $entry->title . "</a><br>";
	//Get description

	$description =  $entry->description;
	if (preg_match('/">([a-zA-Z0-9.:;,$\s-]*)<b>\.\.\./', $description ,$result)) {
		echo "<br><br>YESSSS:".$result[1];
	}else{
		echo "<br><br>NOOOOO".$description;
	}
	if($count>=$limit){
		break;
	}
	$count++;
}
?>

Any idea? Try it yourself; I don't know why it won't work for me.

MadTechie · November 4, 2009

Ahh, that makes more sense now.

<?php
   $url = 'http://news.google.com/news?pz=1&cf=all&ned=ca&hl=en&geo=toronto+on&output=rss';
   $pointer = new SimpleXmlElement(file_get_contents($url));  
   $count = 1;
   $limit = 5; //added
   
   foreach($pointer->channel->item as $entry) {
      echo "<a href='$entry->link' title='$entry->title'>" . $entry->title . "</a><br>";
      //Get description
      
      $description = (string)$entry->description; //UPDATED to get the string instead of the object
      //updated RegEx to HTML instead of encoded HTML
      if (preg_match('/">([a-zA-Z0-9.:;,$\s-]*)<b>\.\.\./', $description ,$result)) {
         echo "<br><br>YESSSS:".$result[1];
      }else{
         echo "<br><br>NOOOOO".$description;
      }
      if($count>=$limit){
         break;
      }
      $count++;
   }
?>
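One change in that version is the `(string)` cast on `$entry->description`. A small sketch of what the cast gives you, using a hypothetical one-item feed inline instead of the Google News URL. (PHP will usually convert a SimpleXMLElement to a string on its own when it is passed to preg_match(), so the cast mainly makes the conversion explicit.)

```php
<?php
// Hypothetical one-item feed. Inside <description> the HTML is entity-encoded,
// the way an RSS feed has to store it.
$xml = '<rss><channel><item><description>&lt;font size=&quot;-1&quot;&gt;Gold hits a record &lt;b&gt;...&lt;/b&gt;</description></item></channel></rss>';

$feed  = new SimpleXMLElement($xml);
$entry = $feed->channel->item[0];

// Without a cast you are holding a SimpleXMLElement object, not a string.
echo get_class($entry->description), "\n";

// The (string) cast returns the text content; the XML parser has already
// turned &lt; &quot; &gt; back into < " >.
$description = (string)$entry->description;
echo $description, "\n";

// So the pattern is written against plain HTML, not against &quot;&gt;
if (preg_match('/">([a-zA-Z0-9.:;,$\s-]*)<b>\.\.\./', $description, $result)) {
    echo trim($result[1]), "\n";
}
```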

physaux · November 4, 2009

Ahh, finally! Thank you so much!

Goes to show I shouldn't assume what is relevant and what is not...

Anyway, I finally succeeded at making my own news poster! Sick, thanks a lot!

EDIT: Actually, I need to ask you a couple more questions:

So was the string it was processing the raw "source code" or the "outputted code"?

Any advice on finding the <img> URL in there? I think I could do it, but I need to know whether the string is raw HTML or outputted HTML. I didn't really understand that part of your comment.

MadTechie · November 4, 2009

Okay, when you have XML data, the strings get encoded, so < becomes &lt; and > becomes &gt;.

The reason is simple.

Here is some invalid XML:

<DATA>
This is some text but if i had </DATA> in here it would mess up 
</DATA>

Now this will fail, because the </DATA> inside the data contents (the value) closes the DATA tag early.

So to stop that, they encode it.

Valid XML:

<DATA>
This is some text but if i had &lt;/DATA&gt; in here it would mess up 
</DATA>

Hope that makes sense. Now, when PHP reads the data, it decodes it back to what it should be, so while the XML had

&quot;&gt;

the value is

">

and that's what PHP gets.
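The encode/decode round trip described here can be checked directly. A minimal sketch using SimpleXML and htmlspecialchars(); the DATA element and its text are the toy example above, not part of any real feed:

```php
<?php
libxml_use_internal_errors(true); // collect parse errors instead of printing warnings

// Invalid: the literal </DATA> in the value closes the element early,
// leaving extra content after the root element.
$bad = simplexml_load_string('<DATA>This is some text but if I had </DATA> in here it would mess up</DATA>');
var_dump($bad); // bool(false)

// Valid: encode the value first, so < and > become &lt; and &gt;.
$value = 'This is some text but if I had </DATA> in here it would mess up';
$good  = simplexml_load_string('<DATA>' . htmlspecialchars($value, ENT_XML1) . '</DATA>');

// When PHP reads it back, the entities are decoded to the original characters.
var_dump((string)$good === $value); // bool(true)
```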

So, to sum up: you are working with the decoded HTML, i.e. you search for <img, not &lt;img.

To grab the img URL you could use something like this (should work):

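$HTML = 'la lalalalalalal <img height="400px" src="I am a URL" width="10px" > lalalal ala';
// Group 1 captures the opening quote (' or "), group 2 the URL up to the matching closing quote.
// Note: [^\1] inside a character class is not a backreference (it means "any char except \x01");
// it is the lazy *? followed by \1 that stops the match at the closing quote.
if (preg_match('/<img[^>]*src\s*=([\'"])([^\1]*?)\1[^>]*>/', $HTML, $reg)) {
    echo $reg[2];
}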

EDIT: oops, forgot a ; but fixed now

I am kinda tired, so I may not have made sense!
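The regex is fine for this feed, but it can miss variants such as unquoted `src` values or spaces after the `=`. A swapped-in alternative (not what was used above) is PHP's built-in DOM extension. A sketch, assuming the `dom` extension is enabled, run on a shortened copy of the start of the description HTML:

```php
<?php
// Shortened copy of the first part of the description HTML from the thread.
$html = '<a href="http://news.google.com/news/url?fd=R"><img src="http://nt3.ggpht.com/news/tbn/k3LkzFV8cL9S_M/6.jpg" alt="" border="1" width="80" height="80" /></a>'
      . '<div style="padding-top:0.8em;"><img alt="" height="1" width="1" /></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // feed HTML is rarely perfectly valid; don't print warnings
$doc->loadHTML($html);
libxml_clear_errors();

$urls = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    // getAttribute() returns '' when the attribute is missing (e.g. the 1x1 spacer image).
    $src = $img->getAttribute('src');
    if ($src !== '') {
        $urls[] = $src;
    }
}

print_r($urls); // only the thumbnail URL
```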

physaux · November 4, 2009

Great, that worked perfectly.

And also, thanks for the explanation, I totally understood it.

It's like URL-encoding spaces as "%20" and such; I never figured that out before.

So thanks for all your help!
