regex not working


7 replies to this topic

#1 SidewinderX

SidewinderX
  • Members
  • PipPip
  • Member
  • 10 posts

Posted 08 August 2006 - 02:42 AM

OK, so I have this script that displays the content of a web page:

<?php
$url = "https://www.novaworld.com/Players/Stats.aspx?id=33680801261&p=616065";

// Gets the content of the submitted URL
$content = file_get_contents($url);

// Strips the HTML tags from $content
$strip = strip_tags($content);

// Keeps everything from the first 'Hawk Down' (case-insensitive) to the end
$remove = stristr($strip, 'Hawk Down');

// Copies $remove into $stats
$stats = $remove;
echo $stats;
?>

What I want to do is parse the name Ozymandias. out of the $stats string. While building my regex I used a pseudo-code test (a static string) so I could avoid the long load of fetching the actual content of the website. The pseudo-code is below:

<?php
$string = 'Down Ozymandias. Rank: 1-Star General ( 14 ) PCID: A13-3E71AC Player Created: June 9, 2006 TABLE.statsx ';
preg_match('#Down (.*?) Rank#', $string, $matches);
$str = $matches[1];
echo $str;
?>

The $string is a word-for-word excerpt from the $stats string in the first script. So I figured I could use $string to test my regex and, once it worked, just replace the static $string with the dynamic string ($stats). The regex above works fine with the static string, but when I incorporated it into the larger script it does not work.

<?php
$url = "https://www.novaworld.com/Players/Stats.aspx?id=33680801261&p=616065";

// Gets the content of the submitted URL
$content = file_get_contents($url);

// Strips the HTML tags from $content
$strip = strip_tags($content);

// Keeps everything from the first 'Hawk Down' (case-insensitive) to the end
$remove = stristr($strip, 'Hawk Down');

// Copies $remove into $stats
$stats = $remove;

preg_match('#Down (.*?) Rank#', $stats, $matches);
$str = $matches[1];
echo $str;

?>

I have NO idea what's wrong. This is the only part of the code that is holding me back from completing this script. If someone could please help me solve this problem, I would really appreciate it.

Thank you

#2 effigy

effigy
  • Staff Alumni
  • Advanced Member
  • 3,600 posts
  • Location: IL

Posted 08 August 2006 - 03:29 AM

You need to do some debugging. I got an error (failed to open stream) when using your code; although, this could be a configuration issue on my part. Are your errors on? Have you echoed out each step to see that it did what you expected?
Regexp | Unicode Article | Letter Database
/\A(e)?((1)?ff(?:(?:ig)?y)?|f(?:ig)?)\z/
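
A minimal sketch of that kind of step-by-step check, written for current PHP versions. The helper name fetch_page is made up for illustration and is not from the original script; it turns a failed read into an exception instead of letting an empty value flow into the later steps.

```php
<?php
// Show every notice and warning while debugging.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical helper (not part of the original script):
// reads a URL or file path and fails loudly if nothing could be read.
function fetch_page(string $source): string
{
    // @ suppresses the warning because the failure is handled explicitly below.
    $content = @file_get_contents($source);
    if ($content === false) {
        throw new RuntimeException("Could not read $source");
    }
    return $content;
}

// Usage, assuming $url holds the stats page address:
//   $content = fetch_page($url);
//   var_dump(strlen($content));   // confirm something was actually fetched
```

Checking the length (or dumping the value) after each step makes it obvious which call produced something unexpected.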

#3 SidewinderX

Posted 08 August 2006 - 03:31 AM

I believe that's a configuration issue on your part. Check your phpinfo() output and see if https is a registered stream; I don't think it is.

For me:

Registered PHP Streams  php, file, http, ftp, compress.zlib, compress.bzip2, https, ftps
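
The same list can be checked from a script rather than by reading the phpinfo() page. A short sketch using stream_get_wrappers(); the messages are only illustrative:

```php
<?php
// List the stream wrappers this PHP build has registered.
$wrappers = stream_get_wrappers();

// file_get_contents() can only open https:// URLs if this wrapper exists.
$httpsAvailable = in_array('https', $wrappers, true);

echo 'Registered PHP Streams: ', implode(', ', $wrappers), "\n";
echo $httpsAvailable
    ? "https is registered\n"
    : "https is missing (the openssl extension is probably not loaded)\n";
```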



EDIT:
Well, at first my errors weren't on, but I enabled them and there were only a few warnings due to undefined variables. I fixed them, but as suspected that didn't fix the overall problem. And yes, I have echoed every step of the way and it does exactly what I want. In fact, I have parsed 100 stats from this page using a different parsing method and they all work fine. However, because I am parsing a name, which can be 16 alphanumeric characters including hyphens, underscores and spaces (and not just a number), only a regex would work for this.

Thank you for your input. Any other ideas?

P.S. When I echo $stats in the first script it yields this:

Hawk Down Ozymandias. Rank: 1-Star General ( 14 ) PCID: A13-3E71AC Player Created: June 9, 2006 TABLE.statsx TD, TABLE.statsx TH { height: 24px; font-size: 10pt; white-space: nowrap; } TABLE.statsx TD.left { background: url(/Images/Stats/cont_grn_01.gif); width: 2px; } TABLE.statsx TH { background: url(/Images/Stats/cont_grn_02.gif); text-align: left; font-weight: bold; } TABLE.statsx TD { background: url(/Images/Stats/cont_grn_02.gif); text-align: right; } TABLE.statsx TD.right { background: url(/Images/Stats/cont_grn_03.gif); width: 2px; } TABLE.statsx TR.spacer TD { background: url(/Images/Stats/spacer.gif); height: 2px; } Total Team Games Played: 568 Total Time Played: 4d 6h 50m Total Kills: 9596 Team Win Percentage: 60.21% Favorite Weapon Class: Assault Rifle Minutes in Zone (KOTH) 1300 Flags Captured (FB) 2 Targets Destroyed (A&D) 258 Awards Received Army Commendation Medal with 1 Bronze Oak Leaf Cluster Bronze Star with 1 Bronze Oak Leaf Cluster CQB Badge 1st Award Headhunter's Medal 1st Award Hill Giant Medal 1st Award Combat Infantryman Badge 1st Award Marksman Badge Combat Medical Badge 1st Award Purple Heart (12) Sapper's Badge 1st Award Overall Statistics TABLE.statsy TD, TABLE.statsy TH { height: 24px; font-size: 10pt; } TABLE.statsy TD.left { background: url(/Images/Stats/cont_blue_01.gif); width: 2px; } TABLE.statsy TH { background: url(/Images/Stats/cont_blue_02.gif); text-align: left; font-weight: bold; } TABLE.statsy TD { background: url(/Images/Stats/cont_blue_02.gif); text-align: right; } TABLE.statsy TD.right { background: url(/Images/Stats/cont_blue_03.gif); width: 2px; } TABLE.statsy TR.spacer TD { background: url(/Images/Stats/spacer.gif); height: 2px; } TABLE.statsz TD, TABLE.statsz TH { height: 24px; font-size: 10pt; } TABLE.statsz TD.left { background: url(/Images/Stats/cont_blk_01.gif); width: 2px; } TABLE.statsz TH { background: url(/Images/Stats/cont_blk_02.gif); text-align: left; font-weight: bold; white-space: nowrap; } TABLE.statsz 
TD { background: url(/Images/Stats/cont_blk_02.gif); text-align: center; } TABLE.statsz TD.right { background: url(/Images/Stats/cont_blk_03.gif); width: 2px; } TABLE.statsz TR.spacer TD { background: url(/Images/Stats/spacer.gif); height: 2px; } TABLE.statsz TR.header TH { background: url(/Images/Stats/spacer.gif); text-align: center; } IMG

And the regex is set up perfectly to parse that, because it works in my pseudo-code.

#4 effigy

Posted 08 August 2006 - 02:21 PM

I cannot reconfigure my PHP at the moment to enable the https wrapper. Based on what you've shown, your regex works.

#5 SidewinderX

Posted 08 August 2006 - 05:15 PM

That's exactly the problem. I know the regex works fine; it just has to be that the content of the string isn't matching the regex pattern.

When the content is passed through these functions

$content = file_get_contents($url);
$strip = strip_tags($content);

and then echoed, the displayed output (echo $strip;) must be different from the actual content of $strip, no?

Like, $strip might equal "Down<br>Ozymandias.<br>Rank", but when it is output the HTML is not displayed or something...
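
A small sketch of how the stripped string can differ from what the browser shows. The HTML fragment is made up for illustration (the real page markup is an assumption). strip_tags() removes the tags but keeps the newlines and tabs between them, and a browser collapses that whitespace into a single space when rendering.

```php
<?php
// Made-up fragment standing in for the real page; the actual markup is an assumption.
$html = "Black Hawk Down<br>\n\n\t<b>Ozymandias.</b><br>\nRank: 1-Star General";

// The tags are removed, but the whitespace between them stays.
$strip = strip_tags($html);

// The pattern expects exactly one space after "Down" and before "Rank",
// so it does not match the newline/tab runs left in $strip.
$found = preg_match('#Down (.*?) Rank#', $strip, $matches);

var_dump($strip);
var_dump($found);
```

Rendered in a browser, $strip looks like "Black Hawk Down Ozymandias. Rank: ..." even though the string holds newlines and a tab rather than single spaces.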

#6 effigy

Posted 08 August 2006 - 06:25 PM

Try echo $stats; and then view the source to see what's really there. The HTML display is not necessarily an accurate depiction. It sounds like you may also need the /s switch so that the . matches new lines.
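
Another way to see what is really there, without relying on the browser, is to escape the control characters before printing. A sketch with a made-up sample standing in for $stats (the exact whitespace on the real page is an assumption):

```php
<?php
// Made-up sample; the real $stats comes from the fetched page.
$stats = "Hawk Down\r\n\r\n\tOzymandias.\r\n\tRank: 1-Star General";

// Escape every control character (ASCII 0 to 31) so newlines and tabs become visible.
$visible = addcslashes($stats, "\0..\37");
echo $visible, "\n";

// The s modifier lets . match newlines, but the literal spaces in the
// pattern still have to match literal spaces in the string.
$withS = preg_match('#Down (.*?) Rank#s', $stats, $matches);
var_dump($withS);
```

For a sample like this one, the s modifier alone is not enough, because the characters right after "Down" and right before "Rank" are line breaks and tabs, not spaces.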

#7 SidewinderX

Posted 09 August 2006 - 01:33 AM

The source looks like this for the portion I'm trying to parse (the whole source is attached):

Player Statistics for Delta Force - Black Hawk Down










Ozymandias.
Rank: 1-Star General ( 14 )


EDIT: OK, this script works with the regex and does exactly what I want, but I need some help cleaning it up (don't laugh, LOL):

<?php
$url = "https://www.novaworld.com/Players/Stats.aspx?id=8542545489&p=616065";

// Gets the content of the submitted URL
$content = file_get_contents($url);

// Strips the HTML tags from $content
$strip = strip_tags($content);
preg_match('#Down






	
	


	(.*?)
	Rank#', $strip, $matches);
$str = $matches[1];
echo $str;
?>

[attachment deleted by admin]

#8 effigy

Posted 09 August 2006 - 04:56 AM

#Down\s+(\S+)\s+Rank#
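
A sketch of how that pattern might be dropped into the script, written for current PHP versions. The helper name extract_player_name and the sample string are made up for illustration. Because (\S+) stops at the first whitespace character, the optional second pattern is an assumption added here for names that contain spaces, which were mentioned earlier in the thread as possible.

```php
<?php
// Hypothetical helper; the name and the $allowSpaces option are not from the thread.
function extract_player_name(string $text, bool $allowSpaces = false): ?string
{
    // \s+ matches any run of spaces, tabs and line breaks.
    $pattern = $allowSpaces
        ? '#Down\s+(.+?)\s+Rank:#s'   // assumption: name may contain spaces
        : '#Down\s+(\S+)\s+Rank#';    // the pattern from the post above

    // Only read $matches[1] when the match succeeded, which avoids
    // undefined-index warnings.
    if (preg_match($pattern, $text, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

// Made-up sample with the same kind of whitespace as the page source.
$sample = "Black Hawk Down\n\n\n\t\n\tOzymandias.\n\tRank: 1-Star General ( 14 )";
echo extract_player_name($sample), "\n";
```

In the full script the helper would be called with $strip (or $stats) instead of $sample.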



