HaLo2FrEeEk Posted June 24, 2007

Hey, I'm trying to parse the text of a weekly update from another site so I can convert it to BBCode and post it on my forum. An example of one of the weekly updates can be found here. My problem is that, if you look at the source, the start of the update is formatted like this:

<div class="stdcontent" id="topStoryPreviewDiv">
<p>Frankie is super busy this week getting ready to show the game to a host of press doods who won’t be able to talk about jack or squat from Halo 3 until later this year (if they value their souls),

So the defined start point of the preg_match would be the <div class="stdcontent" id="topStoryPreviewDiv"> part, but the update is on the next line, and I can't get my preg_match to read the next line. This is my preg_match call:

preg_match("| <div class=(.+?) id=(.+?)>\n\r <p>(.+?)|", $text, $match);

But when I do a print_r($match), it shows nothing. If I remove the \n\r <p> from the pattern, it shows the two values of the div tag (class and id), so I know that part isn't what's broken. What can I do to fix this? Any help will be appreciated. Thanks.
m-code Posted June 24, 2007

I'm pretty new here, so I'll give it a shot and try to help you. I'm not really a big fan of the preg functions, so I'll just show you how I would do it a different way.

// Fetch the page source (needs URL access enabled for file_get_contents)
$content = file_get_contents("http://www.bungie.net/News/content.aspx?type=topnews&cid=12562");
// Keep everything from the opening div onward
$content = strstr($content, '<div class="stdcontent" id="topStoryPreviewDiv">');
// Measure the tail that starts at the first closing </div>, then cut it off
$len = strlen(strstr($content, '</div>'));
print substr($content, 0, -$len);

Good luck with it.
HaLo2FrEeEk Posted June 24, 2007

Well, for one, I can't use file_get_contents because my server is secured so that function is turned off, but I can still get the contents of the file by using the Snoopy library. If you actually look at the source, though, you will notice that the opening and closing divs are on new lines, so I need either \n, \r, or both, but none of those work. I tried your code and it didn't work. Thank you though, at least you replied.
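One possible reason a literal \n\r fails, sketched under assumptions: Windows-style line endings are \r\n (carriage return first), so a pattern that expects \n followed by \r never matches them. Matching any run of whitespace with \s* avoids having to know which line ending the page uses. The $text sample here is a made-up stand-in for the page source, not the real markup.

```php
<?php
// Hypothetical sample: opening div, a Windows line ending, then the first paragraph.
$text = "<div class=\"stdcontent\" id=\"topStoryPreviewDiv\">\r\n<p>Frankie is super busy this week</p>";

// Literal "\n\r" expects LF then CR, the reverse of what the sample contains.
$strict = preg_match('|<div class="(.+?)" id="(.+?)">\n\r<p>(.+?)</p>|', $text, $m1);

// "\s*" accepts \n, \r\n, spaces or tabs between the two tags.
$loose = preg_match('|<div class="(.+?)" id="(.+?)">\s*<p>(.+?)</p>|', $text, $m2);

var_dump($strict); // int(0)
var_dump($loose);  // int(1)
echo $m2[3], "\n"; // the captured paragraph text
```

Note that the paragraph capture here is closed by </p>; a lazy (.+?) at the very end of a pattern only ever captures a single character.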
rea|and Posted June 24, 2007

I'm not sure whether you want the entire div's content or only the first paragraph. For the latter, try this one:

$rex='/(?<=<div class="stdcontent" id="topStoryPreviewDiv">)\s*<p>(.+)<\/p>/';
HaLo2FrEeEk Posted June 24, 2007

I need the entire div's content, the whole update. I'll see if I can modify your code to do that, but any more assistance will be appreciated.
rea|and Posted June 24, 2007

If you want to match each line within the div, you need two regular expressions. The first matches the div and the second matches each paragraph. Try this:

if (preg_match('/<div class="stdcontent" id="topStoryPreviewDiv">(.+?)<\/div>/s', $htmlpage, $mth)) {
    preg_match_all('/<p>(.+?)<\/p>/s', $mth[1], $paragraphs);
    # if you want to exclude titles (span/strong lines)
    # preg_match_all('/<p>(?!<strong|<span)(.+?)(?:<br>)?<\/p>/s', $mth[1], $paragraphs);
    echo '<pre>' . print_r($paragraphs[1], true) . '</pre>';
} else {
    echo 'No matching found.';
}
HaLo2FrEeEk Posted June 25, 2007

No, look at the site I linked to. I want everything in the actual post itself, whether it is bold, a list, or whatever. I'll do a str_replace to change the HTML to BBCode. I also tried what you suggested, rea|and, and it did not work for what I wanted, because the update text isn't between two div tags like this:

<div>UPDATE CONTENT</div>

It's like this:

<div class="stdcontent" id="topStoryPreviewDiv">
<p>Frankie is super busy this week getting ready to show the game to a host of press doods who won’t be able to talk about jack or squat from Halo 3 until later this year (if they value their souls), so instead of a shimmering Atlas holding a golden pen this week, you get a tall(er) Ewok who needs a shave. </p>
<p>Your tears, let me lick them. </p>
(...)
<p>Today marks the conclusion of the long-running Halo-themed pimping of your ride in <em>Forza 2</em>. By the time you get to this line, the official thread will be locked, pictures will be harvested and awe will no doubt strike the faces of those who gaze at what the community created. Next week, we’ll announce three winners and coordinate the claiming of prizes as well as our receipt of the cars, because Frank and I couldn’t ever make anything that rad using a livery editor. We need cheats. </p><br>
<p></p>
</div>

The opening and closing divs are on different lines, which is why I thought I needed a \n and/or \r in the regex, but I can't get it to work.
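The str_replace step for turning the extracted HTML into BBCode could look roughly like this. The tag mapping is an assumption for illustration (the exact BBCode tags vary between forum packages), and html_to_bbcode is a hypothetical helper name, not something from this thread.

```php
<?php
// Hypothetical helper: map a few simple HTML tags to common BBCode equivalents.
function html_to_bbcode(string $html): string
{
    $map = [
        '<strong>' => '[b]', '</strong>' => '[/b]',
        '<em>'     => '[i]', '</em>'     => '[/i]',
        '<p>'      => '',    '</p>'      => "\n\n",
        '<br>'     => "\n",
    ];
    // str_replace accepts parallel arrays of search and replacement strings.
    $bbcode = str_replace(array_keys($map), array_values($map), $html);
    // Decode entities such as &amp; and trim leftover blank lines at the ends.
    return trim(html_entity_decode($bbcode, ENT_QUOTES, 'UTF-8'));
}

echo html_to_bbcode('<p>Pimping of your ride in <em>Forza 2</em>. </p><br>');
```

A plain str_replace only catches tags written exactly as listed; tags that carry attributes (links, images, styled spans) would need a regex or an HTML parser instead.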
rea|and Posted June 25, 2007

I ran that code against your link. Anyway, try using only the first preg_match: the /s modifier makes it work on multiline strings, and it matches the div's content.

$htmlpage = 'your html code here';
if (preg_match('/<div class="stdcontent" id="topStoryPreviewDiv">(.+?)<\/div>/s', $htmlpage, $mth)) {
    echo $mth[1];
} else {
    echo 'No matching found.';
}
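To see what the /s modifier changes in this case, here is a small self-contained check. The $htmlpage string is a shortened, made-up stand-in for the real page source.

```php
<?php
// Made-up stand-in for the page: the div's content spans several lines.
$htmlpage = "<div class=\"stdcontent\" id=\"topStoryPreviewDiv\">\r\n"
          . "<p>Frankie is super busy this week. </p>\r\n"
          . "<p>Your tears, let me lick them. </p>\r\n"
          . "</div>";

$pattern = '<div class="stdcontent" id="topStoryPreviewDiv">(.+?)<\/div>';

// Without /s the dot stops at "\n", so the match cannot reach </div>.
$without = preg_match('/' . $pattern . '/', $htmlpage, $a);

// With /s (PCRE_DOTALL) the dot also matches newlines.
$with = preg_match('/' . $pattern . '/s', $htmlpage, $b);

var_dump($without); // int(0)
var_dump($with);    // int(1)
echo trim($b[1]), "\n"; // both paragraphs, without the surrounding div
```

Because (.+?) is lazy, the match ends at the first </div> it meets, so a div nested inside the update would cut the result short.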