samoht (topic author), posted October 17, 2008

Hello, I need a little help writing a preg_match_all expression. I have a file with recurring guestbook entries; each entry looks like:

```
<b>Hey Darren, Congratulations on your Broadway successes as well. Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>
```

I have put the whole file into a PHP variable, $oldStuff. I want to use preg_match_all to convert the entries into the proper syntax for a MySQL insert. Here is an example of the insert I want to do:

```sql
INSERT INTO `jos_phocaguestbook_items` (`id`, `catid`, `sid`, `username`, `userid`, `email`, `homesite`, `ip`, `title`, `content`, `date`, `published`, `checked_out`, `checked_out_time`, `ordering`, `params`) VALUES
(1, 1, 0, 'Alex Fregal', 0, '', '', '', 'Tuscon, AZ USA', 'Saw you in Tuscon in 2005 with Movin Out and was blown away by your performance and stamina.Merry Christmas and a very successful 2008 and beyond.', '2008-09-26 11:41:53', 1, 0, '0000-00-00 00:00:00', 1, ''),
```

And here is my preg_match_all so far:

```php
preg_match_all("|<[^>]+>(.*)</[^>]+>|U", $oldStuff, $out, PREG_PATTERN_ORDER);
foreach ($out as $input) {
    echo $input[2] . ',' . $input[1] . ',' . $input[0];
}
```

This only grabs the text between the opening <...> and closing </...> tags. I also need to grab the name between the two <br>s and the location between the last <br> and the "-". Lastly, I need to grab and convert the date, which is between the "-" and the <hr>.

Can anyone help me write the preg_match_all to do this? Thanks
MadTechie, posted October 17, 2008

Hmm, try this:

```php
<?php
preg_match_all("%<[^>]+>(.*)</[^>]+><br>\s*(.*?)\s<.*?;<br>\s*(.*?)\s-\s(.*?)<hr>%si", $oldStuff, $out, PREG_PATTERN_ORDER);
foreach ($out as $input) {
    echo $input[1] . ",";
    echo $input[2] . ",";
    echo $input[3] . ",";
    echo $input[4] . "<br>";
}
?>
```
samoht (topic author), posted October 17, 2008

Um, well, this came up with:

```
,,,<br>,,,<br>,,,<br>,,,<br>,,,<br>
```
ghostdog74, posted October 17, 2008

The way to work with strings is to think simple: break your work into simple steps. Here's a way without using a complex regex.

```php
$file = "file";
// Since you said the entries recur, we can split on <b>.
// explode() splits on a literal string; the original post used split(),
// which was removed in PHP 7.
$data = explode("<b>", file_get_contents($file));
foreach ($data as $k => $v) {
    if (!empty($v)) {
        // split on <br>
        $s = explode("<br>", $v);
        print_r($s);
        // work on $s from here to get your items
    }
}
```
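One way to finish the "work on $s from here" step is sketched below. It is an illustration under stated assumptions, not ghostdog74's code: it splits on <hr> (one chunk per entry) rather than <b>, then on <br> (message, name/email, location/date). It assumes the message contains no <br>, the email (when present) follows the name in literal angle brackets, and the location contains no " - ". The function name splitEntries and its array keys are hypothetical.

```php
<?php
// Sketch only: assumes each entry is
//   <b>message</b><br> name <email><br> location - date<hr>
// with no <br> inside the message and no " - " inside the location.
// splitEntries() and its array keys are hypothetical names.
function splitEntries(string $raw): array
{
    $rows = [];
    // Splitting on <hr> gives one chunk per entry (plus whatever follows the final <hr>).
    foreach (explode('<hr>', $raw) as $chunk) {
        $parts = explode('<br>', trim($chunk));
        if (count($parts) < 3) {
            continue; // trailing text or a malformed entry
        }
        $message = trim(str_replace(['<b>', '</b>'], '', $parts[0]));

        // "Tad <info@globaldog.com>": name before " <", email inside the brackets (may be absent)
        $nameEmail = explode(' <', trim($parts[1]), 2);
        $email = isset($nameEmail[1]) ? rtrim($nameEmail[1], '>') : '';

        // "Nashville, TN USA - Saturday, ...": split on the first " - " only
        $locDate = explode(' - ', trim($parts[2]), 2);

        $rows[] = [
            'message'  => $message,
            'name'     => trim($nameEmail[0]),
            'email'    => $email,
            'location' => trim($locDate[0]),
            'date'     => isset($locDate[1]) ? trim($locDate[1]) : '',
        ];
    }
    return $rows;
}

$raw = "<b>Hey Darren, Congratulations on your Broadway successes as well. Best, Tad</b><br>\n"
     . "Tad <info@globaldog.com><br>\n"
     . "Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>\n";

foreach (splitEntries($raw) as $row) {
    echo $row['name'], ' | ', $row['email'], ' | ', $row['location'], "\n";
}
// prints: Tad | info@globaldog.com | Nashville, TN USA
```

Each field is then a plain string, ready for the date conversion and the INSERT.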
MadTechie, posted October 17, 2008

PREG_PATTERN_ORDER should be PREG_SET_ORDER. Example:

```php
<?php
$oldStuff = '
<b>Hey Darren, Congratulations on your Broadway successes as well. Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>';
preg_match_all("%<[^>]+>(.*)</[^>]+><br>\s*(.*?)\s<.*?;<br>\s*(.*?)\s-\s(.*?)<hr>%si", $oldStuff, $out, PREG_SET_ORDER);
foreach ($out as $input) {
    echo $input[1] . ",";
    echo $input[2] . ",";
    echo $input[3] . ",";
    echo $input[4] . "<br>";
}
?>
```
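The flag decides the shape of $out. With PREG_PATTERN_ORDER (the default), $out[0] holds every full match, $out[1] every first group, and so on, so foreach ($out as $input) walks over capture groups rather than entries. With a single entry, $input[1] to $input[4] do not exist, which is consistent with the five empty ",,,<br>" lines samoht got (one per array: the full match plus four groups). PREG_SET_ORDER instead gives one array per match. A small illustration on made-up text, not guestbook data:

```php
<?php
// Made-up "key=value" text, used only to show how the flag changes $out.
$text = 'a=1 b=2';
preg_match_all('/(\w)=(\d)/', $text, $byPattern, PREG_PATTERN_ORDER);
preg_match_all('/(\w)=(\d)/', $text, $bySet, PREG_SET_ORDER);

// PREG_PATTERN_ORDER: $out[groupIndex][matchIndex]
echo implode(',', $byPattern[1]), "\n"; // a,b     (group 1 from every match)

// PREG_SET_ORDER: $out[matchIndex][groupIndex]
echo implode(',', $bySet[1]), "\n";     // b=2,b,2 (full match and groups of match 2)
```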
samoht (topic author), posted October 17, 2008

That seems to work with just one entry, but if I add another, like:

```php
<?php
$oldxStuff = '
<b>just saw darren perform on broadway with movin out. he was amazing. he would make billy joel proud. congrats darren.</b><br>
anthony sharkey <sharkeyanthony@aol.com><br>
USA - Thursday, September 11, 2003 at 17:23:17 (EDT)<hr>
<b>Hey Darren, Congratulations on your Broadway successes as well. Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>';
preg_match_all("%<[^>]+>(.*)</[^>]+><br>\s*(.*?)\s<.*?;<br>\s*(.*?)\s-\s(.*?)<hr>%si", $oldxStuff, $out, PREG_SET_ORDER);
foreach ($out as $input) {
    echo $input[1] . ",";
    echo $input[2] . ",";
    echo $input[3] . ",";
    echo $input[4] . "<br>";
}
?>
```

The output looks like:

```
just saw darren perform on broadway with movin out. he was amazing. he would make billy joel proud. congrats darren.</b><br>
anthony sharkey <sharkeyanthony@aol.com><br>
USA - Thursday, September 11, 2003 at 17:23:17 (EDT)<hr>
<b>Hey Darren, Congratulations on your Broadway successes as well. Best, Tad,Tad,Nashville, TN USA,Saturday, September 06, 2003 at 10:42:56 (EDT)<br>
```

??
MadTechie, posted October 17, 2008

That looks correct. Try changing the output, e.g.:

```php
echo "<br>LINE1: " . htmlspecialchars($input[1]) . "<br>------------------------------------------------------<br>";
echo "<br>LINE2: " . htmlspecialchars($input[2]) . "<br>------------------------------------------------------<br>";
```

etc.
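Why htmlspecialchars() helps here: when debug output goes to a browser, any tags left inside a captured value are rendered as markup instead of being shown, so a capture that wrongly swallowed </b><br> can look almost right. Escaping makes the leftover tags visible. The sample string below is made up for illustration:

```php
<?php
// A captured value that still contains markup, like the first group in samoht's output.
$captured = 'Best, Tad</b><br>';

echo $captured, "\n";                   // in a browser the tags render, so they seem to vanish
echo htmlspecialchars($captured), "\n"; // Best, Tad&lt;/b&gt;&lt;br&gt;
```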
samoht (topic author), posted October 17, 2008

Thanks for the help, MadTechie, but it is actually putting everything into the first variable, [1], except the last entry, so my HTML looks like:

```
<br>LINE1: I was in the fourth row last night and couldn't get enough of your eyes and that incredible smile of yours. Not to mention the voice. You are all great, but I would pay to see you alone on stage.</b><br>
janet Ramstein<br>
Pittsburgh, PA USA - Wednesday, September 17, 2008 at 14:05:11 (EDT)<hr>
<b>Hi Darren,its been a while,just saw an add for the high kings.Congratulations on all you,ve achieved,best wishes always,ann Q.</b><br>
ann quinlan <annquinlan1311@hotmail.com><br>
clonmel, tipperary ireland - Friday, September 12, 2008 at 17:10:16 (EDT)<hr>
... (lots of other entries here), then ...
Best, Tad<br>------------------------------------------------------<br><br>LINE2: Tad<br>------------------------------------------------------<br><br>LINE1: Nashville, TN USA<br>------------------------------------------------------<br><br>LINE2: Saturday, September 06, 2003 at 10:42:56 (EDT)<br>------------------------------------------------------<br>
```

NOT quite what I need. I should have as many LINE1: lines in my HTML as I have entries, correct??
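A quick way to answer that question is to compare the number of matches with the number of <hr> separators: if they differ, the pattern is spanning or skipping entries. A minimal sketch with hypothetical two-entry sample data (the simplified pattern here only checks the message part, not the full regex from the thread):

```php
<?php
// Hypothetical two-entry sample in the thread's layout.
$oldStuff = "<b>one</b><br>\nA <ann@example.com><br>\nX - Monday, January 05, 2004 at 01:02:03 (EST)<hr>\n"
          . "<b>two</b><br>\nB <tad@example.com><br>\nY - Tuesday, January 06, 2004 at 04:05:06 (EST)<hr>\n";

$entries = substr_count($oldStuff, '<hr>');
$matched = preg_match_all('%<b>(.*?)</b><br>%s', $oldStuff, $out, PREG_SET_ORDER);

if ($matched !== $entries) {
    echo "Pattern matched $matched of $entries entries; a match is probably spanning entries.\n";
} else {
    echo "OK: $matched entries\n"; // prints "OK: 2 entries" for this sample
}
```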
samoht (topic author), posted October 17, 2008

With:

```php
preg_match_all("%<[^>]+>(.*)</[^>]+><br>\s*(.*?)\s<.*?;<br>\s*(.*?)\s-\s(.*?)<hr>%si", $oldStuff, $out, PREG_SET_ORDER);
foreach ($out as $input) {
    $d = $input[4];
    $d1 = str_replace('at ', '', $d);
    $d2 = date('Y-m-d H:i:s', strtotime($d1));
    echo "<br>LINE1: " . htmlspecialchars($input[1]) . "<br>------------------------------------------------------<br>";
    echo "<br>LINE2: " . htmlspecialchars($input[2]) . "<br>------------------------------------------------------<br>";
    echo "<br>LINE3: " . htmlspecialchars($input[3]) . "<br>------------------------------------------------------<br>";
    echo "<br>LINE4: " . $d2 . "<br>------------------------------------------------------<br>";
}
```

I am very close, but again all the entries end up in LINE1:, while $input[2] through $input[4] are pulled from the last entry. Could this problem be with the foreach?
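For the date conversion itself, one possible alternative to strtotime() is sketched below. It assumes every date follows the exact layout in the sample data ("Weekday, Month DD, YYYY at HH:MM:SS (ZONE)"), strips the weekday, the word "at" and the zone label, then parses the rest with DateTime::createFromFormat(). A possible reason to prefer this: strtotime() may read "(EDT)" as a time zone and shift the result into the server's default zone, whereas this sketch keeps the wall-clock time as written (and therefore discards the zone). The function name guestbookDateToMysql is hypothetical.

```php
<?php
// Hypothetical helper: "Saturday, September 06, 2003 at 10:42:56 (EDT)"
// -> "2003-09-06 10:42:56" (MySQL DATETIME text). Returns null if parsing fails.
// Assumption: the zone label is ignored, so no time-zone conversion happens.
function guestbookDateToMysql(string $raw): ?string
{
    $clean = preg_replace(
        ['/^\s*[A-Za-z]+,\s*/',   // leading "Saturday, "
         '/\s+at\s+/',            // " at "
         '/\s*\([A-Za-z]+\)\s*$/'], // trailing " (EDT)"
        ['', ' ', ''],
        $raw
    );
    // $clean is now e.g. "September 06, 2003 10:42:56"
    $dt = DateTime::createFromFormat('F d, Y H:i:s', $clean);
    return $dt === false ? null : $dt->format('Y-m-d H:i:s');
}

echo guestbookDateToMysql('Saturday, September 06, 2003 at 10:42:56 (EDT)'), "\n";
// prints 2003-09-06 10:42:56
```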
MadTechie, posted October 17, 2008

Okay, just tested one (slight update):

```php
<?php
$oldxStuff = '
<b>just saw darren perform on broadway with movin out. he was amazing. he would make billy joel proud. congrats darren.</b><br>
anthony sharkey <sharkeyanthony@aol.com><br>
USA - Thursday, September 11, 2003 at 17:23:17 (EDT)<hr>
<b>Hey Darren, Congratulations on your Broadway successes as well. Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>';
preg_match_all("%<[^>]+>([^>]*)</[^>]+><br>\s*(.*?)\s<.*?;<br>\s*(.*?)\s-\s(.*?)<hr>%si", $oldxStuff, $out, PREG_SET_ORDER);
foreach ($out as $input) {
    echo "<br>LINE1: " . htmlspecialchars($input[1]) . "<br>------------------------------------------------------<br>";
    echo "<br>LINE2: " . htmlspecialchars($input[2]) . "<br>------------------------------------------------------<br>";
    echo "<br>LINE3: " . htmlspecialchars($input[3]) . "<br>------------------------------------------------------<br>";
    echo "<br>LINE4: " . htmlspecialchars($input[4]) . "<br>------------------------------------------------------<br>";
}
?>
```

Output:

```
LINE1: just saw darren perform on broadway with movin out. he was amazing. he would make billy joel proud. congrats darren.
------------------------------------------------------
LINE2: anthony sharkey
------------------------------------------------------
LINE3: USA
------------------------------------------------------
LINE4: Thursday, September 11, 2003 at 17:23:17 (EDT)
------------------------------------------------------
LINE1: Hey Darren, Congratulations on your Broadway successes as well.
Best, Tad
------------------------------------------------------
LINE2: Tad
------------------------------------------------------
LINE3: Nashville, TN USA
------------------------------------------------------
LINE4: Saturday, September 06, 2003 at 10:42:56 (EDT)
------------------------------------------------------
```
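The slight update is the change from (.*) to ([^>]*) in the first group. With the s flag, a greedy .* runs to the end of the whole text and then backs up only as far as needed, so the first group stretches from the first entry's opening tag to the last entry's closing tag and the whole file becomes one match, which fits samoht's symptom of everything landing in LINE1. A lazy .*? or a negated class such as [^>]* stops at the first closing tag instead. The comparison below uses simplified, made-up entries and a literal <b> in place of <[^>]+>. One trade-off to keep in mind: with [^>]*, a message that itself contains a ">" character would no longer match.

```php
<?php
// Two tiny made-up entries, reduced to the parts that matter here.
$two = "<b>first</b><br>x<hr>\n<b>second</b><br>y<hr>";

$greedy  = preg_match_all('%<b>(.*)</b>%s',    $two, $m1); // .* backs up only to the LAST </b>
$lazy    = preg_match_all('%<b>(.*?)</b>%s',   $two, $m2); // .*? stops at the FIRST </b>
$negated = preg_match_all('%<b>([^>]*)</b>%s', $two, $m3); // [^>]* cannot run past a tag's ">"

echo $greedy, ' ', $lazy, ' ', $negated, "\n"; // 1 2 2
```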
samoht (topic author), posted October 17, 2008

Yes, I believe that has done it!! Thank you very much!! I wish I could understand the expression, though.
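One way to make a pattern like this easier to read is the x (extended) modifier, which ignores literal whitespace and lets # start a comment, combined with named groups in place of $input[1] to $input[4]. The sketch below is similar in spirit to the thread's patterns but is not MadTechie's exact expression: it assumes the email, when present, appears in literal <...> after the name (as in discomatt's test data) and makes it optional, since entries such as janet Ramstein's have no email. The group names are illustrative, and the second sample entry is shortened.

```php
<?php
// x modifier: whitespace in the pattern is ignored and # starts a comment.
// s: "." also matches newlines.  i: case-insensitive tags.
$pattern = '~
    <b> (?<message> .*? ) </b> <br> \s*      # message: lazily, up to the first </b><br>
    (?<name> [^<\r\n]*? ) \s*                # name: text before the email or the <br>
    (?: < (?<email> [^>]* ) > )? <br> \s*    # optional <email>, then <br>
    (?<location> .*? ) \s+ - \s+             # location: up to the first " - "
    (?<date> [^<]*? ) \s* <hr>               # date: up to the <hr>
~six';

$sample = "<b>Hey Darren, Congratulations on your Broadway successes as well. Best, Tad</b><br>\n"
        . "Tad <info@globaldog.com><br>\n"
        . "Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>\n"
        . "<b>I was in the fourth row last night.</b><br>\n"
        . "janet Ramstein<br>\n"
        . "Pittsburgh, PA USA - Wednesday, September 17, 2008 at 14:05:11 (EDT)<hr>\n";

preg_match_all($pattern, $sample, $entries, PREG_SET_ORDER);
foreach ($entries as $e) {
    // An unmatched optional group in the middle comes back as an empty string.
    echo $e['name'], ' | ', $e['email'], ' | ', $e['location'], ' | ', $e['date'], "\n";
}
```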
MadTechie, posted October 17, 2008

Cool. Regexes get easier with practice, but they can be hell as well.
discomatt, posted October 17, 2008

Here's my take on it:

```php
<pre><?php
// mysqli replaces the mysql_* functions from the original post (removed in PHP 7);
// a connection is still needed because escaping depends on its character set.
$link = mysqli_connect( 'localhost', 'root', '' );
// $data = file_get_contents( 'guestbook.txt' );
$data = <<<DATA
<b>Hey Darren, Congratulations on your Broadway successes as well. Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>
<b>Hey Darren, Congratulations on your Broadway's successes as well. Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>
DATA;
/***************************
THIS REGEX IS VERY SLOW AND INEFFICIENT
I designed it as quickly and as simply as possible, assuming you're only going to
run this script once to convert your data to MySQL. If this will be used on-the-fly,
I can write a sleeker regex pattern.
***************************/
$regex = '%\s*+<b>(.*?)</b><br>\s*+(.*?) <(.*?)><br>\s*+(.*?) - (.*?)<hr>%si';
preg_match_all( $regex, $data, $posts, PREG_SET_ORDER );
$query = <<<SQL
INSERT INTO `jos_phocaguestbook_items`
    (`username`, `email`, `title`, `content`, `date`, `published`, `ordering`)
VALUES
SQL;
$values = array();
foreach( $posts as $post ) {
    unset( $post[0] );
    $post[5] = date( 'Y-m-d H:i:s', strtotime( str_replace(' at', '', $post[5]) ) );
    foreach( $post as &$val )
        $val = mysqli_real_escape_string( $link, $val );
    unset( $val ); // break the reference left behind by foreach
    $values[] = "\t\t('$post[2]', '$post[3]', '$post[4]', '$post[1]', '$post[5]', 1, 1)";
}
$query .= implode( ",\n", $values );
echo $query;
?></pre>
```
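An alternative to building the SQL text with escaped values is a prepared statement, where the values travel separately from the SQL so no manual escaping is needed (note the apostrophe in "Broadway's"). The sketch below uses PDO with an in-memory SQLite database and a cut-down version of the table purely so it runs on its own; it assumes the pdo_sqlite extension is available. For the real migration the DSN would be a mysql: DSN and the table would be the full jos_phocaguestbook_items. The $rows layout is a hypothetical output of the parsing step.

```php
<?php
// Assumption: in-memory SQLite and a trimmed table, only so this runs standalone.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE jos_phocaguestbook_items (
    username TEXT, email TEXT, title TEXT, content TEXT,
    date TEXT, published INTEGER, ordering INTEGER)');

// Hypothetical parsed rows: [name, email, location, message, MySQL-style date]
$rows = [
    ['Tad', 'info@globaldog.com', 'Nashville, TN USA',
     "Hey Darren, Congratulations on your Broadway's successes as well. Best, Tad",
     '2003-09-06 10:42:56'],
];

// Placeholders (?) keep the data out of the SQL text entirely.
$stmt = $pdo->prepare('INSERT INTO jos_phocaguestbook_items
    (username, email, title, content, date, published, ordering)
    VALUES (?, ?, ?, ?, ?, 1, 1)');

foreach ($rows as $row) {
    $stmt->execute($row);
}

echo $pdo->query('SELECT COUNT(*) FROM jos_phocaguestbook_items')->fetchColumn(), "\n"; // 1
```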
discomatt, posted October 17, 2008

> Yes, I believe that has done it!! Thank you very much!! I wish I could understand the expression, though.

I recommend http://www.regexbuddy.com/. Well worth the 40 bucks, IMO.
MadTechie, posted October 17, 2008

Yeah, it's a good app, but you still need to know the syntax, as the builder messes up a lot.
discomatt, posted October 17, 2008

> Yeah, it's a good app, but you still need to know the syntax, as the builder messes up a lot.

I use it more as a real-time reference; the builder is nice for the occasional keyword help (I can never remember all the quantifiers or other minor things). I agree the builder isn't suitable for building complex regexes from the ground up, but the included help files are more than enough to get you around any assumptions the builder might make. I especially like the debugging feature: it really helps in building streamlined regexes, and it gives you a visual, dynamic look at how the regex engine works.