
[SOLVED] help with preg_match_all expression


Hello,

I need a little help writing a preg_match_all expression.

I have a file with recurring sets of guestbook entries. Each entry looks like:

<b>Hey Darren,

Congratulations on your Broadway successes as well.

Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>

I have put the whole file into a PHP variable, $oldStuff.

I want to use preg_match_all to convert the entries into the proper syntax for a MySQL insert.

Here is an example of the insert I want to do:

INSERT INTO `jos_phocaguestbook_items` (`id`, `catid`, `sid`, `username`, `userid`, `email`, `homesite`, `ip`, `title`, `content`, `date`, `published`, `checked_out`, `checked_out_time`, `ordering`, `params`) VALUES
(1, 1, 0, 'Alex Fregal',0,'','','','Tuscon, AZ USA', 'Saw you in Tuscon in 2005 with Movin Out and was blown away by your performance and stamina.Merry Christmas and a very successful 2008 and beyond.', '2008-09-26 11:41:53', 1, 0, '0000-00-00 00:00:00', 1, ''),

And here is my preg_match_all so far:

preg_match_all("|<[^>]+>(.*)</[^>]+>|U",$oldStuff,$out,PREG_PATTERN_ORDER);
foreach($out as $input){
echo $input[2].','.$input[1].','.$input[0];
}

which only grabs the text between an opening tag <...> and its closing tag </...>.

I also need to grab the name that is between the two <br>s

and the location that is between the last <br> and the "-".

Lastly, I need to grab and convert the date, which is between the "-" and the <hr>.

Can anyone help me write the preg_match_all to do this?

Thanks

Hmm, try this:

<?php
preg_match_all("%<[^>]+>(.*)</[^>]+><br>\s*(.*?)\s<.*?;<br>\s*(.*?)\s-\s(.*?)<hr>%si",$oldStuff,$out,PREG_PATTERN_ORDER);
foreach($out as $input){
echo $input[1].",";
echo $input[2].",";
echo $input[3].",";
echo $input[4]."<br>";

}
?>

The way to work with strings is to think simple.

Break the work into simple steps. Here's a way without using a complex regex:

$file = "file";
// since you said the entries recur, we can split on <b>
$data = explode("<b>", file_get_contents($file));
foreach ($data as $k => $v) {
    if (!empty($v)) {
        // split on <br>
        $s = explode("<br>", $v);
        print_r($s);
        // work on $s from here to get your items
    }
}
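To fill in the "work on $s from here" step, here is a minimal sketch that continues the same split-on-<b>-then-<br> idea with plain string functions. It assumes every entry has exactly the layout shown in the question (message, `</b><br>`, `Name <email><br>`, `Location - Date<hr>`); the helper name `parseEntry` and the array keys are hypothetical.

```php
<?php
// Hypothetical helper: pull the fields out of ONE chunk produced by
// explode("<b>", ...), i.e. "message</b><br>Name <email><br>Location - Date<hr>".
// Returns null for chunks that don't have that layout.
function parseEntry(string $chunk): ?array
{
    // Keep only the part before "<hr>", then split the message from the rest.
    $chunk = explode("<hr>", $chunk)[0];
    $parts = explode("</b><br>", $chunk, 2);
    if (count($parts) !== 2) {
        return null;
    }
    [$message, $rest] = $parts;

    // $lines[0] = "Name <email>", $lines[1] = "Location - Date"
    $lines = explode("<br>", $rest, 2);
    if (count($lines) !== 2) {
        return null;
    }

    // The name is everything before " <"; the email sits between "<" and ">".
    $lt    = strpos($lines[0], " <");
    $name  = trim($lt === false ? $lines[0] : substr($lines[0], 0, $lt));
    $email = $lt === false ? "" : rtrim(substr($lines[0], $lt + 2), "> \t\r\n");

    // Split on the LAST " - ": the date contains no " - ", the location might.
    $dash = strrpos($lines[1], " - ");
    if ($dash === false) {
        return null;
    }

    return [
        'message'  => trim($message),
        'name'     => $name,
        'email'    => $email,
        'location' => trim(substr($lines[1], 0, $dash)),
        'date'     => trim(substr($lines[1], $dash + 3)),
    ];
}

// Usage with the sample entry from the question.
$oldStuff = "<b>Hey Darren,\n\nCongratulations on your Broadway successes as well.\n\n"
          . "Best, Tad</b><br>\nTad <info@globaldog.com><br>\n"
          . "Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>\n";

$entries = [];
foreach (explode("<b>", $oldStuff) as $chunk) {
    $entry = parseEntry($chunk); // the empty chunk before the first <b> yields null
    if ($entry !== null) {
        $entries[] = $entry;
    }
}
print_r($entries);
```

If the file stores the email as `&lt;...&gt;` rather than `<...>` (as one of the later excerpts in this thread shows), running the text through `html_entity_decode()` first would let the `" <"` split work unchanged.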


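PREG_PATTERN_ORDER should be PREG_SET_ORDER, so that each $input in the foreach holds one whole entry's captures instead of one capture group collected across all entries.

Example: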

<?php
$oldStuff = '
<b>Hey Darren,

Congratulations on your Broadway successes as well.

Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>';
preg_match_all("%<[^>]+>(.*)</[^>]+><br>\s*(.*?)\s<.*?;<br>\s*(.*?)\s-\s(.*?)<hr>%si",$oldStuff,$out,PREG_SET_ORDER);
foreach($out as $input)
{
echo $input[1].",";
echo $input[2].",";
echo $input[3].",";
echo $input[4]."<br>";

}
?>
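To see why the flag matters, here is a small sketch comparing the two array layouts. The toy string and pattern are illustrative only, not the guestbook regex.

```php
<?php
// Two toy "entries", each with a name and a city.
$text = "name=Tad city=Nashville; name=Anthony city=USA;";
$re   = '/name=(\w+) city=(\w+);/';

// PREG_PATTERN_ORDER (the default): $byPattern[g] holds group g for ALL matches.
preg_match_all($re, $text, $byPattern, PREG_PATTERN_ORDER);
// $byPattern[1] is ['Tad', 'Anthony'], $byPattern[2] is ['Nashville', 'USA']

// PREG_SET_ORDER: $bySet[m] holds every group of match m, i.e. one entry.
preg_match_all($re, $text, $bySet, PREG_SET_ORDER);
// $bySet[0] is ['name=Tad city=Nashville;', 'Tad', 'Nashville']

// Only with PREG_SET_ORDER does "foreach ($out as $input)" visit one entry per pass.
foreach ($bySet as $input) {
    echo $input[1] . "," . $input[2] . "\n";
}
```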

That seems to work with just the one entry, but if I add another like this:

<?php
$oldxStuff = '
<b>just saw darren perform on broadway with movin out. he was amazing. he would make billy joel proud. congrats darren.</b><br>
anthony sharkey <sharkeyanthony@aol.com><br>
USA - Thursday, September 11, 2003 at 17:23:17 (EDT)<hr>

<b>Hey Darren,

Congratulations on your Broadway successes as well.

Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>';
preg_match_all("%<[^>]+>(.*)</[^>]+><br>\s*(.*?)\s<.*?;<br>\s*(.*?)\s-\s(.*?)<hr>%si",$oldxStuff,$out,PREG_SET_ORDER);
foreach($out as $input)
{
   echo $input[1].",";
   echo $input[2].",";
   echo $input[3].",";
   echo $input[4]."<br>";

}
?>

The output looks like:

just saw darren perform on broadway with movin out. he was amazing. he would make billy joel proud. congrats darren.</b><br>
anthony sharkey <sharkeyanthony@aol.com><br>
USA - Thursday, September 11, 2003 at 17:23:17 (EDT)<hr>

<b>Hey Darren,

Congratulations on your Broadway successes as well.

Best, Tad,Tad,Nashville, TN USA,Saturday, September 06, 2003 at 10:42:56 (EDT)<br>


??

That looks correct.

Try changing the output so each group is labelled and its HTML is escaped, i.e.:

echo "<br>LINE1: ".htmlspecialchars($input[1])."<br>------------------------------------------------------<br>";
echo "<br>LINE2: ".htmlspecialchars($input[2])."<br>------------------------------------------------------<br>";

etc.

Thanks for the help, MadTechie, but it is actually putting everything except the last entry into the first variable, [1], so my HTML looks like:

<br>LINE1: I was in the fourth row last night and couldn't get enough of your eyes and that incredible smile of yours. Not to mention the voice. You are all great, but I would pay to see you alone on stage.</b><br>
janet Ramstein<br>
Pittsburgh, PA USA - Wednesday, September 17, 2008 at 14:05:11 (EDT)<hr>

<b>Hi Darren,its been a while,just saw an add for the high kings.Congratulations on all you,ve achieved,best wishes always,ann Q.</b><br>
ann quinlan &lt;annquinlan1311@hotmail.com&gt;<br>

clonmel, tipperary ireland - Friday, September 12, 2008 at 17:10:16 (EDT)<hr>
...lots of other entries here
then 
Best, Tad<br>------------------------------------------------------<br><br>LINE2: Tad<br>------------------------------------------------------<br><br>LINE1: Nashville, TN USA<br>------------------------------------------------------<br><br>LINE2: Saturday, September 06, 2003 at 10:42:56 (EDT)<br>------------------------------------------------------<br>

NOT quite what I need.

I should have as many LINE1: blocks in my HTML as I have entries, correct?

With:

preg_match_all("%<[^>]+>(.*)</[^>]+><br>\s*(.*?)\s<.*?;<br>\s*(.*?)\s-\s(.*?)<hr>%si",$oldStuff,$out,PREG_SET_ORDER);
foreach($out as $input)
{
$d = $input[4];
$d1 = str_replace('at ', '', $d);
$d2 = date ('Y-m-d H:i:s', strtotime($d1));

echo "<br>LINE1: ".htmlspecialchars($input[1])."<br>------------------------------------------------------<br>";
echo "<br>LINE2: ".htmlspecialchars($input[2])."<br>------------------------------------------------------<br>";
echo "<br>LINE3: ".htmlspecialchars($input[3])."<br>------------------------------------------------------<br>";
echo "<br>LINE4: ".$d2."<br>------------------------------------------------------<br>";

}

I am very close, but again all the entries end up in LINE1:, except that $input[2] through $input[4] are pulling from the last entry.

Could this problem be with the foreach?
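The foreach is probably not the culprit. With the `s` modifier, a greedy `(.*)` in the first group runs from the first `<b>` to the last closing tag that still lets the rest of the pattern match, so the whole file collapses into one match whose other groups come from the final entry. A minimal sketch on a simplified two-entry string (the data and patterns are illustrative, not the thread's full regex) compares greedy `(.*)`, lazy `(.*?)`, and the negated class `([^>]*)` that the next reply switches to:

```php
<?php
// Two simplified entries: message in <b>...</b>, then a name, then <hr>.
$data = "<b>first message</b><br>\nAnn<br>\n<hr>\n"
      . "<b>second message</b><br>\nTad<br>\n<hr>";

$patterns = [
    'greedy'  => '%<b>(.*)</b><br>\s*(.*?)<br>%s',   // (.*) backs off only to the LAST </b><br>
    'lazy'    => '%<b>(.*?)</b><br>\s*(.*?)<br>%s',  // (.*?) stops at the FIRST </b><br>
    'negated' => '%<b>([^>]*)</b><br>\s*(.*?)<br>%s' // [^>]* cannot cross any tag's ">"
];

$results = [];
foreach ($patterns as $label => $re) {
    preg_match_all($re, $data, $m, PREG_SET_ORDER);
    $results[$label] = $m;
    printf("%-8s %d match(es), first name: %s\n", $label, count($m), $m[0][2]);
}
```

The greedy pattern yields a single match whose name is "Tad" (the last entry), while the other two yield one match per entry. One trade-off: `[^>]*` will also refuse to match a message that itself contains a `>`, which the lazy `(.*?)` handles.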

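Okay, just tested one (slight update: the first group is now ([^>]*) instead of (.*), so it can no longer run past the end of a single entry):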

<?php
$oldxStuff = '
<b>just saw darren perform on broadway with movin out. he was amazing. he would make billy joel proud. congrats darren.</b><br>
anthony sharkey <sharkeyanthony@aol.com><br>
USA - Thursday, September 11, 2003 at 17:23:17 (EDT)<hr>

<b>Hey Darren,

Congratulations on your Broadway successes as well.

Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>';
preg_match_all("%<[^>]+>([^>]*)</[^>]+><br>\s*(.*?)\s<.*?;<br>\s*(.*?)\s-\s(.*?)<hr>%si",$oldxStuff,$out,PREG_SET_ORDER);
foreach($out as $input)
{
   echo "<br>LINE1: ".htmlspecialchars($input[1])."<br>------------------------------------------------------<br>";
   echo "<br>LINE2: ".htmlspecialchars($input[2])."<br>------------------------------------------------------<br>";
   echo "<br>LINE3: ".htmlspecialchars($input[3])."<br>------------------------------------------------------<br>";
   echo "<br>LINE4: ".htmlspecialchars($input[4])."<br>------------------------------------------------------<br>";
}
?>

Output:

LINE1: just saw darren perform on broadway with movin out. he was amazing. he would make billy joel proud. congrats darren.
------------------------------------------------------
LINE2: anthony sharkey
------------------------------------------------------
LINE3: USA
------------------------------------------------------
LINE4: Thursday, September 11, 2003 at 17:23:17 (EDT)
------------------------------------------------------
LINE1: Hey Darren, Congratulations on your Broadway successes as well. Best, Tad
------------------------------------------------------
LINE2: Tad
------------------------------------------------------
LINE3: Nashville, TN USA
------------------------------------------------------
LINE4: Saturday, September 06, 2003 at 10:42:56 (EDT)
------------------------------------------------------

Here's my take on it:

<pre><?php

// mysqli_real_escape_string() needs an open connection
$link = mysqli_connect( 'localhost', 'root', '' );

// $data = file_get_contents( 'guestbook.txt' );
$data = <<<DATA
<b>Hey Darren,

Congratulations on your Broadway successes as well.

Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>
<b>Hey Darren,

Congratulations on your Broadway's successes as well.

Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>
DATA;

/***************************
THIS REGEX IS VERY SLOW AND INEFFICIENT
I designed it as quickly and as simply as possible, assuming you're only going
to run this script once to convert your data to MySQL. If this will be used on-the-fly,
I can write a sleeker regex pattern.
***************************/
$regex = '%\s*+<b>(.*?)</b><br>\s*+(.*?) <(.*?)><br>\s*+(.*?) - (.*?)<hr>%si';

preg_match_all( $regex, $data, $posts, PREG_SET_ORDER );

$query = <<<SQL
INSERT INTO
	`jos_phocaguestbook_items`
		(`username`, `email`, `title`, `content`, `date`, `published`, `ordering`)
VALUES

SQL;

$values = array();
foreach( $posts as $post ) {
	unset( $post[0] ); // drop the full match, keep the five groups
	// drop " at" and convert the date to MySQL DATETIME format
	$post[5] = date( 'Y-m-d H:i:s', strtotime( str_replace(' at', '', $post[5]) ) );
	foreach( $post as &$val )
		$val = mysqli_real_escape_string( $link, $val );
	unset( $val ); // break the reference left by the foreach
	// name, email, location (as title), message (as content), date
	$values[] = "\t\t('$post[2]', '$post[3]', '$post[4]', '$post[1]', '$post[5]', 1, 1)";
}

$query .= implode( ",\n", $values );

echo $query;

?></pre>
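Both the question and this answer convert the date by stripping "at" and handing the rest to strtotime(). If you would rather not rely on strtotime() guessing the layout, here is a sketch using an explicit format with DateTime::createFromFormat(). It assumes every date looks exactly like "Saturday, September 06, 2003 at 10:42:56 (EDT)" and simply drops the weekday and the timezone label, so the stored time stays in EDT wall-clock time rather than being converted. The helper name `convertGuestbookDate` is hypothetical.

```php
<?php
// Hypothetical helper: "Saturday, September 06, 2003 at 10:42:56 (EDT)"
// -> "2003-09-06 10:42:56" (MySQL DATETIME format), or null if it doesn't fit.
function convertGuestbookDate(string $raw): ?string
{
    // Drop the leading weekday ("Saturday, ").
    $comma = strpos($raw, ', ');
    if ($comma === false) {
        return null;
    }
    $rest = substr($raw, $comma + 2);

    // Drop the trailing timezone label such as "(EDT)".
    $rest = trim(preg_replace('/\(\w+\)\s*$/', '', $rest));

    // "!" resets any field not in the format; \a\t matches the literal word "at".
    $dt = DateTime::createFromFormat('!F d, Y \a\t H:i:s', $rest);
    return $dt === false ? null : $dt->format('Y-m-d H:i:s');
}

echo convertGuestbookDate('Saturday, September 06, 2003 at 10:42:56 (EDT)'), "\n";
echo convertGuestbookDate('Thursday, September 11, 2003 at 17:23:17 (EDT)'), "\n";
```

Returning null instead of a date makes malformed entries easy to spot before they reach the INSERT, whereas strtotime() returns false and date() would then silently produce 1970-01-01.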

Yeah, it's a good app, but you still need to know the syntax, as the builder messes up a lot.

I use it more as a real-time reference... the builder is nice for the occasional keyword help (I can never remember all the quantifiers or other minor things). I agree the builder isn't suitable for ground-up complex regex, but the included help files are more than enough to get you around any assumptions the builder might make.

I especially like the debugging feature... it really helps in building streamlined regexes, and gives you a visual, dynamic look at how the regex engine works.
