
[SOLVED] help with preg_match_all expression


samoht


hello,

 

I need a little help writing a preg_match_all expression.

 

I have a file with recurring sets of guestbook entries  - each entry looks like:

 

<b>Hey Darren,

Congratulations on your Broadway successes as well.

Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>

 

and I have put the whole file into a php variable $oldStuff.

 

I want to preg_match_all to be able to convert the entries into a proper syntax for a mysql insert.

here is an example of the insert I want to do:

INSERT INTO `jos_phocaguestbook_items` (`id`, `catid`, `sid`, `username`, `userid`, `email`, `homesite`, `ip`, `title`, `content`, `date`, `published`, `checked_out`, `checked_out_time`, `ordering`, `params`) VALUES
(1, 1, 0, 'Alex Fregal',0,'','','','Tuscon, AZ USA', 'Saw you in Tuscon in 2005 with Movin Out and was blown away by your performance and stamina.Merry Christmas and a very successful 2008 and beyond.', '2008-09-26 11:41:53', 1, 0, '0000-00-00 00:00:00', 1, ''),

 

And here is my preg_match_all so far:

preg_match_all("|<[^>]+>(.*)</[^>]+>|U",$oldStuff,$out,PREG_PATTERN_ORDER);
foreach($out as $input){
echo $input[2].','.$input[1].','.$input[0];
}

 

which only grabs the info between <> and </>

I also need to grab the name that is between the <br>'s

and the location that is between the last <br> and a "-"

then lastly I need to grab and convert the date which is between the "-" and the <hr>

 

Can anyone help me write the preg_match_all to do this??

 

Thanks

Hmm, try this:

 

<?php
preg_match_all("%<[^>]+>(.*)</[^>]+><br>\s*(.*?)\s<.*?><br>\s*(.*?)\s-\s(.*?)<hr>%si",$oldStuff,$out,PREG_PATTERN_ORDER);
foreach($out as $input){
echo $input[1].",";
echo $input[2].",";
echo $input[3].",";
echo $input[4]."<br>";

}
?>

The way to work with strings is to think simple.

Break the work into simple steps. Here's a way to do it without a complex regex.

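$file = "file";
// Since the entries recur, we can split the file on <b>.
// explode() splits on a literal string, so no regex is involved.
$data = explode("<b>", file_get_contents($file));
foreach ($data as $k => $v) {
    // skip the empty chunk in front of the first <b>
    if (trim($v) !== '') {
        // split each entry on <br>
        $s = explode("<br>", $v);
        print_r($s);
        // work on $s from here to get your items.
    }
}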
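Carrying the split idea through to the individual fields might look something like the sketch below. It is only a rough outline: `parse_entry()` is a made-up helper name, and it assumes every entry ends with `<hr>`, that the e-mail is optional and may be written with either `< >` or `&lt; &gt;`, and that the location itself contains no " - ".

```php
<?php
// Hypothetical helper (not from the thread): parse one entry, i.e. the text
// in front of an <hr>, into its fields. Returns null for incomplete chunks.
function parse_entry(string $entry): ?array
{
    $parts = explode('<br>', trim($entry));
    if (count($parts) < 3) {
        return null;
    }

    // Work from the end so that a <br> inside the message does no harm.
    $where   = trim(array_pop($parts));
    $name    = trim(array_pop($parts));
    $message = trim(implode('<br>', $parts));

    // Drop the surrounding <b>...</b> from the message.
    $message = preg_replace('%^<b>(.*)</b>$%s', '$1', $message);

    // Assumption: the e-mail is optional and wrapped in < > or &lt; &gt;.
    $email = '';
    if (preg_match('/^(.*?)\s*(?:<|&lt;)([^<>&\s]+)(?:>|&gt;)$/', $name, $m)) {
        $name  = $m[1];
        $email = $m[2];
    }

    // "location - date": split on the first " - " only.
    $pieces   = explode(' - ', $where, 2);
    $location = $pieces[0];
    $date     = $pieces[1] ?? '';

    return compact('message', 'name', 'email', 'location', 'date');
}

$oldStuff = '<b>Hey Darren,

Congratulations on your Broadway successes as well.

Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>';

// One chunk per <hr>; array_filter() drops the null for the trailing chunk.
$entries = array_values(array_filter(array_map('parse_entry', explode('<hr>', $oldStuff))));
print_r($entries);
```

Because each entry is parsed on its own, a pattern that goes wrong can only spoil that one entry rather than swallow its neighbours.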


PREG_PATTERN_ORDER should be PREG_SET_ORDER, so that each pass of the foreach gets one complete entry with all of its groups.

Example:

<?php
$oldStuff = '
<b>Hey Darren,

Congratulations on your Broadway successes as well.

Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>';
preg_match_all("%<[^>]+>(.*)</[^>]+><br>\s*(.*?)\s<.*?><br>\s*(.*?)\s-\s(.*?)<hr>%si",$oldStuff,$out,PREG_SET_ORDER);
foreach($out as $input)
{
echo $input[1].",";
echo $input[2].",";
echo $input[3].",";
echo $input[4]."<br>";

}
?>
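To see what the two flags do to the shape of the result, here is a small stand-alone comparison. The input string and pattern are invented for the illustration and have nothing to do with the guestbook data.

```php
<?php
$text = 'a=1, b=2';

// PREG_PATTERN_ORDER (the default): $byPattern[0] holds all full matches,
// $byPattern[1] all first groups, $byPattern[2] all second groups.
preg_match_all('/(\w)=(\d)/', $text, $byPattern, PREG_PATTERN_ORDER);

// PREG_SET_ORDER: $bySet[0] is the first match with its groups,
// $bySet[1] is the second match with its groups.
preg_match_all('/(\w)=(\d)/', $text, $bySet, PREG_SET_ORDER);

// With PREG_SET_ORDER each loop pass is one complete match.
foreach ($bySet as $match) {
    echo $match[1], ' is ', $match[2], "\n";
}
```

With PREG_PATTERN_ORDER the same foreach would walk over the groups instead of the matches, which is why `$input[1]` and `$input[2]` came out mixed up.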

That seems to work with just the one entry, but if I add another, like:

<?php
$oldxStuff = '
<b>just saw darren perform on broadway with movin out. he was amazing. he would make billy joel proud. congrats darren.</b><br>
anthony sharkey <sharkeyanthony@aol.com><br>
USA - Thursday, September 11, 2003 at 17:23:17 (EDT)<hr>

<b>Hey Darren,

Congratulations on your Broadway successes as well.

Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>';
preg_match_all("%<[^>]+>(.*)</[^>]+><br>\s*(.*?)\s<.*?><br>\s*(.*?)\s-\s(.*?)<hr>%si",$oldxStuff,$out,PREG_SET_ORDER);
foreach($out as $input)
{
   echo $input[1].",";
   echo $input[2].",";
   echo $input[3].",";
   echo $input[4]."<br>";

}
?>

 

The output looks like:

 

just saw darren perform on broadway with movin out. he was amazing. he would make billy joel proud. congrats darren.</b><br>
anthony sharkey <sharkeyanthony@aol.com><br>
USA - Thursday, September 11, 2003 at 17:23:17 (EDT)<hr>

<b>Hey Darren,

Congratulations on your Broadway successes as well.

Best, Tad,Tad,Nashville, TN USA,Saturday, September 06, 2003 at 10:42:56 (EDT)<br>


??

That looks correct.

 

Try changing the output, i.e.:

echo "<br>LINE1: ".htmlspecialchars($input[1])."<br>------------------------------------------------------<br>";
echo "<br>LINE2: ".htmlspecialchars($input[2])."<br>------------------------------------------------------<br>";

etc

Thanks for the help MadTechie, but it is actually putting everything into the first variable [1] except the last entry, so my HTML looks like:

 

<br>LINE1: I was in the fourth row last night and couldn't get enough of your eyes and that incredible smile of yours. Not to mention the voice. You are all great, but I would pay to see you alone on stage.</b><br>
janet Ramstein<br>
Pittsburgh, PA USA - Wednesday, September 17, 2008 at 14:05:11 (EDT)<hr>

<b>Hi Darren,its been a while,just saw an add for the high kings.Congratulations on all you,ve achieved,best wishes always,ann Q.</b><br>
ann quinlan &lt;annquinlan1311@hotmail.com&gt;<br>

clonmel, tipperary ireland - Friday, September 12, 2008 at 17:10:16 (EDT)<hr>
...lots of other entries here
then 
Best, Tad<br>------------------------------------------------------<br><br>LINE2: Tad<br>------------------------------------------------------<br><br>LINE1: Nashville, TN USA<br>------------------------------------------------------<br><br>LINE2: Saturday, September 06, 2003 at 10:42:56 (EDT)<br>------------------------------------------------------<br>

 

Not quite what I need.

I should have as many LINE1: lines in my HTML as I have entries, correct?

 

with

preg_match_all("%<[^>]+>(.*)</[^>]+><br>\s*(.*?)\s<.*?><br>\s*(.*?)\s-\s(.*?)<hr>%si",$oldStuff,$out,PREG_SET_ORDER);
foreach($out as $input)
{
$d = $input[4];
$d1 = str_replace('at ', '', $d);
$d2 = date ('Y-m-d H:i:s', strtotime($d1));

echo "<br>LINE1: ".htmlspecialchars($input[1])."<br>------------------------------------------------------<br>";
echo "<br>LINE2: ".htmlspecialchars($input[2])."<br>------------------------------------------------------<br>";
echo "<br>LINE3: ".htmlspecialchars($input[3])."<br>------------------------------------------------------<br>";
echo "<br>LINE4: ".$d2."<br>------------------------------------------------------<br>";

}
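On the date conversion in the loop above: strtotime() has to guess at the layout, so spelling the layout out may be more predictable. The following is a sketch only. `guestbook_date_to_mysql()` is a made-up helper name, and it assumes every date looks exactly like "Saturday, September 06, 2003 at 10:42:56 (EDT)".

```php
<?php
// Hypothetical helper (not from the thread): turn the guestbook date into
// the 'Y-m-d H:i:s' text MySQL expects, or null if the layout differs.
function guestbook_date_to_mysql(string $raw): ?string
{
    // l = weekday name, F = month name, \a\t = the literal word "at",
    // (T) = a timezone abbreviation in parentheses.
    $dt = DateTime::createFromFormat('l, F d, Y \a\t H:i:s (T)', trim($raw));

    return $dt === false ? null : $dt->format('Y-m-d H:i:s');
}

var_dump(guestbook_date_to_mysql('Saturday, September 06, 2003 at 10:42:56 (EDT)'));
var_dump(guestbook_date_to_mysql('not a date'));
```

The time is kept as the wall-clock time shown in the entry and is not shifted to another timezone. Anything that does not fit the layout comes back as null, so it can be checked by hand instead of being silently stored as a wrong date.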

 

I am very close, but again all the entries end up in LINE1:, except that input[2] through [4] are pulled from the last entry.

 

Could this problem be with the foreach?

Okay, just tested one (slight update):

<?php
$oldxStuff = '
<b>just saw darren perform on broadway with movin out. he was amazing. he would make billy joel proud. congrats darren.</b><br>
anthony sharkey <sharkeyanthony@aol.com><br>
USA - Thursday, September 11, 2003 at 17:23:17 (EDT)<hr>

<b>Hey Darren,

Congratulations on your Broadway successes as well.

Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>';
preg_match_all("%<[^>]+>([^>]*)</[^>]+><br>\s*(.*?)\s<.*?><br>\s*(.*?)\s-\s(.*?)<hr>%si",$oldxStuff,$out,PREG_SET_ORDER);
foreach($out as $input)
{
   echo "<br>LINE1: ".htmlspecialchars($input[1])."<br>------------------------------------------------------<br>";
   echo "<br>LINE2: ".htmlspecialchars($input[2])."<br>------------------------------------------------------<br>";
   echo "<br>LINE3: ".htmlspecialchars($input[3])."<br>------------------------------------------------------<br>";
   echo "<br>LINE4: ".htmlspecialchars($input[4])."<br>------------------------------------------------------<br>";
}
?>

 

output

  Quote

 

LINE1: just saw darren perform on broadway with movin out. he was amazing. he would make billy joel proud. congrats darren.

------------------------------------------------------

 

LINE2: anthony sharkey

------------------------------------------------------

 

LINE3: USA

------------------------------------------------------

 

LINE4: Thursday, September 11, 2003 at 17:23:17 (EDT)

------------------------------------------------------

 

LINE1: Hey Darren, Congratulations on your Broadway successes as well. Best, Tad

------------------------------------------------------

 

LINE2: Tad

------------------------------------------------------

 

LINE3: Nashville, TN USA

------------------------------------------------------

 

LINE4: Saturday, September 06, 2003 at 10:42:56 (EDT)

------------------------------------------------------
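For anyone wondering why the slight update fixes it: the foreach was never the problem. In the earlier pattern the first group was a greedy `(.*)` with the `s` modifier, so it ran to the end of the file and backtracked only as far as the last closing tag. A cut-down illustration, with input invented for the demo:

```php
<?php
$html = '<b>first</b><br><b>second</b><br>';

// Greedy: (.*) runs to the end, then backtracks to the LAST </b><br>,
// so both entries end up inside a single match.
preg_match_all('%<b>(.*)</b><br>%s', $html, $greedy, PREG_SET_ORDER);

// Lazy: (.*?) stops at the first </b><br> it can reach.
preg_match_all('%<b>(.*?)</b><br>%s', $html, $lazy, PREG_SET_ORDER);

// Negated class, as in the updated pattern: [^>]* cannot cross the ">"
// that ends a tag, so it cannot run into the next entry.
preg_match_all('%<b>([^>]*)</b><br>%s', $html, $negated, PREG_SET_ORDER);

printf("greedy: %d match(es), lazy: %d, negated: %d\n",
    count($greedy), count($lazy), count($negated));
```

One caveat with the negated class: a message that itself contains a ">" would stop matching, so the lazy form may be the safer choice for free-text guestbook entries.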

Here's my take on it

 

<pre><?php

// The connection is needed so mysqli_real_escape_string() can escape values
$db = mysqli_connect( 'localhost', 'root', '' );

// $data = file_get_contents( 'guestbook.txt' );
$data = <<<DATA
<b>Hey Darren,

Congratulations on your Broadway successes as well.

Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>
<b>Hey Darren,

Congratulations on your Broadway's successes as well.

Best, Tad</b><br>
Tad <info@globaldog.com><br>
Nashville, TN USA - Saturday, September 06, 2003 at 10:42:56 (EDT)<hr>
DATA;

/***************************
THIS REGEX IS VERY SLOW AND INEFFICIENT
I designed it as quickly and as simply as possible, assuming you're only going
to run this script once to convert your data to MySQL. If this will be used on-the-fly
I can write a sleeker regex pattern
***************************/
$regex = '%\s*+<b>(.*?)</b><br>\s*+(.*?) <(.*?)><br>\s*+(.*?) - (.*?)<hr>%si';

preg_match_all( $regex, $data, $posts, PREG_SET_ORDER );

$query = <<<SQL
INSERT INTO
	`jos_phocaguestbook_items`
		(`username`, `email`, `title`, `content`, `date`, `published`, `ordering`)
VALUES

SQL;

$values = array();
foreach( $posts as $post ) {
	// $post[0] is the whole match; only the captured groups are needed
	unset( $post[0] );
	// 1 = content, 2 = username, 3 = email, 4 = location (title), 5 = date
	$post[5] = date(  'Y-m-d H:i:s', strtotime( str_replace(' at', '', $post[5]) )  );
	foreach( $post as &$val )
		$val = mysqli_real_escape_string( $db, $val );
	unset( $val ); // break the reference left behind by the loop
	$values[] = "\t\t('$post[2]', '$post[3]', '$post[4]', '$post[1]', '$post[5]', 1, 1)";
}

$query .= implode( ",\n", $values );

echo $query;

?></pre>

  Quote

Yeah, it's a good app, but you still need to know the syntax, as the builder messes up a lot

 

I use it more as a real-time reference... the builder is nice for the occasional keyword help (I can never remember all the quantifiers or other minor things). I agree the builder isn't suitable for building a complex regex from the ground up, but the help files included are more than enough to get you around any assumptions the builder might make.

 

I especially like the debugging feature... it really helps in building a streamlined regex, and gives you a visual, dynamic look at how the regex engine works.
