Recursive String Replace?


crazytonyi

I've written a script that converts a source of data into XML. It follows a pretty standard pattern, but it is not consistent (that sounded weird...). Here's an example:

 

<schedule>
<employee name="Joe Smith">
<day date="10/13/2008">
	<shift><descr>Sweeping</descr><starttime>7:45 AM</starttime><endtime>12:15 PM</endtime></shift>
	<shift><descr>Yodeling</descr><starttime>1:15 PM</starttime><endtime>6:00 PM</endtime></shift>
</day>
<day date="10/14/2008">
	<shift><descr>Mopping</descr><starttime>7:45 AM</starttime><endtime>11:15 AM</endtime></shift>
</day>
</employee>
<consultant name="Butternut McClintock">
<day date="10/13/2008">
	<shift><descr>Cliff Diving</descr><starttime>7:45 AM</starttime><endtime>12:15 PM</endtime></shift>
</day>
<day date="10/14/2008">
	<shift><descr>Chewing Food</descr><starttime>7:45 AM</starttime><endtime>11:15 AM</endtime></shift>
	<shift><descr>Crackin' Wise</descr><starttime>1:15 PM</starttime><endtime>6:00 PM</endtime></shift>
</day>
</employee>
</schedule>

 

The problem is that I want to move the date in front of the appropriate times (start and end times) so that I can ditch the day tag and just have a list of shifts per employee. But I don't want to create arrays (yet) and I can't use regex (I don't think) because the number of shifts changes per day, so it's not regular enough.

 

Is there a way, preferably without using loops, to just say "find this regular expression, and while you're at it, find this OTHER regular expression that comes after it, and then stop looking when you get to this end point" ? I thought maybe I could use a * or + to imply that the entire section repeats, but that got really confusing and kept throwing errors.

 

Help! Please!

 

Thanks


Thanks for replying. Sorry, this is all VERY new for me. I've got a ton more problems, but this one may lead me to a generally better understanding. Okay, so the input is:

<schedule>
<employee name="Joe Smith">
   <day date="10/13/2008">
      <shift><descr>Sweeping</descr><starttime>7:45 AM</starttime><endtime>12:15 PM</endtime></shift>
      <shift><descr>Yodeling</descr><starttime>1:15 PM</starttime><endtime>6:00 PM</endtime></shift>
   </day>
   <day date="10/14/2008">
      <shift><descr>Mopping</descr><starttime>7:45 AM</starttime><endtime>11:15 AM</endtime></shift>
   </day>
</employee>
<consultant name="Butternut McClintock">
   <day date="10/13/2008">
      <shift><descr>Cliff Diving</descr><starttime>7:45 AM</starttime><endtime>12:15 PM</endtime></shift>
   </day>
   <day date="10/14/2008">
      <shift><descr>Chewing Food</descr><starttime>7:45 AM</starttime><endtime>11:15 AM</endtime></shift>
      <shift><descr>Crackin' Wise</descr><starttime>1:15 PM</starttime><endtime>6:00 PM</endtime></shift>
   </day>
</employee>
</schedule>

 

The output (or a sampling of it) would be:

 

<schedule>
<employee name="Joe Smith">
      <shift><descr>Sweeping</descr><starttime>10/13/2008 7:45 AM</starttime><endtime>10/13/2008 12:15 PM</endtime></shift>
      <shift><descr>Yodeling</descr><starttime>10/13/2008 1:15 PM</starttime><endtime>10/13/2008 6:00 PM</endtime></shift>
      <shift><descr>Mopping</descr><starttime>10/14/2008 7:45 AM</starttime><endtime>10/14/2008 11:15 AM</endtime></shift>
</employee>
</schedule>


I am sure that there are better ways of doing that, but here's what I've come up with:

 

<?php

$data = <<<DATA
<schedule>
<employee name="Joe Smith">
   <day date="10/13/2008">
      <shift><descr>Sweeping</descr><starttime>7:45 AM</starttime><endtime>12:15 PM</endtime></shift>
      <shift><descr>Yodeling</descr><starttime>1:15 PM</starttime><endtime>6:00 PM</endtime></shift>
   </day>
   <day date="10/14/2008">
      <shift><descr>Mopping</descr><starttime>7:45 AM</starttime><endtime>11:15 AM</endtime></shift>
   </day>
</employee>
<consultant name="Butternut McClintock">
   <day date="10/13/2008">
      <shift><descr>Cliff Diving</descr><starttime>7:45 AM</starttime><endtime>12:15 PM</endtime></shift>
   </day>
   <day date="10/14/2008">
      <shift><descr>Chewing Food</descr><starttime>7:45 AM</starttime><endtime>11:15 AM</endtime></shift>
      <shift><descr>Crackin' Wise</descr><starttime>1:15 PM</starttime><endtime>6:00 PM</endtime></shift>
   </day>
</employee>
</schedule>
DATA;

$regex = '#<day date="([^"]+)">(.*?)<starttime>([^>]+)</starttime><endtime>([^>]+)</endtime>.*?</day>#ise';
$replacement = "trim('$2').'<starttime>$1 $3</starttime><endtime>$1 $4</endtime>'";
$data = preg_replace($regex, $replacement, $data);

?>

 

 

Orio.
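
A note for anyone running this on a current PHP version: the `e` modifier was deprecated in PHP 5.5 and removed in PHP 7, so the `preg_replace` call above no longer evaluates its replacement string. Below is a minimal sketch of the same replacement written with `preg_replace_callback`. The shortened `$data` sample is an assumption made for illustration, and the regex is otherwise unchanged, so it still rewrites only the first shift of each day.

```php
<?php
// Hypothetical, shortened sample: one day with two shifts.
$data = '<day date="10/13/2008">
<shift><descr>Sweeping</descr><starttime>7:45 AM</starttime><endtime>12:15 PM</endtime></shift>
<shift><descr>Yodeling</descr><starttime>1:15 PM</starttime><endtime>6:00 PM</endtime></shift>
</day>';

// Same pattern as above, minus the removed "e" modifier.
$regex = '#<day date="([^"]+)">(.*?)<starttime>([^>]+)</starttime><endtime>([^>]+)</endtime>.*?</day>#is';

// The callback builds what the evaluated replacement string used to build.
$data = preg_replace_callback($regex, function ($m) {
    return trim($m[2])
        . "<starttime>{$m[1]} {$m[3]}</starttime>"
        . "<endtime>{$m[1]} {$m[4]}</endtime>";
}, $data);

echo $data;
```

As with the original, everything between the first `</endtime>` and `</day>` is matched but not put back, so the second shift disappears from the output.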


That would work for the first shift, but what about any others that follow? I could do another find/replace for the ones below using the first one (maybe?) but I'm not even sure that's an option.

 

My original idea was to grab all of the "day" groups and then just make the date a variable that gets added to a regex, BUT I don't know how to get that array back into the string.


From what I've tested, this works. I don't understand- what's the problem?

 

Orio.

 

Not to take sides.. but there does seem to be an issue, Orio... When I use your code, this is the output (when right-clicking and viewing source):

 

<schedule> 
<employee name="Joe Smith"> 
   <shift><descr>Sweeping</descr><starttime>10/13/2008 7:45 AM</starttime><endtime>10/13/2008 12:15 PM</endtime> 
   <shift><descr>Mopping</descr><starttime>10/14/2008 7:45 AM</starttime><endtime>10/14/2008 11:15 AM</endtime> 
</employee> 
<consultant name="Butternut McClintock"> 
   <shift><descr>Cliff Diving</descr><starttime>10/13/2008 7:45 AM</starttime><endtime>10/13/2008 12:15 PM</endtime> 
   <shift><descr>Chewing Food</descr><starttime>10/14/2008 7:45 AM</starttime><endtime>10/14/2008 11:15 AM</endtime> 
</employee> 
</schedule> 

 

Notice that the pattern only affects the first <shift> line in each section.

I made a slight modification to your regex and replacement, and I think this will take every line into consideration:

 

$regex = '#<day date="([^"]+)">(.*?)<starttime>([^>]+)</starttime><endtime>([^>]+)</endtime>(.*?)</day>#ise';
$replacement = "trim('$2').'<starttime>$1 $3</starttime><endtime>$1 $4</endtime>$5'";

 

What I did here was make a capture out of the last .*? between </endtime> and </day> (as you'll notice the lack of outputted closing </shift> tags in the initial solution). Then simply added $5 after </endtime> in $replacement.

 

Output:

<schedule> 
<employee name="Joe Smith"> 
   <shift><descr>Sweeping</descr><starttime>10/13/2008 7:45 AM</starttime><endtime>10/13/2008 12:15 PM</endtime></shift> 
      <shift><descr>Yodeling</descr><starttime>1:15 PM</starttime><endtime>6:00 PM</endtime></shift> 
   
   <shift><descr>Mopping</descr><starttime>10/14/2008 7:45 AM</starttime><endtime>10/14/2008 11:15 AM</endtime></shift> 
   
</employee> 
<consultant name="Butternut McClintock"> 
   <shift><descr>Cliff Diving</descr><starttime>10/13/2008 7:45 AM</starttime><endtime>10/13/2008 12:15 PM</endtime></shift> 
   
   <shift><descr>Chewing Food</descr><starttime>10/14/2008 7:45 AM</starttime><endtime>10/14/2008 11:15 AM</endtime></shift> 
      <shift><descr>Crackin Wise</descr><starttime>1:15 PM</starttime><endtime>6:00 PM</endtime></shift> 
   
</employee> 
</schedule> 

 

Cheers,

 

NRG

 

EDIT: Oops.. seems to not quite work with a few lines:

Example: <shift><descr>Crackin Wise</descr><starttime>1:15 PM</starttime><endtime>6:00 PM</endtime></shift>

 


Thanks for all the great feedback! It got my gears turning. I regret to admit that I had to use a for loop after all. Here's what I came up with:

 

/* Gets the pattern for the main date and top shift */
$findshift = '/<day date=\"(.*)\">\n<shift>(<descr>.*<\/descr>)<starttime>(.*)<\/starttime><endtime>(.*)<\/endtime>/';

/* The replacement code...notice how the day tag is gone */
$fixshift ="<shift>$2<starttime>$1 $3</starttime><endtime>$1 $4</endtime>";

/* moves the date down to the first shift of that date */
$text = preg_replace($findshift, $fixshift, $text);

/* looks for the top shift and the one below it */
$findshift2 = '/(\n<shift>.*time>(\d*\/\d*\/\d*).*<\/shift>\n)(<shift>(<descr>.*<\/descr>)<starttime>(\d{1,2}:\d{2} (AM|PM))<\/starttime><endtime>(.*)<\/endtime>)/';

/* returns the first shift and shift below, now both with dates
($7 is the end time; $6 is only the AM/PM inside the start time) */
$fixshift2 ="$1<shift>$4<starttime>$2 $5</starttime><endtime>$2 $7</endtime>";

/* tried regular preg_replace, but it only worked on groups of two shifts per day. Tried doing it 
twice, and only got three dates inserted. Figured it's unlikely that anyone does more than 8 
shifts in a day, so I have a loop that does it 8 times (the current record for this week's data was 
5 shifts, so better safe than sorry) */

for ($i = 1; $i <= 8; $i++) {
$text = preg_replace($findshift2, $fixshift2, $text);
}

/* finally, delete all day close tags */
$text=str_replace("</day>\n", "", $text);

 

If anyone knows a more slick way of doing this, feel free to school me.

 

A
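
One possible way to avoid guessing an upper bound of 8: `preg_replace` can report how many replacements it made through its fifth argument, so the pass can simply repeat until nothing changes. The sketch below assumes a hypothetical three-shift sample in the state left by the first replacement (only the top shift dated), and it takes the end time from capture group 7, since group 6 in that pattern is the AM/PM of the start time.

```php
<?php
// Hypothetical sample: the first shift already carries the date, the next two do not.
$text = "\n<shift><descr>Sweeping</descr><starttime>10/13/2008 7:45 AM</starttime><endtime>10/13/2008 12:15 PM</endtime></shift>\n"
      . "<shift><descr>Yodeling</descr><starttime>1:15 PM</starttime><endtime>6:00 PM</endtime></shift>\n"
      . "<shift><descr>Napping</descr><starttime>6:15 PM</starttime><endtime>7:00 PM</endtime></shift>\n";

// Same "dated shift followed by an undated shift" pattern as in the post.
$findshift2 = '/(\n<shift>.*time>(\d*\/\d*\/\d*).*<\/shift>\n)(<shift>(<descr>.*<\/descr>)<starttime>(\d{1,2}:\d{2} (AM|PM))<\/starttime><endtime>(.*)<\/endtime>)/';
$fixshift2  = '$1<shift>$4<starttime>$2 $5</starttime><endtime>$2 $7</endtime>';

// Each pass can only date the shift directly below an already dated one,
// because matches cannot overlap. Repeat until a pass makes no replacement.
do {
    $text = preg_replace($findshift2, $fixshift2, $text, -1, $count);
} while ($count > 0);

echo $text;
```

The loop runs one pass per undated shift plus a final pass that finds nothing, so it adapts to however many shifts a day has.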


try

<?php
$test = '<schedule>
<employee name="Joe Smith">
<day date="10/13/2008">
	<shift><descr>Sweeping</descr><starttime>7:45 AM</starttime><endtime>12:15 PM</endtime></shift>
	<shift><descr>Yodeling</descr><starttime>1:15 PM</starttime><endtime>6:00 PM</endtime></shift>
</day>
<day date="10/14/2008">
	<shift><descr>Mopping</descr><starttime>7:45 AM</starttime><endtime>11:15 AM</endtime></shift>
</day>
</employee>
<consultant name="Butternut McClintock">
<day date="10/13/2008">
	<shift><descr>Cliff Diving</descr><starttime>7:45 AM</starttime><endtime>12:15 PM</endtime></shift>
</day>
<day date="10/14/2008">
	<shift><descr>Chewing Food</descr><starttime>7:45 AM</starttime><endtime>11:15 AM</endtime></shift>
	<shift><descr>Crackin\' Wise</descr><starttime>1:15 PM</starttime><endtime>6:00 PM</endtime></shift>
</day>
</employee>
</schedule>';
$patern = '|(\s*<day date="(.*?)">)(\s*<shift>.*?<starttime>)(.*?)(</starttime><endtime>)(.*?)(</endtime></shift>)|is';
$r = '$3$2 $4$5$2 $6$7 $1';
while (strpos($test, '<day')){
$test = preg_replace($patern, $r, $test);
$test = preg_replace('|\s+<day[^>]*>\s+</day>|is', '', $test);
}
echo $test;
?>


From what I can see of what you want to get, there's no need for regular expressions. Just iterate over the file and skip the lines that contain day tags:

$xmlfile = "file";
$fh=fopen($xmlfile,"r");
while( ( $line = fgets( $fh,4096) ) !==FALSE  ){    
    // compare with === so that a tag found at position 0 is not treated as "not found"
    if( strpos($line,'<day')===FALSE && strpos($line,'</day>') === FALSE){
        echo $line;    
    }
}
fclose($fh);



But where in your code does it extract the date found in the day tag and insert that info after the <starttime> and <endtime> tags?


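$xmlfile = "file";
$fh = fopen($xmlfile, "r");
$date = "";
while (($line = fgets($fh, 4096)) !== false) {
    if (strpos($line, '<day') !== false) {
        // Remember the date from the opening day tag, then drop the tag itself.
        // explode() is used because split() was removed in PHP 7.
        $day = explode('"', $line);
        $date = $day[1];
        continue;
    }
    if (strpos($line, "<shift>") !== false) {
        // Prefix both the start time and the end time with the current date.
        echo str_replace(
            array("<starttime>", "<endtime>"),
            array("<starttime>$date ", "<endtime>$date "),
            $line
        );
    } elseif (strpos($line, "</day>") !== false) {
        // Drop the closing day tag.
        continue;
    } else {
        echo $line;
    }
}
fclose($fh);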


  • 2 weeks later...

I didn't read through all of the thread, but here's what it seems you want from your before and after example:

<pre>
<?php
$xml=<<<EOL
<schedule>
<employee name="Joe Smith">
   <day date="10/13/2008">
      <shift><descr>Sweeping</descr><starttime>7:45 AM</starttime><endtime>12:15 PM</endtime></shift>
      <shift><descr>Yodeling</descr><starttime>1:15 PM</starttime><endtime>6:00 PM</endtime></shift>
   </day>
   <day date="10/14/2008">
      <shift><descr>Mopping</descr><starttime>7:45 AM</starttime><endtime>11:15 AM</endtime></shift>
   </day>
</employee>
<consultant name="Butternut McClintock">
   <day date="10/13/2008">
      <shift><descr>Cliff Diving</descr><starttime>7:45 AM</starttime><endtime>12:15 PM</endtime></shift>
   </day>
   <day date="10/14/2008">
      <shift><descr>Chewing Food</descr><starttime>7:45 AM</starttime><endtime>11:15 AM</endtime></shift>
      <shift><descr>Crackin' Wise</descr><starttime>1:15 PM</starttime><endtime>6:00 PM</endtime></shift>
   </day>
</employee>
</schedule>
EOL;
echo '<hr>before:<br>'.htmlentities($xml);
$xml=preg_replace_callback('~\s*<day date="([^"]*)">(.*?)\s*</day>~s','replfunc',$xml); // \s* also matches LF-only line endings and tab indentation
function replfunc($match){
  return preg_replace('/(?<=<starttime>|<endtime>)/',$match[1].' ',$match[2]);
}
echo '<hr>after:<br>'.htmlentities($xml);
?>
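
The inner `preg_replace` in `replfunc` works because the pattern is a zero-width lookbehind: it matches the empty position right after each `<starttime>` or `<endtime>` tag, so the replacement text is inserted there and nothing is removed. A small sketch of just that step, using a hypothetical single shift and a hard-coded date in place of `$match[1]`:

```php
<?php
// Hypothetical single shift line, standing in for $match[2].
$shift = '<shift><descr>Sweeping</descr><starttime>7:45 AM</starttime><endtime>12:15 PM</endtime></shift>';

// The lookbehind consumes no characters, so the date is inserted
// immediately after each opening time tag.
$result = preg_replace('/(?<=<starttime>|<endtime>)/', '10/13/2008 ', $shift);

echo $result;
```

Since the same pattern covers every time tag inside a day, the number of shifts per day does not matter.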

