
[SOLVED] Parsing an html file, replacing tags...


ripkjs


I have a file that an application exports as .html with tables. In another post I made here, DarkWater showed me how to get a script to replace each <tr> with a <tr class="x">, alternating between two classes. Here is that script:

 

<?php
$points = "points.html";
$getPoints = file_get_contents($points);
function odd_replace($matches) {
    static $count = 0;
    if ($count % 2 == 0) { // even
        $count++;
        return '<tr class="row1">';
    }
    else {
        $count++;
        return '<tr class="row2">';
    }
}
$getPoints = preg_replace_callback("/<tr>/i", "odd_replace", $getPoints);
echo $getPoints;

 

Now, along the same lines as that script, is there a way to do the same for the table cells? Example of the HTML prior to changes:

 

<TR>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
</TR>

 

And would like to change those to:

 

<TR>
<TD class="c1">text</TD>
<TD class="c2">text</TD>
<TD class="c3">text</TD>
<TD class="c4">text</TD>
<TD class="c5">text</TD>
<TD class="c6">text</TD>
<TD class="c7">text</TD>
<TD class="c8">text</TD>
</TR>

 

So there will always be 8 <td> opening tags between each <tr> and </tr>, and each one needs its own unique class="x". This has to repeat for every row in the HTML.

 

Sorry for my ignorance, but you all have been a great help so far, and this project is my first real scripting 'lesson'. I appreciate all the patience and guidance!

 

Not a very elegant solution but it should work:

 

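<?php
// $originalText is assumed to already hold the HTML, e.g. from file_get_contents().
$search = array();
$replace = array();
for($i = 1; $i <= 8; $i++) {
    $search[] = '!<TD>!i';
    $replace[] = '<TD class="c'.$i.'">';
}
// With a limit of 1, each of the eight patterns replaces only the first
// remaining plain <TD>, so this numbers the first eight cells in the text.
$parsedText = preg_replace($search, $replace, $originalText, 1);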
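
As written, the array approach above stops after the first eight cells, because every pattern is used only once on the whole text. One way to make it repeat per row, as the question asks, is to run it separately on each <TR>...</TR> block. The following is only a sketch under assumptions: the sample input is made up, rows are not nested, and it uses an anonymous function, which needs PHP 5.3 or later.

```php
<?php
// Hypothetical sample input: two rows of three cells each.
$originalText = "<TR>\n<TD>a</TD>\n<TD>b</TD>\n<TD>c</TD>\n</TR>\n"
              . "<TR>\n<TD>d</TD>\n<TD>e</TD>\n<TD>f</TD>\n</TR>";

// One pattern/replacement pair per column; every pattern is identical.
$search = array();
$replace = array();
for ($i = 1; $i <= 8; $i++) {
    $search[] = '!<TD>!i';
    $replace[] = '<TD class="c' . $i . '">';
}

// The callback receives one whole row in $matches[0] and numbers its cells.
// The limit of 1 makes each pattern consume only the first <TD> that has
// no class yet, so the counter effectively restarts for every row.
$parsedText = preg_replace_callback(
    '!<TR>.*?</TR>!is',
    function ($matches) use ($search, $replace) {
        return preg_replace($search, $replace, $matches[0], 1);
    },
    $originalText
);

echo $parsedText;
```

With the sample input, both rows come out with c1, c2 and c3 on their cells. Rows with fewer than eight cells simply leave the later patterns unused.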

Not going to lie, that was annoying to write. xD  Here, see if this works.  Uncomment the first two lines and delete the example string I used.

 

<?php
//$points = "points.html";
//$getPoints = file_get_contents($points);
$getPoints = "<TR>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
</TR>";
function odd_replace($matches) {
    static $count = 0;
    if ($count % 2 == 0) { //even
        $matches[1] = '<tr class="row1">';
    }
    else {
        $matches[1] = '<tr class="row2">';
    }
    $count++;
    $i = 1; // cell counter, restarts at 1 for every row
    // Note: the /e modifier (evaluate the replacement as PHP code) was
    // deprecated in PHP 5.5 and removed in PHP 7.0.
    $matches[2] = preg_replace('/<td>(.+?)<\/td>/ise', "'<td class=\"c' . \$i++ . '\">' . '\\1' . '</td>'", $matches[2]);
    return $matches[1] . $matches[2] . $matches[3];
}

$getPoints = preg_replace_callback("/(<tr>)(.+?)(<\/tr>)/is", "odd_replace", $getPoints);
echo $getPoints;

 

>_<

 

Win. You are a godsend, Dark. I can't thank you enough!

Though now my "images/image.png" is getting changed. Any idea why this is happening?

 

<td class="c1"><img src=\"images/image.png\" /></td>

 

That is how the img tag comes out. I am replacing every RCt with the image tag like this:

$rcT = '<img src="images/image.png" />';
$getPoints = preg_replace("/RCt/", $rcT, $getPoints);

Glad I could help.  Do you understand what's going on in the script, or do you need anything explained?

 

I've been trying to follow along using the php.net reference guide. So I have an understanding of how it works, I just wouldn't be able to actually write it.. (yet? :D)

 

Have been learning lots, this project seems to be a good place to start trying to understand scripting. Doesn't seem to be too complex yet.

Well, what I do understand of this so far is:

 

$points = "points.html"; //Defining the file to be parsed.

$getPoints = file_get_contents($points); //Grabbing the contents of that file as a string.

 

function odd_replace($matches) { //creating a function with the name odd_replace, $matches being the arguments.

 

    static $count = 0; //Defining the variable as 0, and making it static (which I think means that each time this function is called, it will remain 0)

 

    if ($count % 2 == 0) { //Checks $count to see if the number is even.

 

      $matches[1] = '<tr class="row1">'; //Sets the 1 key in the $matches array to the string in single quotes if $count is even? (guess)

    }

    else {

      $matches[1] = '<tr class="row2">'; //Same as above only if it is odd.

  }

  $count++; //increments $count by one.

  $i = 1; //setting $i to 1 for use in the preg_replace to give each <td> its own unique number.

 

  $matches[2] = preg_replace('/<td>(.+?)<\/td>/ise', "'<td class=\"c' . \$i++ . '\">' . '\\1' . '</td>'", $matches[2]);

 

/* This is where things get a tad fuzzy. I don't quite understand most of this, mostly all the added slashes and other toys. I know the first parameter is looking for <td>. Single quotes around it because it is a string? Not sure why the first forward slash is there, though I'm assuming that it defines the < as being literal, and not 'less than'. (.+?) I'm not sure about, though I'm guessing it's some sort of wildcard. I assume the backslash after the > is again saying that it's literal, and not 'greater than'. /ise, no idea. The second parameter is what it's replacing the first parameter with. Double quotes around all of it because you're using delimiters in the string. Backslash double quote again for literal. " . " is the delimiter, combining it all. \$i++ is taking $i and incrementing it by one each time the function is called. '\\1', not sure. Then closing the tag, and the next parameter is the source for the replacements. */

 

 

 

  return $matches[1] . $matches[2] . $matches[3]; //Not sure.

 

 

}

 

$getPoints = preg_replace_callback("/(<tr>)(.+?)(<\/tr>)/is", "odd_replace", $getPoints); //replacing <tr> with the results from the odd_replace function, in $getpoints.

 

echo $getPoints; //echoes the string.
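
A few of the guesses in the walkthrough above can be checked with a small experiment.

- A static variable is set to 0 only the first time the function runs and then keeps its value between calls. That is what lets the rows alternate.
- The slashes at the start and end of the pattern are delimiters that mark where the regex begins and ends.
- \/ is a literal slash inside the pattern, escaped so it is not taken for the closing delimiter.
- (.+?) captures one or more characters, as few as possible.
- After the closing delimiter, i means case-insensitive, s lets the dot match newlines, and e evaluates the replacement as PHP code.
- \\1 in the replacement stands for whatever the first pair of parentheses captured.

The sketch below is only an illustration. The name next_row_class() is made up for the demo, and preg_match() stands in for preg_replace_callback() so the $matches array can be looked at directly.

```php
<?php
// next_row_class() is a hypothetical name used only for this demo.
function next_row_class() {
    static $count = 0;   // initialised once, then remembered between calls
    $class = ($count % 2 == 0) ? 'row1' : 'row2';
    $count++;
    return $class;
}

$classes = array(next_row_class(), next_row_class(), next_row_class());
echo implode(',', $classes), "\n";   // row1,row2,row1

// The same row pattern as the script, run once with preg_match() so the
// contents of $matches can be inspected.
$row = "<TR>\n<TD>text</TD>\n</TR>";
preg_match("/(<tr>)(.+?)(<\/tr>)/is", $row, $matches);

echo $matches[1], "\n";          // <TR>   (first pair of parentheses)
echo trim($matches[2]), "\n";    // <TD>text</TD>   (everything between the tags)
echo $matches[3], "\n";          // </TR>  (third pair of parentheses)
```

So $matches[0] is the whole matched row, and $matches[1] to $matches[3] are the three parenthesised parts. The return line in odd_replace glues the new opening tag, the rewritten cells and the original closing tag back together into the text that replaces the row.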

 

 

So I guess to make a short answer long, I don't really understand why it was messing with the image variables  :-[
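
On the image tag, a likely cause is the /e modifier. When /e is used, PHP escapes quotes and backslashes in each backreference before pasting it into the code string, and inside the single-quoted '\\1' the backslash in front of a double quote is kept as-is. That would turn src="images/image.png" into src=\"images/image.png\". This assumes the RCt replacement runs before the table rewrite, so the image tag is already inside the cell when the cell pattern matches it. Below is a sketch of the same odd_replace without /e, using a nested preg_replace_callback with an anonymous function (PHP 5.3 or later). The sample row is made up for the demo.

```php
<?php
// Hypothetical sample row: the first cell already holds the image tag.
$getPoints = "<TR>\n<TD><img src=\"images/image.png\" /></TD>\n<TD>text</TD>\n</TR>";

function odd_replace($matches) {
    static $count = 0;
    $rowTag = ($count % 2 == 0) ? '<tr class="row1">' : '<tr class="row2">';
    $count++;

    $i = 1; // cell counter, restarts at 1 for every row
    // The closure takes $i by reference so the counter advances per cell.
    // $cell[1] is the captured cell content, passed through untouched.
    $cells = preg_replace_callback(
        '/<td>(.+?)<\/td>/is',
        function ($cell) use (&$i) {
            return '<td class="c' . $i++ . '">' . $cell[1] . '</td>';
        },
        $matches[2]
    );

    return $rowTag . $cells . $matches[3];
}

$getPoints = preg_replace_callback("/(<tr>)(.+?)(<\/tr>)/is", "odd_replace", $getPoints);
echo $getPoints;
```

Because the cell content is never turned into PHP code here, nothing gets escaped and the quotes in the img tag come through unchanged. This version also keeps working on PHP 7 and later, where /e no longer exists.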

 

I apologize if this was painful to read for anyone. I'm only about 4-5 days into PHP (or any scripting/coding outside html).
