[SOLVED] Parsing an html file, replacing tags...


ripkjs

I have a file that an application exports as .html with tables. In another post I made here, DarkWater showed me how to get a script to give every <tr> an alternating class, so that every other row gets a different <tr class="x">. Here is that script:

 

<?php
$points = "points.html";
$getPoints = file_get_contents($points);
function odd_replace($matches) {
    static $count = 0;
    if ($count % 2 == 0) { //even
       $count++;
       return '<tr class="row1">';
    }
    else {
       $count++;
       return '<tr class="row2">';
   }
}
$getPoints = preg_replace_callback("/<tr>/i", "odd_replace", $getPoints);
echo $getPoints;

 

Along the same lines as that script, is there a way to do the same for the table columns? Example of the HTML prior to changes:

 

<TR>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
</TR>

 

And I would like to change those to:

 

<TR>
<TD class="c1">text</TD>
<TD class="c2">text</TD>
<TD class="c3">text</TD>
<TD class="c4">text</TD>
<TD class="c5">text</TD>
<TD class="c6">text</TD>
<TD class="c7">text</TD>
<TD class="c8">text</TD>
</TR>

 

So there will always be 8 opening <td> tags between each <tr> and </tr>, and each one needs its own unique class="x". The numbering needs to start over for every row, since this pattern repeats many times within the HTML.

 

Sorry for my ignorance, but you all have been a great help so far, and this project is my first real scripting 'lesson'. I appreciate all the patience and guidance!

 


Not a very elegant solution but it should work:

 

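<?php
// $originalText is assumed to hold the HTML to be processed.
$search = array();
$replace = array();
for($i = 1; $i <= 8; $i++) {
    $search[] = '!\<TD\>!i';
    $replace[] = '<TD class="c'.$i.'">';
}
// Each pattern is applied once (limit 1), so only the first eight <TD> tags
// in $originalText get numbered; to cover a whole table, run this once per row.
$parsedText = preg_replace($search, $replace, $originalText, 1);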


Not going to lie, that was annoying to write. xD  Here, see if this works.  Uncomment the first two lines and delete the example string I used.

 

<?php
//$points = "points.html";
//$getPoints = file_get_contents($points);
$getPoints = "<TR>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
<TD>text</TD>
</TR>";
function odd_replace($matches) {
    static $count = 0;
    if ($count % 2 == 0) { //even
        $matches[1] = '<tr class="row1">';
    }
    else {
        $matches[1] = '<tr class="row2">';
    }
    $count++;
    $i = 1; //column counter, reset for every row
    //Note: the /e modifier (evaluate the replacement as PHP code) was deprecated in PHP 5.5 and removed in PHP 7.0.
    $matches[2] = preg_replace('/<td>(.+?)<\/td>/ise', "'<td class=\"c' . \$i++ . '\">' . '\\1' . '</td>'", $matches[2]);
    return $matches[1] . $matches[2] . $matches[3];
}

$getPoints = preg_replace_callback("/(<tr>)(.+?)(<\/tr>)/is", "odd_replace", $getPoints);
echo $getPoints;
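
For anyone trying this on a current PHP version: the /e modifier used above was removed in PHP 7.0, so the script will not run there as written. Below is a minimal sketch of the same idea using nested preg_replace_callback calls instead. The function name add_table_classes() is made up for illustration, and the sketch assumes the exported file uses bare <tr> and <td> tags with no attributes, as in the example.

```php
<?php
// Hypothetical helper: alternating row classes plus per-row column classes.
function add_table_classes(string $html): string
{
    $row = 0; // counts rows across the whole document
    $result = preg_replace_callback(
        '/<(tr)>(.*?)(<\/tr>)/is',
        function (array $m) use (&$row) {
            $rowClass = ($row++ % 2 === 0) ? 'row1' : 'row2';
            $col = 0; // column numbering restarts for every row
            $cells = preg_replace_callback(
                '/<(td)>/i',
                function (array $c) use (&$col) {
                    // $c[1] keeps the tag name in its original case
                    return '<' . $c[1] . ' class="c' . ++$col . '">';
                },
                $m[2]
            );
            return '<' . $m[1] . ' class="' . $rowClass . '">' . $cells . $m[3];
        },
        $html
    );
    return $result ?? $html; // preg_replace_callback returns null on error
}

$sample = "<TR>\n<TD>text</TD>\n<TD>text</TD>\n</TR>\n<TR>\n<TD>text</TD>\n</TR>";
echo add_table_classes($sample);
```

Only the opening tags are rewritten, so the cell contents pass through untouched.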

 

>_<


Win. You are a godsend, Dark. I can't thank you enough!


Though now my "images/image.png" is getting changed. Any idea why this is happening?

 

<td class="c1"><img src=\"images/image.png\" /></td>

 

This is how the img tag comes out. Here is the code that changes every RCt to the image tag:

$rcT = '<img src="images/image.png" />';
$getPoints = preg_replace("/RCt/", $rcT, $getPoints);
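
A likely explanation, assuming the RCt replacement runs before the row/column script: with the /e modifier, preg_replace escapes quotes and backslashes in each backreference (much like addslashes) before it evaluates the replacement. Because '\\1' sits inside single quotes in the evaluated code, the \" is never turned back into a plain ". The sketch below only imitates that escaping with addslashes(), since /e itself no longer exists in PHP 7, and then shows one possible workaround: insert the image after the class pass. The $row string is a made-up stand-in for the processed output.

```php
<?php
$rcT = '<img src="images/image.png" />';

// Roughly what /e did to the cell contents before evaluating the replacement:
echo addslashes($rcT), "\n";

// Possible workaround: run the class pass first, then swap in the image.
// $row is a hypothetical piece of already-processed output.
$row = '<td class="c1">RCt</td>';
$fixed = str_replace('RCt', $rcT, $row); // plain text search, no regex needed
echo $fixed, "\n";
```

str_replace is enough here because RCt is a fixed string rather than a pattern.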


Glad I could help.  Do you understand what's going on in the script, or do you need anything explained?

 

I've been trying to follow along using the php.net reference guide. So I have an understanding of how it works, I just wouldn't be able to actually write it.. (yet? :D)

 

Have been learning lots, this project seems to be a good place to start trying to understand scripting. Doesn't seem to be too complex yet.


Well, what I do understand of this so far is:

 

$points = "points.html"; //Defining the file to be parsed.

$getPoints = file_get_contents($points); //Grabbing the contents of that file as a string.

 

function odd_replace($matches) { //creating a function with the name odd_replace, $matches being the arguments.

 

    static $count = 0; //Defining the variable as 0, and making it static (which I think means each time this function is called, that it will remain 0)

 

    if ($count % 2 == 0) { //Checks $count to see if the number is even.

 

      $matches[1] = '<tr class="row1">'; //Sets the 1 key in the $matches array to the string in single quotes if $count is even? (guess)

    }

    else {

      $matches[1] = '<tr class="row2">'; //Same as above only if it is odd.

  }

  $count++; //increments $count by one.

  $i = 1; //setting $i to 1 for use in the preg_replace to give each <td> its own unique number.

 

  $matches[2] = preg_replace('/<td>(.+?)<\/td>/ise', "'<td class=\"c' . \$i++ . '\">' . '\\1' . '</td>'", $matches[2]);

 

/* This is where things get a tad fuzzy. I don't quite understand most of this, mostly all the added slashes and other toys. I know the first parameter is looking for <td>. Single quotes around it because it is a string? I'm not sure why the first forward slash is there, though I'm assuming it marks the < as literal and not 'less than'. (.+?) I'm not sure about, though I'm guessing it's some sort of wildcard. I assume the next backslash after the > is again saying that it's literal and not 'greater than'. /ise, no idea. The second parameter is what it's replacing the first parameter with. Double quotes around all of it because you're using delimiters in the string. Backslash double quote again for literal. " . " is the delimiter, combining it all. \$i++ takes $i and increments it by one each time the function is called. '\\1', not sure. Then the tag is closed, and the next parameter is the source for the replacements. */

 

 

 

  return $matches[1] . $matches[2] . $matches[3]; //Not sure.

 

 

}

 

$getPoints = preg_replace_callback("/(<tr>)(.+?)(<\/tr>)/is", "odd_replace", $getPoints); //replacing <tr> with the results from the odd_replace function, in $getpoints.

 

echo $getPoints; //echos the string.
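
On the static question in the walkthrough above: a static variable is initialised only on the first call and then keeps its value between calls, which is what lets the row class alternate. A small sketch, with next_row_class() as a made-up name for illustration:

```php
<?php
// Hypothetical helper showing how a static counter behaves.
function next_row_class(): string
{
    static $count = 0; // set to 0 on the first call only
    $class = ($count % 2 === 0) ? 'row1' : 'row2';
    $count++;          // the new value is still there on the next call
    return $class;
}

echo next_row_class(), "\n"; // row1
echo next_row_class(), "\n"; // row2
echo next_row_class(), "\n"; // row1
```

Without the static keyword, $count would be reset to 0 on every call and every row would get row1.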

 

 

So I guess to make a short answer long, I don't really understand why it was messing with the image variables  :-[

 

I apologize if this was painful to read for anyone. I'm only about 4-5 days into PHP (or any scripting/coding outside html).
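
For the fuzzy part of the walkthrough, here is a hedged breakdown of the pattern itself, using preg_match_all so the pieces can be inspected. The $row string is a made-up sample. In the replacement string, '\\1' refers to whatever capture group 1 matched, and in the callback $matches[1], $matches[2] and $matches[3] are the three parenthesised groups of the <tr> pattern, which is why the function returns them glued back together.

```php
<?php
$row = "<TD>first</TD>\n<TD>second\nline</TD>";

// /  ...  /   delimiters: they mark where the pattern starts and ends
// <td>        literal text; < and > need no escaping inside a pattern
// (.+?)       capture group 1: one or more characters, as few as possible
// <\/td>      the backslash escapes the /, so it is not read as the end delimiter
// i           modifier: ignore case, so <TD> matches as well
// s           modifier: let . match newlines too
// (the e modifier from the thread is left out; it was removed in PHP 7.0)
preg_match_all('/<td>(.+?)<\/td>/is', $row, $matches);

print_r($matches[0]); // whole matches, tags included
print_r($matches[1]); // group 1 only: the text between the tags
```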

