
Parse HTML or Plain Text File and return the text delimited by a string


judejitsu


Hi guys,

 

I need help parsing a text file that contains a few HTML tags. I am looking for a solution (in PHP, JS, or both) that strips the tags and stores the values in separate variables.

 

  Integration/QA  
<http://shopfloor/sfweb/secure/CancelOrders>


  Development  
<http://shopfloor/sfweb/secure/CancelOrders>


------------------------------------------------------------------------

*HEADER INFO*
    *View Object:* 6541997  *BPO:* 0020064484   *Ack Date:* 2012-05-25
    *Operation(s):* PS_Queue, PS_BoxAll, JPN_End

------------------------------------------------------------------------

*EXTERNAL ORDER NUMBER REFERENCE*
*SAP Sales Order Number*    *Customer P.O. Number*  *Legacy Order Number*
0310407774      89FC37763001

------------------------------------------------------------------------

*PRODUCTS FOR THIS WORK OBJECT/OPERATION(S)*
*PL*    *Product #*     *Qty*   *Options*   *Serial #*
LN  AE241A  1        

------------------------------------------------------------------------

*Station Info*
*Start Station:* JPN_End    *Location:* Done    *Station:*
*Birth Date/Time:* 2012-05-22 08:26:17 SGT  *Power Cord:*   *Voltage:*

------------------------------------------------------------------------

*MATERIAL LIST FOR THIS WORK OBJECT/OPERATION(S)*
*Part Number*   *Qty*   *Description*   *BB Type*   *Material
Location*   *Serial Number*
AE241-90001     1   XP Remote Support Service Leaflet   BOM     PACK     


Privacy Statement

 

Basically, I want to pull a few values out of this text into PHP variables, so that it returns:

 

$viewobject = "6541997";
$BPO = "0020064484";
$ackdate = "2012-05-25";

 

and the like, so I can process these variables later. Is this possible?

 


Something like this. The file's format is not very consistent, but the basic strategy is the same throughout: find a known marker before the value, find a known marker after it, and grab the bit in the middle.

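<?php
function getTextValue($haystack, $needle) {
    $pos1 = strpos($haystack, $needle);                       // find the label
    if ($pos1 === false) {
        return null;                                          // label not in the text
    }
    $pos2 = strpos($haystack, '*', $pos1 + strlen($needle));  // '*' that closes the label, just before the value
    if ($pos2 === false) {
        return null;
    }
    $pos3 = strpos($haystack, '*', $pos2 + 1);                // '*' that opens the next label, after the value
    if ($pos3 === false) {
        $pos3 = strlen($haystack);                            // last value in the text: read to the end
    }
    return trim(substr($haystack, $pos2 + 1, $pos3 - $pos2 - 1)); // grab data between
}

// $thetext holds the contents of the file, e.g. from file_get_contents()
echo getTextValue($thetext, 'Ack Date') . '<br />';        // -> 2012-05-25
echo getTextValue($thetext, 'View Object');                // -> 6541997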


judejitsu: if you want information like "Start Station", you may need to split your file into sections. Since the file has very little consistency, you may need to handle each section separately. Barand's code should work fine for the cases where the label has a ':'. The rest are in a table-like format, and those may not be quite as easy.
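The sectioning idea above could be sketched roughly like this. It is only a sketch under assumptions: the sections are separated by a line of dashes, each section opens with a `*TITLE*` line, and the columns of the table-like sections are tab-separated in the real file (the post only shows runs of spaces, so this is a guess). The function names `splitSections` and `parseTableSection` are hypothetical, and the sample string is rebuilt from the post with tabs assumed between columns. Arrow functions need PHP 7.4 or later.

```php
<?php
// Split the report at the dashed rule lines and key each section by the
// *TITLE* line that opens it. Chunks without such a title are skipped.
function splitSections(string $text): array {
    $sections = [];
    foreach (preg_split('/^-{10,}\s*$/m', $text) as $chunk) {
        $chunk = trim($chunk);
        if (preg_match('/^\*([^*\n]+)\*[ \t]*\R(.*)$/s', $chunk, $m)) {
            $sections[trim($m[1])] = $m[2];
        }
    }
    return $sections;
}

// Pair the cells of the header line with the cells of the value line below it.
// ASSUMPTION: columns are separated by tabs.
function parseTableSection(string $body): array {
    $lines = preg_split('/\R/', trim($body));
    if (count($lines) < 2) {
        return [];
    }
    $labels = array_map(fn($c) => trim($c, "* \t"), explode("\t", $lines[0]));
    $values = array_map('trim', explode("\t", $lines[1]));
    $row = [];
    foreach ($labels as $i => $label) {
        $row[$label] = $values[$i] ?? '';   // missing trailing cells become ''
    }
    return $row;
}

// Sample rebuilt from the post; the tab positions are an assumption.
$sample = "intro\n"
    . "------------------------------------------------------------------------\n\n"
    . "*EXTERNAL ORDER NUMBER REFERENCE*\n"
    . "*SAP Sales Order Number*\t*Customer P.O. Number*\t*Legacy Order Number*\n"
    . "0310407774\t\t89FC37763001\n";

$sections = splitSections($sample);
$order = parseTableSection($sections['EXTERNAL ORDER NUMBER REFERENCE']);
echo $order['SAP Sales Order Number'];   // 0310407774
```

A header that wraps onto a second line, as the material list does in the post, would have to be joined back into one line before this pairing works.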


Thanks, Barry! I'll try this one out. Someone also told me this can be done with regex, though I am not really sure how to do that.
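For the regex route mentioned above, one possible sketch uses `preg_match` to capture whatever follows a `*Label:*` marker, up to the next `*` or the end of the line. `getValueByRegex` is a hypothetical name, and the pattern assumes the `*Label:* value` layout shown in the sample header block. It will not cover the table-like sections.

```php
<?php
// Return the text that follows "*Label:*" up to the next "*" or line end,
// or null when the label is not found.
function getValueByRegex(string $text, string $label): ?string {
    // preg_quote() escapes characters such as "(" in labels like "Operation(s)"
    $pattern = '/\*' . preg_quote($label, '/') . ':\*([^*\r\n]*)/';
    return preg_match($pattern, $text, $m) ? trim($m[1]) : null;
}

// Header block copied from the sample in the first post
$thetext = "*HEADER INFO*\n"
    . "    *View Object:* 6541997  *BPO:* 0020064484   *Ack Date:* 2012-05-25\n"
    . "    *Operation(s):* PS_Queue, PS_BoxAll, JPN_End\n";

$viewobject = getValueByRegex($thetext, 'View Object');  // "6541997"
$BPO        = getValueByRegex($thetext, 'BPO');          // "0020064484"
$ackdate    = getValueByRegex($thetext, 'Ack Date');     // "2012-05-25"
```

Returning the values as strings keeps the leading zeros of numbers such as the BPO.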


Thank you for the reply, Kays. Yes, I know, but I have no choice: this file is generated by a script that I cannot change, and I do not have access to the program's database either. That's why I am looking for a workaround that parses the text file and extracts the values, so I can insert them into my own database and build a report from it.


Okay, what if I just parse the HTML file instead? Let's say my file contains:

 

  <b>EXTERNAL ORDER NUMBER REFERENCE</b>
<table width=100% cellspacing=0>
<tr align=left>
<td width=2% colspan=1><font face="verdana, arial, helvetica" size="-2">  </font>
</td>
<td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2"><b>SAP Sales Order Number</b></font>
</td>
<td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2"><b>Customer P.O. Number</b></font>
</td>
<td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2"><b>Legacy Order Number</b></font>
</td>
</tr>
<tr align=left>
<td width=2% colspan=1><font face="verdana, arial, helvetica" size="-2">  </font>
</td>
<td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2">0310363858</font>
</td>
<td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2">77340892008-120413</font>
</td>
<td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2">89FF09378001</font>
</td>
</tr>
</table>
</p>
<hr>
<p>
  <b>PRODUCTS FOR THIS WORK OBJECT/OPERATION(S)</b>
<table width=100% cellspacing=0>
<tr align=left>
<td width=2% colspan=1><font face="verdana, arial, helvetica" size="-2">  </font>
</td>
<td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2"><b>PL</b></font>
</td>
<td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2"><b>Product #</b></font>
</td>
<td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2"><b>Qty</b></font>
</td>
<td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2"><b>Options</b></font>
</td>
<td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2"><b>Serial #</b></font>
</td>
</tr>
<tr align=left>
<td width=2% colspan=1><font face="verdana, arial, helvetica" size="-2">  </font>
</td>
<td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2">3C</font>
</td>
<td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2">AP703B</font>
</td>
<td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2">1</font>
</td>
<td valign=top colspan=1>&nbsp
</td>
<td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2">2S6219000G</font>
</td>
</tr>
</table>

 

Given that each header sits directly above the cell holding its value, can I parse this and save the values to variables?

 

For example:

$sapsalesorderno = "0310363858";

etc...

 

Thanks for the replies!
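One way to approach the HTML version is PHP's DOM extension. The sketch below assumes that extension is available and that every table follows the layout in the post: one row of bold header cells with one row of value cells directly below it. `parseHeaderValueTables` and `cleanCell` are hypothetical names, and the sample is a shortened copy of the first table from the post.

```php
<?php
// Strip ordinary whitespace and non-breaking spaces (U+00A0) from both ends.
function cleanCell(string $s): string {
    return preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $s);
}

// For every table with at least two rows, map each non-empty cell of the
// first row (the header) to the cell at the same position in the second row.
function parseHeaderValueTables(string $html): array {
    $previous = libxml_use_internal_errors(true);  // the markup is not valid HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($doc->getElementsByTagName('table') as $table) {
        $rows = $table->getElementsByTagName('tr');
        if ($rows->length < 2) {
            continue;
        }
        $labels = $rows->item(0)->getElementsByTagName('td');
        $values = $rows->item(1)->getElementsByTagName('td');
        for ($i = 0; $i < $labels->length; $i++) {
            $label = cleanCell($labels->item($i)->textContent);
            if ($label !== '' && $values->item($i) !== null) {
                $result[$label] = cleanCell($values->item($i)->textContent);
            }
        }
    }
    return $result;
}

// Shortened copy of the first table in the post
$html = '<table width=100% cellspacing=0>
<tr align=left>
<td width=2% colspan=1><font size="-2">  </font></td>
<td valign=top colspan=1><font size="-2"><b>SAP Sales Order Number</b></font></td>
<td valign=top colspan=1><font size="-2"><b>Customer P.O. Number</b></font></td>
</tr>
<tr align=left>
<td width=2% colspan=1><font size="-2">  </font></td>
<td valign=top colspan=1><font size="-2">0310363858</font></td>
<td valign=top colspan=1><font size="-2">77340892008-120413</font></td>
</tr>
</table>';

$cells = parseHeaderValueTables($html);
$sapsalesorderno = $cells['SAP Sales Order Number'];   // "0310363858"
```

Because the result is keyed by header text, a header that appears in more than one table (such as Qty) would overwrite the earlier value. Keying by table as well would avoid that.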
