
Separating HTML tags in PHP


Horgy


Hi there,

 

I'm new to the board, so apologies if something like this has been asked before. I'm also quite new to PHP, so the disaster that is my code will probably make your stomach turn, but it works so far...

 

I am using file_get_contents to grab a website I have access to (my work rota, incidentally) and parse a bit of the HTML. The aim is to be able to take my rota and do a few things with the times, dates, etc.

 

So far, I have got the file, stripped out the parts of the string that I don't need, and I'm left with something that looks like this:

 

<TR><TD ></TD></TR><TR bgcolor='ffff00'><TD>Date</TD><TD>Duty</TD><TD>Dep</TD><TD>Begin</TD><TD>End</TD><TD>Arr</TD></TR><TR><TD ></TD></TR><TR><TD>23 Apr 12, Mon</TD><TD>297</TD><TD>STN</TD><TD>15:55</TD><TD>17:10</TD><TD>DUB</TD></TR><TR><TD>23 Apr 12, Mon</TD><TD>288</TD><TD>DUB</TD><TD>17:35</TD><TD>18:50</TD><TD>STN</TD></TR><TR><TD>23 Apr 12, Mon</TD><TD>293</TD><TD>STN</TD><TD>19:15</TD><TD>20:30</TD><TD>DUB</TD></TR><TR><TD>23 Apr 12, Mon</TD><TD>298</TD><TD>DUB</TD><TD>20:55</TD><TD>22:05</TD><TD>STN</TD></TR><TR><TD ></TD></TR>

 

The people who wrote my rota weren't very tidy, but in a nutshell, I want the information between the TD/TR tags. Each row contains cells with my date, time, destination, etc. in it. I can use strip_tags, but I end up with one long string that is difficult to split into useful information. I need something that parses the string like this:

 

"For every row, grab the information between <td> and </td> and load it into $td[0]", etc.

 

How would I go about doing something like this? I'm a bit stumped.

 

Thanks in advance

 

Horgy


preg_match_all('#<TR[^>]*><TD>([^<]+)</TD><TD>([^<]+)</TD><TD>([^<]+)</TD><TD>([^<]+)</TD><TD>([^<]+)</TD><TD>([^<]+)</TD></TR>#', $thisText, $foo);

 

$foo will be a multidimensional array, with $foo[0] holding the full text of each matched row and $foo[1] to $foo[6] holding each column, in order, with the header in the first slot. So $foo[3][0] is 'Dep', and $foo[3][1] is 'STN'.
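To make the array layout concrete, here is a minimal sketch that runs the pattern above against a trimmed copy of the sample string from the first post (the header row plus two flights). The variable names $pattern, $count and $rows are just for illustration; the loop at the end is one possible way to turn the column-ordered result into one array per row.

```php
<?php
// Trimmed sample of the rota markup from the first post: header row plus two flights.
$thisText = "<TR><TD ></TD></TR><TR bgcolor='ffff00'><TD>Date</TD><TD>Duty</TD><TD>Dep</TD>"
    . "<TD>Begin</TD><TD>End</TD><TD>Arr</TD></TR><TR><TD ></TD></TR>"
    . "<TR><TD>23 Apr 12, Mon</TD><TD>297</TD><TD>STN</TD><TD>15:55</TD><TD>17:10</TD><TD>DUB</TD></TR>"
    . "<TR><TD>23 Apr 12, Mon</TD><TD>288</TD><TD>DUB</TD><TD>17:35</TD><TD>18:50</TD><TD>STN</TD></TR>";

// Same pattern as above, split across two lines for readability.
$pattern = '#<TR[^>]*><TD>([^<]+)</TD><TD>([^<]+)</TD><TD>([^<]+)</TD>'
    . '<TD>([^<]+)</TD><TD>([^<]+)</TD><TD>([^<]+)</TD></TR>#';

// Default flag is PREG_PATTERN_ORDER: $foo[0] is the full matches,
// $foo[1]..$foo[6] are the six columns.
$count = preg_match_all($pattern, $thisText, $foo);

// Regroup the columns into one array per row.
$rows = array();
for ($i = 0; $i < $count; $i++) {
    $rows[] = array($foo[1][$i], $foo[2][$i], $foo[3][$i], $foo[4][$i], $foo[5][$i], $foo[6][$i]);
}

print_r($rows);
```

The spacer rows (<TR><TD ></TD></TR>) are skipped because they do not contain six plain <TD> cells, so only the header and the two flights come back.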

 

You could also use DOMDocument for this, but the documentation is terrible.
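For comparison, a rough sketch of the DOMDocument route, assuming the DOM extension is enabled and using the same trimmed sample. Wrapping the fragment in a <table> and keeping only six-cell rows are assumptions made for this example, not something the rota page guarantees.

```php
<?php
// Trimmed sample of the rota markup from the first post.
$html = "<TR><TD ></TD></TR><TR bgcolor='ffff00'><TD>Date</TD><TD>Duty</TD><TD>Dep</TD>"
    . "<TD>Begin</TD><TD>End</TD><TD>Arr</TD></TR><TR><TD ></TD></TR>"
    . "<TR><TD>23 Apr 12, Mon</TD><TD>297</TD><TD>STN</TD><TD>15:55</TD><TD>17:10</TD><TD>DUB</TD></TR>"
    . "<TR><TD>23 Apr 12, Mon</TD><TD>288</TD><TD>DUB</TD><TD>17:35</TD><TD>18:50</TD><TD>STN</TD></TR>";

// Collect parser warnings internally instead of printing them for untidy markup.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// The fragment has no <table> wrapper, so add one before parsing (assumption).
$doc->loadHTML('<table>' . $html . '</table>');
libxml_clear_errors();

$rows = array();
foreach ($doc->getElementsByTagName('tr') as $tr) {
    $cells = array();
    foreach ($tr->getElementsByTagName('td') as $td) {
        $cells[] = trim($td->textContent);
    }
    // Keep only full rows; the spacer rows hold a single empty cell.
    if (count($cells) === 6) {
        $rows[] = $cells;
    }
}

print_r($rows);
```

The parser does not care about attributes or letter case in the tags, which is the main thing it buys you over a hand-written pattern.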


  • 2 weeks later...

Dan,

 

Apologies for the late reply, but thanks a lot, this worked great! I found my way to preg_match_all myself, but I'd never have worked out the regex you wrote on my own. Thanks again.

 

James


Dan,

 

I've been playing with the code and it works fairly well; however, it doesn't seem to capture rows that don't have a flight number. That is, some rows aren't captured even though they are in the string I'm working through. Any thoughts? The string and the output are below:

 

Pilot Roster Individual Plan for ****** Print Roster|Close WindowThis Roster is Published until 13 May 12, Sun. A/L Taken0A/L Planned0-19755392 = 22277629 - 42033021DateDutyDepBeginEndArr7 May 12, MonOFF(Z)STN00:0021:008 May 12, Tue8386STN10:5013:15PMI8 May 12, Tue8387PMI13:5016:25STN9 May 12, Wed258STN11:2513:10MMX9 May 12, Wed259MMX13:4015:30STN9 May 12, Wed9273STN16:0017:05EIN9 May 12, Wed9274EIN17:3018:30STN10 May 12, Thu10STN13:0016:30MMM10 May 12, Thu11MMM16:5520:00STN11 May 12, Fri9273STN15:3016:35EIN11 May 12, Fri9274EIN17:0018:00STN11 May 12, Fri3634STN18:2519:45BRE11 May 12, Fri3633BRE20:1021:25STN12 May 12, SatOFF BRKSTN01:0022:0013 May 12, SunOFF(Z)STN00:0021:00**THE REST OF THIS ROSTER IS PLANNED**14 May 12, MonOFF(Z)STN00:0021:0015 May 12, TueOFF(Z)STN00:0021:0016 May 12, WedOFF(Z)STN00:0021:0017 May 12, Thu4192STN05:4007:35BGY17 May 12, Thu4193BGY08:0010:00STN17 May 12, Thu168STN10:3512:35TRS17 May 12, Thu169TRS13:0015:10STN18 May 12, Fri966STN05:5507:50GSE18 May 12, Fri967GSE08:1510:10STN19 May 12, Sat522STN05:2007:15LDE19 May 12, Sat523LDE07:4009:30STN19 May 12, Sat682STN10:0012:15VST19 May 12, Sat683VST12:4014:45STN20 May 12, SunSBY0400-ZSTN04:0016:0021 May 12, Mon712STN06:2508:05AAR21 May 12, Mon713AAR08:3010:10STN22 May 12, TueOFF(Z)STN00:0021:0023 May 12, WedOFF(Z)STN00:0021:0024 May 12, ThuOFF(Z)STN00:0021:0025 May 12, FriOFF(Z)STN00:0021:0026 May 12, Sat4198STN18:1020:05BGY26 May 12, Sat4197BGY20:3022:30STN27 May 12, Sun225STN12:3513:50DUB27 May 12, Sun294DUB14:1515:30STN27 May 12, Sun297STN15:5517:10DUB27 May 12, Sun288DUB17:3518:50STN27 May 12, Sun293STN19:1520:30DUB27 May 12, Sun298DUB20:5522:05STN28 May 12, Mon2404STN11:3013:10FMM28 May 12, Mon2405FMM13:3515:15STN28 May 12, Mon2634STN15:4517:55ZAZ28 May 12, Mon2635ZAZ18:2020:25STN29 May 12, Tue10STN13:0016:30MMM29 May 12, Tue11MMM16:5520:00STN30 May 12, Wed9258STN16:3019:00IBZ30 May 12, Wed9259IBZ19:3522:05STN31 May 12, ThuOFF(Z)STN00:0021:001 Jun 12, FriOFF(Z)STN00:0021:002 Jun 12, 
SatOFF(Z)STN00:0021:003 Jun 12, SunOFF(Z)STN00:0021:00Roster Reponsibility - Please Read.· The 28 day roster will be published every Friday, no later then 1730 local Irish time.· The valid period will start the following Monday.· Week 1 is the final published roster. Changes made in this week will generate change notifications in the rostering system.· Weeks 2 to 4 are planned. No changes, except in exceptional circumstances, will be made to either days off or roster patterns. Changes made to planned weeks will not generate change notifications in the rostering system until they become Week 1.· It is the Pilots' responsibility to obtain the published roster each week to confirm duties for the coming week. · If, due to unforeseen circumstances, the published (i.e. Week 1) roster is not available by Friday evening, you must contact your crew control to confirm your assignments for the week ahead.

 

...and once the regex has run, my array looks like this:

 

Array ( [0] => Array ( [0] => DateDutyDepBeginEndArr [1] => 8 May 12, Tue8386STN10:5013:15PMI [2] => 8 May 12, Tue8387PMI13:5016:25STN [3] => 9 May 12, Wed258STN11:2513:10MMX [4] => 9 May 12, Wed259MMX13:4015:30STN [5] => 9 May 12, Wed9273STN16:0017:05EIN [6] => 9 May 12, Wed9274EIN17:3018:30STN [7] => 10 May 12, Thu10STN13:0016:30MMM [8] => 10 May 12, Thu11MMM16:5520:00STN [9] => 11 May 12, Fri9273STN15:3016:35EIN [10] => 11 May 12, Fri9274EIN17:0018:00STN [11] => 11 May 12, Fri3634STN18:2519:45BRE [12] => 11 May 12, Fri3633BRE20:1021:25STN [13] => 17 May 12, Thu4192STN05:4007:35BGY [14] => 17 May 12, Thu4193BGY08:0010:00STN [15] => 17 May 12, Thu168STN10:3512:35TRS [16] => 17 May 12, Thu169TRS13:0015:10STN [17] => 18 May 12, Fri966STN05:5507:50GSE [18] => 18 May 12, Fri967GSE08:1510:10STN [19] => 19 May 12, Sat522STN05:2007:15LDE [20] => 19 May 12, Sat523LDE07:4009:30STN [21] => 19 May 12, Sat682STN10:0012:15VST [22] => 19 May 12, Sat683VST12:4014:45STN [23] => 21 May 12, Mon712STN06:2508:05AAR [24] => 21 May 12, Mon713AAR08:3010:10STN [25] => 26 May 12, Sat4198STN18:1020:05BGY [26] => 26 May 12, Sat4197BGY20:3022:30STN [27] => 27 May 12, Sun225STN12:3513:50DUB [28] => 27 May 12, Sun294DUB14:1515:30STN [29] => 27 May 12, Sun297STN15:5517:10DUB [30] => 27 May 12, Sun288DUB17:3518:50STN [31] => 27 May 12, Sun293STN19:1520:30DUB [32] => 27 May 12, Sun298DUB20:5522:05STN [33] => 28 May 12, Mon2404STN11:3013:10FMM [34] => 28 May 12, Mon2405FMM13:3515:15STN [35] => 28 May 12, Mon2634STN15:4517:55ZAZ [36] => 28 May 12, Mon2635ZAZ18:2020:25STN [37] => 29 May 12, Tue10STN13:0016:30MMM [38] => 29 May 12, Tue11MMM16:5520:00STN [39] => 30 May 12, Wed9258STN16:3019:00IBZ [40] => 30 May 12, Wed9259IBZ19:3522:05STN ) [date] => Array ( [0] => Date [1] => 8 May 12, Tue [2] => 8 May 12, Tue [3] => 9 May 12, Wed [4] => 9 May 12, Wed [5] => 9 May 12, Wed [6] => 9 May 12, Wed [7] => 10 May 12, Thu [8] => 10 May 12, Thu [9] => 11 May 12, Fri [10] => 11 May 12, Fri [11] => 11 May 12, 
Fri [12] => 11 May 12, Fri [13] => 17 May 12, Thu [14] => 17 May 12, Thu [15] => 17 May 12, Thu [16] => 17 May 12, Thu [17] => 18 May 12, Fri [18] => 18 May 12, Fri [19] => 19 May 12, Sat [20] => 19 May 12, Sat [21] => 19 May 12, Sat [22] => 19 May 12, Sat [23] => 21 May 12, Mon [24] => 21 May 12, Mon [25] => 26 May 12, Sat [26] => 26 May 12, Sat [27] => 27 May 12, Sun [28] => 27 May 12, Sun [29] => 27 May 12, Sun [30] => 27 May 12, Sun [31] => 27 May 12, Sun [32] => 27 May 12, Sun [33] => 28 May 12, Mon [34] => 28 May 12, Mon [35] => 28 May 12, Mon [36] => 28 May 12, Mon [37] => 29 May 12, Tue [38] => 29 May 12, Tue [39] => 30 May 12, Wed [40] => 30 May 12, Wed ) [flightnum] => Array ( [0] => Duty [1] => 8386 [2] => 8387 [3] => 258 [4] => 259 [5] => 9273 [6] => 9274 [7] => 10 [8] => 11 [9] => 9273 [10] => 9274 [11] => 3634 [12] => 3633 [13] => 4192 [14] => 4193 [15] => 168 [16] => 169 [17] => 966 [18] => 967 [19] => 522 [20] => 523 [21] => 682 [22] => 683 [23] => 712 [24] => 713 [25] => 4198 [26] => 4197 [27] => 225 [28] => 294 [29] => 297 [30] => 288 [31] => 293 [32] => 298 [33] => 2404 [34] => 2405 [35] => 2634 [36] => 2635 [37] => 10 [38] => 11 [39] => 9258 [40] => 9259 ) [dep] => Array ( [0] => Dep [1] => STN [2] => PMI [3] => STN [4] => MMX [5] => STN [6] => EIN [7] => STN [8] => MMM [9] => STN [10] => EIN [11] => STN [12] => BRE [13] => STN [14] => BGY [15] => STN [16] => TRS [17] => STN [18] => GSE [19] => STN [20] => LDE [21] => STN [22] => VST [23] => STN [24] => AAR [25] => STN [26] => BGY [27] => STN [28] => DUB [29] => STN [30] => DUB [31] => STN [32] => DUB [33] => STN [34] => FMM [35] => STN [36] => ZAZ [37] => STN [38] => MMM [39] => STN [40] => IBZ ) [deptime] => Array ( [0] => Begin [1] => 10:50 [2] => 13:50 [3] => 11:25 [4] => 13:40 [5] => 16:00 [6] => 17:30 [7] => 13:00 [8] => 16:55 [9] => 15:30 [10] => 17:00 [11] => 18:25 [12] => 20:10 [13] => 05:40 [14] => 08:00 [15] => 10:35 [16] => 13:00 [17] => 05:55 [18] => 08:15 [19] => 05:20 [20] => 
07:40 [21] => 10:00 [22] => 12:40 [23] => 06:25 [24] => 08:30 [25] => 18:10 [26] => 20:30 [27] => 12:35 [28] => 14:15 [29] => 15:55 [30] => 17:35 [31] => 19:15 [32] => 20:55 [33] => 11:30 [34] => 13:35 [35] => 15:45 [36] => 18:20 [37] => 13:00 [38] => 16:55 [39] => 16:30 [40] => 19:35 ) [arrtime] => Array ( [0] => End [1] => 13:15 [2] => 16:25 [3] => 13:10 [4] => 15:30 [5] => 17:05 [6] => 18:30 [7] => 16:30 [8] => 20:00 [9] => 16:35 [10] => 18:00 [11] => 19:45 [12] => 21:25 [13] => 07:35 [14] => 10:00 [15] => 12:35 [16] => 15:10 [17] => 07:50 [18] => 10:10 [19] => 07:15 [20] => 09:30 [21] => 12:15 [22] => 14:45 [23] => 08:05 [24] => 10:10 [25] => 20:05 [26] => 22:30 [27] => 13:50 [28] => 15:30 [29] => 17:10 [30] => 18:50 [31] => 20:30 [32] => 22:05 [33] => 13:10 [34] => 15:15 [35] => 17:55 [36] => 20:25 [37] => 16:30 [38] => 20:00 [39] => 19:00 [40] => 22:05 ) [arr] => Array ( [0] => Arr [1] => PMI [2] => STN [3] => MMX [4] => STN [5] => EIN [6] => STN [7] => MMM [8] => STN [9] => EIN [10] => STN [11] => BRE [12] => STN [13] => BGY [14] => STN [15] => TRS [16] => STN [17] => GSE [18] => STN [19] => LDE [20] => STN [21] => VST [22] => STN [23] => AAR [24] => STN [25] => BGY [26] => STN [27] => DUB [28] => STN [29] => DUB [30] => STN [31] => DUB [32] => STN [33] => FMM [34] => STN [35] => ZAZ [36] => STN [37] => MMM [38] => STN [39] => IBZ [40] => STN ) ) 

 

 

As you can see, for example, I'm "OFF" on the 15th, yet that entry is missing from the array even though it is in the string. Is there any reason for this?

 

Thanks

James


Further to this, I found why it wasn't being captured, but I'm not really sure how to modify the regex to cope with it. The problem is that 'OFF' days have a value in the "Dep" column but no "Arr", so that <TD></TD> is blank and the regex skips the whole row.

 

Is there a way to modify the regex string above to allow cells to be blank?

 

James
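One possible adjustment, offered as a sketch rather than a tested fix for the real rota page: the original pattern uses ([^<]+), which requires at least one character in every cell, so changing + to * lets a cell be empty. The version below also uses <TD[^>]*> to tolerate attributes or stray spaces inside the tag and adds the i flag for case-insensitive tags; both are extras that were not in the original pattern. The group names are taken from the keys in the array output above. The 'OFF' row in $text is reconstructed from the flattened string and is an assumption about what the real markup looks like.

```php
<?php
// Sample rows: a spacer, an 'OFF' day with a blank Arr cell (assumed markup), and a flight.
$text = "<TR><TD ></TD></TR>"
    . "<TR><TD>14 May 12, Mon</TD><TD>OFF(Z)</TD><TD>STN</TD><TD>00:00</TD><TD>21:00</TD><TD></TD></TR>"
    . "<TR><TD>17 May 12, Thu</TD><TD>4192</TD><TD>STN</TD><TD>05:40</TD><TD>07:35</TD><TD>BGY</TD></TR>";

// Build one named group per column; [^<]* (not [^<]+) allows an empty cell.
$names = array('date', 'flightnum', 'dep', 'deptime', 'arrtime', 'arr');
$cells = '';
foreach ($names as $name) {
    $cells .= '<TD[^>]*>(?P<' . $name . '>[^<]*)</TD>';
}
$pattern = '#<TR[^>]*>' . $cells . '</TR>#i';

// PREG_SET_ORDER returns one array per matched row instead of one per column.
$count = preg_match_all($pattern, $text, $rows, PREG_SET_ORDER);

foreach ($rows as $row) {
    echo $row['date'], ' | ', $row['flightnum'], ' | ', $row['dep'], ' -> ',
        ($row['arr'] === '' ? '(none)' : $row['arr']), "\n";
}
```

The spacer rows are still ignored because they contain only one cell, while a blank "Arr" now comes back as an empty string that can be checked for when processing each row.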
