cordoprod Posted December 21, 2009 Share Posted December 21, 2009 Hi, I've got some html i just need a couple of strings from.. argh, it's freaking me out. I've tried a lot. Here is the html: <div id="Tab01" style="overflow: auto; overflow-x:hidden; height: 2800px; width:930px"> <table style="width:930px;background-color:#deded5; border-style:dotted; border-width:1px; border-color:#79796F; border-top:none"><tr> <td style="width:40px"> </td><td style="width:50px"><font class="overskrift2-ruteoversikt">Rutenr.:</td><td style="360px"><font class="overskrift2-ruteoversikt">Rutenavn:</td> <td style="width:40px"> </td><td style="width:50px"><font class="overskrift2-ruteoversikt">Rutenr.:</td><td style="360px"><font class="overskrift2-ruteoversikt">Rutenavn:</td> </tr> </table> <table style="background-color:#f3f3de;width:924px;height:369px;border-style:dotted;border-width:1px;border-color:#79796F;border-top:none"> <tr valign="top"> <td> <table style="width:462px"> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-100.htm">01-100</a></td><td style="width:360px">Moss-Fredrikstad-Halden</td></tr> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-102.htm">01-102</a></td><td style="width:360px">Halden-Parken-Tistedal-Vold skog-Parken-Stenrød-Brekkerød-Halden</td></tr> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-104.htm">01-104</a></td><td style="width:360px">Halden-Parken-Tistedal-Næringsrød-Parken-Remmen-Bjørklund-Sy kehuset-Halden</td></tr> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-105.htm">01-105</a></td><td style="width:360px">Halden-Stangeløkka-Refne-Halden</td></tr> <tr valign="top"><td style="width:40px"><IMG 
SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-110.htm">01-110</a></td><td style="width:360px">Halden-Sørli-Isebakke</td></tr> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-111.htm">01-111</a></td><td style="width:360px">Halden-Svinesund-Strömstad</td></tr> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-112.htm">01-112</a></td><td style="width:360px">Halden-Knardal-Hov-Halden</td></tr> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-113.htm">01-113</a></td><td style="width:360px">Halden-Holtet-Bakke-Halden</td></tr> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-114.htm">01-114</a></td><td style="width:360px">Halden-Aspedammen-Prestebakke-Kornsjø-Halden</td></tr> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-115.htm">01-115</a></td><td style="width:360px">Halden-Elgklev</td></tr> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-118.htm">01-118</a></td><td style="width:360px">Halden-Isebakke-Svinesund-Sponvika-Halden</td></tr> </table> </td> <td> <table style="width:462px"> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-121.htm">01-121</a></td><td style="width:360px">Halden-Torpedal-Halden</td></tr> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-123.htm">01-123</a></td><td style="width:360px">Fjeld bru-Jørkebekk-Østkroken-Aremark</td></tr> <tr valign="top"><td 
style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-124.htm">01-124</a></td><td style="width:360px">Aremark-strømsfoss-Vestsida-Halden</td></tr> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-131.htm">01-131</a></td><td style="width:360px">Ørje-Kasbo-Buer-Engsødegård-Granerud</td></tr> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-132.htm">01-132</a></td><td style="width:360px">Ørje-Damholtet-Strømsfoss</td></tr> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-140.htm">01-140</a></td><td style="width:360px">Halden-Aremark-Strømsfoss-Granerud-Ørje</td></tr> <tr valign="top"><td style="width:40px"><IMG SRC="../images/FERJE-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-150.htm">01-150</a></td><td style="width:360px">Skjærhalden-Hvaler</td></tr> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BAAT-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-191.htm">01-191</a></td><td style="width:360px">Strømsfoss-Tistedal/Ørje</td></tr> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-192.htm">01-192</a></td><td style="width:360px">Halden-Brekkerød-Halden</td></tr> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-194.htm">01-194</a></td><td style="width:360px">Halden ringbuss</td></tr> <tr valign="top"><td style="width:40px"><IMG SRC="../images/BUSS-s.gif" align="center"></td><td style="width:50px"><a href="../t/01-199.htm">01-199</a></td><td style="width:360px">Skolebuss Halden</td></tr> </table> </td> </tr> </table> </div> I want to parse this html, and 
put it into a MySQL database using two fields. The first one is the bus number. These are the bus numbers: <a href="../t/01-100.htm">01-100</a> And these are the places: <td style="width:360px">Moss-Fredrikstad-Halden</td> But there's one more thing: each bus number is connected to a place, so I need a way to parse them so that they don't get mixed up. Any help, please?
Catfish Posted December 21, 2009

Use preg_match(), something like:

    foreach ($htmlCode as $lineNum => $htmlLine) {
        // Skip lines that do not contain a route.
        if (preg_match('/(.*\d\d\-\d\d\d\.htm">)(\d\d\-\d\d\d)(.*px">)(.*)(<\/td>.*)/', $htmlLine, $matches)) {
            $busNumber = $matches[2];
            $busName = $matches[4];
            $busArray[$busNumber] = $busName;
        }
    }

The regular expression is probably wrong, because I never get them right the first time and have to stuff around changing them. I'll leave that up to you.
cags Posted December 21, 2009

There are lots of ways this could be done. Generally speaking, when parsing HTML the best approach is to use some kind of document model such as DOMDocument. It can be achieved using a regular expression, something along the lines of...

    $pattern = '#<tr valign="top"><td style="width:40px"><IMG SRC="\.\./images/[a-z]+?-s\.gif" align="center"></td><td style="width:50px"><a href="\.\./t/[0-9]{2}-[0-9]{3}\.htm">([0-9]{2}-[0-9]{3})</a></td><td style="width:360px">([^<]+)</td></tr>#i';
    preg_match_all($pattern, $input, $out);

...but it's not really the best way. I'd have given you an example of using the document model, but to be honest, I've never actually used it myself.

Faux Edit: Catfish replied whilst I was typing this.
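For reference, here is a minimal sketch of the DOMDocument approach mentioned above. It is only one way to do it: `extractRoutes()` is a made-up helper name, and the XPath expressions assume every route row has a link whose href contains `/t/`, with the route name in the cell right after it, as in the posted HTML.

```php
<?php
// Sketch only: extractRoutes() is a hypothetical helper, not code from the thread.
function extractRoutes(string $html): array
{
    $doc = new DOMDocument();
    // The posted markup is not valid HTML (unclosed <font> tags),
    // so collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $routes = [];
    // Assumption: a route row is any <tr> with a cell that links to ".../t/...".
    foreach ($xpath->query('//tr[td/a[contains(@href, "/t/")]]') as $row) {
        $link = $xpath->query('td/a', $row)->item(0);
        // The route name sits in the first cell after the one holding the link.
        $nameCell = $xpath->query('td[a]/following-sibling::td[1]', $row)->item(0);
        if ($link === null || $nameCell === null) {
            continue;
        }
        // Number and name come from the same row, so they cannot get mixed up.
        $routes[trim($link->textContent)] = trim($nameCell->textContent);
    }
    return $routes;
}
```

Calling `extractRoutes($htmlCode)` on the downloaded page should give an array keyed by bus number. Note that `loadHTML()` guesses the character set when the document does not declare one, so names such as "Stenrød" may need an explicit charset hint; this sketch does not handle that.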
cordoprod (Author) Posted December 21, 2009

Here's my code now:

    $htmlCode = file_get_contents("http://www.rutebok.no/NRIIISStaticTables/Tables/ruter/index/Avd_01.htm");
    preg_match('/(.*\d\d\-\d\d\d\.htm">)(\d\d\-\d\d\d)(.*px">)(.*)(<\/td>.*)/', $htmlCode, $matches);
    $busNumber = $matches[2];
    $busName = $matches[4];
    $busArray[$busNumber] = $busName;
    echo $busNumber;
    echo $busName;

It works, but it only outputs one entry of the things I want to parse. The busNumber is 01-100 and the busName is Moss-Fredrikstad, which is correct. But there are more bus numbers and bus names.
cags Posted December 21, 2009

That's because the code you are using was designed to operate on a per-line basis, hence the foreach loop in their code. The preg_match function stops after the first match of the pattern. I'm also not entirely sure why they used five capture groups when you only want two bits of information, but that's by-the-by. You will need to either split the content into lines and run the array through the loop like Catfish did in their example, or use preg_match_all like I did in my example. I still stand by my suggestion that regular expressions are probably not the best solution, though.
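To make the difference between the two functions concrete, here is a small sketch on two sample cells. The pattern is a simplified stand-in with only two capture groups, not either of the patterns posted in the thread, and the `PREG_SET_ORDER` flag is just one convenient choice.

```php
<?php
// Two sample rows in the same shape as the page; not the full document.
$input = '<td><a href="../t/01-100.htm">01-100</a></td><td style="width:360px">Moss-Fredrikstad-Halden</td>'
       . '<td><a href="../t/01-115.htm">01-115</a></td><td style="width:360px">Halden-Elgklev</td>';

// Simplified pattern (an assumption): group 1 is the number, group 2 the name.
$pattern = '#<a href="[^"]+">(\d{2}-\d{3})</a></td><td[^>]*>([^<]+)</td>#';

// preg_match() stops at the first match, so $first only knows about 01-100.
preg_match($pattern, $input, $first);

// preg_match_all() finds every match; PREG_SET_ORDER keeps each match's
// captures together, so $all[1][1] and $all[1][2] belong to the same row.
preg_match_all($pattern, $input, $all, PREG_SET_ORDER);
```

With `PREG_SET_ORDER`, each element of `$all` is one route, which avoids having to index two parallel arrays by the same key.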
cordoprod (Author) Posted December 21, 2009

But when I add Catfish's code, it gives me this error:

    Warning: Invalid argument supplied for foreach() in /customers/cordoproduction.com/cordoproduction.com/httpd.www/ruteinfo/linjer.php on line 32
cags Posted December 21, 2009

That's because (as with most solutions provided by this forum) it's not perfectly custom tailored to be copy/pasted into your code (mainly owing to the fact you didn't post any). The foreach syntax is a construct for iterating through an array. I'm going to go ahead and guess you are passing it $htmlCode, which is a string, not an array, and as such can't be iterated through. You would need to parse the file into an array by using something such as explode to split the file up, then pass the returned array to the foreach loop.

This approach, in my opinion, requires more work than is necessary. Using preg_match_all to find the matches seems much more sensible. Depending on what you need to do with the information, you would then use a foreach loop to iterate through $matches to display or do whatever with the information.
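For completeness, the explode route described above could look roughly like this. `parseRouteLines()` is a hypothetical helper, the pattern is Catfish's unchanged, and the sketch assumes each table row sits on its own line of the downloaded file, which may not hold for the real page.

```php
<?php
// Hypothetical helper; assumes one table row per line of the file.
function parseRouteLines(string $htmlCode): array
{
    $busArray = [];
    // explode() turns the string into an array, which foreach can iterate.
    foreach (explode("\n", $htmlCode) as $htmlLine) {
        // Catfish's pattern, unchanged; lines without a route are skipped.
        if (preg_match('/(.*\d\d\-\d\d\d\.htm">)(\d\d\-\d\d\d)(.*px">)(.*)(<\/td>.*)/', $htmlLine, $matches)) {
            $busArray[$matches[2]] = $matches[4];
        }
    }
    return $busArray;
}
```

If several rows share one line, the greedy `.*` parts of the pattern would swallow all but one of them, which is one more reason to prefer preg_match_all here.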
cordoprod (Author) Posted December 21, 2009

OK, so I tried preg_match_all:

    $htmlCode = file_get_contents("http://www.rutebok.no/NRIIISStaticTables/Tables/ruter/index/Avd_01.htm");
    preg_match_all('/(.*\d\d\-\d\d\d\.htm">)(\d\d\-\d\d\d)(.*px">)(.*)(<\/td>.*)/', $htmlCode, $matches);
    $busNumber = $matches[2];
    $busName = $matches[4];
    echo $busNumber." ".$busName."<br>";

And the output is:

    Array Array
cags Posted December 21, 2009

As I said in my previous post, you would need to go through the outputs with some kind of loop. You should really read the manual for the functions you use (I handily provided links in my earlier posts). It's outputting the word Array because $matches[2] holds an array of all the numbers and $matches[4] holds an array of all the names. One solution for outputting them would be...

    foreach ($matches[2] as $k => $v) {
        // $k is the shared index, so number and name stay paired.
        echo 'Bus Number: ' . $v . ' Bus Name: ' . $matches[4][$k] . '<br/>';
    }
cordoprod (Author) Posted December 21, 2009

Thanks a lot. It works.
cordoprod (Author) Posted December 21, 2009

So what if I want to add these to a MySQL table? I've tried this, but it outputs nothing and doesn't add anything to the table.

    $host = "***";
    $user = "***";
    $pass = "***";
    $database = "***";

    $linkID = mysql_connect($host, $user, $pass) or die("Could not connect to host.");
    mysql_select_db($database, $linkID) or die("Could not find database.");

    $query = "SELECT * FROM ruteinfo_linjer ORDER BY bussnavn DESC";
    $resultID = mysql_query($query, $linkID) or die("Data not found.");

    $htmlCode = file_get_contents("http://www.rutebok.no/NRIIISStaticTables/Tables/ruter/index/Avd_01.htm");
    preg_match_all('/(.*\d\d\-\d\d\d\.htm">)(\d\d\-\d\d\d)(.*px">)(.*)(<\/td>.*)/', $htmlCode, $matches);
    $busNumber = $matches[2];
    $busName = $matches[4];

    while ($row = mysql_fetch_array($resultID)) {
        foreach ($matches[2] as $k => $v) {
            if ($v == $row['bussnummer']) {
                // this is just a test
                echo 'Bus Number: ' . $v . 'Bus Name: ' . $matches[4][$k] . '<br/>';
            } else {
                $sql = "INSERT INTO ruteinfo_linje(fylke,bussnummer,bussnavn) VALUES('Østfold', '".$v."', '".$matches[4][$k]."'";
                $result = mysql_query($sql, $linkID) or die("Error");
                echo "hello";
            }
        }
    }

Please excuse me, I've almost forgotten PHP. It's been a while since I developed with PHP, because I'm an Objective-C and Cocoa programmer now.
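A few things in the posted code look like likely culprits. The inner loop only runs once per row already in the table, so with an empty table nothing is printed or inserted. The INSERT string is also missing its closing parenthesis, and it targets ruteinfo_linje while the SELECT reads ruteinfo_linjer. A rough sketch of one way around this follows, using PDO prepared statements in place of the mysql_* functions (which were removed in PHP 7). `saveRoutes()` is a made-up helper, and the choice of the ruteinfo_linjer spelling is a guess.

```php
<?php
// Hypothetical helper: inserts every route that is not stored yet.
// Column names follow the post; the table spelling is an assumption.
function saveRoutes(PDO $pdo, array $routes, string $county): int
{
    $exists = $pdo->prepare('SELECT COUNT(*) FROM ruteinfo_linjer WHERE bussnummer = ?');
    $insert = $pdo->prepare(
        'INSERT INTO ruteinfo_linjer (fylke, bussnummer, bussnavn) VALUES (?, ?, ?)'
    );

    $added = 0;
    // Loop over the parsed routes, not over the table rows,
    // so an empty table still gets filled.
    foreach ($routes as $number => $name) {
        $exists->execute([$number]);
        $alreadyStored = (int) $exists->fetchColumn() > 0;
        $exists->closeCursor();
        if ($alreadyStored) {
            continue;
        }
        // Placeholders let the driver handle quoting of the values.
        $insert->execute([$county, $number, $name]);
        $added++;
    }
    return $added;
}
```

Here `$routes` is expected to be an array keyed by bus number, for example one built with `array_combine($matches[2], $matches[4])`, and `$pdo` would be a connection created with the same host, user, password, and database values as in the post. The function returns how many rows were added.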