Eamonn Posted August 1, 2007

Hey,

I've been presented with the task of parsing multiple .jsp pages (this is after they have been executed server side, so for all practical purposes each one is an HTML file). Each of these pages has large, complex tables displaying a lot of reporting data for one of our systems. My original plan was to go into the code, get the actual DB queries that each page executes, and build this as a bash-based solution. However, after spending several days trying to hack my way through a jungle of hundreds of queries which don't hold to any naming convention, I'm going to plan B.

So here's what I'm looking to do:

Get PHP to construct the correct URL for the JSP. By "construct" I mean build the URL while dynamically inserting the correct values, as the page uses GET to set the date range of the information it writes to the browser.

Once it has done that and the requested page has been processed, I want PHP to search through the page, find the results that I'm looking for, assign them to variables, and finally format the information from all the different JSPs into one PHP page.

One nice thing is that I'm able to modify the .jsp files to wrap a comment around the data I want, as in the example below. I think this should remove the hardest part of the job, which is having PHP identify which values I actually want.

#take_this_value# 1234556 #######

What I don't know is how to get PHP to request the URL I create, parse the result, and extract the values. I'm guessing this is a job for wget and regular expressions, but I'm not too sure where to start (or whether there are more appropriate functions to use).

As always, suggestions welcome.

Eamonn

Link to comment https://forums.phpfreaks.com/topic/62808-using-php-to-parse-html-tables-and-extract-values/
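The first step in the post above (building the URL with the date range as GET parameters, then requesting the page) could be sketched roughly as follows. The host, path, and the parameter names `from` and `to` are hypothetical placeholders, since the thread never shows the real JSP's URL; the fetch relies on `allow_url_fopen` being enabled in php.ini.

```php
<?php
// Hypothetical base URL and parameter names: substitute whatever the real JSP expects.
function buildReportUrl(string $base, string $from, string $to): string
{
    // http_build_query URL-encodes each value and joins the pairs with "&".
    return $base . '?' . http_build_query(['from' => $from, 'to' => $to]);
}

function fetchPage(string $url): string
{
    // Needs allow_url_fopen = On; file_get_contents returns false on failure.
    $html = file_get_contents($url);
    if ($html === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $html;
}

$url = buildReportUrl('http://reports.example.com/summary.jsp', '2007-07-01', '2007-07-31');
echo $url, "\n";
// $html = fetchPage($url); // uncomment once the URL points at a real page
```

Keeping the URL building separate from the fetch makes it easy to loop over several JSPs with the same date range.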
mrjcfreak Posted August 1, 2007

I really, really would recommend going the other way (server side, bash etc., even using PHP from the command line), and if not that, getting an XML feed of the data; there are already protocols for getting data out of XML meaningfully.

Provided the right wrappers are enabled in php.ini, you can access remote files using fopen over URLs.

Failing that, I would try to change the JSP page to put a comment after each table row, like this:

<tr><td>id015</td><td>product name</td><td>product desc</td><td>£40.51</td><td>5 in stock</td></tr>
<!-- PHPuniqueunique#id=id015&product=product name&desc=product desc&price=4051&stock=5 -->

which will make it a hang of a lot easier to parse. You can then use a regular expression to select the right data, and explode to split up the parts. (You will of course need to make sure your delimiters don't collide with the content.)

It is possible, but I wouldn't recommend it long term, as there are issues when the webmaster decides to rearrange the JSP, and accessing data over the net is less reliable and less secure than accessing it within your own systems.
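As a rough illustration of the comment-per-row idea above: because the suggested comment body looks like a query string, one option (an assumption on my part, not something the post prescribes) is to grab each comment with `preg_match_all` and let `parse_str` split it, instead of calling explode by hand. The function name `parseRowComments` is made up for this sketch; the marker is the one used in the post.

```php
<?php
// Pulls every "<!-- PHPuniqueunique#key=value&... -->" comment out of a page
// and returns one associative array per comment.
function parseRowComments(string $html): array
{
    $rows = [];
    preg_match_all('/<!--\s*PHPuniqueunique#(.*?)\s*-->/s', $html, $matches);
    foreach ($matches[1] as $pairs) {
        // parse_str splits on "&" and "=" and URL-decodes the values.
        parse_str($pairs, $fields);
        $rows[] = $fields;
    }
    return $rows;
}

$html = '<tr><td>id015</td><td>product name</td></tr>'
      . '<!-- PHPuniqueunique#id=id015&product=product name&price=4051&stock=5 -->';
print_r(parseRowComments($html));
```

Note that `parse_str` treats `+` and `%xx` sequences as URL encoding, so the JSP side would need to URL-encode any value that may contain `&`, `=`, `+` or `%`.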
trq Posted August 1, 2007

"I'm guessing this is a job for wget and regular expressions"

Might be easier to look into DOM.
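For anyone wondering what the DOM route might look like, here is a minimal sketch using PHP's bundled `DOMDocument` and `DOMXPath` classes. It assumes the wanted cells carry a known class attribute (`phpuniqueid` is only an example name), and `cellsByClass` is a hypothetical helper, not part of any library.

```php
<?php
// Returns the trimmed text of every <td> whose class attribute equals $class.
// $class is assumed to be a plain word with no quote characters in it.
function cellsByClass(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parse warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $values = [];
    foreach ($xpath->query("//td[@class='$class']") as $td) {
        $values[] = trim($td->textContent);
    }
    return $values;
}

$html = '<table><tr><td>label</td><td class="phpuniqueid">123456</td></tr></table>';
print_r(cellsByClass($html, 'phpuniqueid'));
```

Compared with a regular expression, this keeps working if the JSP later adds other attributes, whitespace, or nested tags inside the cell.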
Eamonn Posted August 1, 2007 (Author)

I'm using file_get_contents to open and read the webpage.

Mrjcfreak: I'm thinking of something similar to what you suggested. I was thinking of modifying the td tags around each cell that contains a value I want. So, for example, change

<td>123456</td>

to

<td class=phpuniqueid>123456</td>

Then try to create a regular expression which would extract the 123456 from between the tags.

The only problem is that I have never written a regular expression and can't seem to find any decent examples :/
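A regular expression for the marked-up cell described above might look something like this. It is only a sketch: it assumes the class attribute is the sole attribute on the tag and that the cell holds plain text, and `extractMarkedCells` is a hypothetical helper name.

```php
<?php
// Returns the contents of every <td class=phpuniqueid>...</td> cell in $html.
function extractMarkedCells(string $html, string $class = 'phpuniqueid'): array
{
    // ["\']? allows the class value to be unquoted, single-quoted or double-quoted.
    // (.*?) is a lazy capture, so each match stops at the first closing </td>.
    // The i flag ignores case; the s flag lets "." match newlines inside a cell.
    $pattern = '/<td\s+class=["\']?' . preg_quote($class, '/') . '["\']?\s*>(.*?)<\/td>/is';
    preg_match_all($pattern, $html, $matches);
    // $matches[1] holds the captured group from every match.
    return array_map('trim', $matches[1]);
}

print_r(extractMarkedCells('<td>skip</td><td class=phpuniqueid>123456</td>'));
```

`preg_match_all` is used rather than `preg_match` so that a page with many marked cells yields all of them in one call.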
mrjcfreak Posted August 1, 2007

Hehe! There is a regex forum on here, which is probably the best place for tips.

Adding comments is far easier than changing the HTML... if you used a comment for each row like

<!--PHPSEARCH id=0101&name=somename&desc=asdjsd ENDPHPSEARCH-->

then get your page into PHP, split off the top and bottom jetsam, then:

    $productarray = array(); // define the product array before filling it
    $tablerows = explode("<tr>", $therawhtml); // makes an array of HTML pieces
    foreach ($tablerows as $row) {
        getfromrow($row, $productarray); // applies the function to every row
    }

    // Takes the array by reference so the results are visible outside the function.
    function getfromrow($text, &$productarray) {
        // note: the first parenthesized bit is $matches[1], not $matches[0]!
        if (preg_match("/<!--PHPSEARCH id=(.+?)&name=(.+?)&desc=(.+?) ENDPHPSEARCH-->/", $text, $matches)) {
            $productarray[$matches[1]] = array($matches[2], $matches[3]);
        }
    }

I haven't tested it; it's only my thoughts written up, so don't apply it without checking it through!