eatc7402 Posted April 4, 2011

I have a number of records containing the html sections of a series of Google Earth placemarks that I managed to extract from the raw kml file. The purpose of the exercise was to then use the simplehtmldom.php api to extract the DATA from the raw html code. Some of the process is going well... and some is NOT.

I have found that if I modify the raw html code by adding ID attributes, the simplehtmldom api has an easy time identifying the desired data, and the data comes out far 'cleaner' when the id attribute is placed as 'close' to the data as possible. But doing a php text search and replace often requires finding a 'unique' identifiable portion of the html code and THEN placing the 'id' attribute in a nearby html tag, because the desired data is nested inside a non-unique tag. For example, I can identify a SPECIFIC <td> tag section where the data I want is located, but the data is nested inside a <font> tag inside the <td> cluster. Hence my problem...

If I do a search in the following code...

<td><b><font size="+2" color="#FF0000">Neighborhood:</font> <font size="+2" color="#0000FF">City of Sidney</font></b></td>

I can locate the 'Neighborhood:' string because it is unique in the whole html code. Then, by some character counting, I want to put my 'id' attribute in the NEXT font tag because it surrounds the desired data, 'City of Sidney'... as in...

<td><b><font size="+2" color="#FF0000">Neighborhood:</font> <font id="neighborhood" size="+2" color="#0000FF">City of Sidney</font></b></td>

With this modification the desired data is easily found and cleanly produced. But the html code, while it all renders correctly in a web page, is not all identical from a 'whitespace' point of view, AND thus my problem. If I search the following code...
<td> <b><font size="+2" color="#FF0000">Neighboorhood:</font> <font size="+2" color="#0000FF">Greenacre</font></b> </td>

While it is identical as far as html is concerned, if I search this code for the 'Neighboorhood:' identifier I find it... but then placing the id attribute into the NEXT font tag is proving problematic, because the character counts no longer line up.

What I seem to need is a function that, once the position of the 'Neighboorhood:' string has been identified in the whole of the html code, will FIND and modify the NEXT occurrence of a font tag no matter what whitespace (or special characters) may occur in between.

Any suggestions??

eatc7402
rpmorrow Posted April 4, 2011

Rather than counting characters from your occurrence of "Neighborhood:", use a combination of strpos and substr to find the position of the next font tag after it, and then insert what you need using preg_replace or str_replace.
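A minimal sketch of the strpos idea above, under a few assumptions: the helper name insertAttrAfterLabel is hypothetical, the tag search uses stripos so that <FONT> would also match, and the insertion is done with substr_replace (swapped in here for the suggested str_replace/preg_replace, since the position is already known). Because the search for the tag starts at the label and ignores whatever lies in between, extra whitespace does not matter.

```php
<?php
// Hypothetical helper: inserts $attr into the first <font> tag that follows $label.
function insertAttrAfterLabel(string $html, string $label, string $attr): string
{
    $labelPos = strpos($html, $label);
    if ($labelPos === false) {
        return $html; // label not found; return the input unchanged
    }

    // Look for the next opening font tag, starting just past the label.
    $tagPos = stripos($html, '<font', $labelPos + strlen($label));
    if ($tagPos === false) {
        return $html; // no font tag after the label
    }

    // A length of 0 makes substr_replace a pure insertion right after "<font".
    return substr_replace($html, ' ' . $attr, $tagPos + strlen('<font'), 0);
}

$html = '<td> <b><font size="+2" color="#FF0000">Neighboorhood:</font> '
      . '<font size="+2" color="#0000FF">Greenacre</font></b> </td>';

echo insertAttrAfterLabel($html, 'Neighboorhood:', 'id="neighborhood"'), "\n";
```

The closing </font> that follows the label is skipped naturally, because "</font" does not contain the substring "<font".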
sasa Posted April 4, 2011

try

<?php
$test = '<td> <b><font size="+2" color="#FF0000">Neighboorhood:</font> <font size="+2" color="#0000FF">Greenacre</font></b> </td> ';
// Capture everything from the label up to (but not including) the closing ">" of the next <font> tag,
// then write it back with the extra text inserted just before that ">".
// The /s flag lets "." match newlines, so any whitespace between the tags is tolerated.
$out = preg_replace('/(Neighboorhood:.*?<font.*?)>/s', '\1 what you want to insert>', $test);
echo $out;
?>
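The regex above adds the new text at the end of the font tag. As a sketch only, the variant below places an id attribute directly after "<font" instead, which reproduces the layout shown in the original question. The function name tagFontAfterLabel is hypothetical, and the label is passed as a regex fragment (here Neighboo?rhood:, an assumption made so that both spellings in the sample data match).

```php
<?php
// Hypothetical helper: adds id="$id" to the first <font> tag after the label.
// $labelPattern is a regex fragment and must not contain the "~" delimiter.
function tagFontAfterLabel(string $html, string $labelPattern, string $id): string
{
    // Lazily match from the label to the next "<font" that is followed by
    // whitespace or ">". The /s flag lets "." span newlines.
    $pattern = '~(' . $labelPattern . '.*?<font)(?=[\s>])~s';
    $replacement = '$1 id="' . $id . '"';

    // Limit of 1: only the first match is changed.
    // preg_replace returns null on error, so fall back to the input.
    return preg_replace($pattern, $replacement, $html, 1) ?? $html;
}

$samples = [
    '<td><b><font size="+2" color="#FF0000">Neighborhood:</font> <font size="+2" color="#0000FF">City of Sidney</font></b></td>',
    "<td>\n<b><font size=\"+2\" color=\"#FF0000\">Neighboorhood:</font>\n<font size=\"+2\" color=\"#0000FF\">Greenacre</font></b>\n</td>",
];

foreach ($samples as $html) {
    echo tagFontAfterLabel($html, 'Neighboo?rhood:', 'neighborhood'), "\n";
}
```

Once the id is in place, the tagged element can be looked up by that id with whatever HTML parser is in use.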
eatc7402 (Author) Posted April 4, 2011

Well, I found a php function to strip out whitespace and special characters, and then using strlen and str_replace from the now-known positions seems to be getting me closer to my desired outcome. The regular expression function given in a reply did NOT do what I desired, which was a TEXT SUBSTITUTION.

eatc7402