MoFish, posted November 16, 2013 (edited)

Hi all,

I'm trying to create a dropdown that lists the directory names on another one of my servers (a remote URL). I have a script that works (ish), but it keeps writing out "Parent Directory" as an option in my dropdown list. Could someone advise me on how to remove this? I've been looking into it for a while now.

The structure of the HTML is:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
    <html>
    <head>
    <title>Index of /myurl.com</title>
    </head>
    <body>
    <h1>Index of /myurl.com</h1>
    <ul><li><a href="/"> Parent Directory</a></li>
    <li><a href="design1/"> design1/</a></li>
    </ul>
    </body></html>

OK, so first I set the URL and attempt to strip out the ULs and the LI including "Parent Directory". This doesn't seem to work:

    <?php
    $url = file_get_contents("http://www.myurlwhichhasdirectorylisting.co.uk");
    // do some line removing
    $newlines = array("\t","\n","\r","\x20\x20","\0","\x0B");
    $content = str_replace($newlines, "", html_entity_decode($url));
    // attempt to remove the ul and li (doesn't work)
    $start = strpos($content,"<ul><li><a href='/'>Parent Directory</a></li>");
    $end = strpos($content,"</ul>",$start) + 8;
    $table = substr($content,$start,$end-$start);
    preg_match_all("|<li(.*)</li>|U",$table,$rows);
    ?>

Then I attempt to loop over the rows to build the dropdown:

    <?php
    foreach ($rows[0] as $row){
        preg_match_all("|<li(.*)</li>|U",$row,$cells);
        $var = strip_tags($cells[0][0]);
        echo "{$var}\n";
    ?>
    <option value="<?=$var;?>"><?=$var;?></option>
    <?php
    }
    ?>

I'm probably doing this in a very long-winded way.

Thanks!

Edited November 16, 2013 by MoFish
denno020, posted November 16, 2013

You could possibly remove it with a single preg_replace:

    $table = preg_replace("~.*<ul><li><a href='/'>Parent Directory</a></li>(.*)</ul>.*~", "\${1}", $content);

I have no idea if that will work, as I haven't tested it, but the theory is that it will match all the text between the end of the first <li> and the closing </ul> tag, and then put that into the $table variable. Everything else will effectively be removed.

Hopefully it gives you an idea.

Denno
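One thing that may be worth checking with the pattern above: the listing in the first post uses double quotes around the href and has a space before "Parent Directory", while the pattern expects single quotes and no space. A minimal sketch of a more tolerant pattern follows. The sample string in $content is an assumption (the listing from the first post with its newlines already stripped), and the relaxed pattern is an illustration, not something tested against a live server.

```php
<?php
// Assumed input: the <ul> from the first post, newlines already removed.
$content = '<ul><li><a href="/"> Parent Directory</a></li>'
         . '<li><a href="design1/"> design1/</a></li></ul>';

// Accept either quote style around the href and optional whitespace
// before the link text. The "s" modifier lets "." match newlines too.
$pattern = '~.*<ul>\s*<li><a href=["\']/["\']>\s*Parent Directory</a></li>(.*)</ul>.*~s';

// Keep only what sits between the first <li> and the closing </ul>.
$table = preg_replace($pattern, '$1', $content);

echo $table, "\n";
```

If the pattern does not match at all, preg_replace returns the input unchanged, so comparing $table with $content is a quick way to see whether the match succeeded.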
requinix, posted November 16, 2013

DOMDocument. Forget regular expressions, forget breaking it apart with string functions, and just use DOMDocument. Use getElementsByTagName() to get all the links, then loop through those and grab their href attributes.
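A rough sketch of that suggestion, assuming the DOM extension is enabled. The function name listDirectoryLinks() is made up for illustration, and the sample markup is the listing from the first post; in real use the string would come from file_get_contents() on the remote URL. Skipping the parent entry by its link text is one possible choice, not the only one.

```php
<?php
// Sample listing copied from the first post.
$html = <<<'HTML'
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<title>Index of /myurl.com</title>
</head>
<body>
<h1>Index of /myurl.com</h1>
<ul><li><a href="/"> Parent Directory</a></li>
<li><a href="design1/"> design1/</a></li>
</ul>
</body></html>
HTML;

// listDirectoryLinks() is a hypothetical helper, not part of any library.
function listDirectoryLinks(string $html): array
{
    // Collect parser warnings internally instead of printing them,
    // since old or sloppy markup tends to trigger a few.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        // Skip the "Parent Directory" entry, identified by its link text.
        if (trim($link->textContent) === 'Parent Directory') {
            continue;
        }
        $hrefs[] = $link->getAttribute('href');
    }
    return $hrefs;
}

print_r(listDirectoryLinks($html));
```

Because the parser works on the document tree, quote style and whitespace inside the anchors stop mattering, which is the main advantage over matching the raw string.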
MoFish (author), posted November 16, 2013

Hi,

Thanks for your help. I finally got rid of that 'Parent Directory'. It is now writing out the following:

    <a href="design1/"> design1/</a>

Ideally I would like it to return only "design1", without the anchor tags or the trailing forward slash. Could anyone help?

My code is:

    // declare the folder
    $html = file_get_contents("http://www.mywebsite.com/folderlist/");
    preg_match_all('|<li>(.*)</li>|U', $html, $uu);
    $files = $uu[1];
    print_r($files[1]);

Thanks
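One possible way to get from the captured anchor to a bare name is to chain strip_tags(), trim() and rtrim(). The sketch below is only an illustration: cleanName() is a made-up helper name, and the $files array is a hand-written stand-in for what preg_match_all() would capture from the listing in the first post.

```php
<?php
// Stand-in for $uu[1] from the post above (assumed contents).
$files = [
    '<a href="/"> Parent Directory</a>',
    '<a href="design1/"> design1/</a>',
];

// cleanName() is a hypothetical helper, not from the thread.
function cleanName(string $item): string
{
    // strip_tags() drops the <a> markup, trim() removes the padding,
    // and rtrim() removes the trailing forward slash.
    return rtrim(trim(strip_tags($item)), '/');
}

$names = [];
foreach ($files as $item) {
    $name = cleanName($item);
    // Leave the parent entry out of the dropdown.
    if ($name === 'Parent Directory') {
        continue;
    }
    $names[] = $name;
}

// Escape each name before writing it into the markup.
foreach ($names as $name) {
    $safe = htmlspecialchars($name, ENT_QUOTES);
    echo "<option value=\"{$safe}\">{$safe}</option>\n";
}
```

Looping over the whole array, rather than printing $files[1] directly, means the code keeps working when the remote server has more than one directory.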