useroo Posted July 14, 2018

Hi everyone, it's been many years since I last worked with PHP scripts, so I'm a bit out of the loop and could use some help. I have an old directory script, and the developer's website no longer exists, so I can't ask them anymore.

The script parses a URL form input, I believe with this function:

```php
function ParseURL($url) {
    $url = trim($url);
    //if (strpos($url, '.')<1) { return false; }
    // check if empty
    $len = strlen($url);
    if ($len<3) { return false; }
    if (strcmp("http://", substr($url, 0, 7)) !== 0) {
        $url = "http://" . $url;
    }
    $url_stuff = parse_url($url);
    if (!isset($url_stuff["path"])) {
        $url = $url . "/";
    }
    return $url;
}
```

This is in a PHP file inside a folder called "php4_classes" and once more in a folder called "php5_classes". I believe I can just add https:// somewhere in this part:

```php
if (strcmp("http://", substr($url, 0, 7)) !== 0) {
    $url = "http://" . $url;
}
```

But how do I do that correctly?

In addition, in the same file, http:// is mentioned once more within "function GetHTML":

```php
if (strpos($nueva,"http://")=== FALSE) {
    $nueva = "http://" . $urlc . $nueva;
}
```

and a bit further down, in "function GetAllHTML", I find:

```php
$header = "GET " . $url_stuff['path'] . "?" . $url_stuff['query'] ;
$header = $header . " HTTP/1.1\r\nHost: " . $url_stuff['host'] . "\r\n\r\n";
```

I probably don't need to change the last one to be able to add https websites to the directory, but the second one may need the same https handling. I hope someone knows the solution, which is probably simple.
useroo (Author) Posted July 14, 2018

Hmm, I just learned something new about the problem above. The script sections above automatically add http:// in front of any submitted URL. When I try to enter https:// manually, it instantly turns into http://https://, which makes no sense and cannot be overridden. So maybe I should just find a way to stop the script from adding http:// at all, so the URL can be either http or https, right?

In the script pieces above, what would I have to change to stop the automatic addition without causing PHP errors?
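The doubled prefix described above can be reproduced in isolation. The following is a minimal sketch that copies only the prefix check from the posted ParseURL; the function name `addHttpPrefixOriginal` and the example.com URLs are made up for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper: contains only the prefix check from the posted ParseURL().
function addHttpPrefixOriginal($url)
{
    // The first 7 characters of "https://example.com/" are "https:/",
    // which is not equal to "http://", so the prefix is added anyway.
    if (strcmp("http://", substr($url, 0, 7)) !== 0) {
        $url = "http://" . $url;
    }
    return $url;
}

echo addHttpPrefixOriginal("https://example.com/"), "\n"; // http://https://example.com/
echo addHttpPrefixOriginal("http://example.com/"), "\n";  // http://example.com/
```

So the check only recognizes http://, and anything else, including https://, is treated as a URL with no scheme.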
requinix Posted July 15, 2018

```php
if (strcmp("http://", substr($url, 0, 7)) !== 0) {
    $url = "http://" . $url;
}
```

Look up strcmp and substr and you should be able to tell what this code is doing (if it wasn't already apparent).

Since this code fixes URLs, if you want it to support HTTP and HTTPS then you need to add another condition in there that makes sure it only fixes URLs that have neither http:// nor https://.

```php
if (strcmp("http://", substr($url, 0, 7)) !== 0 && strcmp("https://", substr($url, 0, 8)) !== 0) {
    $url = "http://" . $url;
}
```

Except there's a better way of doing that, one that also allows for URLs that aren't in lowercase (though they probably are): strncasecmp.

```php
if (strncasecmp("http://", $url, 7) !== 0 && strncasecmp("https://", $url, 8) !== 0) {
```

Now try fixing GetHTML yourself. As for the other one, what you posted doesn't involve "http://" anywhere so it doesn't have to be fixed, but it's very likely that whole thing needs to be pulled out and replaced anyways. What's the rest of it?
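For reference, here is a rough sketch of how the strncasecmp condition might look once it is dropped into the whole function. It is written as a standalone function named `parseUrlWithHttps` (a hypothetical name) rather than as the class method in the original script, and it otherwise keeps the behavior of the posted ParseURL.

```php
<?php
// Sketch only: standalone variant of the posted ParseURL() with the
// scheme check widened to accept https:// as well as http://.
function parseUrlWithHttps($url)
{
    $url = trim($url);

    // Reject inputs that are too short to be a URL (same rule as the original).
    if (strlen($url) < 3) {
        return false;
    }

    // Prepend http:// only when neither scheme is already present.
    // strncasecmp compares case-insensitively, so "HTTPS://" is accepted too.
    if (strncasecmp("http://", $url, 7) !== 0
        && strncasecmp("https://", $url, 8) !== 0) {
        $url = "http://" . $url;
    }

    // Append a trailing slash when the URL has no path component.
    $parts = parse_url($url);
    if (!isset($parts["path"])) {
        $url .= "/";
    }

    return $url;
}

echo parseUrlWithHttps("https://example.com"), "\n"; // https://example.com/
echo parseUrlWithHttps("example.com"), "\n";         // http://example.com/
```

A URL typed without any scheme still defaults to http://, which matches what the original code did.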
useroo (Author) Posted July 20, 2018

Oh, I did not get a notification of this answer and only found it now, even though notifications were switched on (green).

The entire script is very long, too long to post here, and .php files can't be attached, so I'm just posting the last section, which deals with entering and processing the URL submission text field (before it come the usual other text fields, like description, keywords, email address, etc.):

```php
function GetHTML ($url, $corto = true, $complet = true) {
    $url_stuff = parse_url($url);
    $urlc = $url;
    $url = $url_stuff['host'];
    if (!isset($url_stuff["port"])) {
        $port = 80;
    } else {
        $port = $url_stuff['port'];
    }
    if (!isset($url_stuff['path'])) {
        $url_stuff['path'] = "/";
    }
    $path = $url_stuff['path'];
    $urlc = $url_stuff['host'] .$url_stuff['path'] ;
    $fp = @fsockopen ($url, $port, $errno, $errstr, 20);
    if (!$fp) {
        return FALSE;
    } else {
        stream_set_timeout($fp, 2);
        // $header = "GET " . $url_stuff['path'] . "?" . $url_stuff['query'] ;
        $header = "GET " . $url_stuff["path"] ;
        $header = $header . " HTTP/1.1\r\nHost: " . $url_stuff['host'] . "\r\n\r\n";
        fputs ($fp, $header);
        $header = '';
        $body = '';
        $act = false;
        $fin = false;
        while ((!feof($fp))) {
            $line = fread ($fp,8048);
            $header .= $line;
            $body = $body . $line;
            if ($fin) { break; }
            if ($corto) {
                $fin = true;
                $i = strpos($line, "<body");
                if ($i>0) { $fin = true; }
            }
        }
        $ret = strpos($header, "Location:", 0);
        if ($ret !== false) {
            $fin = strpos($header, "\r\n", $ret +9);
            if(!$fin) { $fin = strpos($header, "\r\n", $ret +9); }
            $nueva = substr($header, $ret+9, $fin - $ret - 9);
            $nueva = trim($nueva);
            if (strpos($nueva,"http://")=== FALSE) {
                $nueva = "http://" . $urlc . $nueva;
            }
            $body = $this->GetHTML($nueva, $corto, $complet);
        }
        fclose ($fp);
    }
    return $body;
}

function GetAllHTML($url) {
    $url_stuff = parse_url($url);
    $urlc = $url;
    $url = $url_stuff['host'];
    if (!isset($url_stuff["port"])) $port = 80;
    else $port = $url_stuff["port"];
    if (!isset($url_stuff["path"])) {
        $url_stuff["path"] = "/";
    }
    if (!isset($url_stuff['query'])) {
        $url_stuff['query'] = '';
    }
    $path = $url_stuff["path"];
    $urlc = $url_stuff['host'] .$url_stuff["path"] ;
    $fp = @fsockopen ($url, $port, $errno, $errstr, 10);
    if (!$fp) {
        return FALSE;
    } else {
        $header = "GET " . $url_stuff['path'] . "?" . $url_stuff['query'] ;
        $header = $header . " HTTP/1.1\r\nHost: " . $url_stuff['host'] . "\r\n\r\n";
        fputs ($fp, $header);
        $body = '';
        while ( !feof($fp) ) {
            $line = fread ($fp, 4096);
            $body .= $line;
            if (strpos($body, '</html')) {break;}
        }
        fclose ($fp);
    }
    return $body;
}
}
?>
```

So I think pre-filling the URL form field with either http:// or https:// is not the way to go. It would be better NOT to pre-fill it with anything, so the submitter can just copy and paste the complete URL, whether it is http or https, right? How could I do this without breaking the script?
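One possible shape for the Location check in GetHTML, using the same strncasecmp idea, is sketched below. The helper name `resolveLocation` and its parameters are hypothetical and do not exist in the script; the fallback branch simply keeps what the posted code does when the header value has no scheme.

```php
<?php
// Hypothetical helper (not in the original script): decide which URL to
// follow for a Location header value, accepting http:// and https://.
function resolveLocation($location, $hostAndPath)
{
    $location = trim($location);

    // Already absolute with either scheme: follow it unchanged.
    if (strncasecmp("http://", $location, 7) === 0
        || strncasecmp("https://", $location, 8) === 0) {
        return $location;
    }

    // Otherwise keep the original script's fallback of prefixing
    // http:// plus the current host and path.
    return "http://" . $hostAndPath . $location;
}

echo resolveLocation("https://example.com/new", "example.com/"), "\n"; // https://example.com/new
echo resolveLocation("page.html", "example.com/"), "\n";               // http://example.com/page.html
```

Note that GetHTML as posted still opens its connection with fsockopen on port 80 unless the URL names a port, so this sketch only changes how the redirect target is built, not how it is fetched.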
requinix Posted July 20, 2018

Where are all the places in your application that use GetHTML or GetAllHTML, and what is the code?
useroo (Author) Posted July 23, 2018

They are inside a web directory script (free version 3.0 of qlWeb Directory Script, which is no longer available online), in the folders named php4_classes and php5_classes. The file itself is called sites.class, and it is the only one of the directory script's PHP files that contains this code.

This is the complete code pertaining to this issue, located at the very end of the two files (one for each PHP version):

```php
function GetHTML ($url, $corto = true, $complet = true) {
    $url_stuff = parse_url($url);
    $urlc = $url;
    $url = $url_stuff['host'];
    if (!isset($url_stuff["port"])) {
        $port = 80;
    } else {
        $port = $url_stuff['port'];
    }
    if (!isset($url_stuff['path'])) {
        $url_stuff['path'] = "/";
    }
    $path = $url_stuff['path'];
    $urlc = $url_stuff['host'] .$url_stuff['path'] ;
    $fp = @fsockopen ($url, $port, $errno, $errstr, 20);
    if (!$fp) {
        return FALSE;
    } else {
        stream_set_timeout($fp, 2);
        // $header = "GET " . $url_stuff['path'] . "?" . $url_stuff['query'] ;
        $header = "GET " . $url_stuff["path"] ;
        $header = $header . " HTTP/1.1\r\nHost: " . $url_stuff['host'] . "\r\n\r\n";
        fputs ($fp, $header);
        $header = '';
        $body = '';
        $act = false;
        $fin = false;
        while ((!feof($fp))) {
            $line = fread ($fp,8048);
            $header .= $line;
            $body = $body . $line;
            if ($fin) { break; }
            if ($corto) {
                $fin = true;
                $i = strpos($line, "<body");
                if ($i>0) { $fin = true; }
            }
        }
        $ret = strpos($header, "Location:", 0);
        if ($ret !== false) {
            $fin = strpos($header, "\r\n", $ret +9);
            if(!$fin) { $fin = strpos($header, "\r\n", $ret +9); }
            $nueva = substr($header, $ret+9, $fin - $ret - 9);
            $nueva = trim($nueva);
            if (strpos($nueva,"http://")=== FALSE) {
                $nueva = "http://" . $urlc . $nueva;
            }
            $body = $this->GetHTML($nueva, $corto, $complet);
        }
        fclose ($fp);
    }
    return $body;
}

function GetAllHTML($url) {
    $url_stuff = parse_url($url);
    $urlc = $url;
    $url = $url_stuff['host'];
    if (!isset($url_stuff["port"])) $port = 80;
    else $port = $url_stuff["port"];
    if (!isset($url_stuff["path"])) {
        $url_stuff["path"] = "/";
    }
    if (!isset($url_stuff['query'])) {
        $url_stuff['query'] = '';
    }
    $path = $url_stuff["path"];
    $urlc = $url_stuff['host'] .$url_stuff["path"] ;
    $fp = @fsockopen ($url, $port, $errno, $errstr, 10);
    if (!$fp) {
        return FALSE;
    } else {
        $header = "GET " . $url_stuff['path'] . "?" . $url_stuff['query'] ;
        $header = $header . " HTTP/1.1\r\nHost: " . $url_stuff['host'] . "\r\n\r\n";
        fputs ($fp, $header);
        $body = '';
        while ( !feof($fp) ) {
            $line = fread ($fp, 4096);
            $body .= $line;
            if (strpos($body, '</html')) {break;}
        }
        fclose ($fp);
    }
    return $body;
}
}
```
requinix Posted July 24, 2018

Okay, but how are those functions being used? Because I'm probably going to tell you to get rid of them entirely and replace them with calls to file_get_contents.
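To give an idea of what a file_get_contents replacement could look like, here is a minimal sketch. The function name `fetchHtml`, the 20-second default timeout, and the redirect limit are assumptions chosen for illustration, not values confirmed anywhere in this thread. It relies on PHP's URL wrappers, so allow_url_fopen has to be enabled, and https:// URLs additionally need the openssl extension.

```php
<?php
// Hypothetical replacement for GetHTML()/GetAllHTML(); a sketch, not the
// script's actual code. Returns the response body, or false on failure,
// which mirrors the FALSE returned by the original functions.
function fetchHtml($url, $timeoutSeconds = 20)
{
    // The "http" context options are used for both http:// and https:// URLs.
    $context = stream_context_create(array(
        "http" => array(
            "timeout"         => $timeoutSeconds, // read timeout in seconds
            "follow_location" => 1,               // follow Location redirects
            "max_redirects"   => 5,               // assumed limit for this sketch
        ),
    ));

    // @ suppresses the warning on failure, as the original did with fsockopen.
    return @file_get_contents($url, false, $context);
}
```

With something like this, the manual Location handling and the hard-coded port 80 would no longer be needed, since the stream wrapper takes care of redirects and of the scheme. Whether it can be dropped in directly depends on how the rest of the script calls GetHTML and GetAllHTML, which is exactly what is being asked above.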