
Adding https parsing to an old URL submission form PHP script that only handles http?


useroo


Hi everyone, it's been many years since I poked around in PHP scripts, so I am a bit out of the loop and could use some help.
 
I have an old directory script; the developer's website no longer exists, so I can't ask them anymore.
 
The script parses a URL form input, I believe with this function:
"    function ParseURL($url) {
      $url = trim($url);
      
      //if (strpos($url, '.')<1) { return false;  }
      
      // check if empty 
      $len = strlen($url);
      if ($len<3) { return false;  }
      
      if (strcmp("http://", substr($url, 0, 7)) !== 0) {
          $url = "http://" . $url;
      }
  
      $url_stuff = parse_url($url);
      if (!isset($url_stuff["path"])) {  $url = $url . "/";  }
 
      return $url;
    }"
 
This is inside a PHP file in a folder called "php4_classes", and once more in a folder called "php5_classes".
 
I believe I can just add https://
somewhere within here
"      if (strcmp("http://", substr($url, 0, 7)) !== 0) {
          $url = "http://" . $url;
      }"
But how do I do that correctly?
 
In addition, in the same file, I find http:// mentioned once more within
"    function GetHTML"
as
         if (strpos($nueva,"http://")=== FALSE) {
            $nueva = "http://" . $urlc . $nueva;
         }
and a bit further down, in
"function GetAllHTML"
I find
"        $header = "GET " . $url_stuff['path'] . "?" . $url_stuff['query'] ; 
         $header = $header . " HTTP/1.1\r\nHost: " . $url_stuff['host'] . "\r\n\r\n"; "
 
The last one probably does not need any change for https websites to be added to the directory, but the second one above may need the same https recognition added?
I hope someone knows the answer to this probably simple problem.

Hmm, I just learned something new about the above problem.

The script sections above auto-add http:// in front of any URL submitted.

When I try to manually enter an https URL, it instantly turns into http://https://

which makes no sense at all and cannot be overridden, so maybe I should just find a way to stop the script from auto-adding http:// at all, so the URL can be either http or https, right?


In the script pieces above, what would I have to change to stop the pesky auto-addition without causing PHP errors?

 


if (strcmp("http://", substr($url, 0, 7)) !== 0) {
      $url = "http://" . $url;
}

Look up strcmp and substr and you should be able to tell what this code is doing (if it wasn't already apparent).
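
To make that concrete, here is a minimal sketch that isolates the check in a standalone helper. The name originalPrefixCheck is made up for this illustration and is not part of the script. It shows why an https URL ends up with a doubled prefix: the first seven characters of "https://..." are "https:/", which is not "http://", so the prefix gets added.

```php
<?php
// Hypothetical helper (not in the original script): the prefix check from
// ParseURL on its own, so its effect on different inputs is easy to see.
function originalPrefixCheck($url) {
    // substr($url, 0, 7) takes the first seven characters;
    // strcmp() returns 0 only when both strings are identical.
    if (strcmp("http://", substr($url, 0, 7)) !== 0) {
        $url = "http://" . $url;
    }
    return $url;
}

echo originalPrefixCheck("example.com"), "\n";         // http://example.com
echo originalPrefixCheck("http://example.com"), "\n";  // http://example.com
echo originalPrefixCheck("https://example.com"), "\n"; // http://https://example.com
```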

Since this code fixes URLs, if you want it to support HTTP and HTTPS then you need to add another condition in there that makes sure it only fixes URLs that don't have http:// and don't have https://.

if (strcmp("http://", substr($url, 0, 7)) !== 0 && strcmp("https://", substr($url, 0, 8)) !== 0) {
      $url = "http://" . $url;
}

Except there's a better way of doing that, one that also allows for URLs that aren't in lowercase (though they probably are): strncasecmp.

if (strncasecmp("http://", $url, 7) !== 0 && strncasecmp("https://", $url, 8) !== 0) {
      $url = "http://" . $url;
}
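
Put together, the whole function might look like the sketch below. It is written as a standalone function so it can be run on its own; in the script it is a method inside the class in sites.class, and everything apart from the changed condition is copied from the original post.

```php
<?php
// Sketch of ParseURL with the two-scheme check applied. Standalone here
// for illustration; in the script it is a class method.
function ParseURL($url) {
    $url = trim($url);

    // Reject empty or very short input, as the original does.
    if (strlen($url) < 3) { return false; }

    // Prepend http:// only when neither scheme is already present
    // (the comparison is case-insensitive).
    if (strncasecmp("http://", $url, 7) !== 0 && strncasecmp("https://", $url, 8) !== 0) {
        $url = "http://" . $url;
    }

    // Append a trailing slash when the URL has no path component.
    $url_stuff = parse_url($url);
    if (!isset($url_stuff["path"])) { $url = $url . "/"; }

    return $url;
}
```

With this version a pasted https URL is left alone, while a bare domain still gets http:// added as before.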

 

Now try fixing GetHTML yourself.
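
For comparison, one possible shape for that fix is sketched below. The helper name resolveRedirect and its parameters are assumptions made for this example; the script itself does the check inline on $nueva. The idea is the same as in ParseURL: accept a Location value that already starts with either scheme, and only build a URL from the current host when it does not.

```php
<?php
// Hypothetical helper: turn a Location header value into an absolute URL.
// $scheme, $host and $path describe the URL that was just requested.
function resolveRedirect($location, $scheme, $host, $path) {
    $location = trim($location);

    // Already absolute with either scheme: use it unchanged.
    if (strncasecmp("http://", $location, 7) === 0 || strncasecmp("https://", $location, 8) === 0) {
        return $location;
    }

    // Protocol-relative ("//host/path"): reuse the current scheme.
    if (substr($location, 0, 2) === "//") {
        return $scheme . ":" . $location;
    }

    // Absolute path: keep scheme and host, replace the path.
    if (substr($location, 0, 1) === "/") {
        return $scheme . "://" . $host . $location;
    }

    // Relative path: resolve against the directory of the current path.
    $pos = strrpos($path, "/");
    $dir = ($pos === false) ? "/" : substr($path, 0, $pos + 1);
    return $scheme . "://" . $host . $dir . $location;
}
```

Inside GetHTML the result would then replace the value passed to the recursive $this->GetHTML(...) call. Note that following a redirect to an https URL only helps if the function can also open an https connection.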

As for the other one, what you posted doesn't involve "http://" anywhere so it doesn't have to be fixed, but it's very likely that whole thing needs to be pulled out and replaced anyways. What's the rest of it?
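
One reason a replacement is likely to be needed: a plain socket on port 80 cannot talk to an https site. As a rough sketch of the smallest change, PHP's fsockopen() accepts an ssl:// prefix on the host name when the OpenSSL extension is loaded, and https normally uses port 443. The helper name socketTarget below is made up for this example. A higher-level HTTP client such as PHP's cURL extension would be another route.

```php
<?php
// Hypothetical helper: work out what to pass to fsockopen() for a URL.
// Returns array(target, port), or false if the URL has no usable host.
function socketTarget($url) {
    $parts = parse_url($url);
    if ($parts === false || !isset($parts["host"])) { return false; }

    $scheme = isset($parts["scheme"]) ? strtolower($parts["scheme"]) : "http";
    $https  = ($scheme === "https");

    // An explicit port in the URL wins; otherwise 443 for https, 80 for http.
    $port = isset($parts["port"]) ? $parts["port"] : ($https ? 443 : 80);

    // The ssl:// prefix asks PHP to wrap the socket in TLS.
    $target = ($https ? "ssl://" : "") . $parts["host"];

    return array($target, $port);
}

// Rough usage, with $url being the full URL passed into the function:
// list($target, $port) = socketTarget($url);
// $fp = @fsockopen($target, $port, $errno, $errstr, 20);
```

The Host header sent afterwards should still contain the bare host name, without the ssl:// prefix.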


Oh, I did not get a notification of this answer and only found it now.

The notifications were, however, on (green).

 

The entire script is very long, too long to post here, and .php files cannot be attached, so I am just posting the last section, which deals with entering and processing a URL submission text field (before it come the usual other text fields, like description, keywords, email address etc.)

"

   function GetHTML ($url, $corto = true, $complet = true) { 

      $url_stuff = parse_url($url); 
      $urlc = $url;
      $url =  $url_stuff['host'];
    
      if (!isset($url_stuff["port"])) {
        $port = 80; 
      } else {
       $port = $url_stuff['port']; 
      }
    
      if (!isset($url_stuff['path'])) {
           $url_stuff['path'] = "/";  
      }
    
      $path = $url_stuff['path'];
    
      $urlc = $url_stuff['host'] .$url_stuff['path'] ;
      
      $fp = @fsockopen ($url, $port, $errno, $errstr, 20);

      if (!$fp) { 
            return FALSE;
      } else { 
        
       stream_set_timeout($fp, 2);    
       //  $header = "GET " . $url_stuff['path'] . "?" . $url_stuff['query'] ; 
       $header = "GET " . $url_stuff["path"] ; 
       $header = $header . " HTTP/1.1\r\nHost: " . $url_stuff['host'] . "\r\n\r\n"; 
       
       fputs ($fp, $header); 
    
       $header = ''; 
       $body = ''; 
       $act = false; 
       $fin = false; 

       while ((!feof($fp))) { 

           $line = fread ($fp,8048);        
           $header .= $line; 

           $body = $body . $line;

           if ($fin) { break; }
            
            if ($corto) { 

                $fin = true; 
                $i = strpos($line, "<body");
                if ($i>0) {
                    $fin = true;
                }
            } 
    
 

       } 
    
       $ret = strpos($header, "Location:", 0); 
    
       if ($ret !== false) { 
         $fin = strpos($header, "\r\n", $ret +9); 
         if(!$fin) {
             $fin = strpos($header, "\r\n", $ret +9); 
         }
         
         $nueva = substr($header, $ret+9, $fin - $ret - 9); 
         $nueva = trim($nueva);
    
         if (strpos($nueva,"http://")=== FALSE) {
            $nueva = "http://" . $urlc . $nueva;
         }
         $body = $this->GetHTML($nueva, $corto, $complet); 
       } 
    
       fclose ($fp); 
     } 
     return $body; 
    } 
    
    function GetAllHTML($url) { 
    
        $url_stuff = parse_url($url); 
        $urlc = $url;
        $url =  $url_stuff['host'];
      
        if (!isset($url_stuff["port"]))
          $port = 80;
        else 
         $port = $url_stuff["port"];
      
        if (!isset($url_stuff["path"])) {   $url_stuff["path"] = "/"; }  
        if (!isset($url_stuff['query'])) { $url_stuff['query'] = ''; }
      
        $path = $url_stuff["path"];
      
        $urlc = $url_stuff['host'] .$url_stuff["path"] ;
        
        $fp = @fsockopen ($url, $port, $errno, $errstr, 10);
        
        if (!$fp) { 
              return FALSE;
        } else { 
   
         $header = "GET " . $url_stuff['path'] . "?" . $url_stuff['query'] ; 
         $header = $header . " HTTP/1.1\r\nHost: " . $url_stuff['host'] . "\r\n\r\n"; 
         
         fputs ($fp, $header); 
 
         $body = ''; 
 
         while ( !feof($fp) ) {

          $line = fread ($fp, 4096);
          $body .= $line;
          if (strpos($body, '</html')) {break;}
          
         }        
         
         fclose ($fp); 
       } 
      
       return $body; 
    }     
    
  }
  
  

  
?>

"

So, I think populating the URL form field with either http:// or https:// is not the way to go; it would be better NOT to populate it with anything, so the submitter can just copy and paste the complete URL, whether http or https, right?

How could I do this without breaking the script?
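
One way this could work, sketched under the assumption that ParseURL is the only place the prefix gets added on the server side: never prepend anything, and simply accept or reject what was typed. The function name CheckSubmittedURL is made up for this example. If the form field itself is pre-filled with http://, that default would be set in the form's HTML, which is not part of the code shown here.

```php
<?php
// Hypothetical alternative to ParseURL: no scheme is ever prepended.
// Returns the cleaned URL, or false if it is not a full http/https URL.
function CheckSubmittedURL($url) {
    $url = trim($url);

    // Require both a scheme and a host.
    $parts = parse_url($url);
    if ($parts === false || !isset($parts["scheme"]) || !isset($parts["host"])) {
        return false;
    }

    // Accept http and https only (case-insensitive).
    $scheme = strtolower($parts["scheme"]);
    if ($scheme !== "http" && $scheme !== "https") {
        return false;
    }

    // Add a trailing slash only when the URL is just scheme and host.
    if (!isset($parts["path"]) && !isset($parts["query"]) && !isset($parts["fragment"])) {
        $url = $url . "/";
    }

    return $url;
}
```

The trade-off is that a submitter who types only a bare domain gets rejected instead of silently corrected, so the form would need to say that the full URL is required.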


This is in a web directory script (free version 3.0 of qlWeb Directory Script, which is no longer available online), in the folders named

php4_classes

and

php5_classes

The file itself (the code appears only in this file, out of all the PHP scripts in the directory script) is called

sites.class

The code pertaining to this issue sits at the very end of the two files (one for each PHP version). It is the same GetHTML and GetAllHTML code posted above.

 
