The Letter E Posted January 31, 2011

I'm working on a fix for cURL that replaces all relative URLs with absolute ones. Here's the code:

<?php
//Get web address
//FORMAT: url=site.com, url=site.net, url=site.org
$page = $_GET['url'];

//Format web address
$http = 'http:\/\/';
$www = 'www\.';
//preg_replace returns the new string, so the result is assigned back to $page
if(preg_match('/'.$http.'/', $page)){$page = preg_replace('/'.$http.'/', '', $page);}
if(preg_match('/'.$www.'/', $page)){$page = preg_replace('/'.$www.'/', '', $page);}
$page = rtrim($page, '/');
$page = 'http://www.'.$page.'/';

//cURL
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, $page);
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);

//Convert relative URL to absolute
$output = preg_replace('/src="/', 'src="'.$page, $output);
$output = preg_replace('/href="/', 'href="'.$page, $output);
$output = preg_replace('/action="/', 'action="'.$page, $output);

echo $output;
?>

As you can see it's pretty basic. In many cases it fixes broken styles, links, images and form actions. I am looking for any ideas as to how I can add some more intelligence to this script.

1. What else should it do?
2. Where is it not doing its job?
3. Can it do what it's already doing better?

Any input offered is much appreciated. I'm not looking for someone to write code, but if you are intrigued and want to add a snippet to it, that's cool! Feel free to keep a copy of your own if you like the idea to build off of.

Thanks Peeps,
E
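One place the replacement step above falls short is that it prefixes every src, href and action value, including ones that are already absolute (http://..., //cdn..., #anchor, mailto:). Below is a minimal sketch of a more selective version. The helper name `absolutize_attributes()` is hypothetical, and the sketch assumes, as the original script does, that every relative path can be resolved against the site root.

```php
<?php
// Hypothetical helper, not part of the original script.
// Prefixes only relative src/href/action values with $base.
function absolutize_attributes(string $html, string $base): string
{
    // Make sure the base ends in exactly one slash.
    $base = rtrim($base, '/') . '/';

    return preg_replace_callback(
        '/\b(src|href|action)="([^"]*)"/i',
        function (array $m) use ($base): string {
            $value = $m[2];
            // Leave empty values, URLs with a scheme (http:, mailto:, data:),
            // protocol-relative URLs (//host/...) and anchors (#top) untouched.
            if ($value === '' || preg_match('/^(?:[a-z][a-z0-9+.\-]*:|\/\/|#)/i', $value)) {
                return $m[0];
            }
            // Everything else is treated as relative to the site root.
            return $m[1] . '="' . $base . ltrim($value, '/') . '"';
        },
        $html
    );
}
```

Calling `absolutize_attributes($output, $page)` would stand in for the three preg_replace lines. Document-relative paths on pages below the root (for example `../img/a.png` on `/blog/post.html`) would need the page's own directory as the base, which this sketch does not handle.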
QuickOldCar Posted February 1, 2011

You can check whether the URL starts with a known protocol (http://, https://, ftp://, feed://) by comparing a substr of the right length, and add http:// to the front if it doesn't.

With or without the www. won't matter much, as curl will resolve those to where they need to be. Some sites only work one way or the other, but there isn't much you can do about that.

After curl resolves it, use PHP's parse_url function and lowercase just the domain name part. I also lowercase the protocol, since who knows why people like to capitalize this stuff.

Some sites require a trailing slash while others can't have one. I've found it's best not to add the trailing slash; sites that normally use one can usually resolve with or without it.

So after all my babble, here's the code I use to fix links for inclusion on my site. I make everything http instead of https or ftp, because I chop off the protocol when I post them anyway, and when someone clicks the URL in a browser they end up where they need to be. Maybe you can use some of my code. You can add the port number back into the complete URL as well; I eliminate any ports, as you can see in my code, because I just didn't want those.
$trimurl = trim($_GET['domainname'] ?? '');
$trimurl = substr($trimurl, 0, 300);

if ($trimurl == "" OR $trimurl == "http://") {
    echo "<h1>Please Insert a Url</h1><br />";
} else {
    // mysql_real_escape_string() was removed in PHP 7; escape when building the query instead
    $input_parse_url = $trimurl;

    /*check for valid urls*/
    if ((substr($input_parse_url, 0, 8) == "https://")
        OR (substr($input_parse_url, 0, 12) == "https://www.")
        OR (substr($input_parse_url, 0, 7) == "http://")
        OR (substr($input_parse_url, 0, 11) == "http://www.")
        OR (substr($input_parse_url, 0, 4) == "www.")
        OR (substr($input_parse_url, 0, 6) == "ftp://")
        OR (substr($input_parse_url, 0, 11) == "feed://www.")
        OR (substr($input_parse_url, 0, 7) == "feed://")) {
        $new_parse_url = $input_parse_url;
    } else {
        /*replace uppercase or unsupported to normal*/
        // longest prefixes first, otherwise 'HTTP://' is removed before 'HTTP://www.' can match
        $url_input = str_replace(array('feed://www.','HTTPS://www.','HTTPS://WWW.','https://WWW.','HTTP://www.','HTTP://WWW.','http://WWW.','feed://','HTTPS://','HTTP://'), '', $input_parse_url);
        $new_parse_url = "http://$url_input";
    }

    echo htmlspecialchars($input_parse_url) . "<br />";

    /*parse the complete url to lowercase just the site domain area*/
    function getparsedHost($new_parse_url) {
        $parsedUrl = parse_url(trim($new_parse_url));
        if (!empty($parsedUrl['host'])) {
            return trim($parsedUrl['host']);
        }
        // no scheme given: the host ends up in the path, so take its first segment
        $segments = explode('/', $parsedUrl['path'] ?? '', 2);
        return trim($segments[0]);
    }

    $get_parse_url = parse_url($new_parse_url, PHP_URL_HOST);
    $host_parse_url = str_replace(array('Www.','WWW.'), '', (string) $get_parse_url);
    $host_parse_url = strtolower($host_parse_url);
    $port_parse_url = parse_url($new_parse_url, PHP_URL_PORT);
    $user_parse_url = parse_url($new_parse_url, PHP_URL_USER);
    $pass_parse_url = parse_url($new_parse_url, PHP_URL_PASS);
    $get_path_parse_url = parse_url($new_parse_url, PHP_URL_PATH);
    $path_parse_url = str_replace(array('Www.','WWW.'), '', (string) $get_path_parse_url);
    $query_add_parse_url = parse_url($new_parse_url, PHP_URL_QUERY);
    $query_add_parse_url = "?$query_add_parse_url";
    $query_add_parse_url = rtrim($query_add_parse_url, '#');
    $fragment_parse_url = parse_url($new_parse_url, PHP_URL_FRAGMENT);
    $fragment_parse_url = "#$fragment_parse_url";

    $hostpath_url = "$host_parse_url$path_parse_url";
    $hostpath_url = rtrim($hostpath_url, '?');
    $query_add_parse_url = rtrim($query_add_parse_url, '?');
    $hostpathquery_url = "$host_parse_url$path_parse_url$query_add_parse_url";
    $complete_url = "$host_parse_url$user_parse_url$pass_parse_url$path_parse_url$query_add_parse_url$fragment_parse_url";
    $complete_url = rtrim($complete_url, '#');
} // closes the else opened after the empty-input check
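The steps described in the reply above (add a scheme when none is given, lowercase only the scheme and host, leave the path and trailing slash alone) can also be sketched more compactly around a single parse_url() call. This is an illustration under assumptions, not QuickOldCar's code: the function name `normalize_url()` is hypothetical, and unlike the reply it keeps the original scheme and port, leaves www. in place, and drops any user:password part.

```php
<?php
// Hypothetical helper illustrating the normalization steps from the reply.
function normalize_url(string $input): ?string
{
    $url = trim($input);
    if ($url === '') {
        return null;
    }
    // Add a default scheme when the input has none (case-insensitive check).
    if (!preg_match('/^(?:https?|ftp|feed):\/\//i', $url)) {
        $url = 'http://' . $url;
    }
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])) {
        return null; // not something parse_url() can make sense of
    }
    // Scheme and host are case-insensitive, so lowercase only those two.
    $out = strtolower($parts['scheme']) . '://' . strtolower($parts['host']);
    if (isset($parts['port'])) {
        $out .= ':' . $parts['port'];
    }
    // Path, query and fragment are kept exactly as given (no slash added).
    $out .= $parts['path'] ?? '';
    if (isset($parts['query'])) {
        $out .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $out .= '#' . $parts['fragment'];
    }
    return $out;
}
```

Building the result only from the components that are actually present avoids the later rtrim() calls for stray `?` and `#` characters.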