[SOLVED] file_get_contents does not work with XML files?


Tim_Myth

I'm trying to write an app that grabs an XML sitemap from a domain, but file_get_contents is coming up empty. This code snippet works when I feed it a URL like http://www.after5webdesign.com, but not when the URL is http://www.timmyth.com.

<?php
if (empty($_GET['subdomain']) || empty($_GET['domain']) || empty($_GET['tld'])) {
    header("HTTP/1.0 404 Not Found");
    die;
} else {
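    // Strip everything except letters, digits, and hyphens.
    // preg_replace is used here because ereg_replace was removed in PHP 7.
    $subdomain = preg_replace('/[^a-zA-Z0-9-]/', '', $_GET['subdomain']);
    $domain = preg_replace('/[^a-zA-Z0-9-]/', '', $_GET['domain']);
    $tld = preg_replace('/[^a-zA-Z0-9-]/', '', $_GET['tld']);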
}

$mySiteMap = "http://".$subdomain.".".$domain.".".$tld."/google-sitemap.cfm";
//$mySiteMap = "http://www.after5webdesign.com";

$contents = file_get_contents($mySiteMap);

echo "<pre>".htmlspecialchars($contents)."</pre>";
?>

The two sites are hosted on different servers. After 5 is hosted with 1&1, and TimMyth.com is hosted at WhyPark. Given that one URL works, I don't think the issue is my code or my WAMP settings. This is all that timmyth.com outputs:

http://www.timmyth.com/google-sitemap.cfm<pre></pre>

Any ideas why one URL works while the other does not?

Which server would be blocking it? I can view both URLs from my PC, which is also where I have WAMP installed and where I am running this script. If it's the WhyPark server blocking this, is there a way around it? I've tried fopen and curl as well, but I had the same results.

I'm not sure, but the only headers I'm getting from a raw request are:

HTTP/1.1 200 OK
Connection: close
Date: Fri, 21 Nov 2008 20:34:08 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Set-Cookie: CFID=somenumber;expires=Sun, 14-Nov-2038 20:34:08 GMT;path=/
Set-Cookie: CFTOKEN=somenumber;expires=Sun, 14-Nov-2038 20:34:08 GMT;path=/
Content-Type: text/html; charset=UTF-8

 

Once I specify a user agent, I am able to get the content, so check whether there's a reason your script dies without sending any output when the request has a blank user agent. It's likely a server problem, but I know nothing about IIS, so I can't help you there.

Do be sure to check your script, though: see if it reads the user agent without first making sure it's set. IIS is sending an "HTTP/1.1 200 OK" header, and I wouldn't think it would do that if this really were a server setting.
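
One way to confirm this from the PHP side is to request the page with and without a user agent and compare what comes back. This is only a rough sketch and has not been run against the servers in this thread. The function names (buildHttpOptions, statusCodeFrom, probe) and the 10-second timeout are made up for illustration. It assumes PHP 7.1 or later and that user_agent is not set in php.ini, so passing null should send no User-Agent header at all.

```php
<?php
// Build http:// wrapper options. The user_agent key is only added when a
// value is given, so null means "do not set one here".
function buildHttpOptions(?string $userAgent): array
{
    $http = [
        'method'        => 'GET',
        'timeout'       => 10,   // seconds; arbitrary value for this sketch
        'ignore_errors' => true, // still return the body on 4xx/5xx responses
    ];
    if ($userAgent !== null) {
        $http['user_agent'] = $userAgent;
    }
    return ['http' => $http];
}

// Return the status code from the last "HTTP/x.y NNN" line in a header list
// (the last one is the final response if redirects were followed).
function statusCodeFrom(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Request $url and summarise the result: status code and body length.
// Note that this makes two requests, one for the headers and one for the body.
function probe(string $url, ?string $userAgent): array
{
    $context = stream_context_create(buildHttpOptions($userAgent));
    $headers = @get_headers($url, false, $context);
    $body    = @file_get_contents($url, false, $context);
    return [
        'status' => statusCodeFrom($headers ?: []),
        'length' => $body === false ? null : strlen($body),
    ];
}

// Usage, with the sitemap URL from the first post:
// var_dump(probe('http://www.timmyth.com/google-sitemap.cfm', null));
// var_dump(probe('http://www.timmyth.com/google-sitemap.cfm', 'q'));
```

If the first call reports a 200 status with a length of 0 and the second reports a non-zero length, that would match what the raw request showed.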

 

Do this if you really want it to work and you don't care about fixing the problem:

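<?php
$h = curl_init();
curl_setopt($h, CURLOPT_URL, "http://www.timmyth.com/");
// Send a user agent with the request; any non-empty string will do.
curl_setopt($h, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
// curl_exec() prints the response directly unless CURLOPT_RETURNTRANSFER is set.
curl_exec($h);
curl_close($h);
?>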

 

To be honest, you can use any user agent; even "q" works, as long as it is not blank.

Excellent discovery, guys. Thanks very much. I do not have the authority to monkey with the server settings, so I'll have to specify the user agent, which is actually a better solution for my purposes. Both methods work, by the way, proving that there's more than one way to skin a cat.
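
For anyone who wants to stay with file_get_contents as in the first post, the user agent can be passed through a stream context. A minimal sketch, assuming PHP 7 or later. The helper names and the "SitemapFetcher/1.0" string are placeholders rather than anything from the thread, and the timeout is an arbitrary choice.

```php
<?php
// Build a stream context that sends a User-Agent header with the request.
// "SitemapFetcher/1.0" is a made-up placeholder; any non-empty string should do.
function buildUserAgentContext(string $userAgent = 'SitemapFetcher/1.0')
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'timeout'    => 10, // seconds; arbitrary value for this sketch
        ],
    ]);
}

// Fetch a URL using the context above. Returns the body, or false on failure.
function fetchWithUserAgent(string $url, string $userAgent = 'SitemapFetcher/1.0')
{
    return file_get_contents($url, false, buildUserAgentContext($userAgent));
}

// Usage, with the sitemap URL from the first post:
// $contents = fetchWithUserAgent('http://www.timmyth.com/google-sitemap.cfm');
// echo "<pre>" . htmlspecialchars($contents) . "</pre>";
```

Calling ini_set('user_agent', '...') before file_get_contents should have a similar effect for the whole script, since the http wrapper falls back to that setting when the context does not supply one.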
