[SOLVED] file_get_contents does not work with XML files?


I'm trying to write an app that grabs an XML sitemap from a domain, but file_get_contents is coming up empty. This code snippet works when I feed it a URL like http://www.after5webdesign.com, but not when the URL is http://www.timmyth.com.

<?php
if (empty($_GET['subdomain']) || empty($_GET['domain']) || empty($_GET['tld'])) {
    header("HTTP/1.0 404 Not Found");
    die;
} else {
    // ereg_replace() was removed in PHP 7; preg_replace() with delimiters does the same job.
    $subdomain = preg_replace('/[^a-zA-Z0-9-]/', '', $_GET['subdomain']);
    $domain = preg_replace('/[^a-zA-Z0-9-]/', '', $_GET['domain']);
    $tld = preg_replace('/[^a-zA-Z0-9-]/', '', $_GET['tld']);
}

$mySiteMap = "http://".$subdomain.".".$domain.".".$tld."/google-sitemap.cfm";
// $mySiteMap = "http://www.after5webdesign.com";

$contents = file_get_contents($mySiteMap);

echo "<pre>".htmlspecialchars($contents)."</pre>";
?>

The two sites are hosted on different servers. After 5 is hosted with 1&1, and TimMyth.com is hosted at WhyPark. Given that one URL works, I don't think the issue is my code or my WAMP settings. This is all that timmyth.com outputs:

http://www.timmyth.com/google-sitemap.cfm<pre></pre>

Any ideas why one URL works while the other does not?

Which server would be blocking it? I can view both URLs from my PC, which is also where I have WAMP installed and where I'm running this script. If it's the WhyPark server blocking this, is there a way around it? I've tried fopen and curl, but I got the same results.

I'm not sure, but the only headers I'm getting from a raw request are:

HTTP/1.1 200 OK
Connection: close
Date: Fri, 21 Nov 2008 20:34:08 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Set-Cookie: CFID=somenumber;expires=Sun, 14-Nov-2038 20:34:08 GMT;path=/
Set-Cookie: CFTOKEN=somenumber;expires=Sun, 14-Nov-2038 20:34:08 GMT;path=/
Content-Type: text/html; charset=UTF-8

Once I specify a user agent I am able to get the content, so check whether there's a reason your script dies without sending anything when the request has a blank user agent. It's likely a server problem, but I know nothing about IIS, so I can't help you there.

Do be sure to check your script, though: see whether it reads the user agent without first making sure it's set. IIS is sending an "HTTP/1.1 200 OK" header, and I wouldn't expect it to do that if this really were a server setting.

Do this if you really want it to work and you don't care about fixing the problem:

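<?php
// Fetch the page with cURL, sending an explicit (non-empty) User-Agent header.
$h = curl_init();
curl_setopt($h, CURLOPT_URL, "http://www.timmyth.com/");
curl_setopt($h, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
// Without CURLOPT_RETURNTRANSFER, curl_exec() prints the response body directly.
curl_exec($h);
curl_close($h);
?>
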
To be honest, you can use any user agent; even "q" works, as long as it is not blank.

Excellent discovery, guys. Thanks very much. I do not have the authority to monkey with the server settings, so I'll have to specify the user agent, which is actually a better solution for my purposes. Both methods work, by the way, proving that there's more than one way to skin a cat.
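
For the file_get_contents route, a minimal sketch might look like the following, assuming PHP 7 or later with allow_url_fopen enabled. The helper names build_user_agent_context and fetch_with_user_agent and the "MySitemapFetcher/1.0" string are made up for illustration and do not come from the posts above; the 10-second timeout is an arbitrary choice.

```php
<?php
// Hypothetical helper: build a stream context that sends a non-empty User-Agent.
function build_user_agent_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent, // sent as the User-Agent request header
            'timeout'    => 10,         // seconds; arbitrary value for this sketch
        ],
    ]);
}

// Hypothetical helper: fetch a URL using that context.
// Returns the response body as a string, or false if the request fails.
function fetch_with_user_agent(string $url, string $userAgent = 'MySitemapFetcher/1.0')
{
    return file_get_contents($url, false, build_user_agent_context($userAgent));
}
```

Calling fetch_with_user_agent("http://www.timmyth.com/google-sitemap.cfm") in place of the bare file_get_contents() call should then send the user agent along with the request, the same way the cURL version does. Setting the user_agent ini directive with ini_set() before calling file_get_contents() is another way to get the same effect without a context.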
