sharpxs Posted August 28, 2012

Hey there,

I am about to publish my PHP website and just had a friend run a crawler over my staging server. The crawler returned many odd links in which folders were duplicated multiple times. My folder structure looks like this:

~/index.htm
~/service/<several service-related html files>
~/img/*.png
~/hardware/<several hardware-related html files>

The crawler returned hundreds of links (as both source and destination URL) that look like this:

http://www.mydomain.com/service/service/img/img/img/img/hardware/file.htm
http://www.mydomain.com/service/service/service/img/img/impressum.htm
http://www.mydomain.com/service/service/service/img/img/img/img/img/img/logos_small/logo.png

(The crawler reports that this last link, an image file, refers to facebook.com. None of the logos I have linked are supposed to refer to Facebook.)

Oddly, these links all work (the crawler reports status code 200)! I can paste them into my browser's address bar and the file actually shows up; it just doesn't load the CSS. Does anybody have an idea what might cause this odd behaviour? I have never seen this before :-/

Thanks in advance.

Cheers,
Lars
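One pattern that produces exactly this kind of URL is relative links resolved against an ever-deeper base URL. A minimal sketch in Python, offered as a guess rather than a diagnosis of this site: it assumes a page contains a relative href such as `service/support.htm` (no leading slash) and that the server answers every nested URL with the same page and a 200. The page name and href are hypothetical. `urllib.parse.urljoin` applies the standard relative-URL resolution rules, so each hop adds one more `service/` segment:

```python
from urllib.parse import urljoin


def resolve_chain(start_url: str, href: str, depth: int) -> list[str]:
    """Follow the same relative href `depth` times, resolving it against
    the URL reached on the previous step, the way a crawler would if every
    nested URL kept answering 200 with the same page."""
    urls = []
    current = start_url
    for _ in range(depth):
        current = urljoin(current, href)  # standard relative-URL resolution
        urls.append(current)
    return urls


start = "http://www.mydomain.com/service/support.htm"  # hypothetical page
for url in resolve_chain(start, "service/support.htm", 3):
    print(url)
# http://www.mydomain.com/service/service/support.htm
# http://www.mydomain.com/service/service/service/support.htm
# http://www.mydomain.com/service/service/service/service/support.htm

# A root-relative href (leading slash) does not compound:
print(urljoin(start, "/service/support.htm"))
# http://www.mydomain.com/service/support.htm
```

If that is what is happening, a stylesheet referenced with a relative path would also resolve to a folder that doesn't exist under the nested URL, which would fit the missing CSS.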
requinix Posted August 28, 2012

What framework does the site run on? Did you make it yourself? Can you find out where the spider is getting those URLs from (i.e., the referring page)?
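Finding the referring page can also be done with a small script. A minimal sketch in Python, using only the standard library's `html.parser` and `urllib.parse`: give it one page's URL and HTML, and it lists the links on that page whose resolved URL contains the same folder twice in a row (such as `service/service/`). The page URL and HTML in the example are hypothetical; running it over each staging page would point to the page that emits the first doubled link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href/src attribute values from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def has_repeated_folder(url: str) -> bool:
    """True if the URL path has the same segment twice in a row, e.g. /img/img/."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    return any(a == b for a, b in zip(parts, parts[1:]))


def suspicious_links(page_url: str, html: str) -> list[tuple[str, str]]:
    """Return (raw href, resolved URL) pairs from one page whose resolved
    URL contains a repeated folder segment."""
    collector = LinkCollector()
    collector.feed(html)
    result = []
    for raw in collector.links:
        resolved = urljoin(page_url, raw)
        if has_repeated_folder(resolved):
            result.append((raw, resolved))
    return result


# Hypothetical page content: one relative link missing its leading slash.
html = ('<a href="service/support.htm">Support</a>'
        '<img src="/img/logo.png">'
        '<link rel="stylesheet" href="../css/site.css">')
print(suspicious_links("http://www.mydomain.com/service/index.htm", html))
# [('service/support.htm', 'http://www.mydomain.com/service/service/support.htm')]
```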
sharpxs Posted August 28, 2012 (Author)

Hey there,

I built it myself, no framework used. The spider lists those odd links as both source and destination. To date, the site isn't linked from anywhere; they're all internal references. Interestingly, I just ran a spider myself (probably a different one from the one used before) and it found no odd links :-/

Let me check what software my friend was using.

Cheers,
Lars
requinix Posted August 28, 2012

If it's your framework... well, even if it isn't... step through it in a debugger as it serves one of those URLs. Figure out why it treats them as valid and fix that.
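One way to see why the server treats those URLs as valid is to compare them against the files that actually exist. A minimal sketch in Python, assuming the site is served as plain files from a document root (the `docroot` location and the sample file are hypothetical): a doubled URL that does not map to a real file, yet still comes back with a 200, is being answered by something other than a plain file lookup, for example a rewrite rule or a custom error handler, and that is where to start stepping through.

```python
import tempfile
from pathlib import Path
from urllib.parse import unquote, urlparse


def maps_to_real_file(url: str, docroot: str) -> bool:
    """Return True if the URL's path names an existing file under docroot.

    Paths that would escape docroot (e.g. via ..) count as not found.
    """
    root = Path(docroot).resolve()
    relative = unquote(urlparse(url).path).lstrip("/")
    candidate = (root / relative).resolve()
    return candidate.is_relative_to(root) and candidate.is_file()


# Demo on a throwaway copy of part of the folder layout from the question.
with tempfile.TemporaryDirectory() as docroot:
    service = Path(docroot) / "service"
    service.mkdir()
    (service / "support.htm").write_text("<html></html>")  # hypothetical page

    real = maps_to_real_file("http://www.mydomain.com/service/support.htm", docroot)
    doubled = maps_to_real_file(
        "http://www.mydomain.com/service/service/support.htm", docroot
    )

print(real, doubled)  # True False
```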
sharpxs Posted August 28, 2012 (Author)

Not sure I follow what you mean. The strange thing is that my debugger doesn't find any issues. Also, I ran the spider against my localhost (an exact copy of the website) and it found no such issues at all. Totally weird.

It seems this link checker (it's called Xenu) gets into some kind of loop. I am redesigning an existing website; it's currently published to staging, and most of the existing news articles (replicated to staging) contain references to the live site (the one in the old design). Somehow the spider gets to the live site and comes back to staging, which doesn't make any sense, because staging isn't linked from live :(

I'm still investigating and running tests in isolation. I'll let you know when I've figured out what's going on...
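To test the loop theory in isolation, a first step could be to list which links in a replicated news article leave the staging host. A minimal sketch in Python; `staging.mydomain.com` and the sample article are hypothetical placeholders for the real staging host and content. Every link it reports is a way for the spider to wander from staging onto the live site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class HrefCollector(HTMLParser):
    """Collect href/src attribute values from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.hrefs.append(value)


def off_site_links(page_url: str, html: str) -> list[str]:
    """Return resolved links whose host differs from the page's own host.

    Links without a host (e.g. mailto:) are ignored.
    """
    own_host = urlparse(page_url).hostname
    collector = HrefCollector()
    collector.feed(html)
    resolved = (urljoin(page_url, h) for h in collector.hrefs)
    return [u for u in resolved if urlparse(u).hostname not in (own_host, None)]


# Hypothetical replicated article: one absolute link to live, one relative link.
article = ('<a href="http://www.mydomain.com/service/support.htm">old</a>'
           '<a href="../service/support.htm">new</a>'
           '<a href="mailto:info@mydomain.com">mail</a>')
print(off_site_links("http://staging.mydomain.com/news/article.htm", article))
# ['http://www.mydomain.com/service/support.htm']
```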
Christian F. Posted August 28, 2012

Sounds like the issue is more with your production server than with your code, especially since you cannot replicate it locally. I'd have a look at the server if I were you; chances are you'll find the reason for this issue there.