codecreative Posted September 28, 2013

Hi everyone,

I have had a property BLM parser developed. For those who do not know, BLM is an XML file type used to share property data. In an ideal world, an estate agent's software uploads the BLM file and imagery to property portals such as rightmove.co.uk and zoopla.com, and also to our web server. My website has a PHP plugin that parses and imports this data.

Once an upload is complete, the files should sit in the "datatobeparsed" folder and wait until a cron scheduled task starts the parser. Once parsing is complete, images are moved to the "images" folder and the BLM file is moved to the "archived" folder. Finally, as a cleanup, any remaining files in the datatobeparsed folder, such as PDFs and docs, are unlinked.

The web hosts I use implement a 30-second rule: a response must be sent to them at least every 30 seconds to prevent a timeout. They do this to prevent any kind of permanent process, which is fine, and my developer handled it.

To begin with, testing was fine. After an upload I'd see the files and initiate cron, full feedback via a log sent in an email would be received, and all went well. The FTP account used by the client is restricted to their static IP and has access only to the datatobeparsed folder, no other folder.

The problem

This is hard for me to get my head around. I have raised numerous support tickets with my hosts, who are not sure how this is happening, and my developer says it is not the script. I have now lost contact with my developer.

After the client performs an upload, I see no BLM file in the datatobeparsed folder, just some images and PDF/doc files. When I browse to the archived folder I see a new BLM file, and its timestamp shows it was created at 9am GMT this morning (it is currently 10:42am GMT). My cron doesn't run until 10pm GMT. I rang my client about 10 minutes ago and they confirmed they did an upload this morning. My web hosts have also confirmed that no cron is set to run in the morning, just the one I have in my scheduled tasks tab.

Also, not all the images are in the datatobeparsed folder. Some have been processed and moved to the images folder, as confirmed by the timestamps of the files there. So to me it looks like the process had begun and then timed out.

How can this be intelligent enough to self-execute? It appears that as soon as an upload is complete the script is initiated and starts parsing, but not very well, and then times out.

As a temporary fix I'm considering editing the code myself so that it doesn't attempt to move the images to the images folder, as this is where most of the time and processing goes. This won't give me any insight into why it is happening, but if it prevents the timeouts it would be a temporary fix.

The knock-on effect is that the front end of the site shows properties with no imagery. I feel totally ripped off, because I paid for this work via a freelancer site, the developer hasn't communicated in a week, and I have a very angry customer.

I have found that most of the processing goes on in a file called "Adapter" located in the plugin/libaries folder, as this is a WordPress plugin. But the part that cuts and pastes the images to the destination images folder isn't in it.
mac_gyver Posted September 28, 2013

Your PHP script that is executed via the cron job is probably being requested directly by a browser, search engine spider, or bot script. Your server's web access log should show who is requesting it, what is being requested, when, and from where. You need to put such a processing script in a location where it cannot be requested by any external source.
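If the script cannot be moved out of public_html, one way to follow this suggestion is to have the script refuse web requests itself. This is a minimal sketch, assuming the host's scheduler runs cron-index.php with the command-line PHP binary; some hosts use php-cgi instead, where PHP_SAPI reports a different value, so check before relying on it. The function name is hypothetical and not part of the plugin.

```php
<?php
// Hypothetical helper: true when the script was started from the command line
// (for example by a cron job) rather than through the web server.
function started_from_command_line(string $sapi = PHP_SAPI): bool
{
    return $sapi === 'cli';
}

// Placed at the very top of cron-index.php, this turns away any browser or bot
// with a 403 before the import code is reached.
if (!started_from_command_line()) {
    http_response_code(403);
    exit('Forbidden');
}
```

This only blocks direct HTTP requests. It does nothing if other site code includes the file during a page load.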
Irate Posted September 28, 2013

You could also return a 403 Forbidden error for that file from your .htaccess or httpd.conf, if you have access to those.
codecreative Posted September 29, 2013

Ah. At last, this sounds like a viable explanation of what is going on. This has caused days of problems.

I downloaded the server log and searched for the file cron-index.php, but found no results. I'm not sure what else to search for. I tried searching for .blm, but that came up with no results either.

In my scheduled tasks cron job, the run command is the following line:

    /home/sites/mywebsitedomain.co.uk/public_html/cron-index.php

I also have access to .htaccess. Would I add a 403 rule to this file to block any external source?
codecreative Posted September 29, 2013

I think I would like to use .htaccess to deny all external sources access to the plugin folder. I just inspected the folder for the Akismet plugin and, sure enough, it contains an .htaccess file with the following. I'm guessing something similar is needed for my plugin folder?

    Order Deny,Allow
    Deny from all
    <FilesMatch "^akismet\.(css|js)$">
    Allow from all
    </FilesMatch>
codecreative Posted September 29, 2013

As a test I have uploaded all the files to the directory myself, just to rule out the possibility that this has to do with the client's software. I can't see how it could, though: their FTP account's home directory is the upload directory, so they should have no access to any other file or folder.
codecreative Posted September 29, 2013

Thanks for all the help, by the way. It means a lot to me to solve this puzzle.
mac_gyver Posted September 29, 2013

Some portion of the web site code may be including/requiring the cron-index.php file. Perhaps whoever coded this decided to run the cron-index.php processing any time a web page is requested and there are pending FTP files to be processed.
codecreative Posted September 29, 2013

Hmm, I'm just wondering. However the script is called, the expected outcome should be the same in any case, shouldn't it?
codecreative Posted September 29, 2013

Hmm, this makes sense though. I read that WP-Cron can be initiated from page loads on the site, and to me it looks like the script may be timing out. I think this is due to the overhead of moving images into the /images/ subdirectory. I believe this because once I manually move the BLM file back from the archived folder to the datatobeparsed upload folder, it processes fine and picks up from where it left off, with over half the images already processed.

I'm happy for the images not to be moved if that stops the timeouts. The timeouts occur because I am on a shared hosting account and not on a VPS.

I'm not familiar with the anatomy of a WordPress plugin, but I can see that all the code assigning the values is in the PHP file Adapter, which lives at wp-content/plugins/libaries/adapter.php. Any ideas where the code that moves the images would be stored?
mac_gyver Posted September 29, 2013

If the script is in the middle of processing the files and it is called again, it could leave some of the files behind or change only some of the database records. Also, if the script takes more than a few seconds to run each time, it is doing something wrong.

Short answer to all of this: without the full code, it is hard to do more than make guesses.
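A common guard against the overlapping runs described here is an exclusive lock file taken at the start of the import. The following is only a sketch of that general technique using PHP's flock(); it is not taken from the plugin, and the function names and lock file path are hypothetical.

```php
<?php
// Hypothetical helper: try to take an exclusive, non-blocking lock.
// Returns the open file handle on success, or null if the lock file cannot be
// opened or another run already holds the lock.
function acquire_import_lock(string $lockFile)
{
    $handle = fopen($lockFile, 'c'); // 'c' creates the file without truncating it
    if ($handle === false) {
        return null;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    return $handle;
}

// Hypothetical helper: release the lock and close the handle.
function release_import_lock($handle): void
{
    flock($handle, LOCK_UN);
    fclose($handle);
}
```

At the top of the import script, call acquire_import_lock() with a path outside the upload folder and exit early if it returns null, then call release_import_lock() once the import has finished. The operating system drops the lock when the process ends, so a run that times out should not leave a stale lock behind.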
codecreative Posted September 29, 2013

No, this script takes more than a few seconds. It imports 70-plus properties at a time, and each property typically has 5 images. That's a lot of copying, pasting, assigning values, loops and so on.

Is there a way to unlink my script from WP-Cron? I'm guessing I can just call the script directly from the scheduled tasks tab in my hosting control panel. I believe WP-Cron is for people who don't have cron?
codecreative Posted September 29, 2013

Or is it possible to stop my website initiating WP-Cron calls? A simple option, so that only my scheduled task calls it.
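WordPress has a documented constant for this, DISABLE_WP_CRON. Setting it in wp-config.php stops WordPress from spawning wp-cron.php on ordinary page loads, so scheduled events run only when something else requests wp-cron.php, such as a hosting scheduled task. A minimal sketch, assuming the import really is hooked into WP-Cron, which this thread has not confirmed:

```php
<?php
// In wp-config.php, above the "That's all, stop editing" line.
// Stops WordPress from triggering wp-cron.php during normal page requests.
if (!defined('DISABLE_WP_CRON')) {
    define('DISABLE_WP_CRON', true);
}
```

Other plugins that rely on WP-Cron would then also depend on the scheduled task calling wp-cron.php.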
Ch0cu3r Posted September 29, 2013

Not sure if this article will help you: http://tommcfarlin.com/wordpress-cron-jobs/
Irate Posted September 29, 2013

To get back to my suggestion, Stack Overflow has this small but handy topic about setting 403 errors for certain directories: http://stackoverflow.com/questions/11321998/htaccess-403-forbidden-exclusions
mac_gyver Posted September 29, 2013

@Irate, the OP checked the web server access log and it looks like the file isn't being requested directly.
codecreative Posted September 30, 2013

Thanks for the replies, and thanks Ch0cu3r for the excellent article.

mac_gyver, it makes sense to me that the script gets called again before it has had a chance to complete. I read that there is now a locking mechanism for WP-Cron, so I'm going to see if setting that helps fix the problem.

Out of interest, where is the list of functions that get called by WP-Cron?
Ch0cu3r Posted September 30, 2013

codecreative wrote: "Out of interest where are the list of functions that get called by wp cron listed?"

http://codex.wordpress.org/Category:WP-Cron_Functions
codecreative Posted October 2, 2013

Thanks for all your help, guys.

Strangely, when I logged on today I saw that the images for the lettings properties had all been duplicated. I feel this parser is 95% complete and just has a few bugs.

I hope this isn't against any forum rules, but do any of the gurus on here ever take on freelance work?
codecreative Posted October 2, 2013

Quick question. Suppose WP-Cron has been given a lock timeout in wp-config like so:

    define('WP_CRON_LOCK_TIMEOUT', 900);

What happens if, once every 24 hours, there is a direct call to the wp-cron script? Will the scheduled task's call to wp-cron still obey the lock timeout? If not, I will probably just disable the scheduled task, as I never want the script to be called again halfway through a parse.

Thanks, guys.