
Someone please advise/help. Unexplainable behaviour


codecreative


Hi everyone

 

I have had a property blm parser developed. For those who do not know, blm is an xml file type used to share property data.

 

In an ideal world, an estate agent using the software uploads the blm file and imagery to property portals such as rightmove.co.uk and zoopla.com, and also to our web server. My website has a php plugin that parses and imports this data.

 

Once an upload is complete, the files should sit in the "datatobeparsed" folder and wait until a cron scheduled task starts the parser. Once parsing is complete, images are moved to the "images" folder and the blm file is moved to the "archived" folder. Finally, as a cleanup, any remaining files in the datatobeparsed folder (such as pdf/docs) are unlinked. The web hosts I use implement a 30 second rule, where a response must be sent to them at least every 30 seconds to prevent a time out. They do this to prevent any kind of permanent process running, which is fine. No problem, my developer handled all this.

 

Okay, to begin with testing was fine. After an upload I'd see the files and initiate cron, full feedback would arrive via a log sent by email, and all went well.

 

The ftp account used by the client is restricted to their static ip and only has access to the datatobeparsed folder. No other folder.

 

The problem

Okay, so this is hard for me to get my head around. I have raised numerous support tickets with my hosts, who are not sure how this is happening, and my developer says it is not the script. I have now lost contact with my developer. After the client performs an upload, I see no blm file in the datatobeparsed folder, just some images and pdf/doc files. When I browse into the archived folder I see a new blm file, and the timestamp shows it was created at 9am GMT this morning (it is currently 10:42am GMT). My cron doesn't run until 10pm GMT. I rang my client about 10 minutes ago and they confirmed they did an upload this morning. My web hosts have also confirmed no cron is set to run in the morning, just the one I have set via my scheduled tasks tab.

 

Also, when I look at the datatobeparsed folder not all the images are there. Some have been processed and moved to the images folder, as confirmed by the timestamps of the files in the images folder. So to me it looks like the process had begun and then timed out. How can this be intelligent enough to self execute? It appears that as soon as an upload is complete the script is initiated and parses, but not very well, and then times out.

 

As a temporary fix I'm considering inspecting the code and trying to edit it myself so that it doesn't attempt to move the images to the images folder, as this is where most of the time and processing is taken. This won't give me any insight as to why it is doing this, but if it prevents the time outs it would be a temporary fix. The knock-on effect is that the front end of the site will show properties that have no imagery.

 

I feel I have been totally ripped off, because I paid for this work via a freelancer site and now the developer hasn't communicated in a week and I have a very angry customer.

 

I have found that most of the processing goes on in a file called "Adapter" located in the plugin/libaries folder, as this is a wordpress plugin. But the part that cuts and pastes the images to the destination images folder isn't in it.

Link to comment
Share on other sites

your php script that is being executed via the cron job is probably being directly requested by a browser/search-engine-spider/bot-script. your server's web access log should show the who, what, when, and where it is being requested.

 

you would want/need to put such a processing script in a location where it cannot be requested via any external source.
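
If the script cannot be moved outside the web root, a guard inside the script itself is one possible alternative. The following is only a sketch, assuming the host's scheduled task runs the script with the command-line php binary (some shared hosts use a CGI binary instead, in which case the reported SAPI name differs and this check would block the cron job too). The function names is_cli_request and refuse_web_requests are hypothetical.

```php
<?php
// Hypothetical helper: true only when PHP was started from the command
// line (as a cron job normally is), not through the web server.
function is_cli_request(string $sapi = PHP_SAPI): bool
{
    return $sapi === 'cli';
}

// Hypothetical guard: answer any web request with 403 and stop.
function refuse_web_requests(): void
{
    if (!is_cli_request()) {
        http_response_code(403);
        exit('Forbidden');
    }
}

// Call this before any processing starts.
refuse_web_requests();
```

With this in place a browser or bot requesting the script gets a 403 response and nothing is parsed, while the scheduled task carries on as before.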

Link to comment
Share on other sites

Ah. At last this sounds like a viable explanation as to what is going on. This has caused days of problems. Ok, I downloaded the server log and searched for the file cron-index.php and found no results. I'm not sure what else to search for. I tried searching for .blm but that came up with no results either.

 

In my scheduled tasks cron job I have the following line as the run command:

 

/home/sites/mywebsitedomain.co.uk/public_html/cron-index.php

 

I also have access to .htaccess. Would I add a 403 rule for this file in the .htaccess to block any external source?

Link to comment
Share on other sites

I think I would like to use .htaccess to deny all external sources access to the plugin folder. I just inspected the plugin folder for the akismet plugin and sure enough it contains a .htaccess file with the following. I'm guessing something similar is needed for my plugin folder?

Order Deny,Allow
Deny from all

<FilesMatch "^akismet\.(css|js)$">
Allow from all
</FilesMatch>
Link to comment
Share on other sites

Hmm, this makes sense though. I read that wp cron can be initiated from page loads on the site. And to me it looks like the script may be timing out. I think this is due to the overhead of moving images into the subdirectory /images/, because once I manually move the blm file back from the archived folder to the datatobeparsed upload folder, it processes it fine, picking up from where it left off with over half the images already processed.

 

I'm happy for them not to be moved if that stops the time outs. The time outs are occurring because I am on a shared hosting account and not on a vps.

 

I'm not familiar with the anatomy of a wordpress plugin, but I can see that all the code where the values are assigned is in the php file Adapter, which lives at wp-content/plugins/libaries/adapter.php

 

Any ideas where the code for the movement of images would be stored?

Link to comment
Share on other sites

if the script is in the middle of processing the files and it is called again, it could leave some of the files behind/change only some of the database records... also, if the script takes more than a few seconds to run each time, it is doing something wrong.
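
One common way to stop a second call from running while the first is still processing is a lock file. This is a minimal sketch and not the plugin's actual code: the function names and the lock file location are hypothetical, and it assumes the filesystem supports flock (some network filesystems do not).

```php
<?php
/**
 * Try to take an exclusive, non-blocking lock on $lockPath.
 * Returns the open file handle on success, or null if the lock is
 * already held (or the file cannot be opened).
 */
function acquire_parser_lock(string $lockPath)
{
    $handle = fopen($lockPath, 'c'); // 'c': create if missing, never truncate
    if ($handle === false) {
        return null;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    return $handle;
}

/** Release the lock taken by acquire_parser_lock(). */
function release_parser_lock($handle): void
{
    flock($handle, LOCK_UN);
    fclose($handle);
}

// Usage: the lock file location is an assumption for illustration.
$lock = acquire_parser_lock(sys_get_temp_dir() . '/blm-parser.lock');
if ($lock === null) {
    exit("Parser already running, skipping this call.\n");
}
// ... parse the blm file and move the images here ...
release_parser_lock($lock);
```

Because the operating system drops the lock when the process ends, a run that times out should not leave the parser locked out permanently.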

 

short-answer to all of this, without the full code, hard to do more than make guesses.

Link to comment
Share on other sites

No, this script takes more than a few seconds. It's importing 70 plus properties at a time, and each property can typically have 5 images. That's a lot of copying, pasting, assigning values, loops etc.

 

Takes more than a few seconds.

 

Ok, is there a way to stop my script being linked to wp-cron? I'm guessing I can just call the script directly from the scheduled tasks tab in my hosting cPanel.

 

I believe wp-cron is for people who don't have cron?
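
For reference, WordPress has a constant, DISABLE_WP_CRON, that switches off the page-load trigger. This is a sketch only, assuming the parser is hooked into wp-cron (which is not confirmed in this thread) and that a real cron job takes over the scheduling:

```php
// In wp-config.php, above the "That's all, stop editing!" line.
// Stops WordPress from spawning wp-cron.php on page loads; scheduled
// events then only run when wp-cron.php is requested directly,
// for example by a real cron job.
define('DISABLE_WP_CRON', true);
```

Any other scheduled events on the site also stop firing on page loads with this setting, so something still has to call wp-cron.php regularly.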

Link to comment
Share on other sites

Thanks for the replies and thanks ch0cu3r for the excellent article.

 

Mac_gyver, it makes sense to me that the script gets called again before being given a chance to complete. I read there is a locking mechanism for wp cron now, so I'm going to see if setting that helps fix the problem.

 

Out of interest, where is the list of functions that get called by wp cron kept?

Link to comment
Share on other sites

Thanks for all your help Guys

 

Today it's strange: I logged on and saw the images for the lettings properties had all been duplicated. I feel this parser is 95% complete, it just has a few bugs. I hope this isn't against any forum rules, but do any of the gurus on here ever take on freelance work?

Link to comment
Share on other sites

Quick question

 

If wp cron has been set in wp-config to have a lock time out like so:

define('WP_CRON_LOCK_TIMEOUT', 900);

 

What happens if once every 24 hours there is a direct call to the wp-cron script? Will the scheduled task's call to wp-cron still obey the lock time out rule? If not, I will probably just disable this scheduled task call, as I never want a scenario where the script is re-called halfway through a parse.

 

Thanks Guys

Link to comment
Share on other sites
