delayedinsanity

Members
  • Posts

    25
  • Joined

  • Last visited

    Never

Contact Methods

  • Website URL
    http://mark.watero.us/

Profile Information

  • Gender
    Male
  • Location
    Vegas

delayedinsanity's Achievements

Newbie (1/5)

Reputation: 0

  1. It's much harder on the server, agreed. The worry is having one process running for 30-60 minutes; I think there's more potential for that to hang or hit a snag and spiral out of control eating memory. Then again, with 100 processes being fired off nearly simultaneously, each should be done in a fraction of the time, but there is potential for 100 processes to hang. Augh. Okay. I'm going to test the multiple processes version on a sandbox server to see if I can hang it.
  2. That's what I'm doing; the program doesn't grab the information on the fly. An import is scheduled via cron and fired off once or twice a day depending on the needs of the particular site. That import grabs the data from an XML feed, turns it into a page and then retrieves the images. It's this image retrieval that's causing the script to run for an extremely long period of time. With the image import disabled, it takes on average 30 seconds for the script to import all the data from the remote XML feed. That's about 5-700 pages of information, adding anywhere from 1-18 seconds per page to grab the images. If we average 4 seconds (low-balling) per page due to the image import, we're looking at a half hour execution time on the script. I'd rather not run a single process for that length of time. If I make the script sleep and restart every minute, I'm actually extending that time, but I'm starting a new process every minute so it can't eat a ridiculous amount of memory. If I fire off a separate request for each page to grab the images, the whole process may complete in far less time, but it would require 5-700 PHP scripts being fired in the space of 30 seconds... I'm leaning in this direction because it's not much different than a site that has 1000-1400 visitors per minute, which can easily be handled by a properly configured server. If it's scheduled during the lowest-traffic time of the day, it will have the least competition and there should be no visible effect on regular visitors on the front end. Unless of course there's a better solution! I googled for about an hour on this earlier, and while I'm quite positive I'm not the only person who's had to pull in a large number of remote files, there don't seem to be too many people interested in writing about the methodology they employed to do so.
  3. I'm hoping to get a little feedback on what you all believe is the best way to handle this efficiently in PHP. I am working on a script that imports a large amount of data from remote feeds; this facilitates the quick deployment of real estate web sites, but has to download a large number of images to each new site. Assuming for right now that the bottleneck isn't in the method (fsock vs curl vs...) and that for each imported listing we're spending between .89439 and 17.0601 seconds on the image import process alone... what would you suggest for handling this over the space of 100-1000 occurrences? As of right now I have two ideas in mind, both fairly rudimentary in nature. The first idea is to shut the script down every 30-45 seconds, sleep for a second and fire off another asynchronous request to start the script again. The second idea is to fire off a new asynchronous request to run the image imports separately from the main script. This would let the efficient ones clear out rather quickly while the slower imports would have their own process to run in. The only thing that worries me about this is the fact that 100 of these could be fired off every second. Even assuming half of them complete before the next round is fired off, they would still pile up. (A sketch of both approaches follows this list.)
  4. "It seems like I'm running over the two arrays far too many times, and that I should be able to do it in one pass somehow." It works; There are no errors that I'm trying to work out. I just thought I would pick everybody's brain to see if it was possible to do it more efficiently.
  5. Bump from page 3... I fixed the formatting (tabs vs spaces, oops) and repasted: http://pastebin.com/yD7u8bZ4 I know it's something simple I'm missing. A change in the function itself, somewhere, is going to reduce the number of times I have to run it from 3 to 1.
  6. "Cannot find search daemon". I tried. I've been wracking my brain for the last two hours trying to simplify this. It seems like I'm running over the two arrays far too many times, and that I should be able to do it in one pass somehow. I have two associative arrays, the first contains a set of older options (previous version), the second is a default set of new options. I want to compare the two arrays, and strip out any old options that aren't in the new set. I want to add options from the new set that aren't already in the old, and I don't want to overwrite and of the old options that already have values. Here's a copy of the sandbox I've been working it out in; it works, but it does four separate operations on the arrays to achieve the result: http://pastebin.com/erbFujkG