[SOLVED] merge files


drifter

Recommended Posts

OK, I have 2 files that are | delimited - one is about 26MB and holds image names, the other is 48MB and holds records.

Each record has a code that corresponds to a line in the image names file (each file is about 40,000 lines, and they do NOT match 1 to 1).

From the records file
data|1234|more|data|other|stuff...........................

from the image file
1234|image|names|here

So I currently start by looping through the image file, exploding each line, and writing it to an image array:
$photoarray[$id]['pic1']=$lineelement[5];
$photoarray[$id]['pic2']=$lineelement[7];

So I get this whole 26 MB file in an array

Then I loop through the records file and match each line with the right element in the array.
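A minimal sketch of the array-based approach described above (the filenames, the position of the id field, and the column indices 5 and 7 are placeholders taken from the snippet - adjust them to the real file layout):

[code]
<?php
// Build the whole image file into an in-memory array keyed by code.
$photoarray = array();
$fh = fopen('images.txt', 'r'); // hypothetical filename
while (($line = fgets($fh)) !== false) {
    $lineelement = explode('|', rtrim($line, "\r\n"));
    $id = $lineelement[0]; // assumed: code is the first field
    $photoarray[$id]['pic1'] = isset($lineelement[5]) ? $lineelement[5] : '';
    $photoarray[$id]['pic2'] = isset($lineelement[7]) ? $lineelement[7] : '';
}
fclose($fh);

// Then each record line is matched against the array by its code.
$record = explode('|', 'data|1234|more|data|other|stuff');
$code = $record[1];
if (isset($photoarray[$code])) {
    // ...merge $record with $photoarray[$code]['pic1'], ['pic2']...
}
[/code]

The memory cost here is the whole 26MB file (plus PHP's per-element array overhead, which is several times the raw data size), which is what causes the problem described next.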

Now the problem - this is killing my memory. The script runs at 50-60% memory, and any bump in traffic makes things back up, compound, and crash.

I am calling unset() on everything that I use - every photo element that has already been matched is unset just to save memory...

So are there any other ways of doing this?

Are there any good ways to scan the image file rather than saving it in an array?

Just as a note: I use this to slow things down when the server gets busy, but sleep() only frees up CPU, not memory.

[code]
$load = sys_getloadavg();
if ($load[0] > 4) {
    echo "Busy server - sleeping 30 seconds<br>";
    sleep(30);
} elseif ($load[0] > 2) {
    echo "Busy server - sleeping 5 seconds<br>";
    sleep(5);
} elseif (time_nanosleep(0, 30000000) === true) {
    // default throttle: 30,000,000 nanoseconds = 0.03 seconds
    echo "Slept for 0.03 seconds.<br>";
}
[/code]

Load one line into an array to search with. Then loop through the entire other file -- overwriting the same array until you find what you're looking for.

Works great if only one line in each file matches one line in the other. If multiple matches are possible, you might have to make the array multi-dimensional, but it should still work.

It'll be slow, though. Want faster? MySQL was built for relational data.
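The streaming approach suggested above can be sketched like this - hold only one record line in memory and rescan the image file for each one (filenames and the code's field position are assumptions):

[code]
<?php
$records = fopen('records.txt', 'r'); // hypothetical filename
while (($recLine = fgets($records)) !== false) {
    $record = explode('|', rtrim($recLine, "\r\n"));
    $code = $record[1]; // assumed: code is the second field

    // Rescan the image file for this one record's code.
    $images = fopen('images.txt', 'r'); // hypothetical filename
    while (($imgLine = fgets($images)) !== false) {
        if (strpos($imgLine, $code . '|') === 0) {
            $image = explode('|', rtrim($imgLine, "\r\n"));
            // ...merge $record and $image here...
            break; // stop early if only one match is possible
        }
    }
    fclose($images);
}
fclose($records);
[/code]

Memory use stays at one line per file regardless of file size; the trade-off is the nested scan discussed below.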

Do you think there are performance problems with looping through a 40,000-line file 40,000 times, though? I really have no idea. Although on average the line I am seeking would be at line 20,000, so I would only be looping through an average of 20,000 lines 40,000 times (not counting records that have no match).

And I do not care about speed - this is a background process - I care about memory and CPU usage

Hey - I switched it to the loop-through-the-file approach and I think I like it. I am amazed how fast it can loop through that many lines and take a substr to find a match.

I see the CPU usage is way up, but my memory is down to about 10%. I have that code in there to make the script sleep whenever there is traffic, so I am not worried so much about the CPU.
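The substr match mentioned here can be done by comparing only the leading code field instead of exploding every candidate line - a sketch, with the filename and code value as placeholders:

[code]
<?php
$code = '1234';
$needle = $code . '|'; // include the delimiter so '123' doesn't match '1234'
$len = strlen($needle);

$fh = fopen('images.txt', 'r'); // hypothetical filename
while (($line = fgets($fh)) !== false) {
    if (substr($line, 0, $len) === $needle) {
        // found the matching image line; explode it only now
        $image = explode('|', rtrim($line, "\r\n"));
        break;
    }
}
fclose($fh);
[/code]

Skipping explode() on non-matching lines is what keeps the scan fast: most lines are rejected after comparing only a few leading bytes.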

Thanks
