Application Cache and Concurrency


448191

I have a generic file system cache. It currently uses what I call a 'register', which is an object that keeps track of files, the type of content and lifetimes. In short, it manages the files' metadata that isn't available through the filesystem, and also provides an interface for metadata that is.

 

In its current state it works perfectly for components that don't deal with concurrency, such as sessions and transactions. During the request, the register is kept in memory, saved to disk at the end, and restored at the start of the next request. No problems, and relatively efficient.

 

Just to give you an idea of how the register currently works, here's the top of the class:

 

class Backbone_Cache_FileSystem_Register extends Backbone_Tools_MagicOverloader implements Countable, Iterator, ArrayAccess {

private $files = array();

public function __construct($path){
	// Set the subject first: restore() reads the file through it via getContents().
	Backbone_Tools_MagicOverloader::setSubject(new Backbone_FileSystem_Info($path));
	if(file_exists($path)){
		$this->restore();
	}
}
public function restore(){
	if(false === ($this->files = unserialize($this->getContents()))){
		throw new Backbone_Cache_Exception('Unable to restore cache register.');
	}
}
public function __destruct(){
	return $this->putContents(serialize($this->files), LOCK_EX);
}
public function count(){
	return count($this->files);
}

 

The files prop in the register is an array of file info objects, snippet from that class:

 

class Backbone_Cache_FileSystem_Info extends Backbone_FileSystem_Info {

private $lifetime;
private $id;
private $serialized = false;
private $vLock = LOCK_UN;
protected $fileClass = 'Backbone_Cache_FileSystem_File';

....
public function getVLock(){
	return $this->vLock;
}
public function setVLock($type = LOCK_EX){
	$this->vLock = $type;
}
public function setVUnlock(){
	$this->vLock = LOCK_UN;
}
public function isVLocked(){
	return $this->getVLock() != LOCK_UN;
}
public function putContents($data, $flags = LOCK_EX){
	if($this->isVLocked()){
		return false;
	}
	if(!is_scalar($data)){
		$data = serialize($data);
		$this->serialized = true;
	}
	if(!parent::putContents($data, $flags)){
		throw new Backbone_Cache_Exception('Error writing data to cache file.');
	}
	return true;
}

public function getContents(){
	if($this->getVLock() == LOCK_EX){
		return null;
	}
	if($this->isSerialized()){
		return unserialize(parent::getContents());
	}
	return parent::getContents();
}
}

 

The request method in the cache class looks like this:

 

    public function request($id, $attempts = null, $usleep = null, $waitForUnlock = true){
    	if(!isset($this->register[$id])){
    		return null;
    	}
    	if(!$this->register[$id]->isLive()){
    		// Expired entry: optionally garbage collect it at runtime.
    		if($this->runtimeGC){
    			$this->register[$id]->unlink();
    			unset($this->register[$id]);
    		}
    		return null;
    	}
    	$buf = $this->register[$id]->getContents();
    	if($buf === null && $waitForUnlock){
    		$attempts = $attempts ? $attempts : $this->lockAttempts;
    		$usleep = $usleep ? $usleep : $this->lockAttemptInterval;
    		// The first attempt was made above; sleep before each retry.
    		while($attempts > 1 && $buf === null){
    			usleep($usleep);
    			$buf = $this->register[$id]->getContents();
    			--$attempts;
    		}
    	}
    	// Still null if the entry stayed locked through all attempts.
    	return $buf;
    }

 

I haven't tested the vLock mechanism yet, but it should work as expected. Except... While the register is in memory, other users of the system may access locked files, simply because the register file hasn't been stored yet.

 

So, I could rewrite the register to store separate files, and not keep anything in memory. The top of the register would then look something like this:

 

class Backbone_Cache_FileSystem_Register extends Backbone_Tools_MagicOverloader implements Countable, Iterator, ArrayAccess {

public function __construct($path){
	Backbone_Tools_MagicOverloader::setSubject(new Backbone_FileSystem_Info($path));
}
public function addFileInfo($id){
	$fileInfo = new Backbone_Cache_FileSystem_Info($id);
	$fileInfo->putContents($fileInfo);
}
public function getFileInfo($id){
	$fileInfo = new Backbone_Cache_FileSystem_Info($id);
	// Read from the info file itself and keep the result separate from the handle.
	if(null === ($restored = $fileInfo->getContents())){
		throw new Backbone_Cache_Exception('Unable to restore file info.');
	}
	return $restored;
}

 

<note>For cache info files, id === basename</note>

 

Or, keep the info in memory, but immediately write any changes. I could write all the file info objects to a single file, like it's set up now, or make a unit of work of sorts and have the client (the cache object) or the subject (the info object) register any changes, so they can be immediately updated.
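
To make the 'keep it in memory, but write changes immediately' option concrete, here is a minimal sketch. The class WriteThroughRegister and its methods are hypothetical and not part of the Backbone code above; it only assumes the register is a serialized array in a single file.

```php
<?php
// Hypothetical write-through register; names are illustrative, not Backbone's.
class WriteThroughRegister
{
    private $path;
    private $files = array();

    public function __construct($path)
    {
        $this->path = $path;
        if (is_file($path)) {
            $data = unserialize(file_get_contents($path));
            if (is_array($data)) {
                $this->files = $data;
            }
        }
    }

    // Change one entry and persist the whole register right away.
    public function set($id, array $meta)
    {
        $this->files[$id] = $meta;
        $this->flush();
    }

    public function get($id)
    {
        return isset($this->files[$id]) ? $this->files[$id] : null;
    }

    private function flush()
    {
        // LOCK_EX guards the write itself, not the whole read-modify-write cycle.
        if (false === file_put_contents($this->path, serialize($this->files), LOCK_EX)) {
            throw new RuntimeException('Unable to write register.');
        }
    }
}
```

A second request that constructs the register after set() has returned sees the new vLock value. Two requests that both loaded the register before either wrote can still overwrite each other's changes, so this narrows the window rather than closing it.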

 

This means a big increase in file access though, and I can't help but feel I'm missing something... Any thoughts?


It seems to me the best option you have available is to immediately write (what exactly to write is an important question) when a change of status between locked and unlocked occurs.  This would simulate having two scripts running and working off the same piece of memory.

 

However, isn't there a better way to do this?  For instance, could you move it to a database table, which might be faster for the writes if you already have the connection open (not actually certain that it would be faster)?  Another benefit would be that you would no longer need to read the entire cache into memory to use it.  Certainly I/O speed is an important concern, but so is memory consumption, and if you're reading in an entire cache file on each request only to use one or two elements then you're squandering memory.  In the long run that will cost you as you'll be forced to use virtual memory, which is of course dramatically slower.  It also seems like by using a database you can write a procedure which will return the desired row and switch the locking bit on at the same time, which means you can go directly from a hash key to the desired data without a costly read/parse/find step.  (This might be a perfect problem for SQLite to solve.)
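
A rough sketch of the 'return the row and flip the locking bit in one step' idea, assuming PDO with the pdo_sqlite extension. SQLite has no stored procedures, so this uses a conditional UPDATE inside a transaction instead. The table layout and the function names createCacheTable() and fetchAndLock() are made up for illustration.

```php
<?php
// Hypothetical schema and helpers; assumes the pdo_sqlite extension is loaded.
function createCacheTable(PDO $db)
{
    $db->exec('CREATE TABLE IF NOT EXISTS cache (
        id TEXT PRIMARY KEY,
        data BLOB,
        locked INTEGER NOT NULL DEFAULT 0
    )');
}

// Returns the row's data and marks it locked, or null if missing or already locked.
function fetchAndLock(PDO $db, $id)
{
    $db->beginTransaction();
    try {
        // The conditional UPDATE only affects a row that is currently unlocked.
        $update = $db->prepare('UPDATE cache SET locked = 1 WHERE id = ? AND locked = 0');
        $update->execute(array($id));
        if ($update->rowCount() !== 1) {
            $db->rollBack();
            return null;
        }
        $select = $db->prepare('SELECT data FROM cache WHERE id = ?');
        $select->execute(array($id));
        $data = $select->fetchColumn();
        $db->commit();
        return $data;
    } catch (Exception $e) {
        if ($db->inTransaction()) {
            $db->rollBack();
        }
        throw $e;
    }
}
```

Whether this beats plain files would have to be measured; the point is only that the lookup and the lock flip happen in one transaction.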

 

If you're really hell bent on using files I would at least implement something to emulate a database.  For example, you could have a table stored in the first line of the file which maps line numbers to caches.  Then instead of reading in the entire file on each request you could instead just read in the first line, and on any cache requests you would simply randomly access the file using the table as a guide.  The table could also store the locking bit, which would reduce the burden of recording a status change, as you could simply write the first line of the file.
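
Something like the following is what I have in mind, as a sketch only. The JSON index on the first line, the byte offsets (instead of line numbers) and the function names writeIndexedCache() and readIndexedItem() are all assumptions for illustration.

```php
<?php
// Hypothetical single-file cache: line 1 is a JSON index, the rest is payload.
function writeIndexedCache($path, array $items)
{
    $index = array();
    $body = '';
    foreach ($items as $id => $data) {
        // Offsets are relative to the start of the body, right after the index line.
        $index[$id] = array(
            'offset' => strlen($body),
            'length' => strlen($data),
            'locked' => false,
        );
        $body .= $data;
    }
    return false !== file_put_contents($path, json_encode($index) . "\n" . $body, LOCK_EX);
}

// Reads only the index line plus the requested item, never the whole file.
function readIndexedItem($path, $id)
{
    $handle = fopen($path, 'rb');
    if (!$handle) {
        return null;
    }
    $index = json_decode(fgets($handle), true);
    $result = null;
    if (isset($index[$id]) && !$index[$id]['locked']) {
        // After fgets(), ftell() points at the start of the body.
        fseek($handle, ftell($handle) + $index[$id]['offset']);
        $length = $index[$id]['length'];
        $result = $length > 0 ? fread($handle, $length) : '';
    }
    fclose($handle);
    return $result;
}
```

Rewriting just the first line to flip a locking bit only works if the line keeps the same length, so in practice the flag would need a fixed-width encoding.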

 

I'm just thinking out loud. Let's continue this discussion if you have any questions or things to add.


However isn't there a better way to do this?  For instance, could you move it to a database table, which might be faster for the writes if you already have the connection open (not actually certain that it would be faster)?

 

A component using a DB would be a separate subpackage of the cache package. I doubt it would really be faster either. In fact, given that it would require at the very least DB abstraction (without any mapping, just serialized LOB) it would probably be slower. If applying some ORM (instead of serialized LOB) and adding a little water, you're left with a (basic) persistence layer instead of a caching mechanism (of course caching is about persistence).

 

Another benefit would be that you would no longer need to read the entire cache into memory to use it.  Certainly I/O speed is an important concern, but so is memory consumption, and if you're reading in an entire cache file on each request only to use one or two elements then you're squandering memory.  In the long run that will cost you as you'll be forced to use virtual memory, which is of course dramatically slower.

 

Currently it doesn't require loading the whole cache into memory. A cache is represented by a directory on disk, separate items by files. One can add child directories as subcaches. One can make it as fine grained as one likes, making it a trade off between file access frequency and throughput/memory usage.

 

For example, given something like session management, I could have the whole cache be the session registry, and the files the individual session data. Or, I could make a parent cache, and have each set of session data represented by a sub cache, and individual large objects (and arrays of smaller objects) represented by files. Whatever turns out to perform best in the specific application of the component, that's why it's supposed to be generic. In time I might add in a component that aids in making this decision, based on memory usage.

 

If you're really hell bent on using files I would at least implement something to emulate a database.  For example, you could have a table stored in the first line of the file which maps line numbers to caches.  Then instead of reading in the entire file on each request you could instead just read in the first line, and on any cache requests you would simply randomly access the file using the table as a guide.  The table could also store the locking bit, which would reduce the burden of recording a status change, as you could simply write the first line of the file

 

In a way, it's already very much like a database. There is data, metadata and a mechanism to manipulate both. But, you might be right about one thing: it would probably be more efficient to store only the actual metadata (lifetime, vLocking flag and whatever needs to be added in the future) instead of serializing an 'entire' TO. Which has me thinking about the performance of unserialization vs. construction...
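
A quick way to compare the two would be a micro-benchmark along these lines. InfoSketch is a hypothetical stand-in for the info object, not the real Backbone class, and the closures need PHP 5.3 or later. I'm not quoting any timings, since they depend on the PHP version and the real object's size.

```php
<?php
// Hypothetical stand-in for the cache info object; not the Backbone class.
class InfoSketch
{
    public $lifetime;
    public $vLock;

    public function __construct($lifetime, $vLock)
    {
        $this->lifetime = $lifetime;
        $this->vLock = $vLock;
    }
}

// Time $iterations calls of a callable, in seconds.
function timeIt($callable, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($callable);
    }
    return microtime(true) - $start;
}

$wholeObject = serialize(new InfoSketch(60, LOCK_UN));
$metaOnly = serialize(array('lifetime' => 60, 'vLock' => LOCK_UN));

// Option 1: restore the entire serialized object.
$restoreObject = function () use ($wholeObject) {
    return unserialize($wholeObject);
};
// Option 2: restore only the metadata and construct the object from it.
$constructFromMeta = function () use ($metaOnly) {
    $meta = unserialize($metaOnly);
    return new InfoSketch($meta['lifetime'], $meta['vLock']);
};

printf("unserialize object:  %.4fs\n", timeIt($restoreObject, 100000));
printf("construct from meta: %.4fs\n", timeIt($constructFromMeta, 100000));
```

Both paths end up with an equivalent object; the metadata-only string is also a little smaller on disk because it doesn't carry the class name.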

 

Back to the core of the topic, I think I would prefer locking (a regular lock using flock()) a meta file during the whole time it is in memory. Currently it is only locked when writing using file_put_contents(). This last should be the faster, less memory intensive option, but as said also the more burdensome on the file system. Locking the file in use reduces the number of writes per request to 1.

 

But now that I think about it, I think I'll have to live with the cost of immediate writes. Otherwise, imagine what happens if a script decides to hang. The file would be unavailable to anyone else until max_execution_time.
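
For the record, a minimal sketch of what an immediate write under a short-lived lock could look like. updateRegister() is a hypothetical helper, not part of the existing classes; it assumes the register is a serialized array in one file, and the 'c+' mode needs PHP 5.2.6 or later.

```php
<?php
// Hypothetical helper: read, change and rewrite the register under a brief flock().
function updateRegister($path, $callback, $attempts = 5, $usleep = 10000)
{
    $handle = fopen($path, 'c+b'); // create if missing, never truncate on open
    if (!$handle) {
        return false;
    }
    // LOCK_NB makes flock() return immediately instead of blocking.
    while (!flock($handle, LOCK_EX | LOCK_NB)) {
        if (--$attempts < 1) {
            fclose($handle);
            return false;
        }
        usleep($usleep);
    }
    $raw = stream_get_contents($handle);
    $files = ($raw === '' || $raw === false) ? array() : unserialize($raw);
    if (!is_array($files)) {
        $files = array();
    }
    $files = call_user_func($callback, $files);
    // Rewrite the file in place while still holding the lock.
    ftruncate($handle, 0);
    rewind($handle);
    fwrite($handle, serialize($files));
    fflush($handle);
    flock($handle, LOCK_UN);
    fclose($handle);
    return true;
}
```

The lock is only held for the duration of one read-modify-write, so a script that hangs elsewhere doesn't keep the register away from everyone else, and a caller that can't get the lock gives up after a bounded number of attempts.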

 

Thanks for replying, nothing like talking about something to get your thoughts straightened out.  :)

