
Hi,

I need an efficient method for storing numeric values in text files. I want to minimize storage space, and I also need a method that supports fast writing and reading of such values to/from the file, while keeping them available for PHP to use as numbers.

E.g. if I have a number like "173522", I _could_ write it out in the file as a string: 1, 7, 3, 5, 2 and 2. This would take 6 bytes. If I needed to access the number again, I would have to read the 6 bytes back as a string, and PHP would then automatically be able to treat it as a number (with some slight loss of time/performance for the conversion, I guess).
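A minimal sketch of the plain-text approach described above, for comparison. It assumes one number per line (the newline separator is an illustrative choice, not part of the post) and uses an in-memory `php://memory` stream instead of a real file; `writeAsText()` and `readAsText()` are hypothetical helper names.

```php
<?php
// Plain-text approach: store the number as its decimal digits.
function writeAsText($fp, int $n): int
{
    // The decimal string "173522" costs one byte per digit, plus the newline.
    return fwrite($fp, (string)$n . "\n");
}

function readAsText($fp): int
{
    // fgets() returns the line as a string; the (int) cast converts it back.
    return (int)fgets($fp);
}

$fp = fopen('php://memory', 'w+b');
$written = writeAsText($fp, 173522);   // 6 digit bytes + 1 newline byte
rewind($fp);
$value = readAsText($fp);
fclose($fp);

echo strlen((string)173522), "\n";     // 6
var_dump($value);                      // int(173522)
```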

However, a method that I assume might be more efficient, but which I currently don't know how to implement, would be to convert the number into a... I don't know what to call it, but I think you know what I mean. It would then take only 3 bytes of space in the file: 1 byte for values 0-255, 2 bytes for values up to 65535, 3 bytes for values up to 16777215, etc.
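PHP's `pack()` has no 3-byte format code, so one way to get the 3-byte layout described above is to pack a 32-bit little-endian value (`'V'`) and keep only its low three bytes. This is a sketch assuming non-negative values below 2^24 (16777215); `packUint24()` and `unpackUint24()` are hypothetical names.

```php
<?php
// Hypothetical helpers: store a non-negative integer below 2^24 in exactly 3 bytes.
// pack() has no 24-bit code, so pack as unsigned 32-bit little-endian ('V')
// and drop the most significant byte, which is the last one.
function packUint24(int $n): string
{
    if ($n < 0 || $n > 0xFFFFFF) {
        throw new RangeException("$n does not fit in 3 bytes");
    }
    return substr(pack('V', $n), 0, 3);
}

function unpackUint24(string $bytes): int
{
    // Re-append the dropped zero byte, then unpack as 32-bit little-endian.
    return unpack('V', $bytes . "\0")[1];
}

$b = packUint24(173522);
echo strlen($b), "\n";          // 3
echo bin2hex($b), "\n";         // d2a502
echo unpackUint24($b), "\n";    // 173522
```

Little-endian order is an arbitrary but fixed choice here; the reader just has to use the same one.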

I wonder if there is a way to implement this latter method with PHP, and whether it would give better performance than the first method. Right now, I have no idea how to do it and would be grateful for any suggestions.

Thomas

Well, I did a lot of studying and I think I came up with a solution. I'm posting it here in case someone else ever has a similar problem.

I found that one can use the function pack().

For instance, pack("i", "12345678") packs the number 12345678 into a 4-byte binary string ($X below) holding the same number. (The size of "i" is machine-dependent, but it is 4 bytes on common platforms.)

Then I just use unpack("i", $X) to get back the original number (12345678). Note that unpack() returns an array, so the number itself is in element [1].
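A short round trip through `pack()` and `unpack()`, as described above. Two details worth knowing: `unpack()` takes the packed string (a variable, not the literal `"X"`) and returns an array, with the value at key 1 when the format code is unnamed; and `'i'` has a machine-dependent size, while `'V'` is always unsigned 32-bit little-endian. Using `'V'` for the portable variant is an assumption about what a file format would want, not something from the thread.

```php
<?php
$number = 12345678;

// 'i' = signed int, machine-dependent size and byte order.
$packed = pack('i', $number);

// unpack() returns an array; with an unnamed format code the value is at key 1.
$restored = unpack('i', $packed)[1];

// 'V' is always 4 bytes, little-endian, so the file reads the same on any machine.
$portable = pack('V', $number);

echo strlen($packed), "\n";     // usually 4
echo $restored, "\n";           // 12345678
echo strlen($portable), "\n";   // 4
```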

There are more details in the PHP manual.

Next time you encounter this problem, just open the file for binary access and use fwrite().

Oh, you mean that if I open it for binary access, a number such as 167363982 would automatically be packed into the appropriate 4-byte value before being written? That would be great.

But what would happen with very small values, like 27456: would that be packed into a 2-byte or a 4-byte value, and is there a way to control which?

In binary, all values of a particular type are the same size.

For example, an integer may be 4 or 8 bytes. All integer values, regardless of the value, use the same amount of storage space, so the value 1 takes up as much space as 32,000.
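To illustrate the point about fixed sizes: with a fixed-width `pack()` code, the byte count does not depend on the value. This sketch uses `'V'` (unsigned 32-bit, little-endian) purely as an example format.

```php
<?php
// With a fixed-width format code, every value occupies the same number of bytes.
$values = [1, 32000, 2000000000];
$sizes = [];
foreach ($values as $n) {
    $bytes = pack('V', $n);      // 'V' = unsigned 32-bit, little-endian
    $sizes[] = strlen($bytes);
    printf("%10d -> %d bytes (%s)\n", $n, strlen($bytes), bin2hex($bytes));
}

// PHP's own integer type is PHP_INT_SIZE bytes (8 on 64-bit builds);
// that is independent of how many bytes pack() writes.
echo PHP_INT_SIZE, "\n";
```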

If you know that all of your numbers are smaller than the maximum value of a short integer, you can typecast them all to (short); note that I don't actually know for a fact that PHP supports the short integer data type, just that other languages do.

If you know that all values are positive, you can typecast them as unsigned, assuming that PHP supports that as well.
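For the record, PHP has no `(short)` or unsigned cast; the width and signedness are chosen by the `pack()`/`unpack()` format code instead. A sketch, assuming non-negative values for the unsigned case; `packUint16()` is a hypothetical helper that adds a range check, since `pack()` itself does not reject out-of-range values.

```php
<?php
// Width and signedness come from the pack() format code:
//   's' signed 16-bit,   'S' / 'v' / 'n' unsigned 16-bit
//   'l' signed 32-bit,   'L' / 'V' / 'N' unsigned 32-bit
// ('v'/'V' = little-endian, 'n'/'N' = big-endian, the rest = machine order)

// Hypothetical helper: unsigned 16-bit little-endian, with a range check.
function packUint16(int $n): string
{
    if ($n < 0 || $n > 0xFFFF) {
        throw new RangeException("$n does not fit in an unsigned short");
    }
    return pack('v', $n);
}

$small = packUint16(27456);
echo strlen($small), "\n";             // 2
echo unpack('v', $small)[1], "\n";     // 27456

// Signedness only changes how the same two bytes are interpreted on read.
$bytes = pack('s', -1);
echo unpack('s', $bytes)[1], "\n";     // -1
echo unpack('S', $bytes)[1], "\n";     // 65535
```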

I am trying to grasp this.

(1) Yesterday I ran a short experiment with code like the following:

$file = "test.txt";
$number = 1234567889;
$fp = fopen($file, "wb");
fwrite($fp, $number);
fclose($fp);

When I checked what had actually been written to the file, using Notepad, it showed the number as decimal text, like this: 1234567889, not 4 strange-looking bytes... What did I do wrong here?
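What happened in experiment (1): `fwrite()` expects a string, so PHP converts the integer to its decimal text first, giving 10 bytes. The `b` in the mode only disables line-ending translation (relevant on Windows); it does not encode numbers. A sketch comparing both writes, using an in-memory stream instead of `test.txt` and assuming `'V'` (unsigned 32-bit little-endian) as the binary format:

```php
<?php
$number = 1234567889;
$fp = fopen('php://memory', 'w+b');

// fwrite($fp, $number) does this conversion implicitly: int -> "1234567889".
$textBytes = fwrite($fp, (string)$number);

// To store the raw 4-byte representation, convert explicitly with pack().
$binaryBytes = fwrite($fp, pack('V', $number));

rewind($fp);
$contents = stream_get_contents($fp);
fclose($fp);

echo $textBytes, "\n";                                // 10
echo $binaryBytes, "\n";                              // 4
echo unpack('V', substr($contents, 10, 4))[1], "\n";  // 1234567889
```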

(2) Also, I think I would prefer to store numbers in different sizes, both as 2-byte shorts and as 4-byte integers, i.e. use shorts whenever the number is below 32,768 (the largest signed short is 32,767) and integers for larger values. Is that possible with binary access, or would I need to do something extra?

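From the comments in the PHP Manual:

To write 'true binary' files, combine fwrite() with pack():

$a = 65530;
$fp = fopen('test.dat', 'wb'); // 'b' = binary mode: no newline translation on Windows
fwrite($fp, pack('L', $a));    // 'L' = unsigned 32-bit, machine byte order
fclose($fp);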
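The manual snippet only covers writing. A sketch of the matching read, assuming the same `'L'` format and a temporary file created with `tempnam()` in place of `test.dat`:

```php
<?php
// Temporary file for the demo (stands in for test.dat).
$path = tempnam(sys_get_temp_dir(), 'bin');

// Write: one unsigned 32-bit value in machine byte order.
$a = 65530;
$fp = fopen($path, 'wb');
fwrite($fp, pack('L', $a));
fclose($fp);

// Read: exactly the 4 bytes that pack('L') produced, then unpack them.
$fp = fopen($path, 'rb');
$raw = fread($fp, 4);
fclose($fp);
unlink($path);

$b = unpack('L', $raw)[1];
echo strlen($raw), "\n";   // 4
echo $b, "\n";             // 65530
```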

As for your second part, you're better off just using a standard one-size-fits-all format. First, it keeps things simple. If you want to use a variable byte width, you will also have to place flags in the file to indicate how many bytes the next value takes. That will complicate both the code that writes the file and the code that reads it.

Second, you won't gain much space. Let's say an int is 4 bytes and a short is 2. If you write a single int and a single short, that's 6 bytes. You will also need at least 1 byte to flag each of them, bringing the total up to 8 bytes, exactly what two plain 4-byte ints would take. So you've saved nothing. Of course, this depends on the data you'll be writing and the actual size of an int or short on the target platform.
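The flag-byte arithmetic above can be checked with a small sketch. The encoding is hypothetical: a 1-byte flag holding the width (2 or 4), followed by an unsigned little-endian value (`'v'` or `'V'`); values are assumed to be non-negative and below 2^32. One large plus one small value costs 8 bytes either way; the flagged layout only wins when most values are small (3 bytes each instead of 4), and every large value costs 5.

```php
<?php
// Hypothetical variable-width encoding: flag byte = width of the value that follows.
function encodeFlagged(array $values): string
{
    $out = '';
    foreach ($values as $n) {
        if ($n <= 0xFFFF) {
            $out .= pack('Cv', 2, $n);   // flag 2, then 2 bytes
        } else {
            $out .= pack('CV', 4, $n);   // flag 4, then 4 bytes
        }
    }
    return $out;
}

function decodeFlagged(string $data): array
{
    $values = [];
    $pos = 0;
    while ($pos < strlen($data)) {
        $width = ord($data[$pos]);       // read the flag byte
        $code = $width === 2 ? 'v' : 'V';
        $values[] = unpack($code, substr($data, $pos + 1, $width))[1];
        $pos += 1 + $width;
    }
    return $values;
}

// Fixed width: every value is 4 bytes, no flags needed.
function encodeFixed(array $values): string
{
    return pack('V*', ...$values);
}

$mixed = [167363982, 27456];                      // one "int", one "short"
echo strlen(encodeFlagged($mixed)), "\n";         // 8 = (1 + 4) + (1 + 2)
echo strlen(encodeFixed($mixed)), "\n";           // 8 = 4 + 4

$mostlySmall = [1, 2, 3, 70000];
echo strlen(encodeFlagged($mostlySmall)), "\n";   // 14 = 3 * 3 + 5
echo strlen(encodeFixed($mostlySmall)), "\n";     // 16 = 4 * 4
```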

Great, thanks! This clarifies matters.
