tcjohans


  1. "As for your second part, you're better off just using a standard, one-size-fits-all width. First, it keeps things simple. If you want to use variable byte widths, you will also have to place flags in the file to indicate how many bytes the next value is. That will complicate both the code that reads the file and the code that writes it. Second, you won't gain much space. Let's say an int is 4 bytes and a short is 2, and that you write a single int and a single short: that's 6 bytes. You will also need at least 1 byte to flag each of them, bringing the total up to 8 bytes. So you've saved nothing. Of course, this depends on the data you'll be writing and the actual size of an int or short on the target platform." Great, thanks! This clarifies matters.
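     A quick sketch of that arithmetic in PHP, using pack() ("l" is a 4-byte signed long, "s" a 2-byte signed short); the 1-byte width flags are a hypothetical convention for illustration:

         // Fixed width: store both values as 4-byte ints.
         $fixed = pack("l", 1234567) . pack("l", 42);
         echo strlen($fixed), "\n";               // 8 bytes

         // Variable width: a 2-byte short for the small value, plus a
         // hypothetical 1-byte width flag in front of each value.
         $variable = "\x04" . pack("l", 1234567)  // flag + 4 bytes
                   . "\x02" . pack("s", 42);      // flag + 2 bytes
         echo strlen($variable), "\n";            // also 8 bytes: nothing saved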
  2. I am trying to grasp this. (1) Yesterday I made a short experiment with some code like the following:

         $file = "test.txt";
         $number = 1234567889;
         $fp = fopen($file, "wb");
         fwrite($fp, $number);
         fclose($fp);

     When I checked what had actually been written to the file, using Notepad, the number shown was a decimal number, like this: 1234567889 - not 4 strange-looking bytes... What did I do wrong here? (2) Also, I think I will prefer to store numbers in different formats, both as 2-byte shorts and as 4-byte integers, i.e. use shorts whenever the number is below 32000-something and integers for larger values. Is that possible with binary access, or would I need to do something extra?
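     The decimal digits show up because fwrite() writes its argument as a string; the "b" in "wb" only suppresses newline translation, it does not make writes binary-encoded. A sketch of what the experiment presumably intended, packing the number into 4 raw bytes first:

         $file = "test.txt";
         $number = 1234567889;
         $fp = fopen($file, "wb");
         fwrite($fp, pack("l", $number)); // pack to a 4-byte binary value, then write
         fclose($fp);
         // test.txt is now 4 bytes long and looks like gibberish in Notepad.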
  3. A solution that somebody proposed on another site is just to open a mailbox - one per mailing list - on the mail server, have incoming mail directed to these, and then use cron to run a PHP script that processes them regularly at short intervals.
  4. Oh, you mean that if I open it for binary access, a number such as 167363982 would automatically be packed into the adequate 4-byte value before being written?? That would be great. But what would happen with very small values, like 27456 - would that be packed into a 2-byte or a 4-byte expression, and is there a way to control which?
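     As the pack() solution in a later post shows, nothing happens automatically: the format code chosen by the caller controls the width. A minimal illustration ("l" is a 4-byte signed long, "s" a 2-byte signed short):

         echo strlen(pack("l", 167363982)), "\n"; // 4 bytes
         echo strlen(pack("s", 27456)), "\n";     // 2 bytes - the caller chooses the width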
  5. I usually use set_time_limit($seconds). The argument specifies the maximum execution time in seconds; if the argument is 0, you get unlimited execution time.
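     For example (each call also restarts the timeout counter from zero):

         set_time_limit(300); // allow up to 5 minutes of execution
         set_time_limit(0);   // or remove the limit entirely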
  6. Well, I did a lot of studying and I think I came up with a solution. I post it here in case someone ever has a similar problem. I found that one can use the function pack(). For instance, pack("i", 12345678) packs the number 12345678 into a 4-byte binary expression (call it X) of the same number. Then I just use unpack("i", X) to get back the original number (12345678). There are more details; you can look them up in the PHP manual.
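     A minimal round trip along those lines - note that unpack() returns an array, so with an unnamed format code the value comes back under index 1:

         $packed = pack("i", 12345678);  // 4 raw bytes
         $values = unpack("i", $packed); // array(1 => 12345678)
         echo $values[1];                // 12345678 again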
  7. Hi, this is just a quick question. Could anyone confirm whether or not calling fopen() on a file per se means that all of it is read into memory? This is an important issue for me right now. In another thread, I have expressed the belief that I could fopen() a file and then fseek() to a given position and read some limited content there (e.g. by means of fgets() or fgetc()) without having to get the entire file into memory first (which would greatly slow down the process, e.g. when dealing with very large files). However, someone else expressed a contrary opinion: "I may be mistaken, but I believe that it has to read the entire file into memory, then places the pointer at the beginning of the file at which point you can then call fseek() and go to the position you want. I don't see how PHP can just go to a particular place in a file without having at least read all data prior to the seek point in the file. fopen is the handle for the file. fseek, fgets, and all other file functions rely on this being called first. fopen read the entire file into memory and you can then work with that data. But you have to have the fopen handle first." Can someone who knows this matter please put in their two cents on this issue? Thomas
  8. "Conceptually your wrong, your indexing files are what you are seraching for for your records otherwise you be seraching the database files themselves which you are trying not to do. Either way it can't be done without a massive writing of a normalize databasing system, and even then you'll be 5 years behind what sql, odbc, oreacal can all do"

     I am not sure I understand when you say that "conceptually" I am wrong... Please tell me what you mean. Possibly you mean that I would need to read the entire index file, given the structure I've outlined. (In which case: I would use fixed-length records so as to be able to calculate the relevant position and then fseek() - but I am just repeating myself here...) I think the issue really just comes down to whether calling fopen() involves reading the entire file into memory or not.

     Also, I do agree that for the system to work it must in principle have an advanced structure, with indexes etc., like MySQL et al. But that's just the fun part... In principle, PHP and C++ would face essentially the same architectural issues in a project of this kind. For instance, MySQL is also based on reading and writing to a few files. In the end, however, PHP would end up with lower performance. I read somewhere that PHP is about 30% slower than compiled programs - if so, I reckon that a pure PHP-based database would be about 30% slower than e.g. MySQL. In any event, there are a few possible advantages or considerations:

     - Particular tasks that require faster or special treatment could gradually be "outsourced" to small C++ executables, e.g. extensive searches. This might be able to make up for any performance issues vis-a-vis MySQL.
     - If the architecture per se is sound but the implementation is PHP and open source, then I could imagine that people would gradually be able to contribute small fixes, additional functionality, etc. There might in this way eventually be a lot more things people could achieve with this system than with e.g. MySQL.
     - A PHP-based system would be easier to integrate into PHP sites than MySQL.
     - In principle, MySQL is an external interface to PHP, so doing away with it might _possibly_ be a performance gain in _some_ respects.
     - Also, MySQL is not oriented primarily to web applications and PHP. I don't know concretely what that could mean in terms of performance for PHP, but I am sure there must be some respects in which PHP does not get everything it could if MySQL had been built mainly for PHP applications. (That's the way things generally are when something built for A is also adapted for B.)
  9. Just a small point: I don't think reading an index necessarily means you read an entire file - not if the index file has fixed-length rows. Suppose I want to know the position of data A corresponding to record 1234 and have an index that tells me this. Suppose also that the index file uses the same length, L, for each row. Then I can simply calculate the position X where data A's position is written in the index file as L * (1234 - 1) - or some similar calculation, depending on the structure of the index file. Then I fopen() the file, get to position X by means of fseek(), and read from that position by means of fgets(). No entire file is read at all, though it is opened.
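     A minimal sketch of that lookup, assuming hypothetical fixed-length index rows of 16 bytes, each holding one record's offset as zero-padded digits (the function name and row format are made up for illustration):

         function index_lookup($indexFile, $recordNo, $rowLen = 16) {
             $fp = fopen($indexFile, "rb");
             fseek($fp, ($recordNo - 1) * $rowLen); // jump straight to row N
             $row = fread($fp, $rowLen);            // read only that one row
             fclose($fp);
             return (int) $row;                     // the offset stored in that row
         }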
  10. Hi, I am not sure exactly if I understand - but I hope I do. My plan is to split data into two parts: data that is of fixed length and data that is of variable length. All fixed-length data is held in one file, which holds no variable-length data. So in this file there should not be a problem with exact random access to the data, right?

      Then I put all variable-length data in another file. Here I am thinking in terms of working in "blocks". For illustration, say you are working with 10 blocks of data, occupying these positions:

      Block  Positions
      1      1-10
      2      11-20
      3      21-30
      etc.

      Then suppose you edit block 2 so that the result is just 6 bytes (instead of 10). If that happens, you just split block 2 into two new ones. You now have the following structure:

      Block  Positions  Status
      1      1-10       Occupied
      2      11-16      Occupied
      3      17-20      Free
      4      21-30      Occupied
      etc.

      (By the way, the blocks don't have unique or fixed IDs - they're just identified by the position they start at.) Now, block 3 would be free to hold 4 bytes of data if something of that size ever needs to be allocated to free space. Alternatively - let's go back to the starting point - suppose that instead of editing block 2 down to 6 bytes, you edit it up to, say, 22 bytes. Now that data won't fit into block 2 any longer. The solution is either of two: (1) you find some other free blocks for it (if necessary by merging two or more adjacent empty blocks), or (2) you extend the file, appending a new block at the very end, and put the data in that block. In turn, the old block 2 would now be free to hold new data if something eventually needs it.

      In any event, with this system, editing and deleting data would not require changing all the data's starting points. Each change affects only one piece of data, and an index could feasibly be maintained registering all pieces' starting points. Does this make sense, and does it respond to the problem you had in mind? I hope I understood you... What did you mean when you said that MySQL grows and shrinks in 3 directions, by the way?
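      A toy in-memory version of that allocation policy (the block map is a hypothetical PHP array here; a real implementation would persist it in an index file, and the names are made up):

          // Each block: [start, length, free?]
          $blocks = [[0, 10, false], [10, 10, false], [20, 10, false]];

          function allocate(&$blocks, $len) {
              foreach ($blocks as $i => [$start, $size, $free]) {
                  if ($free && $size >= $len) {
                      $blocks[$i] = [$start, $len, false];   // reuse the free block
                      if ($size > $len) {                    // split off the remainder as a new free block
                          array_splice($blocks, $i + 1, 0, [[$start + $len, $size - $len, true]]);
                      }
                      return $start;
                  }
              }
              [$s, $z] = end($blocks);                       // no fit: append a new block at the end
              $blocks[] = [$s + $z, $len, false];
              return $s + $z;
          }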
  11. Hi, I need an efficient method for storing numeric values in text files. I want to minimize storage space and also need a method that supports fast writing and reading of such values to/from the file, while also making them available for PHP to use numerically. E.g. if I have a number like "173522", I _could_ write it out in the file as a string: the characters 1, 7, 3, 5, 2 and 2. This would take 6 bytes. If I needed to access the number again, I would read the 6 bytes as a string with PHP, and PHP would automatically be able to treat it as a numeric value (with some slight loss of time/performance for the conversion, I guess). However, a method that I assume might be more efficient - but which I currently don't know how to implement - would be to convert the number into some more compact format... I don't know what to call it, but I think you know what I mean. It would then take only 3 bytes of space when stored in the file: 1 byte for values 0-255, 2 bytes for values 256-65535, etc. I wonder if there is a way to implement this latter method with PHP and whether it would give better performance than the previous method. Right now, I have no idea how to do it and would be grateful for any suggestions. Thomas
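      For comparison, a quick check of the two sizes; pack() is the mechanism a later post settles on, though note that "N" (a 4-byte unsigned big-endian integer) gives a fixed 4-byte form - pack() has no 3-byte code, so the variable-width scheme described above would need extra bookkeeping:

          $n = 173522;
          echo strlen((string) $n), "\n";   // 6 bytes as a string of digits
          echo strlen(pack("N", $n)), "\n"; // 4 bytes as raw binary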
  12. I am not sure I understand what you mean here. What I need is a way to get a PHP script to run when an email comes in to the mail server addressed to something ending with "@lists.MYDOMAIN.org" - the PHP script would process it; e.g. if it is addressed to a mailing list, it would check the MySQL mailing-list database for a list of all subscribers to the list in question and then mail it off to them. Alternatively, an idea that came to my head today: if the mail server can just store/save all such incoming mail for some reasonable amount of time - without responding or reacting to it in any particular way - then I should be able to set up a cron job that executes a PHP file regularly to check all such incoming mail and perform the appropriate process for each (e.g. forwarding to the appropriate mailing list, unsubscribing the email address, subscribing the email address, etc.). But I don't know anything about mail servers at this stage. Do either of these solutions make sense, and are they feasible? Or how do other script-based mailing list managers handle this particular problem - i.e., how do they handle incoming mail? Thomas
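      A sketch of the cron-driven variant, assuming PHP's IMAP extension and a catch-all mailbox; the credentials and the handle_message() helper are hypothetical:

          // Run from cron, e.g. every minute: php process_mail.php
          $mbox = imap_open("{localhost:143}INBOX", "lists", "secret");
          for ($i = 1; $i <= imap_num_msg($mbox); $i++) {
              $header = imap_headerinfo($mbox, $i); // which list it was sent to
              $body   = imap_body($mbox, $i);
              handle_message($header, $body);       // forward / subscribe / unsubscribe
              imap_delete($mbox, $i);               // mark as processed
          }
          imap_close($mbox, CL_EXPUNGE);            // actually remove deleted mail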
  13. I doubt this is correct. I checked the PHP manual yesterday for fopen(), and it mentions nothing about fopen() loading the file into memory. It says: "fopen() binds a named resource, specified by filename, to a stream." I _guess_ that what happens is - if this makes sense - that fopen() creates a dynamic pointer to the start position of the file, and that a variable name is then associated with that pointer (but I have only a rudimentary understanding of the deep processes here, so I may be completely wrong or just speaking nonsense). In any event, if I am right, there should be no need to read an entire file into memory - provided information is available somewhere (some sort of index) about exactly where each piece of information is to be found (file and file position), or there is an easy way of calculating the file and location. (E.g. if each record has a fixed length, the same for all, the position where a record starts in the file would be something like: (record number - 1) * record length.) One could use fseek() to get to such a position, and then use fgets() and fgetc() to get content starting at that position (without having to read or load the entire file). Could someone confirm that I am right?
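      One rough way to check this empirically (test.bin is a hypothetical large file here, and memory_get_usage() only approximates what the engine allocates):

          $before = memory_get_usage();
          $fp = fopen("test.bin", "rb");     // suppose test.bin is 100 MB
          fseek($fp, 50000000);              // jump straight to the middle
          $chunk = fread($fp, 100);          // read just 100 bytes there
          fclose($fp);
          echo memory_get_usage() - $before; // a few KB, not 100 MB: no full read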
  14. I made an experiment yesterday. I created 30,000 files with short arbitrary content and gave them names like 1.txt, 2.txt, etc. to facilitate file lookup and have a best-case scenario. Then I used a loop with file_get_contents() to read each in turn. 10,000 files took 34 seconds and 30,000 took 140 seconds... So I'll build it with just one or a few files instead!
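      The loop in question would have been essentially this (a reconstruction of the benchmark described, with microtime() added for the timing):

          $start = microtime(true);
          for ($i = 1; $i <= 30000; $i++) {
              $data = file_get_contents($i . ".txt"); // one small file per record
          }
          echo microtime(true) - $start, " seconds";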
  15. If opening a file means that it is all read into memory, then there is a problem, of course (but just a problem...). The ideal thing would be if one could somehow access the data in a file without actually opening it in the sense of loading it all into memory. Someone mentioned using "file streaming concepts" instead, and I am trying to figure out what that means and whether it could be a solution. Also, I think I have actually found an idea for how to get a piece of data from a file without reading all of it. There needs to be information somewhere about which file and which position a particular field (or its data) is stored at, and of what length it is. If such information can be provided - and it can, with a reasonably well-done system - then accessing such a limited part of a file shouldn't be a problem, say with fseek(), without needing to read the entire file. Would you see a problem here? Also, maybe I should add that I think the most problematic performance issue has to do with extensive searches in very large databases - like handling SELECT ... WHERE ... statements. What I am thinking is that maybe it would be possible to eventually develop a separate small database-searcher program in C++ for that particular task, with which PHP would be able to interact (just as it interacts with MySQL).