Jump to content

Struct array extension idea


btherl

Recommended Posts

Hi,

I would like to run this idea by all of you and see if it makes sense.  I would like to create an extension that simulates an array of structs in C.  What I am replacing is this PHP array:

$arr = array(
  array(
    'name' => 'Brian',
    'expires' => '2006-12-31',
    'active' => true,
  ),
  array( ... ),
  ...
)

The replacement will look like this:

$c_arr = c_array_create();
c_array_add_elem($c_arr, 'name', 'string');
c_array_add_elem($c_arr, 'expires', 'date');
c_array_add_elem($c_arr, 'active', 'bool');

So much for initialization.  The above is the declaration of a structure like this:

typedef struct {
  char *name;
  date_t expires;
  bool active;
} s_person;

The extension will have a pre-defined list of types that it understands.  It needs to know types both for efficient storage, and to enable sorting.

Then, array entries can be added (to the end only)

c_array_insert($c_arr, 'Brian', '2006-12-31', true);

And they can be fetched from any location

$brian = c_array_fetch($c_arr, 0);

Additionally, I would like to implement sorting in C for these arrays (can use qsort with appropriate callbacks)

c_array_sort($c_arr, 'name=asc,expires=desc');

I will most likely implement a 'c_array_fetch_serialized()' function too, which will be quite simple due to the limited data types.  I have a need for it in my application.

Q: Why bother?
A: C Structs do not use any space for the names of elements of the structure.  C arrays do not use any space for the names of elements of the array.  This can lead to considerable saved space, at the expense of reduced flexibility.

Q: Why not use C for everything?
A: I want it all! I want PHP's high-level power combined with C's efficient storage.

Q: How is the data stored and accessed?
A: The same as a C struct.  Data will be accessed as (array_offset * struct size + struct element offset), and all data is stored in a single (very large) char*.  Strings within the array will be a char* pointing to the string itself, as is usual in C.

Q: How do you delete data from the middle of an array?
A: You can't.  But you can filter the data in php before adding it to the array.


Any comments?  Is it practical?  I've never written a PHP extension before, but I am expert in both C and PHP programming, so I am interested in comments from people with experience in writing PHP extensions.
Link to comment
Share on other sites

  • 1 year later...
FYI, this extension is now under development.  Here is the underlying data structure:

[code=php:0]typedef enum _c_array_entry_types {
        C_ARRAY_STRING,
        C_ARRAY_INTEGER,
        C_ARRAY_FLOAT,
        C_ARRAY_DOUBLE
} c_array_entry_type;

typedef struct {
        char *name;  /* Name of element */
        c_array_entry_type type;    /* Type of element */
        size_t size; /* Size of element */
        int offset;  /* Offset of element from start of struct */
} c_array_entry;

typedef struct {
        int len;    /* Length of array */
        int size;  /* Size of array entry */
        int entries_len; /* Length of array description */
        c_array_entry *entries; /* Array description */
        char *data; /* Actual array data is stored here */
} c_array;[/code]

I haven't optimized either of these for size as there will only be a few of them in existence for each array.  The actual data will all be stored in char *data, and will be packed tightly.  I'll probably add data types less than one word once i've sorted out alignment issues.
Link to comment
Share on other sites

This extension is now functional, and has some primitive sorting capabilities.  Sorting is quite slow, using a php callback.  But data is nicely packed.  Storing an array of 20k rows, each looking like this:

[code=php:0]  c_array_push($c_arr, array(
    0 => 'pickles',
    1 => 1,
    2 => 2.5 + rand(0,10),
    3 => 3.5,
    4 => 'bananas',
  ));[/code]

takes 1.9MB, compared to php's 8.6MB for the exact same array.

Creation of the array takes 0.08 seconds for c_array, compared to 0.07 seconds for a native php array.  Note that this time includes creating half the array using named elements instead of integer indices.

Sorting of the array by element 2 takes 1.7 seconds for c_array, compared to 0.41 seconds for native php array.  This could use improvement.

Currently I am sorting with C's qsort(), generating a php data structure from the c_array and passing it back to a php callback to do the comparison.  That's overhead that the native php sort doesn't have.  Perhaps some pre-defined native C comparison callbacks could speed this up.

More data: 200k array entries, same structure as above.
c_array creation: 0.73 seconds
php array creation: 0.74 seconds
c_array sort: 20.89 seconds
php array sort: 4.72 seconds
c_array memory: 16.3MB
php array memory: 85.8MB

That's an 80% reduction in memory use.  The reduction for arrays of pure integers and other small data structures will be even greater.
Link to comment
Share on other sites

I just cut sorting time down to 8 seconds for my 200k length arrays, vs 4.7 seconds for php's native arrays.  That's less than double, which is pretty good I think.  The cut from 20 seconds to 8 seconds was achieved by allocating two zval arrays at the start of the sort, and re-using them for each comparison.  The 12 seconds removed was previously spent allocating and de-allocating all those values!

So the summary is:
C arrays vs PHP native arrays
16.3MB vs 85.8MB (80% reduction in memory usage)
8s vs 4.7s (70% increase in sorting time)

Not bad eh?  If only I could cut the sorting down a bit more.
Link to comment
Share on other sites

  • 6 months later...
  • 2 weeks later...
  • 1 month later...
Sorry for the entirely offtopic-ness of the following, but I found it extremely funny....


I woke up this morning wondering if something was wrong with me.... I  dreamed about this thread!  (Guess I should stop reading PHPFreaks at 4 in the morning lol)

I don't remember the dream exactly, but it had something to do with not being able to compile some code, and then wondering how I was compiled, and then there were all these weird like... cut sceens...  It was cracked out.

Anyway... once again, sorry for the entirely offtopic post, and hopefully you didn't see a post in here and hope someone had a meaningful thing to say (hurts inside when that happens to me :().
Link to comment
Share on other sites

Wondering how you were compiled?  Do you think about that a lot?  I used to try very hard to find the answers to life, thinking that if I found them, then things would be easy.

I don't do that anymore though.  Now I go for the "whatever makes me happy" approach.  Because I realized that even if I found "the answer", it's not going to make me any happier.

BTW, everyone believes there is something wrong with them.  It's part of being human.  I think that how we deal with that feeling of "something is wrong" defines who we are.  I think that what's wrong with me is that I don't know enough, and it's because of that I will only make things worse.  And so I devoted much of my life to learning about everything, so I wouldn't make things worse :)

No surprise that I ended up as a mathematician/philosopher/programmer ..

These days I'm much more free though - I can see that the feeling that something is wrong with me doesn't mean that something actually IS wrong with me.  And separating those out lets me be free :)

Ok, very off topic I suppose.  But hardly anyone posts in this section anyway..
Link to comment
Share on other sites

Nah I don't think it was an existential crisis dream....  I think that was just a problem I encounted (just happened to be compiling my self).

I play a game though, where most of the server side stuff for NPCs is done in ASM like AI.... I had a dream once about not knowing what my AI was and going from text document to text document (actually in them) reading them.....

99% of the time I don't dream though....  Gotta sleep enough to dream lol.
Link to comment
Share on other sites

Yep.. I find I don't dream unless I get enough sleep.  And that is rare :)  Actually it's more a question of getting enough quality sleep.  I can have my eyes closed and be unconscious but still not feeling refreshed the next day.  But when I dream, that's usually the times when I'm getting quality sleep.
Link to comment
Share on other sites

  • 1 month later...
I can send you the code if you want.  I've been devloping it in a php-5.2.4 source tree, so I'm not sure if it'll work with php 4 as well.  It doesn't use any php 5 features but maybe it uses some internal macros not available in php 4 or something like that.

The odds of it getting packaged are slim .. there's a whole lot of standards you have to meet to get into PEAR.  PECL might be easier though.. hmm.  They look more forgiving :)
Link to comment
Share on other sites

Well, since it's a C extension, I'd probably say that PECL would be where to send it.  As for actually packaging it, if I remember correctly, you need a package.xml file and use [tt]pecl package[/tt] in the directory.  But that's from memory, haven't really read up on it recently.  And yeah, I'd appreciate it if you send over the code actually.  Can you just put it in a tar.gz for me rather than a zip or something? >_<
Link to comment
Share on other sites

Oh yeah, also, if you don't mind, can you just run (in the directory of your ext):

[tt]
phpize
./configure --enable-yourextension
make
[/tt]

So there's a .so file already in the modules folder? =P  I can do it on my end, but it'd be easier to distribute if you just have it done.  Whatever though. xD
Link to comment
Share on other sites

Alright, thanks a lot. :D  *Goes off to read a bit of the source*

Edit: Hmm, read the c_array.c file (don't really care about the header file.  All it does it like set the functions and resource stuff.  Boring)  And I like it.  I'll see what I can do (if anything) in terms of rewriting parts of it for speed, but otherwise it looks good.  :D  Man, I hate all the zend macros. D:  They get annoying to remember. ._.  Anyway, thanks.
Link to comment
Share on other sites

Cool.. if you want to make it releaseable, go ahead :)  c_array.php is the test script I used.  It runs through each of the features and most of the combinations in which they can be used (I'm sure I missed out some edge cases).

I've run it with memory debugging on and it doesn't leak .. but I still don't fully understand all those zend macros for freeing zvals :)  I just copied from other code and kept trying different combinations until it worked.

I notice that I have broken the rule of placing the prototype on a single line in the function comment .. I just read about that the other day.

And I hope you enjoy interpreting my pointer casts as much as I enjoyed writing them :P
Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.