
Accessing data in different formats


NotionCommotion


I have an object which contains the following data.  The number of rows can be many thousands.  There will only be a dozen or so sensors for a given request and there are only 5 available function types (i.e. max, min, etc).  The data isn't coming directly from a SQL DB and is being manipulated by PHP in a per timestamp loop.

+---------------------+---------------+---------------+----------------+----------------+---------------+----------------+
|        time         | max(sensor_1) | min(sensor_1) | mean(sensor_1) | mean(sensor_2) | min(sensor_3) | mean(sensor_3) |
+---------------------+---------------+---------------+----------------+----------------+---------------+----------------+
| 2018-17-05 14:00:00 |         83.70 |         33.48 |          58.59 |          31.96 |         12.78 |          54.08 |
| 2018-17-05 14:15:00 |         46.27 |         18.51 |          32.39 |          60.49 |         24.19 |          10.36 |
| 2018-17-05 14:30:00 |         32.52 |         13.01 |          22.77 |          39.70 |         15.88 |           3.58 |
| 2018-17-05 14:45:00 |         63.29 |         25.32 |          44.30 |          11.23 |          4.49 |          65.61 |
| 2018-17-05 15:00:00 |         81.41 |         32.56 |          56.98 |          84.67 |         33.87 |          77.97 |
| 2018-17-05 15:15:00 |         50.37 |         20.15 |          35.26 |          37.58 |         15.03 |          39.69 |
| 2018-17-05 15:30:00 |          8.09 |          3.23 |           5.66 |           9.02 |          3.61 |          24.64 |
| 2018-17-05 15:45:00 |         23.54 |          9.42 |          16.48 |           0.63 |          0.25 |          49.57 |
| 2018-17-05 16:00:00 |         81.08 |         32.43 |          56.76 |          32.27 |         12.91 |          62.57 |
| 2018-17-05 16:15:00 |         20.27 |          8.11 |          14.19 |          22.35 |          8.94 |          89.53 |
| 2018-17-05 16:30:00 |         47.64 |         19.06 |          33.35 |           4.69 |          1.88 |          57.69 |
| 2018-17-05 16:45:00 |         12.28 |          4.91 |           8.60 |          13.31 |          5.32 |          74.39 |
| 2018-17-05 17:00:00 |         35.43 |         14.17 |          24.80 |          63.90 |         25.56 |          38.47 |
| 2018-17-05 17:15:00 |         10.28 |          4.11 |           7.19 |          94.99 |         38.00 |           7.95 |
| 2018-17-05 17:30:00 |         78.73 |         31.49 |          55.11 |          78.10 |         31.24 |          18.74 |
| 2018-17-05 17:45:00 |         22.28 |          8.91 |          15.60 |          59.45 |         23.78 |          34.87 |
+---------------------+---------------+---------------+----------------+----------------+---------------+----------------+

I have other objects which must be able to access the data but need it in different formats.  For instance, one object will require data similar to the above table and will use it as the source for a stream that creates a CSV file.  Another object will require it in the following format; however, this format will never have thousands of timestamp rows, only around 100.

$formattedData=[
    'max_sensor_1' =>[83.70,46.27,32.52,63.29,81.41,50.37,8.09,23.54,81.08,20.27,47.64,12.28,35.43,10.28,78.73,22.28],
    'min_sensor_1' =>[33.48,18.51,13.01,25.32,32.56,20.15,3.23,9.42,32.43,8.11,19.06,4.91,14.17,4.11,31.49,8.91],
    'mean_sensor_1'=>[58.59,32.39,22.77,44.30,56.98,35.26,5.66,16.48,56.76,14.19,33.35,8.60,24.80,7.19,55.11,15.60],
    'mean_sensor_2'=>[31.96,60.49,39.70,11.23,84.67,37.58,9.02,0.63,32.27,22.35,4.69,13.31,63.90,94.99,78.10,59.45],
    'min_sensor_3' =>[12.78,24.19,15.88,4.49,33.87,15.03,3.61,0.25,12.91,8.94,1.88,5.32,25.56,38.00,31.24,23.78],
    'mean_sensor_3'=>[54.08,10.36,3.58,65.61,77.97,39.69,24.64,49.57,62.57,89.53,57.69,74.39,38.47,7.95,18.74,34.87]
];

In addition to the originally mentioned object which holds all the data, max(sensor_1), min(sensor_1), etc. are also objects, and ideally would be able to efficiently access their individual data series.  This would also facilitate providing the second $formattedData output.

I am thinking of storing the data in the main object as an array such as $data[timestamp][sensorNumber][function]=value.  This would make providing the data in a form suitable for the CSV file very simple. The problem with this approach, however, is that the various function/sensor objects such as max(sensor_1) won't have a direct link to their data but will need to loop through the $data array to build it.  I "could" create new arrays for this data when looping over the data, but storing the same data independently in two locations seems like a bad idea.
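To make the trade-off concrete, here is a minimal sketch of the nested layout described above. The helper names toCsvRows() and series() are hypothetical, the sample holds only two timestamps, and it assumes every timestamp carries the same sensors and functions in the same order.

```php
<?php
// Hypothetical sample in the proposed $data[timestamp][sensorNumber][function] layout.
$data = [
    '2018-17-05 14:00:00' => [
        1 => ['max' => 83.70, 'min' => 33.48, 'mean' => 58.59],
        2 => ['mean' => 31.96],
        3 => ['min' => 12.78, 'mean' => 54.08],
    ],
    '2018-17-05 14:15:00' => [
        1 => ['max' => 46.27, 'min' => 18.51, 'mean' => 32.39],
        2 => ['mean' => 60.49],
        3 => ['min' => 24.19, 'mean' => 10.36],
    ],
];

// Yield a header row, then one flat row per timestamp.
// Assumes the first timestamp lists every column that later ones have.
function toCsvRows(array $data): Generator
{
    $first = reset($data);
    if ($first === false) {
        return; // no data, nothing to yield
    }
    $header = ['time'];
    foreach ($first as $sensor => $functions) {
        foreach (array_keys($functions) as $function) {
            $header[] = "{$function}(sensor_{$sensor})";
        }
    }
    yield $header;

    foreach ($data as $time => $sensors) {
        $row = [$time];
        foreach ($sensors as $functions) {
            foreach ($functions as $value) {
                $row[] = $value;
            }
        }
        yield $row;
    }
}

// Build a single series by looping over every timestamp.
// Missing values come back as null.
function series(array $data, int $sensor, string $function): array
{
    $out = [];
    foreach ($data as $sensors) {
        $out[] = $sensors[$sensor][$function] ?? null;
    }
    return $out;
}

// Stream the rows as CSV into an in-memory buffer.
$out = fopen('php://memory', 'w+');
foreach (toCsvRows($data) as $row) {
    fputcsv($out, $row, ',', '"', '\\');
}
rewind($out);
$csv = stream_get_contents($out);
fclose($out);

$maxSensor1 = series($data, 1, 'max'); // one pass over all timestamps
```

The CSV side is a straight walk over the array, while each series costs a full loop over every timestamp, which is the concern raised above.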

Any thoughts?  Thank you
 


Just with your two examples, you have a row format and a column format. There's no way to cover both formats at the same time in a single representation.

Start with a default representation. It could match the original source format, or it could be the most common representation you need to produce, whichever you feel is best. If there's a request for the data in a different format, do the work to produce it.
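As a rough illustration of "do the work to produce it", the sketch below assumes the default representation is row-oriented (one associative array per timestamp) and builds the column format only when asked. The function name rowsToColumns() and the column names are made up for the example.

```php
<?php
// Hypothetical row-oriented default: one associative array per timestamp.
$rows = [
    ['time' => '2018-17-05 14:00:00', 'max_sensor_1' => 83.70, 'min_sensor_1' => 33.48],
    ['time' => '2018-17-05 14:15:00', 'max_sensor_1' => 46.27, 'min_sensor_1' => 18.51],
    ['time' => '2018-17-05 14:30:00', 'max_sensor_1' => 32.52, 'min_sensor_1' => 13.01],
];

// Pivot rows into columns on request. Column names are taken from the first row.
// Note: array_column() silently skips rows that lack the key, so this assumes
// every row has the same keys.
function rowsToColumns(array $rows, string $timeKey = 'time'): array
{
    if ($rows === []) {
        return [];
    }
    $columns = [];
    foreach (array_keys($rows[0]) as $name) {
        if ($name === $timeKey) {
            continue; // the timestamp is not a data series
        }
        $columns[$name] = array_column($rows, $name);
    }
    return $columns;
}

$formattedData = rowsToColumns($rows);
```

For the roughly 100-row case mentioned in the first post, a pivot like this on each request is probably cheap enough that a second stored copy is not needed.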

Or if you need performance then keep all formats in memory at the same time. It'll suck for memory usage, and you'll have to update each one when there's new data available, but that's the usual trade-off for increased performance.


Thanks requinix,

Was hoping there was some special data storage like http://php.net/manual/en/class.spldoublylinkedlist.php which might meet this need.  Just read https://en.wikipedia.org/wiki/Doubly_linked_list, and definitely not applicable.  Interesting however, and I will look for an opportunity to use it.

If performance becomes an issue, I suppose I can inject some assembler object into the main container object.  If not, maybe array_walk_recursive()?


7 minutes ago, NotionCommotion said:

Was hoping there was some special data storage like http://php.net/manual/en/class.spldoublylinkedlist.php which might meet this need.  Just read https://en.wikipedia.org/wiki/Doubly_linked_list, and definitely not applicable.  Interesting however, and I will look for an opportunity to use it.

A doubly-linked list is good if you have an "array" (list) of data and will need to add and remove somewhere besides at the beginning or end of the list. It's not a very common need.
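For reference, a small sketch of that kind of use with PHP's SplDoublyLinkedList, inserting and removing at positions other than the ends. The values are placeholders and have nothing to do with the sensor data.

```php
<?php
$list = new SplDoublyLinkedList();
$list->push('a');
$list->push('c');

// Insert 'b' at index 1, shifting 'c' up by one position.
$list->add(1, 'b');

// Remove the element at index 0 ('a').
$list->offsetUnset(0);

// Collect the remaining values in order.
$values = [];
foreach ($list as $value) {
    $values[] = $value;
}
```

It shows why the structure doesn't help here: it changes how elements are added and removed, not whether the data is laid out by row or by column.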

7 minutes ago, NotionCommotion said:

If performance becomes an issue, I suppose I can inject some assembler object into the main container object.  If not, maybe array_walk_recursive()?

Not sure where you're trying to go with that.

The point is simple: in one form you have a set of rows and each row contains columns of data, and in the other form you have a set of columns and each column contains rows of data. The two are mutually exclusive. Which means there is no possible data structure that does both forms implicitly.

The best solution would probably be to maintain both forms separately. One data structure with the rows-of-columns, one with the columns-of-rows. You would keep each one updated as data was added or removed. But they would be separate.
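A minimal sketch of keeping both forms side by side might look like the following. The class name SensorData and its methods are hypothetical, it only handles adding rows, and it assumes every row has the same column names so that positions in each column line up with the row order. Typed properties need PHP 7.4 or later.

```php
<?php
// Hypothetical container that stores rows-of-columns and columns-of-rows separately
// and updates both whenever a row is added.
final class SensorData
{
    /** @var array<string, array<string, float>> rows keyed by timestamp */
    private array $rows = [];

    /** @var array<string, float[]> one list of values per column name */
    private array $columns = [];

    public function addRow(string $time, array $values): void
    {
        if (isset($this->rows[$time])) {
            throw new InvalidArgumentException("Duplicate timestamp: {$time}");
        }
        $this->rows[$time] = $values;
        foreach ($values as $name => $value) {
            $this->columns[$name][] = $value; // same value, second location
        }
    }

    /** Row form, e.g. for writing a CSV file. */
    public function rows(): array
    {
        return $this->rows;
    }

    /** Column form for a single series, without looping over the rows. */
    public function column(string $name): array
    {
        return $this->columns[$name] ?? [];
    }
}

$store = new SensorData();
$store->addRow('2018-17-05 14:00:00', ['max_sensor_1' => 83.70, 'min_sensor_1' => 33.48]);
$store->addRow('2018-17-05 14:15:00', ['max_sensor_1' => 46.27, 'min_sensor_1' => 18.51]);

$maxSeries = $store->column('max_sensor_1');
```

Removing data would need the same mirrored bookkeeping in both arrays, which is the maintenance cost that comes with the extra memory.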


The form (not really a form in an HTTP sort of way, but some PHP data) is generated by something.  As stated in your first post, it can generate the original source or the most common representation or whatever I feel is best.

But what matters is not what I want but what the requesting object wants. So where I am "maybe" trying to go is to pass in an object which assembles the data as desired by the requester.
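Roughly what I have in mind, as a sketch only: the Assembler interface, the two assembler classes and DataContainer are all names invented for this example, and the container is assumed to hold rows keyed by timestamp.

```php
<?php
// Hypothetical contract: the requester supplies the object that shapes the output.
interface Assembler
{
    public function assemble(array $rows): array;
}

// Produces the column format: one list of values per series name.
final class ColumnAssembler implements Assembler
{
    public function assemble(array $rows): array
    {
        $columns = [];
        foreach ($rows as $values) {
            foreach ($values as $name => $value) {
                $columns[$name][] = $value;
            }
        }
        return $columns;
    }
}

// Produces flat rows with the timestamp as the first field, e.g. for CSV output.
final class RowAssembler implements Assembler
{
    public function assemble(array $rows): array
    {
        $out = [];
        foreach ($rows as $time => $values) {
            $out[] = array_merge(['time' => $time], $values);
        }
        return $out;
    }
}

// The main container keeps one representation and delegates formatting.
final class DataContainer
{
    private array $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function export(Assembler $assembler): array
    {
        return $assembler->assemble($this->rows);
    }
}

$container = new DataContainer([
    '2018-17-05 14:00:00' => ['max_sensor_1' => 83.70, 'mean_sensor_2' => 31.96],
    '2018-17-05 14:15:00' => ['max_sensor_1' => 46.27, 'mean_sensor_2' => 60.49],
]);

$asColumns = $container->export(new ColumnAssembler());
$asRows    = $container->export(new RowAssembler());
```

The container stays ignorant of the output formats, and each requester brings the assembler it needs.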


