tomanr1 Posted September 13, 2011

Hi everyone!

I am implementing a map-reduce in PHP. It is written entirely in PHP (no JS code generator). I had been running my script on my local machine and the results were great, but trouble started when I had to run it on the server. I think the server is overloaded with tasks, but first you need to read the description of my script to understand why I think so.

First, I delete the current records that match the criteria and then send a query ($collection->find($criteria)). Next, for each element returned by the Mongo cursor, I map the document from collection1 into the form used by collection2, where the map-reduce result is saved. This works just fine (I think).

Second, I take the group keys from the array returned by the map step (let's call it MapResult) and use them as the criteria for another query, this time a findOne executed on collection2. (findOne because my collection has no two records with the same keys. If the keys match, the values from MapResult['value'] are added to the current sum of the values found in collection2 with collection2->find($criteria2), where criteria2 are the group keys.) If the record exists in collection2, the values are added as described above. That is where the reduce function comes in handy, and it also works fine on my local machine.

The next step is saving MongoReduceResult into collection2. I did it in two ways: first with update(), and when update() was failing, I tried save(). Neither worked. The records were being added, but (in my opinion!) the PHP update was not waiting for an answer from Mongo. In the next iteration of foreach ($mongo_cursor as $data), findOne($data['id']) on the criteria returned null, although the previous record with the same keys should already have been written. That is when the values from the previous records get overwritten! :/

I used the update options 'upsert', 'set', 'fsync'... NOTHING worked! Any ideas?
tomanr1 (Author) Posted September 14, 2011

The map and reduce functions work fine, as I said on the phpfreaks forum, so I won't paste their code here. $criteria is a date only.

    $mongoCursor = $collection->find($criteria);
    foreach ($mongoCursor as $key => $value) {
        $mapResult = $this->map($value, $groupKey);
        $isInCollection2 = $collection2->findOne(array('_id' => $mapResult['_id']));
        if (!empty($isInCollection2)) {
            // If a record with this group key already exists, its values are
            // summed with the current $mapResult['value'].
            $mapResult = $this->reduce($mapResult, $city_id, $isInCollection2['value']);
        }
        $collection2->update(
            array('_id' => $mapResult['_id']),
            array('$set' => array('value' => $mapResult['value'])),
            array('upsert' => true, 'safe' => true)
        );
    }

Input structure ($collection1):

    {
        "_id" : ObjectId("4e6dfc8a7ba176a952000000"),
        "date" : "2011-08-07",
        "something" : 0,
        "moresmthng" : 1,
        "city_id" : 33,
        "prog_id" : 1230,
        "some_text" : ""
    }

Output structure ($collection2):

    {
        "_id" : {
            "date" : "2011-08-07",
            "progid" : 1230
        },
        "value" : {
            "33" : { // this is city_id
                "something" : 0,
                "some_text" : "",
                "moresmthng" : 1
            }
        }
    }

If one of the next records has the same date and progid (the group keys) but a different city_id (for example 45), the output structure will look like:

    {
        "_id" : {
            "date" : "2011-08-07",
            "progid" : 1230
        },
        "value" : {
            "33" : { // this is city_id
                "something" : 0,
                "some_text" : "",
                "moresmthng" : 1
            },
            "45" : { // this is city_id
                "something" : 12,
                "some_text" : "blah blah",
                "moresmthng" : 111
            }
        }
    }

Off-topic: I couldn't find the edit button :/
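The thread never shows the real map() and reduce() code, so the following is only a minimal sketch of what they might look like, inferred from the input and output structures above. The function names mapRecord and reduceValues are hypothetical, and the merge rule for a repeated city_id (numeric fields summed, other fields replaced) is an assumption, since the post only says the values are "summed".

```php
<?php
// Hypothetical sketch only: the original map()/reduce() were not posted.

/**
 * Map one collection1 document to the collection2 shape:
 * _id holds the group keys, value holds the fields keyed by city_id.
 */
function mapRecord(array $doc): array
{
    return [
        '_id' => ['date' => $doc['date'], 'progid' => $doc['prog_id']],
        'value' => [
            $doc['city_id'] => [
                'something'  => $doc['something'],
                'some_text'  => $doc['some_text'],
                'moresmthng' => $doc['moresmthng'],
            ],
        ],
    ];
}

/**
 * Merge the freshly mapped value into the value already stored
 * for the same group keys.
 */
function reduceValues(array $existing, array $incoming): array
{
    foreach ($incoming as $cityId => $fields) {
        if (!isset($existing[$cityId])) {
            // City not seen yet for this group: attach it unchanged.
            $existing[$cityId] = $fields;
            continue;
        }
        foreach ($fields as $name => $new) {
            $old = $existing[$cityId][$name] ?? null;
            $bothNumeric = (is_int($old) || is_float($old))
                && (is_int($new) || is_float($new));
            // Assumption: numeric fields are summed, anything else is replaced.
            $existing[$cityId][$name] = $bothNumeric ? $old + $new : $new;
        }
    }
    return $existing;
}

// Usage with the two example records from the post.
$first = mapRecord(['date' => '2011-08-07', 'prog_id' => 1230, 'city_id' => 33,
    'something' => 0, 'some_text' => '', 'moresmthng' => 1]);
$second = mapRecord(['date' => '2011-08-07', 'prog_id' => 1230, 'city_id' => 45,
    'something' => 12, 'some_text' => 'blah blah', 'moresmthng' => 111]);
$merged = reduceValues($first['value'], $second['value']);
var_dump(array_keys($merged)); // the keys are the two city ids, 33 and 45
```

Because both records share the same date and progid, they produce the same _id, and the merged value holds one entry per city, matching the second output structure shown in the post.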
tomanr1 (Author) Posted September 14, 2011

I am almost sure that this happens because the foreach is not waiting for the response from the Mongo update! :/ How can I check it? ;/

I var_dumped the number of times that if (!empty($isInCollection2)) is true. It was 169586.

This is my log (the log on the server machine looks the same):

    (
        [input] => 179055          // number of records in collection1 matching the criteria
        [update_correct] => 179055 // number of times the foreach has run
        [update_failure] => 0      // ... useless for now ;P
        [Output] => 9452           // number of inserts into collection2 (difference in collection2 size before and after the whole operation)
    )

As you can see, 179055 - 9452 is about 169586 (the exact difference is 169603), so the number of times the if (...) branch ran is about right. On the server machine I got a smaller value. So collection2->find() was looking for records that had not been inserted yet but should have been! :/

collection1 on the server has about 30 000 000 records and collection2 about 1 000 000. On my local machine I have only about 600 000 records of collection1, copied from the server.
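The counter check described in this post can be written down as a small calculation. This is a sketch under one assumption: every processed record either inserts a new document into collection2 or is merged into an existing one, so the expected number of merges is the input count minus the insert count. The helper names expectedMerges and missedMerges are hypothetical. Note that with the numbers quoted in the post the exact difference is 169603, not 169586.

```php
<?php
// Hypothetical helpers; the argument names mirror the counters in the log.

/** Records that should have found an existing collection2 document. */
function expectedMerges(int $inputCount, int $insertedCount): int
{
    // Each record is either a fresh insert or a merge into an existing document.
    return $inputCount - $insertedCount;
}

/** How many merges are unaccounted for, given the observed branch count. */
function missedMerges(int $inputCount, int $insertedCount, int $observedMerges): int
{
    return expectedMerges($inputCount, $insertedCount) - $observedMerges;
}

$expected = expectedMerges(179055, 9452);        // 169603
$missed   = missedMerges(179055, 9452, 169586);  // 17
echo "expected merges: $expected, unaccounted: $missed\n";
```

A result well above zero on the server would be consistent with findOne returning null for keys that had already been written, which is the behaviour the post suspects.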
tomanr1 (Author) Posted September 15, 2011

Oh come on! Do you really not know the answer? It's simple! In the update options, 'safe' should be set to the number of machines that Mongo runs on. In my case that is 2 machines (one master, one slave), so the option should look like:

    'safe' => 2

That's all. Thanks for any effort.
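Applied to the update call from the earlier post, the fix might look like the sketch below. The helper buildUpdateOptions and its $replicaCount parameter are hypothetical names introduced for illustration. The 'upsert' and 'safe' keys are the ones used in the thread, and an integer 'safe' value asks the legacy PHP Mongo driver to wait until that many servers have the write. Later releases of that driver deprecated 'safe' in favour of a 'w' option, so treat this as specific to the driver version used here. The actual update() call is left as a comment because it needs a live connection.

```php
<?php
/**
 * Build the options array for the update call used in the thread.
 * $replicaCount is a hypothetical parameter: the number of servers that
 * must have the write before update() returns (2 here: master + slave).
 */
function buildUpdateOptions(int $replicaCount): array
{
    if ($replicaCount < 1) {
        throw new InvalidArgumentException('replicaCount must be at least 1');
    }
    return ['upsert' => true, 'safe' => $replicaCount];
}

$options = buildUpdateOptions(2);

// With an open connection (not created in this sketch), the loop body
// from the earlier post would then call:
//
// $collection2->update(
//     array('_id' => $mapResult['_id']),
//     array('$set' => array('value' => $mapResult['value'])),
//     $options
// );
```

Waiting for both machines makes each update slower, but the next findOne in the loop should then see the document regardless of which machine answers the read.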