blackhawk2165 Posted January 14, 2015

Hello,

Here is my code:

    <?php
    require 'vendor/autoload.php';

    $client = new Elasticsearch/Client();

    $root = realpath('~/elkdata/for_elk_test_2014_11_24/Agencies');

    $iter = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, RecursiveDirectoryIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST,
        RecursiveIteratorIterator::CATCH_GET_CHILD);

    $paths = array($root);
    foreach ($iter as $path => $dir) {
        if ($dir -> isDir()) {
            $paths[] = $path;
        }
    }

    //Create the index and mappings
    $mapping['index'] = 'rvuehistoricaldocuments2009-2013'; //mapping code
    $mapping['body'] = array (
        'mappings' => array (
            'documents' => array (
                '_source' => array (
                    'enabled' => true
                ),
                'properties' => array(
                    'doc_name' => array(
                        'type' => 'string',
                        'analyzer' => 'standard'
                    ),
                    'description' => array(
                        'type' => 'string'
                    )
                )
            )
        )
    );

    $client ->indices()->create($mapping)

    //Now index the documents
    for ($i = 0; $i <= count($paths); $i++) {
        $params ['body'][] = array(
            'index' => array(
                'type' => 'documents'
                'body' => array(
                    'foo' => 'bar' //Document body goes here
                )
            )
        );

        //Every 1000 documents stop and send the bulk request.
        if($1 % 1000) {
            $responses = $client->bulk($params);

            // erase the old bulk request
            $params = array();

            // unset the bulk response when you are done to save memory
            unset($responses);
        }
    }
    ?>

I am looking to index a large number of documents using Elasticsearch and PHP. I have a very complex directory filled with other directories that I need to index into an array. I wanted to see if my code looks right, and if not, what am I doing wrong?

Thanks,
Austin Harmon
Psycho Posted January 14, 2015

Well, this doesn't seem right:

    //Every 1000 documents stop and send the bulk request.
    if($1 % 1000) {
        $responses = $client->bulk($params);

1. The variable should be $i, not $1. In fact, I believe that would cause an error. Have you not run the code at all?

2. The modulus will return an integer of 0 or greater. When an integer is used as a condition (i.e. in an if() statement), any non-zero integer is treated as TRUE; only 0 is considered FALSE. With an incrementing numerator and a divisor of 1,000, the condition will be TRUE 999 times out of 1,000. You should change $i to $i+1 and then check for the result to be FALSE:

    if(($i+1) % 1000 == false) {

But when you run iterative processes over folders, you have to be careful not to end up with an exceptionally long processing time. If you are going to run from the command line there won't be an issue with a timeout, but there could be memory issues. I really don't know enough about what you are doing and its purpose to give great advice on how best to proceed. If this is user facing, you might want to store all the folders in a DB, then have an AJAX request process x number of records at a time until they have all been completed.
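The difference between the two conditions can be checked in isolation. The following is a minimal sketch, not code from the thread: flushPoints() and truthyModulusHits() are hypothetical helper names, and the strict comparison `=== 0` is used in place of `== false` (for an integer remainder the two give the same result).

```php
<?php
// Hypothetical helper: returns the 0-based loop indexes at which a batch
// of $batchSize items would be sent.
function flushPoints(int $total, int $batchSize): array
{
    $points = [];
    for ($i = 0; $i < $total; $i++) {
        // ($i + 1) is the number of items seen so far; send when it is a
        // whole multiple of the batch size.
        if (($i + 1) % $batchSize === 0) {
            $points[] = $i;
        }
    }
    return $points;
}

// Hypothetical helper: counts how often the original truthy-modulus
// condition would fire over the same loop.
function truthyModulusHits(int $total, int $batchSize): int
{
    $hits = 0;
    for ($i = 0; $i < $total; $i++) {
        if ($i % $batchSize) { // true for every non-zero remainder
            $hits++;
        }
    }
    return $hits;
}

print_r(flushPoints(3000, 1000));          // indexes 999, 1999 and 2999
echo truthyModulusHits(3000, 1000), "\n";  // 2997
```

Over 3,000 iterations the corrected condition fires three times, while the original one fires on every iteration except the three where the remainder is 0.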
blackhawk2165 (Author) Posted January 15, 2015

Hello,

Thanks for your help. I believe the 1 was a typo. I am working with Elasticsearch, which is an open source, local search engine. I want to collect the documents into an array and then index them. I planned on using RecursiveIteratorIterator to get all the documents and put them into an array.

Also, are you suggesting that I change:

    if ($i % 1000) {
        $responses = $client->bulk($params);

to

    if(($i+1) % 1000 == false) {
        $responses = $client->bulk($params);
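Collecting the documents into an array with RecursiveIteratorIterator might look roughly like the sketch below. It is an illustration under assumptions, not the poster's code: collectFiles() is a hypothetical name, and it expects an absolute path, because PHP's realpath() does not expand a leading `~` the way a shell does.

```php
<?php
// Hypothetical helper: returns every regular file under $root as a sorted
// list of path strings.
function collectFiles(string $root): array
{
    $iter = new RecursiveIteratorIterator(
        // SKIP_DOTS leaves out the "." and ".." entries.
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        // LEAVES_ONLY visits entries that have no children (files and
        // empty directories), rather than every directory on the way down.
        RecursiveIteratorIterator::LEAVES_ONLY
    );

    $files = [];
    foreach ($iter as $info) {
        // $info is an SplFileInfo; keep regular files only.
        if ($info->isFile()) {
            $files[] = $info->getPathname();
        }
    }
    sort($files);
    return $files;
}

// Example call (the path is a placeholder):
// $documents = collectFiles('/absolute/path/to/Agencies');
```

The resulting array can then be walked once to build the bulk request, instead of looping over a list of directories.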
Psycho Posted February 9, 2015

you suggesting that I change:

    if ($i % 1000) {
        $responses = $client->bulk($params);

to

    if(($i+1) % 1000 == false) {
        $responses = $client->bulk($params);

Yes. You want that if() statement to do something different every 1,000 iterations, right? So, from 1 to 999 don't do anything, and on 1,000 send the data. Then do the same at 2,000, 3,000, etc.

When you write an if() statement and the value in the condition is a number (which is what modulus returns), the condition will be TRUE for any value that is not 0. So, you want to write the condition so that it is only TRUE at 1,000, 2,000, etc. To do that you need the counter to be 1 on the first iteration; since $i starts at 0, take the modulus of ($i+1) divided by 1,000 and send the data when the result is 0.
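Put together, the batching loop could be sketched as follows. This is a sketch under stated assumptions rather than the poster's final code: sendInBatches() is a hypothetical name, and the $send callable stands in for a call such as $client->bulk($params) so that the example runs without an Elasticsearch server. The final send of leftover items is an addition not discussed in the thread.

```php
<?php
// Sketch only: $send is a stand-in for the real bulk call.
// Returns the number of times $send was invoked.
function sendInBatches(array $items, int $batchSize, callable $send): int
{
    $batch = [];
    $calls = 0;
    foreach (array_values($items) as $i => $item) {
        $batch[] = $item;
        // ($i + 1) is 1 on the first iteration, so this is true at
        // 1,000, 2,000, ... when $batchSize is 1000.
        if (($i + 1) % $batchSize === 0) {
            $send($batch);
            $calls++;
            $batch = []; // erase the old batch to save memory
        }
    }
    // Send whatever is left when the total is not a multiple of $batchSize.
    if ($batch !== []) {
        $send($batch);
        $calls++;
    }
    return $calls;
}

// Usage: record the size of each batch instead of sending it anywhere.
$sizes = [];
$calls = sendInBatches(range(1, 2500), 1000, function (array $batch) use (&$sizes) {
    $sizes[] = count($batch);
});
echo $calls, ' calls: ', implode(', ', $sizes), "\n"; // 3 calls: 1000, 1000, 500
```

Iterating with foreach also avoids the `$i <= count($paths)` bound in the original loop, which runs once more than there are paths.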