Recursive Iteration


blackhawk2165

Hello,

 

Here is my code:

<?php

require 'vendor/autoload.php';

$client = new Elasticsearch/Client();

$root = realpath('~/elkdata/for_elk_test_2014_11_24/Agencies');

$iter = new RecursiveIteratorIterator(
		new RecursiveDirectoryIterator($root, RecursiveDirectoryIterator::SKIP_DOTS),
		RecursiveIteratorIterator::SELF_FIRST,
		RecursiveIteratorIterator::CATCH_GET_CHILD);
		
$paths = array($root);
foreach ($iter as $path => $dir) {
	if ($dir -> isDir()) {
		$paths[] = $path;
		}
	}

//Create the index and mappings
$mapping['index'] = 'rvuehistoricaldocuments2009-2013'; //mapping code
$mapping['body'] = array (
	'mappings' => array (
		'documents' => array (
			'_source' => array (
				'enabled' => true
			),
			'properties' => array(
				'doc_name' => array(
					'type' => 'string',
					'analyzer' => 'standard'
				),
				'description' => array(
					'type' => 'string'
				)
			)
		)
	)
);

$client ->indices()->create($mapping)


//Now index the documents

for ($i = 0; $i <= count($paths); $i++) {
	$params ['body'][] = array(
		'index' => array(
		'type' => 'documents'
		'body' => array(
			'foo' => 'bar' //Document body goes here
			
			)
		)
	);
	
	//Every 1000 documents stop and send the bulk request.
	
	if($1 % 1000) {
		$responses = $client->bulk($params);
	
	// erase the old bulk request
	$params = array();
	
	// unset the bulk response when you are done to save memory
	unset($responses);
	}
}
?>

I am looking to index a large number of documents using Elasticsearch and PHP. I have a very complex directory filled with other directories that I need to collect into an array. I wanted to see whether my code looks right and, if not, what I am doing wrong.

 

Thanks,

Austin Harmon


Well, this doesn't seem right:

 

    //Every 1000 documents stop and send the bulk request.
    
    if($1 % 1000) {
        $responses = $client->bulk($params);

 

1. The variable should be $i, not $1. In fact, I believe that would cause an error. Have you not even run the code at all?

 

2. The modulus will return an integer of 0 or greater. When an integer is used as a condition (e.g. in an if() statement), any non-zero value is treated as TRUE; only 0 is considered FALSE. With an incrementing numerator and a divisor of 1,000, the result will be a TRUE condition 999 times out of 1,000. You should change $i to $i+1 and then check for the result to be FALSE:

 

if(($i+1) % 1000 == false) {
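
To see the difference between the two conditions, here is a minimal sketch that counts how often each one fires over 3,000 iterations. The function name countFlushes is made up for this illustration. It also uses a strict === 0 comparison, which behaves the same as == false here but states the intent more directly.

```php
<?php
// Hypothetical helper: count how many times a flush condition fires
// while $i runs from 0 to $total - 1.
function countFlushes(int $total, callable $shouldFlush): int
{
    $flushes = 0;
    for ($i = 0; $i < $total; $i++) {
        if ($shouldFlush($i)) {
            $flushes++;
        }
    }
    return $flushes;
}

// Original condition: truthy whenever the remainder is non-zero.
$original = countFlushes(3000, function (int $i): bool {
    return (bool) ($i % 1000);
});

// Suggested condition: true only on the 1,000th, 2,000th and 3,000th item.
$fixed = countFlushes(3000, function (int $i): bool {
    return ($i + 1) % 1000 === 0;
});

echo "original: $original, fixed: $fixed\n"; // original: 2997, fixed: 3
```

So the original check would send a bulk request on almost every pass, while the adjusted one sends exactly one per 1,000 documents.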

 

But when you are running iterative processes over folders, you have to be careful not to end up with an exceptionally long processing time. If you are going to run this from the command line, there won't be an issue with a timeout, but there could be memory issues. I really don't know enough about what you are doing, or the purpose, to give great advice on how best to proceed. If this is user-facing, you might want to store all the folders in a DB and then have an AJAX request process x number of records at a time until they have all been completed.
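
On the memory concern: one way to avoid holding every path in an array is to yield directories one at a time from a generator. The following is only a sketch under assumptions, not something tested against your data. walkDirectories is a hypothetical name, and the demo builds a small throwaway tree under the system temp directory so it can run anywhere. Note also that realpath() does not expand ~ (that is a shell feature), so the path in the original post would have to be built from the home directory explicitly, for example via getenv('HOME').

```php
<?php
// Hypothetical helper: lazily yield $root and every directory below it.
function walkDirectories(string $root): Generator
{
    yield $root;
    $iter = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, RecursiveDirectoryIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST,
        RecursiveIteratorIterator::CATCH_GET_CHILD // skip unreadable directories
    );
    foreach ($iter as $path => $entry) {
        if ($entry->isDir()) {
            yield $path;
        }
    }
}

// Demo tree: <tmp>/walk_demo_xxx/a/b plus one file, which should be skipped.
$root = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'walk_demo_' . uniqid();
mkdir($root . '/a/b', 0777, true);
file_put_contents($root . '/a/file.txt', 'not a directory');

$found = [];
foreach (walkDirectories($root) as $dir) {
    $found[] = $dir; // collected here only so the demo can print a count
}
echo count($found), " directories\n"; // 3 directories

// Clean up the demo tree.
unlink($root . '/a/file.txt');
rmdir($root . '/a/b');
rmdir($root . '/a');
rmdir($root);
```

In a real run you would act on each directory inside the foreach instead of collecting it, so only one path is held at a time.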


Hello,

 

Thanks for your help. I believe the 1 was a typo. I am working with Elasticsearch, which is an open-source, local search engine. I want to collect the documents into an array so that I can then index them. I planned on using RecursiveIteratorIterator to get all the documents and put them into an array.

 

Also, are you suggesting that I change:

 

if ($i % 1000) {

        $responses = $client->bulk($params);

 

to 

 

if(($i+1) % 1000 == false) {

        $responses = $client->bulk($params);


  • 4 weeks later...

are you suggesting that I change:

 

if ($i % 1000) {

        $responses = $client->bulk($params);

 

to 

 

if(($i+1) % 1000 == false) {

        $responses = $client->bulk($params);

 

Yes. You want that if() statement to do something different every 1,000 iterations, right? So, from 1 to 999 don't do anything, and on 1,000 send the data. Then do the same at 2,000, 3,000, etc. When the value in an if() condition is a number (which is what modulus returns), the condition will be TRUE for any value that is not 0. So, you want to create the condition so that it will only be TRUE at 1,000, 2,000, etc. Because $i is 0 on the first iteration, use ($i+1) so that the count is 1 on the first iteration, and check that the modulus of ($i+1) divided by 1,000 is 0.
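
Putting that together, here is a minimal sketch of the batching loop. It is not a drop-in replacement for the code in the first post: $send is a stand-in callable used in place of $client->bulk($params) so the example runs without Elasticsearch, and indexInBatches and the path field are hypothetical names. It assumes the action-then-document pair layout that the elasticsearch-php bulk documentation describes, loops with < rather than <= so it does not read past the end of $paths, and sends whatever is left over after the loop.

```php
<?php
// Hypothetical helper: build bulk requests of at most $batchSize documents
// and hand each one to $send (a stand-in for $client->bulk($params)).
function indexInBatches(array $paths, int $batchSize, callable $send): void
{
    $params = ['body' => []];
    $total = count($paths);

    for ($i = 0; $i < $total; $i++) {
        // Action entry followed by the document itself.
        // '_type' reflects the thread's era; adjust for your Elasticsearch version.
        $params['body'][] = ['index' => [
            '_index' => 'rvuehistoricaldocuments2009-2013',
            '_type'  => 'documents',
        ]];
        $params['body'][] = ['path' => $paths[$i]]; // placeholder document body

        // ($i + 1) is the number of documents queued so far.
        if (($i + 1) % $batchSize === 0) {
            $send($params);
            $params = ['body' => []]; // start a fresh request
        }
    }

    // Send the final partial batch, if any.
    if (!empty($params['body'])) {
        $send($params);
    }
}

// Demo: 2,500 fake paths, batches of 1,000.
$paths = [];
for ($n = 0; $n < 2500; $n++) {
    $paths[] = "/data/dir$n";
}

$batchSizes = [];
indexInBatches($paths, 1000, function (array $params) use (&$batchSizes): void {
    $batchSizes[] = intdiv(count($params['body']), 2); // two entries per document
});

echo implode(', ', $batchSizes), "\n"; // 1000, 1000, 500
```

Without the final send after the loop, the last 500 documents in this demo would never be submitted.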

