PHP shuffle without array?

B0b · May 28, 2010

Hey guys,

I want to shuffle the lines of a .txt list into a new file without loading them into an array (the list is huge, so memory is a concern).

I'm having a hard time finding a solution for this. Would you have any clue or suggestion?

Mchl · May 28, 2010

You could try shaking your hard disk...

Seriously though, I can't think of any solution that would use fewer resources than just loading the file into an array.

You could read the file line by line, save those lines to new files on disk, then read them back at random, but that's just crazy (it would use less RAM, though).
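One possible reading of that idea, as a rough sketch: scatter the lines over a handful of temporary files at random, then shuffle each small file in memory and append it to the output. The function name shuffleViaBuckets() and the bucket count are made up for illustration, and it assumes each bucket is small enough to fit in memory (so it still uses an array, just a much smaller one).

```php
<?php
// Sketch only: shuffleViaBuckets() and its parameters are hypothetical names.
// Assumes each bucket fits in memory; output lines always end with "\n".
function shuffleViaBuckets(string $source, string $target, int $buckets = 64): void
{
    $in = fopen($source, 'r');
    $parts = [];
    for ($i = 0; $i < $buckets; $i++) {
        $parts[$i] = tmpfile(); // anonymous temp file, removed on fclose()
    }

    // Pass 1: send each line to a randomly chosen bucket.
    while (($line = fgets($in)) !== false) {
        fwrite($parts[random_int(0, $buckets - 1)], rtrim($line, "\r\n") . "\n");
    }
    fclose($in);

    // Pass 2: shuffle one bucket at a time and append it to the target.
    $out = fopen($target, 'w');
    foreach ($parts as $part) {
        rewind($part);
        $lines = [];
        while (($line = fgets($part)) !== false) {
            $lines[] = $line;
        }
        shuffle($lines);
        fwrite($out, implode('', $lines));
        fclose($part);
    }
    fclose($out);
}
```

With 64 buckets and a 1M-line list, each in-memory array would hold roughly 16k lines on average instead of the whole file.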

B0b · May 28, 2010

Wow, I got it. I've been thinking about this for 2 days, and posting the question made it click:

Simply take the original list's last line and save it to file #2, take the first line and save it, take the line before the last one and save it, take the second line and save it, and so on.

Then rinse and repeat until it's mixed enough. So simple, yet efficient. The point is that I have to use it with lists of 1M+ entries, which would consume too much memory in an array.

B0b · May 29, 2010

Holy crab. I've ended up with a 100+ line function for this, only to realise it takes over 1 second per KB at 100% CPU usage, which means over an hour per file.

What to do now? Any suggestions?

function shuffleList( $theList, $listName, $replace, $quality )
{
// $theList: path to original list to shuffle
// $listName: name of the original list
// $replace: whether the original list is overwritten or saved under a new name (bool)
// $quality: shuffle iterations to determine quality of the mix (int)

if ( file_exists( $theList ) )
{
	$originalList = fopen( $theList, 'r' );
	$tmpList1 = fopen( 'mix1-' . $listName, 'w+' );
	$tmpList2 = fopen( 'mix2-' . $listName, 'w+' );

	// Count entries total. Testing fgets() directly avoids counting
	// one extra "line" when the file ends with a newline.
	$total = 0;
	while ( fgets( $originalList ) !== false )
	{
		$total++;
	}
	rewind( $originalList );

	// Shuffle.
	for ( $i = 1; $i <= $quality; $i++ )
	{
		// Determine which file to shuffle.
		if ( $i == 1 )
		{
			$listRead = $originalList;
			$listWrite = $tmpList1;
		}
		elseif ( isPair( $i ) )
		{
			$listRead = $tmpList1;
			$listWrite = $tmpList2;
		}
		else
		{
			$listRead = $tmpList2;
			$listWrite = $tmpList1;
		}

		$counter = 0;
		$top = 1;
		$bot = 0;
		while ( $counter != $total )
		{
			$tmpCounter = 0;
			if ( isPair( $counter ) )
			{
				// Pick an entry at the bottom.
				while ( $tmpCounter != $total - $bot )
				{
					$tmpCounter++;
					$tmpEmail = fgets( $listRead );
				}
				$bot++;
				$counter++;
				fwrite( $listWrite, trim( $tmpEmail ) . "\r\n" );
				rewind( $listRead );
			}
			else
			{
				// Pick an entry at the top.
				while ( $tmpCounter != $top )
				{
					$tmpCounter++;
					$tmpEmail = fgets( $listRead );
				}
				$top++;
				$counter++;
				fwrite( $listWrite, trim( $tmpEmail ) . "\r\n" );
				rewind( $listRead );
			}
		}
		rewind( $listWrite );
	}

	fclose( $originalList );

	// Save mixed list.
	if ( $replace == true )
	{
		$listFinal = $theList;
	}
	else
	{
		$listFinal = 'mix-' . $listName;
	}

	$listFinal = fopen( $listFinal, 'w' );

	while ( ( $line = fgets( $listWrite ) ) !== false )
	{
		fwrite( $listFinal, $line );
	}

	fclose( $listFinal );
	fclose( $tmpList1 );
	fclose( $tmpList2 );
}
}

function isPair( $number )
{
// true: even ("pair"), false: odd.

return $number % 2 == 0;
}

kenrbnsn · May 29, 2010

The big question is -- Why do you want to do this?

Would storing the file in a database table be better? Then you could randomly access the table.

Ken

trq · May 29, 2010

I'm really not sure what gave you the idea that that method would be at all efficient.
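To put a rough number on it: because the function above rewinds and re-reads from the top for every single output line, the number of fgets() calls per pass grows with the square of the list size. The helper below is hypothetical and only replays the $top/$bot counters from that function, without touching any file.

```php
<?php
// Sketch only: fgetsCallsPerPass() is a hypothetical helper that replays
// the $top/$bot counters of shuffleList() without doing any file I/O.
function fgetsCallsPerPass(int $total): int
{
    $reads = 0;
    $top = 1;
    $bot = 0;
    for ($counter = 0; $counter < $total; $counter++) {
        if ($counter % 2 === 0) {
            $reads += $total - $bot; // read down to an entry near the bottom
            $bot++;
        } else {
            $reads += $top;          // read down to an entry near the top
            $top++;
        }
    }
    return $reads;
}

echo fgetsCallsPerPass(1000), "\n"; // 500500
```

The total works out to n(n+1)/2, so a list of 1,000,000 lines needs about 500 billion fgets() calls for each pass, multiplied again by $quality.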

.josh · May 29, 2010

Or random... how is that random?

Just a thought, but instead of randomizing the order of the list, read a random line from the list. Make all the lines the same length (take the longest line and pad the shorter ones with blank space or another filler character). Then use fseek with a randomized offset argument (rounded up or down to the nearest line boundary).
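The padding step could look something like this. It is only a sketch: padToFixedWidth() is a made-up name, it pads with spaces, and it assumes the lines carry no meaningful trailing spaces of their own.

```php
<?php
// Sketch only: padToFixedWidth() is a hypothetical helper, not from the thread.
// Pads with spaces and ends every record with "\n".
function padToFixedWidth(string $source, string $target): int
{
    $in = fopen($source, 'r');

    // Pass 1: find the longest line, ignoring the line ending.
    $width = 0;
    while (($line = fgets($in)) !== false) {
        $width = max($width, strlen(rtrim($line, "\r\n")));
    }
    rewind($in);

    // Pass 2: rewrite every line padded to that width.
    $out = fopen($target, 'w');
    while (($line = fgets($in)) !== false) {
        fwrite($out, str_pad(rtrim($line, "\r\n"), $width) . "\n");
    }
    fclose($in);
    fclose($out);

    return $width + 1; // record length in bytes, newline included
}
```

Both passes read one line at a time, so memory use stays flat no matter how long the list is. The returned record length is what a later fseek offset would be multiplied by.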

B0b · May 29, 2010

Thanks for the input, guys. Great idea, Crayon Violent, but I'm worried that reading a random line wouldn't be efficient at all: in a file of 200k+ lines, finding the last unread line would take ages, assuming I replace the lines already read with spaces. The question is: would it be possible to delete these bytes directly within the original file? Quite impossible, I guess.

ex:

dummy1

bob___

hello_

I randomly read "bob___" and delete it:

dummy1

hello_

Otherwise I'd have to do:

dummy1

______

hello_

Then randomly pick a line and check that it isn't only spaces.

Seems like a database would be the most efficient way.
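For what it's worth, the "find an unread line" problem can be sidestepped without deleting anything: with fixed-length records, a Fisher-Yates shuffle can swap records directly inside the file, so only two records are in memory at a time. The sketch below assumes the file consists solely of records of exactly $recordLength bytes (line ending included); shuffleRecordsInPlace() is a hypothetical name, and since it overwrites the file it should be run on a copy.

```php
<?php
// Sketch only: shuffleRecordsInPlace() is a hypothetical name.
// Assumes every record is exactly $recordLength bytes, line ending included.
function shuffleRecordsInPlace(string $path, int $recordLength): void
{
    clearstatcache(true, $path);
    $count = intdiv(filesize($path), $recordLength);
    $handle = fopen($path, 'r+');

    // Fisher-Yates: walk from the last record down, swapping each one
    // with a randomly chosen record at or before its position.
    for ($i = $count - 1; $i > 0; $i--) {
        $j = random_int(0, $i);
        if ($j === $i) {
            continue;
        }
        fseek($handle, $i * $recordLength);
        $a = fread($handle, $recordLength);
        fseek($handle, $j * $recordLength);
        $b = fread($handle, $recordLength);

        fseek($handle, $i * $recordLength);
        fwrite($handle, $b);
        fseek($handle, $j * $recordLength);
        fwrite($handle, $a);
    }
    fclose($handle);
}
```

Each record is visited once, so the work grows linearly with the number of lines rather than quadratically.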

ignace · May 29, 2010

You can get a good approximation of where a given line starts. Of course, this assumes that each line is of equal length. Example:

$line_length = 82;//in bytes (accounting for \r\n on Windows)
$filesize = filesize('test.txt');
echo 'Lines: ', ceil($filesize / $line_length);

With a test.txt like this:

qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies
qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies
qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies
qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies
qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies
qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies
qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies
qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies
qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies
qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies

Returns "Lines: 10"
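The same arithmetic gives the byte offset of any line, which is what makes a random read cheap. A minimal sketch, assuming every record is exactly $recordLength bytes; readRecord() and readRandomRecord() are hypothetical helpers.

```php
<?php
// Sketch only: both helpers are hypothetical and assume that every
// record is exactly $recordLength bytes (line ending included).
function readRecord($handle, int $index, int $recordLength): string
{
    fseek($handle, $index * $recordLength); // jump straight to the record
    return rtrim(fread($handle, $recordLength)); // drop padding and newline
}

function readRandomRecord(string $path, int $recordLength): string
{
    clearstatcache(true, $path);
    $count = intdiv(filesize($path), $recordLength);
    $handle = fopen($path, 'r');
    $record = readRecord($handle, random_int(0, $count - 1), $recordLength);
    fclose($handle);
    return $record;
}
```

Line n (counting from 0) starts at byte n * $recordLength, so no scanning from the top of the file is needed.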

(Off-topic: the good old Assembler days.)

.josh · May 29, 2010

Yes, you can delete previously read lines; however, it would basically involve writing all lines except the selected one to a new file. This isn't very convenient, especially with really large files, but it should be a lot less memory-intensive than keeping all of the lines in memory at once.

But overall, if you can use a DB instead of a text file for this, it would be a million times better to do that.
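As a rough idea of what the database route might look like, here is a sketch using SQLite through PDO. It assumes the pdo_sqlite extension is available; the table name "entries" and both function names are made up for illustration.

```php
<?php
// Sketch only: the "entries" table and both function names are hypothetical.
// Assumes the pdo_sqlite extension is installed.
function importList(PDO $db, string $path): void
{
    $db->exec('CREATE TABLE IF NOT EXISTS entries (line TEXT NOT NULL)');
    $insert = $db->prepare('INSERT INTO entries (line) VALUES (?)');

    $in = fopen($path, 'r');
    $db->beginTransaction(); // one transaction keeps a bulk insert fast
    while (($line = fgets($in)) !== false) {
        $insert->execute([rtrim($line, "\r\n")]);
    }
    $db->commit();
    fclose($in);
}

function exportShuffled(PDO $db, string $target): void
{
    $out = fopen($target, 'w');
    // SQLite's RANDOM() gives every row a random sort key;
    // the rows are then fetched and written one at a time.
    foreach ($db->query('SELECT line FROM entries ORDER BY RANDOM()') as $row) {
        fwrite($out, $row['line'] . "\n");
    }
    fclose($out);
}
```

Usage would be along the lines of `$db = new PDO('sqlite:list.db');` followed by importList() once and exportShuffled() whenever a fresh order is needed. The sorting then happens inside the database rather than in a PHP array.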