B0b Posted May 28, 2010
Hey guys, I want to shuffle a .txt list into a new file without using an array (the list is huge, so memory is a concern). I'm having a hard time finding a solution for this. Would you have any clue or suggestion?
Thread: https://forums.phpfreaks.com/topic/203241-php-shuffle-without-array/
Mchl Posted May 28, 2010
You could try shaking your hard disk... Seriously though, I can't think of any solution that would use fewer resources than just loading the file into an array. You could read the file line by line, save those lines to new files on disk, then read them back at random, but that's just crazy (it would use less RAM though).
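A variation on the "save lines to files on disk, then read them back at random" idea can be sketched as follows. This is a minimal sketch, not something proposed in the thread: it scatters lines into temporary bucket files at random and then shuffles one bucket at a time in memory, so it still uses an array, but only for roughly 1/N of the file at once. The function name `bucketShuffle` and the default of 16 buckets are assumptions for illustration, and PHP 7 or later is assumed for `random_int()`.

```php
<?php
// Hypothetical helper: shuffle the lines of $src into $dst using $buckets temporary files.
function bucketShuffle(string $src, string $dst, int $buckets = 16): void
{
    // Pass 1: scatter each line into a randomly chosen temporary bucket file.
    $tmp = [];
    for ($b = 0; $b < $buckets; $b++) {
        $tmp[$b] = tmpfile();
    }
    $in = fopen($src, 'r');
    while (($line = fgets($in)) !== false) {
        // Normalise line endings so the last line cannot merge with the next one.
        fwrite($tmp[random_int(0, $buckets - 1)], rtrim($line, "\r\n") . "\n");
    }
    fclose($in);

    // Pass 2: load one bucket at a time, shuffle it in memory, append it to the output.
    $out = fopen($dst, 'w');
    foreach ($tmp as $fh) {
        rewind($fh);
        $lines = [];
        while (($line = fgets($fh)) !== false) {
            $lines[] = $line;
        }
        shuffle($lines);
        fwrite($out, implode('', $lines));
        fclose($fh); // files created by tmpfile() are removed when closed
    }
    fclose($out);
}
```

Peak memory is roughly the size of the largest bucket, so a larger `$buckets` value trades more open temporary files for a smaller in-memory array. A call might look like `bucketShuffle('list.txt', 'mix-list.txt', 64);`, where both file names are placeholders.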
B0b (Author) Posted May 28, 2010
Wow, I got it. I've been thinking about this for 2 days, and posting the question made it click: simply take the original list's last line, save it to file #2, take the first line, save, take the line before the last one, save, take the second line, save, and so on. Then rinse and repeat until it's mixed enough. So simple, yet efficient. The point is that I have to use it with a list of 1M+ entries, and loading that into an array would consume too much memory.
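To see what one pass of this interleaving produces, here is a minimal sketch on a tiny in-memory list. The array is used only for illustration, and `interleaveEnds` is a hypothetical helper name, not code from the thread.

```php
<?php
// Hypothetical helper: take last, first, second-to-last, second, ... as described above.
function interleaveEnds(array $lines): array
{
    $out = [];
    $top = 0;
    $bot = count($lines) - 1;
    while ($top <= $bot) {
        $out[] = $lines[$bot--];     // entry from the bottom
        if ($top <= $bot) {
            $out[] = $lines[$top++]; // entry from the top
        }
    }
    return $out;
}
```

For example, `interleaveEnds(['a', 'b', 'c', 'd', 'e', 'f'])` returns `['f', 'a', 'e', 'b', 'd', 'c']`. The output depends only on the input order, so repeating the pass walks through a fixed cycle of orderings rather than producing a random one.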
B0b (Author) Posted May 29, 2010
Holy crab. I've ended up with a 100+ line function for this, only to realise it takes over 1 second per KB at 100% CPU usage, which means over an hour for a file. What to do now? Any suggestions?

    function shuffleList( $theList, $listName, $replace, $quality ) {
        // $theList: path to original list to shuffle
        // $listName: name of the original list
        // $replace: whether the original list is overwritten or saved under a new name (bool)
        // $quality: shuffle iterations to determine quality of the mix (int)

        if ( file_exists( $theList ) ) {
            $originalList = fopen( $theList, 'r' );
            $tmpList1 = fopen( 'mix1-' . $listName, 'w+' );
            $tmpList2 = fopen( 'mix2-' . $listName, 'w+' );

            // Count entries total. Testing fgets() directly avoids the extra
            // iteration a !feof() loop makes at the end of the file.
            $total = 0;
            while ( fgets( $originalList ) !== false ) {
                $total++;
            }
            rewind( $originalList );

            // Shuffle.
            for ( $i = 1; $i <= $quality; $i++ ) {
                // Determine which file to shuffle.
                if ( $i == 1 ) {
                    $listRead = $originalList;
                    $listWrite = $tmpList1;
                } elseif ( isPair( $i ) ) {
                    $listRead = $tmpList1;
                    $listWrite = $tmpList2;
                } else {
                    $listRead = $tmpList2;
                    $listWrite = $tmpList1;
                }

                $counter = 0;
                $top = 1;
                $bot = 0;

                while ( $counter != $total ) {
                    $tmpCounter = 0;

                    if ( isPair( $counter ) ) {
                        // Pick an entry at the bottom (reads from the start of the file up to it).
                        while ( $tmpCounter != $total - $bot ) {
                            $tmpCounter++;
                            $tmpEmail = fgets( $listRead );
                        }
                        $bot++;
                        $counter++;
                        fwrite( $listWrite, trim( $tmpEmail ) . "\r\n" );
                        rewind( $listRead );
                    } else {
                        // Pick an entry at the top.
                        while ( $tmpCounter != $top ) {
                            $tmpCounter++;
                            $tmpEmail = fgets( $listRead );
                        }
                        $top++;
                        $counter++;
                        fwrite( $listWrite, trim( $tmpEmail ) . "\r\n" );
                        rewind( $listRead );
                    }
                }
                rewind( $listWrite );
            }

            fclose( $originalList );

            // Save mixed list.
            if ( $replace == true ) {
                $listFinal = $theList;
            } else {
                $listFinal = 'mix-' . $listName;
            }

            $listFinal = fopen( $listFinal, 'w' );
            while ( ( $line = fgets( $listWrite ) ) !== false ) {
                fwrite( $listFinal, $line );
            }

            fclose( $listFinal );
            fclose( $tmpList1 );
            fclose( $tmpList2 );
        }
    }

    function isPair( $number ) {
        // true: pair (even), false: not pair (odd).
        return $number % 2 == 0;
    }
kenrbnsn Posted May 29, 2010
The big question is: why do you want to do this? Would storing the file in a database table be better? Then you could randomly access the table.
Ken
trq Posted May 29, 2010
Quote: "Holy crab. I've ended up with a 100+ line function for this, only to realise it takes over 1 second per KB at 100% CPU usage, which means over an hour for a file."
I'm really not sure what gave you the idea that that method would be at all efficient.
.josh Posted May 29, 2010
or random... how is that random? Just a thought, but instead of randomizing the order of the list, read a random line from the list. Make all the lines the same length (take the longest line and pad the shorter ones with blank space or another delimiter). Then use fseek with a randomized offset argument (rounding up or down to the nearest newline).
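The fixed-length idea above can be sketched as follows. This is a minimal sketch under assumptions, not code from the thread: `padToFixedWidth` and `readRandomRecord` are hypothetical helper names, lines are padded with spaces and terminated with "\n", and PHP 7 or later is assumed. Because every record has the same size, the random offset can be computed as a multiple of the record size, so no rounding to the nearest newline is needed.

```php
<?php
// Hypothetical helper: copy $src to $dst, padding every line with spaces to $width bytes
// plus a "\n" terminator, so record k starts at byte k * ($width + 1).
// Returns the number of records written.
function padToFixedWidth(string $src, string $dst, int $width): int
{
    $in = fopen($src, 'r');
    $out = fopen($dst, 'w');
    $count = 0;
    while (($line = fgets($in)) !== false) {
        $line = rtrim($line, "\r\n");
        if (strlen($line) > $width) {
            throw new LengthException("Line longer than $width bytes");
        }
        fwrite($out, str_pad($line, $width) . "\n");
        $count++;
    }
    fclose($in);
    fclose($out);
    return $count;
}

// Hypothetical helper: read one random record from a file written by padToFixedWidth().
function readRandomRecord(string $path, int $width): string
{
    $recordSize = $width + 1;           // payload plus "\n"
    clearstatcache(true, $path);        // make sure filesize() is not a stale cached value
    $records = intdiv(filesize($path), $recordSize);
    if ($records < 1) {
        throw new RuntimeException('File contains no complete record');
    }
    $fh = fopen($path, 'r');
    fseek($fh, random_int(0, $records - 1) * $recordSize);
    $record = fread($fh, $width);
    fclose($fh);
    return rtrim($record, ' ');         // note: also strips genuine trailing spaces
}
```

The width would normally be found in a first pass over the file (the length of the longest line). Reading random records this way samples with replacement, so on its own it does not guarantee that every line is visited exactly once.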
B0b (Author) Posted May 29, 2010
Thanks for the input guys. Great idea Crayon Violent, but I'm worried that reading a random line wouldn't be efficient at all: in a file of 200k+ lines, finding the last unread line would take ages, assuming I replace the lines already read with spaces. The question is: would it be possible to delete these bytes directly within the original file? Quite impossible, I guess. ex:
    dummy1
    bob___
    hello_
I randomly read "bob___" and delete it:
    dummy1
    hello_
Otherwise I'd have to do:
    dummy1
    ______
    hello_
Then randomly find a line and verify that it's not only spaces. Seems like a database would be the most efficient way.
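One way around the "find an unread line" problem is a Fisher-Yates shuffle performed in place on a fixed-width file. This technique is not proposed in the thread; the sketch below swaps it in as an illustration. Instead of blanking out lines already used, each step swaps a randomly chosen record into the last not-yet-fixed slot, so no bookkeeping of read lines is needed and only two records are held in memory at a time. `shuffleFixedWidthFile` is a hypothetical name, the file is assumed to consist only of complete records of `$recordSize` bytes (terminator included), and PHP 7 or later is assumed.

```php
<?php
// Hypothetical helper: shuffle a file of fixed-size records in place.
// $recordSize is the full record length in bytes, including the line terminator.
function shuffleFixedWidthFile(string $path, int $recordSize): void
{
    clearstatcache(true, $path);
    $records = intdiv(filesize($path), $recordSize);
    $fh = fopen($path, 'r+'); // read/write without truncating

    for ($i = $records - 1; $i > 0; $i--) {
        $j = random_int(0, $i); // pick among the records not yet fixed
        if ($j === $i) {
            continue;           // record stays where it is
        }

        // Read both records.
        fseek($fh, $i * $recordSize);
        $a = fread($fh, $recordSize);
        fseek($fh, $j * $recordSize);
        $b = fread($fh, $recordSize);

        // Write them back swapped.
        fseek($fh, $i * $recordSize);
        fwrite($fh, $b);
        fseek($fh, $j * $recordSize);
        fwrite($fh, $a);
    }
    fclose($fh);
}
```

This does a number of seeks proportional to the number of lines rather than re-reading the file for every line. The seeks are random, so it may still be slow on a spinning disk; it has not been benchmarked here. Since it modifies the file directly, it would normally be run on a padded copy rather than on the original list.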
ignace Posted May 29, 2010
You can get a good approximation of where a given line starts; of course, this assumes that each line is of equal length. Example:

    $line_length = 82; // in bytes (accounting for \r\n on Windows)
    $filesize = filesize('test.txt');
    echo 'Lines: ', ceil($filesize / $line_length);

Using something like this:

    qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies
    qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies
    qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies
    qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies
    qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies
    qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies
    qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies
    qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies
    qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies
    qqmsdlkfjlmqskdjfmlkqsjdfmlksjqdmlfkjsqmldkfjmlqskdjflmksqjdfmlkqsjdfnvsqdfeies

Returns "Lines: 10"
[ot]The good old Assembler days[/ot]
.josh Posted May 29, 2010
Quote: "Thanks for the input guys. Great idea Crayon Violent, but I'm worried that reading a random line wouldn't be efficient at all: in a file of 200k+ lines, finding the last unread line would take ages, assuming I replace the lines already read with spaces. The question is: would it be possible to delete these bytes directly within the original file? Quite impossible, I guess. [...] Seems like a database would be the most efficient way."
Yes, you can delete previously read lines; however, it would basically involve removing the line by writing all lines except the selected one to a new file. This isn't very convenient, especially with really large files, but it should be a lot less memory-intensive than keeping all of the lines in memory at once. But overall, if you can use a db instead of a text file for this, then it would be a million times better to do that.
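The "write all lines except the selected one to a new file" step can be sketched like this. It is a minimal sketch, with `copyWithoutLine` as a hypothetical helper name and PHP 7.1 or later assumed for the nullable return type.

```php
<?php
// Hypothetical helper: copy $src to $dst, skipping the line at zero-based $index.
// Returns the skipped line without its line ending, or null if $index is out of range.
function copyWithoutLine(string $src, string $dst, int $index): ?string
{
    $in = fopen($src, 'r');
    $out = fopen($dst, 'w');
    $removed = null;
    $current = 0;
    while (($line = fgets($in)) !== false) {
        if ($current === $index) {
            $removed = rtrim($line, "\r\n"); // keep the selected line, do not copy it
        } else {
            fwrite($out, $line);
        }
        $current++;
    }
    fclose($in);
    fclose($out);
    return $removed;
}
```

Only one line is held in memory at a time, but every removal rewrites the whole file. Drawing all n lines this way therefore costs on the order of n full passes, which is the same quadratic growth that made the earlier `shuffleList` function slow.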