dodgerfan Posted October 17, 2008
I need code that will scan a fairly large text file (10MB+) and either display or export all of the duplicate lines. I know how to remove the dupes by filling an array with the contents of the file and running array_unique on it, but I want to know what those lines are rather than just removing them. Any thoughts on how I might go about doing this?
Link to comment https://forums.phpfreaks.com/topic/128864-find-duplicate-lines-within-a-file/
MadTechie Posted October 17, 2008
May fail on a very large file, depending on memory and timeouts, but:
<?php
// explode() is used because split() was removed in PHP 7;
// implode() puts the newlines back between the remaining lines
$lines = explode("\n", file_get_contents("LargeFile.txt"));
file_put_contents("SmallFile.txt", implode("\n", array_unique($lines)));
?>
EDIT: LOL, "may fail on small files".. yeah, I meant very large.
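On the memory concern above: one possible way to avoid holding the whole file in a single string is to read it line by line and write each duplicated line out the first time it repeats. This is only a sketch under assumptions: it assumes PHP 7 or later, and export_duplicate_lines() and its parameters are hypothetical names that do not come from the thread. Memory use still grows with the number of distinct lines.

```php
<?php
// Sketch only: export_duplicate_lines() is a hypothetical helper, not code from the thread.
// Writes each duplicated line once to $outPath and returns how many lines were written.
function export_duplicate_lines(string $inPath, string $outPath): int
{
    $in = fopen($inPath, 'r');
    $out = fopen($outPath, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open input or output file');
    }

    $seen = [];     // line => false after first sighting, true once reported as a duplicate
    $written = 0;
    // fgets() returns one line per call, so the file is never loaded as one big string.
    while (($line = fgets($in)) !== false) {
        $line = rtrim($line, "\r\n");       // compare lines without their line endings
        if (!isset($seen[$line])) {
            $seen[$line] = false;           // first occurrence: just remember it
        } elseif ($seen[$line] === false) {
            fwrite($out, $line . "\n");     // second occurrence: report it once
            $seen[$line] = true;
            $written++;
        }
    }
    fclose($in);
    fclose($out);
    return $written;
}
```

A call might look like export_duplicate_lines('LargeFile.txt', 'duplicates.txt'), where the output file name is again just an example.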
ghostdog74 Posted October 17, 2008
Store each line as a key in an associative array, increasing its count as dups are found. At the end, check for a count of more than 1.
$contents = file("file");
$array = array();
foreach ($contents as $k => $v) {
    $v = trim($v);
    // start the count at 1 the first time a line is seen, to avoid an undefined-index notice
    $array[$v] = isset($array[$v]) ? $array[$v] + 1 : 1;
}
print_r($array);
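The final step described above (keeping only counts of more than 1) is not in the posted code. A minimal sketch of it follows, using the built-in array_count_values() to build the same line-to-count map and array_filter() to drop lines that occur once. The function name duplicate_line_counts() is hypothetical, and the sample lines stand in for what file("file") would return.

```php
<?php
// Sketch only: duplicate_line_counts() is a hypothetical helper name.
function duplicate_line_counts(array $lines): array
{
    // array_count_values() maps each distinct (trimmed) line to how often it occurs.
    $counts = array_count_values(array_map('trim', $lines));

    // Keep only the lines seen more than once.
    return array_filter($counts, function ($n) {
        return $n > 1;
    });
}

// Example input shaped like the result of file(): each element ends in a newline.
print_r(duplicate_line_counts(["a\n", "b\n", "a\n", "c\n", "b\n", "a\n"]));
```

One caveat with either version: PHP converts numeric-looking keys such as "123" to integers, so lines made up only of digits come back as integer keys.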
xylex Posted October 17, 2008
array_unique() maintains key values, so you can find the removed lines by comparing the original array with the new one using array_diff_key().
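To make the key-preservation point concrete, here is a small in-memory sketch. The sample values are made up for illustration. Note that the removed entries are only the repeat occurrences, not the first copy of each line, so a line that appears three times shows up twice in $removed unless it is collapsed again.

```php
<?php
// Made-up sample data standing in for the lines of a file.
$original = ['apple', 'pear', 'apple', 'plum', 'pear'];

// array_unique() keeps the first occurrence of each value under its original key,
// so keys 0, 1 and 3 survive here.
$uniques = array_unique($original);

// array_diff_key() returns the entries of $original whose keys are missing from $uniques,
// which are the repeat occurrences at keys 2 and 4.
$removed = array_diff_key($original, $uniques);

// Optionally list each duplicated value only once, with fresh 0-based keys.
$duplicates = array_values(array_unique($removed));
```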
dodgerfan Posted October 17, 2008
xylex, just to be clear, how would that work with this code?
<?php
// Load file into Array
$list = file('file.txt');

// Remove duplicates
$list = array_unique($list);

// Write back to file
file_put_contents('uniques.txt', implode('', $list));
?>
Thanks in advance.
xylex Posted October 17, 2008
Untested
<?php
// Load file into Array
$original = file('file.txt');

// Remove duplicates
$uniques = array_unique($original);

$removed = array_diff_key($orginal, $uniques);

// Write back to file
file_put_contents('uniques.txt', $uniques);
file_put_contents('removed.txt', $removed);
?>
dodgerfan Posted October 17, 2008
I'm getting an error that the #1 argument is not an array in line 8. I'll keep trying. Maybe I've missed something.
.josh Posted October 17, 2008
That's because there's a typo. Come on man...
xylex Posted October 17, 2008
Sorry, can't spell.
<?php
// Load file into Array
$original = file('file.txt');

// Remove duplicates
$uniques = array_unique($original);

$removed = array_diff_key($original, $uniques);

// Write back to file
file_put_contents('uniques.txt', $uniques);
file_put_contents('removed.txt', $removed);
?>
dodgerfan Posted October 17, 2008
That works. Thanks for your help and sorry for overlooking the typo. I've been staring at code all day so it's starting to run together.