mike12255 Posted January 23, 2011
My project: Online note sharing for my university.
How it works: You upload your notes to earn credits, then spend those credits to download other people's notes.
Problem: What's to stop someone from uploading a note they have downloaded?
Security: Currently all notes go through an approval system. A staff member views the note, checks that it is legitimate, then approves it, which grants the user credits and makes the note downloadable.
Does anyone have ideas on how I could stop someone from uploading a file that they have uploaded before or have downloaded? Timestamps on files, MD5 checksums, anything. I need a way to fix this and I have no idea how.
requinix Posted January 24, 2011
What file format is the "note"? Can you embed meta-information inside it?
mike12255 (Author) Posted January 24, 2011
I allow people to upload notes as pdf, doc, rtf and docx, and in the future as images, so that people who handwrite their notes can upload them instead of retyping them. I can use md5_file if the files are exactly the same, but if the person changes the file just slightly, I'm not sure the MD5 will still be the same.
requinix Posted January 24, 2011
If the file changes by even the slightest amount, the MD5 hash will be completely and unpredictably different. Same with any good hashing scheme.
The best thing I can suggest is extracting the text from whatever file is uploaded and comparing it with the text in the database. If it's above some level of similarity, have it flagged for review.
You can also crowdsource your approval system: let people flag items. That way you can greatly reduce the staff workload.
And so you know: you'll never win the battle against fraud. You can make it annoying, difficult, and hopefully even time-consuming, but they'll always be able to beat the system given enough time.
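A minimal sketch of the two points above: a tiny edit changes the MD5 completely, while a text-similarity score barely moves. It is written in Python purely for illustration (the thread is about PHP, where `similar_text()` and `levenshtein()` play a comparable role). The 0.8 threshold and the names `similarity` and `needs_review` are assumptions, not anything from the posts, and the text is assumed to have already been extracted from the uploaded file.

```python
import hashlib
from difflib import SequenceMatcher

REVIEW_THRESHOLD = 0.8  # assumed cut-off; tune it on real notes


def md5_hex(text: str) -> str:
    """MD5 of the text, as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; 1.0 means the two strings are identical."""
    # autojunk=False keeps the ratio meaningful on long documents.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def needs_review(new_text: str, stored_texts, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Flag the upload if it is close to any note already stored."""
    return any(similarity(new_text, old) >= threshold for old in stored_texts)


original = "Mitosis has four phases: prophase, metaphase, anaphase and telophase."
tweaked = "Mitosis has four phases: prophase, metaphase, anaphase, and telophase!"

print(md5_hex(original) == md5_hex(tweaked))   # hashes no longer match
print(round(similarity(original, tweaked), 2))  # but the texts are still very close
print(needs_review(tweaked, [original]))
```

A flagged upload would still go to a staff member; the score only decides which uploads deserve a closer look.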
Zurev Posted January 24, 2011
I'm unsure how the whole checksum bit works, but if you create a checksum of a .doc file with the contents "blah", would it match a .txt or .rtf file with the same contents? I would assume not, since .doc translates funky.
mike12255 (Author) Posted January 24, 2011
Nope, just tested: the checksum only matches if it is the exact same file. For example, uploads get a random name, so when I upload note1.doc it goes into the db as ndfsfi_432.doc. If I then try to upload note1.doc again, the MD5 is the same as the one in the db. However, if I open note1.doc, copy everything, paste it into note2.doc and upload that, the MD5 is different, so this is not an effective method.
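One way to cover the copy-and-paste case described here, sketched under assumptions: keep the raw-file MD5 for exact re-uploads, and additionally hash the extracted text after normalizing whitespace and case, so the same words saved in a new document produce the same fingerprint. The `NoteIndex` class and both function names are hypothetical, the example is in Python for illustration rather than PHP, and text extraction from doc/pdf files is assumed to happen elsewhere. This only catches unchanged text; edited text still needs a similarity comparison.

```python
import hashlib
import re


def file_fingerprint(data: bytes) -> str:
    """MD5 of the raw bytes: matches only byte-identical files."""
    return hashlib.md5(data).hexdigest()


def text_fingerprint(text: str) -> str:
    """MD5 of the text with whitespace collapsed and case folded."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


class NoteIndex:
    """Hypothetical in-memory stand-in for two indexed database columns."""

    def __init__(self):
        self.file_hashes = set()
        self.text_hashes = set()

    def is_duplicate(self, data: bytes, text: str) -> bool:
        return (file_fingerprint(data) in self.file_hashes
                or text_fingerprint(text) in self.text_hashes)

    def add(self, data: bytes, text: str) -> None:
        self.file_hashes.add(file_fingerprint(data))
        self.text_hashes.add(text_fingerprint(text))


index = NoteIndex()
index.add(b"bytes of note1.doc", "Cell  biology\nnotes")

# Same words pasted into a new file: different bytes, same text fingerprint.
print(index.is_duplicate(b"bytes of note2.doc", "cell biology notes"))
```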
QuickOldCar Posted January 24, 2011
Since it seems you want to check whether the actual contents of the files are the same, you may have to run preg_match() on the content itself. http://php.net/manual/en/function.preg-match.php
You can probably check the file size first; if the size is the same, then check the contents against other files of similar size using preg_match.
mike12255 (Author) Posted January 24, 2011
I thought of this too. But I'm hoping to have at least 300 users on this. With 300 notes stored, every new upload would have to read through 300 documents on top of everything else the script already does, which will majorly slow it down. I need to think of a faster but still appropriate method.
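A sketch of one common way around the speed concern, offered as an assumption rather than anything proposed in the thread: compute a set of word "shingles" (overlapping three-word sequences) once per note at upload time, store it, and compare sets instead of re-reading every document. Comparing a new upload against a few hundred stored sets is then a handful of set intersections. The names `shingles`, `jaccard` and `find_similar` and the 0.5 threshold are hypothetical, and Python is used only for illustration.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-word sequences from the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all shingles; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_similar(new_text: str, stored: dict, threshold: float = 0.5) -> list:
    """Return ids of stored notes whose shingle set overlaps the new one."""
    new_set = shingles(new_text)
    return [note_id for note_id, old_set in stored.items()
            if jaccard(new_set, old_set) >= threshold]


# Shingle sets are computed once, when each note is approved.
stored = {
    "note1": shingles("the quick brown fox jumps over the lazy dog near the river bank"),
    "note2": shingles("photosynthesis converts light energy into chemical energy in plants"),
}

print(find_similar("The quick brown fox jumps over the lazy dog near the river bank today", stored))
```

Because the comparison works on words rather than bytes, it is unaffected by the file format or by re-saving the text in a new document.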
mike12255 (Author) Posted January 24, 2011
bump
ignace Posted January 24, 2011
requinix wrote: "The best thing I can suggest is extracting the text from whatever file is uploaded and comparing it with the text in the database. If it's above some level of similarity then have it flagged for review. You can also crowdsource your approval system: let people flag items. That way you can greatly reduce the staff workload."
Indeed. You shouldn't be storing the files but their contents. If someone uploads in docx format, only people with Office 2007 or later (or a compatibility pack or converter) can actually open the file. Read out the file contents, store those in the database, and let the user choose the format before downloading the document.
mike12255 (Author) Posted January 24, 2011
I feel the only truly safe way for me to do this is to let the user view the note on the website (with copy and paste disabled) and give them an option to print the note.
requinix Posted January 24, 2011
mike12255 wrote: "with copy and paste disabled"
Not entirely possible. Kinda possible, but easy to get around.
mike12255 (Author) Posted January 25, 2011
Damn, I have no idea what to do then, because if people can cheat, they will, and the site will fail.
Pikachu2000 Posted January 25, 2011
You might try Googling 'php plagiarism detection' and see what it turns up.