papaface, posted February 6, 2011:
Hi there. I'm trying to work out how to search my database for duplicate URLs that have been submitted by users. This isn't as simple as http://www.example.com/something.php and http://www.example.com/something.php, oh no! People have entered links such as http://www.example.com//something.php or http://www.example.com///something.php. The script (I didn't write it) currently sees these links as unique, but they're not: they function exactly the same way. Does anyone have any advice on how to weed these out? I was thinking of a straightforward LIKE '%//%' in the MySQL query, but that will match every URL because of the http:// part of the link. Any feedback would be appreciated!
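One way around the http:// problem described above is to look for doubled slashes only when they are not directly preceded by a colon. The following is a minimal PHP sketch of that idea, assuming the URLs have already been fetched from the table; the function name hasExtraSlashes is made up for illustration and is not part of the original script.

```php
<?php
// Hypothetical helper: returns true when a URL contains a run of two or
// more slashes that does not start right after a colon, so the "//" in
// "http://" is ignored.
function hasExtraSlashes(string $url): bool
{
    // (?<!:) is a negative lookbehind: the run of slashes must not be
    // immediately preceded by ":".
    return preg_match('#(?<!:)/{2,}#', $url) === 1;
}
```

In MySQL itself the same idea could probably be written as url REGEXP '[^:]//' (with url standing in for the real column name), since that pattern requires some character other than a colon in front of the doubled slash.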
fortnox007, posted February 6, 2011:
Hmm, isn't this something for regular expressions? I can't come up with a good one for this right away, but that's my first thought.
papaface (author), posted February 6, 2011:
Yeah, I think the initial checking should be done via regex, but there would also need to be a way of cross-referencing the records so that it can spot a duplicate. The database has around 127,000 links.
fortnox007, posted February 6, 2011:
Hehe, I am thinking really hard about it, but nothing good has popped up yet.
fortnox007, posted February 6, 2011:
Is this maybe something? I haven't tested it, but the idea is to group the URLs and then keep only the groups whose URL contains a doubled slash somewhere after the http:// part.
$query = "SELECT your_url FROM your_table GROUP BY your_url HAVING your_url REGEXP '[^:]//'";
But I would love to hear from an expert about this.
QuickOldCar, posted February 6, 2011:
I think firstly I would make a rule not to allow any // or /// in a URL (apart from the ://). Then go right into the database and change all the // and /// to a single /. Then for your checks, I guess you can write up a script that cycles through and looks at each URL. I'd probably start from the lowest id and work upwards, get the URL value, and run a MySQL SELECT query to get any matching results; if there is more than one, hold or delete all except the current id.
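The steps in the post above (collapse the extra slashes, then walk the ids in ascending order and keep the first copy of each URL) can be sketched in PHP roughly as follows. This is only an illustration under assumptions: the function names normalizeUrl and findDuplicates are invented here, and $rows is assumed to be an id => url array loaded from the table (for example with a query along the lines of SELECT id, url FROM links, where links, id and url are placeholder names). With around 127,000 links the whole list should fit in memory.

```php
<?php
// Hypothetical helper: collapse any run of slashes into a single "/",
// leaving the "//" that follows the scheme's colon untouched.
function normalizeUrl(string $url): string
{
    return (string) preg_replace('#(?<!:)/{2,}#', '/', $url);
}

// Hypothetical helper: $rows maps id => url.
// Returns an array of keeper id => list of later ids whose URL is the
// same once normalized, so those later ids can be held or deleted.
function findDuplicates(array $rows): array
{
    ksort($rows);        // walk the ids in ascending order
    $firstSeen  = [];    // normalized url => lowest id seen so far
    $duplicates = [];    // lowest id => later ids with the same normalized url

    foreach ($rows as $id => $url) {
        $key = normalizeUrl($url);
        if (isset($firstSeen[$key])) {
            $duplicates[$firstSeen[$key]][] = $id;
        } else {
            $firstSeen[$key] = $id;
        }
    }

    return $duplicates;
}
```

Doing the comparison on the normalized value in PHP means the stored URLs do not have to be rewritten before the duplicates are reviewed; the returned ids can be checked by hand first and only then put on hold or deleted.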