
Find duplicate links in mySQL table


papaface


Hi there

 

I'm trying to work out how to search my database for duplicate URLs that users have submitted.

This isn't as simple as matching identical strings such as http://www.example.com/something.php and http://www.example.com/something.php, oh no!

 

People have entered links such as http://www.example.com//something.php or http://www.example.com///something.php

 

The script (I didn't write it) currently sees these links as unique, but they're not. They function exactly the same way.

 

Anyone have any advice on how to weed these out?

 

I was thinking maybe a straightforward LIKE '%//%' in the MySQL query, but this will match every URL because of the http:// part of the link.
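To illustrate one possible way around the http:// problem, here is a minimal sketch in Python (not the PHP the site's script presumably uses). The name `has_extra_slashes` is hypothetical. The pattern looks for two or more consecutive slashes where the first one is not directly preceded by a colon, so the `//` in `http://` on its own is not flagged.

```python
import re

# Two or more consecutive slashes whose first slash does not follow a ":".
EXTRA_SLASHES = re.compile(r"(?<!:)/{2,}")

def has_extra_slashes(url: str) -> bool:
    """Return True if the URL contains a repeated slash outside the "://" separator."""
    return EXTRA_SLASHES.search(url) is not None

for link in (
    "http://www.example.com/something.php",
    "http://www.example.com//something.php",
    "http://www.example.com///something.php",
):
    print(link, has_extra_slashes(link))
```

Only the second and third links are flagged.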

 

Any feedback would be appreciated!


Maybe something like this? I haven't tested it, but the idea is to group identical URLs and keep only those whose final part (the file name) is preceded by two or more slashes that are not the ones right after http:.

$query = "SELECT your_url FROM your_table GROUP BY your_url HAVING your_url REGEXP '[^:]//+[^/]*[.][a-z]{3,4}$'";

But I would love to hear from an expert about this.


First, I would make a rule not to allow any // or /// in a URL (apart from the :// after the scheme).

 

Then go right into the database and change all the // and /// to a single /, again leaving the :// alone.
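The clean-up step above could be sketched roughly as follows, in Python rather than PHP. This assumes, as described in the thread, that the server treats repeated slashes in the path as a single slash. The function name `collapse_slashes` is hypothetical. It only touches the path, so the `://` after the scheme and anything in the query string are left as they are.

```python
import re
from urllib.parse import urlsplit, urlunsplit

def collapse_slashes(url: str) -> str:
    """Collapse runs of slashes in the path to one slash; leave the rest of the URL alone."""
    parts = urlsplit(url)
    clean_path = re.sub(r"/{2,}", "/", parts.path)
    # SplitResult is a named tuple, so _replace returns a copy with the new path.
    return urlunsplit(parts._replace(path=clean_path))

print(collapse_slashes("http://www.example.com///something.php"))
# http://www.example.com/something.php
```

The same function could be applied to new submissions before they are stored, so the rule and the clean-up stay consistent.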

 

Then, for your checks, I guess you can write up a script that cycles through the table and looks up each URL.

 

I'd probably go through the rows in ascending id order, get the URL value, and run a MySQL SELECT query to find any matching rows. If there is more than one, hold or delete all of them except the current id.
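A rough sketch of that pass, in Python and working on rows already fetched from the table, might look like this. It is a variation on the idea above: instead of one SELECT per row, it makes a single pass in memory and remembers the lowest id seen for each cleaned-up URL. The `(id, url)` row shape and the names `normalize` and `find_duplicates` are assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit, urlunsplit

def normalize(url):
    """Collapse repeated slashes in the path so equivalent links compare equal."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=re.sub(r"/{2,}", "/", parts.path)))

def find_duplicates(rows):
    """rows: iterable of (id, url) pairs. Returns {kept_id: [duplicate ids]}."""
    first_seen = {}   # normalized URL -> lowest id that uses it
    duplicates = {}   # kept id -> ids of later rows with the same normalized URL
    for row_id, url in sorted(rows):  # ascending id order
        key = normalize(url)
        if key in first_seen:
            duplicates.setdefault(first_seen[key], []).append(row_id)
        else:
            first_seen[key] = row_id
    return duplicates

rows = [
    (1, "http://www.example.com/something.php"),
    (2, "http://www.example.com//something.php"),
    (3, "http://www.example.com/other.php"),
    (4, "http://www.example.com///something.php"),
]
print(find_duplicates(rows))  # {1: [2, 4]}
```

The ids in each list are the candidates to hold or delete; the key is the row that stays.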

 

 

