Jump to content

What's the best way to make a bookmarkable URL?


surocab

Recommended Posts

Hi everyone, I'm new here so I apologize if I should be posting this question somewhere else.

 

I have a question about how to best design a bookmarkable (permanent) URL, with some constraints. I'm designing a website using PHP and MySQL where users can post and edit articles that they've written, and I'm having trouble with the navigation to the "edit" page, specifically what unique information should be in the URL for the page. In my MySQL database, I have a table for all the articles that people have written, with columns "id", "author", "title", and "content". The primary key is "id", which is also the only unique column. With the URL, I want to give the person a bookmarkable, permanent way to edit their article. However, I don't want to use the title, because that can change, and I don't want to use the id, because it might provide information about how many articles there are. One thing I've considered is generating a unique random number/string and using it in the URL. However, I don't know if this is a good idea, because I'm not sure what's the best way to generate a unique random number/string, and I'm not sure if it will be too much of a load on the server. I was thinking about doing this with something like

$randomidCandidate = rand(0,9999999999);
while (mysqli_num_rows(mysqli_query("SELECT title FROM articles WHERE randomid='$randomidCandidate'", $database_connection))) {
     $randomidCandidate = rand(0,9999999999);
}

but I'm worried that 1) this will result in more and more $idCandidate generation attempts as the number of articles grows, and thus an increasingly heavy load on the server, and 2) having an "id" column with datatype BIGINT(10) will take up too much space and slow things down. I'm not sure if this concern is really important, though.

 

How do people normally deal with this kind of issue? I'd be very grateful for any advice or suggestions on alternative solutions. I'm familiar with mod_rewrite and don't mind using it, if it would help in this case, although I haven't been able to think of how it would. Thanks in advance.

Link to comment
Share on other sites

A popular approach is to make a slug.  A slug is based on the title of the article, but replaces spaces and punctuation and other characters that would have to be url encoded, and frequently there will be some sort of addition that insures that there will not be a problem should two users happen to create two articles with the exact same title.  You would save the slug separately from the title.

 

There are advantages to this, as it has been observed that search engines work off language, so having a url that matches the title of the article is often helpful in getting the article indexed.

 

The slug can be safely recomputed any time the article is saved, so long as it is not published.  Even after it's published the only danger in changing the slug comes from having the site indexed, and having 404 errors presented to visitors who come to the site from a search engine, but that is not a major problem.

 

In terms of mod_rewrite rules, that is not directly related to the question of slugs or another mechanism.  If your site works like this:

 

index.php?slug=3_slugs_work_great

 

The link that should be presented should be something like:

 

www.site.com/slug/3_slugs_work_great

 

This is again because it has been noticed that many times search engines will not index url's with lists of url parameters.

Link to comment
Share on other sites

Thanks for the explanation. I didn't realize that "slug" was the correct term for that.

 

However, it doesn't quite solve my problem. What I'd like to have is a permanent URL, in other words, one that won't change or have to be recomputed if users edit their article titles. I don't want users to have to update their bookmarks whenever they change their article title.

 

If it helps, I don't think this portion of the site would ever get indexed by a search engine since users would have to be logged in to access the "editarticle.php" page, so if my understanding is correct, then that part isn't a concern in this case.

Link to comment
Share on other sites

I really have to question your assumptions.  Why would I bookmark an article I wrote while I was still working on it?  It doesn't really match my experience.

 

Ok, so if you don't care about search indexing (a decision that if you ever change your mind about you will regret)  then the sky is pretty much the limit.  As for generating a random number, there is no good reason to go for a bigint vs an int.  The reality is that in either case, there is a small chance you will have a collision, but you could code around that by putting an index on it and looking up the value before you save the row, or handling a constraint error.

 

You could also go for an md5 or sha1 hash of something like the user_id + time(). 

 

Another possibility is to just use the id after all, but obfuscate the fact that you are using it.  I wrote this code for a now long defunct site.  Feel free to experiment with it.

 

class Obfuscate {
    static private function XOREncrypt($plain, $key) {
        $keyLength = strlen($key);
        $plainLength = strlen($plain);
        // Loop through input string
        for ($i = 0; $i             $rPos = $i % $keyLength;
            $r = ord($plain[$i]) ^ ord($key[$rPos]);
            $plain[$i] = chr($r);
        }
        return $plain;
    }

    static public function obfuscateIntParam($in, $key) {
        // Add some noise 
        $noise = rand(0, 15);
        $in = $noise + ($in         $in = self::XOREncrypt(decHex($in), $key);
        // Make safe for uri
        $in = strrev(strtr(base64_encode($in), '+/=', '-_,'));
        return $in;
    }

    static public function deObfuscateIntParam($in, $key) {
        $in = base64_decode(strrev(strtr($in, '-_,', '+/=')));
        $in = hexDec(self::XOREncrypt($in, $key));    
        $in = $in >> 4;
        return $in;
    }
}

 

You could do a test like this:

 

for ($x=1; $x         $url =Obfuscate::obfuscateIntParam($x, 'Ilovethesmellofnapalm');
        $original = Obfuscate::deObfuscateIntParam($url, 'Ilovethesmellofnapalm');
        echo "$x. $url = $original \n";
}

 

The advantage to something like this is: no additional storage required, and no giant hash url.

 

 

 

 

 

 

 

Link to comment
Share on other sites

Why would I bookmark an article I wrote while I was still working on it?

 

I see your point, but I'm anticipating that users might want to bookmark an article in progress so that they can easily go back to continue editing it.

 

As for generating a random number, there is no good reason to go for a bigint vs an int.  The reality is that in either case, there is a small chance you will have a collision, but you could code around that by putting an index on it and looking up the value before you save the row, or handling a constraint error.

 

But I thought that there's some difference in storage between a bigint and an int? http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html under "Storage Requirements for Numeric Types" lists bigint as 8 bytes and int as 4 bytes, but I'm not sure if I'm understanding that table correctly.

 

For coding around the possibility of a collision, how much of a hit to performance would it be to check the value before I save the row, and then possibly have to generate a new random number if there is a collision?

 

You could also go for an md5 or sha1 hash of something like the user_id + time().

 

Another possibility is to just use the id after all, but obfuscate the fact that you are using it.  I wrote this code for a now long defunct site.  Feel free to experiment with it.

 

Thanks, I didn't think of that! Thanks also for the Obfuscate() code. I'll try it out.

Link to comment
Share on other sites

I wouldn't over think the storage requirements unless you truly believe that you are going to have a huge database, which in my experience only happens to a rare few companies.  You are right that a bigint takes 8 bytes whereas an int, only takes 4.  Since an UNSIGNED int can store over 4 Billion rows, that's a pretty large universe.

 

The cost of quick select statement is not significant, however in order to truly do it properly, you need to put a write lock on the table.  Basically what you need with that approach is (in pseudocode):

 

-write lock table

-get your rand

-while select count(*) where yourrand = saverand > 0

-- get your rand

- insert row

-unlock table

 

For that reason it's a lot cleaner to make the unique index on the column and exception handle a unique constraint violation error after you try an insert.  Simple enough to just keep trying until you don't get the constraint error. 

 

These collisions will most likely be few and far between... like winning the lottery although the chances increase dramatically over time, but you know that they are possible, so you have to do something to handle them.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.