Jump to content

Data Field Decisions


kingnutter

Recommended Posts

Hi everyone,

 

I am putting together a website which will be a specialist thesaurus with an open submission system (permissions dependant on membership status).

 

The directory of categories will be heirachical.

 

What I would like in the first instance is some advice on my Words table most importantly regarding the amount of data fields. I am slightly concerned that there are too many of these, but I can’t think how else to split it up.

 

I’ve made some notes after each field to explain what each is for and any uncertainties I have.

 

CREATE TABLE words (

    wid BIGINT unsigned NOT NULL auto_increment, (This is a unique Identifier for each word as many words have more than one definition.)

    word TEXT NOT NULL, (The actual word itself)

    category BIGINT unsigned NOT NULL DEFAULT 0,

    tags TEXT, (when users submit a word they have to provide tags which will be used for user searche queries)

    def LONGTEXT, (Definition)

    pub, (This is to define whether the entry is published. I imagine it would be true or false but I’m not sure how to define this)

    author TEXT, (Submitter’s username. Is this a job for VARCHAR?)

    date, (of entry – I expect this would be something involving a TIMESTAMP)

    last_upd ***ANOTHER TIMESTAMP***

    pop_votes_amount BIGINT, (This would be for users to vote for the popularity of an entry. This one counts the votes)

    pop votes_total BIGINT,

    pop_score BIG(?) INT (pop votes_total / pop votes_amount rounded up)

    PRIMARY KEY (wid),

)

 

One of my biggest questions is the use of various INT sizes. Is it possible to change them at a later date or am I potentially slowing down the site unnecessarily?

 

First post. All thoughts welcome.

 

Cheers,

Gary

 

Link to comment
Share on other sites

Welcome to PHPFreaks!  ;D

 

TEXT/LONGTEXT is quite undesirable -- and in v5.0+, VARCHAR can be 65K, so you rarely need TEXT (VARCHAR was only <=255 in v4.1 and below).  These very large fields have implications for sorting/group by operations, data seeks, etc.  They should be avoided if possible -- and if not, they should be in their own table, ideally.

 

BIGINT it *really* big... INT gives you 4 billion numbers (8 billion if it's UNSIGNED)... takes up half the space, which can be very important for indexing too!

 

You can use TIMESTAMP for created/modified, but it's a bit tricky, there are time zone issues, max date = 2037, etc., etc., so be warned.  I prefer DATETIME, even though it's a few extra bytes.

 

So, in a nutshell, drop the BIGINTs, use VARCHARs with reasonable lengths (be generous, but not an order of magnitude), stay away from keywords like "date", and don't use [LONG]TEXT fields.  Obviously, indexing some other fields will be necessary, too.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.