
Database Schema Questions


fry2010


I have a couple of questions and thoughts; I wondered if anyone could give some advice and recommendations:

 

Question 1) In creating a member site I want lots of details about each member. However, I imagine that many of these options may not be used. So, in terms of performance, is it better to have one table that holds ALL member data, or to split it into a couple of tables?

 

The data I am thinking of is: Facebook link, Twitter link, website URL, age, sex, gender, occupation, blog, privacy, newsletter subscription, etc., then the usual member details like username, password, etc.

Should I combine this data or create separate tables? Sorry if this is too general.

 

Question 2) My understanding of MyISAM tables is that the table gets locked while it is being used, which can lead to bottlenecks. Would it be a good idea, performance-wise for later on, to have two tables with the same structure and bind a table id to each user? Kind of like how a multiple-hard-drive system might work on your computer. I would only intend to use this method for tables I believe will be used heavily, e.g.:

 

create table `user` (
  userId int unsigned not null AUTO_INCREMENT,
  tableData tinyint(1) unsigned not null DEFAULT 1,
  etc....
);

 

 

create table `userProfile_table1` (
  twitter char(255) not null DEFAULT '',
  facebook etc....
);

create table `userProfile_table2` (
  twitter char(255) not null DEFAULT '',
  facebook etc....
);

 

 

I know what you will say at first: having two tables containing the same layout and data is bad design. But the two tables won't actually store the same data; they will store data for separate users, e.g.:

 

Let's say I had 2000 members sign up. 1000 of those members get their data stored in `userProfile_table1`; the other 1000 have their data stored in `userProfile_table2`.

 

So now, instead of all 2000 users trying to access one table, it will be 1000 users per table.

 

A problem I see, however, is that this increases the load on the `user` table, since it now needs to record which table each user's data is saved in.

 

If this is not such a ridiculous idea and I haven't embarrassed myself, another thought would be: what if I worked out the optimum number of users at which it becomes worthwhile to split into a new table, then created a new table at that point? So eventually, if there were hundreds of thousands of members, there could be several tables.
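
To make the lookup cost concrete, here is a minimal sketch of the routing logic this implies (hypothetical code; it assumes the tableData column above plus a userId column in each profile table):

// First round trip: find out which profile table this user lives in.
$stmt = $pdo->prepare("SELECT tableData FROM `user` WHERE userId = ?");
$stmt->execute(array($userId));
$tableNum = (int) $stmt->fetchColumn();

// Whitelist the table name; never build it from raw input.
$profileTable = ($tableNum === 2) ? 'userProfile_table2' : 'userProfile_table1';

// Second round trip: fetch the profile from whichever table holds it.
$stmt = $pdo->prepare("SELECT * FROM `$profileTable` WHERE userId = ?");
$stmt->execute(array($userId));
$profile = $stmt->fetch();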


Question 1) In creating a member site I want lots of details about each member. However, I imagine that many of these options may not be used. So, in terms of performance, is it better to have one table that holds ALL member data, or to split it into a couple of tables?

The data I am thinking of is: Facebook link, Twitter link, website URL, age, sex, gender, occupation, blog, privacy, newsletter subscription, etc., then the usual member details like username, password, etc.

Should I combine this data or create separate tables? Sorry if this is too general.

Some data can be treated as "metadata", where you store the value but also a key indicating what the value is for. So instead of a table with UserID/FacebookURL/TwitterName/etc., you have a table with UserID/Key/Value. The best version of this is to make Key a foreign key to yet another table...

metadata definitions:

ID | Description        | IsLink | Validation                                      | Formatting
---+--------------------+--------+-------------------------------------------------+----------------------------
 1 | Your Facebook page |      1 | ^(http://(www\.)?facebook\.com/)?(.*)           | http://www.facebook.com/$3
 2 | Your Twitter name  |      1 | ^(http://(www\.)?twitter\.com/#!/)?([a-z0-9_]+) | http://twitter.com/#!/$3

metadata values:

UserID | MetadataID | MetadataValue
-------+------------+--------------------------------
     1 |          1 | http://www.facebook.com/youtube
     1 |          2 | http://twitter.com/#!/youtube

With the scheme above (which I probably thought too hard about), the types of metadata are stored in one table, along with important information such as the prompt for the user, whether the data is a link, what it has to look like, and how it can be reformatted. A second table actually stores the individual values. That's appropriate for social media, websites, blogs...
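
A rough DDL sketch of those two tables (my guesses at names and sizes; not a definitive layout):

CREATE TABLE metadataTypes (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
  description VARCHAR(100) NOT NULL,            -- prompt shown to the user
  isLink      TINYINT(1) UNSIGNED NOT NULL DEFAULT 0,
  validation  VARCHAR(255) NOT NULL,            -- regex the value must match
  formatting  VARCHAR(255) NOT NULL,            -- template for re-emitting it
  PRIMARY KEY (id)
);

CREATE TABLE metadataValues (
  userId        INT UNSIGNED NOT NULL,
  metadataId    INT UNSIGNED NOT NULL,
  metadataValue VARCHAR(255) NOT NULL,
  PRIMARY KEY (userId, metadataId),             -- one value per user per type
  KEY byType (metadataId, metadataValue)        -- supports type-wide lookups
);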

 

Things like age and sex/gender I would store normally (as separate fields in a table). Privacy and subscription options are also separate from everything else; I'd just as soon store those as serialized data in one field in a table, like a serialized array.
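
For example, a sketch of that serialized-options idea (the prefs column and the JSON encoding are my assumptions; PHP's serialize() would do just as well):

// Pack the privacy/newsletter options into a single column, e.g. `prefs`.
$prefs = array('privacy' => 'friends', 'newsletter' => 1, 'weekly' => 0);
$stmt = $pdo->prepare("UPDATE `user` SET prefs = ? WHERE userId = ?");
$stmt->execute(array(json_encode($prefs), $userId));

// Reading them back is a single decode; no joins or extra columns needed.
$stmt = $pdo->prepare("SELECT prefs FROM `user` WHERE userId = ?");
$stmt->execute(array($userId));
$prefs = json_decode($stmt->fetchColumn(), true);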

 

Question 2) My understanding of MyISAM tables is that the table gets locked while it is being used, which can lead to bottlenecks. Would it be a good idea, performance-wise for later on, to have two tables with the same structure and bind a table id to each user...

It's not a ridiculous question, but it is a bad idea. Don't split up tables like that: there are way too many disadvantages.


People often do what you are proposing when they expect to have a large number of rows. A large number of rows is when you get up past tens of millions. Will you have 10 million users in your user table?

 

Yes, MyISAM uses table locking, but this locking is extremely fast. There is, however, nothing preventing you from using InnoDB instead, which does row-level locking and also has performance improvements like clustered indexes and result caching, similar to what well-known commercial RDBMSs like Oracle and SQL Server have. InnoDB also has real crash recovery, so if your server locked up or lost power you'd recover near instantaneously, versus MyISAM, which has to check the tables and indexes for consistency in a process that can take hours.
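
For reference, converting an existing table is a one-liner, though it rewrites the whole table and can take a while on large data sets:

-- Check the current engine first:
SHOW TABLE STATUS LIKE 'user';

-- Then convert:
ALTER TABLE `user` ENGINE=InnoDB;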

 

You need to be brutally honest with yourself about the number of users you can realistically expect.

 

 


to requinix:

 

Interesting. So I suppose by using a metadata method I could also store other pieces of data in this manner. I'm still trying to get my head around that method, but it looks pretty neat. Also, I keep forgetting about serialization; that seems like a good idea to use when there are many values for a similar property... But how come you don't suggest serializing age, sex, gender, etc. as well?

 

This leaves me with another question: does having a single field in a table offer better performance than several fields? I suppose what I'm asking is: does serialization really offer a performance improvement over just using separate tinyint(1) columns? I suppose that may come down more to the types of query being performed... So if a whole bunch of data is always selected at once, then serialize it; if not, then place that data in separate columns.

 

Thanks for the great info too.

 

 

to gizmola:

 

Being brutally honest, I expect at best a few thousand members, so I guess I'll just stick with one table. Thanks.


Serializing data and storing that is fine so long as you KNOW you will never query against any of the serialized data; in other words, when you have a bunch of data you only need to store, read out, and display when you're presenting it for a user. Here's an example of when it would be really bad. Let's say you have a list of flags for user communication preferences:

 

Newsletter: 1
Weekly_Newsletter: 0
Can_PM: 1
Show_email: 1

 

So you store this as an array, perhaps serialized as a JSON string, in a database column named "flags".

 

Now your system needs to run the weekly newsletter process. Oops: the query can't reference a Weekly_Newsletter column, because it's packed into flags. So the database will have to tablescan (read every single row).
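
To make that concrete (hypothetical column and values, assuming the JSON packing above): the best you can do is a string match against the serialized blob, and a LIKE with a leading wildcard cannot use an index:

-- Every row must be read and pattern-matched:
SELECT userId
FROM `user`
WHERE flags LIKE '%"Weekly_Newsletter":1%';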

 

The meta table design suggested by Requinix is advantageous in this case because, even with a separate column, you will probably hit the "low cardinality" issue: even though you have an index on the column, because there are only two values (0/1 or T/F) it's possible that MySQL will tablescan anyway, deciding that using the index is not efficient. The meta design makes it more likely that an index will be employed because, ironically, there are many more rows in the table for each of the different meta types you're storing.
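
Under the meta design the same question becomes an indexed lookup (a sketch, reusing the hypothetical metadataValues table sketched earlier):

-- Can seek on the (metadataId, metadataValue) index instead of scanning:
SELECT userId
FROM metadataValues
WHERE metadataId = 3            -- hypothetical id for Weekly_Newsletter
  AND metadataValue = '1';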

 

I have found a meta design works particularly well when only a percentage of the total universe will have a particular meta type stored. So if you ultimately end up with 100k users and only 10% actually update their Facebook account, this is where the advantages of a meta design outweigh the disadvantages of having a separate table.

 

The other advantage of a meta-type design is when you want to be flexible in terms of the information you will be storing. With a meta design you can develop a system where you can literally add a new metatype to the metatype table, and instantly the application is updated and can start storing that information, without you changing a single line of code. Your form code becomes more complicated, because the fields on the form have to be driven by the metatype table, but it can be done.
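
For example (a sketch against the hypothetical metadataTypes table from earlier; the LinkedIn row is invented purely for illustration):

-- The whole "schema change" is one row; the form picks it up on next render.
INSERT INTO metadataTypes (description, isLink, validation, formatting)
VALUES ('Your LinkedIn page', 1,
        '^(http://(www\\.)?linkedin\\.com/in/)?(.*)',
        'http://www.linkedin.com/in/$3');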

 

With a traditional subtype table (user_profile), you have to alter the table structure and the related queries whenever you want to change something. This does not mean, however, that applying KISS doesn't make sense for you. You have to weigh the importance of this flexibility against your other priorities, and implementing a meta table design is more complex.


OK... so if there is a group of data that you will never need to separate when performing a request for multiple users, then it's a good idea to use serialization. However, if you plan to use a piece of that data separately, then don't. Have I got that right?


Any time you may need to filter users based on that data, you should not serialize it. TBH, it's rare that I serialize data in my databases. So basically, yes: if there is a group of data that you will never need to separate when querying for multiple users, you can use serialization (note: not always a good idea, but it is typically OK).

 

I like to be able to pull statistics on what people are using (e.g. template choice), about them (e.g. average age), etc.... but then again, my profession is based on web analytics :P


OK... so if there is a group of data that you will never need to separate when performing a request for multiple users, then it's a good idea to use serialization. However, if you plan to use a piece of that data separately, then don't. Have I got that right?

 

I'm basically with Requinix and KingPhilip on this. I don't recommend serializing data in most cases; people should only do it when they have a clear design objective. One example I can think of is in the Joomla CMS world: there is an extension called K2 that basically grafts a custom article system onto Joomla. They wanted to give people the ability to store user-defined fields. Because it's a CMS, they needed to avoid adding tables or altering table structure, so defining custom fields simply drives the display of the forms, and any data actually recorded for those custom fields gets stored as a serialized array. That's a specialized case where the tradeoff makes sense. They don't really make it clear to people that these user-defined fields come with a price, but for most people the application pretty much guarantees that the table will never grow past a few hundred rows.

 

I should also add that there are well-known NoSQL databases, like MongoDB, that offer this same sort of flexibility. If serializing data is looking really good to you, maybe you would be better off with a NoSQL DB.


It's not so much that, gizmola; I'm just looking at the options and best practices available. The reason I asked is that in some other threads and sources I have read that you should not, where possible, create lots of tables, but rather group data together.

 

So, performance-wise, is it actually better to use one table for member data, even if a high percentage of members will likely not use some of that data, or to split that data into separate tables? Sorry for all the questions; I'm just trying to nail this down so I know for the future as well.


So, performance-wise, is it actually better to use one table for member data, even if a high percentage of members will likely not use some of that data, or to split that data into separate tables? Sorry for all the questions; I'm just trying to nail this down so I know for the future as well.

TBH it depends on the data you're storing in the table. If an attribute can have multiple values, it should be split out into another table (if you're using an RDBMS, which from the sounds of it you are: MySQL). This provides a relationship between a user and some of their options/settings/whatever. An example of what should be in a single table row:

  • Age
  • Sex
  • Location
  • Name

 

However, things like the following should be set up in separate tables:

  • Transactions
  • "likes" (bands, celebs, etc.)
  • Websites (if you allow unlimited, or really more than one, websites to be listed)

 

Etc... Does that make sense?
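
A rough sketch of that split (table and column names invented for illustration):

-- Single-valued attributes stay on the user row itself.
CREATE TABLE `user` (
  userId   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  name     VARCHAR(100) NOT NULL,
  age      TINYINT UNSIGNED,
  sex      CHAR(1),
  location VARCHAR(100),
  PRIMARY KEY (userId)
);

-- Multi-valued attributes get their own table, one row per value.
CREATE TABLE userWebsite (
  userId INT UNSIGNED NOT NULL,
  url    VARCHAR(255) NOT NULL,
  PRIMARY KEY (userId, url)
);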


OK, I've got another one.

 

I wish to log unique hits per page, and some content on the site is going to change after X many hits.

 

Would it be wise to use InnoDB to store the unique hits per page? Here is the table layout idea:

 

-- pageTable comes first so uniqueHits can reference it; both are InnoDB
-- so the foreign key is actually enforced.
create table pageTable (
  pageId mediumint(8) unsigned not null AUTO_INCREMENT,
  changeAtHits smallint(5) unsigned not null DEFAULT 100,
  displayedContentVersion tinyint(1) unsigned not null DEFAULT 0,
  PRIMARY KEY (pageId)
) ENGINE=InnoDB;

create table uniqueHits (
  pageId mediumint(8) unsigned not null,
  uniqueUserIp char(15) not null,
  -- one row per (page, IP) pair
  PRIMARY KEY (pageId, uniqueUserIp),
  FOREIGN KEY (pageId) REFERENCES pageTable(pageId)
) ENGINE=InnoDB;

 

Each page load would have the following queries:

 

$sql1 = "SELECT COUNT(*) FROM `uniqueHits` WHERE pageId = $pageId && `uniqueUserIp` = $uniqueUserIp LIMIT 1";

// If no entry is found, insert new entry
if($rowCount == 0)
{

  // Insert this new unique hit for the page
  $sql2 = "INSERT INTO uniqueHits (uniqueUserIp) VALUES ($uniqueUserIp) WHERE pageId = $pageId";

  // Now check to see if the number of unique hits has reached the change content limit:
  $sql3 = "SELECT a.changeAtHits, COUNT(b.pageId) 
              FROM `pageTable` AS a
              LEFT JOIN `uniqueHits` AS b
              ON a.pageId = b.pageId
              WHERE a.pageId = $pageId LIMIT 1";

  // These are example values of what is returned from previous query
  $changeAtHits = 100;
  $uniqueHitsCount = 100;


  // Change the content displayed since it reached the target of 100 unique hits
  if($uniqueHitsCount >= $changeAtHits)
  {
    $sql4 = "UPDATE pageTable SET displayedContentVersion = 2 WHERE pageId = $pageId";
  }

}
  

 

As you can see, for every page view there are four queries being made just to log and update the unique hits, and this doesn't include the other queries used on the page.

It also means there could be a sh*t load of records to go through to count the number of unique hits per page.

 

I'm open to better ideas on how to go about this / the database schema.
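
One common MySQL pattern that would collapse queries 1 and 2 (a sketch, assuming the composite PRIMARY KEY (pageId, uniqueUserIp) above): INSERT IGNORE skips duplicates, and the affected-row count tells you whether the hit was new:

$stmt = $pdo->prepare(
    "INSERT IGNORE INTO uniqueHits (pageId, uniqueUserIp) VALUES (?, ?)"
);
$stmt->execute(array($pageId, $uniqueUserIp));

if ($stmt->rowCount() === 1) {   // 1 = new unique hit, 0 = already seen
    // Only a genuinely new hit makes the re-count worth running.
    $count = $pdo->prepare("SELECT COUNT(*) FROM uniqueHits WHERE pageId = ?");
    $count->execute(array($pageId));
    $uniqueHitsCount = (int) $count->fetchColumn();
    // ...compare against pageTable.changeAtHits as before...
}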

