Jump to content

Recommended Posts

Hello,

 

I have a website which deals with products sales. The site has products listed on the basis of country, states and even cities. There would be estimated 50 million hits per month and same amount of mysql rows of products , details and reviews related to it per month. So, i m sure that if i maintain everything in single database my database will crash too often because of several million rows. So my thought is that i will make database based on countries. So for United States, there will be one database and everything related to US will be  put in this database, similarly for Canada, i will have another database, everything related to canada will be kept in this database...and so on. 

I will appreciate the suggestion for this purpose. what should i do?

 

Thanks

watsmyname

Thanks a lot Thorpe and fenway

 

@Thorpe, well the informations are based on most user interaction, you can say its a custom forum where user discuss about a popular products in the market, the cost they paid it, the after sales service they got, etc, etc. Several millions of users will be discussing in the topic, they will also be comparing price of products from different states, cities, countries or even continents. In the same time several millions users may be just browsing the forum and see the discussion on the same time.

 

@fenway

How craiglist manage their database? do they have single database to maintain all the classifieds?? What technic they are using? can you please throw a light upon it?

In my some project, mysql database used to crash when the table had several thousands of rows. So does performance of Mysql server depends upon the system specification??

 

Thanks

watsmyname

Not sure about the details, but I'm sure they've published extensively.

 

But I'll tell you what I tell everyone else -- you're not going to get to craiglist's size using information you obtain on a free forum.  So I wouldn't worry about that limitation until you get there.

 

As for mysql crashing & performance, everything's interconnected -- but table size is unlikely the root cause.

thanks guys,

 

well actually i am much concerned about search functionality in my site. I went through  craiglist and the search functionality is pretty fast. The search functionality in Facebook is too fast as well,  I guess they've used full text search functionality of mysql. Beside this are they using anything else to optimize search speed??

 

Thanks

watsmyname

Another question,  the suggestion i want.

 

The project i m working is little more like ebay.com [but its not the ecommerce site, we will just have discussion and comparision of products], in a sense that we have hundreds of categories and subcategories, and for each subcategories we have different properties. For example for real estate we might have no. of bed rooms, garages etc. For car, we might have VIN number,  mileage, etc. Like each of the hundreds sub categories have different properties. And we have search form for items in each subcategories. Search form is different for each categories as we have different properties.  So creating search form manually for each subcategories is not possible.  So my plan is first from back end i add subcategory say "Apartments/Real Estates". After this we'll have a place where we can add properties like bedrooms, garages, kitchens, pets allowed or not etc. We'll add different properties as required for different sub-categories.

 

Any suggestion or any comment whether i am  going right or not??

 

Thanks

 

 

I have 2 comments:

 

1st as to your db design, the generic attributes system you describe is fine and can work quite well, but another alternative is to use a subtype model, where you create a table structure for each of the subcategories.  Each option has strengths and weaknesses. 

 

The strength of the generic system is that you can create an admin tool to configure new subcategories and its attributes, however, since the actual data will be stores in a bunch of rows (1 for each individual property) you have a lot more work to get back the data for a single entry.  The tables will be long and narrow which in general works fine until the database gets absolutely gigantic.

 

The subtype tables will perform better, since queries for one subcategory will not effect queries for any other subcategories, but of course you will have to be building a lot of different tables, and will need a mapping system that maps the schema of each subcategory table to your UI.

 

2nd, in terms of performance for really big sites, there are whole books on the subject, but in terms of facebook and others, there are generally 2 different principles in place:

 

-Database queries are cached using a distributed caching system like memcached

 

-The databases themselves often use a shard mechanism to allow data to be distributed by key to one or more tables or databases.

 

-Typically multiple mysql servers are in play, using replication and a readers/writers split, where there is a cluster of "reader" mysql databases that are being queried.  Usually this works well because there are traditionally many more SELECT statements being executed in comparison to inserts/updates/deletes.

 

I would say for a small system, the most reasonable expectation you might have is that you may have seperate reader/writer database connections setup in your application, where initially they both use the same server, but could be an array of multiple servers in the future.

 

Also take a look at building in the use of a caching system early on.  Even for a small to medium size server, this can help maximize the performance you get, although it's fair warning that the first thing that usually happens with a site that experiences significant traffic is that more frontend webservers are needed, and the infrastructure needs to be split off a single server into a multi server setup.  This causes all sorts of problems for people in terms of where they are storing sessions, how they will distribute code and assets to their frontend farm etc.  There are many different ways to address these problems, and no magic answer. Large sites are constantly battling performance issues, but once they have enough traffic that they can monetize things, there is usually money for people to focus on scalability.

 

Thanks a ton gizmola,

 

I think creating individual table for each subcategories will be effective. Like first we create a table where we list all the subcategories and name of the tables related to those subcategories. And while creating sub-categories from back end, we'll have option to specify the attributes [properties]. Once done, we'll save the name of subcategory and the name of the new table to the indexing table. and we create a new table with attributes we mentioned from the back end. Am i right?

 

But i m still confused how we create a search form or product posting form on client side depending upon the created sub category table?? Like for apartment we need Area [textbox], Number of Bedrooms [Combobox], Pets Allowed [Checkbox].. How we manage these form fields dynamically depending upon the sub categories??

 

Thanks a lot

Watsmyname

 

 

If I understand you, yes basically that is one way to do it.  Another way would be just to use a naming convention for the tables like:

 

db_parentcat_subcat

 

Then your code can basically knows what table to query against via the convention, and can easily be formulated in a variable.

 

The easiest way to hande the mapping, would be to have a simple object/relational mapper file for each table that you code.  Often in the MVC frameworks, this would be one of the scripts that would exist in the model layer.  You could have a base class that sets up the types of UI elements and generic methods to handle each, and then you derive a child class and set that up.  Really OOP is tailor made for these types of problems.  With PHP autoloading you can also use name conventions to get autoloading of these scripts making updates to the functionality something that doesn't require a major release everytime you build a new subcategory and it's corresponding ORM model class.

 

The class name convention could be:

 

categorySubcategory.php

 

And you might expect to find something like:

 


class CategorySubCategory extends Subcategory {
   private $keys = array('id' => 'integer');
   private $columns = array('apartment' => array('type' => 'varchar', 
                                                                                'maxlength' => 40),
                              'pets' => array('type' => 'boolean'),  //etc
   function __construct() {
      parent::__construct($this->keys, $this->columns);
   }
}

 

These are mostly boilerplate configuration of the key columns and attributes, that your UI code can use to determine what sort of UI widget it should use. 

 

There are a couple of well known ORM libraries being used in the php world.  This article talks about 2 of them, just to get an idea of what they are trying to do, and you could always consider using one or the other rather than rolling your own.

 

http://tfountain.co.uk/blog/2008/7/8/propel-doctrine-comparison       

                                                                 

                                                                           

}

Thanks again,

 

The great tips, much appreciated. What i came up with is that,  we creates attributes for the subcategories, then create a new database table for that subcategory, say "tbl_apartments", and these attributes are saved in this table, simultaneously, we create two xmls for each subcategory, one will be used to create a search form and another to post new product form in the client side. I thought of creating xmls because, suppose there would be a combobox required which might have 100 of options, but we store just one selected value in the table "tbl_apartments" we created. But your idea of creating individual table for each sub-category is nice idea, the search would be pretty faster.

 

Thanks

Watsmyname

 

 

Sure glad to help.  Often people use a config file like a .xml or .ini or similar.  I personally try and have my php application configs be plain php arrays that I can include, because it limits the overhead, but the .xml is certainly a legit way of handling that problem, and certainly will offer good documentation, readability and maintainability.

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.