Jump to content

Complex project


mohcis

Recommended Posts

Hello there,

I am new to PHP and MYSQL, but I have a basic understanding of DB concept and some programming experience with other languages.

I am currently working on a project and I need your input as to the best way to design my website.

Here is the project description: [u]a vendor rating system.[/u]

[u] The process:[/u] There are 9,000 vendors who sell to customers in multiple states. A vendor may or may not operate in all States.
-Each customer has different characteristics for instance age, marital status, gender, range, etc.
-Each vendor sells a combination of different products (there are about 6 different types of product on the market and these products are mutually exclusive).
-A customer can use only one product type at a time. There are 8 different ways each product can be used. For instance, a body soap can be used to take bath and wash hand. It's not frequent to find people who collect different brand of body soap.

- Each vendor can have relationship (on a different matter: for instance exchanging information about their customers) with other vendors.
- I have the raw data about all the products sold from all the vendors along with all the characteristics previously described.

[u]Goals:[/u] I would like to assign a score (based on customer experience with the product) to each vendor. Specifically, I want to accomplish the following:

0- Define the scoring system: for instance, the percentage of specific group of satisfied customers for a vendor, after taking into account whether or not the vendor sold at least 2% of the product to the specified group.

1. The customer comes to the website, inputs the characteristics and the zipcode, then the web application will show the top ranked vendors for the customer.

2. The customer can decide not to give all the characteristics, and I still should be able to provide top vendors.

3. The customer may want to see the top vendors in the county, state or nationwide instead and the web application should be able to give it as well.

4. I also want to show the customer the list of vendors that do business with a particular vendor.

Now my questions are:

How many tables do I need in this situation?
How do I go about designing a consistent database for this?

Any help will be appreciated.

Thank you

Cisse.
Link to comment
Share on other sites

cust_id | fname | lname | ...  >> [b]customer table[/b]

vendor_id | vname | ... >> [b]vendor table[/b]

state_id | name | abbreviation | ... >> [b]state table[/b]

prod_id | name | cat_id | ... >> [b]product table[/b]

cat_id | name | ... >> [b]product category table[/b]

The ... represents any extra information required for each row in that table, such as an address for the customer table.  Given those 4 tables we can start relating your data.

rel_id | vendor_id | cust_id >> [b]vendor / customer relationship table[/b]
Each row in the table is a unique combination of vendor_id and cust_id.

rel_id | vendor_id | state_id >> [b]vendor / state relationship table[/b]
Each row in the table is a unique combination of which state a particular vendor is active in.

rel_id | vendor_id | prod_id >> [b]vendor / product relationship table[/b]
Each row in the table is a unique combination of a product carried by a vendor.

purchase_id | vendor_id | cust_id | prod_id | date | cost | ... >> [b]purchase history table[/b]
This table will allow you to generate a purchase history of all customers across all vendors
for all products.

That's just one possible DB setup and possible relating tables.  Remember to keep in mind when creating your DB how the data relates to itself.  Define clearly what a product is, what a customer is, what a vendor is, etc.  The number of tables required will depend on the types of relationships the data has.  Is the relationship between a product and vendor one to one (only that vendor carries that product) or one to many (several vendors carry that product)?  Does each vendor operate in only one state (one to one) or can they operate in several states (one to many)?
Link to comment
Share on other sites

Thank you roopurt18, this gives me another perspective.

I did not think of creating the relationships as you did. This helps alot.

I am not sure how big the purchase history table is going to be because the customer table alone has more than 3 millions records. Is there a potential to slow down mysql if I get this many records in one table?

For the querries, should I just do all the background calculations (goals 1 thru 4) and then post the results in one simple table or just let customers to querry the database from the raw data? I think the web application will be faster if I store only the results in a table. But again, the number of reports will be some what limited and the application will not be that flexible.

As to your questions, the relationship between a product and a vendor is many to many (a vendor can carry many products and several vendors can carry the same product).

Each vendor can operate in many states.

Thank you.
Link to comment
Share on other sites

I'm still learning how to design a good DB myself.  I keep hearing that MySQL is designed to handle lots of data and to be fast, but most of the time people bring up numbers like "a few hundred thousand."  I'm really not sure what kind of performance you're going to see if you have millions of records.

You say the customer table has 3 million records, is that across all vendors or per vendor?

If it's the average per vendor and you have 9k vendors, you might want to give each vendor a separate DB, all of which have the same tables.  Otherwise you'd be looking at billions of records.

But seeing as how you want to run queries against all vendors, your life will probably be simpler if everything is in the same DB.

Here's my recommendation:
Put everything into a single DB, leave indexes off your tables (with the exception of the primary key), and write your site.  Set your site up with a debugging mode that will dump all of your SQL queries as sent to MySQL.  For any sluggish portion of the site that you then come across, take note of the queries and the columns searched against (i.e. Look at the WHERE clauses).  Take note of the most common fields (product name, vendor name, etc.) and go back and use ALTER TABLE statements to add indexes on those columns.  You should see some performance gain that way.

As for the particular reports you want to display, such as vendor rating, I'd try to do them in real time first and see what kind of performance you get.  If the performance is tolerable, leave it.  If there's just too much data for a particular report to be done on the fly, consider setting up a cron job to run daily or weekly to precompile the results into a table and then you can just SELECT from that table.

I've only been working with PHP and MySQL for a little more than a year and only 6 months professionally; if I'm totally off-base hopefully someone with more experience will chime in.
Link to comment
Share on other sites

You have more experience than me, I have been working with PHP for only 1 month.  Thank you for your input, I hope to get more feedback on this.

As for the database size, I have about 12 table with more than 3 millions records. So, I have a minimum of 36 millions records in total. I am not sure if MYSQL will handle it, but I think that with a better database design, it can be done. The whole question is to find the right design.

This is what I was previously thinking:
Have all 36 millions records in one table. Then create all the possible combinations of customers' profiles by concatenating the fields.
For instance, I will concatenate Cust-Id, State, County, zipcode, vendor-Id and product to identify the products sold in a particular zip code by a particular vendor. That way, I can run a select querry from the DB. But if this has to run through 36 millions records each time someone uses the web application, I can anticipate trouble down the road.

Thanks for your input.

Cisse
Link to comment
Share on other sites

If you find that you'll go the route of precompiling your reports, I'd probably set up a cron job to run every morning at like 2 or 3AM.  The purpose of this cron is to update all reports that actually need updating.  Not necessarily every report is going to need to be updated every time vendor info changes, so when the data does change, flag which reports need updating.

How does the vendor data get updated?  Do the vendors import it?  Do you import it?
Link to comment
Share on other sites

Hello there,

The vendor data get updated by an employee of the vendor.
Like, whenever there is a sale, an employee will go online, input the information relating to the sale.

Yes, it looks like I will need a user management system.

Also, each vendor may need a full report on its activity at the end of the month. This is bunch of table. I am not considering graphs at this point, that will be over my head. The format of the report will be the same (about 50 pages) and should generate specifically on the activity of each vendor. There is a need for tracking these reports over time. Will I need a different database to store the reports in pdf format?

I need help breaking down this project into doable pieces.

Thanks!
Link to comment
Share on other sites

TBH, I would start with the UMS (user management system) since no one will be able to do anything without logging in.  The UMS will most likely be the most simple part of the website.  Working on it now will allow you to continue thinking about the rest of the DB design and still produce some working code.

I would also start gathering information from your vendors as to how they'd like to use the site.  Do they want static reports that are built every day?  Or do they want up-to-date reports generated on the fly?  Some may want both.

It doesn't matter how you program the site if it doesn't work for them.

(EDIT)  Forgot to mention something.  It might be beneficial to allow super-users to tell the site to treat them as a particular user.  For example, you're the admin, you're logged in, but you need to switch to one of your regular users to test something.  Rather than logging out and back in, just add a feature for admins to switch to a specific user and then switch back when they're done.  It's something I wish I had done.
Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.