question re: indexing highly dynamic database tables

amedhussaini · April 21, 2008

although i'm in the beginning stage of web development with php/mysql, i'm always looking ahead trying to forecast potential problems ahead of time.

my question pertains to social networking sites and database design. its my understanding that to replicate a circle of friends, etc, it only requires two tables. if these tables have hundreds of thousands of rows, how do these sites spit out queries so fast? from what i know, an index speeds up query returns, but every time a table changes, an index must be created again from the new data table. my question then:

1. 100,000 rows of data in a table. indexing is on. we insert a new row of data.. now there are 100,001 rows of data. will the re-index take a lot of cpu cycles or time? or is it negligible since its only a small modification to the 100,000 rows of data already indexed?

thanks in advance!

cheers,

amed

ohdang888 · April 21, 2008

they use unique ids when they set up the table... so row 1 has an id of 1, row 2 has #2, etc.

So they just do "SELECT * FROM table WHERE id='$id'"

and when they insert, they don't bother to do 'id' VALUES ('id'), its automatically done.

amedhussaini · April 21, 2008

i think that goes without saying..

let me restate the problem again, just in case i stated it incorrectly..

Table 1

[id] [username] [blah] [blah]

Table 1 is a basic "members" table.

Table 2 - add a friend

[id] [id -from table 1] [id of new friend, also from table1]

each time you add a friend, an entry is put in table 2. to pull all of Sarah's friends, for example, we'd query table 2 for all entries that have her id.

so.. my question was, if table 2 has hundreds of thousands of entries, perhaps even millions, how do these social networks spit out returns so quickly? cacheing? indexing?

if it is indexing, by indexing the second column in table 2, each time a person adds a friend, a new row is inserted into table 2 and it has to be indexed again.. is this re-index insanely cpu intensive? or marginal?

I hope that is a bit more clear.. I apologize if it just mucks things up lol.

jaymc · April 21, 2008

yeh I wonder how too

Its marginal though mate, I had a table with 7.5 million rows in that was the most frequently accessed and updated table

Still, queries including inserts updates and selects where instant

The genious of the database I guess

It does get a bit dirty as you go on though so always better to split mysql load or cluster

I guess thats what the big dogs do

Does anyone else have any input on this

Gamic · April 21, 2008

They could (and probably do) decide to have the data in the friends table stored in such a way that all friends of person one are stored in the same block(s) on the hard disk (rather than spread around different blocks). This would reduce the number of read operations required to actually get the information from the hard disk to the memory. This might mean that multiple instances of the same friend relationship occur to help speed up these queries.

id | user_id | friend_id | storage_block
1  | 1       | 2         | 1
2  | 1       | 3         | 1
3  | 2       | 1         | 2
4  | 3       | 1         | 3
5  | 1       | 4         | 1
6  | 4       | 1         | 4

Then all the index would need to store is that for user_id=1 the storage_block=1

Which means that we could get all friends of user_id = 1 in one block read (rather than 4 block reads).

Of course, this is a simplified example . When there are millions of rows you can see how this approach would help save a lot of time.

jaymc · April 21, 2008

How do you think tables that have 200 million rows are dealt with

Sign In

question re: indexing highly dynamic database tables

Recommended Posts

amedhussaini

Link to comment

Share on other sites

ohdang888

Link to comment

Share on other sites

amedhussaini

Link to comment

Share on other sites

jaymc

Link to comment

Share on other sites

Gamic

Link to comment

Share on other sites

jaymc

Link to comment

Share on other sites

Archived

Browse

Activity

Important Information