Jump to content

question re: indexing highly dynamic database tables


amedhussaini

Recommended Posts

although i'm in the beginning stage of web development with php/mysql, i'm always looking ahead trying to forecast potential problems ahead of time.

 

my question pertains to social networking sites and database design.  its my understanding that to replicate a circle of friends, etc, it only requires two tables.  if these tables have hundreds of thousands of rows, how do these sites spit out queries so fast?  from what i know, an index speeds up query returns, but every time a table changes, an index must be created again from the new data table.  my question then:

 

1.  100,000 rows of data in a table.  indexing is on.  we insert a new row of data.. now there are 100,001 rows of data.  will the re-index take a lot of cpu cycles or time?  or is it negligible since its only a small modification to the 100,000 rows of data already indexed?

 

thanks in advance!

 

 

cheers,

amed

they use unique ids when they set up the table... so row 1 has an id of 1, row 2 has #2, etc.

 

So they just do "SELECT * FROM table WHERE id='$id'"

 

and when they insert, they don't bother to do 'id' VALUES ('id'), its automatically done.

i think that goes without saying..

 

let me restate the problem again, just in case i stated it incorrectly..

 

Table 1

[id] [username] [blah] [blah]

 

Table 1 is a basic "members" table.

 

Table 2 - add a friend

[id] [id -from table 1] [id of new friend, also from table1]

 

each time you add a friend, an entry is put in table 2.  to pull all of Sarah's friends, for example, we'd query table 2 for all entries that have her id.

 

so.. my question was, if table 2 has hundreds of thousands of entries, perhaps even millions, how do these social networks spit out returns so quickly?  cacheing?  indexing?

 

if it is indexing, by indexing the second column in table 2, each time a person adds a friend, a new row is inserted into table 2 and it has to be indexed again.. is this re-index insanely cpu intensive?  or marginal?

 

I hope that is a bit more clear.. I apologize if it just mucks things up lol.

yeh I wonder how too

 

Its marginal though mate, I had a table with 7.5 million rows in that was the most frequently accessed and updated table

 

Still, queries including inserts updates and selects where instant

 

The genious of the database I guess

 

It does get a bit dirty as you go on though so always better to split mysql load or cluster

 

I guess thats what the big dogs do

 

Does anyone else have any input on this

They could (and probably do) decide to have the data in the friends table stored in such a way that all friends of person one are stored in the same block(s) on the hard disk (rather than spread around different blocks). This would reduce the number of read operations required to actually get the information from the hard disk to the memory.  This might mean that multiple instances of the same friend relationship occur to help speed up these queries.

 

id | user_id | friend_id | storage_block
1  | 1       | 2         | 1
2  | 1       | 3         | 1
3  | 2       | 1         | 2
4  | 3       | 1         | 3
5  | 1       | 4         | 1
6  | 4       | 1         | 4

 

Then all the index would need to store is that for user_id=1 the storage_block=1

 

Which means that we could get all friends of user_id = 1 in one block read (rather than 4 block reads).

 

Of course, this is a simplified example :P. When there are millions of rows you can see how this approach would help save a lot of time.

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.