Jump to content

Solved the problem with my 'zip codes'.Look at my solution. is it a go or no go?


Recommended Posts

just a little recap:

 

i have a table who's structure is as follows:

 

CREATE TABLE `elgg_profile_data` (

  `ident` int(10) unsigned NOT NULL auto_increment,

  `owner` int(10) unsigned NOT NULL default '0' COMMENT '-> users.ident',

  `access` varchar(20) collate utf8_unicode_ci NOT NULL default 'PUBLIC',

  `name` varchar(255) collate utf8_unicode_ci NOT NULL default '',

  `value` text collate utf8_unicode_ci NOT NULL,

  PRIMARY KEY  (`ident`),

  KEY `owner` (`owner`,`access`,`name`)

) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=19001 ;

 

 

as you can probably tell, each attribute about a user is stored on a separate row ie if a user has a firstname, lastname, and middle name properties, they will have 3 corresponding rows in the database with the row.name = firstname, lastname and middle name.  and it's corresponding row.value = mike jones john.

 

here's the query i developed to search for a user.

 

select * from elgg_users eu where ident in

 

(

SELECT owner FROM elgg_profile_data WHERE name='firstname' and value = 'mike'

)

 

and eu.ident in

(

SELECT owner FROM elgg_profile_data WHERE name='lastname' and value = 'jones'

)

 

and eu.ident in

(

SELECT owner FROM elgg_profile_data WHERE name='middlename' and value = 'john'

)

 

 

now i pumped this database with over 1 million rows of dummy data, and this query took 9 seconds.  is this unacceptable?

 

not looking for coding help, just a "it's a go", or "its a no go" (and why!)

 

thanks so much for everyone who has already helped

 

One more thing....

 

here is the query, right:

 

 

select * from elgg_users eu where ident in

 

(

SELECT owner FROM elgg_profile_data WHERE name='firstname' and value = 'mike'

)

 

and eu.ident in

(

SELECT owner FROM elgg_profile_data WHERE name='lastname' and value = 'jones'

)

 

and eu.ident in

(

SELECT owner FROM elgg_profile_data WHERE name='middlename' and value = 'john'

)

 

 

 

 

 

now this is only for a 3 attribute search.  i can build a bigger query by just adding more:

 

 

and eu.ident in

(

SELECT owner FROM elgg_profile_data WHERE name='middlename' and value = 'john'

)

 

 

clauses.  will this be efficient for a query that could be as specific as 10 elements?

It depends on what you're doing... if you're doing some reporting 9 seconds probably isn't too big of a deal. If you're doing real time transaction processing for a mainstream application... then no 9 seconds won't work.

 

All that being said, given your data model I think the reality is this is about all you have to work with. It's going to be more efficient to process this stuff on the database server than to pull back a high level query and then sort through it in code, but subqueries are definitely expensive.

It depends on what you're doing... if you're doing some reporting 9 seconds probably isn't too big of a deal. If you're doing real time transaction processing for a mainstream application... then no 9 seconds won't work.

 

All that being said, given your data model I think the reality is this is about all you have to work with. It's going to be more efficient to process this stuff on the database server than to pull back a high level query and then sort through it in code, but subqueries are definitely expensive.

 

 

hey dbo.  What is a good response time?  when you talk about doing the work on the server are you talking about using stored procedures?

 

just one more quick question for you too, would it be quicker if i created another table based off my profile_data table so that one row would hold all of that user's attributes?  for example

 

 

firstname          middlename        lastname            age              foo            bar            key

---------------        ---------------------      -----------------          ---------          --------        --------          --------

mike                  john                      jones                30                bar            foo              001

 

 

then i wouldnt need subqueries.  it would be select key where foo='bar' and age='30' and middlename='john'...... etc.

 

what are your thoughts on making a seperate table for queries

When I said doing work on the server I really meant the database server... not necessarily using stored procedures so much as just running the query entirely on the database. The alternative is obviously to run some query that gets the data that you want... then having to write PHP code to further refine your search. It's just more efficient to let the DB server do the work if you can. Stored procedures don't buy you a ton... they really just save you the overhead of sending the query across the wire... since were dealing with a small amount of text it doesn't buy you much.

 

Your proposed model is obviously easy to query, the risk you run however is if you want to add more fields to the table. In this model your table grows wider and wider and can get messy in a hurry. The current model allows for more flexibility in this department, but is harder to query. You've got to make the judgment call depending on what you're doing now, and what you might want to do in the future.

 

You've got to be careful not to create redundancy in data, that's a big thing that databases try to avoid, so if you decide to change the model, I'd truly make it a change and not add it in addition to - UNLESS - you create a whole new database that runs a batch process and nightly updates a reporting database. By doing this you can leave the application and DB as they are today, and move that data to an environment that is easier to write reports against, but if it has to be real time this probably won't work either.

 

I wish I could give a better answer, hopefully I've at least given you some food for thought.

When I said doing work on the server I really meant the database server... not necessarily using stored procedures so much as just running the query entirely on the database. The alternative is obviously to run some query that gets the data that you want... then having to write PHP code to further refine your search. It's just more efficient to let the DB server do the work if you can. Stored procedures don't buy you a ton... they really just save you the overhead of sending the query across the wire... since were dealing with a small amount of text it doesn't buy you much.

 

Your proposed model is obviously easy to query, the risk you run however is if you want to add more fields to the table. In this model your table grows wider and wider and can get messy in a hurry. The current model allows for more flexibility in this department, but is harder to query. You've got to make the judgment call depending on what you're doing now, and what you might want to do in the future.

 

You've got to be careful not to create redundancy in data, that's a big thing that databases try to avoid, so if you decide to change the model, I'd truly make it a change and not add it in addition to - UNLESS - you create a whole new database that runs a batch process and nightly updates a reporting database. By doing this you can leave the application and DB as they are today, and move that data to an environment that is easier to write reports against, but if it has to be real time this probably won't work either.

 

I wish I could give a better answer, hopefully I've at least given you some food for thought.

 

 

exactly the kind of answer I was looking for.  many good points that i will certainly look into.  Thank You

You're welcome. Let me know where you end up. I'm interested to see your solution.

 

 

Hey dbo

 

what I ended up doing was creating a different table with ONLY the searchable fields included.  I figured that since I will ONLY EVER use 9 fields to search, I wont need to make this table any wider.  I will still be able to create additional fields for the profiles, but these will be displayed on the profile as tags.  These tags are searchable via my tags algorithm, which is completely independent of my new table. 

 

This new table has DRASTICALLY reduced my query time from 9 seconds to .10 seconds when I search all 9 fields on a 100,000 member dummy db.

 

OK, now can I get some input on these changes, and I have a couple questions about this approach.  ;D

 

First, is .1 seconds acceptable for a query on a live site?  If not, what range should I aim to get it within.  (now that i think about it, i havent added an index to that table, so that should speed it up even more)

 

Second, knowing that I will not ever expand my search table, was adding this table a smart thing to do to speed up my app? 

 

 

Thanks!!

#1 - This is definitely acceptable.

 

#2 - It's fine to have taken this approach if you eliminated the other table. If you've added this table in addition and have redundant data then not so much... because you've now introduced a dependency that these two tables be in sync or you could have problems.

#1 - This is definitely acceptable.

 

#2 - It's fine to have taken this approach if you eliminated the other table. If you've added this table in addition and have redundant data then not so much... because you've now introduced a dependency that these two tables be in sync or you could have problems.

 

 

OK, now I can understand what you were talking about with the redundancy.  If i change one of the tables, the other must change as well, introducing more work.  How about this:  I ONLY insert the search data into the search table, then extended profile data goes in the profile_data table.  I believe this would remove the dependency and redundancy, while allowing for flexible profiles and quick search.

 

You have really helped.  Thank you!

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.