Jump to content

Recommended Posts

I am working with a ridiculous database and am having trouble querying efficiently. The database is set up something like this:

 

form_answers

  formID

  answerID

 

form_text_answers

  answerID

  value

 

form_date_answers

  answerID

  value

I first reduce the form_answers to a list that contains answerIDs that are in both form_text_answers and form_date_answers. How can I return the values from both of those tables though?

 

The guy that has done all of the previous queries actually joins each table with a new query that reduces the form_answers table for every table that he needs the value field from.

 

The only (somewhat) efficient way I can think of to do that is like this:

SELECT
   (SELECT answerID FROM form_answers WHERE [criteria]) as answers,
   fta.value,
   fda.value
FROM
   form_text_answers fta,
   form_date_answers fda
WHERE
   fta.answerID IN answers AND
   fda.answerID IN answers

 

However, MySQL doesn't recognize 'answers' as a list that I can look in.

 

Anyone have any ideas or am I going to have to stick with the inefficiency theme that everything else is already in?

 

Thanks for any help

Link to comment
https://forums.phpfreaks.com/topic/181782-joining-ids-based-on-a-single-list/
Share on other sites

Thanks for the help. That query returns an empty set though. If I do each of those CROSS JOINs separately I get results, though. I'm assuming the first JOIN is truncating all of the values that would match the second JOIN. Any other ideas?

 

Thanks

Yeah that works, thanks. I'm always hesitant to use any JOINS at all, but this seems like it is going to be the most efficient way to do it. I am running into another problem, though; I thought you could use a sub-query in the WHERE clause to compare to, but a relationship in the sub-query is restricting the super-query.

SELECT
                     fa.*,
                     fta.value,
                     fda.value,
                     fsa.value
                  FROM
                     form_questions fq,
                     form_subcategories fsc,
                     form_answers fa
                  LEFT JOIN
                     form_text_answers fta ON fta.answerID = fa.answerID
                  LEFT JOIN
                     form_date_answers fda ON fda.answerID = fa.answerID
                  LEFT JOIN
                     form_select_answers fsa ON fsa.answerID = fa.answerID
                  LEFT JOIN
                     form_number_answers fna ON fna.answerID = fa.answerID
                  WHERE
                     fa.questionID = fq.questionID AND
                     fq.subCategoryID = fsc.subCategoryID AND
                     fsc.categoryID = 1 AND
                     (SELECT fda2.value FROM form_answers fa2, form_date_answers fda2 WHERE fa2.questionID = 4 AND fa2.answerID = fa.answerID AND fda2.answerID = fa2.answerID)

 

If I leave out that last line I get 15 thousand results, but with it in, I get one thousand and it nly returns results with a question ID of 4. This has to be because of the "fa2.answerID = fa.answerID" part, but how can I make that relationship in the sub-query without restricting the super-query?

 

The reason I need to do a sub-query is because I need all questionIDs returned, but I need to compare that value if the ID is 4.

 

Thanks for any help.

Sorry, this database is very confusing.

 

The form_date_answers table contains all values associated with a particular form that are dates. The form_answers table has a questionID that relates to a specific question. All of the results that I am getting will have a value in the form_date_answers where the answerID relates to the one in form_answers and the questionID will be 4 (There will also be ones with other questionIDs). If the question ID is 4, I need to check if the date is greater than the current day. Because of this, I can't do a comparison based on what is already being returned. I need a sub-query to compare the value ONLY if it is a questionID of 4.

 

Here is an example result set returned:

 

answerID 	formID 	questionID 	value 	value 	value
903 	63 	1 	data1... 	NULL 	NULL
1197 	90 	1 	data2 ... 	NULL 	NULL
1232 	94 	1 	data3 ... 	NULL 	NULL
1246 	95 	1 	data4... 	NULL 	NULL
1260 	96 	1 	data5... 	NULL 	NULL
1274 	97 	1 	data6... 	NULL 	NULL
1288 	98 	1 	data7... 	NULL 	NULL

 

Since I am using LEFT JOIN, it returns a ton of NULL values for the ones that don't exist. As you can see, the three values are from the three respective answers tables. If the data was in the second value column, it would be from the date table. In that case, I would need to see if that value related to the questionID 4 and then compare it to NOW(); however, if the questionID wasn't 4, I don't want to compare it.

 

Thanks again for looking at this for me.

Hmm...

 

How about

SELECT
                     fa.*,
                     IF(fa.questionID = 4,fda.value = NOW(),NULL) AS callItSomeHow,
                     fta.value,
                     fda.value,
                     fsa.value
... //rest stays the same

I had no idea there was an IF function in MySQL. That could be very useful, thank you.

 

Unfortunately, I think I just realized that this isn't possible using the LEFT JOINs. Since all of the data is returned in separate rows, there isn't a way to limit the related rows by what date is in a different one. On top of that, if I get the result un-normalized like that, I will have to loop through them all and there are WAY too many for that to be efficient. I'm beginning to think the most efficient way to do this would actually be to do two separate queries. The only other way I can think of is to do it like the other queries are on this system where each table JOINs to a new sub-query with all of the conditions in it.

 

I will write up a couple queries and post them here (tomorrow). Maybe someone can tell me which makes more sense.

 

Thanks again for helping.

After a bunch of different methods, I started leaning toward a certain way of doing this as being the most efficient. Then, after testing, it wasn't efficient at all. In fact, it is much slower than a similar query that uses ten joins. Can someone explain why the query with all of the joins is faster?

 

This query averages .75 seconds:

SELECT
                     fa_end.formID,
                     fda.value
                  FROM
                     form_answers fa_end,
                     form_answers fa_act,
                     form_questions fq,
                     form_subcategories fsc,
                     form_date_answers fda,
                     form_select_answers fsa
                  WHERE
                     fa_end.questionID = fq.questionID AND
                     fq.subCategoryID = fsc.subCategoryID AND
                     fsc.categoryID = 1 AND
                     fa_end.questionID = 4 AND
                     fsa.value = 'Yes' AND
                     fa_act.questionID = 155 AND
                     fda.answerID = fa_end.answerID AND
                     fsa.answerID = fa_act.answerID AND
                     fda.value > NOW()
                     GROUP BY formID

 

This query averages .05 seconds:

SELECT fa.*
                  FROM forms f
                  INNER JOIN form_answers fa ON f.formID = fa.formID
                  INNER JOIN form_questions fq ON fa.questionID = fq.questionID
                  INNER JOIN form_subcategories fsc ON fq.subCategoryID = fsc.subCategoryID
                  LEFT JOIN form_text_answers fta ON fa.answerID = fta.answerID
                  INNER JOIN
                  (
                     SELECT fa.formID, fda.value
                     FROM form_answers fa
                     INNER JOIN form_date_answers fda ON fa.answerID = fda.answerID
                     WHERE fa.questionID = 4
                     AND fda.value > CURRENT_DATE()
                  ) AS fda ON fa.formID = fda.formID
                  INNER JOIN
                  (
                     SELECT fa.formID
                     FROM form_answers fa
                     INNER JOIN form_select_answers fsa ON fa.answerID = fsa.answerID
                     WHERE fa.questionID = 155
                     AND fsa.value != 'No'
                  ) AS fsa ON fa.formID = fsa.formID
                  INNER JOIN
                  (
                     SELECT fa.formID, fna.value, COUNT(*) AS currentCount
                     FROM form_answers fa
                     INNER JOIN form_number_answers fna ON fa.answerID = fna.answerID
                     WHERE fa.questionID = 7
                     GROUP BY fa.formID
                     HAVING currentCount < fna.value
                  ) AS fna ON fa.formID = fna.formID
                  WHERE fsc.categoryID = 1
                  GROUP BY fa.formID

 

 

Thanks

Sorry for asking this of you, but after reading about EXPLAIN and analyzing the results, it still looks to me that the first one would be faster. The only part where I am unsure of which result is faster is the select_type column where the first one uses all "SIMPLE" selects and the second uses "PRIMARY" and "DERIVED." It looks like that is just because the first one isn't using any JOINs, but I would expect that to be faster anyway. Am I missing something specific about these results?

 

slow query:

id select_type table type possible_keys key key_len ref rows Extra

1 SIMPLE fq const PRIMARY,subCategoryID PRIMARY 4 const 1 Using temporary; Using filesort

1 SIMPLE fsc const PRIMARY,categoryID PRIMARY 8 const,const 1 Using index

1 SIMPLE fa_end ref PRIMARY,questionID questionID 4 const 464 Using where; Using index

1 SIMPLE fa_act ref PRIMARY,questionID questionID 4 const 1037 Using index

1 SIMPLE fda eq_ref PRIMARY PRIMARY 4 cec.fa_end.answerID 1 Using where

1 SIMPLE fsa eq_ref PRIMARY PRIMARY 4 cec.fa_act.answerID 1 Using where

 

faster query:

id select_type table type possible_keys key key_len ref rows Extra

1 PRIMARY <derived2> ALL NULL NULL NULL NULL 12 Using temporary; Using filesort

1 PRIMARY f ref PRIMARY PRIMARY 4 fda.formID 1 Using index

1 PRIMARY fa ref formID,questionID formID 4 cec.f.formID 5 Using where; Using index

1 PRIMARY <derived4> ALL NULL NULL NULL NULL 961 Using where

1 PRIMARY <derived3> ALL NULL NULL NULL NULL 985 Using where

1 PRIMARY fq eq_ref PRIMARY,subCategoryID PRIMARY 4 cec.fa.questionID 1

1 PRIMARY fsc eq_ref PRIMARY,categoryID PRIMARY 8 cec.fq.subCategoryID,const 1 Using index

1 PRIMARY fta eq_ref PRIMARY PRIMARY 4 cec.fa.answerID 1 Using index

4 DERIVED fa ref PRIMARY,questionID questionID 4   278 Using where; Using index; Using temporary; Using f...

4 DERIVED fna eq_ref PRIMARY PRIMARY 4 cec.fa.answerID 1

3 DERIVED fa ref PRIMARY,questionID questionID 4   1037 Using index

3 DERIVED fsa eq_ref PRIMARY PRIMARY 4 cec.fa.answerID 1 Using where

2 DERIVED fa ref PRIMARY,questionID questionID 4   464 Using index

2 DERIVED fda eq_ref PRIMARY PRIMARY 4 cec.fa.answerID 1 Using where

 

Thanks again for the help.

(Sorry the tabs don't line up.)

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.