Psycho Posted October 7, 2019 Share Posted October 7, 2019 I'm working on a project where a user will be able to create custom "surveys" which other users will fill out. Since the questions in the survey will dynamically created, I would think the *proper* way to construct this would be something like this: Questions table: question_id (PK), survey_id (FK), question_text, etc. User Results: result_id (PK), user_id (FK), date_taken, etc. Data Results table: table: data_result_id (PK), result_id (FK), question_id (FK), response, etc. So, the data for a user's response to a survey might look something like this in the data results table data_result_id | result_id | question_id | response 55 4 22 True 56 4 23 Blue 57 4 24 2 58 4 25 Capricorn So all the data is columnar and each user submission is associated based on the result_id. This is all well and good except . . . I think this would make it difficult to create dynamic reports based on responses. E.g. To do something simple such as get all the surveys where the response to question #2 is blue I would have to do something like this: SELECT * FROM data_results WHERE result_id IN ( SELECT result_id FROM user_results JOIN data_results ON user_results.result_id = data_results.result_id WHERE survey_id = 8 AND question_id = "Blue") Doing more in-depth analysis would get even more complicated. Now, if the results were stored in this format it would be much simpler to create/run dynamic reports: result_id | parent | favorite_color | no_of_cars | zodiac_sign 4 True Blue 2 Capricorn Then, I could run just SELECT * FROM survey1_results WHERE favorite_color = "Blue" But that would require processes that are generally considered bad practice: I would have to dynamical create a result table for each survey and I would have to dynamically add/remove columns from that table when the survey is being created/edited. So, is this an instance where having a process for a user to dynamically create tables/fields is the exception that proves the rule or am I overestimating the complexity of the first solution? Not sure which way to go and don't want to code myself into a corner. Quote Link to comment https://forums.phpfreaks.com/topic/309343-suggestion-on-db-schema/ Share on other sites More sharing options...
requinix Posted October 7, 2019 Share Posted October 7, 2019 You can go the relational/SQL way or the document/NoSQL way. The SQL way is the first version where you normalize everything. Definitely do not have tables with columns according to the question: it sounds nice with your hardcoded example query, but when it comes time for something dynamic you'll have to pick the column names dynamically and that's a sign of Bad Stuff. Yeah, the full query looks worse, but any good database system will be able to handle that very easily. The NoSQL way is to store each result in something like JSON form. It's still indexable, and the query to search is a little less complicated than the relational one, but it's... well, it's not relational. question_id | result ------------+------------------ 4 | { | "answers": { | "22": true, | "23": "Blue", | "24": 2, | "25": "Capricorn" | } | } (that's a hybrid relational/document style, like MySQL or PostgreSQL with JSON columns; true NoSQL doesn't have table columns and the data would be more like { "question_id": 4, "result": { ... } }) Relational works better when you need to set individual values as you go, since with a document the whole thing needs to be rewritten if some part of it changes. The downside is you have to normalize it pretty hard to get a usable schema. But then with documents, your application has to be really sure it's writing data in the proper structure because it's all essentially free-form. Quote Link to comment https://forums.phpfreaks.com/topic/309343-suggestion-on-db-schema/#findComment-1570421 Share on other sites More sharing options...
Psycho Posted October 7, 2019 Author Share Posted October 7, 2019 (edited) Thanks @requinix. Two questions: 1) Your example has a "question_id" with four results for different questions. I assume the first column is supposed to the the result_id (the unique identifier for a user submission of a survey)? 2) You state that "It's still indexable, and the query to search is a little less complicated than the relational one . . . ". How would you query the data if stored in that manner. The example I gave before was just a simple one. This will require more complicated queries to look for trends in the data. E.g. primarily there will be a date field and a location field which will be be used to see how the other values deviate based on the date and location values. If this was a "normal" table (each column is a different field) it would be easy. I know I *can* do it with the first example I gave, it just seem unnecessarily hard. And, I have no clue how I would do it with the example you gave. Any references you can point me to? EDIT: Perhaps it is not that "hard" to do it with the normalized table, but I'm having a hard time conceptualizing how I would query/process the data. Edited October 7, 2019 by Psycho Quote Link to comment https://forums.phpfreaks.com/topic/309343-suggestion-on-db-schema/#findComment-1570424 Share on other sites More sharing options...
requinix Posted October 7, 2019 Share Posted October 7, 2019 30 minutes ago, Psycho said: 1) Your example has a "question_id" with four results for different questions. I assume the first column is supposed to the the result_id (the unique identifier for a user submission of a survey)? Yeah, probably. I didn't really think too hard about what was in the table. Point is, the row corresponds to one person's survey results. Quote 2) You state that "It's still indexable, and the query to search is a little less complicated than the relational one . . . ". How would you query the data if stored in that manner. Depends on the database. PostgreSQL goes like ...WHERE result->'answers'->>'23' = 'Blue' Quote The example I gave before was just a simple one. This will require more complicated queries to look for trends in the data. E.g. primarily there will be a date field and a location field which will be be used to see how the other values deviate based on the date and location values. If this was a "normal" table (each column is a different field) it would be easy. I know I *can* do it with the first example I gave, it just seem unnecessarily hard. And, I have no clue how I would do it with the example you gave. Any references you can point me to? References? No. I think your issue is just wrapping your head around what it is and how it works. You can kinda think of the arrow chain above as a "column", in that it will get you a value from the JSON. If you wanted to query the table for results between a date range, it might be like ...WHERE result->'answers'->>'123' BETWEEN '2019-01-02 03:04:05' AND '2019-06-07 08:09:10' AND result->'answers'->>'456' = 789 - 'answers' is the tree structure in the JSON with the answers - '123' is the ID of the date question, whose answers are dates - the two dates are strings; your application knows (1) the question #123 is a date type so (2) it knows to build the BETWEEN...AND part of the query - '456' is the ID of the location question, whose answers are IDs corresponding to a list in some table somewhere of possible answers to choose from - 789 is the ID of the answer (location) to search for The code looks like $where = $placeholders = []; foreach ($things_to_search_for as $question_id => $search) { $question_info = get_question_info($question_id); if ($question_info["type"] == "date") { if ($search["type"] == "range") { $where[] = "result->'answers'->> ? BETWEEN ? AND ?"; $placeholders[] = $question_id; $placeholders[] = $search["start"]; $placeholders[] = $search["end"]; } } else if ($question_info["type"] == "choose") { $where[] = "result->'answers'->> ? = ?"; $placeholders[] = $question_id; $placeholders[] = $search["id"]; } } $query = $db->prepare("SELECT * FROM results_table WHERE " . implode(" AND ", $where)); $query->execute($placeholders); (If you're wondering, -> and ->> are actually operators, which means they can use placeholders) For the normalized one, I'm not sure what you're asking about. It seemed like you had it: one query to find the result IDs where the answer to a particular question fits particular criteria, and another to find all the answers for each result. The answers table would probably have multiple columns for the generic types necessary (integer, string, date/time, maybe one or two more) and you piece together the WHERE conditions much like above. Quote Link to comment https://forums.phpfreaks.com/topic/309343-suggestion-on-db-schema/#findComment-1570425 Share on other sites More sharing options...
Barand Posted October 7, 2019 Share Posted October 7, 2019 (edited) The simple part is creating an application to create different surveys. The hard part is an app for analyzing the results of any survey that is thrown at it. You are going to have infinite combinations of different types of answer simple Boolean (Y/N) single value number/date/text(avoid) single choice list multiple choice list value ranges (crap, poor, OK, good, excellent) with cross-tabulations/pivot tables between any two responses (eg average/count/total of X for each Y) Frequency charts to analyze responses I suppose it's a bit like printers (sell it cheap and make your money selling the ink cartridges). In this case you sell the ability to create surveys then monetize it by selling customised analysis services. Something like this should work Edited October 7, 2019 by Barand add pic Quote Link to comment https://forums.phpfreaks.com/topic/309343-suggestion-on-db-schema/#findComment-1570426 Share on other sites More sharing options...
Psycho Posted October 7, 2019 Author Share Posted October 7, 2019 (edited) @Barand the "questions" absolutely have different available response values (Boolean, Int, Varchar, etc.). But, if I go with a "normalized" response table where all the values are in a single column, I assumed I would have to make the response column a Varchar type to store a textual representation of whatever the response is. Otherwise, I'm at a loss on how to structure the responses to individual questions when the questions (and possible responses) can be changed. However, I think I've had a mental breakthrough - at least with using the "normalized" data structure. It might help to give an example of how the data will be used. This is for collectors of antique weapons to gather data on particular items to ascertain when specific production methods or properties changed over time or at different manufacturing facilities. A relateable analog of this might be for car production. So, let's say they are interested in Ford vehicles and wanted to know how the paint colors varied year over year at each plant. (Assume that the survey for each car make requires different fields, so the survey for Ford is different from other manufacturers) If I created a specific table just for the Ford vehicles surveys (non-normalized), the query would be something like this: Quote SELECT production_year, plant, auto_type, color FROM FORD_SPECIFIC_SURVEY_results_table WHERE auto_type = 'sedan' OR auto_type = 'hatchback' ORDER BY production_year, plant Now, if I needed the same data from the normalized, single results table approach I think it would work like this: Quote SELECT r.result_id, q.question, r.response -- result_id would probably be better described as submission_id -- but leaving it the same as referenced above to prevent confusion FROM ALL_SURVEYS_results_table AS rJOIN questions AS q ON q.question_id = r.question_id WHERE -- ## CONDITION TO IDENTIFY THE SUBMISSIONS I WANT -- ## BASED ON SPECIFIC SURVEY AND ANY FILTERING result_id IN ( --Get the unique survey identifiers for specific survey with the filtering SELECT result_id FROM user_results JOIN data_results ON user_results.result_id = data_results.result_id WHERE survey_id = 8 -- Identifier for specific survey AND (question_id = 5 AND response = 'sedan') -- Identifier/value for filtering on the the auto_type field OR (question_id = 5 AND response = 'hatchback') -- Identifier/value for filtering on the the auto_type field ) -- ## CONDITION TO IDENTIFY THE FIELDS/VALUES I WANT -- ## PASSING THE SPECIFID QUESITON IDs AND r.question_id IN (5, 12, 15) -- IDs for the production_year, plant & color fields Of course, this requires more post-processing of the data in order to use it for reports, graphs, etc. But, I *think* this approach will work: i.e. all selection/filtering of specific submissions will take place within the "result_id IN" subquery and the field selection would be done with the "question_id IN" condition. Thoughts? Edited October 7, 2019 by Psycho Quote Link to comment https://forums.phpfreaks.com/topic/309343-suggestion-on-db-schema/#findComment-1570428 Share on other sites More sharing options...
Barand Posted October 7, 2019 Share Posted October 7, 2019 So I guess you'd have a front-end form for the user so they can select various options ( varying depending on the survey) and then submission of the form would generate the above query and execute it. The challenge then becomes the program which, given a specific survey, will generate that front-end form. Quote Link to comment https://forums.phpfreaks.com/topic/309343-suggestion-on-db-schema/#findComment-1570432 Share on other sites More sharing options...
Barand Posted October 7, 2019 Share Posted October 7, 2019 FYI, if you are considering the JSON route suggested by @requinix there is a MySQL example here Quote Link to comment https://forums.phpfreaks.com/topic/309343-suggestion-on-db-schema/#findComment-1570433 Share on other sites More sharing options...
Psycho Posted October 8, 2019 Author Share Posted October 8, 2019 20 hours ago, Barand said: FYI, if you are considering the JSON route suggested by @requinix there is a MySQL example here I'm definitely considering this. It seems it would be infinitely easier. The linked example you provided helped me visualize it better. Quote Link to comment https://forums.phpfreaks.com/topic/309343-suggestion-on-db-schema/#findComment-1570445 Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.