Jump to content

A data structure problem !


Go to solution Solved by Jacques1,

Recommended Posts

Hi all,

 

I wish to define a hierarchical relation in a table or set of tables. I'll use an example to describe the problem.

 

In a hotel there is One Manager(M), Assistant Managers(AMs) and a Deputy Managers(DMs) and each of them can recruit 

executives for the hotel. 

  

1. The Manager(M) can recruit the AMs and the DMs.

2. Further the AM can recruit a DM which must be approved(AP_M) by the Manager. // AP_M -> field for approval by the Manager

3. The AM can directly recruit executives and they must also be approved by the Manager.(AP_M)

4. The DM can recruit executives and they must be approved by the AM (AP_AM) and the Manager (AP_M). // AP_AM -> approval by     AM

 

 

Further :

 

a) The Manager can view the performance(P) of all the AMs, DMs, and executives in the hotel whether recruited by himself 

    or by the AMs or DMs.

 

b) An AM can view the performance (P) of all the DMs that he recruits, all executives recruited by his DM's and 

    all executives recruited by the AM directly.

 

c) The Manager cannot recruit executives. That is done by the AM and DM. 

 

Please can someone help me define the table/tables and help implement these relations in them. 

 

Thanks all!

 

Link to comment
https://forums.phpfreaks.com/topic/301416-a-data-structure-problem/
Share on other sites

OK, so you explained the scenario, but you didn't give us anything about the context of the data itself.

 

1> what's the point of AP_AM on DM recruitment if it still requires AP_M anyway?  That's illogical.  Either the AM can authorise or they can't.

2> what performance data are you talking about?

3> you say in the second paragraph that M can recruit exec's but then later have a condition that they can not. Which is accurate?

4> what information relating to the authorisation process is relevant? e.g. authRequestDate, authRequestUser, authGrantUser, authGrantDate, authDeclineReason etc.

 

From what you describe it's a simple tree structure hierarchy that your looking for, but there are still a couple of ways to build that depending on how much data surrounds each aspect of the process and how relevant that data is to the other data being used.

1> what's the point of AP_AM on DM recruitment if it still requires AP_M anyway?  That's illogical.  Either the AM can authorise or they can't.

Sounds like it requires approval from both the M and AM. So two flags, like in case the M approves before the AM or vice versa.

 

+1 to how it depends what other data is involved. Do the M, AM, DM, and executives all have more or less the same data associated with them but have different ranks? Will some information need to be tracked for some positions and not for others?

 

But on that note, I'm not sure how much needs to be represented here in the first place. This is part of a sort of hiring process, right? Then these tables should track only hiring information - when someone applied/was headhunted, approval dates, notes, etc. - while actual employee data for the people who have been successfully hired belongs elsewhere.

Thanks funster and requinix for the reply.

 

 

 

1> what's the point of AP_AM on DM recruitment if it still requires AP_M anyway?  That's illogical.  Either the AM can authorise or they can't.

Requinix's rightly said  " Sounds like it requires approval from both the M and AM. So two flags, ".  It's a kind of double check you may say.

 

 

 

2> what performance data are you talking about?

 

Actually I am not exactly sure at this stage what data would be involved for performance. So I just took one bit of data P just to know how that one parameter would be handled and it would give me the idea to handle other data.

 

 

 

3> you say in the second paragraph that M can recruit exec's but then later have a condition that they can not. Which is accurate?

Sorry that's a mistake on my part. No the manager M will not directly recruit the executives.

 

 

4> what information relating to the authorization process is relevant? e.g. authRequestDate, authRequestUser, authGrantUser, authGrantDate, authDeclineReason etc.

Once again right now I am only interested in the minimum fields that would Grant the user Authorization ( a flag ) as well as take into account the user granting the authorization. 

 

@ Requinix.

 

 

Then these tables should track only hiring information - when someone applied/was headhunted, approval dates, notes, etc. - while actual employee data for the people who have been successfully hired belongs elsewhere. 

 

Very true Requinix. The idea is to backtrack the hierarchy to the Manager in each case. I had the simple case in mind like this :-

 

  ExecID     AP_M   AP_AM   AP_DM 

==================================

       11           1           1            1                               recruited by DM 

       12           1           1                                             recruited by AM

 

but this does not really backtrack the hierarchy. It just shows who all approved rather than who all had to approve. For eg in the second case how can anyone know just by looking at the data that the whether the DM has not approved or the approval was not required. I hope I am able to explain myself.

 

Besides if only one in a hundred cases does a DM recruit a franchisee, it will result in an entire column  being wasted. (I think the term is that data is not normalized). 

 

Kindly help me grapple with this. Thank loads. 

@requinix: Yeah, I just wanted to stress the need to establish if the AM has to pre-approve before the M has visibility, or if there was a pool sitting there that both the AM and M could see and that if the M was having a quiet Tuesday they could just bypass the whole AM approval and give someone a job.  Also I'm unclear on if it's a pre-auth process if there is a bypass option that the M could use to authorise in the absence of a given AM (thinking long term sick or 3 week vacation kinda thing) or what if any other process would be appointed to this scenario.

 

@ajoo: The DB schema design is a core part to app design and it pulls on aspects, to a greater or lesser degree, of the whole design brief - from the seemingly least significant bit (couldn't resist the pun) of data through to the intended UI. Because of this, answering your question accurately and properly isn't possible with the information that you have given.  I appreciate that posting up a complete design brief on a public forum is neither practical nor proper, but in the case here you really do need to go through the whole thing with a fine tooth comb and ascertain everything that you are being asked and then from that information build up the best way to implement the storage schema.

Actually I am not exactly sure at this stage what data would be involved for performance. So I just took one bit of data P just to know how that one parameter would be handled and it would give me the idea to handle other data.

That's not a good place to start.

 

 

...right now I am only interested in the minimum fields...

That's also not a good way to continue.

 

That said, as a bare minimum, not taking into account any data that I would believe to be pertinent to the process such as dates and reasons for decline etc. and just off the top of my head I would suggest something like this:

 

--------------------------------------------------

 

tblMgtStaff

UID - int, unsigned, auto_inc, PK

fName - varchar

lName - varchar

 

tblAuthUsers

AUID - int, unsigned, auto_inc, PK

MID - int, unsigned, FK(tblMgtStaff.UID).Update-Cascade.Delete-Restrict

authLevel - int, unsigned, FK (tblAuthLevels.AID).Update-Cascade.Delete-Restrict

authParent - int, unsigned, FK(tblMgtStaff.UID).Update-Cascade.Delete-Restrict

 

tblAuthLevels

AID - int, unsigned, PK

assingedLevel - varchar |Manager/Assistant Manager/Deputy Assistant Manager|

 

tblRecruited

RID - int, unsigned, auto_inc, PK

fName - varchar

lName - varchar

recrutedBy - int, unsigned, FK(tblMgtStaff.UID).Update-Cascade.Delete-Restrict

preAuthBy - int, unsigned, FK(tblMgtStaff.UID).Update-Cascade.Delete-Restrict

finalAuthBy - int, unsigned, FK(tblMgtStaff.UID) .Update-Cascade.Delete-Restrict

 

--------------------------------------------------

 

That should be enough to work out who works for who, present data based on authorisation level while maintaining branch segregation and trace through all steps from exec to manager.  It's not exact, but it should be enough to answer the question.

 

::Edit to fix quote

Edited by Muddy_Funster
  • Like 1

Hi Funster, thanks again for that quick response. The approval is hierarchical. It has to be approved by the AM or Dm which ever initiates the process and then go upward. 

 

All I need is to be able to design an efficient link back relation using one or more tables so that for example if M needs to look at the performance of the team of a given AM namely,

 

1. All DM;s of the AM and all executives of those DM's  // Complex

2. All executives of the AM

 

or in case of a DM    // Simple

1. all Executives of that DM

 

It should be easy to query the database.

 

Any help is appreciated. 

 

Thanks !

 

P.S.  I replied before your second reply. Will check that and revert again.

Edited by ajoo

Hi Funster, 

 

Thanks for the examples. I'll be grateful if you can explain a bit of what you have done. 

 

I don't understand for ex what's PK or FK  Please elaborate a little. 

 

 

All I need is to be able to design an efficient link back relation using one or more tables so that for example if M needs to look at the performance of the team of a given AM namely,

 

1. All DM;s of the AM and all executives of those DM's  // Complex

2. All executives of the AM

 

or in case of a DM    // Simple

1. all Executives of that DM

 

It should be easy to query the database.

 

Also maybe you could explain how I might be able to use these tables to accomplish the two examples quoted above from my last post.

Thanks loads.

PK = Primary Key

FK = Foreign Key

 

These are both fundamental parts of database design.

 

What you would do would be to relate between each table to build a structured response. I'll show you the simple one:

 

SELECT tblRecruited.fName AS FirstName, tblRecruited.lName AS LastName
FROM tblRecruited
LEFT JOIN
tblMgtStaff
ON
(
  tblMgtStaff.UID = tblRecruited.recruitedBy
)
WHERE
(
  UID = ?
)

 

That returns all the records that have the recruitedBy field set to whichever ID number you choose to enter as the ?

  • Solution

The problem of the above database layout is that it doesn't implement any of the rules, thus allowing nonsense data. For example, a single staff member with an arbitary rank (let's say the janitor) can recruit, pre-approve and approve an assistent manager.

 

You can try to fix this with lots of application-side checks, but then you may still end up with nonsense data. What if the data is inserted or edited directly? What if one day there's a bug in the ever-changing application code? You'll never know if the data you're relying on is actually valid.

 

I'd do this the other way round: Spend a lot of time on a proper database layout which will only accept correct data. This will in turn save you a lot of code and bugs.

 

I see three different recruiting cases:

  • The manager can recruit a person as an assistent manager or deputy manager
  • An assistent manager can suggest a person as a deputy manager or executive, which must then be approved by the manager
  • A deputy manager can suggest a person for an executive role, which must then be approved by both an assistent manager and the manager (or just the manager, I assume)

Possible relations:

  • staff(staff_id, last_name, first_name, ...)
  • assistant_managers(staff_id)
  • deputy_managers(staff_id)
  • applicants(application_id, last_name, first_name, ...)
  • manager_recruitments(applicant_id, role)
  • assistent_manager_recruitments(applicant_id, role, approved_by_manager)
  • deputy_manager_recruitments(applicant_id, role, approving_assistent_manager_id, approved_by_manager)

You can simplify the relations with views. The physical layout isn't necessarily the layout you have to use when accessing the data.

 

Note that the above data model allows an applicant to have multiple roles at once. If you want to prevent that as well, you'll need an additional table for all applications (which consist of an applicant and a role).

Edited by Jacques1
  • Like 1

@Jacques1 The OP doesn't have the relevant information to to a full and proper DB design.  

 

 

 

The problem of the above database layout is that it doesn't implement any of the rules, thus allowing nonsense data...

 

I would rather have stored procedures and BEFORE INSERT triggers for rule checks than a needlessly bloated table structure with arbitrary single-column tables. But I expect I'm about to be schooled in the fact that I'm wrong in taking that approach  :anim_reading:   ;D  >:)

Hi Funster and Guru Jacques. Thanks for the reply to you both.

 

@Funster : I have almost created  sample tables like you indicated and was studying the relations between them. I must admit that I have very little knowledge of good table designs. In fact I have been studying and trying to understand data normalization. I do feel that my tables need a lot of that. In fact I also wanted to ask how bad is it to have tables that are not normalized. is it a trade off on efficiency or can that bring down an application. It doesn't seem to but then like I admitted, I know very little of DB design.  

 

@ Guru Jacques : I haven't really understood what you were trying to point out to Funster. I have read your reply 3 times and I am still trying to understand it.  Please elaborate, if possible with a small example.

 

More than happy to be schooled by both of you :happy-04: ! Thank you both very much. 

I was just poking fun at @Jacques1 regarding a previous post we debated on. He is an incredible talent, but often comes across more abrasive than helpful so I like to get a jibe in when I can (which is rarely to be fair)  :tease-03:

 

Regards "Normalisation" : really it's a problem of scale.  When your only talking about a few hundred rows in a dozen or so tables it can be quite difficult to see the justification of why it is one of the most fundamentally important aspects of DB design.  However, once things grow - and they always do - it becomes exponentially more apparent why it's needed.

 

Once you start trying to query tables that have senseless column names used in obtuse partial relationships attached to tables with 50-60 columns of redundant data per row holding tens of millions of rows of data named something that is 6 characters long and "made sense at the time" you will quickly be reaching for the revolver.  

 

Not only does it make writing queries a new form of self-harm, but it also has a directly proportional impact on performance to the degree that I have seen a DB designed so badly by a third party in MS-SQL Server that queries which should have run in a few tens of seconds crashed out - caused a named pipe timeout triggered by the query taking more than 3 minutes to return a dataset from a five table relationship where the biggest one held a shake over one million records (which is really not much in DB terms).

 

TLDR: YES - Lack of Normalisation Can Break Your App

  • Like 1

Hi Funster ! 

 

Thanks for the reply.

 

I'll try and learn more on DB desgin and normalization. Did not really get what Guru Jacques was trying  to tell you. I am sure he will respond. But from your reply I guess he was hinting at maybe just a single table. I am not sure if my assessment of his reply is correct.er I am sure he'll revert in good time. 

 

Yes ! Guru Jacques is a multi-million dollar Guru ! Has been extremely helpful ! You too !

 

Thanks again Funster.

Edited by ajoo

You're quite welcome, and no, @Jacques1 was saying that you should use more tables, enforcing referential constraints to ensure that only relevant data is viable.  Each item in his list of possible relations is a table with the content of the parenthesis the relevant columns to build relations between the tables:

 


Possible relations:

  • staff(staff_id, last_name, first_name, ...)
  • assistant_managers(staff_id)
  • deputy_managers(staff_id)
  • applicants(application_id, last_name, first_name, ...)
  • manager_recruitments(applicant_id, role)
  • assistent_manager_recruitments(applicant_id, role, approved_by_manager)
  • deputy_manager_recruitments(applicant_id, role, approving_assistent_manager_id, approved_by_manager)

 

It's as I said, there's no way to give a definitive "This is how it should be done" without the full information. Both @Jacques1's method and mine have different ways of doing things and we could argue till the keys came off our keyboards which one is "right" but at the end of the day, like so much of development, there is a massive amount of subjective preference applied to every point of view.  That's why there is a need for best practices and why, even the best practices that do exist only stand as guidelines rather than referential rules. An analogy would be to say that "Cast concrete is the best material for building houses, from this day forward all houses must be built of this material." That dog just aint gonna hunt. 

  • Like 1

Did not really get what Guru Jacques was trying  to tell you.

 

What I'm saying is that your (or Muddy_Funster's) database design doesn't implement any of the rules. Anybody can recruit anybody, regardless of what rank they have or whether the recruitment has been approved by their manager (they can simply approve it themselves).

 

So where exactly are you planning to enforce the recruitment rules? In the PHP application? This doesn't really work (as explained above). With a database trigger as suggested by Muddy_Funster? That's theoretically possible, but it will be painful and difficult to understand.

 

The layout of a database should reflect the actual data you're storing. That's currently not the case. The table says that anybody can recruit anybody and then get anybody's approval. In other words, the lowest staff member could recruit an assistment manager and simultanously approve their own decision. This is obviously nonsense.

 

So it's not really about less tables vs. more tables. It's about the right tables.

  • Like 1
Hi all,

 

Generally when a table is split into many to enforce normalization won't it require too many inserts commands to 

insert that data into those tables? Won't that dictate the use of transactions, almost always? Is there a simpler 

way to insert the data simultaneously into multiple tables?

 

 

Specifically I have a table user_details (or you may say staff details )

 

user_details(uid, fname, lname, dob, gender, father_name, spouse_name, add1,add2,city,zip,state,country,email,cell,phone,user_ststus)

uid is the PK and all these fields come from the same form and go into this single table.

 

If I wish to keep the address (2 address fields, city, state, zip, country ) all in the same table then is there any

scope for this table to be normalized further?

 

And now, again if I understand correct, Guru Jacques suggests that I should use the uid to create 2 more tables for the 

assistant_manager(user_id) and the deputy_manager(user_id) and probably use those tables for approval of staff to enforce 

referential integrity.

 

I would also like to confirm if, in the table applicants(application_id, last_name, first_name, ...), 

 

applicant_id would also be the same as staff_id. (Because the user_details form is being submitted by the AM or DM after that person

has approved of him / her). The first leg of approval is complete since the form is being submitted. If this is wrong then what

should be the applicant id and from where should / would it originate?

 

Thanks all !

Edited by ajoo

Again, in my eyes, the obvious answer is triggers and stored procedures within the database.  But I get the impression that @Jacques1 was offering his solution instead of this approach, so I'm disinclined to suggest you hybridise (or bastardise - depends on your point of view) the methodologies.  I'm sure @Jacques1 will elaborate further on how to properly solve your issue.

Generally when a table is split into many to enforce normalization won't it require too many inserts commands to 

insert that data into those tables? Won't that dictate the use of transactions, almost always? Is there a simpler 
way to insert the data simultaneously into multiple tables?

 

I'm not aware of any complex multi-table insertions required in the above layout. Can you give an example? Most tables cover mutually exclusive cases, so they are by definition not used together.

 

Generally speaking: In a non-trivial application, it's perfectly normal and unavoidable to handle multiple tables. That doesn't mean you need transactions all the time, because not all insertions actually happen simultaneously.

 

If you do need to insert data into multiple tables at the same time, a transaction is the simplest way. All you have to do is start the transaction and then either commit or rollback, depending on whether the action succeeded or not.

 

 

 

Specifically I have a table user_details (or you may say staff details )

 
user_details(uid, fname, lname, dob, gender, father_name, spouse_name, add1,add2,city,zip,state,country,email,cell,phone,user_ststus)
uid is the PK and all these fields come from the same form and go into this single table.
 
If I wish to keep the address (2 address fields, city, state, zip, country ) all in the same table then is there any
scope for this table to be normalized further?

 

Why are there two addresses but only one city, state etc.? What exactly is the address?

 

 

 

And now, again if I understand correct, Guru Jacques suggests that I should use the uid to create 2 more tables for the 

assistant_manager(user_id) and the deputy_manager(user_id) and probably use those tables for approval of staff to enforce 
referential integrity.

 

Yes.

 

 

 

I would also like to confirm if, in the table applicants(application_id, last_name, first_name, ...), 

 
applicant_id would also be the same as staff_id. (Because the user_details form is being submitted by the AM or DM after that person
has approved of him / her). The first leg of approval is complete since the form is being submitted. If this is wrong then what
should be the applicant id and from where should / would it originate?

 

Whether (pre-approved) applicants and staff members can share the same table depends on the data you want to store.

 

Normally, actual staff members require more information than a person who has merely applied for a job. But that may be different in your case.

Hi Funster and Guru Jacques,

 

Thanks for taking out time to reply, 

 

 

 

 

I'm not aware of any complex multi-table insertions required in the above layout. Can you give an example?

I asked that generally, but once I get implementing this, I'll get back with any examples that I come across.

 

 

 

Why are there two addresses but only one city, state etc.? What exactly is the address?

 

Those are two lines for the #xxx street and maybe landmark. A person can use one or both. Maybe I should use just one text field for that.

 

 

 

  • assistent_manager_recruitments(applicant_id, role, approved_by_manager)
  • deputy_manager_recruitments(applicant_id, role, approving_assistent_manager_id, approved_by_manager)

 

Should the the field approved_by_manager In the AM table, and the fields approving_assistent_manager_id, approved_by_manager  in the DM table be the actual AM_ID and DM_ID from their respective AM, DM tables or should they be like a 0,1 flag fields

If they should be the AM_ID and DM_ID respectively( which is what I feel they are) shouldn't there be another field like recruited (a recruitment status field) which is a 0,1 flag field indicating that person is recruited or not ? 

 

Thanks again Funster and Guru Jacques.

Those are two lines for the #xxx street and maybe landmark. A person can use one or both. Maybe I should use just one text field for that.

 

You should split the data into atomic values (the first normal form). If there's a street and a landmark, there should be a street attribute and a landmark attribute. Additional data (e. g. the house number) also belong into separate attributes.

 

The reason is simple: Once you've put multiple pieces of data into a single field, it's very difficult or even impossible to extract the individual parts again.

 

 

 

Should the the field approved_by_manager In the AM table, and the fields approving_assistent_manager_id, approved_by_manager  in the DM table be the actual AM_ID and DM_ID from their respective AM, DM tables or should they be like a 0,1 flag fields

If they should be the AM_ID and DM_ID respectively( which is what I feel they are) shouldn't there be another field like recruited (a recruitment status field) which is a 0,1 flag field indicating that person is recruited or not ?

 

The approved_by_..._manager_id fields should be the actual staff member IDs so that you know who recruited the person. If the manager is always the same person or you don't care who he/she is, the approved_by_manager field can be a simple boolean.

 

Whether or not a person is recruited can be derived from the approved_by_... fields and your rules. You can store the result in an extra field if you think that simplifies your application, but it's not necessary for the data itself.

  • Like 1
Hi Guru Jacques, 

 

I have the following table 

user_detail(uid (PK, AI), reg_date, role, fname, lname, dob, age, gender, fathername, spouse, add1, add2,

city, state, country, zip, email, cell, phone, user_status)

 

I was using a single query to insert these values in the table as follows 

 

$query = "INSERT into user_detail(reg_date, role, fname, lname, dob, age, gender, fathername, spouse, add1, add2,

city, state, country, zip, email, cell, phone, user_status)

VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)";

 

Now I am splitting the table user_detail as follows:

 

user_detail(uid(PK) ,fname,lname,dob,reg_date)

user_add(uid(FK), house_no, add_street, add_landmark, street,city, zip, state)

user_contact(uid(FK), email, phone, cell)

 

Since all these values are being spewed by a single form and must be stored simultaneously would i not have to use a 

transactional query for inserting values into the 3 different tables? So I think this is the first example of a multiple 

insert that I was talking about in the earlier message. 

 

So now I think that a transactional query is the best way to go about inserting the data in these tables. Or is there

another easier way to form an insert query to accomplish this?

 

Thank you very much !

Hi Guru Jacques, 

 

Why am i splitting the database !? Good question sir !! :happy-04::confused:  

 

Just because I thought that it could be segmented further and normalized. you know like separating the address from the contacts and so on. Do you think I need not split it !? is it normalized as it is and needs no improvement? I'll  be happy to let it be !! I can start working on the tables you suggested.

 

Thank you.

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.