Jump to content

Display list of records ordered by frequency of use


Recommended Posts

I have several forms which will request the user to select one of several options where the most common choices will be pronounced, and might look something like:

  • Select Option A
  • Select Option B
  • Select Option C
  • Select Option D
  • Click here for more choices

The client is responsible to determine which ones to display (opposed to hiding within "more choices") and the server is responsible to provide the list of options along with some means to determine which ones are most common.

For a new user, the order will be based on my "best guess", however, whenever a user selects a given option its ranking for that user only should go up, and in addition more recent activities should have more bearing on the returned order.  In addition, there will be more than one of these lists and each of these lists will expose a public identifier (most will use a string but some might use an incrementing integer series) instead of the primary key.

I have some fragmented ideas how to best implement but would like other people's thoughts on how to best do so.

Starting with the list schema, undecided on which approach below.  Any preferences or something else altogether?

Solution 1:

abstract_list
- id (INT PK)
- discriminator (string)
- name (string unique with discriminator)
- ranking (TBD whether float or integer. This represents my initial "best guess")

list1 (list2, etc)
- id (FK to abstract_list.id)
- discriminator (FK to abstract_list.discriminator)
- public_id (string or integer based on specific list. unique with discriminator)

Solution 2:

abstract_list
- id (INT PK)
- discriminator (string)
- name (string unique with discriminator)
- ranking (TBD whether float or integer. This represents my initial "best guess")
- public_id (Application responsible to cast as string or integer. unique with discriminator)

list1 (list2, etc.  This class has no properties and only a method to cast the public ID as required.  Seems like a waste...)
- id (FK to abstract_list.id)


I will then add another table to join each list to the user.

user_list_ranking
- abstract_list_id
- user_id
- ranking (TBD)

To retrieve the list options, will query abstract_list and LEFT JOIN user_list_ranking and then sort on some mathematical formula acting on abstract_list.ranking and COALESCE(user_list_ranking.ranking, 0).

To promote recent activity, maybe increment user_list_ranking.ranking every time the user selects a given option and also decrement all other options of the associated list.  To prevent excessive windup, cap the the value to some max value (i.e. If list1 has a total of 10 records, cap it three times that value resulting in 30), and cap the minimum at zero.  Uncharted territory here and am very open to other approaches.

I am undecided whether I should just return an ordered array of the options, or also include either an order integer or some normalized value from 0 to 1.

If there is a common way to do this, please let me know what it is so I may mimic it.  If not, please provide any critique or recommended changes to my intended approach.

Thanks

Link to comment
Share on other sites

40 minutes ago, requinix said:

Doesn't the ranking thing conflict with the which-ones-to-display thing?

No, the ranking thing (AKA order) defines which options are most often used.  The which-ones-to-display thing dictates whether to display the top 5 (etc) most common or potentially only those that have a ranking above a given value.

Link to comment
Share on other sites

So the client doesn't pick which ones they want shown, they only pick how many ranked ones and/or how highly ranked something is to be shown in the list? In other words it's just a UI customization feature that I'd argue isn't even a thing that clients should have to/be able to bother with?

But to the question. Sounds like you've decided that the "ranking" is essentially just a usage counter? +1 every time someone uses it?

Link to comment
Share on other sites

51 minutes ago, requinix said:

So the client doesn't pick which ones they want shown, they only pick how many ranked ones and/or how highly ranked something is to be shown in the list?

Yes.  There might be 50 options but a typical user will often use the same choices as they had in the past so why make them sort through all the other choices which they don't want.

 

53 minutes ago, requinix said:

In other words it's just a UI customization feature that I'd argue isn't even a thing that clients should have to/be able to bother with?

If not the client, then who?  An API server provides an array 50 objects which represent possible options and each object includes a name, identifier, and ranking property, and a web app or phone app needs to be written in order to present the choices to the user.  Maybe my poor choice on terms, and did not mean to imply that the "client" was a HTTP client but some application that uses the data.

59 minutes ago, requinix said:

But to the question. Sounds like you've decided that the "ranking" is essentially just a usage counter? +1 every time someone uses it?

My only goal is to present a limited number of choices which have high likelihood of being their desired choice, and believe that a usage counter is likely part of the solution.  I don't think a simple counter will work as if someone's favorite colors used to be blue, green, and red, but then a year later it changed to yellow, purple, and brown, it would take a long time before their favorite colors become the most prominently displayed colors.   Regarding "someone", these rankings are specific to a given user and the server will receive a request with includes the user ID.

Link to comment
Share on other sites

19 minutes ago, NotionCommotion said:

I don't think a simple counter will work as if someone's favorite colors used to be blue, green, and red, but then a year later it changed to yellow, purple, and brown, it would take a long time before their favorite colors become the most prominently displayed colors. 

Then just count their usage over the last N months.

Link to comment
Share on other sites

5 minutes ago, Barand said:

Then just count their usage over the last N months.

I originally thought of doing so but it sounds kind of tricky.  How would you propose doing so?

Not saying that my idea to cap maximum counts to some given value and to decrement other choices other than the chosen option is any easier, but at least I don't care about the start and end of some floating timeframe.

Another thought which also isn't any easier is to save a timestamp with the ranking on a per user basis.  Then when the user selects the same option the next time, the previous value is downgraded based on the time between the timestamp and present time and then upgraded based on it being selected.  Probably a bad idea as I need to go though this whole process just to provide the choices...

Link to comment
Share on other sites

38 minutes ago, NotionCommotion said:

How would you propose doing so?

thus...

+----------------+               +-------------+
|   mydata       |               |  option     |
+----------------+               +-------------+
| user_id        |        +------| option_id   |
| option_id      |>-------+      | option_name |
| date_saved     |               +-------------+
+----------------+

SELECT d.option_id
     , o.option_name
     , count(*) as tot
FROM mydata d
     JOIN option o USING (option_id)
WHERE user_id = ?
      AND date_saved > CURDATE() - INTERVAL 12 MONTH
GROUP BY d.option_id
ORDER BY tot DESC

EDIT:

On second thoughts, make that

FROM option LEFT JOIN mydata

Edited by Barand
Link to comment
Share on other sites

You could store usage stats in a Redis hash by user ID, option ID, and number of times selected. Get the top n option IDs per user and grab those from the database. Depending on how often your users interact with the system, you may not even have to crank up the TTL for the hash.

Link to comment
Share on other sites

19 hours ago, Barand said:

thus...

Ah, wasn't thinking of saving each event in the DB.  Seems excessive considering I don't really need the data other than for this purpose but data is cheap.  Suppose I could get even tricker and do something like SUM(EXP( $const * (date_saved - NIX_TIMESTAMP()))). 

Link to comment
Share on other sites

1 hour ago, maxxd said:

You could store usage stats in a Redis hash by user ID, option ID, and number of times selected. Get the top n option IDs per user and grab those from the database. Depending on how often your users interact with the system, you may not even have to crank up the TTL for the hash.

Thanks maxxd,  wasn't considering Redis.  I should more often.  I like how dealing with the life of the events is pretty much automatically taken care of by using TTL.  Potential drawback will be lack of persistence upon restarting Redis, but will highly consider using Redis.

Link to comment
Share on other sites

6 hours ago, Barand said:

At the moment, then, they select an option but you don't store it with any data anywhere?

No, I save only the last selected value.  Potentially, I would have an lastUpdatedTimestamp property but would not store the history unless there were compelling audit trail reasons to do so but this is just for ux.

Link to comment
Share on other sites

8 hours ago, NotionCommotion said:

Potential drawback will be lack of persistence upon restarting Redis, but will highly consider using Redis.

Not saying it's a perfect use case, but it does lend itself fairly well. Persistence is always a consideration with Redis, but honestly with ElastiCache for Redis and the like, restarts aren't that prevalent in my experience. That having been said, my last job abused the crap out of Redis so my attitude may be a bit skewed by Stockholm syndrome...

Link to comment
Share on other sites

Basically, business logic was designed and built around data being in the store as if it was persistent. Redis was treated as a first-class database and there were portions of logic that would fail catastrophically if the node was unavailable. Technically possible (obviously) just terrifyingly fragile IMO.

Link to comment
Share on other sites

On 7/18/2021 at 3:59 PM, maxxd said:

... Persistence is always a consideration with Redis, but honestly with ElastiCache for Redis and the like, restarts aren't that prevalent in my experience. ...

I looked into ElastiCache.  Looks like it is only available on AWS servers.  Correct?

Assuming so, looking for other options.  I would like to have the redis DB saved on disk whenever the redis service is stopped or the server is rebooted.

First started looked through /etc/redis.conf and read about snapshotting, but didn't seem applicable.  Then looked at redis cluster (thinking it was similar to ElastiCache) but it is experimental and also don't think applicable.

Next I did a reality check and stored a new value in redis and found that it persisted after both restarting the service and rebooting the server.  I didn't expect this and was pleasantly surprised.

Is it expected behavior for redis to save on disk? What is ElastiCache (or similar) doing and why is is required? Where is this functionally configured (I expect somehow using ExecStart=/usr/bin/redis-server /etc/redis.conf --supervised systemd and ExecStop=/usr/libexec/redis-shutdown located in /usr/lib/systemd/system/redis.service and some setting(s) in /etc/redis.conf but am not sure)?

Thanks

Link to comment
Share on other sites

To tell you the truth, I thought rebooting a server would destroy the cache - I think it used to maybe? Or maybe I'm just remembering incorrectly. At any rate, yes ElastiCache is strictly AWS though I think everyone has their own versions. It's also easy to get mixed up because AWS also has it's own ElasticSearch implementation that isn't called ElastiCache and isn't related in any way. Because apparently they ran out of 2 and 3 letter abbreviations to use...

Anyhow, I wasn't involved in setting it up but from the AWS console it looked to me like ElastiCache basically is a Redis cluster. And though I disagree with how we used Redis in the past, I will say there were only a handful of times it fell out of sync or lost data, and those were due to bugs in the code and not server or service failure. And if you're using it as intended it really is just a cache, so if you do lose something it shouldn't be a catastrophic loss - in my case, it was several weeks to a month or so of playing catch-up and rehydrating accounts that were affected but we didn't know until the user complained.

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

 Share

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.