Jump to content

Returning results in order of relevence


stubarny

Recommended Posts

Hi All,

 

I need to create a job search tool for the below table of job adverts:

 

JobAdvertNumber | JobAdvertTitle

1 | Accountant

2 | Management Accountant

3 | Financial Accountant

4 | Management Accountant

5 | Accountant

6 | Financial Accountant

 

I also have a big (millions of records) table of job title correlations, for example:

 

JobTitle1 | JobTitle2 | Correlation

Accountant | Financial Accountant | 0.76

Accountant | Management Accountant | 0.84

 

 

Please could you tell me how to create a job search tool that returns the adverts in order of their job title's correlation with the searched for term. For example if I did a job search for "Accountant" it should return all the "Accountant" jobs first, then all the "Management Accountant" jobs and finally all the "Financial Accountant" jobs.

 

Many thanks,

 

Stu

Link to comment
Share on other sites

That's a basic foreign key relation, and if you're not sure on how to set that up in MySQL then I recommend searching for it. It's the cornerstone of relational database management systems, after all. It's also a part of normalizing the data, as you don't want to duplicate data in the database, as you've done here. That's why you utilize an ID field as the relation: Less chance of errors creeping into your data.

 

And no, it doesn't have anything to do with aliases. At least not per se, but aliases are often used in JOINs to shorten down the finished query and/or make it easier to read.

Link to comment
Share on other sites

sorry, I didn't realise you meant foreign keys. But using your methodology I could enter 1 or 5 for "Accountant" which isn't normalised?

 

Let me try again, this time adding an extra table for the "keyphrases" and then referring to it with foreign keys:

 

Keyphrases:

KeyphraseID | KeyphraseContent

1 | Accountant

2 | Management Accountant

3 | Financial Accountant

 

JobAdverts:

JobAdvertID | JobAdvert_KeyphraseID

1 | 1

2 | 2

3 | 3

4 | 2

5 | 1

6 | 3

 

I also have a big (millions of records) table of job title correlations, for example:

 

Correlations

CorrelationID | CorrelationKeyphrase1_KeyphraseID | CorrelationKeyphrase2_KeyphraseID | CorrelationContent

1| 1 | 3 | 0.76

2| 1 | 2 | 0.84

 

 

Is that more useful?

 

I think I need an SQL query to:

1. Find KeyphraseID for the searched for keyphrase

2. Select all records in the JobAdverts table

3. Attach CorrelationContent to each record in the JobAdverts table --> (done by finding CorrelationContent where KeyphraseID of the search term = CorrelationKeyphrase1_KeyphraseID and JobAdvert_KeyphraseID = CorrelationKeyphrase2_KeyphraseID)

4. Sort by CorrelationContent DESC

 

Please could you show me how to create an SQL query to do that?

 

Thanks,

 

Stu

Link to comment
Share on other sites

I know it is implied but I think it would simplify your queries if you had

 

Correlations

CorrelationID | CorrelationKeyphrase1_KeyphraseID | CorrelationKeyphrase2_KeyphraseID | CorrelationContent

  1          |          1                        |              3                    |        0.76

  2          |          1                        |              2                    |        0.84

  3          |          1                        |              1                    |        1.00          <- add this

Link to comment
Share on other sites

Use your "keyphrases" table to generate a dropdown search menu so that the KeyphraseID is passed to the search query (in this case 1 for Accountant). This is more reliable than someone entering "Acountant" (mis-spelling deliberate to make the point).

 

All you need to do then is join correlations to adverts on keyphrase2 = jobAdvert_keyphraseID and WHERE keyphrase1 = 1 (in this case) and order by CorrelationContent DESC.

 

I won't write the whole query as I hate using essays for column names.

 

I'd also extend the correlations table like below so that id Accountant is selected it finds the other accountant types, but if Management Accountant is selected it just finds those

Correlations

CorrelationID | CorrelationKeyphrase1_KeyphraseID | CorrelationKeyphrase2_KeyphraseID | CorrelationContent

1| 1 | 3 | 0.76

2| 1 | 2 | 0.84

3| 1 | 1 | 1.00

4| 2 | 2 | 1.00    <- add also

5| 3 | 3 | 1.00    <- add also

 

Link to comment
Share on other sites

In addition to what Barand wrote, I'd also drop the "CorrelationID" field. The Unique ID, and thus primary key, for each correlation is combination of the two keyphrase IDs. After all, you only want one instance of each possible combination, right? ;)

 

There's also a by-sentence in Barands last post which I think you should strongly consider as well, namely the fact that you replicate information in your fieldnames as well. There's no need to name the fields after the table which you find them in, as you always reference the table in the queries anyway. All that it does is waste space and time, plus make it harder to read the queries.

For example, this is how I'd name the fields in the "Correlations" table:

keyphrase_1_id, keyphrase_2_id, relevance

 

Thus, when referencing them, you'd have:

correlation.keyphrase_2_id, correlation.relevance

I think you'll agree that it's much better than:

Correlations.CorrelationKeyphrase2_KeyphraseID, Correlations.CorrelationContent

Link to comment
Share on other sites

Hi,

 

Thank you very much for your help - I've tidied up the field names (I've left the initial of the table name just so that the php variables are ready-named when I retrieve them out of sql), and I've removed the ID field on the correlations field as suggested.

 

 

Keyphrases:

kID | kContent

1 | Accountant

2 | Management Accountant

3 | Financial Accountant

 

JobAdverts:

jaID | ja_kID

1 | 1

2 | 2

3 | 3

4 | 2

5 | 1

6 | 3

 

Correlations:

cKeyphrase1_kID | cKeyphrase2_kID | cContent

1 | 3 | 0.76

1 | 2 | 0.84

1 | 1 | 4 (my correlations are 'relative' instead of absolute therefore a good correlation often has a value greater than 1 - sorry, not my source data!)

 

Please could you give me a hand creating SQL code for the below tasks?:

 

I think I need an SQL query to:

1. Find kID for the searched for keyphrase

2. Select all records in the JobAdverts table

3. Attach cContent to each record in the JobAdverts table --> (done by finding cContent where kID of the search term = cKeyphrase1_KeyphraseID and JobAdvert_KeyphraseID = cKeyphrase2_KeyphraseID)

4. Sort by cContent DESC

 

Is the below code is close to the correct answer? Is there a way of finding the relevant value for kID and using the value all in the same query, instead of splitting this into 2 queries like below?

 

SELECT kID from Keyphrases WHERE kContent = '$searched_for_keyphrase'
# extract result to variable $kID

SELECT * from JobAdverts INNER JOIN Correlations ON (Correlations.cKeyphrase1_kID=$kID AND Correlations.cKeyphrase2_kID=JobAdverts.ja_kID) ORDER BY JobAdverts.cContent DESC

 

Many thanks,

 

Stu

Edited by stubarny
Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.