Jump to content

comparing arrays


highrise

Recommended Posts

Hi Guys

 

I'm quite new to PHP and I have a question. I have a MySQL database which has a load of musicians on it. Each musician has a field that stores their influences, which are separated by commas, e.g: The Beatles, The Beach Boys, Kiss etc etc.

 

Now, what I would like to do is start with the influences of one musician, and then compare it to all the others to find the closest matches - i.e find the musicians that have (say) more than three influences i common. This would then allow you to see which musicians had the most influences in common with the primary one, the one you started with.

 

Here's what I have in mind, it seems a little laborious - load the influences of the primary musician into an array using the explode command. Then step through each of the other musicians in a loop, each time putting their influences in to an array and checking to see how many matches there are between the two arrays. Whenever there is at least one match, the name of the musician and the number of matches would be put into a third array. This array would then be sorted at the end, so that the musicians with the most matches would be at the top, and the script could then display them.

 

This would probably work, but if looking at several hundred or several thousand musicians, it would surely be quite slow - is there a better way? I don't particularly want to use a separate, normalised table at this point - I know it would be the quickest way but it would involve a lot of restructuring for me. I know that PHP has several clever inbuilt routines that might do this job for me, so if anyone can suggest anything I'd appreciate it.

 

Thanks in advance for the help!

Link to comment
Share on other sites

 

Here's what I have in mind, it seems a little laborious ...*snip*[Yep]-  .... is there a better way? [Yep, use a separate table] I don't particularly want to use a separate, normalised table at this point - I know it would be the quickest way [Yep] but it would involve a lot of restructuring for me. [Whoops]

 

I'd recommend making the separate table, recoding, and chalking up the experience to planning ahead next time.

 

However, if you still really want to keep that structure (and you probably do), I'd recommend against pulling all the data into PHP to analyze it.  You should just grab enough to manufacture a query and use that to get results.

 

(This code is a quick exercise in concept; you'll need to flesh it out with error checking, formatting, etc.)

<?php
// Find current user's influences
$influences = mysql_result(mysql_query('SELECT influences FROM users WHERE id=' . $id),0);

// Assemble a query by breaking the influences string into seperate "FIND_IN_SET('artist',influences) OR..." conditions
$result = mysql_query("SELECT id,username,influences FROM users WHERE FIND_IN_SET('" . str_replace(',',"',influences) OR FIND_IN_SET('",$influences) . "',influences) AND id != $id GROUP BY id HAVING COUNT(*) >= 3 ORDER BY COUNT(*) DESC");

// Get those results
if ($result && mysql_num_rows($result)) while ($row = mysql_fetch_assoc($result)) echo "$row[username] $row[influences]<br />\n";

?>

 

 

Otherwise, with a table containing userid and a single influence in a row you could use a query such as:

SELECT b.uid,users.username,GROUP_CONCAT(DISTINCT b.artist SEPARATOR ', ') AS MatchingInfluences
FROM influences a JOIN influences b ON a.uid!=b.uid AND a.artist=b.artist AND a.uid=$id JOIN users ON b.uid=users.user_id
GROUP BY b.uid HAVING COUNT(*) > 1;

Link to comment
Share on other sites

Another problem you will have is if these artist are user-inserted, that is you have musicians typing in the names of bands.  That means that there will be someone who likes "Guns 'n Roses" who will never find someone who likes "Guns & Roses" or the person who loves "GnR".  Not to mention the goober who can't spell: "gunz and rosses".

 

Not that I have a solution, but you should be aware of that.  :)

Link to comment
Share on other sites

one thing I was wondering about though - if I decide to go for a separate table (well, two tables, one for the influences, the other for the relationships, i.e user id / influence id), how would I go about converting them? I have about 3,000+ records all with the data in this comma delimited format.

 

thanks

Link to comment
Share on other sites

Not too hard; you could write a PHP script to do it.  In fact, I'm not sure you could do it in MySQL purely.  Since it's a one-time conversion, don't worry about performance or anything.  You can write a query to read in the data, explode it, check if the influence is already inserted, if not insert it, if so get id, insert uid/influence_id.... etc.  Back up first.

Link to comment
Share on other sites

thanks, I'll look into that. I probably don't need to back up if I leave the original data intact.

 

One thing though, I will probably make the shift over to a table, but in the meantime I am having trouble with this line:

 

$result = mysql_query("SELECT id,username,influences FROM users WHERE FIND_IN_SET('" . str_replace(',',"',influences) OR FIND_IN_SET('",$influences) . "',influences) AND id != $id GROUP BY id HAVING COUNT(*) >= 3 ORDER BY COUNT(*) DESC");

 

I have highlighted in bold where I think the problem lies - any ideas? Thanks again

 

 

Link to comment
Share on other sites

This snippet:

<?php

$influences = 'King\'s X,Spock\'s Beard,Yoko Ono';
$query = "SELECT id,username,influences FROM users WHERE FIND_IN_SET('"
. str_replace(
	',',
	"',influences) OR FIND_IN_SET('",
	addslashes($influences))
. "',influences) AND id != $id GROUP BY id HAVING COUNT(*) >= 3 ORDER BY COUNT(*) DESC";

echo "$query\n";

?>

 

Yields this output:

SELECT id,username,influences FROM users WHERE FIND_IN_SET('King\'s X',influences) OR
FIND_IN_SET('Spock\'s Beard',influences) OR FIND_IN_SET('Yoko Ono',influences) AND id !=  GROUP BY id
HAVING COUNT(*) >= 3 ORDER BY COUNT(*) DESC

 

I added addslashes() in case of apostrophes.  You probably should use mysql_real_escape_string() there.  So, in other words, it seems to be working for me.  What errors/problems are you finding?

Link to comment
Share on other sites

it's working now, I was having a few minor problems replacing a couple of field names, it's good! thanks. One thing I am struggling a little bit with though, is how I would fit the MySQL 'Trim' command into that string - sometimes the influences have leading spaces, other times they don't, I'd like to remove them.

Link to comment
Share on other sites

You should have the "clean up" in PHP before inserting the data into MySQL -- trim whitespace, mysql_real_escape_string, etc.

 

If you have "  bandname;bandname2;bandname3  ", you can use

UPDATE users SET influences = TRIM(influences)

and you'll get "bandname;bandname2;bandname3"

 

If you have "bandname  ;  bandname2;  bandname3", I don't think you can fix that in MySQL; you'll have to do a one-time PHP cleaner function (take them out, use preg_replace to find the whitespace around the semi-colon, write the values again), and make changes to your usual PHP to clean the names before they are put in a string and inserted.

 

Well, maybe you could do it all in MySQL, but it'd be a pain if there are alot of spaces.

UPDATE users SET influences = TRIM(influences);
# Repeat each of these next two lines until no rows match
UPDATE users SET influences = REPLACE(influences,' ;',';') WHERE influences LIKE '% ;%';
UPDATE users SET influences = REPLACE(influences,'; ',';') WHERE influences LIKE '%; %';

Link to comment
Share on other sites

it's not a lot of spaces, it's just that sometimes people input the influences like this:

 

beatles, rolling stones, the who

 

and other times like this

 

beatles,rolling stones,the who

 

so when it does the query, it does this: ' beatles', instead of 'beatles'

 

(use of 'the' is a whole other problem of course)

 

What I'm finding is that it is working, but sometimes you look at one users influences, and you might get one user listed, but if you look at that user, the first user is not listed as having it in common.

 

I really appreciate the help by the way, thanks for taking the time, I'm learning as I go along  ;D

Link to comment
Share on other sites

I added the following line:

 

$influences =str_replace(", ",",",$influences);

 

which removes the leading spaces after the commas. But now it only picks up those users with influence lists without commas. The problem here is this:

 

beatles, rolling stones, the who

 

is actually "beatles"," rolling stones","the who"

 

and so is different from this:

 

rolling stones, the beatles, the who

 

since this is actually "rolling stones"," beatles","the who"

 

if one user wrote this one way and one the other, the only match it would find would be 'the who'. I need to remove the leading spaces from the SQL query. I don't really want to change the actual data held in the record though, as this is used for display purposes.

 

Link to comment
Share on other sites

ok, I think I know how to solve the problem, but I can't quite figure out how to do it. Within the string replace function, I need to set it up so that it strips the original influences and then looks, for example, for both '$influences' and ' $influences' (with a leading space). In your example, this would come out like this:

 

SELECT id,username,influences FROM users WHERE FIND_IN_SET('King\'s X',influences) OR FIND_IN_SET(' King\'s X',influences) OR

FIND_IN_SET('Spock\'s Beard',influences) OR FIND_IN_SET(' Spock\'s Beard',influences) OR FIND_IN_SET('Yoko Ono',influences) OR FIND_IN_SET(' Yoko Ono',influences) AND id !=  GROUP BY id

HAVING COUNT(*) >= 3 ORDER BY COUNT(*) DESC

 

see what I mean? There should be a way of putting this into the code, but I have not quite managed it yet.

Link to comment
Share on other sites

ok, I got it to work - the final query was :

 

$query = "SELECT id,name,influences FROM users WHERE ((FIND_IN_SET('"

. str_replace(

',',

"',influences) OR FIND_IN_SET('",addslashes($influences)). "',influences) OR (FIND_IN_SET('"

. str_replace(

',',

"',influences) OR FIND_IN_SET(' ",addslashes($influences)). "',influences)))) AND id <> '$id' GROUP BY features.id HAVING COUNT(*) >= 3 ORDER BY COUNT(*) DESC";

 

I added second line to the query. It strips out the original influences of all leading spaces, and then looks for occurences of each influence, with out without a leading space. Success!

 

 

Link to comment
Share on other sites

You shouldn't be using a "static" query like that.  In fact, it shouldn't work at all.....  can you print out the $query there and post?  And what do you mean by "it works one way but not the other?"  MySQL isn't case sensitive by default; you have to use the BINARY operator to force it.

 

I'm going to change something in the SQL -- I benchmarked using FIND_IN_SET vs. LIKE '%%', and the LIKE was about 30% faster.

 

<?php

$influences = 'King\'s X,Spock\'s Beard,Yoko Ono';
$query = "SELECT id,username,influences FROM users WHERE influences LIKE '%"
. str_replace(
	',',
	"%' OR influences LIKE '%",
	mysql_real_escape_string(preg_replace('/\bthe /i','',$influences))
. "%' AND id != $id GROUP BY id HAVING COUNT(*) >= 3 ORDER BY COUNT(*) DESC";

echo "$query\n";

?>

(output:)
SELECT id,username,influences FROM users WHERE influences LIKE '%King\'s X%' OR influences
LIKE '%Spock\'s Beard%' OR influences LIKE '%Yoko Ono%' AND id !=  GROUP BY id
HAVING COUNT(*) >= 3 ORDER BY COUNT(*) DESC

 

The str_replace replaces each comma with part of an SQL clause, making a string of "influences LIKE '%...%'" comparisons.  You won't have to worry about spaces with the LIKE query, unless a user double spaces within a band name.

Link to comment
Share on other sites

Correction.  My code is missing a parenthesis.  I'm also going to just throw everything in a preg_replace.  That'll take care of spaces, a leading "the", and insert the "OR influences..." part.

 

<?php

$id = 1;
$influences = 'King\'s X,Spock\'s Beard,Yoko Ono,The Beatles,Thespians';
$query = "SELECT id,username,influences FROM users WHERE influences LIKE '%"
. preg_replace(
	array(
		'/\bthe /i',
		'/\s*,\s*/'
	),array(
		'',
		"%' OR influences LIKE '%"
	),
	mysql_real_escape_string($influences))
. "%' AND id != $id GROUP BY id HAVING COUNT(*) >= 3 ORDER BY COUNT(*) DESC";

echo "$query\n";

?>

Link to comment
Share on other sites

  • 2 weeks later...
This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.