SELECT query - apply LIMIT condition to unique items only

stubarny · January 29, 2012

Hello,

Please could you show me how to find the 7th unqiue website_page_name from the table website_page_names?

The code below finds the 8th website_page_name but includes duplicates, I'd like to only count through unique website page names - please could you point me in the right direction?

$query ="SELECT website_page_name FROM website_page_names ORDER BY website_page_name ASC LIMIT 7, 1";

Many Thanks,

Stu

AyKay47 · January 29, 2012

the query provided specifies that only 1 row should be returned, so I am confused as to how you are receiving multiple rows?

mikosiko · January 29, 2012

you maybe want to GROUP BY and select those elements with count() = 1 and then LIMIT your results starting in 6

stubarny · January 30, 2012

Thanks guys,

Just to clarify - I only want to return the Nth record (only one record) but I want to operate the LIMIT function by counting only unique rows (therefore completely ignoring duplicates).

Therefore if I had a list of colours below the 3rd unique row is "pink".

blue

green

pink

you maybe want to GROUP BY and select those elements with count() = 1 and then LIMIT your results

thanks mikosiko - please could you tell me if this would work quickly for a large database (>10 million records)?

mikosiko · January 30, 2012

having in mind your data example, you don't need to use GROUP BY nor count() (unique has not the same meaning as distinct)

for your objectives just use DISTINCT and LIMIT (starting at 6 or whatever you need)

fenway · January 30, 2012

DISTINCT will be very slow on large datasets.

mikosiko · January 30, 2012

DISTINCT will be very slow on large datasets.

yes, I agree

AyKay47 · January 30, 2012

for your objectives just use DISTINCT and LIMIT (starting at 6 or whatever you need)

mikosiko · January 30, 2012

for your objectives just use DISTINCT and LIMIT (starting at 6 or whatever you need)

and your point is?

stubarny · January 30, 2012

Thanks guys.

OK I perhaps i need to rethink my database design if this is going to get slow over time.

I have a database made up of website addresses and company names. Everytime a user submits a company name in a form they are also asked to submit the website address. (this is to make sure that the websites are automatically kept up to date).

So whenever I need to find the latest website address for a company I just refer to the match with the latest timestamp.

Please could you tell me the best way of going about this? How about using a cronjob to delete the duplicates (but I don't like deleting data in case of errors)? Or is there something like a 'DISTINCT' index type in MYSQL?

Thanks for your help,

Stu

mikosiko · January 30, 2012

So whenever I need to find the latest website address for a company I just refer to the match with the latest timestamp.

How the objectives that you just posted are relate to your first one?

... show me how to find the 7th unqiue website_page_name from the table website_page_names?

both seems very different to me.

what problem do you have using the timestamp?, and moreover is any reason why you are storing more than the latest website for a company?

for better help please post your tables and the code that you have currently.

BTW: To clarify... As fenway said DISTINCT is very slow on large datasets but, when used with LIMIT it is not critical (obviously assuming that your LIMIT's values are small), as an example: I do have a table with around 21 millions records and a SELECT DISTINCT <any_fieldname> FROM <the_table> LIMIT 50 only took around 0.0011s (0.0001s) ... that is no slow to me at all.

kickstart · January 31, 2012

I have a database made up of website addresses and company names. Everytime a user submits a company name in a form they are also asked to submit the website address. (this is to make sure that the websites are automatically kept up to date).

So whenever I need to find the latest website address for a company I just refer to the match with the latest timestamp.

In that case I would have a select for company and the max date for that name. Something like this (made up tables and columns):-

SELECT Company, MAX(Date_Added) FROM CompanyWebSites

Then JOIN that back against the company web sites table to get the rest of the details

SELECT Sub1.Company, Sub1.LatestDateAdded, WebSiteName
FROM SELECT Company, MAX(Date_Added) AS LatestDateAdded FROM CompanyWebSites) Sub1
INNER JOIN CompanyWebSites
ON Sub1.Company = CompanyWebSites.Company AND sub1.LatestDateAdded = CompanyWebSites.Date_Added

Assuming that date added is unique for any companies web sites then that should give you what you want.

It would probably be more efficient to have a table of companies as well and use a unique numeric company id rather than the name.

All the best

Keith

stubarny · January 31, 2012

Thanks guys, much appreciated.

BTW: To clarify... As fenway said DISTINCT is very slow on large datasets but, when used with LIMIT it is not critical (obviously assuming that your LIMIT's values are small), as an example: I do have a table with around 21 millions records and a SELECT DISTINCT <any_fieldname> FROM <the_table> LIMIT 50 only took around 0.0011s (0.0001s) ... that is no slow to me at all.

Many thanks for that - I'll keep the design as it is.

All the best,

Stu

Sign In

SELECT query - apply LIMIT condition to unique items only

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Archived

Important Information