Jump to content

Recommended Posts

MySQL 5.1.46

 

The Goal:

On each row, check if there are any records having the same ssn within the previous 10 minutes using the "created_on" datetime field.. If a record *is* found within the previous 10 minutes, don't collect the current row and move onto the next row and do the same thing. If a record *is not* found, collect the current row and move onto the next row and do the same thing.

 

SHOW CREATE TABLE posts_inbound

| posts_inbound | CREATE TABLE `posts_inbound` (
  `id` int(11) unsigned NOT NULL auto_increment,
  `publisher_id` int(11) unsigned default NULL,
  `transaction_id` varchar(32) default NULL,
  `api_key` varchar(64) default NULL,
  `score_version` char(10) default NULL,
  `score` int(11) NOT NULL default '0',
  `sold` char(1) default NULL,
  `price` decimal(5,2) NOT NULL default '0.00',
  `post_status` varchar(32) default NULL,
  `post_errors` text,
  `postback_status` varchar(32) default NULL,
  `postback_errors` text,
  `reference_id` varchar(64) default NULL,
  `affiliate_id` varchar(64) default NULL,
  `affiliate_sid` varchar(64) default NULL,
  `lead_id` varchar(64) default NULL,
  `buyer_id` varchar(64) default NULL,
  `buyer_sid` varchar(64) default NULL,
  `redirected` tinyint(1) NOT NULL default '0',
  `source_url` varchar(64) default NULL,
  `ip_address` varchar(16) default NULL,
  `user_agent` varchar(255) default NULL,
  `time_stamp` datetime NOT NULL default '0000-00-00 00:00:00',
  `is_mobile` tinyint(1) NOT NULL default '0',
  `loan_amount` decimal(6,2) NOT NULL default '0.00',
  `first_name` varchar(64) default NULL,
  `last_name` varchar(64) default NULL,
  `email` varchar(64) default NULL,
  `gender` char(1) default NULL,
  `date_of_birth` char(10) default NULL,
  `marital_status` char(1) default NULL,
  `drivers_license_number` varchar(30) default NULL,
  `drivers_license_state` char(2) default NULL,
  `social_security_number` char(9) default NULL,
  `is_military` char(1) default NULL,
  `best_contact_time` char(1) default NULL,
  `mother_maiden_name` varchar(64) default NULL,
  `home_phone` varchar(32) default NULL,
  `cell_phone` varchar(32) default NULL,
  `address` varchar(40) default NULL,
  `city` varchar(32) default NULL,
  `state` char(2) default NULL,
  `zip` char(5) default NULL,
  `rent_or_own` tinyint(1) default NULL,
  `monthly_rent_or_mortgage` decimal(6,2) NOT NULL default '0.00',
  `time_at_current_address` tinyint(2) default NULL,
  `previous_address` varchar(40) default NULL,
  `previous_city` varchar(32) default NULL,
  `previous_state` char(2) default NULL,
  `previous_zip` char(5) default NULL,
  `employer_name` varchar(64) default NULL,
  `employer_address` varchar(40) default NULL,
  `employer_city` varchar(32) default NULL,
  `employer_state` char(2) default NULL,
  `employer_zip` char(5) default NULL,
  `work_phone` varchar(32) default NULL,
  `time_at_job` tinyint(2) default NULL,
  `monthly_income` decimal(7,2) NOT NULL default '0.00',
  `income_source` tinyint(1) default NULL,
  `pay_frequency` tinyint(1) default NULL,
  `first_pay_day` char(10) default NULL,
  `second_pay_day` char(10) default NULL,
  `last_pay_amount` decimal(7,2) NOT NULL default '0.00',
  `is_direct_deposit` tinyint(1) default NULL,
  `bank_name` varchar(64) default NULL,
  `bank_phone` varchar(32) default NULL,
  `bank_account_type` char(1) default NULL,
  `bank_account_length` tinyint(2) default NULL,
  `bank_routing_number` char(9) default NULL,
  `bank_account_number` varchar(17) default NULL,
  `reference_first_name` varchar(64) default NULL,
  `reference_last_name` varchar(64) default NULL,
  `reference_phone` varchar(32) default NULL,
  `reference_relation` varchar(20) default NULL,
  `time_start` double default NULL,
  `time_end` double default NULL,
  `time` double default NULL,
  `created_date` date NOT NULL default '0000-00-00',
  `created_on` datetime NOT NULL default '0000-00-00 00:00:00',
  `updated_on` datetime NOT NULL default '0000-00-00 00:00:00',
  PRIMARY KEY  (`id`),
  KEY `publisher_id` (`publisher_id`),
  KEY `transaction_id` (`transaction_id`),
  KEY `api_key` (`api_key`),
  KEY `created_date` (`created_date`),
  KEY `social_security_number` (`social_security_number`),
  KEY `idx_bank_account_type` (`bank_account_type`),
  KEY `idx_email` (`email`),
  KEY `ssn_id` (`social_security_number`,`id`)
) ENGINE=MyISAM AUTO_INCREMENT=7874256 DEFAULT CHARSET=latin1 |

 

Here's sort of a pseudo query of what I'm trying to collect:

SELECT * FROM posts_inbound WHERE (social_security_number NOT BETWEEN (created_on - INTERVAL 10 MINUTE) AND created_on)

 

Here's a query that works, but is very very slow:

SELECT social_security_number AS ssn, created_on AS timeframe_start, (SELECT COUNT(id) FROM posts_inbound WHERE social_security_number=ssn AND (created_on >= (timeframe_start - INTERVAL 10 MINUTE) AND created_on < timeframe_start)) AS dup_count FROM posts_inbound WHERE (created_date BETWEEN '2012-08-09' AND '2012-08-10') GROUP BY social_security_number, id HAVING dup_count = 0

 

The explain for the above query:

mysql> explain SELECT social_security_number AS ssn, created_on AS timeframe_start, (SELECT COUNT(id) FROM posts_inbound WHERE social_security_number=ssn AND (created_on >= (timeframe_start - INTERVAL 10 MINUTE) AND created_on < timeframe_start)) AS dup_count FROM posts_inbound WHERE (created_date BETWEEN '2012-08-09' AND '2012-08-10') GROUP BY social_security_number, id HAVING dup_count = 0;
+----+--------------------+---------------+-------+-------------------------------+------------------------+---------+------+--------+-----------------------------+
| id | select_type        | table         | type  | possible_keys                 | key                    | key_len | ref  | rows   | Extra                       |
+----+--------------------+---------------+-------+-------------------------------+------------------------+---------+------+--------+-----------------------------+
|  1 | PRIMARY            | posts_inbound | range | created_date                  | created_date           | 3       | NULL | 198107 | Using where; Using filesort | 
|  2 | DEPENDENT SUBQUERY | posts_inbound | ref   | social_security_number,ssn_id | social_security_number | 10      | func |      9 | Using where                 | 
+----+--------------------+---------------+-------+-------------------------------+------------------------+---------+------+--------+-----------------------------+

 

I'm game to either write an entirely new query or optimize the one above that works..

 

Thanks to all who reply!

Link to comment
https://forums.phpfreaks.com/topic/267891-select-timeframe-unique-records/
Share on other sites

Something like this, using a left join to check for records within 10 mins

SELECT a.* 
FROM posts_inbound a
LEFT JOIN posts_inbound b 
    ON a.social_security_number = b social_security_number 
    AND b.created_on > a.created_on - INTERVAL 10 MINUTE
WHERE b.social_security_number IS NULL

Correction to above:

 

SELECT a.* 
FROM posts_inbound a
LEFT JOIN posts_inbound b 
    ON a.social_security_number = b social_security_number
    AND b.created < a.created_on 
    AND b.created_on > a.created_on - INTERVAL 10 MINUTE
WHERE b.social_security_number IS NULL

Thanks for replying Barand!

 

The query you provided is keeping the Joined records.. We do not want to keep those, just want to keep posts_inbound "a" if there are no records with the same ssn within the 10 minute timeframe, otherwise move on and keep nothing from that record..

 

Want to end up with 1 record per ssn per 10 mintues from that record..

 

I'm having troubles trying to explain my objective clearly..

 

Maybe this might help:

 

if the current rows social_security_number does not exist between now and 10 minutes ago (no prior posts within 10 mins)

    keep the current row in the result

else

    do not keep the row and move on to the next row and repeat

 

Trying to filter out 10 minute uniques..

 

 

Once I corrected a couple of typos it worked for me

 

SELECT a.*

FROM posts_inbound a

LEFT JOIN posts_inbound b

    ON a.social_security_number = b.social_security_number

    AND b.created_on < a.created_on

    AND b.created_on > a.created_on - INTERVAL 10 MINUTE

WHERE b.social_security_number IS NULL;

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.