
Scooby08
Members-
Posts
244 -
Joined
-
Last visited
Profile Information
-
Gender
Not Telling
Scooby08's Achievements

Regular Member (3/5)
0
Reputation
-
I'm sorry, the example I sent should be like so: (if this contains all numbers (civic#), no match)\s(lot|unit|spc|space|ste|suite|bldg|building) Thanks..
-
It seems so simple, but just cannot get this figured out.. In the following I'm trying to match the word "space" that does not contain numbers in the word before it.. 161 space rd <-- no match 209 summit ave space 6 <-- match Ultimately am trying to match (lot|unit|spc|space|ste|suite|bldg|building) that do not have numbers in the word before them.. Close example.. (if this contains a number, no match)\s(lot|unit|spc|space|ste|suite|bldg|building) Thanks so much!
-
Alrighty DavidAM.. Good stuff! I appreciate the response.. In my case, it's definitely necessary to store aggregated data for summary reports (being there's so many records), so I created a child table like so.. But then again you said to remove anything that can be calculated by other rows so I then broke it down to this Am I on the right track here? Or have I misread your suggestions.. Thanks!
-
MySQL Version 5.5.28 I am working on storing summary data such as counts, averages, rates, times for each hour of the day, grouped by "account_id", "affiliate_id", "hour" for each day.. I've included the start of a table to give an idea as to what I'm trying to do so far, but as you can see the table has many columns and will have many more when I add the rest of my columns.. The purpose is for faster page loading of reports. I guess what I'm looking for is any advice on alternative solutions that may work for me here.. Or is this the proper way to store hourly data for faster page loads?? Thanks all!!
-
Thanks for replying Barand! The query you provided is keeping the Joined records.. We do not want to keep those, just want to keep posts_inbound "a" if there are no records with the same ssn within the 10 minute timeframe, otherwise move on and keep nothing from that record.. Want to end up with 1 record per ssn per 10 mintues from that record.. I'm having troubles trying to explain my objective clearly.. Maybe this might help: if the current rows social_security_number does not exist between now and 10 minutes ago (no prior posts within 10 mins) keep the current row in the result else do not keep the row and move on to the next row and repeat Trying to filter out 10 minute uniques..
-
MySQL 5.1.46 The Goal: On each row, check if there are any records having the same ssn within the previous 10 minutes using the "created_on" datetime field.. If a record *is* found within the previous 10 minutes, don't collect the current row and move onto the next row and do the same thing. If a record *is not* found, collect the current row and move onto the next row and do the same thing. SHOW CREATE TABLE posts_inbound | posts_inbound | CREATE TABLE `posts_inbound` ( `id` int(11) unsigned NOT NULL auto_increment, `publisher_id` int(11) unsigned default NULL, `transaction_id` varchar(32) default NULL, `api_key` varchar(64) default NULL, `score_version` char(10) default NULL, `score` int(11) NOT NULL default '0', `sold` char(1) default NULL, `price` decimal(5,2) NOT NULL default '0.00', `post_status` varchar(32) default NULL, `post_errors` text, `postback_status` varchar(32) default NULL, `postback_errors` text, `reference_id` varchar(64) default NULL, `affiliate_id` varchar(64) default NULL, `affiliate_sid` varchar(64) default NULL, `lead_id` varchar(64) default NULL, `buyer_id` varchar(64) default NULL, `buyer_sid` varchar(64) default NULL, `redirected` tinyint(1) NOT NULL default '0', `source_url` varchar(64) default NULL, `ip_address` varchar(16) default NULL, `user_agent` varchar(255) default NULL, `time_stamp` datetime NOT NULL default '0000-00-00 00:00:00', `is_mobile` tinyint(1) NOT NULL default '0', `loan_amount` decimal(6,2) NOT NULL default '0.00', `first_name` varchar(64) default NULL, `last_name` varchar(64) default NULL, `email` varchar(64) default NULL, `gender` char(1) default NULL, `date_of_birth` char(10) default NULL, `marital_status` char(1) default NULL, `drivers_license_number` varchar(30) default NULL, `drivers_license_state` char(2) default NULL, `social_security_number` char(9) default NULL, `is_military` char(1) default NULL, `best_contact_time` char(1) default NULL, `mother_maiden_name` varchar(64) default NULL, `home_phone` varchar(32) default NULL, `cell_phone` varchar(32) default NULL, `address` varchar(40) default NULL, `city` varchar(32) default NULL, `state` char(2) default NULL, `zip` char(5) default NULL, `rent_or_own` tinyint(1) default NULL, `monthly_rent_or_mortgage` decimal(6,2) NOT NULL default '0.00', `time_at_current_address` tinyint(2) default NULL, `previous_address` varchar(40) default NULL, `previous_city` varchar(32) default NULL, `previous_state` char(2) default NULL, `previous_zip` char(5) default NULL, `employer_name` varchar(64) default NULL, `employer_address` varchar(40) default NULL, `employer_city` varchar(32) default NULL, `employer_state` char(2) default NULL, `employer_zip` char(5) default NULL, `work_phone` varchar(32) default NULL, `time_at_job` tinyint(2) default NULL, `monthly_income` decimal(7,2) NOT NULL default '0.00', `income_source` tinyint(1) default NULL, `pay_frequency` tinyint(1) default NULL, `first_pay_day` char(10) default NULL, `second_pay_day` char(10) default NULL, `last_pay_amount` decimal(7,2) NOT NULL default '0.00', `is_direct_deposit` tinyint(1) default NULL, `bank_name` varchar(64) default NULL, `bank_phone` varchar(32) default NULL, `bank_account_type` char(1) default NULL, `bank_account_length` tinyint(2) default NULL, `bank_routing_number` char(9) default NULL, `bank_account_number` varchar(17) default NULL, `reference_first_name` varchar(64) default NULL, `reference_last_name` varchar(64) default NULL, `reference_phone` varchar(32) default NULL, `reference_relation` varchar(20) default NULL, `time_start` double default NULL, `time_end` double default NULL, `time` double default NULL, `created_date` date NOT NULL default '0000-00-00', `created_on` datetime NOT NULL default '0000-00-00 00:00:00', `updated_on` datetime NOT NULL default '0000-00-00 00:00:00', PRIMARY KEY (`id`), KEY `publisher_id` (`publisher_id`), KEY `transaction_id` (`transaction_id`), KEY `api_key` (`api_key`), KEY `created_date` (`created_date`), KEY `social_security_number` (`social_security_number`), KEY `idx_bank_account_type` (`bank_account_type`), KEY `idx_email` (`email`), KEY `ssn_id` (`social_security_number`,`id`) ) ENGINE=MyISAM AUTO_INCREMENT=7874256 DEFAULT CHARSET=latin1 | Here's sort of a pseudo query of what I'm trying to collect: SELECT * FROM posts_inbound WHERE (social_security_number NOT BETWEEN (created_on - INTERVAL 10 MINUTE) AND created_on) Here's a query that works, but is very very slow: SELECT social_security_number AS ssn, created_on AS timeframe_start, (SELECT COUNT(id) FROM posts_inbound WHERE social_security_number=ssn AND (created_on >= (timeframe_start - INTERVAL 10 MINUTE) AND created_on < timeframe_start)) AS dup_count FROM posts_inbound WHERE (created_date BETWEEN '2012-08-09' AND '2012-08-10') GROUP BY social_security_number, id HAVING dup_count = 0 The explain for the above query: mysql> explain SELECT social_security_number AS ssn, created_on AS timeframe_start, (SELECT COUNT(id) FROM posts_inbound WHERE social_security_number=ssn AND (created_on >= (timeframe_start - INTERVAL 10 MINUTE) AND created_on < timeframe_start)) AS dup_count FROM posts_inbound WHERE (created_date BETWEEN '2012-08-09' AND '2012-08-10') GROUP BY social_security_number, id HAVING dup_count = 0; +----+--------------------+---------------+-------+-------------------------------+------------------------+---------+------+--------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+---------------+-------+-------------------------------+------------------------+---------+------+--------+-----------------------------+ | 1 | PRIMARY | posts_inbound | range | created_date | created_date | 3 | NULL | 198107 | Using where; Using filesort | | 2 | DEPENDENT SUBQUERY | posts_inbound | ref | social_security_number,ssn_id | social_security_number | 10 | func | 9 | Using where | +----+--------------------+---------------+-------+-------------------------------+------------------------+---------+------+--------+-----------------------------+ I'm game to either write an entirely new query or optimize the one above that works.. Thanks to all who reply!
-
I ended up using the following: Changing count(*) to count(id) did help to speed it up a bit, but my overall issue ended up being in the surrounding code outside of that query.. There's about 2,500,000 rows of data and now it's quick as can be and is not lagging in any way.. Thanks to all!
-
Something like so?
-
I have a query that seems to lag a bit and although I'm not entirely sure the issue is this query, but it's a starting point.. Is there any way to optimize this query any further? I have an index on the column "created" and the created date is in the following format "2012-07-13". When running the query I get a count of 43551, but when I run an explain it says the following.. I've also tried the following variations for the heck of it, but they all come up the same.. Any ideas??
-
I know this..
-
I'm just looking for a few recommendations of encrypting/decrypting data such as ssn's, bank account #'s, routing #'s, stuff like that.. I've never had to do this in the past and when browsing I have ran across various ways to do so such as, crypt(), mcrypt(), base64_encode() to name a few.. Just don't know which is best to choose and was hoping someone with experience could send some suggestions or advice.. Thanks all..
-
That's how I roll!
-
This is true Pikachu2000..
-
Thanks again Josh.. I'll get er' from here.. By the way, if anybody can answer just the regex question to match "Box 12" and not match "1555 pine box elder rd", regardless as to what I'm using it for, that would be most helpful!!
-
You are correct Josh, except for the "Box 12" one.. I am posting these to a third party and they run a validation on these as well and they are saying that it is a po box address.. All I'm trying to do is create a custom filter that will work for this third party so I don't post them po box address as they do not want them.. It's really close to being where I need it.. I am already using the code and it's working great, but the "Box 12" types still get by and in the even that an address actually has a "p" in front of the work "box" it'll treat it as a po box when it really is not.. (1555 pine box elder rd) I really do appreciate the help Josh! Thank you