Jump to content

bitt3n

Members
  • Posts

    36
  • Joined

  • Last visited

    Never

Profile Information

  • Gender
    Not Telling

bitt3n's Achievements

Newbie

Newbie (1/5)

0

Reputation

  1. someone else observed this -- I'm confused why it says this, since I do have an index on that table. these are the keys on my products table Keyname Type Cardinality Action Field PRIMARY PRIMARY 476122 product_id shop_id UNIQUE 476122 shop_id,link title_2 UNIQUE 476122 title,image brand INDEX 8983 brand title INDEX 238061 title date INDEX 158707 date shop_id_2 INDEX 90 shop_id product_id INDEX 476122 product_id,date shop_id_3 INDEX 158707 shop_id,date so index named shop_id_2 is the shop_id index. on shops, the index is Keyname Type Cardinality Action Field PRIMARY PRIMARY 90 shop_id this should be ok, right? also here are the SHOW CREATE tables CREATE TABLE `products` (\n `product_id` int(10) NOT NULL auto_increment,\n `shop_id` int(10) NOT NULL default '0',\n `title` varchar(120) NOT NULL default '',\n `brand` varchar(80) NOT NULL default '',\n `price` float NOT NULL default '0',\n `image` varchar(255) NOT NULL default '',\n `image_height` smallint(6) default NULL,\n `image_width` smallint(6) default NULL,\n `link` varchar(255) NOT NULL default '',\n `date` datetime NOT NULL default '0000-00-00 00:00:00',\n PRIMARY KEY (`product_id`),\n UNIQUE KEY `shop_id` (`shop_id`,`link`),\n UNIQUE KEY `title_2` (`title`,`image`),\n KEY `brand` (`brand`),\n KEY `title` (`title`),\n KEY `date` (`date`),\n KEY `shop_id_2` (`shop_id`),\n KEY `product_id` (`product_id`,`date`),\n KEY `shop_id_3` (`shop_id`,`date`)\n) ENGINE=MyISAM AUTO_INCREMENT=547387 DEFAULT CHARSET=latin1 CREATE TABLE `shops` (\n `shop_id` int(10) NOT NULL auto_increment,\n `shop` varchar(80) NOT NULL default '',\n `last_look` datetime NOT NULL default '0000-00-00 00:00:00',\n PRIMARY KEY (`shop_id`)\n) ENGINE=MyISAM AUTO_INCREMENT=142 DEFAULT CHARSET=latin1
  2. I've got a few people speculating that the problem may be the result of the 'temp table size' or 'max heap table size' parameters, or that I need to use a 'covering index'. Are these likely to be related to the problem?
  3. sure, here it is id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE u const PRIMARY,prior_login PRIMARY 4 const 1 Using temporary; Using filesort 1 SIMPLE s ALL PRIMARY NULL NULL NULL 90 1 SIMPLE p ref shop_id,date,shop_id_2,shop_id_3 shop_id 4 db.s.shop_id 4761 Using where
  4. The following query executes in 3.6 seconds on a products table of 475K rows (returning ~1K rows). I would like to make it more efficient. SELECT p.*, shop FROM products p JOIN users u ON p.`date` >= u.prior_login and u.user_id = 22 JOIN shops s ON p.shop_id = s.shop_id ORDER BY shop, `date`, product_id; on the product table I have indexes on (product_id), (date), (product_id,date),(shop_id,date). on the shops table I have an index on (shop_id). If I remove from the ORDER BY clause both date and product_id, the query executes in 0.06 seconds. If I add either of those columns back to the ORDER BY, execution time goes back to 3.6 seconds. I notice if I remove the first JOIN, the query executes in 7.4 seconds (returning all 475K rows). I am running MySQL client version: 5.0.67.
  5. I figured it out, I should have established the second URL via curl_setopt() rather than curl_init(), which resets the cookie location etc
  6. edit: double post
  7. ha I realized I forgot to remove the case insensitive modifier I was using for another part of the regex, cosmic level idiocy
  8. in the following, I want put only 'Capitalized Words' into a group. text="ALL CAPITAL LETTERS - Capitalized Words" the best way to do it seems to use the logic: "match everything until the following character is lowercase, then put the current and following characters in a group, stopping at the final quotation mark" to do this I've tried text="(.(?![a-z]))+(?<mygroup>[^"]+)" which I understand to mean "match at least one of anything that isn't followed by a lowercase character, then make the group" however I've clearly screwed something up. What did I do wrong?
  9. I'm trying to submit a POST form in order to input a given location (eg United Kingdom), then visit another page on the same site to obtain price information for this location. (I have permission to use this price data.) I'm pretty sure I'm screwing up submitting the POST form somehow. If I display the output of the first call to curl_exec($ch) I get 'Object moved to here.' with a link to default.aspx. What's the best way to fix this? <?php // STEP 1. let's create a cookie file $ckfile = tempnam ("/tmp", "CURLCOOKIE"); // STEP 2. visit the homepage to set the cookie properly $ch = curl_init ("http://www.pret-a-beaute.com/ChangeSeason.aspx"); curl_setopt ($ch, CURLOPT_COOKIEJAR, $ckfile); curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true); curl_setopt ($ch, CURLOPT_POST, true); curl_setopt( $ch, CURLOPT_POSTFIELDS, 'Country="67"'); curl_exec ($ch); curl_setopt ($ch, CURLOPT_POST, false); /* STEP 3. visit cookiepage.php */ $ch = curl_init ("http://www.pret-a-beaute.com/Whatsnew.aspx?CategoryId=0&Direction=0&OrderBy=36&ViewAll=1&PageNo=1&TopCatAll=1&TableLookupStr="); curl_setopt ($ch, CURLOPT_COOKIEFILE, $ckfile); curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true); $output = curl_exec ($ch); echo($output); ?> the form in question is <form method="post" action="ChangeSeason.aspx" name="ChangeSeason"> <table align=center width="100%" border="0" cellspacing="5" cellpadding="0"> <tr> <td align=left class="MAINTEXTBLACKsmall" style="padding-left:75px;"> COUNTRY </td> </tr> <tr> <td align=center> <select name="Country" id="Country" class="selectbox" onchange="ChangeSeason.submit();"><option value=0>CHOOSE YOUR COUNTRY</option> <option value="67">United Kingdom</option> <option selected value="66">USA</option> </select> </td> </tr> </table> </form>
  10. No, this is actually a different issue. I was getting a server error on account of using negative lookaheads. Removing the negative lookaheads solved the server error problem. However, the script was now timing out before the regex had parsed all the matches. Removing the limit increases on backtrack and recursion solved the timeout problem, and now all results get parsed. requinix suggested this might be on account of the fact that the backtrack and recursion increases were actually being used by the regex, which seems logical, and is obviously not good. however preg_last_error() indicates that the lower limits are not being exceeded. Thus the reason why raising the limits causes the script to time out remains unclear.
  11. that's true, is there some way to benchmark different regexes in PHP so we can see the difference? now that you mention it, that is kind of disturbing. based on your observation, I ran my script with if (preg_last_error() == PREG_NO_ERROR) { print 'There is no error.'; } else if (preg_last_error() == PREG_INTERNAL_ERROR) { print 'There is an internal error!'; } else if (preg_last_error() == PREG_BACKTRACK_LIMIT_ERROR) { print 'Backtrack limit was exhausted!'; } else if (preg_last_error() == PREG_RECURSION_LIMIT_ERROR) { print 'Recursion limit was exhausted!'; } else if (preg_last_error() == PREG_BAD_UTF8_ERROR) { print 'Bad UTF8 error!'; } else if (preg_last_error() == PREG_BAD_UTF8_ERROR) { print 'Bad UTF8 offset error!'; } which gave 'no error' as a result. so it seems like something else must be going on? it seems odd that reducing the limits would cause the script to run to completion if the reduced limits were not being reached.
  12. I ended up replacing the negative lookaheads as suggested and it solved the problem. I'm not sure why the negative lookaheads were a problem, since it seems like they should be just as efficient as using lazy stars, but clearly they are not as efficient. I could remove the names of the captured groups, but this makes the code that uses the resulting array slightly more confusing, since I have to keep track of which number corresponds to which type of information. Another thing I noticed is that increasing the recursion backtrack and recursion limits by 1000X actually causes the script to take far longer to execute, to the point where it was timing out before conclusion. I had to remove these increases to get the script to complete. Incidentally, if anyone has a favorite tutorial on domdocument() to recommend, I would love to hear about it.
  13. I am using this regex to parse html: $total_matches = preg_match_all('{ <a\shref=" (?<link>[^"]+) "(??!src=).)+src=" (?<image>[^"]+) (??!designer-name">).)+designer-name"> (?<brand>[^<]+) (??!title=).)+title=" (?<title>((?!">).)+) (??!"price">).)+"price">\$ (?<price>[\d.,]+) }xsi',$output,$all_matches,PREG_SET_ORDER); This seems to work fine when I try to parse several matches in a row. However when I try parsing the full page these matches come from (I have permission to parse/use these data) http://www.mytheresa.com/us_en/new-arrivals/what-s-new-this-week-1.html?limit=12 the regex fails (I actually get a 500 error). (I have confirmed that my script scrapes the page fine.) I've tried increasing the backtrack/recursion limits using ini_set('pcre.backtrack_limit',100000000); ini_set('pcre.recursion_limit',100000000); but this does not solve the problem. I am wondering what I am doing wrong that is causing the regex to fail via PHP when it seems to be valid, and match code on the relevant page. Fiddling with it seems to suggest the negative lookaheads (in conjunction with the page length) are causing problems, but I'm not sure how I screwed them up. I am running PHP 5.2.17. My PCRE_VERSION is 8.02 2010-03-19. (I've recently been informed that I should be using DomDocument to parse html, but I'd like to fix this for my own edification if nothing else.)
  14. I looked for the 'solved' button as per http://www.phpfreaks.com/forums/phpfreaks-com-questions-comments-suggestions/%28solved%29-how-do-you-mark-a-topic-as-solved/ but I don't see it.. searching html for 'solve' doesn't help either
  15. figured I'd note that this problem is solved, for some reason the query hangs in phpMyAdmin run against a table with more than about 65K rows, but works fine when run through mysql_query() as two separate queries for SET and SELECT. so the lesson seems to be to try running the query in the command-line or through a script if it doesn't work in phpMyAdmin
×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.