learn2dev Posted November 29, 2013

Hello all,

I currently have a PHP file that pulls URL data from another table as its source; it uses those URLs to go back to the original site and pull in additional product information. To do this, I am developing regular expressions that gather certain site information based on patterns found within the source code of the site.

I currently have 6 pattern sequences working, but when I try to add one particular pattern to the mix, it breaks my inserts: the code still runs, but no records are inserted into my MySQL table. The newly added line of code works by itself, and the existing code works by itself (as in, the data is inserted into the table), but when I bring the code together the inserts stop working.

I have highlighted the "starringPattern"; this is the line of regex that is problematic. Can anyone tell me what I am doing wrong? Why do I lose my inserts when this code is added?

Note: when I comment out the items pertaining to starring / sPattern, the inserts will update. Also, if I load the starring code in a file identical to the code below, but load it by itself, it too will insert values into the $url, $starring, and $default_ts fields of the table.

while ($row = $rows->fetch_row()) {
    $url = $row[0];
    //echo $url . "\n\n";
    $sPattern = "";
    $dPattern = "";
    $gPattern = "";
    $rPattern = "";
    $rTPattern = "";
    $starringPattern = "";
    if ($source = file_get_contents($url)) {
        $source = preg_replace('/\s\s+/', ' ', $source);
        $source = preg_replace('/\n\r|\r\n/', '', $source);
        $sPattern = '<span class="label">Summary:<\/span>.* <span class="data" itemprop="description">.*';
        $sPattern .= '(<span class="blurb blurb_collapsed">)?(.*)<\/span>.*';
        $dPattern = '<span class="label">Director:<\/span>.*<span class="data" itemprop="name">(.*)<\/span>.*';
        //backslashes before the parentheses here escape them so that () is not treated as another pattern group
        $gPattern = '<span class="label">Genre\(s\): <\/span>.*<span class="data" itemprop="genre">(.*)<\/span>.*';
        $rPattern = '<span class="label">Rating:<\/span>.*<span class="data" itemprop="contentRating">(.*)<\/span>.*';
        $rTPattern = '<span class="label">Runtime:<\/span>.*<span class="data">(.*)<\/span>';
        $starringPattern = '<span class="label">Starring:<\/span>.*<span class="data">.*<span itemprop="name">(.*)<\/span><\/a><\/span>.*';
        $pattern = '/' . $sPattern . $dPattern . $gPattern . $rPattern . $rTPattern . $starringPattern . '/siU';
        echo $pattern ."\n";
        date_default_timezone_set("America/Chicago");
        preg_match_all($pattern, $source, $results);
        if (isset($results[0][0])) {
            $summary = str_replace("'", "-", $results[2][0]);
            $summary = preg_replace('/(<.*span.*>|<.*\/span.*>|<.*a .*rel\s*=.*> |<.*\/a.*>)/U', ' ', $summary);
            $summary = preg_replace('/\s\s+/', ' ', $summary);
            $dir = str_replace("'", "-", $results[3][0]);
            $starring = str_replace("'", "-", $results[4][0]);
            $gen = str_replace("'", "-", $results[5][0]);
            $rating = str_replace("'", "-", $results[6][0]);
            $duration = str_replace("'", "-", $results[7][0]);
            $default_ts = date("Y-m-d h:i:s", time());
            $sqlString = "insert into myMovies values('$url', '$summary', '$dir', '$starring', '$gen', '$rating', '$duration', '$default_ts')";
        }
    }
}
$conn->close();
}
?>

Keep in mind that this is my very first post pertaining to programming, so if I have not provided sufficient information, or if it is still unclear what help I am asking for, please let me know.

Thanks in advance.

Link to comment https://forums.phpfreaks.com/topic/284378-after-adding-a-new-regex-pattern-to-my-code-mysql-inserts-stopped-working/
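One thing worth ruling out when a combined pattern silently stops producing inserts is whether preg_match_all is failing outright or simply matching nothing. The sketch below is not from the original script: matchOrExplain is a hypothetical helper and the sample markup is made up. It relies only on preg_match_all returning false on a PCRE error and on preg_last_error() reporting the error code.

```php
<?php
// Hypothetical helper (not part of the posted script): runs preg_match_all
// and says why nothing came back instead of failing silently.
function matchOrExplain(string $pattern, string $source): array
{
    $count = preg_match_all($pattern, $source, $results);

    if ($count === false) {
        // false means PCRE itself gave up (for example a backtrack limit error).
        return ['ok' => false, 'reason' => 'PCRE error code ' . preg_last_error(), 'results' => []];
    }
    if ($count === 0) {
        return ['ok' => false, 'reason' => 'pattern ran but matched nothing', 'results' => []];
    }
    return ['ok' => true, 'reason' => "matched $count time(s)", 'results' => $results];
}

// Made-up sample input standing in for the fetched page.
$source = '<span class="label">Runtime:</span> <span class="data">95 min</span>';

$good = matchOrExplain('/<span class="label">Runtime:<\/span>.*<span class="data">(.*)<\/span>/siU', $source);
$bad  = matchOrExplain('/<span class="label">Starring:<\/span>(.*)<\/span>/siU', $source);

echo $good['reason'], "\n";
echo $bad['reason'], "\n";
```

Logging the reason next to each URL would show whether the missing inserts come from the pattern not matching or from a later step.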
requinix Posted November 29, 2013

You're missing a .* between $rTPattern and your new $starringPattern. The .* is basically a "delimiter" between the different parts: your $starringPattern is at the end and doesn't need one, but $rTPattern is no longer the last part, so it does.

learn2dev Posted November 29, 2013

It got wrapped around (copy and paste) and is part of the next line.

learn2dev Posted November 29, 2013

Thank you, Zane, for editing the post and making it more readable. Will you share with me how I can do this for future posts?

requinix Posted November 30, 2013

"It got wrapped around (copy and paste) and is part of the next line"

Uh, no?

$rTPattern = '<span class="label">Runtime:<\/span>.*<span class="data">(.*)<\/span>';
$starringPattern = '<span class="label">Starring:<\/span>.*<span class="data">.*<span itemprop="name">(.*)<\/span><\/a><\/span>.*';
$pattern = '/' . $sPattern . $dPattern . $gPattern . $rPattern . $rTPattern . $starringPattern . '/siU';

There is no .* between the $rTPattern and the $starringPattern. You put one at the end of $starringPattern but not at the end of $rTPattern. There very likely needs to be one.

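The effect of the missing .* can be reproduced on a small scale. The following is a minimal sketch using the two patterns as posted and a made-up fragment of markup (the real page is not shown in the thread, so the text between the Runtime and Starring rows is an assumption).

```php
<?php
// Made-up fragment: some unrelated markup sits between the Runtime and Starring rows.
$source = '<span class="label">Runtime:</span> <span class="data">95 min</span> </li> <li>'
        . ' <span class="label">Starring:</span> <span class="data"> <a href="/person/x">'
        . '<span itemprop="name">Apple Wheat</span></a></span>';

// The two patterns exactly as posted.
$rTPattern       = '<span class="label">Runtime:<\/span>.*<span class="data">(.*)<\/span>';
$starringPattern = '<span class="label">Starring:<\/span>.*<span class="data">.*<span itemprop="name">(.*)<\/span><\/a><\/span>.*';

// Joined directly: the Starring label would have to follow the Runtime </span> immediately.
$without = preg_match('/' . $rTPattern . $starringPattern . '/siU', $source, $m1);

// Joined with .* in between: the gap between the two rows can now be skipped.
$with = preg_match('/' . $rTPattern . '.*' . $starringPattern . '/siU', $source, $m2);

echo "without delimiter: $without match(es)\n";
echo "with delimiter: $with match(es)\n";
```

With the delimiter in place, $m2[1] holds the runtime and $m2[2] holds the first actor name; without it, nothing matches, so isset($results[0][0]) in the posted loop would be false and no SQL string would be built.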
learn2dev Posted November 30, 2013

Oh yes, good catch, you are right and I have corrected this. Thanks for pointing it out to me.

This change did not fix my issue, but I have made some other changes that have resulted in progress. I added a separate preg_match_all for the starringPattern in an effort to get it to run independently of the other patterns. Now the program runs, but I am only getting the first result from the first pass of the pattern (if that makes sense). Here are the added lines:

$actorsPattern = '/' . $starringPattern . '/siU';
preg_match_all($actorsPattern, $source, $actors);
if (isset($actors[1])) $actorsString = implode(',', $actors[1]);
$actorsString = preg_replace('/(<.*span.*>|<.*\/span.*>|<.*a .*rel\s*=.*> |<.*\/a.*>)/U', ' ', $actorsString);

The code appears to be skipping over the repeating patterns, only giving me output from the first one. For example, I need all of the following, but it only gives me the first one (Apple Wheat) and then skips on to the next pattern. This is true for each product type.

<span class="label">Actors:</span>
<span class="data">
<span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/apple-wheat" itemprop="url"><span itemprop="name">Apple Wheat</span></a></span>,
<span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/banana-cumcumer" itemprop="url"><span itemprop="name">Banana Cumcumber</span></a></span>,
<span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/boat-ski" itemprop="url"><span itemprop="name">Boat Ski</span></a></span>,
<span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/rain-snow" itemprop="url"><span itemprop="name">Rain Snow</span></a></span>,
<span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/great-house" itemprop="url"><span itemprop="name">Great Horse</span></a></span>,
<span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/mickey-mouse" itemprop="url"><span itemprop="name">Mickey Moose</span></a></span>,
<span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/dallas-k-colorado" itemprop="url"><span itemprop="name">Dallas Colorado</span></a></span>,
<span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/Pala-Alto" itemprop="url"><span itemprop="name">Pala Alto</span></a></span>,
<span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/orange-sleet" itemprop="url"><span itemprop="name">Orange Sleet</span></a></span>,
<span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/please-help" itemprop="url"><span itemprop="name">Please Help</span></a></span>
</span>

Perhaps I want the new pattern definition to be "greedy" instead of using '/siU' in this case, or perhaps I need to iterate over the newly added array to make sure I am getting all the results for this particular pattern before going on to the next one?

Current status: I am receiving the first expected value for the following pattern.

$starringPattern = '<span class="label">Actors:<\/span>.*<span class="data">.*<span itemprop="name">(.*)<\/span><\/a><\/span>.*';

My issue is that when the source output repeats with identical markup, my code does not follow it; instead it goes on to the next pattern. Good thing is, I am getting some output now, it's just now complete.

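Because the Actors label appears only once, a pattern that starts at the label can match at most once, which would explain getting only the first name. One possible approach, sketched below, is to cut out the Actors block first and then collect every name inside it. This is not the fix the poster ended up with (the thread does not show it): extractActors is a hypothetical function, the sample is a shortened copy of the markup quoted above, and the block pattern assumes the list ends with </a></span> followed by the closing </span>.

```php
<?php
// Hypothetical helper: returns every actor name found in the Actors block.
function extractActors(string $source): array
{
    // Step 1: capture everything between the Actors label and the end of the list.
    // Assumes the last entry is followed by optional whitespace and the closing </span>.
    $blockPattern = '/<span class="label">Actors:<\/span>\s*<span class="data">(.*)<\/a><\/span>\s*<\/span>/siU';
    if (preg_match($blockPattern, $source, $block) !== 1) {
        return [];
    }

    // Step 2: collect every name inside that block; [^<]* stops at the next tag.
    preg_match_all('/<span itemprop="name">([^<]*)<\/span>/i', $block[1], $names);
    return $names[1];
}

// Shortened copy of the quoted markup, with three of the ten entries.
$source = '<span class="label">Actors:</span> <span class="data"> '
    . '<span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/apple-wheat" itemprop="url"><span itemprop="name">Apple Wheat</span></a></span>, '
    . '<span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/boat-ski" itemprop="url"><span itemprop="name">Boat Ski</span></a></span>, '
    . '<span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/rain-snow" itemprop="url"><span itemprop="name">Rain Snow</span></a></span> </span>';

$actors = extractActors($source);
$actorsString = implode(',', $actors);

echo $actorsString, "\n";
```

Splitting the work this way keeps the repeated part (the names) in a pattern that is free to match many times, so neither the U modifier nor greediness has to be changed to get the full list.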
learn2dev Posted November 30, 2013

Sorry, there is a typo in my previous post. The last line should read: it's just not complete.

learn2dev Posted December 3, 2013

It was determined that the regex pattern was not adequate to pull in the data. Once this was corrected, I was able to see the completed string listing all stars for the first record. I now just need to figure out how to iterate over the array so that I can pull in all of the data for every item.

This topic is now archived and is closed to further replies.