learn2dev

  1. It turned out that the regex pattern was not adequate to pull in the data. Once this was corrected, I was able to see the completed string listing all stars for the first record. I now just need to figure out how to iterate over the array so that I can pull in all of the data for every item.
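As a rough illustration of the iteration step mentioned above, here is a minimal sketch. The `$pages` array, the example.com URLs, and the simplified name pattern are all hypothetical stand-ins (the real script reads its URLs from a table and fetches each page with file_get_contents), so treat this as one possible shape rather than the actual fix.

```php
<?php
// Hypothetical sample data standing in for pages fetched with file_get_contents().
$pages = [
    'https://example.com/movie/1' =>
        '<span itemprop="name">Apple Wheat</span>, <span itemprop="name">Boat Ski</span>',
    'https://example.com/movie/2' =>
        '<span itemprop="name">Rain Snow</span>',
];

// Simplified pattern (an assumption): one capture group per actor name.
$namePattern = '/<span itemprop="name">(.*?)<\/span>/s';
$starsByUrl = [];

foreach ($pages as $url => $source) {
    // PREG_SET_ORDER groups the result per match: $match[0] is the full
    // match and $match[1] is the first capture group.
    preg_match_all($namePattern, $source, $matches, PREG_SET_ORDER);

    $names = [];
    foreach ($matches as $match) {
        $names[] = trim($match[1]);
    }

    // One comma-separated string of names per record.
    $starsByUrl[$url] = implode(', ', $names);
}

print_r($starsByUrl);
```

The outer loop walks the records and the inner loop walks every match within one record, so no match is dropped before moving on to the next item.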
  2. Sorry, there is a typo in my previous post. The last line should read: "it's just not complete".
  3. Oh yes, good catch, you are right, and I have corrected this. Thanks for pointing it out. The change did not fix my issue, but some other changes have resulted in progress. I added a separate preg_match_all for the starringPattern so that it runs independently of the other patterns. The program now runs, but I only get the first result from the first pass of the pattern (if that makes sense). Here are the added lines:

     $actorsPattern = '/' . $starringPattern . '/siU';
     preg_match_all($actorsPattern, $source, $actors);
     if (isset($actors[1])) $actorsString = implode(',', $actors[1]);
     $actorsString = preg_replace('/(<.*span.*>|<.*\/span.*>|<.*a .*rel\s*=.*> |<.*\/a.*>)/U', ' ', $actorsString);

     The code appears to skip over the repeating patterns and only gives me output from the first one. For example, I need all of the following, but it only gives me the first one (Apple Wheat) and then skips on to the next pattern. This is true for each product type.

     <span class="label">Actors:</span>
     <span class="data">
     <span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/apple-wheat" itemprop="url"><span itemprop="name">Apple Wheat</span></a></span>,
     <span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/banana-cumcumer" itemprop="url"><span itemprop="name">Banana Cumcumber</span></a></span>,
     <span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/boat-ski" itemprop="url"><span itemprop="name">Boat Ski</span></a></span>,
     <span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/rain-snow" itemprop="url"><span itemprop="name">Rain Snow</span></a></span>,
     <span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/great-house" itemprop="url"><span itemprop="name">Great Horse</span></a></span>,
     <span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/mickey-mouse" itemprop="url"><span itemprop="name">Mickey Moose</span></a></span>,
     <span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/dallas-k-colorado" itemprop="url"><span itemprop="name">Dallas Colorado</span></a></span>,
     <span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/Pala-Alto" itemprop="url"><span itemprop="name">Pala Alto</span></a></span>,
     <span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/orange-sleet" itemprop="url"><span itemprop="name">Orange Sleet</span></a></span>,
     <span itemprop="star" itemscope itemtype="http://schema.org/Person"> <a href="/person/please-help" itemprop="url"><span itemprop="name">Please Help</span></a></span>
     </span>

     Perhaps I want the new pattern definition to be "greedy" instead of using '/siU' in this case, or perhaps I need to iterate over the newly added array to make sure I am getting all the results for this particular pattern before going on to the next one?

     Current status: I am receiving the first expected value for the following pattern:

     $starringPattern = '<span class="label">Actors:<\/span>.*<span class="data">.*<span itemprop="name">(.*)<\/span><\/a><\/span>.*';

     My issue is that when the source repeats the same markup, my code does not; instead it goes on to the next pattern. Good thing is, I am getting some output now, its just now complete.
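One likely reason for the single result: the pattern starts at the "Actors:" label, which occurs once per page, so preg_match_all can find at most one match, and its one capture group holds one name. Switching to greedy matching would not change that. A possible alternative, sketched below, is a two-step extraction: first isolate the run of "star" spans after the label, then collect every name inside it. The function name `extractActorNames` and both patterns are assumptions written against the sample markup in this post, not tested against the real site.

```php
<?php
// Hypothetical helper (not from the original script): returns every actor
// name found in the "Actors:" block of a page, or an empty array.
function extractActorNames(string $source): array
{
    // Step 1: capture the consecutive <span itemprop="star"> entries that
    // follow the "Actors:" label. Assumes the markup shown in the post.
    $blockPattern = '/<span class="label">Actors:<\/span>\s*<span class="data">'
        . '((?:\s*<span itemprop="star"[^>]*>.*?<\/a><\/span>,?)+)/s';

    if (preg_match($blockPattern, $source, $block) !== 1) {
        return [];
    }

    // Step 2: inside that block only, match every name. The pattern repeats
    // once per actor, so preg_match_all returns one entry per actor.
    preg_match_all('/<span itemprop="name">(.*?)<\/span>/s', $block[1], $names);

    return array_map('trim', $names[1]);
}

// Shortened sample based on the markup in the post.
$sample = '<span class="label">Actors:</span> <span class="data"> '
    . '<span itemprop="star" itemscope itemtype="http://schema.org/Person"> '
    . '<a href="/person/apple-wheat" itemprop="url"><span itemprop="name">Apple Wheat</span></a></span>, '
    . '<span itemprop="star" itemscope itemtype="http://schema.org/Person"> '
    . '<a href="/person/boat-ski" itemprop="url"><span itemprop="name">Boat Ski</span></a></span> </span>';

echo implode(', ', extractActorNames($sample)), "\n";
```

Restricting step 2 to the captured block keeps other spans on the page (the director, for example) out of the result.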
  4. Thank you Zane for editing the post and making it more readable. Will you share with me how I can do this for future posts?
  5. Hello all, I currently have a PHP file that pulls URL data from another table as its source and uses those URLs to go back to the original site and pull in additional product information. To do this, I am developing regular expressions that gather site information based on patterns found in the site's source code. I currently have 6 pattern sequences working, but when I add one particular pattern to the mix, it breaks my inserts: the code still runs, but no records are inserted into my MySQL table. The newly added line of code works by itself, and the existing code works by itself (works as in, the data is inserted into the table), but the inserts stop working when I bring the code together. I have highlighted the "starringPattern"; this is the line of regex that is problematic. Can anyone tell me what I am doing wrong? Why do I lose my inserts when this code is added?

     Note: when I comment out the items pertaining to starring / sPattern, the inserts work. Also, if I load the starring code in a file identical to the code below, but by itself, it too inserts values into the $url, $starring, and $default_ts fields of the table.

     while ($row = $rows->fetch_row()) {
         $url = $row[0];
         //echo $url . "\n\n";
         $sPattern = "";
         $dPattern = "";
         $gPattern = "";
         $rPattern = "";
         $rTPattern = "";
         $starringPattern = "";
         if ($source = file_get_contents($url)) {
             $source = preg_replace('/\s\s+/', ' ', $source);
             $source = preg_replace('/\n\r|\r\n/', '', $source);
             $sPattern = '<span class="label">Summary:<\/span>.* <span class="data" itemprop="description">.*';
             $sPattern .= '(<span class="blurb blurb_collapsed">)?(.*)<\/span>.*';
             $dPattern = '<span class="label">Director:<\/span>.*<span class="data" itemprop="name">(.*)<\/span>.*';
             // the backslashes before the parentheses escape them so that () is not treated as another group
             $gPattern = '<span class="label">Genre\(s\): <\/span>.*<span class="data" itemprop="genre">(.*)<\/span>.*';
             $rPattern = '<span class="label">Rating:<\/span>.*<span class="data" itemprop="contentRating">(.*)<\/span>.*';
             $rTPattern = '<span class="label">Runtime:<\/span>.*<span class="data">(.*)<\/span>';
             $starringPattern = '<span class="label">Starring:<\/span>.*<span class="data">.*<span itemprop="name">(.*)<\/span><\/a><\/span>.*';
             $pattern = '/' . $sPattern . $dPattern . $gPattern . $rPattern . $rTPattern . $starringPattern . '/siU';
             echo $pattern . "\n";
             date_default_timezone_set("America/Chicago");
             preg_match_all($pattern, $source, $results);
             if (isset($results[0][0])) {
                 $summary = str_replace("'", "-", $results[2][0]);
                 $summary = preg_replace('/(<.*span.*>|<.*\/span.*>|<.*a .*rel\s*=.*> |<.*\/a.*>)/U', ' ', $summary);
                 $summary = preg_replace('/\s\s+/', ' ', $summary);
                 $dir = str_replace("'", "-", $results[3][0]);
                 $starring = str_replace("'", "-", $results[4][0]);
                 $gen = str_replace("'", "-", $results[5][0]);
                 $rating = str_replace("'", "-", $results[6][0]);
                 $duration = str_replace("'", "-", $results[7][0]);
                 $default_ts = date("Y-m-d h:i:s", time());
                 $sqlString = "insert into myMovies values('$url', '$summary', '$dir', '$starring', '$gen', '$rating', '$duration', '$default_ts')";
             }
         }
     }
     $conn->close();
     }
     ?>

     Keep in mind that this is my very first post pertaining to programming, so if I have not provided sufficient information, or if the help I am asking for is still unclear, please let me know. Thanks in advance.
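A plausible explanation for the lost inserts, offered as an assumption because no error output is shown: the segments are concatenated into one regular expression, so the whole expression matches only when every label appears on the page in exactly that order. If the last segment fails (the sample markup posted later in this thread uses the label "Actors:" rather than "Starring:"), `$results[0][0]` is never set and no insert string is built. The sketch below matches each field on its own instead. The helper name `extractFields` is hypothetical, and the patterns are adapted from the post with explicit lazy quantifiers in place of the /U modifier.

```php
<?php
// Hypothetical helper: match each field independently so that one failing
// pattern yields null for that field instead of voiding the whole match.
function extractFields(string $source): array
{
    $patterns = [
        'director' => '/<span class="label">Director:<\/span>.*?<span class="data" itemprop="name">(.*?)<\/span>/si',
        'genre'    => '/<span class="label">Genre\(s\): <\/span>.*?<span class="data" itemprop="genre">(.*?)<\/span>/si',
        'rating'   => '/<span class="label">Rating:<\/span>.*?<span class="data" itemprop="contentRating">(.*?)<\/span>/si',
        'runtime'  => '/<span class="label">Runtime:<\/span>.*?<span class="data">(.*?)<\/span>/si',
    ];

    $fields = [];
    foreach ($patterns as $name => $pattern) {
        // preg_match returns 1 on a match; $m[1] is the captured value.
        $fields[$name] = preg_match($pattern, $source, $m) === 1 ? trim($m[1]) : null;
    }

    return $fields;
}

// Made-up page fragment that has a director and a runtime but no genre or rating.
$sampleSource = '<span class="label">Director:</span> <span class="data" itemprop="name">Ann Example</span> '
    . '<span class="label">Runtime:</span> <span class="data">90 min</span>';

var_dump(extractFields($sampleSource));
```

Keying the result by field name also avoids counting capture groups across a concatenated pattern; in the code as posted, group 4 belongs to the genre segment, while the starring capture would be group 7. The extracted values could then be bound to a mysqli prepared statement (`$conn->prepare` with `bind_param`) rather than interpolated into the SQL string, which would make the quote replacement unnecessary.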