After adding a new RegEx pattern to my code, MySQL INSERTS stopped working



Hello all,

 

I currently have a PHP file that reads URLs from another table and uses them to go back to the original site and pull in additional product information. To do this, I am developing regular expressions that extract specific fields based on patterns found in the site's source code.

 

I currently have six patterns working, but when I add one particular pattern to the mix, it breaks my inserts: the code still runs, but no records are inserted into my MySQL table. The newly added pattern works by itself, and the existing code works by itself (as in, the data is inserted into the table), but when I combine them the inserts stop working.

 

I have highlighted the "starringPattern"; this is the regex line that is problematic.

 

Can anyone tell me what I am doing wrong?  Why do I lose my inserts when this code is added? 

Note: when I comment out the items pertaining to starring / sPattern, the inserts work. Also, if I put the starring code in a file identical to the code below, but run it by itself, it too inserts values into the $url, $starring, and $default_ts fields of the table.

    while ($row = $rows->fetch_row()) {
        $url = $row[0];
        //echo $url . "\n\n";
        $sPattern = "";
        $dPattern = "";
        $gPattern = "";
        $rPattern = "";
        $rTPattern = "";
        $starringPattern = "";
        if ($source = file_get_contents($url)) {
            $source = preg_replace('/\s\s+/', ' ', $source);
            $source = preg_replace('/\n\r|\r\n/', '', $source);
            $sPattern = '<span class="label">Summary:<\/span>.* <span class="data" itemprop="description">.*';
            $sPattern .= '(<span class="blurb blurb_collapsed">)?(.*)<\/span>.*';
            $dPattern = '<span class="label">Director:<\/span>.*<span class="data" itemprop="name">(.*)<\/span>.*';
            $gPattern = '<span class="label">Genre\(s\): <\/span>.*<span class="data" itemprop="genre">(.*)<\/span>.*'; //the backslashes before the parentheses escape them so that () is not treated as a capture group
            $rPattern = '<span class="label">Rating:<\/span>.*<span class="data" itemprop="contentRating">(.*)<\/span>.*';
            $rTPattern = '<span class="label">Runtime:<\/span>.*<span class="data">(.*)<\/span>';
            $starringPattern  = '<span class="label">Starring:<\/span>.*<span class="data">.*<span itemprop="name">(.*)<\/span><\/a><\/span>.*';                        
           $pattern = '/' . $sPattern . $dPattern . $gPattern . $rPattern . $rTPattern . $starringPattern . '/siU';
            echo $pattern ."\n";
            date_default_timezone_set("America/Chicago");
            preg_match_all($pattern, $source, $results);
            if (isset($results[0][0])) {
                $summary = str_replace("'", "-", $results[2][0]);
                $summary = preg_replace('/(<.*span.*>|<.*\/span.*>|<.*a .*rel\s*=.*> |<.*\/a.*>)/U', ' ', $summary);
                $summary = preg_replace('/\s\s+/', ' ', $summary);
                $dir = str_replace("'", "-", $results[3][0]);
                $starring = str_replace("'", "-", $results[4][0]);
                $gen = str_replace("'", "-", $results[5][0]);
                $rating = str_replace("'", "-", $results[6][0]);
                $duration = str_replace("'", "-", $results[7][0]);
                $default_ts = date("Y-m-d h:i:s", time());       
                $sqlString = "insert into myMovies values('$url', '$summary', '$dir', '$starring', '$gen', '$rating', '$duration', '$default_ts')";
                
              }
            }
          }
       
     $conn->close();
    }
   
?>

 Keep in mind that this is my very first post about programming, so if I have not provided sufficient information, or if it is still unclear what help I am asking for, please let me know.

 

Thanks in advance
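
One way to narrow down a problem like this: the combined pattern only matches if every piece matches, in order, so when it fails `isset($results[0][0])` is false and the insert block is skipped without any error. The sketch below tests each piece on its own and reports the ones that do not match. `findFailingParts()` is a hypothetical helper, and the sample markup and the shortened starring pattern are made up for illustration.

```php
<?php
// Hypothetical helper: run every sub-pattern on its own and collect the ones that fail.
function findFailingParts(array $parts, string $source): array
{
    $failing = [];
    foreach ($parts as $name => $part) {
        $result = preg_match('/' . $part . '/siU', $source);
        if ($result === false) {
            // The regex engine itself reported an error (e.g. backtrack limit).
            $failing[$name] = 'regex error ' . preg_last_error();
        } elseif ($result === 0) {
            // Valid regex, but nothing in the source matches it.
            $failing[$name] = 'no match';
        }
    }
    return $failing;
}

// Made-up sample source: it contains a Runtime label but no Starring label.
$source = '<span class="label">Runtime:</span> <span class="data">90 min</span>';

$parts = [
    'runtime'  => '<span class="label">Runtime:<\/span>.*<span class="data">(.*)<\/span>',
    'starring' => '<span class="label">Starring:<\/span>.*<span itemprop="name">(.*)<\/span>',
];

$failing = findFailingParts($parts, $source);
print_r($failing);
```

If every piece matches on its own but the combined pattern still fails, the glue between the pieces (or their order in the page) is the next thing to check.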


You're missing a .* between the $rTPattern and your new $starringPattern. That's basically a "delimiter" between the different parts; your $starringPattern is at the end and doesn't need it, but $rTPattern is no longer the last one so it does need it.

It got wrapped around (copy and paste) and is part of the next line.

Uh, no?

$rTPattern = '<span class="label">Runtime:<\/span>.*<span class="data">(.*)<\/span>';
$starringPattern  = '<span class="label">Starring:<\/span>.*<span class="data">.*<span itemprop="name">(.*)<\/span><\/a><\/span>.*';                        
$pattern = '/' . $sPattern . $dPattern . $gPattern . $rPattern . $rTPattern . $starringPattern . '/siU';
There is no .* between the $rTPattern and the $starringPattern. You put one at the end of $starringPattern but not at the end of $rTPattern. There very likely needs to be one.
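
A minimal sketch of one way to avoid this kind of gap: keep each part free of a trailing `.*` and let `implode()` put the separator between every pair of parts. The two parts and the sample markup here are shortened, made-up stand-ins, not the real page.

```php
<?php
// Each part ends at its closing tag, with no trailing ".*".
$parts = [
    '<span class="label">Runtime:<\/span>.*<span class="data">(.*)<\/span>',
    '<span class="label">Starring:<\/span>.*<span itemprop="name">(.*)<\/span>',
];

// implode() inserts ".*" between neighbouring parts, so appending a new part cannot leave a gap.
$pattern = '/' . implode('.*', $parts) . '/siU';

// For comparison: the same parts glued together with nothing in between.
$brokenPattern = '/' . implode('', $parts) . '/siU';

// Made-up sample markup with whitespace between the two sections.
$source = '<span class="label">Runtime:</span> <span class="data">90 min</span> '
        . '<span class="label">Starring:</span> <span class="data"><span itemprop="name">Apple Wheat</span></span>';

$found       = preg_match($pattern, $source, $m);
$brokenFound = preg_match($brokenPattern, $source);

var_dump($found, $brokenFound); // the joined pattern matches, the glued one does not
```

With the separator in place, `$m[1]` holds the runtime and `$m[2]` the first name; without it, the whitespace between the two sections is enough to make the whole pattern fail.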

Oh yes, good catch, you are right and I have corrected this, thanks for pointing this out to me. 

This change did not fix my issue but I have made some other changes that have resulted in progress.

I added a separate preg_match_all for the starringPattern so that it runs independently of the other patterns. The program now runs, but I only get the first result from the first pass of the pattern (if that makes sense).

 

Here are the added lines:

 

$actorsPattern = '/' . $starringPattern . '/siU';

preg_match_all($actorsPattern, $source, $actors);

// $actors[1] holds every value captured by the first group
if (isset($actors[1])) {
    $actorsString = implode(',', $actors[1]);
    // strip leftover span/anchor tags from the joined string
    $actorsString = preg_replace('/(<.*span.*>|<.*\/span.*>|<.*a .*rel\s*=.*> |<.*\/a.*>)/U', ' ', $actorsString);
}

 

The code appears to be skipping over the repeating patterns, only giving me output from the first one. For example, I need all of the following names, but it only gives me the first one (Apple Wheat) and then skips on to the next pattern. This is true for each product type.

 

<span class="label">Actors:</span>
                <span class="data">
                     <span itemprop="star" itemscope itemtype="http://schema.org/Person">
                        <a href="/person/apple-wheat" itemprop="url"><span itemprop="name">Apple Wheat</span></a></span>,
                         <span itemprop="star" itemscope itemtype="http://schema.org/Person">
                         <a href="/person/banana-cumcumer" itemprop="url"><span itemprop="name">Banana Cumcumber</span></a></span>,
                         <span itemprop="star" itemscope itemtype="http://schema.org/Person">
                         <a href="/person/boat-ski" itemprop="url"><span itemprop="name">Boat Ski</span></a></span>,                                            
                         <span itemprop="star" itemscope itemtype="http://schema.org/Person">
                         <a href="/person/rain-snow" itemprop="url"><span itemprop="name">Rain Snow</span></a></span>,                                            
                         <span itemprop="star" itemscope itemtype="http://schema.org/Person">
                         <a href="/person/great-house" itemprop="url"><span itemprop="name">Great Horse</span></a></span>,                                            
                         <span itemprop="star" itemscope itemtype="http://schema.org/Person">
                         <a href="/person/mickey-mouse" itemprop="url"><span itemprop="name">Mickey Moose</span></a></span>,                                            
                         <span itemprop="star" itemscope itemtype="http://schema.org/Person">
                         <a href="/person/dallas-k-colorado" itemprop="url"><span itemprop="name">Dallas Colorado</span></a></span>,                                            
                         <span itemprop="star" itemscope itemtype="http://schema.org/Person">
                         <a href="/person/Pala-Alto" itemprop="url"><span itemprop="name">Pala Alto</span></a></span>,                                            
                         <span itemprop="star" itemscope itemtype="http://schema.org/Person">
                         <a href="/person/orange-sleet" itemprop="url"><span itemprop="name">Orange Sleet</span></a></span>,                                            
                         <span itemprop="star" itemscope itemtype="http://schema.org/Person">
                         <a href="/person/please-help" itemprop="url"><span itemprop="name">Please Help</span></a></span>                                    
                         </span>

 

 

 

Perhaps I want the new pattern to be "greedy" instead of using '/siU' in this case, or perhaps I need to iterate over the newly added array to make sure I get all the results for this pattern before going on to the next one?

 

Current status: I am receiving only the first expected value for the following pattern...

 

$starringPattern = '<span class="label">Actors:<\/span>.*<span class="data">.*<span itemprop="name">(.*)<\/span><\/a><\/span>.*';

 

My issue is that when the source repeats the same markup, my code does not keep matching; instead it goes on to the next pattern.

The good thing is that I am getting some output now; it's just not complete.
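
A likely reason only one name comes back: the pattern starts with the "Actors:" label, which appears once per page, so `preg_match_all` can find at most one match, and its single capture group holds one name. One possible approach, sketched below, is to work in two steps: first isolate the block that follows the label, then match the repeating `<span itemprop="name">` unit inside that block. The markup is a shortened, made-up version of the sample above, and stopping the block at the next `<span class="label">` is an assumption about the page layout.

```php
<?php
// Made-up markup shaped like the sample above, shortened to three names.
$source = '<span class="label">Actors:</span> <span class="data"> '
        . '<span itemprop="star"><a href="/person/apple-wheat" itemprop="url"><span itemprop="name">Apple Wheat</span></a></span>, '
        . '<span itemprop="star"><a href="/person/boat-ski" itemprop="url"><span itemprop="name">Boat Ski</span></a></span>, '
        . '<span itemprop="star"><a href="/person/rain-snow" itemprop="url"><span itemprop="name">Rain Snow</span></a></span> '
        . '</span> <span class="label">Runtime:</span> <span class="data">90 min</span>';

$actorsString = '';

// Step 1: grab everything after the Actors label, up to the next label or the end of the source.
if (preg_match('/<span class="label">Actors:<\/span>(.*?)(?=<span class="label">|$)/si', $source, $block)) {
    // Step 2: inside that block only, collect every repeated name span.
    preg_match_all('/<span itemprop="name">(.*?)<\/span>/si', $block[1], $names);
    $actorsString = implode(', ', array_map('trim', $names[1]));
}

echo $actorsString . "\n";
```

Restricting the second match to the isolated block keeps other `itemprop="name"` spans on the page, such as a director's name, out of the list.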

  • Solution

It was determined that the regex pattern was not adequate to pull in the data.

Once this was corrected, I was able to see the completed string listing all stars for the first record.

I now just need to figure out how to iterate over the array so that I can pull in all of the data for every item.
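
As a rough sketch of that iteration, assuming the repeated markup looks like the actor links posted earlier: passing `PREG_SET_ORDER` to `preg_match_all` returns one array per match, which is convenient to loop over when each match has several capture groups. The input string and the `$stars` structure are made up for illustration.

```php
<?php
// Made-up fragment with two actor links, shaped like the markup posted earlier.
$block = '<a href="/person/apple-wheat" itemprop="url"><span itemprop="name">Apple Wheat</span></a></span>, '
       . '<a href="/person/boat-ski" itemprop="url"><span itemprop="name">Boat Ski</span></a></span>';

// PREG_SET_ORDER groups the results per match: $match[1] is the link, $match[2] is the name.
preg_match_all(
    '/<a href="([^"]*)"[^>]*><span itemprop="name">([^<]*)<\/span>/i',
    $block,
    $matches,
    PREG_SET_ORDER
);

$stars = [];
foreach ($matches as $match) {
    $stars[] = ['url' => $match[1], 'name' => trim($match[2])];
}

print_r($stars);
```

The same loop shape works for any repeated item: each pass through `foreach` sees all captured fields for one occurrence.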
