Showing results for tags 'php;regex;mysql;site scraping'.


  1. Hello all,

     I currently have a PHP file that pulls URL data from another table as its source. It uses those URLs to go back to the original site and pull in additional product information. To do this, I am developing regular expressions that gather site information based on patterns found in the source code of the site.

     I currently have 6 pattern sequences working, but when I add one particular pattern to the mix, it breaks my inserts: the code still runs, but no records are inserted into my MySQL table. The newly added line works by itself, and the existing code works by itself (as in, the data is inserted into the table), but when I bring the two together the inserts stop working.

     The problematic line of regex is the one that sets "starringPattern". Can anyone tell me what I am doing wrong? Why do I lose my inserts when this code is added?

     Note: when I comment out the items pertaining to starring / starringPattern, the inserts work again. Also, if I load the starring code by itself in a file otherwise identical to the code below, it too inserts values into the $url, $starring, and $default_ts fields of the table.

         while ($row = $rows->fetch_row()) {
             $url = $row[0];
             //echo $url . "\n\n";
             $sPattern = "";
             $dPattern = "";
             $gPattern = "";
             $rPattern = "";
             $rTPattern = "";
             $starringPattern = "";

             if ($source = file_get_contents($url)) {
                 // collapse runs of whitespace, then strip any remaining line breaks
                 $source = preg_replace('/\s\s+/', ' ', $source);
                 $source = preg_replace('/\n\r|\r\n/', '', $source);

                 $sPattern = '<span class="label">Summary:<\/span>.* <span class="data" itemprop="description">.*';
                 $sPattern .= '(<span class="blurb blurb_collapsed">)?(.*)<\/span>.*';
                 $dPattern = '<span class="label">Director:<\/span>.*<span class="data" itemprop="name">(.*)<\/span>.*';
                 // the backslashes before the parentheses escape them, so (s) is not treated as another capture group
                 $gPattern = '<span class="label">Genre\(s\): <\/span>.*<span class="data" itemprop="genre">(.*)<\/span>.*';
                 $rPattern = '<span class="label">Rating:<\/span>.*<span class="data" itemprop="contentRating">(.*)<\/span>.*';
                 $rTPattern = '<span class="label">Runtime:<\/span>.*<span class="data">(.*)<\/span>';
                 $starringPattern = '<span class="label">Starring:<\/span>.*<span class="data">.*<span itemprop="name">(.*)<\/span><\/a><\/span>.*';

                 $pattern = '/' . $sPattern . $dPattern . $gPattern . $rPattern . $rTPattern . $starringPattern . '/siU';
                 echo $pattern . "\n";

                 date_default_timezone_set("America/Chicago");
                 preg_match_all($pattern, $source, $results);

                 if (isset($results[0][0])) {
                     $summary = str_replace("'", "-", $results[2][0]);
                     $summary = preg_replace('/(<.*span.*>|<.*\/span.*>|<.*a .*rel\s*=.*> |<.*\/a.*>)/U', ' ', $summary);
                     $summary = preg_replace('/\s\s+/', ' ', $summary);
                     $dir = str_replace("'", "-", $results[3][0]);
                     $starring = str_replace("'", "-", $results[4][0]);
                     $gen = str_replace("'", "-", $results[5][0]);
                     $rating = str_replace("'", "-", $results[6][0]);
                     $duration = str_replace("'", "-", $results[7][0]);
                     $default_ts = date("Y-m-d h:i:s", time());
                     $sqlString = "insert into myMovies values('$url', '$summary', '$dir', '$starring', '$gen', '$rating', '$duration', '$default_ts')";
                 }
             }
         }
         $conn->close();
         }
         ?>

     Keep in mind that this is my very first post pertaining to programming, so if I have not provided sufficient information, or if the help I am asking for is still unclear, please let me know.

     Thanks in advance
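
One possible explanation, offered as a sketch rather than a confirmed diagnosis: because the six patterns are concatenated into a single expression, the labels must appear in the page source in exactly that order (Summary, Director, Genre, Rating, Runtime, Starring). If the page lists Starring before any of the others, which is an assumption since the page itself is not shown, the combined pattern finds no match, `isset($results[0][0])` is false, and the insert is skipped. Separately, Starring is appended last, so its capture is group 7, while the posted code reads `$results[4][0]` for it, which shifts genre, rating, and runtime by one position. The example below matches each field with its own `preg_match` call so that order no longer matters. The helper name `extractFields`, the simplified Starring pattern, and the sample HTML are all made up for illustration.

```php
<?php
// Hypothetical helper (not from the original post): apply each pattern on its
// own, so the order of the labels in the page source does not matter.
function extractFields(string $source): array
{
    // Patterns adapted from the post; the Starring pattern is simplified.
    $patterns = [
        'director' => '/<span class="label">Director:<\/span>.*<span class="data" itemprop="name">(.*)<\/span>/siU',
        'genre'    => '/<span class="label">Genre\(s\): <\/span>.*<span class="data" itemprop="genre">(.*)<\/span>/siU',
        'rating'   => '/<span class="label">Rating:<\/span>.*<span class="data" itemprop="contentRating">(.*)<\/span>/siU',
        'runtime'  => '/<span class="label">Runtime:<\/span>.*<span class="data">(.*)<\/span>/siU',
        'starring' => '/<span class="label">Starring:<\/span>.*<span itemprop="name">(.*)<\/span>/siU',
    ];

    $fields = [];
    foreach ($patterns as $name => $pattern) {
        // A field that is missing from the page becomes null instead of
        // causing every other field to be dropped.
        $fields[$name] = preg_match($pattern, $source, $m) === 1 ? trim($m[1]) : null;
    }
    return $fields;
}

// Made-up sample in which Starring appears BEFORE Director.
$sample = '<span class="label">Starring:</span> <span class="data"><a href="#">'
        . '<span itemprop="name">Jane Doe</span></a></span> '
        . '<span class="label">Director:</span> '
        . '<span class="data" itemprop="name">John Roe</span>';

// A combined pattern in the style of the post: Director first, then Starring.
$combined = '/<span class="label">Director:<\/span>.*<span class="data" itemprop="name">(.*)<\/span>.*'
          . '<span class="label">Starring:<\/span>.*<span itemprop="name">(.*)<\/span>/siU';

$combinedHits = preg_match($combined, $sample); // number of matches (0 or 1)
$fields = extractFields($sample);

var_dump($combinedHits);
var_dump($fields);
```

On this sample the combined pattern finds nothing because Starring precedes Director, while the per-field version still returns both names. If this is what is happening on the real pages, echoing `count($results[0])` right after `preg_match_all` should show 0 whenever the inserts are skipped.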