Acheron Posted July 7, 2007

Hi all,

Would appreciate it if someone a little better at regex could assist with this one. I'm parsing a TVRage RSS feed and I want to split the title off from the ep and network. I hit a little snag on titles with parens, though. Here's an example of what I have thus far:

    $fulltitle = 'Doctor Who (2005) (S02-Special) [sciFi]';
    // network works fine
    $net = preg_match('/\[(.+)\]/', $fulltitle, $network) ? $network[1] : '';

If I apply the same type of regex to the ep as to the network, it works fine for titles like:

    $fulltitle = 'Standoff (01x16) [FOX]';
    // this works fine
    $ep = preg_match('/\((.+)\)/', $fulltitle, $episode) ? $episode[1] : '';

Doctor Who throws it off, of course, b/c of the (2005) in the title: the greedy (.+) runs from the first opening paren to the last closing paren. So I figured I should match the parens unless they contain a 4-digit year (going on the assumption that parens are never used for anything else in the title). I'm having some problems with that, though, so hopefully someone can assist and save me from adding a 3rd regex, out of frustration, just to strip the year first.
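One way to express "match the parens unless they contain a 4-digit year" is a negative lookahead. The snippet below is only a minimal sketch under the assumption stated in the post (parens hold either a bare year or the episode); the function name extract_episode is made up for illustration and does not come from the thread.

```php
<?php
// Hypothetical helper (not from the thread): returns the contents of the
// first parenthesised group that is NOT a bare 4-digit year, or '' if none.
function extract_episode($fulltitle)
{
    // \(            literal opening paren
    // (?!\d{4}\))   negative lookahead: reject "(" followed by 4 digits and ")"
    // ([^)]+)       capture everything up to the next closing paren
    if (preg_match('/\((?!\d{4}\))([^)]+)\)/', $fulltitle, $m)) {
        return $m[1];
    }
    return '';
}

echo extract_episode('Doctor Who (2005) (S02-Special) [sciFi]'), "\n"; // S02-Special
echo extract_episode('Standoff (01x16) [FOX]'), "\n";                  // 01x16
```

Using [^)]+ instead of .+ keeps the capture from running on to a later closing paren. The sketch still relies on the year being exactly four digits on its own inside the parens.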
Acheron Posted July 8, 2007

Well, I wouldn't mind finding out the solution to the above, but I just learned I can't use it in this case anyway: they put all kinds of things in the parens, and sometimes there are no parens at all. Even the ep parens are "unreliable". Usually they look like (01x16), but they can also come out as (S02-Special).

I'm not really sure how to approach this. One thought I have is to use different methods depending on whether there are 1 or 2 sets of parens.
Acheron Posted July 8, 2007

Not allowed to edit other posts, but I thought the solution might help someone in the future who wants to parse a TVRage feed. This may not be the absolute best way, but it works nicely for all the different <title> tag formats I have come across on TVR so far.

    // the title we get from the feed
    $rss_channel['ITEMS'][$i]['TITLE'] = '- Doctor Who (2005) (S02-Special) [sciFi]';
    // strip the useless leading "- " that they always stick in there
    $fulltitle = substr($rss_channel['ITEMS'][$i]['TITLE'], 2);
    // match all parens (non-greedy, so each set is captured separately)
    preg_match_all('/\((.+?)\)/', $fulltitle, $epcheck);
    // if there are 2, the 2nd one is our ep, else use the 1st;
    // isset() avoids an undefined-offset notice when a set is missing
    $ep = isset($epcheck[1][1]) ? $epcheck[1][1]
        : (isset($epcheck[1][0]) ? $epcheck[1][0] : '');
    $net = preg_match('/\[(.+)\]/', $fulltitle, $network) ? $network[1] : '';
    // search-replace the ep and network so we are left with the title only
    $search = array('(' . $ep . ')', '[' . $net . ']');
    $title = trim(str_replace($search, '', $fulltitle));

So it started like this:

    - Doctor Who (2005) (S02-Special) [sciFi]

... and now we have this:

    Title: Doctor Who (2005)
    Episode: S02-Special
    Network: sciFi
Wildbug Posted July 8, 2007

Would this work?

    <?php
    preg_match('/^(.*)\s*\((.*?)\)\s*\[(.*?)\]$/', $fulltitle, $matched);
    print_r($matched);
    ?>
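For anyone wanting to try the single-pattern suggestion against the sample titles from this thread, here is a rough sketch. The wrapper function split_title and its array keys are hypothetical names added for illustration; only the pattern itself comes from the post above. Because the first group is greedy, it captures the title with a trailing space, hence the trim().

```php
<?php
// Hypothetical wrapper (names are not from the thread) around the
// single-pattern approach: title, then (episode), then [network].
function split_title($fulltitle)
{
    $pattern = '/^(.*)\s*\((.*?)\)\s*\[(.*?)\]$/';
    if (!preg_match($pattern, $fulltitle, $m)) {
        // no "(...) [...]" tail at the end of the string
        return null;
    }
    return array(
        'title'   => trim($m[1]), // greedy (.*) leaves a trailing space
        'episode' => $m[2],       // last set of parens before the brackets
        'network' => $m[3],
    );
}

print_r(split_title('Doctor Who (2005) (S02-Special) [sciFi]'));
print_r(split_title('Standoff (01x16) [FOX]'));
```

Note that the pattern requires both a set of parens and a bracketed network at the end of the string, so titles with no parens at all return null here and would need a separate fallback.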