[SOLVED] Regex assistance needed


Acheron

Hi all,

I'd appreciate it if someone a little better at regex could help with this one. :)

I'm parsing a TVRage RSS feed and want to split the show title from the episode and the network.

I hit a little snag on titles with parens though. Here's an example of what I have thus far:

$fulltitle = 'Doctor Who (2005) (S02-Special) [sciFi]';
// network works fine
$net = preg_match('/\[(.+)\]/', $fulltitle, $network) ? $network[1] : '';

If I apply the same kind of regex to the episode as to the network, it works fine for titles like:

$fulltitle = 'Standoff (01x16) [FOX]';
// this works fine
$ep = preg_match('/\((.+)\)/', $fulltitle, $episode) ? $episode[1] : '';

Doctor Who throws it off, of course, because of the (2005) in the title. So I figured I should match the parens unless they contain a 4-digit year (assuming parens are never used for anything else in the title). I'm having some trouble getting that to work, though, so hopefully someone can help and save me from giving up and adding a third regex just to strip the year out first.
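
For the question as asked, here is a minimal sketch (the function name `episode_skipping_year` is made up for illustration). The original `\((.+)\)` fails on the Doctor Who title because `.+` is greedy: it runs to the last `)` in the string and captures `2005) (S02-Special`. Restricting the contents to `[^()]+` keeps each match inside a single pair of parens, and the negative lookahead `(?!\d{4}\))` rejects a group that is nothing but a four-digit number. It assumes, as above, that a bare four-digit number in parens is always a year.

```php
<?php
// Sketch: return the contents of the first (...) group that is NOT just a
// four-digit year, e.g. skip "(2005)" but keep "(S02-Special)".
// Assumption: a bare four-digit number in parens is always a year.
function episode_skipping_year($fulltitle)
{
    // \(            an opening paren
    // (?!\d{4}\))   ...not followed by exactly four digits and a ")"
    // ([^()]+)      the contents, which may not contain other parens
    // \)            the closing paren
    if (preg_match('/\((?!\d{4}\))([^()]+)\)/', $fulltitle, $m)) {
        return $m[1];
    }
    return ''; // no usable group found
}

echo episode_skipping_year('Doctor Who (2005) (S02-Special) [sciFi]'), "\n"; // S02-Special
echo episode_skipping_year('Standoff (01x16) [FOX]'), "\n";                   // 01x16
```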

Well, I wouldn't mind finding out the solution to the above, but I just learned I can't use it in this case anyway: they put all kinds of things in the parens, and sometimes there are no parens at all.

Even the episode parens are "unreliable": usually they look like (01x16), but they can also come out as (S02-Special). I'm not really sure how to approach this. One idea is to use different methods depending on whether there is one set of parens or two.
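
Rather than branching on how many paren sets there are, another option (a different technique from the one used later in this thread) is a single regex anchored at the end of the string: the network is an optional trailing `[...]`, the episode is an optional `(...)` just before it, and the title is everything else. This is a sketch assuming the feed always puts `(episode) [network]` last and in that order, with the leading `- ` already stripped; `split_feed_title` is a hypothetical name. Like any position-based approach, it still can't tell a year from an episode when a title has only one paren group, e.g. `Doctor Who (2005)` with no episode.

```php
<?php
// Sketch: split "Title (episode) [network]" with one end-anchored regex.
// Assumption: episode and network, when present, are the last two groups.
function split_feed_title($fulltitle)
{
    // ^(.*?)                   title: lazy, so it stops as early as possible
    // \s*(?:\(([^()]*)\))?     optional "(episode)"
    // \s*(?:\[([^\[\]]*)\])?   optional "[network]"
    // \s*$                     nothing else may follow
    $pattern = '/^(.*?)\s*(?:\(([^()]*)\))?\s*(?:\[([^\[\]]*)\])?\s*$/';
    if (!preg_match($pattern, $fulltitle, $m)) {
        // e.g. a title containing a newline, which "." does not match
        return array('title' => trim($fulltitle), 'ep' => '', 'net' => '');
    }
    // Trailing groups that did not participate are absent from $m.
    return array(
        'title' => $m[1],
        'ep'    => isset($m[2]) ? $m[2] : '',
        'net'   => isset($m[3]) ? $m[3] : '',
    );
}

$parts = split_feed_title('Doctor Who (2005) (S02-Special) [sciFi]');
echo $parts['title'], ' | ', $parts['ep'], ' | ', $parts['net'], "\n";
```

Because the episode and network groups are optional, titles with no parens or no network still match; the missing parts come back as empty strings.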

I'm not allowed to edit earlier posts, but I thought the solution might help someone in the future who wants to parse a TVRage feed. This may not be the absolute best way, but it works nicely for all the different <title> tag formats I have come across on TVR so far.

// the title we get from the feed
$rss_channel['ITEMS'][$i]['TITLE'] = '- Doctor Who (2005) (S02-Special) [sciFi]';
// strip the leading "- " that they always stick in there
$fulltitle = substr($rss_channel['ITEMS'][$i]['TITLE'], 2);
// match all parens (lazy .+? so each group is captured separately)
preg_match_all('/\((.+?)\)/', $fulltitle, $epcheck);
// if there are 2, the 2nd one is our ep, else use the 1st ('' if there are none)
$ep = isset($epcheck[1][1]) ? $epcheck[1][1] : (isset($epcheck[1][0]) ? $epcheck[1][0] : '');
$net = preg_match('/\[(.+)\]/', $fulltitle, $network) ? $network[1] : '';
// search-replace the ep and network so we are left with the title only
$search = array('(' . $ep . ')', '[' . $net . ']');
$title = trim(str_replace($search, '', $fulltitle));
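
To run every item in the feed through these steps, they can be wrapped in a function. This is a sketch: `parse_tvrage_title` and the `$items` array are hypothetical stand-ins for the `$rss_channel['ITEMS']` loop, and the check before `substr()` (so a title without the leading `- ` isn't clipped) is an addition, not part of the original post. The `isset()` checks keep it from raising undefined-offset warnings when a title has one set of parens or none.

```php
<?php
// Sketch: the steps above as a function, applied to each feed title.
function parse_tvrage_title($rawtitle)
{
    // Drop the leading "- " only if it is actually there.
    $fulltitle = (strpos($rawtitle, '- ') === 0) ? substr($rawtitle, 2) : $rawtitle;

    // All (...) groups; if there are two, the second is the episode.
    preg_match_all('/\((.+?)\)/', $fulltitle, $epcheck);
    if (isset($epcheck[1][1])) {
        $ep = $epcheck[1][1];
    } elseif (isset($epcheck[1][0])) {
        $ep = $epcheck[1][0];
    } else {
        $ep = '';
    }

    $net = preg_match('/\[(.+)\]/', $fulltitle, $network) ? $network[1] : '';

    // Remove only the groups that were found, leaving the bare title.
    $search = array();
    if ($ep !== '')  { $search[] = '(' . $ep . ')'; }
    if ($net !== '') { $search[] = '[' . $net . ']'; }
    $title = trim(str_replace($search, '', $fulltitle));

    return array('title' => $title, 'ep' => $ep, 'net' => $net);
}

// Hypothetical sample titles standing in for the feed items.
$items = array(
    '- Doctor Who (2005) (S02-Special) [sciFi]',
    '- Standoff (01x16) [FOX]',
);
foreach ($items as $raw) {
    $p = parse_tvrage_title($raw);
    echo "Title: {$p['title']} | Episode: {$p['ep']} | Network: {$p['net']}\n";
}
```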

So it started like this:

- Doctor Who (2005) (S02-Special) [sciFi]

... and now we have this:

Title: Doctor Who (2005)

Episode: S02-Special

Network: sciFi
