Hi All

 

I'm after some advice about how to write a parser that matches the many different naming formats of a TV episode. The one thing we can rely on is that episodes are inside a folder named after the show, for example "House MD\episode name. s1e07.mkv", so we know the show name. That part is easy; the bit I need to match is the season and episode number.

 

The format of the file name can differ in so many ways. Here are some examples:

 

- S01E01

- s1e10

- S3e6

- 105 (for season 1, episode 5)

- EP01 (usually there's one season in this case, but not 100% of the time)

 

So as you can see there are a few variations. The show name is almost always in there too, and sometimes the name contains the resolution, e.g. House.MD S1E09 720p HD.mkv. There are many combinations, but since we already know the show name it's not something I think we need to worry *too* much about.

 

My question relates to how you would approach this. You can see from the examples above that some of these would be hard to match and work out.

 

My initial idea would be to have a class called 'tvmatcher', or something, which has match handlers, one for each format we need to match. A handler would be a class that extends tvmatcher and has the same method, like $handler->match($string); the first one to match would be the winner.

 

This could be extended to sanity check the result and ensure that the season/ep actually exists.
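
The handler idea above could be sketched roughly as follows. This is only one possible shape, not a definitive design: every name here (EpisodeMatchHandler, SxxEyyHandler, EpOnlyHandler, TvMatcher) is hypothetical, and it uses a shared interface rather than having each handler extend the matcher class, which is a swap from the description above.

```php
<?php
// Hypothetical names throughout: EpisodeMatchHandler, SxxEyyHandler,
// EpOnlyHandler and TvMatcher are illustrations, not an existing library.

interface EpisodeMatchHandler
{
    /** Returns array('season' => int|null, 'episode' => int) or null. */
    public function match($string);
}

class SxxEyyHandler implements EpisodeMatchHandler
{
    public function match($string)
    {
        // Covers S01E01, s1e10 and S3e6.
        if (preg_match('/s(\d{1,2})e(\d{1,3})/i', $string, $m)) {
            return array('season' => (int) $m[1], 'episode' => (int) $m[2]);
        }
        return null;
    }
}

class EpOnlyHandler implements EpisodeMatchHandler
{
    public function match($string)
    {
        // Covers EP01; the season is unknown, so it is left as null.
        if (preg_match('/\bep(\d{1,3})\b/i', $string, $m)) {
            return array('season' => null, 'episode' => (int) $m[1]);
        }
        return null;
    }
}

class TvMatcher
{
    private $handlers = array();

    public function addHandler(EpisodeMatchHandler $handler)
    {
        $this->handlers[] = $handler;
        return $this;
    }

    public function match($string)
    {
        // Handlers run in registration order; the first non-null result wins.
        foreach ($this->handlers as $handler) {
            $result = $handler->match($string);
            if ($result !== null) {
                return $result;
            }
        }
        return null;
    }
}
```

Registration order matters here, so the most specific handlers would go first. A sanity check against show data could then be applied to whatever match() returns.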

 

I really don't know how to go about this, the idea above is my best so far, so again my questions really are:

 

How would you approach this?

Is there a pattern that would help?

Any other ideas about how this could be achieved?

 

It's worth noting that I'd be using external APIs to get show information. That would be outside the scope of this library, but it could help with said sanity checks.

 

Cheers,

Billy

 

 

Well the first three are easy enough:

$file = 'House MD\episode name. s1e07.mkv';


if (preg_match('/s([0-9]+)e([0-9]+)/i', $file, $matches)) {
array_shift($matches);

print_r($matches);

list($season, $episode) = $matches;
}

 

The last two might be tricky because they're too ambiguous. For example, 105 could either be season 1 episode 05, or season 10 episode 5. How do you distinguish that? Also it is impossible to derive the season number from "EP01".
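
To make the ambiguity concrete, here is a rough sketch that lists every plausible split of a bare 3- or 4-digit token and then narrows the list with known episode counts. Both function names and the $episodesPerSeason array are hypothetical; the counts are assumed to come from somewhere else, such as an external API.

```php
<?php
// Hypothetical helper: lists every way a bare 3- or 4-digit token could be
// split into season and episode, assuming at most two digits for each part.
function seasonEpisodeCandidates($digits)
{
    $candidates = array();
    $length = strlen($digits);
    if (!ctype_digit($digits) || $length < 3 || $length > 4) {
        return $candidates;
    }
    // Try every split point that leaves 1-2 digits for each part.
    for ($split = 1; $split < $length; $split++) {
        $season = substr($digits, 0, $split);
        $episode = substr($digits, $split);
        if (strlen($season) > 2 || strlen($episode) > 2) {
            continue;
        }
        if ((int) $season === 0 || (int) $episode === 0) {
            continue;
        }
        $candidates[] = array('season' => (int) $season, 'episode' => (int) $episode);
    }
    return $candidates;
}

// Keeps only the candidates that exist in a known episode guide.
// $episodesPerSeason maps season number => episode count (assumed input).
function filterByGuide(array $candidates, array $episodesPerSeason)
{
    $valid = array();
    foreach ($candidates as $c) {
        if (isset($episodesPerSeason[$c['season']])
            && $c['episode'] <= $episodesPerSeason[$c['season']]) {
            $valid[] = $c;
        }
    }
    return $valid;
}
```

For "105" this yields both season 1 episode 5 and season 10 episode 5. If the show only has two seasons, the guide filter leaves one answer; if it has ten or more, the token stays ambiguous and something else has to decide. Tokens such as the 720 in "720p" would also need to be excluded before this step.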

Hi Scootstah,

 

Thanks for your time :)

 

I agree that the regex for some of them is quite easy, but the list provided isn't all of the actual formats you see. I think my question is more about how you deal with matching or parsing such a wide variety of formats in a nice, maintainable & reliable way.

 

I could just put lots of regex patterns in an array, loop through them, run them against a file name to see if they match, and then have lots of if statements or a switch statement. But I think this will become quite complicated as you start matching 20+ formats, especially if you need more logic surrounding them. Take EP01 as an example: it could be that there's only one season, so you could call out to an API and check how many seasons the show has; if it's only 1, then assume it's season 1.
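
One way to keep the pattern array from turning into a pile of if statements is to pair each pattern with its own small resolver, so the extra logic lives next to the regex it belongs to. A rough sketch, where resolveEpisode() and the rule list are hypothetical, and $seasonCount stands in for a value looked up elsewhere (for example from an API), with null meaning "unknown":

```php
<?php
// A sketch only: the rule list, resolveEpisode() and $seasonCount are
// hypothetical. $seasonCount is assumed to be fetched by the caller.
function resolveEpisode($name, $seasonCount = null)
{
    $rules = array(
        // S01E01, s1e10, S3e6
        array('/s(\d{1,2})e(\d{1,3})/i', function ($m) {
            return array('season' => (int) $m[1], 'episode' => (int) $m[2]);
        }),
        // EP01: only resolvable when the show has exactly one season.
        array('/\bep(\d{1,3})\b/i', function ($m) use ($seasonCount) {
            if ($seasonCount !== 1) {
                return null;
            }
            return array('season' => 1, 'episode' => (int) $m[1]);
        }),
    );

    foreach ($rules as $rule) {
        list($pattern, $resolve) = $rule;
        if (preg_match($pattern, $name, $m)) {
            $result = $resolve($m);
            if ($result !== null) {
                return $result;
            }
        }
    }
    return null;
}
```

Adding a format then means adding one entry to $rules rather than another branch in a switch.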

 

I might be wrong, and as always willing to be proved wrong.

 

Thanks!

 

Billy

Without seeing all of the formats I can't really give a solid answer.

 

But I would probably just have a few if/elseif statements to check for the different formats. Clever use of regex should cut it down a bit. For example, one pattern matches the first three formats you listed.

You should add more responsibilities to your object; currently it does nearly nothing:

 

$tv = new tvfilename\tvfilename('/path/to/dir', array('*.txt', '*.db', '*.url'));
foreach ($tv as $episode) {
    echo 'SEASON: ', $episode->season, "\r\n",
         'EPISODE: ', $episode->episode, "\r\n",
         'SERIE: ', $episode->serie;
}

 

You could abstract what it traverses and get something like:

 

$tv = new tvfilename\tvfilename(new tvfilename\RecursiveDirectoryFilterIterator('/path/to/dir', array('*.txt', '*.db', '*.url')));

 

Or in case of a DB:

 

$tv = new tvfilename\tvfilename(new tvfilename\DbRecordIterator($dbAdapter));

Hi Ignace

 

I'm sort of going with a single responsibility theme, so this library is only responsible for matching the episode/season. By adding more responsibility, would I be going away from this?

 

Maybe adding adapters for different types of data isn't going away from that theory?

 

I want it to be loosely coupled, so it can be used in other projects without any modification. It's also worth noting that this library can match strings whether they come from a filename or some other source.

 

Given that, would you still suggest passing directory/string adapter objects through?

 

I do sort of like the idea of it :)

 

Cheers

Billy

Well, we can reasonably assume it will only be used to traverse directories, so I see no need to divvy up the responsibilities. In what other ways are you planning on using it? Why have you started this project in the first place?

You can always at a later point refactor your code and create re-usable components out of it. For example:

 

class tvfilename extends RecursiveIteratorIterator {

 

When a string is provided we assume it's a directory path:

 

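public function __construct($input, $filter = null) {
  if (is_string($input)) {
    // RecursiveDirectoryIterator's second argument is integer flags, not patterns,
    // so $filter is applied separately, here as exclusion globs (an assumption).
    $dir = new RecursiveDirectoryIterator($input, FilesystemIterator::SKIP_DOTS);
    $input = new RecursiveCallbackFilterIterator($dir, function ($file) use ($filter) {
      foreach ((array) $filter as $pattern) {
        if (fnmatch($pattern, $file->getFilename())) return false;
      }
      return true;
    });
  }
  parent::__construct($input);
}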

 

When traversing you would pass the filename/whatever to a parser:

 

public function current() {
    return $this->_parser->parse(parent::current());
}

 

So at this point you should get a stdClass from current() with properties. If you want it to be of a certain class, then you should also supply a factory to your parser:

 

class EpisodeParser {
  private $_factory;
  
  public function __construct(Factory $factory = null) {
    $this->_factory = $factory;
  }
  
  public function parse($string) {
    $parsed = new stdClass();
    // parsing: fill in $parsed->season, $parsed->episode, etc. from $string
    if (!is_null($this->_factory)) {
      $parsed = $this->_factory->make($parsed);
    }
    return $parsed;
  }
}
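
The Factory type hinted above is never defined in this thread, so here is one guess at what it might look like. Factory, Episode and EpisodeFactory are all assumptions made for illustration: a minimal factory that copies the parsed properties from the plain object onto a dedicated class.

```php
<?php
// Hypothetical types: the thread never defines Factory, so this interface,
// Episode and EpisodeFactory are assumptions made for illustration.
interface Factory
{
    public function make($parsed);
}

class Episode
{
    public $serie;
    public $season;
    public $episode;
}

class EpisodeFactory implements Factory
{
    public function make($parsed)
    {
        // Copy the known properties from the plain object onto an Episode.
        $episode = new Episode();
        foreach (array('serie', 'season', 'episode') as $property) {
            if (isset($parsed->$property)) {
                $episode->$property = $parsed->$property;
            }
        }
        return $episode;
    }
}
```

With this in place the parser could be built as new EpisodeParser(new EpisodeFactory()), and current() would hand back Episode objects instead of stdClass.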

 

All the responsibilities divvied up :) You would use it like this:

 

$tv = new tvfilename\tvfilename('/path/to/dir', array('*.txt', '*.db', '*.url'));
foreach ($tv as $episode) {
    echo 'SEASON: ', $episode->season, "\r\n",
         'EPISODE: ', $episode->episode, "\r\n",
         'SERIE: ', $episode->serie;
}

 

A simple intuitive interface :D
