huddy, posted May 15, 2012

Hi all,

I'm after some advice on how to write a parser that matches the many different naming formats of a TV episode. The one thing we can rely on is that episodes are inside a folder named after the show, for example "House MD\episode name. s1e07.mkv", so we know the show name; that part is easy. The bit I need to match is the season and episode number. The format of the file name can differ in so many ways. Here are some examples:

- S01E01
- s1e10
- S3e6
- 105 (for season 1, episode 5)
- EP01 (usually there's only one season in this case, but not 100% of the time)

So as you can see, there are a few variations. The show name is almost always in there too, and sometimes the name contains the resolution, e.g. House.MD S1E09 720p HD.mkv. There are many combinations, but since we already know the show name, I don't think it's something we need to worry *too* much about.

My question relates to how you would approach this. As the examples above show, some of these would be hard to match and interpret.

My initial idea is a class called 'tvmatcher', or something similar, that holds match handlers, one for each format we need to match. A handler would be a class that extends tvmatcher and implements the same method, like $handler->match($string); the first one to match wins. This could be extended to sanity-check the result and ensure that the season/episode actually exists.

I really don't know how to go about this; the idea above is my best so far. So again, my questions are: How would you approach this? Is there a pattern that would help? Any other ideas about how this could be achieved?

It's worth noting that I'd be using external APIs to get show information. That would be out of the scope of this library, but it could help with the sanity checks.
Cheers,
Billy

https://forums.phpfreaks.com/topic/262560-designarchitecture-code-for-matching-shows-from-filenames/
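huddy's tvmatcher idea (one handler per format, first match wins) is essentially a chain of responsibility. A minimal sketch, assuming PHP 7.4+; EpisodeMatcher, SeasonEpisodeMatcher, EpisodeOnlyMatcher and TvMatcher are hypothetical names, and the handlers implement an interface rather than extending tvmatcher:

```php
<?php
// Hypothetical names throughout; one class per filename format.
interface EpisodeMatcher
{
    // Returns ['season' => ?int, 'episode' => int], or null if the format doesn't apply.
    public function match(string $name): ?array;
}

// S01E01, s1e10, S3e6 and similar.
final class SeasonEpisodeMatcher implements EpisodeMatcher
{
    public function match(string $name): ?array
    {
        if (preg_match('/s(\d{1,2})e(\d{1,3})/i', $name, $m)) {
            return ['season' => (int) $m[1], 'episode' => (int) $m[2]];
        }
        return null;
    }
}

// EP01: the name alone doesn't say which season, so season is left null.
final class EpisodeOnlyMatcher implements EpisodeMatcher
{
    public function match(string $name): ?array
    {
        if (preg_match('/\bep(\d{1,3})\b/i', $name, $m)) {
            return ['season' => null, 'episode' => (int) $m[1]];
        }
        return null;
    }
}

// Tries each handler in order; the first one that matches wins.
final class TvMatcher
{
    /** @var EpisodeMatcher[] */
    private array $handlers;

    public function __construct(EpisodeMatcher ...$handlers)
    {
        $this->handlers = $handlers;
    }

    public function match(string $name): ?array
    {
        foreach ($this->handlers as $handler) {
            $result = $handler->match($name);
            if ($result !== null) {
                return $result;
            }
        }
        return null;
    }
}

$matcher = new TvMatcher(new SeasonEpisodeMatcher(), new EpisodeOnlyMatcher());
print_r($matcher->match('House.MD S1E09 720p HD.mkv')); // season 1, episode 9
```

Handler order matters here: more specific formats should come first. A sanity check against show data could be one more wrapper around TvMatcher, which would keep API calls out of the individual handlers.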
scootstah, posted May 15, 2012

Well, the first three are easy enough:

```php
$file = 'House MD\episode name. s1e07.mkv';

if (preg_match('/s([0-9]+)e([0-9]+)/i', $file, $matches)) {
    array_shift($matches); // drop the full match, keep the two capture groups
    print_r($matches);
    list($season, $episode) = $matches;
}
```

The last two might be tricky because they're too ambiguous. For example, 105 could be either season 1, episode 05 or season 10, episode 5. How do you distinguish that? Also, it is impossible to derive the season number from "EP01".
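One way to handle the "105" ambiguity is not to pick a split up front, but to list every plausible (season, episode) reading and let show data rule some out. A sketch, assuming PHP 7.4+ and that the episode part is one or two digits; numericCandidates and resolveNumeric are hypothetical helpers, and the $episodeExists callable stands in for whatever lookup (such as the external API huddy mentions) is available:

```php
<?php
// Lists every (season, episode) split of a bare numeric code such as "105".
// Assumption: the episode part is the last one or two digits.
function numericCandidates(string $code): array
{
    if (!preg_match('/^\d+$/', $code)) {
        return []; // digits only
    }
    $candidates = [];
    $len = strlen($code);
    for ($epDigits = 2; $epDigits >= 1; $epDigits--) {
        if ($len - $epDigits < 1) {
            continue; // need at least one digit left for the season
        }
        $season  = (int) substr($code, 0, $len - $epDigits);
        $episode = (int) substr($code, -$epDigits);
        if ($season > 0 && $episode > 0) {
            $candidates[] = [$season, $episode];
        }
    }
    return $candidates;
}

// Keeps only the candidates that $episodeExists accepts; $episodeExists is a
// hypothetical stand-in for a show-data lookup.
function resolveNumeric(string $code, callable $episodeExists): array
{
    return array_values(array_filter(
        numericCandidates($code),
        fn(array $c) => $episodeExists($c[0], $c[1])
    ));
}
```

For "105" this yields both [1, 5] and [10, 5]; if a show has fewer than ten seasons, only [1, 5] survives. If more than one candidate survives the check, the name is genuinely ambiguous and is probably better flagged than guessed.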
huddy (author), posted May 15, 2012

Hi Scootstah,

Thanks for your time. I agree that the regex for some of them is quite easy, but the list I provided isn't all of the formats you actually see. My question is really about how you deal with matching or parsing such a wide variety of formats in a nice, maintainable and reliable way.

I could just put lots of regex patterns in an array, loop through them and run each one against a file name to see if it matches, then have lots of if statements or a switch statement. But I think this will become quite complicated once you're matching 20+ formats, especially if you need more logic around them. Take EP01 as an example: it could be that there's only one season, so you could call an API to check how many seasons the show had, and if it's only one, assume it's season 1.

I might be wrong, and as always I'm willing to be proved wrong.

Thanks!
Billy
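The array-of-patterns approach need not end in a switch: if each pattern carries its own handler, the extra logic (such as the EP01 season check) lives next to the regex it belongs to. A sketch, assuming PHP 7.4+; makeRules and parseEpisode are hypothetical, and $seasonCount is a stand-in for the external API call that returns how many seasons a show has:

```php
<?php
// Pattern table: each regex maps to a closure that turns the match into
// ['season' => int, 'episode' => int], or returns null to reject it.
// Adding a format means adding an entry, not another elseif branch.
function makeRules(callable $seasonCount): array
{
    return [
        // S01E01, s1e10, S3e6
        '/s(\d{1,2})e(\d{1,3})/i' => fn(array $m, string $show): ?array =>
            ['season' => (int) $m[1], 'episode' => (int) $m[2]],
        // EP01: only assume season 1 when the show has exactly one season
        '/\bep(\d{1,3})\b/i' => fn(array $m, string $show): ?array =>
            $seasonCount($show) === 1
                ? ['season' => 1, 'episode' => (int) $m[1]]
                : null,
    ];
}

function parseEpisode(string $show, string $file, array $rules): ?array
{
    foreach ($rules as $pattern => $handler) {
        if (preg_match($pattern, $file, $m)) {
            $result = $handler($m, $show);
            if ($result !== null) {
                return $result; // first rule that produces a result wins
            }
        }
    }
    return null;
}
```

Because the season lookup is injected as a callable, the API client stays outside the library, as huddy wanted, and can be replaced with a fixed array in tests.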
scootstah, posted May 15, 2012

Without seeing all of the formats I can't really give a solid answer, but I would probably just use a few if/elseif statements to check for the different formats. Clever use of regex should cut them down a bit; for example, one pattern matches the first three formats you listed.
ignace, posted May 15, 2012

Now if only those pirates would just name those files consistently...
huddy (author), posted May 16, 2012

Hi all,

So I wrote this library last night on this subject: https://github.com/huddy/tvfilename

It would be great to get some feedback on my code.

Billy
ignace, posted May 16, 2012

You should add more responsibilities to your object; currently it does nearly nothing:

```php
$tv = new tvfilename\tvfilename('/path/to/dir', array('*.txt', '*.db', '*.url'));
foreach ($tv as $episode) {
    echo 'SEASON: ', $episode->season, "\r\n",
         'EPISODE: ', $episode->episode, "\r\n",
         'SERIE: ', $episode->serie;
}
```

You could abstract what it traverses and get something like:

```php
$tv = new tvfilename\tvfilename(new tvfilename\RecursiveDirectoryFilterIterator('/path/to/dir', array('*.txt', '*.db', '*.url')));
```

Or, in the case of a DB:

```php
$tv = new tvfilename\tvfilename(new tvfilename\DbRecordIterator($dbAdapter));
```
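The RecursiveDirectoryFilterIterator suggested above is ignace's proposed name, not an SPL class, but something equivalent can be assembled from stock SPL classes. A sketch, where episodeFiles is a hypothetical helper and the exclusions are glob patterns checked with fnmatch():

```php
<?php
// Recursively lists files under $dir, skipping any file whose name matches
// one of the glob patterns in $exclude (e.g. '*.txt').
function episodeFiles(string $dir, array $exclude): Iterator
{
    $dirs = new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS);
    $filtered = new RecursiveCallbackFilterIterator(
        $dirs,
        function (SplFileInfo $file, $key, RecursiveIterator $it) use ($exclude): bool {
            if ($it->hasChildren()) {
                return true; // always descend into subdirectories
            }
            foreach ($exclude as $pattern) {
                if (fnmatch($pattern, $file->getFilename())) {
                    return false; // excluded file type
                }
            }
            return true;
        }
    );
    // Default LEAVES_ONLY mode: yields files, not the directories themselves
    return new RecursiveIteratorIterator($filtered);
}
```

Each item is an SplFileInfo, so the show folder is available as $file->getPath() and the name to parse as $file->getFilename().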
huddy (author), posted May 16, 2012

Hi Ignace,

I'm going with a single-responsibility theme, so this library is only responsible for matching the episode/season. By adding more responsibilities, wouldn't I be moving away from that? Then again, maybe adding adapters for different types of data isn't moving away from that principle?

I want it to be loosely coupled, so it can be used in other projects without any modification. It's also worth noting that this library can match strings whether they come from a filename or some other source. Given that, would you still suggest passing directory/string adapter objects in? I do sort of like the idea.

Cheers,
Billy
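Keeping the matcher string-only and putting adapters in front of it is one way to reconcile the two positions. A sketch of such an adapter, assuming PHP 7.4+ and rows that arrive as associative arrays with a 'filename' column (a hypothetical schema); namesFromRows and matchName are illustrative names, and matchName stands in for the real matcher:

```php
<?php
// Adapter: turns any iterable of records into the plain strings the matcher
// expects, so the matcher never needs to know where names come from.
function namesFromRows(iterable $rows, string $column = 'filename'): Generator
{
    foreach ($rows as $id => $row) {
        yield $id => $row[$column];
    }
}

// Stand-in matcher: it only ever sees strings (SxxExx format only here).
function matchName(string $name): ?array
{
    return preg_match('/s(\d{1,2})e(\d{1,3})/i', $name, $m)
        ? ['season' => (int) $m[1], 'episode' => (int) $m[2]]
        : null;
}
```

A directory walker, a DB result set or a plain array can each get a small adapter like this, while the matching code stays the same.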
ignace, posted May 16, 2012

Well, we can reasonably assume it will only be used to traverse directories, so I see no need to divvy up the responsibilities. In what other ways are you planning on using it? Why did you start this project in the first place?
ignace, posted May 17, 2012

You can always refactor your code at a later point and create reusable components out of it. For example:

```php
class tvfilename extends RecursiveIteratorIterator {
    private $_parser;
```

When a string is provided, we assume it's a directory path:

```php
public function __construct($input, $filter = null)
{
    if (is_string($input)) {
        // RecursiveDirectoryIterator's second argument is a flags bitmask, not a
        // filter list, so $filter has to be applied by a separate filter
        // iterator (e.g. RecursiveCallbackFilterIterator) wrapped around it
        $input = new RecursiveDirectoryIterator($input, FilesystemIterator::SKIP_DOTS);
    }
    parent::__construct($input);
    $this->_parser = new EpisodeParser();
}
```

When traversing, you would pass the filename (or whatever) to a parser:

```php
public function current()
{
    return $this->_parser->parse(parent::current());
}
```

So at this point you should get a stdClass with properties from current(). If you want it to be of a certain class, you should also supply a factory to your parser:

```php
class EpisodeParser
{
    private $_factory;

    public function __construct(Factory $factory = null)
    {
        $this->_factory = $factory;
    }

    public function parse($string)
    {
        $parsed = new stdClass();
        // parsing: fill in $parsed->season, $parsed->episode and $parsed->serie from $string

        if (!is_null($this->_factory)) {
            $parsed = $this->_factory->make($parsed);
        }
        return $parsed;
    }
}
```

All the responsibilities are divvied up. You would use it like this:

```php
$tv = new tvfilename\tvfilename('/path/to/dir', array('*.txt', '*.db', '*.url'));
foreach ($tv as $episode) {
    echo 'SEASON: ', $episode->season, "\r\n",
         'EPISODE: ', $episode->episode, "\r\n",
         'SERIE: ', $episode->serie;
}
```

A simple, intuitive interface.
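For reference, here is a runnable reduction of ignace's design, assuming PHP 8.0+. It wraps any Traversable of paths in an IteratorIterator rather than a RecursiveIteratorIterator, so it can be tried without touching the filesystem; it takes the show name from the parent folder and only recognises the SxxExx format. Episode, EpisodeFactory and TvFilename are hypothetical names; Factory and EpisodeParser follow the names in the post:

```php
<?php
// Hypothetical value object the factory builds.
final class Episode
{
    public function __construct(
        public string $serie,
        public int $season,
        public int $episode
    ) {}
}

interface Factory
{
    public function make(stdClass $parsed): object;
}

final class EpisodeFactory implements Factory
{
    public function make(stdClass $parsed): object
    {
        return new Episode($parsed->serie, $parsed->season, $parsed->episode);
    }
}

final class EpisodeParser
{
    public function __construct(private ?Factory $factory = null) {}

    // Returns a stdClass (or whatever the factory makes), or null if no SxxExx is found.
    public function parse(string $path): ?object
    {
        if (!preg_match('/s(\d{1,2})e(\d{1,3})/i', basename($path), $m)) {
            return null;
        }
        $parsed = new stdClass();
        $parsed->serie   = basename(dirname($path)); // show name = parent folder
        $parsed->season  = (int) $m[1];
        $parsed->episode = (int) $m[2];
        return $this->factory ? $this->factory->make($parsed) : $parsed;
    }
}

// Decorates any iterator of paths; current() hands each one to the parser.
final class TvFilename extends IteratorIterator
{
    public function __construct(Traversable $paths, private EpisodeParser $parser)
    {
        parent::__construct($paths);
    }

    public function current(): mixed
    {
        return $this->parser->parse((string) parent::current());
    }
}

$paths = new ArrayIterator([
    '/tv/House MD/House.MD S1E09 720p HD.mkv',
    '/tv/House MD/House MD s2e01.avi',
]);
foreach (new TvFilename($paths, new EpisodeParser(new EpisodeFactory())) as $ep) {
    echo 'SEASON: ', $ep->season, "\n", 'EPISODE: ', $ep->episode, "\n", 'SERIE: ', $ep->serie, "\n";
}
```

Swapping the ArrayIterator for a directory iterator, or the factory for a different one, leaves the parser untouched, which is the point of dividing the responsibilities this way.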