This is a Rust crate for parsing media information commonly found in filenames, such as season and episode numbers. Given an input like "[Tag 1] Series Name - S01E02 1080p (Tag 2)", this crate can detect the season number as 1 and the episode number as 2.
- Fast. Parsing is done using the winnow crate, and allocations are avoided unless they are absolutely necessary.
- The internal parser combinators used by this crate are publicly exported under the parse module and can be used to build custom parsers for more esoteric naming layouts.
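To illustrate what parsing without allocations can look like, here is a minimal sketch that recognizes a marker such as S01E02 using only the standard library. It is not this crate's implementation and does not use winnow; `take_number` and `season_episode` are hypothetical names chosen for the example. Every returned string slice borrows from the input, so no `String` is ever created.

```rust
/// Hypothetical helper (not part of this crate): splits a run of leading
/// ASCII digits off `input` and parses it, returning the number together
/// with the unparsed remainder, which borrows from `input`.
fn take_number(input: &str) -> Option<(u32, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        // no digits at the start of the input
        return None;
    }
    let (digits, rest) = input.split_at(end);
    Some((digits.parse::<u32>().ok()?, rest))
}

/// Hypothetical helper: parses a marker such as `S01E02` at the start of
/// `input` (case-insensitive), returning `(season, episode, remainder)`.
fn season_episode(input: &str) -> Option<(u32, u32, &str)> {
    let rest = input.strip_prefix(|c: char| c.eq_ignore_ascii_case(&'s'))?;
    let (season, rest) = take_number(rest)?;
    let rest = rest.strip_prefix(|c: char| c.eq_ignore_ascii_case(&'e'))?;
    let (episode, rest) = take_number(rest)?;
    Some((season, episode, rest))
}
```

Returning the remainder alongside the parsed value is what lets small parsers like these be chained, which is the same general idea that parser combinator libraries build on.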
Parsing episode filenames from as many esoteric naming layouts as possible is the primary goal of this crate.
Some common filename layouts supported by the episode parser are as follows:
- Series Name - S01E02
- Series Name - 01
- [Tag] Series Name - Episode 101
- (Tag) 01 - Series Name
- Series Name - Season 1 Episode 1
More filename layouts are explicitly supported, and can be found in the test suite for the episode parser here.
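One way to support several layouts is to try the more specific ones first and fall back to looser ones. The sketch below is an assumption about the general approach, not how this crate is implemented: `Found`, `sxxeyy`, `bare_number`, and `find_episode` are hypothetical names, and only two layouts are covered (a token like S01E02 and a bare number like 01).

```rust
/// Hypothetical result type for this sketch (not this crate's `Episode`).
#[derive(Debug, PartialEq)]
struct Found {
    season: Option<u32>,
    episode: u32,
}

/// Matches a whole token of the form `S<digits>E<digits>`, e.g. `S01E02`.
fn sxxeyy(token: &str) -> Option<Found> {
    let rest = token.strip_prefix(|c: char| c.eq_ignore_ascii_case(&'s'))?;
    let (season, episode) = rest.split_once(|c: char| c.eq_ignore_ascii_case(&'e'))?;
    Some(Found {
        season: Some(season.parse::<u32>().ok()?),
        episode: episode.parse::<u32>().ok()?,
    })
}

/// Matches a token made only of ASCII digits, e.g. `01`.
fn bare_number(token: &str) -> Option<Found> {
    if token.is_empty() || !token.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    Some(Found {
        season: None,
        episode: token.parse::<u32>().ok()?,
    })
}

/// Tries each layout in priority order over the whitespace-separated
/// tokens of `stem`, returning the first match.
fn find_episode(stem: &str) -> Option<Found> {
    let layouts: [fn(&str) -> Option<Found>; 2] = [sxxeyy, bare_number];
    layouts
        .iter()
        .find_map(|layout| stem.split_whitespace().find_map(|token| layout(token)))
}
```

Because `sxxeyy` is tried across all tokens before `bare_number`, an input such as "Series 2 - S01E02" resolves to season 1, episode 2 rather than treating the stray 2 as the episode number. Layouts like "Season 1 Episode 1" would need additional rules that this sketch leaves out.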
Besides season and episode numbers, the episode parser can also detect the format of a series (e.g. whether the episode belongs to a TV season or a movie) when you provide it with an enum that implements this crate's Format trait.
In addition to parsing episodes, this crate also contains a (basic) parser for series information that is often found in directory names. Currently, its only functionality is to trim tags that may be present.
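As a rough picture of what trimming tags involves, the following standard-library sketch strips bracketed or parenthesized tags from both ends of a name. It is not the crate's series parser: `strip_leading_tag`, `strip_trailing_tag`, and `trim_tags` are hypothetical names, nested brackets are not handled, and unlike the real parser it makes no attempt to remove other noise such as resolution markers.

```rust
/// Hypothetical helper: removes one leading `[...]` or `(...)` tag,
/// returning the text after it, or `None` if the input has no such tag.
fn strip_leading_tag(input: &str) -> Option<&str> {
    let input = input.trim_start();
    let close = match input.chars().next()? {
        '[' => ']',
        '(' => ')',
        _ => return None,
    };
    let end = input.find(close)?;
    Some(&input[end + close.len_utf8()..])
}

/// Hypothetical helper: removes one trailing `[...]` or `(...)` tag,
/// returning the text before it, or `None` if the input has no such tag.
fn strip_trailing_tag(input: &str) -> Option<&str> {
    let input = input.trim_end();
    let open = match input.chars().next_back()? {
        ']' => '[',
        ')' => '(',
        _ => return None,
    };
    let start = input.rfind(open)?;
    Some(&input[..start])
}

/// Repeatedly strips tags from both ends and trims surrounding whitespace.
/// The result is a sub-slice of `name`, so nothing is allocated.
fn trim_tags(mut name: &str) -> &str {
    while let Some(rest) = strip_leading_tag(name) {
        name = rest;
    }
    while let Some(rest) = strip_trailing_tag(name) {
        name = rest;
    }
    name.trim()
}
```

Note that this naive version treats any parenthesized suffix as a tag, so a name ending in something like "(2020)" would lose that part as well.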
use medinpar::{Episode, Format, Series};

// you can define formats that will be looked for when parsing an episode
#[derive(Debug, Copy, Clone, PartialEq)]
enum BasicFormat {
    Tv,
    Movie,
}

impl Format for BasicFormat {
    const VARIANT_MAPPINGS: &[(&'static str, Self)] = &[
        // case-insensitive
        ("tv", Self::Tv),
        ("movie", Self::Movie),
    ];
}

// parse an episode using the formats provided by `BasicFormat`
//
// note that if you are parsing filenames, you should only provide its stem
// to the parser
let episode = Episode::<BasicFormat>::parse("[Tag 1] Series Title 1080p TV - S01E02");

assert_eq!(episode, Ok(Episode {
    // the episode number is generally required for parsing to succeed
    //
    // if a matching format is found in the input, the episode number can be missing
    // and the parser will assume this is a one-off episode by returning `1` for the
    // episode number
    number: 2,
    // season numbers can be omitted
    season: Some(1),
    // as well as the format
    format: Some(BasicFormat::Tv),
}));

// parse an episode without any formats defined
let episode = Episode::<()>::parse("[Tag 1] Series Title 1080p TV Season 2 - 01");

assert_eq!(episode, Ok(Episode {
    number: 1,
    season: Some(2),
    // since no formats were defined, `TV` is just interpreted as noise in the input
    format: None,
}));

// parse a one-off episode
//
// this would fail if no formats were given to the parser
let episode = Episode::<BasicFormat>::parse("Series Title - Movie");

assert_eq!(episode, Ok(Episode {
    number: 1,
    season: None,
    format: Some(BasicFormat::Movie),
}));

// parse a series title
let series = Series::parse("(Tag 1) (Tag 2) Series Title - A New Hope 2160p [Tag 3]");

assert_eq!(series, Ok(Series {
    trimmed_name: "Series Title - A New Hope",
}));

Contributions are welcome as long as they do not significantly slow down any existing parser and are not AI slop [1].
This crate was made for my own anup project, which is currently being rewritten. Additions to functionality or extracted data are generally based on that project's requirements, although I am open to adding more extracted data to each parser upon request, as long as it's reasonable.
Footnotes
1. More specifically, please do not make a PR if you do not fully understand the code generated by an LLM or could not write it better yourself.