jonathanlmc/medinpar

medinpar

This is a Rust crate to parse strings for identifying media information commonly found in filenames, such as season and episode numbers. Given input like [Tag 1] Series Name - S01E02 1080p (Tag 2), this crate can detect the season number as 1 and the episode number as 2.

Features

  • Fast. Parsing is done using the winnow crate, and allocations are avoided unless they are absolutely necessary.
  • Extensible. The internal parser combinators used by this crate are publicly exported under the parse module and can be used to build custom parsers for more esoteric naming layouts.

Episode parser

Parsing episode filenames from as many esoteric naming layouts as possible is the primary goal of this crate.

Some common filename layouts supported by the episode parser are as follows:

  • Series Name - S01E02
  • Series Name - 01
  • [Tag] Series Name - Episode 1
  • 01
  • (Tag) 01 - Series Name
  • Series Name - Season 1 Episode 1

More filename layouts are explicitly supported; they are listed in the test suite for the episode parser.
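To make the `S01E02` layout concrete, here is a minimal std-only sketch of how such a token could be split into a season and an episode number. It is not how medinpar is implemented (the crate uses winnow combinators), and `parse_season_episode` and `digits` are hypothetical helper names introduced only for illustration.

```rust
/// Parses a non-empty run of ASCII digits into a number.
/// Hypothetical helper, not part of medinpar.
fn digits(s: &str) -> Option<u32> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

/// Parses a token such as `S01E02` into `(season, episode)`.
/// Hypothetical helper, not part of medinpar.
fn parse_season_episode(token: &str) -> Option<(u32, u32)> {
    // Require a leading `S` or `s`, then split the rest at the first `E` or `e`.
    let rest = token.strip_prefix(['S', 's'])?;
    let (season, episode) = rest.split_once(['E', 'e'])?;
    Some((digits(season)?, digits(episode)?))
}
```

Calling `parse_season_episode("S01E02")` yields `Some((1, 2))`, while a bare `01` yields `None`. A real parser also has to cope with tags, titles, and noise around the token, which is the part this crate handles.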

Besides season and episode numbers, the episode parser can also detect series format information (e.g. whether the episode belongs to a TV season or a movie). To enable this, provide the episode parser with an enum that implements this crate's Format trait.

Series parser

In addition to parsing episodes, this crate also contains a (basic) parser for series information that is often found in directory names. Currently, its only functionality is to trim tags that may be present.
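As a rough illustration of what trimming tags means, the sketch below strips bracketed or parenthesized groups from both ends of a directory name using only the standard library. `trim_tags` is a hypothetical function, not this crate's API, and it is deliberately naive: unlike the crate's series parser, it does not drop unbracketed noise such as a resolution marker.

```rust
/// Strips leading and trailing `[...]` / `(...)` groups from `name`.
/// Hypothetical helper for illustration; not part of medinpar.
fn trim_tags(name: &str) -> &str {
    let mut s = name.trim();
    loop {
        let before_len = s.len();

        // Drop one leading tag, if the string starts with an opening bracket
        // that has a matching closing bracket somewhere after it.
        let leading_close = match s.chars().next() {
            Some('[') => s.find(']'),
            Some('(') => s.find(')'),
            _ => None,
        };
        if let Some(close) = leading_close {
            s = s[close + 1..].trim_start();
        }

        // Drop one trailing tag, if the string ends with a closing bracket
        // that has a matching opening bracket somewhere before it.
        let trailing_open = match s.chars().last() {
            Some(']') => s.rfind('['),
            Some(')') => s.rfind('('),
            _ => None,
        };
        if let Some(open) = trailing_open {
            s = s[..open].trim_end();
        }

        // Stop once a full pass removed nothing.
        if s.len() == before_len {
            return s;
        }
    }
}
```

With this sketch, `trim_tags("(Tag 1) (Tag 2) Series Title [Tag 3]")` returns `"Series Title"`.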

Example Usage

use medinpar::{Episode, Format, Series};

// you can define formats that will be looked for when parsing an episode
#[derive(Debug, Copy, Clone, PartialEq)]
enum BasicFormat {
    Tv,
    Movie,
}

impl Format for BasicFormat {
    const VARIANT_MAPPINGS: &[(&'static str, Self)] = &[
        // case-insensitive
        ("tv", Self::Tv),
        ("movie", Self::Movie),
    ];
}

// parse an episode using the formats provided by `BasicFormat`
//
// note that if you are parsing filenames, you should only provide its stem
// to the parser
let episode = Episode::<BasicFormat>::parse("[Tag 1] Series Title 1080p TV - S01E02");

assert_eq!(episode, Ok(Episode {
    // the episode number is generally required for parsing to succeed
    //
    // if a matching format is found in the input, the episode number can be missing
    // and the parser will assume this is a one-off episode by returning `1` for the
    // episode number
    number: 2,
    // season numbers can be omitted
    season: Some(1),
    // as well as the format
    format: Some(BasicFormat::Tv),
}));

// parse an episode without any formats defined
let episode = Episode::<()>::parse("[Tag 1] Series Title 1080p TV Season 2 - 01");

assert_eq!(episode, Ok(Episode {
    number: 1,
    season: Some(2),
    // since no formats were defined, `TV` is just interpreted as noise in the input
    format: None,
}));

// parse a one-off episode
//
// this would fail if no formats were given to the parser
let episode = Episode::<BasicFormat>::parse("Series Title - Movie");

assert_eq!(episode, Ok(Episode {
    number: 1,
    season: None,
    format: Some(BasicFormat::Movie),
}));

// parse a series title
let series = Series::parse("(Tag 1) (Tag 2) Series Title - A New Hope 2160p [Tag 3]");

assert_eq!(series, Ok(Series {
    trimmed_name: "Series Title - A New Hope",
}));
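The example above notes that only the stem of a filename should be passed to the parser. One way to obtain it is the standard library's `Path::file_stem`; the wrapper name `file_stem_of` below is an assumption made for this sketch and is not provided by medinpar.

```rust
use std::path::Path;

/// Returns the file stem (the filename without its final extension) as UTF-8.
/// Yields `None` if the path has no filename or the stem is not valid UTF-8.
/// Hypothetical helper, not part of medinpar.
fn file_stem_of(path: &str) -> Option<&str> {
    Path::new(path).file_stem()?.to_str()
}
```

For a path such as `videos/[Tag 1] Series Title - S01E02.mkv`, this returns `Some("[Tag 1] Series Title - S01E02")`, which can then be handed to `Episode::parse`.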

Contributions

Contributions are welcome as long as they do not significantly slow down any existing parser and are not AI slop [1].

This crate was made for my own anup project, which is currently being rewritten. Additions to functionality or extracted data are generally based on that project's requirements, although I am open to adding more extracted data to each parser upon request, as long as the request is reasonable.

Footnotes

  1. More specifically, please do not open a PR if you do not fully understand the code generated by an LLM or could not write it better yourself.
