VobSub2SRT is a simple command line program that converts .idx / .sub subtitles
into .srt text subtitles using OCR. It is based on code from the
MPlayer project, a really, really great movie player. Some minor parts are
copied from ffmpeg/avutil headers. Tesseract is used as the OCR engine.
Maintained by Christopher Ogloff
vobsub2srt is released under the GPL3+ license. The MPlayer code included is GPL2+ licensed.
The quality of the OCR depends on the text in the subtitles. The code currently
does no preprocessing, but I am looking into adding filters and scaling options
to improve the OCR. In the meantime you can try adjusting --min-width and
--min-height, or --y-threshold. Otherwise, correct mistakes in the .srt
files with a text editor or a dedicated subtitle editor.
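Because vobsub2srt always writes Filename.srt, comparing several settings means saving each result before the next run. The sketch below (POSIX sh) is not part of vobsub2srt: sweep_y_threshold is a hypothetical helper, and it assumes --y-threshold takes an integer luminance value, so check --help for the accepted range on your build.

```sh
# sweep_y_threshold Filename VALUE...: hypothetical helper, not part of vobsub2srt.
# Runs vobsub2srt once per --y-threshold value and keeps a copy of each result.
# Assumption: --y-threshold takes an integer; pick values to suit your subtitles.
sweep_y_threshold() (
    base=$1
    shift
    for t in "$@"; do
        if vobsub2srt --y-threshold "$t" "$base"; then
            # vobsub2srt always writes "$base.srt"; copy it before the next run overwrites it.
            cp "$base.srt" "$base.y$t.srt"
        else
            echo "vobsub2srt failed with --y-threshold $t" >&2
        fi
    done
)
```

For example, sweep_y_threshold Filename 0 50 100 leaves Filename.y0.srt, Filename.y50.srt and Filename.y100.srt side by side for comparison. The same loop works for --min-width or --min-height if you change the option name.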
**legacy**: This branch will likely get less attention than master, and it may contain more bugs.
- Debian GNU/Linux 5 “Lenny” (fully updated final point release, details slipped my mind)
- Debian GNU/Linux 6.0.10 “Squeeze”
- Fedora 33
**master**
- Arch Linux
- Fedora 34-44 and Rawhide (on Fedora 39, CMake 3.30 must be downgraded to 3.27)
- Debian 9-14
You will need Tesseract, CMake (2.6.0+ for legacy, 3.5+ for master) and GCC (4.3+ for legacy; master has been tested with GCC 15.2.1) to build it, as well as a few imaging libraries. On a Debian-based system you can install the dependencies with:
sudo apt-get install libtiff-dev libtesseract-dev build-essential cmake pkg-config
Note: If libtiff-dev is not found, substitute libtiff4-dev, libtiff5-dev, etc.
You should also install the Tesseract data for the languages you want to use, for example:
sudo apt-get install tesseract-ocr-deu tesseract-ocr-nor tesseract-ocr-fra
Note that support for Tesseract versions below 2.4 is deprecated and may not work.
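To check whether the installed CMake and GCC meet the minimums above before building, you can compare version strings with sort -V. This is a minimal sketch, assuming GNU sort (coreutils) is available; version_ge and check_build_tools are hypothetical helpers, and the minimums are the CMake 3.5 (master) and GCC 4.3 values from this README. Recent GCC releases may print only the major version for -dumpversion, which still compares correctly.

```sh
# version_ge A B: succeed if version A >= version B.
# Assumption: GNU sort with -V (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_build_tools: hypothetical helper comparing installed tools with the
# minimums stated in this README (CMake 3.5 for master, GCC 4.3).
check_build_tools() {
    # The first line of "cmake --version" looks like "cmake version 3.27.7".
    cmake_ver=$(cmake --version 2>/dev/null | awk 'NR == 1 { print $3 }')
    gcc_ver=$(gcc -dumpversion 2>/dev/null) || gcc_ver=
    if [ -n "$cmake_ver" ] && version_ge "$cmake_ver" 3.5; then
        echo "CMake $cmake_ver: ok"
    else
        echo "CMake 3.5 or newer not found (found: ${cmake_ver:-none})" >&2
    fi
    if [ -n "$gcc_ver" ] && version_ge "$gcc_ver" 4.3; then
        echo "GCC $gcc_ver: ok"
    else
        echo "GCC 4.3 or newer not found (found: ${gcc_ver:-none})" >&2
    fi
}
```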
To build and install vobsub2srt, run:
./configure
make
sudo make install
This should install the program vobsub2srt to /usr/local/bin. You can
uninstall vobsub2srt with sudo make uninstall.
I recommend using the dynamically linked binary. However, if you really need a
static binary, you can add the flag -DBUILD_STATIC=ON to the ./configure call. Be
aware that building static binaries can be quite troublesome: you need the
static library files for tesseract, libtiff and libavutil, as well as for their
dependencies. On Ubuntu 12.04 the static libraries are only included in
the dev packages. You may also need the gold linker.
For Debian you may need the following extra packages:
sudo apt-get install libpng-dev libwebp-dev zlib1g-dev libjpeg-dev
Auto-detection of these extra dependencies still needs work.
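Before trying -DBUILD_STATIC=ON, it may save time to check that the static archives are actually present. The rough sketch below uses a hypothetical helper, find_static_libs; it assumes the archives are named libtesseract.a, liblept.a, libtiff.a and libavutil.a (newer Leptonica releases may name theirs libleptonica.a) and that you pass the directories to search, since those paths differ between distributions.

```sh
# find_static_libs DIR...: hypothetical helper reporting which expected static
# archives exist in the given directories.
# Assumption: archive names below; adjust them for your distribution.
find_static_libs() (
    missing=0
    for lib in tesseract lept tiff avutil; do
        found=
        for dir in "$@"; do
            if [ -f "$dir/lib$lib.a" ]; then
                found="$dir/lib$lib.a"
                break
            fi
        done
        if [ -n "$found" ]; then
            echo "found:   $found"
        else
            echo "missing: lib$lib.a"
            missing=1
        fi
    done
    # The body runs in a subshell, so exit only ends the function, with status 1 if anything is missing.
    exit "$missing"
)
```

For example, find_static_libs /usr/lib /usr/lib/x86_64-linux-gnu /usr/local/lib (the multiarch directory is a Debian/Ubuntu convention and is only an example).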
If linking fails with undefined references, a good starting point is to check
which other libraries your version of Leptonica depends on. You can do this by
running ldd /usr/lib/liblept.so (or wherever Leptonica is installed on your
system). Then add those dependencies to CMakeModules/FindTesseract.cmake.
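To turn the ldd output into a list of short library names, you can filter it with awk. This is a minimal sketch; parse_ldd and lept_deps are hypothetical helpers, the default path is the one mentioned above, and how the names should be added to CMakeModules/FindTesseract.cmake depends on that file's contents, which are not shown here.

```sh
# parse_ldd: read ldd output on stdin and print one short name per linked
# library, e.g. "libpng16.so.16 => /lib/libpng16.so.16 (0x...)" becomes "png16".
# Lines without "=>" (linux-vdso, the dynamic loader itself) are skipped.
parse_ldd() {
    awk '$2 == "=>" { name = $1; sub(/^lib/, "", name); sub(/\.so.*$/, "", name); print name }'
}

# lept_deps [LIB]: list the dependencies of Leptonica (or any shared library).
# Assumption: /usr/lib/liblept.so is only a default; pass your own path.
lept_deps() {
    ldd "${1:-/usr/lib/liblept.so}" | parse_ldd
}
```

For example, lept_deps /usr/lib/x86_64-linux-gnu/liblept.so might print names such as png16, jpeg and z; the exact list depends on how your Leptonica was built.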
NOTE: This PPA has not been updated since 2013
I have created a PPA (Personal Package Archive) to make installation on
Ubuntu easy. Simply add the PPA to your apt-get sources and run an update and
you can install the vobsub2srt package:
sudo add-apt-repository ppa:ruediger-c-plusplus/vobsub2srt
sudo apt-get update
sudo apt-get install vobsub2srt
You can build a *.deb package (Debian and its derivatives) with make package. The package
is created in the build directory.
You can also create a source package and upload it to your own PPA by using
UploadPPA.cmake, but this is only recommended for people experienced with
CMake and creating Debian packages.
Vobsub2srt contains a formula for Homebrew (a package manager for OS X). It can be installed by using the following commands:
brew install --with-all-languages tesseract
brew install --HEAD https://github.com/BuiltOnLogic/VobSub2SRT/raw/master/packaging/vobsub2srt.rb
An ebuild for Gentoo Linux is also available. You can make it available to emerge with the following steps:
sudo mkdir -p /usr/local/portage/media-video/vobsub2srt/
wget https://github.com/BuiltOnLogic/VobSub2SRT/raw/master/packaging/vobsub2srt-9999.ebuild
sudo mv vobsub2srt-9999.ebuild /usr/local/portage/media-video/vobsub2srt/
cd /usr/local/portage/media-video/vobsub2srt/
sudo ebuild vobsub2srt-9999.ebuild digest
You should be able to install vobsub2srt with emerge vobsub2srt now. If you
want to use a newer version (3+) of tesseract you have to use layman.
See #13 for details.
There is also a PKGBUILD for Arch Linux in the AUR: https://aur.archlinux.org/packages/vobsub2srt-git
vobsub2srt converts subtitles in VobSub (.idx / .sub) format into subtitles
in .srt format. VobSub subtitles consist of two or three files called
Filename.idx, Filename.sub and optional Filename.ifo. To convert subtitles
simply call
$ vobsub2srt Filename
Wrote subtitles to 'Filename.srt'
with Filename being the file name of the subtitle files WITHOUT the
extension (.idx / .sub). vobsub2srt writes the subtitles to a file called
Filename.srt.
If vobsub2srt exits cleanly but produces no text, you need to specify an IFO file:
vobsub2srt --ifo <IFO-file-path> Filename
and/or a Tesseract language (--tesseract-lang <lang>) if your DVD has undefined language metadata (most DVDs do).
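To convert every VobSub subtitle in a directory, a small loop can strip the .idx extension and call vobsub2srt once per file. This is a minimal sketch, not something shipped with vobsub2srt: convert_dir is a hypothetical helper, and it assumes the optional IFO file sits next to the .idx with the same base name, passing it with --ifo only when it exists.

```sh
# convert_dir DIR: hypothetical helper that runs vobsub2srt on every DIR/*.idx,
# adding --ifo when a matching DIR/<name>.ifo exists.
convert_dir() (
    status=0
    for idx in "$1"/*.idx; do
        [ -e "$idx" ] || continue      # no .idx files: the glob stays unexpanded
        base=${idx%.idx}               # vobsub2srt expects the name WITHOUT extension
        if [ -f "$base.ifo" ]; then
            vobsub2srt --ifo "$base.ifo" "$base" || status=1
        else
            vobsub2srt "$base" || status=1
        fi
    done
    # Non-zero if any conversion failed; exit only ends the subshell body.
    exit "$status"
)
```

For example, convert_dir ~/rips/season1 would write one .srt next to each .idx in that (example) directory.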
If a subtitle file contains more than one language, you can use the --lang or
--index parameter to select the correct one (use --langlist to list the
languages in the file). For some languages you might need to set the Tesseract
language yourself (e.g., chi_tra or chi_sim for traditional or simplified Chinese
characters). You can use --tesseract-lang to do this, although in most cases it
should be detected automatically.
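Before forcing a Tesseract language, it can help to confirm that its trained data is installed. tesseract --list-langs prints the installed languages in Tesseract 3.02 and later (older releases may lack the option and print it on stderr). The sketch below uses hypothetical helpers, has_tess_lang and ocr_with_lang; the Debian package hint assumes Debian's naming, where underscores become hyphens (e.g. tesseract-ocr-chi-sim).

```sh
# has_tess_lang LANG: succeed if Tesseract reports trained data for LANG (e.g. deu, chi_sim).
has_tess_lang() {
    # Some Tesseract versions print the list on stderr, so merge both streams.
    tesseract --list-langs 2>&1 | grep -qxF "$1"
}

# ocr_with_lang LANG Filename: hypothetical wrapper around vobsub2srt --tesseract-lang.
ocr_with_lang() {
    if has_tess_lang "$1"; then
        vobsub2srt --tesseract-lang "$1" "$2"
    else
        # Assumed Debian package name: tesseract-ocr-<lang> with "_" replaced by "-".
        echo "Tesseract language '$1' is not installed (on Debian, try tesseract-ocr-$(printf '%s' "$1" | tr '_' '-'))" >&2
        return 1
    fi
}
```

For example, ocr_with_lang chi_tra Filename would run the conversion with traditional Chinese OCR, or stop with a hint if that data is missing.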
If you want to dump the subtitles as images (e.g. to check the OCR results), you
can use the --dump-images flag.
Use --help or read the manpage to get more information about the options of
vobsub2srt.
Please submit bug reports or feature requests to the issue tracker on GitHub.
This fork is currently maintained at https://github.com/BuiltOnLogic/.
If you have problems with a specific subtitle file then please check if it works in a video player first.
For bug reports, please run vobsub2srt with the --verbose and --debug
options and copy and paste the full output into the bug report.
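To capture the complete output in one file for a bug report, you can send stderr into the same pipe as stdout and save it with tee. A minimal sketch; debug_report is a hypothetical helper and the log file name is an arbitrary choice.

```sh
# debug_report Filename: hypothetical helper that runs vobsub2srt with --verbose
# and --debug, shows the output and saves it to "<Filename>-vobsub2srt-debug.log".
debug_report() {
    log="$1-vobsub2srt-debug.log"
    # 2>&1 sends stderr into the pipe as well, so tee saves both streams.
    vobsub2srt --verbose --debug "$1" 2>&1 | tee "$log"
    echo "Full output saved to $log; paste it into the bug report." >&2
}
```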
Most code is from the MPlayer project.
- Christopher Ogloff [fixed code and tested working from kernel 2.6 to 7.0 on multiple distributions]
- Armin Häberling <armin.aha@gmail.com> wrote a patch to fix an issue with multiple instances of the same subtitle in result file (21af426)
- James Harris <jimmy@jamesharris.org> wrote the formula for Homebrew (54f311d6)
- Leo Koppelkamm reported and fixed issue #5 and problems with long filenames (b903074c, 36ec8da, d3602d6)
- Till Korten <webmaster@korten-privat.de> wrote the ebuild script (#13)
- Andreasf fixed missing libavutil include path (3a175eb, #15)
- Michal Gawlik fixed the overlapping issue (5b2ccabc55f, #29, #32)
- “bit” made sure no trailing whitespace is written to the SRT (3a59dc278abc2, #38)
- Baudouin Raoult for various fixes (028f742, #44, b722a03, #42, 7293ac2, #40)
- Justyn Butler added the y-threshold support (f873761, #43)
- James Laird-Wah added min-width/height support and fixed other issues (41c6844, #48, #46)
- Filirom1 fixed a minor issue (4ed58c2, #49)
- TODO: implement preprocessing (first step: scaling; code is available in
spudec.c)