-
Notifications
You must be signed in to change notification settings - Fork 0
Compiling
Note: This wiki expects you to be familiar with compiling software on your operation system.
Autotools Leptonica
If they are not already installed, you need the following libraries (Ubuntu 16.04/14.04):
sudo apt-get install autoconf automake libtool
sudo apt-get install pkg-config
sudo apt-get install libpng12-dev
sudo apt-get install libjpeg8-dev
sudo apt-get install libtiff5-dev
sudo apt-get install zlib1g-dev
if you plan to install the training tools, you also need the following libraries:
sudo apt-get install libicu-dev
sudo apt-get install libpango1.0-dev
sudo apt-get install libcairo2-dev
You also need to install Leptonica.
One option is to install the distro's Leptonica package:
apt-get install libleptonica-dev
but if you are using an oldish version of Linux, the Leptonica version may be too old, so you will need to build from source.
Tesseract versions and the minimum version of Leptonica required:
| Tesseract | Leptonica | Ubuntu |
|---|---|---|
| 3.04 | 1.71 | Ubuntu 16.04 |
| 3.03 | 1.70 | Ubuntu 14.04 |
| 3.02 | 1.69 | Ubuntu 12.04 |
| 3.01 | 1.67 |
The sources are at http://www.leptonica.org/. The instructions at Leptonica README are clear, but basically it is as described in Compilation below.
Ensure that the development headers for Leptonica are installed before compiling Tesseract. Note that if building Leptonica from source, you may need to ensure that /usr/local/lib is in your library path. This is a standard Linux bug, and the information at Stackoverflow is very helpful.
Tesseract uses a standard autotools based build system, so the compilation process should be familiar.
./autogen.sh
./configure
make
sudo make install
sudo ldconfig
Since we have to compile leptonica in ubuntu 14.04, we should use LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make instead of make.
On some systems autotools does not create m4 directory automatically (giving the error: "configure: error: cannot find macro directory 'm4'"). In this case you must create m4 directory (mkdir m4), and then rerun the above commands starting with ./configure.
If you want the training tools (3.03), you will also need to run the following commands:
make training
sudo make training-install
Build of training tools is not available if you do not have necessary dependencies (pay attention to messages from ./configure script).
Tesseract can be configured to install anywhere, which makes it possible to install it without root access.
To install it in $HOME/local:
./autogen.sh
./configure --prefix=$HOME/local/
make install
To install it in $HOME/local using Leptonica libraries also installed in $HOME/local:
./autogen.sh
LIBLEPT_HEADERSDIR=$HOME/local/include ./configure \
--prefix=$HOME/local/ --with-extra-libraries=$HOME/local/lib
make install
- Download language data file (e.g. 'https://github.com/tesseract-ocr/tessdata/archive/master.zip' for 3.01 version)
- Decompress it
- Move it to installation of
tessdata(e.g. 'mv tessdata $TESSDATA_PREFIX' if definedTESSDATA_PREFIX)
You can also use:
export TESSDATA_PREFIX=/some/path/to/tessdata
to point to your tessdata directory (example: if your tessdata path is '/usr/local/share/tessdata' you have to use 'export TESSDATA_PREFIX='/usr/local/share/').
-
Download and install Git, CMake and put them in PATH.
-
Download the latest CPPAN (C++ Archive Network
https://cppan.org/) client fromhttps://cppan.org/client/. CPPAN is a source package distribution system. Add CPPAN client in PATH too. (VS2015 redist is required.) -
If you have a release archive, unpack it to
tesseractdir. If you're using master branch rungit clone https://github.com/tesseract-ocr/tesseract tesseract -
Run
cd tesseract cppan mkdir build && cd build cmake .. -DSTATIC=1 -
Build a solution (
tesseract.sln) in your Visual Studio version.
To use tesseract in your application see this very simple example https://github.com/tesseract-ocr/tesseract/wiki/User-App-Example.
If you have Visual Studio 2015, checkout the repository at [Leptonica 1.73 for Visual Studio 2015 which has the solution for Tesseract also] (https://github.com/peirick/VS2015_Tesseract) and click on build_tesseract.bat. After that you still need to download the language packs.
Have a look at blog How to build Tesseract 3.03 with Visual Studio 2013.
For tesseract-ocr 3.02 please follow instruction in Visual Studio 2008 Developer Notes for Tesseract-OCR.
Download these packages from the Downloads Archive on SourceForge page:
-
tesseract-3.01.tar.gz- Tesseract source -
tesseract-3.01-win_vs.zip- Visual studio (2008 & 2010) solution with necessary libraries -
tesseract-ocr-3.01.eng.tar.gz- English language file for tesseract (or download other language training file)
Unpack them to one directory (e.g. tesseract-3.01). Note that tesseract-ocr-3.01.eng.tar.gz names the root directory 'tesseract-ocr' instead of 'tesseract-3.01'.
Windows relevant files are located in vs2008 directory (e.g. 'tesseract-3.01\vs2008'). The same build process as usual applies: Open tesseract.sln with VC++Express 2008 and build all (or just Tesseract.) It should compile (in at least release mode) without having to install anything further. The dll dependencies and Leptonica are included. Output will be in tesseract-3.01\vs2008\bin (or tesseract-3.01\vs2008\bin.rd or tesseract-3.01\vs2008\bin.dbg based on configuration build).
For Mingw+Msys have a look at blog Compiling Leptonica and Tesseract-ocr with Mingw+Msys.
Download and install Msys2 as given at MSYS2 installation Also read instructions at Contributing to MSYS2
The core packages groups you need to install if you wish to build from PKGBUILDs are:
- base-devel for any building
- msys2-devel for building msys2 packages
- mingw-w64-i686-toolchain for building mingw32 packages
- mingw-w64-x86_64-toolchain for building mingw64 packages
To build the release package, use PKGBUILD from https://github.com/Alexpux/MINGW-packages/tree/master/mingw-w64-tesseract-ocr
To build from the github source, create a PKGBUILD with the following commands.
#
_realname=tesseract-ocr
pkgbase=mingw-w64-${_realname}-git
pkgname="${MINGW_PACKAGE_PREFIX}-${_realname}"
provides=("${MINGW_PACKAGE_PREFIX}-${_realname}")
replaces=("${MINGW_PACKAGE_PREFIX}-${_realname}")
pkgver=1310.60176fc
pkgrel=1
pkgdesc="Tesseract OCR (mingw-w64)"
arch=('any')
url="https://github.com/tesseract-ocr/tesseract"
license=("Apache License 2.0")
makedepends=("${MINGW_PACKAGE_PREFIX}-gcc" "${MINGW_PACKAGE_PREFIX}-pkg-config")
depends=(${MINGW_PACKAGE_PREFIX}-cairo
${MINGW_PACKAGE_PREFIX}-cairomm
${MINGW_PACKAGE_PREFIX}-fontconfig
${MINGW_PACKAGE_PREFIX}-gcc-libs
${MINGW_PACKAGE_PREFIX}-icu
icu-devel
git
${MINGW_PACKAGE_PREFIX}-leptonica
${MINGW_PACKAGE_PREFIX}-pango
${MINGW_PACKAGE_PREFIX}-pangomm
${MINGW_PACKAGE_PREFIX}-tesseract-data-eng
${MINGW_PACKAGE_PREFIX}-zlib)
options=('!libtool' '!emptydirs' '!strip' 'debug')
source=("tesseract"::"git+https://github.com/tesseract-ocr/tesseract.git#branch=master"
https://github.com/tesseract-ocr/tessdata/raw/master/osd.traineddata)
sha256sums=('SKIP'
'9cf5d576fcc47564f11265841e5ca839001e7e6f38ff7f7aacf46d15a96b00ff')
pkgver() {
cd "${srcdir}/tesseract"
printf "%s.%s" "$(git rev-list --count HEAD)" "$(git rev-parse --short HEAD)"
}
prepare() {
cd "${srcdir}/tesseract"
./autogen.sh
}
build() {
[[ -d "${srcdir}/build-${MINGW_CHOST}" ]] && rm -rf "${srcdir}/build-${MINGW_CHOST}"
mkdir "${srcdir}/build-${MINGW_CHOST}"
cd "${srcdir}/build-${MINGW_CHOST}"
local -a extra_config
if check_option "debug" "y"; then
extra_config+=( --enable-debug )
fi
"${srcdir}/tesseract"/configure \
--build=${MINGW_CHOST} \
--host=${MINGW_CHOST} \
--target=${MINGW_CHOST} \
--prefix=${MINGW_PREFIX} \
LIBLEPT_HEADERSDIR=${MINGW_PREFIX}/include \
"${extra_config[@]}"
make
}
package() {
cd "${srcdir}/build-${MINGW_CHOST}"
make DESTDIR="${pkgdir}" install
make training
make DESTDIR="${pkgdir}" training-install
mkdir -p ${pkgdir}${MINGW_PREFIX}/share/tessdata
install -Dm0644 ${srcdir}/osd.traineddata ${pkgdir}${MINGW_PREFIX}/share/tessdata/osd.traineddata
}
build and install as follows:
cd MINGW-packages/mingw-w64-tesseract-ocr
makepkg-mingw -sLf
pacman -U mingw-w64-*-tesseract-ocr-*-any.pkg.tar.xz
For Cygwin have a look at blog How to build Tesseract on Cygwin. Simon Eigeldinger has provided binaries tesseract compiled by cygwin.
On cygwin Marco Atzeri has packaged Tesseract as well as the training utilities for 3.04.00 along with some training data. Instruction for cygwin installation is here: https://cygwin.com/cygwin-ug-net/setup-net.html
Tesseract specific packages to be installed:
tesseract-ocr 3.04.01-1
tesseract-ocr-eng 3.04-1
tesseract-training-core 3.04-1
tesseract-training-eng 3.04-1
tesseract-training-util 3.04.01-1
Mingw-w64 allows building 32- or 64-bit executables for Windows. It can be used for native compilations on Windows, but also for cross compilations on Linux (which are easier and faster than native compilations). Most large Linux distributions already contain packages with the tools need for a cross build. Before building Tesseract, it is necessary to build some prerequisites.
For Debian and similar distributions (e. g. Ubuntu), the cross tools can be installed like that:
# Development environment targeting 32- and 64-bit Windows (required)
apt-get install mingw-w64
# Development tools for 32- and 64-bit Windows (optional)
apt-get install mingw-w64-tools
These prerequisites will be needed:
- libpng, libtiff, zlib (binaries for Mingw-w64 available as part of the GTK+ bundles)
- libicu
- liblcms2
- openjpeg
- leptonica
sudo port install automake autoconf
sudo port install pkgconfig
sudo port install leptonica
git clone git@github.com:tesseract-ocr/tesseract.git
cd tesseract
./autogen.sh
./configure --with-extra-libraries=/opt/local/lib
make
sudo make install
In the above training tools are not installed. You can install not only tesseract but also training tools like below.
sudo port install cairo pango
sudo port install icu +devel
git clone https://github.com/tesseract-ocr/tesseract/
cd tesseract
./autogen.sh
./configure \
--with-extra-libraries=/opt/local/lib \
--with-extra-includes=/opt/local/include \
LDFLAGS=-L/opt/local/lib \
CPPFLAGS=-I/opt/local/include
make
sudo make install
make training
sudo make training-install