1. About the Tesseract OCR library :
- Tesseract is probably the most accurate open-source OCR engine available. Combined with the Leptonica Image Processing Library, it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google. It is released under the Apache License 2.0.
- Website : http://code.google.com/p/tesseract-ocr/
2. Supported platforms :
- Tesseract works on Linux, Windows (with VC++ Express or Cygwin) and Mac OS X.
3. Download the tutorial source : see the RESOURCES page.
II. How to compile & use Tesseract OCR for iOS?
1. Resources :
- Tesseract source code : http://code.google.com/p/tesseract-ocr/downloads/list ( File : tesseract-x.xx.tar.gz and tesseract-ocr-x.xx.<lang>.tar.gz ).
- Leptonica source code : http://leptonica.org/download.html ( newest version 1.69 ).
2. Dependencies :
To build tesseract-ocr we need a few tools.
- The “make” command-line tool (part of the Xcode Command Line Tools).
- Autotools :
+ autoconf
+ automake
+ libtool
+ m4
+ Download : http://psrchive.sourceforge.net/third/autotools/
+ Download & install script : install_essential.sh
3. Compile :
Step 1 : Install the command-line tools
1. The “make” command-line tool.
- Download & install :
+ In Xcode, open Preferences -> Downloads tab -> Command Line Tools -> Install.
+ Or download it directly from the Apple developer site : https://developer.apple.com/downloads/index.action
- After installing, open Terminal and type : make -v to check that it works.
2. Autotools.
- Download & install :
+ Direct download : http://psrchive.sourceforge.net/third/autotools/
+ Or run the script install_essential.sh, which downloads and installs everything automatically. Open Terminal and type : cd <path_to_folder_content_sh_file>, then type ./install_essential.sh or bash install_essential.sh.
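Before moving on to Step 2, it can help to confirm that every tool from the list above is actually on your PATH. The following is only a minimal sketch: the function name missing_tools is made up for this example and is not part of install_essential.sh.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every tool from the argument
# list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Tools this tutorial relies on: make plus the autotools set.
missing=$(missing_tools make autoconf automake libtool m4)
if [ -z "$missing" ]; then
    echo "All build tools found."
else
    echo "Missing tools:"
    echo "$missing"
fi
```

If anything is reported as missing, repeat the matching install step above before running the build script.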
Step 2 : Build the Tesseract & Leptonica libraries.
1. Create a folder named tesseract-build.
2. Extract the source archive tesseract-3.0.1.tar.gz into the tesseract-build folder.
3. Extract leptonica-1.68.tar.gz into the tesseract-build folder.
4. Copy the script build_dependencies.sh into the tesseract-build folder.
5. Open build_dependencies.sh and, at line 9, change IOS_BASE_SDK to your SDK version.
6. Open Terminal, cd to the tesseract-build folder and type : bash build_dependencies.sh
7. After the script completes, go to tesseract-build/dependencies, where the build output (the .h header and lib*.a library files) is placed.
8. Finish!
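Steps 1 to 4 above can be sketched as a small shell function. This is only an illustration: the function name prepare_build_dir is hypothetical, and it assumes the two archives and build_dependencies.sh sit in the current folder under the file names used in this tutorial.

```shell
#!/bin/sh
# Hypothetical helper: create the build folder, unpack both source
# archives into it, and copy the build script next to them.
# Usage: prepare_build_dir <build_dir> <tesseract_tarball> <leptonica_tarball> <build_script>
prepare_build_dir() {
    build_dir=$1
    mkdir -p "$build_dir" || return 1
    tar -xzf "$2" -C "$build_dir" || return 1
    tar -xzf "$3" -C "$build_dir" || return 1
    cp "$4" "$build_dir/" || return 1
}

# Only run when all three inputs are present in the current folder.
if [ -f tesseract-3.0.1.tar.gz ] && [ -f leptonica-1.68.tar.gz ] && [ -f build_dependencies.sh ]; then
    prepare_build_dir tesseract-build tesseract-3.0.1.tar.gz leptonica-1.68.tar.gz build_dependencies.sh
    # Remember to set IOS_BASE_SDK in the copied script (step 5) before building.
    echo "Next: cd tesseract-build && bash build_dependencies.sh"
else
    echo "Place the two archives and build_dependencies.sh in this folder first."
fi
```

The sketch deliberately stops before running the build, because IOS_BASE_SDK still has to be edited by hand to match your SDK.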
Step 3 : Make a simple project.
1. Create a new iOS project in Xcode (or just open your existing one).
2. Add the generated ./tesseract-build/dependencies folder to your project. It contains the needed .h header and lib*.a library files.
3. Add the tessdata folder to your project. It contains the preprocessed data for a given language, so that Tesseract can recognize that language. To add it :
+ Right-click your project/group in Xcode.
+ Choose “Add files to your project”.
+ Select the “tessdata” folder that was extracted into the tesseract-build folder above.
+ In the same window, check “Create folder references for any added folders”. This is the most important step, as it instructs Xcode to add your “tessdata” folder as a regular folder (and as a resource), not as an Xcode project group.
4. Create a TessBaseAPI object to start using the library.
5. See the sample project TesseractOCRTestLib.
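Before adding the tessdata folder in step 3 above, you may want to check which languages it really contains. The sketch below assumes the language data is stored as <lang>.traineddata files (as in Tesseract 3.x) and that the folder is at tesseract-build/tessdata; both the function name list_tessdata_languages and that default path are assumptions for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the language code of every *.traineddata
# file found directly inside the given folder.
list_tessdata_languages() {
    for f in "$1"/*.traineddata; do
        # An unmatched glob stays literal, so skip anything that is not a real file.
        [ -f "$f" ] || continue
        basename "$f" .traineddata
    done
}

# Use the folder given as first argument, or the assumed default location.
list_tessdata_languages "${1:-tesseract-build/tessdata}"
```

If nothing is printed, the language archive was probably not extracted, or it was extracted somewhere else.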
Yeah!!! That's all. I hope you can now compile the Tesseract OCR library by yourself. If you have any questions, please contact me by email or leave a comment below. ^^!