Mar 15, 2008

OCR Base Engine Part 2 - CCD/Blob Libraries Review

Connected Component Detection and Analysis Libraries


As a preliminary step towards building the OCR engine, we will look at a number of libraries that implement connected component detection (CCD) algorithms.


Fast CCs on Image - http://people.csail.mit.edu/rahimi/connected/

OpenCV Based Quasi CCs - http://www.marcad.com/cs584/proj3/default.htm

OpenCV.Net - http://code.google.com/p/opencvdotnet/

Vigra - http://kogs-www.informatik.uni-hamburg.de/~koethe/vigra/

CCLib - http://mathieu.delalandre.free.fr/projects/CCLib.html

EPIDaure - http://www-sop.inria.fr/epidaure/software/basics/connexe.php

Blob Extraction Library (CC Labeling) - http://opencvlibrary.sourceforge.net/cvBlobsLib

General CC implementations - http://www2.toki.or.id/book/AlgDesignManual/WEBSITE/FILES/DFS_BFS4.HTM

BOOST based CC graph - http://docs.huihoo.com/boost/1-33-1/libs/graph/doc/connected_components.html

QGar CC - http://www.qgar.org/doc/QgarLib/html/group__GRAPHPROC__CC.html

OpenCV - http://www.site.uottawa.ca/~laganier/tutorial/opencv+directshow/cvision.htm
http://www.cs.iit.edu/~agam/cs512/lect-notes/opencv-intro/opencv-intro.html
www710.univ-lyon1.fr/~bouakaz/OpenCV-0.9.5/docs/ref/OpenCVRef_ImageProcessing.htm
http://homepages.inf.ed.ac.uk/rbf/CVonline/unfolded.htm

Other General Libraries
RestoreInPaint - http://restoreinpaint.sourceforge.net/
GIL (Adobe) - http://opensource.adobe.com/wiki/display/gil/Generic+Image+Library
Many Libraries - http://www.oonumerics.org/oon/

Mar 13, 2008

OCR Base Engine Part 1 - A Preliminary Analysis

The Fourier Transform is a way of decomposing a given input image into its sine and cosine components. The decomposed, or rather transformed, image is said to be in what is called the Fourier or frequency domain, while the input image is said to be in the spatial domain.

The frequency domain image is nothing but a transformation of the original image in which each point represents a particular frequency contained in the spatial domain (original) image. Various operations are then applied to the transformed image to get the desired output. This is the basis, or starting point, of many image processing applications.
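To make the decomposition concrete, here is a minimal sketch of a naive one-dimensional discrete Fourier transform in C++. It is illustration only: real systems use an FFT, the 2D image case applies the same transform along rows and then columns, and the function and variable names here are my own, not from any particular library.

#include <cmath>
#include <complex>
#include <cstddef>
#include <cstdio>
#include <vector>

const double kPi = 3.14159265358979323846;

// Naive O(N^2) discrete Fourier transform of a real-valued signal.
// Each output bin k holds the amount of frequency k present in the
// input, expressed as a complex (cosine + i*sine) coefficient.
std::vector<std::complex<double> > naiveDft(const std::vector<double>& signal)
{
    const std::size_t n = signal.size();
    std::vector<std::complex<double> > spectrum(n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t t = 0; t < n; ++t)
        {
            double angle = -2.0 * kPi * k * t / n;
            spectrum[k] += signal[t] *
                std::complex<double>(std::cos(angle), std::sin(angle));
        }
    return spectrum;
}

int main()
{
    // A toy "row of pixels": a pure cosine with 3 cycles across 16 samples.
    std::vector<double> row(16);
    for (std::size_t t = 0; t < row.size(); ++t)
        row[t] = std::cos(2.0 * kPi * 3.0 * t / row.size());

    std::vector<std::complex<double> > spectrum = naiveDft(row);
    for (std::size_t k = 0; k < spectrum.size(); ++k)
        std::printf("bin %2zu magnitude %.3f\n", k, std::abs(spectrum[k]));
    // The magnitude peaks at bins 3 and 13 (= 16 - 3), the two mirrored
    // frequency components of the cosine.
    return 0;
}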

The Fourier Transform is used in a wide range of applications, such as image analysis, image filtering, image reconstruction and image compression. Fortunately, today we have really good implementations of most advanced image processing algorithms, so we mainly need to know how to build a system using them; and even just to use them, we need to know the basic theory, at least a few terms.

Convolution is one of the many kinds of "mathematical operations" that can be applied in the Fourier domain; it represents the amount of overlap between, say, an original image and a reversed, shifted copy of another.
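To make the operation concrete, here is a minimal sketch of a direct spatial-domain convolution in C++, which is equivalent to pointwise multiplication in the Fourier domain. The function name, the nested-vector image type and the zero-padding behaviour are all my own assumptions, not taken from any particular library.

#include <vector>

// Direct 2D convolution of a grayscale image with a (2r+1)x(2r+1) kernel.
// Pixels outside the image are treated as 0 (zero padding).
std::vector<std::vector<double> > convolve(
    const std::vector<std::vector<double> >& image,
    const std::vector<std::vector<double> >& kernel)
{
    int h = static_cast<int>(image.size());
    int w = static_cast<int>(image[0].size());
    int r = static_cast<int>(kernel.size()) / 2;
    std::vector<std::vector<double> > out(h, std::vector<double>(w, 0.0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            for (int ky = -r; ky <= r; ++ky)
                for (int kx = -r; kx <= r; ++kx)
                {
                    int sy = y - ky, sx = x - kx; // flipped kernel: true convolution
                    if (sy >= 0 && sy < h && sx >= 0 && sx < w)
                        out[y][x] += image[sy][sx] * kernel[ky + r][kx + r];
                }
    return out;
}

// Example: a 3x3 box kernel (all entries 1/9) smooths the image, while an
// edge kernel such as {{0,-1,0},{-1,4,-1},{0,-1,0}} enhances edges instead.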

Enough said for the required basic theory; I will introduce more as and when needed. Let us move on and think about how we can recognize characters in a basic sense. When one thinks of recognizing characters, the first steps that come to mind are:

1) Get an image
2) Read that image, divide it into a series of individual images, and store them in an array. Each image here holds one letter.

Once you have that series of individual images, you can compare each one to prebuilt templates of a shape or a single character, associate a tag with it, say TELUGU_LETTER_NO_20, and then do the following for the final Unicode generation whenever a certain tag occurs (a template-comparison sketch appears after the loop below):


// For each tag read from the XML file, emit the matching Unicode character
// (tagsFromXml, unicodeTable and output are illustrative names).
for (const std::string& tag : tagsFromXml)
{
    wchar_t ch = unicodeTable[tag];  // look up the predefined tag -> Unicode map
    output << ch;                    // write the character to the text/Word output
}
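For the comparison step itself, here is a minimal sketch that scores a candidate glyph against prebuilt templates by counting agreeing pixels. It assumes both images are already binarized and scaled to the same fixed size; GlyphTemplate, matchScore and classify are hypothetical names, and a real engine would use a more robust distance or a learned classifier.

#include <cstddef>
#include <string>
#include <vector>

// One prebuilt template: a fixed-size binary glyph plus its tag.
struct GlyphTemplate
{
    std::string tag;                       // e.g. "TELUGU_LETTER_NO_20"
    std::vector<std::vector<int> > pixels; // 0 = background, 1 = ink
};

// Fraction of pixels that agree between a candidate and a template
// of the same dimensions.
double matchScore(const std::vector<std::vector<int> >& candidate,
                  const std::vector<std::vector<int> >& tmpl)
{
    std::size_t same = 0, total = 0;
    for (std::size_t y = 0; y < candidate.size(); ++y)
        for (std::size_t x = 0; x < candidate[y].size(); ++x, ++total)
            if (candidate[y][x] == tmpl[y][x])
                ++same;
    return total ? static_cast<double>(same) / total : 0.0;
}

// Return the tag of the best matching template for a candidate glyph.
std::string classify(const std::vector<std::vector<int> >& candidate,
                     const std::vector<GlyphTemplate>& templates)
{
    std::string bestTag;
    double bestScore = -1.0;
    for (const GlyphTemplate& t : templates)
    {
        double s = matchScore(candidate, t.pixels);
        if (s > bestScore) { bestScore = s; bestTag = t.tag; }
    }
    return bestTag;
}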

Initially, you need to ask yourself a fundamental question: are all the letters continuous, or is there spacing between them?

For example, consider the Telugu language [http://en.wikipedia.org/wiki/Proto-Canaanite_alphabet]: clearly there is some sort of spacing between individual letters in Telugu, and there is word-to-word spacing as well. Character segmentation, or symbol segmentation, therefore becomes relatively simpler. The following is the Telugu script's line of descent, and this technique can be applied to all scripts belonging to the Proto-Canaanite family.

Proto-Canaanite alphabet
Phoenician
Aramaic
Brāhmī
Kadamba
Telugu

Now assume you actually have an image; there is bound to be noise in the form of smudges and the like. The first image is the scanned image. The next image is the same image with an edge enhance filter plus a difference filter applied (each value = 100).

Filtering here is the same as convolution; it is a "mathematical operation". In fact, convolution is one kind of filter. Notice that the filtered image looks like an X-ray, which makes it very easy for us to recognize text, since we do not have to worry about noise to a very large extent.

The third image has a Gaussian filter (Value=0) applied to it. Notice the smoothing effect after you apply this filter.

Once you have this kind of "overlaid" image, you can use AI or any path-finding algorithm to divide the whole image into a series of horizontal lines (each of which will consist of words), and then apply the same idea again to get the vertical division into individual characters.
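As a concrete non-AI baseline for the horizontal division, here is a minimal sketch of the classic projection profile approach: count ink pixels per row and treat runs of empty rows as gaps between text lines. The binary-image representation and the function name are my own assumptions.

#include <cstddef>
#include <utility>
#include <vector>

// Split a binary image (1 = ink) into text lines by scanning its
// horizontal projection profile for runs of non-empty rows.
// Returns (top, bottom) row pairs, one per detected text line.
std::vector<std::pair<std::size_t, std::size_t> >
findTextLines(const std::vector<std::vector<int> >& image)
{
    std::vector<std::pair<std::size_t, std::size_t> > lines;
    bool inLine = false;
    std::size_t top = 0;
    for (std::size_t y = 0; y < image.size(); ++y)
    {
        std::size_t ink = 0;
        for (int pixel : image[y])
            ink += (pixel != 0);
        if (ink > 0 && !inLine)          // first non-empty row: a line starts
        {
            inLine = true;
            top = y;
        }
        else if (ink == 0 && inLine)     // first empty row after a line: it ends
        {
            inLine = false;
            lines.push_back(std::make_pair(top, y - 1));
        }
    }
    if (inLine)                          // line running to the bottom edge
        lines.push_back(std::make_pair(top, image.size() - 1));
    return lines;
}

// The same idea applied column-wise inside each detected line (a vertical
// projection) yields the individual character divisions.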

Many techniques can be used in this regard, such as thinning for character detection, also called skeletonization in certain cases. Some techniques may require converting color images to grayscale, or to so-called binary images, before applying various operations or character segmentation.
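For the grayscale-to-binary conversion just mentioned, the simplest possible sketch is a fixed global threshold; the value 128 is an arbitrary assumption, and a real engine would more likely use an adaptive method such as Otsu's.

#include <cstddef>
#include <vector>

// Convert an 8-bit grayscale image to a binary image: pixels darker
// than the threshold become ink (1), the rest background (0).
std::vector<std::vector<int> > binarize(
    const std::vector<std::vector<unsigned char> >& gray,
    unsigned char threshold = 128)
{
    std::vector<std::vector<int> > binary(gray.size());
    for (std::size_t y = 0; y < gray.size(); ++y)
    {
        binary[y].resize(gray[y].size());
        for (std::size_t x = 0; x < gray[y].size(); ++x)
            binary[y][x] = (gray[y][x] < threshold) ? 1 : 0;
    }
    return binary;
}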

Finally, various techniques based on neural networks and AI can be used to automatically train the base engine so that more robust rules can be established. The training could be interactive, using a GUI that lets the user flag detection errors and re-assign them to the corrected characters.

Finding blobs is another related image processing technique used to find the individual sub-images. A similar technique is called connected component detection or analysis. As the name suggests, CCD algorithms can detect individual symbols.
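To show what a CCD algorithm actually computes, here is a minimal sketch of 4-connected component labeling using a breadth-first flood fill over a binary image; every blob of touching ink pixels receives its own label, which for a script like Telugu corresponds roughly to one symbol. The names are mine and not taken from any of the libraries reviewed in Part 2.

#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Label each 4-connected blob of ink pixels (1s) in a binary image.
// Returns a label image: 0 = background, 1..n = component id.
std::vector<std::vector<int> > labelComponents(
    const std::vector<std::vector<int> >& binary)
{
    const int h = static_cast<int>(binary.size());
    const int w = h ? static_cast<int>(binary[0].size()) : 0;
    std::vector<std::vector<int> > labels(h, std::vector<int>(w, 0));
    const int dy[4] = { -1, 1, 0, 0 };
    const int dx[4] = { 0, 0, -1, 1 };
    int next = 0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
        {
            if (binary[y][x] == 0 || labels[y][x] != 0)
                continue;                       // background or already labeled
            labels[y][x] = ++next;              // start a new component
            std::queue<std::pair<int, int> > frontier;
            frontier.push(std::make_pair(y, x));
            while (!frontier.empty())           // flood fill the whole blob
            {
                std::pair<int, int> p = frontier.front();
                frontier.pop();
                for (int d = 0; d < 4; ++d)
                {
                    int ny = p.first + dy[d], nx = p.second + dx[d];
                    if (ny >= 0 && ny < h && nx >= 0 && nx < w &&
                        binary[ny][nx] == 1 && labels[ny][nx] == 0)
                    {
                        labels[ny][nx] = next;
                        frontier.push(std::make_pair(ny, nx));
                    }
                }
            }
        }
    return labels;
}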

Typical free OCR SDKs/software packages include

A) GRAMPS

B) Tesseract - may be a good solution for English text, but it currently only recognizes US-ASCII characters

C) GOCR/JOCR - used by XSane and Kooka. It can generate a custom character database from a picture via the command line

D) Ocrad - you need to use the PGM file format

E) Kooka - people using KDE will probably know Kooka, the standard KDE scanning tool with built-in OCR (using GOCR or Ocrad, which are open source, or the commercial KADMOS)

F) ClaraOCR - seems to be able to learn. It natively opens PNM file formats

G) Conjecture - a third-party OCR tool that incorporates the code bases of both of the open source programs above

In addition, various commercial-level projects exist, such as FineReader, ReadIris, OmniPage, Adlib, Leadtools and Pegasus.

Mar 12, 2008

About FanoPlane and areas of development

FanoPlane - the Fano plane, named after Gino Fano, is the projective plane with the least number of points and lines: 7 each

1) Development of OCR (Base Engine + Applicability) for Indian Languages
a) Evaluation of various related software packages/SDKs
b) Evaluation of new image processing algorithms and application of the same to OCR base
c) Evaluation of CCDs (Connected Component Detection) Algorithms
d) Evaluation of Blobs and Image Segmentation algorithms

2) Game Programming
a) Mobile OS Based Game Programming
b) Focus on Inverse Kinematics, custom 3D file loaders and 3D animation of models
c) Focus on Advanced Game Physics and game physics engine development

3) Focus on Internet/CHM/HTML tools