This is another activity applying various image processing techniques, and it is divided into two major parts. First, handwritten text from a scanned image is preprocessed. Second, the occurrences of a typewritten word are found in the same scanned image.
The scanned image of a demo checklist form provided for the activity is first downloaded. It is shown below.
A portion of handwritten text written along the horizontal lines of the form is then cropped from this image. Since the text image is tilted, its Fourier transform (FT) is computed to obtain the angle of rotation: the tilt of the lines shows up as a correspondingly tilted line through the center of the FT. Alternatively, the angle can simply be found by trial and error. Here, the text image is rotated by 1 degree clockwise using the function mogrify in Scilab. The following are the cropped text image and its rotated version in grayscale.
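In SIP, mogrify forwards ImageMagick-style options such as '-rotate', and in ImageMagick a positive angle rotates clockwise. The pure-Python sketch below only illustrates what such a rotation does to the pixel grid; it is not the SIP or ImageMagick implementation. It assumes a grayscale image stored as a list of rows, rotates about the image center, keeps the original size (ImageMagick normally enlarges the canvas instead), and uses nearest-neighbour sampling. `rotate_clockwise` and its `fill` parameter are hypothetical names.

```python
import math


def rotate_clockwise(img, degrees, fill=0.0):
    """Rotate a grayscale image (list of rows) clockwise about its centre.

    Hypothetical helper: keeps the input size, samples with nearest neighbour,
    and sets pixels whose source lies outside the input to `fill`.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    out = [[fill] * w for _ in range(h)]
    for yo in range(h):
        for xo in range(w):
            dx, dy = xo - cx, yo - cy
            # Inverse mapping: undo a clockwise rotation (the y axis points down).
            xs = round(c * dx + s * dy + cx)
            ys = round(-s * dx + c * dy + cy)
            if 0 <= ys < h and 0 <= xs < w:
                out[yo][xo] = img[ys][xs]
    return out


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
print(rotate_clockwise(img, 90))  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
```

For a 1 degree rotation, as used here, inverse mapping guarantees that every output pixel gets a value, which a forward mapping of input pixels would not.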
The text image reads, from top to bottom: VGA Cable, Power Cord, Remote Control, RCA Cable, USB Cable. The main reason for the rotation is to make the enhancement easier: the horizontal lines must be removed, and once they are exactly horizontal, a filter that removes this pattern can be created easily. Next, the FT of the rotated text image is computed and serves as the template for drawing the filter, so that the frequencies of the unwanted pattern can be blocked. The FT of the rotated text image and the filter created are shown as follows.
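The FT image used as a template is a centered log-magnitude spectrum (the appendix computes log(abs(fftshift(fft2(textrot))))). The minimal pure-Python sketch below shows the same idea on a tiny synthetic image. It uses a naive O(N²) DFT in place of fft2, and log(1 + |F|) instead of log|F| so that exact zeros do not give -inf; both are assumptions of this sketch, not of the original code. For an image of horizontal stripes, only pixels on the central vertical line of the spectrum light up.

```python
import cmath
import math


def dft2(img):
    """Naive 2-D DFT: out[u][v] = sum_y sum_x img[y][x] * exp(-2*pi*i*(u*y/h + v*x/w))."""
    h, w = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
                 for y in range(h) for x in range(w))
             for v in range(w)]
            for u in range(h)]


def fftshift2(a):
    """Move the zero-frequency term from index (0, 0) to (h // 2, w // 2)."""
    h, w = len(a), len(a[0])
    return [[a[(y - h // 2) % h][(x - w // 2) % w] for x in range(w)] for y in range(h)]


def centred_log_spectrum(img):
    """log(1 + |F|) of the centred spectrum, for display."""
    return [[math.log1p(abs(z)) for z in row] for row in fftshift2(dft2(img))]


# 8 x 8 image of horizontal lines: every other row is white.
stripes = [[1.0 if y % 2 == 0 else 0.0] * 8 for y in range(8)]
spec = centred_log_spectrum(stripes)
bright = [(y, x) for y in range(8) for x in range(8) if spec[y][x] > 1e-6]
print(bright)  # [(0, 4), (4, 4)]: both on the central column x = 4
```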
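Since the horizontal lines are to be suppressed, the frequencies along the central vertical line of the FT must be blocked: a pattern of horizontal lines varies only along the vertical direction, so its energy lies on the vertical frequency axis. However, the center itself must not be blocked, because it contains the primary information of the rotated text image.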
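Such a filter can be sketched as a mask on the centered spectrum: ones everywhere, zeros along the central column, except for a small window around the center. The pure-Python sketch below (a stand-in, not the hand-drawn filterhtextrot.bmp) builds such a mask, shifts it back to the unshifted layout, multiplies it with the DFT and inverse-transforms, mirroring abs(ifft(fftshift(filter).*fft2(textrot))) in the appendix. The window half-height `keep_radius` and all function names are assumptions for illustration.

```python
import cmath
import math


def dft2(a, sign=-1):
    """Naive 2-D DFT (sign=-1) or unnormalised inverse DFT (sign=+1)."""
    h, w = len(a), len(a[0])
    return [[sum(a[y][x] * cmath.exp(sign * 2j * math.pi * (u * y / h + v * x / w))
                 for y in range(h) for x in range(w))
             for v in range(w)]
            for u in range(h)]


def ifftshift2(a):
    """Move the centre element (h // 2, w // 2) back to index (0, 0)."""
    h, w = len(a), len(a[0])
    return [[a[(y + h // 2) % h][(x + w // 2) % w] for x in range(w)] for y in range(h)]


def line_filter_mask(h, w, keep_radius=2):
    """Centred mask: block the central column, but keep rows within keep_radius of the centre."""
    mask = [[1.0] * w for _ in range(h)]
    for y in range(h):
        if abs(y - h // 2) > keep_radius:
            mask[y][w // 2] = 0.0
    return mask


def remove_horizontal_lines(img, keep_radius=2):
    h, w = len(img), len(img[0])
    F = dft2(img)
    M = ifftshift2(line_filter_mask(h, w, keep_radius))
    G = [[F[u][v] * M[u][v] for v in range(w)] for u in range(h)]
    # Inverse DFT (divide by h*w), then take the modulus as in the appendix.
    return [[abs(z) / (h * w) for z in row] for row in dft2(G, sign=+1)]


stripes = [[1.0 if y % 2 == 0 else 0.0] * 8 for y in range(8)]                # horizontal lines
columns = [[1.0 if x % 2 == 0 else 0.0 for x in range(8)] for _ in range(8)]  # vertical lines
print(round(max(max(r) for r in remove_horizontal_lines(stripes)), 6))  # 0.5
```

The horizontal stripes collapse to their mean value 0.5, while vertical stripes, whose energy lies on the horizontal frequency axis, pass through unchanged.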
Below is the filtered rotated text image.
After filtering, the text can be separated from the background. The threshold for converting the filtered rotated text image to a binary image is read off the histogram of its grayscale values in GIMP. The following are the binarized filtered rotated text image and its inversion.
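Thresholding and inversion are simple pixelwise operations. The sketch below assumes that the threshold applies to values normalized by their maximum (as in the saved image filtext/max(filtext) whose histogram was inspected in GIMP) and that pixels strictly above the threshold become 1, which is what im2bw is assumed to do here. `binarize` and `invert` are hypothetical helpers.

```python
def binarize(img, level):
    """1 where img / max(img) > level, else 0 (assumed im2bw-like rule)."""
    peak = max(max(row) for row in img)
    if peak <= 0:
        return [[0] * len(row) for row in img]
    return [[1 if v / peak > level else 0 for v in row] for row in img]


def invert(bw):
    """Swap foreground and background so the dark text becomes 1."""
    return [[1 - v for v in row] for row in bw]


filtered = [[0.10, 0.90, 0.85],
            [0.30, 0.20, 0.95]]
bw = binarize(filtered, 0.275)
print(bw)          # [[0, 1, 1], [1, 0, 1]]
print(invert(bw))  # [[1, 0, 0], [0, 1, 0]]
```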
The threshold value used for binarization is 0.275. The binarized filtered rotated text image must be inverted because the final preprocessing step uses morphological operations, for which the region of interest (the text) must be the foreground with pixel value 1 and the background must be 0. Some traces of the horizontal lines remain in the inverted binarized filtered rotated text image, so a closing operation is applied to remove them. Recall that closing is a dilation followed by an erosion with the same structuring element, that is, the erosion of the dilated image. Here it is applied to the inverted binarized filtered rotated text image with a 3 x 1 matrix of ones as the structuring element. Its effect is shown below.
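Closing with a vertical 3 x 1 structuring element can be sketched in pure Python as below; this is a stand-in for SIP's dilate and erode, not their implementation. It assumes the structuring element's origin is its middle pixel, and it treats pixels outside the image as background for dilation and as foreground for erosion. With this element, closing bridges vertical gaps of at most two pixels but never joins pixels that are only horizontally adjacent.

```python
def dilate_vertical(bw, k=3):
    """Binary dilation with a k x 1 column of ones (origin at its middle)."""
    r, h, w = k // 2, len(bw), len(bw[0])
    return [[1 if any(bw[y + d][x] for d in range(-r, r + 1) if 0 <= y + d < h) else 0
             for x in range(w)] for y in range(h)]


def erode_vertical(bw, k=3):
    """Binary erosion with the same k x 1 column; out-of-image pixels are ignored."""
    r, h, w = k // 2, len(bw), len(bw[0])
    return [[1 if all(bw[y + d][x] for d in range(-r, r + 1) if 0 <= y + d < h) else 0
             for x in range(w)] for y in range(h)]


def close_vertical(bw, k=3):
    """Closing = erosion of the dilated image (dilate first, then erode)."""
    return erode_vertical(dilate_vertical(bw, k), k)


# One-pixel-wide column: a 2-pixel gap (rows 1-2) and a 3-pixel gap (rows 4-6).
column = [[1], [0], [0], [1], [0], [0], [0], [1]]
print([row[0] for row in close_vertical(column)])  # [1, 1, 1, 1, 0, 0, 0, 1]
```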
The image above is the preprocessed handwritten text from the scanned demo checklist form. The remnants of the horizontal lines are removed by the closing operation because the 3 x 1 structuring element is tall enough to reconnect two clusters separated by a thin gap along the vertical direction. A structuring element with more rows, however, would also merge letters that lie only a few pixels apart vertically, which is undesirable when preprocessing the text.
For further analysis of the preprocessed text, the function bwlabel in Scilab is used again to label the clusters, that is, the reconstructed letters. It detects 54 clusters, whereas the text image contains only 46 letters. This result is fair, because some handwritten letters in the original text image are not distinct to begin with; some also touch adjacent letters, and the letters are not all written at the same size.
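bwlabel returns a label matrix and the number of clusters. A minimal breadth-first flood-fill version is sketched below; it is not the SIP implementation, and using 8-connectivity by default is an assumption here. With 4-connectivity, diagonally touching pixels count as separate clusters, which raises the count.

```python
from collections import deque


def label_clusters(bw, connectivity=8):
    """Label connected foreground clusters; returns (label matrix, number of clusters)."""
    h, w = len(bw), len(bw[0])
    if connectivity == 8:
        steps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    else:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    labels = [[0] * w for _ in range(h)]
    n = 0
    for y in range(h):
        for x in range(w):
            if bw[y][x] and not labels[y][x]:
                n += 1                      # start a new cluster
                labels[y][x] = n
                queue = deque([(y, x)])
                while queue:                # breadth-first flood fill
                    cy, cx = queue.popleft()
                    for dy, dx in steps:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and bw[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = n
                            queue.append((ny, nx))
    return labels, n


letters = [[1, 0, 0, 1],
           [0, 1, 0, 1],
           [0, 0, 0, 0],
           [1, 1, 0, 0]]
print(label_clusters(letters)[1])     # 3
print(label_clusters(letters, 4)[1])  # 4
```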
In the second part of this activity, the occurrences of the word DESCRIPTION in the same scanned image of the demo checklist form are located by correlation.
This time the whole image is needed, since all occurrences of the word in the scanned image must be located. The image is rotated so that a sample image of the word, to be used in the correlation, can be extracted easily. The original scanned image and its rotation in grayscale are as follows.
As with the text image in the first part, the whole image is rotated by 1 degree clockwise using the function mogrify in Scilab. The rotated image is then binarized using a threshold read off the histogram of its grayscale values in GIMP. This is illustrated below.
The threshold value employed in binarizing the rotated image is 0.49.
Since the region of interest in a binary image is customarily white and the background black, the binarized rotated image is inverted. Shown below are the inverted binarized rotated image and the sample image of the word DESCRIPTION extracted from it.
The sample image of the word DESCRIPTION extracted from the inverted binarized rotated image must be placed on a black background, and it must have the same size as the inverted binarized rotated image, because the correlation is computed from the element-by-element product of their FTs.
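Placing the cropped word on an all-zero canvas of the full image size can be sketched as follows. `embed` and its arguments are hypothetical, and keeping the crop at its original position is an assumption: for circular correlation the position only changes where the bright spots land, not how many there are.

```python
def embed(crop, height, width, top, left):
    """Place `crop` on a black (all-zero) canvas of height x width at (top, left)."""
    if top < 0 or left < 0 or top + len(crop) > height or left + len(crop[0]) > width:
        raise ValueError("crop does not fit inside the canvas")
    canvas = [[0] * width for _ in range(height)]
    for dy, row in enumerate(crop):
        canvas[top + dy][left:left + len(row)] = row
    return canvas


sample = embed([[1, 1], [1, 0]], 3, 5, 1, 2)
print(sample)  # [[0, 0, 0, 0, 0], [0, 0, 1, 1, 0], [0, 0, 1, 0, 0]]
```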
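Recall that the correlation is computed in the frequency domain: the FT of the sample image is multiplied element by element with the complex conjugate of the FT of the inverted binarized rotated image. In the code in the appendix, the forward FT (fft2) is then applied to this product a second time instead of an inverse FT, the result is shifted with fftshift, and its modulus is displayed as the correlated image. Using the forward FT in place of the inverse FT only mirrors the result and scales it by the number of pixels; combined with the order of conjugation, the modulus obtained this way equals, up to that constant factor, the modulus of the standard correlation, namely the inverse FT of the conjugated FT of the sample times the FT of the image. The following is the result of correlating the sample image of the word DESCRIPTION with the inverted binarized rotated image.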
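The claim that the double forward transform gives the same modulus as the standard correlation, up to the factor h·w, can be checked numerically on a tiny example. The sketch below uses a naive pure-Python DFT in place of fft2 and omits fftshift, which only rolls the result so that zero shift sits at the center; the synthetic "word" and all function names are hypothetical.

```python
import cmath
import math


def dft2(a, sign=-1):
    """Naive 2-D DFT (sign=-1) or unnormalised inverse DFT (sign=+1)."""
    h, w = len(a), len(a[0])
    return [[sum(a[y][x] * cmath.exp(sign * 2j * math.pi * (u * y / h + v * x / w))
                 for y in range(h) for x in range(w))
             for v in range(w)]
            for u in range(h)]


def correlate(template, image):
    """Standard circular correlation |IDFT(conj(T) * F)|; peak at the shift from template to match."""
    h, w = len(image), len(image[0])
    T, F = dft2(template), dft2(image)
    P = [[T[u][v].conjugate() * F[u][v] for v in range(w)] for u in range(h)]
    return [[abs(z) / (h * w) for z in row] for row in dft2(P, sign=+1)]


def correlate_as_in_appendix(template, image):
    """|DFT(T * conj(F))|: forward transform applied twice, no inverse, no fftshift."""
    h, w = len(image), len(image[0])
    T, F = dft2(template), dft2(image)
    P = [[T[u][v] * F[u][v].conjugate() for v in range(w)] for u in range(h)]
    return [[abs(z) for z in row] for row in dft2(P)]


# A 3-pixel "word" at the top-left of the template and twice in the image.
word = {(0, 0), (1, 0), (1, 1)}
template = [[1 if (y, x) in word else 0 for x in range(6)] for y in range(6)]
hits = {(1 + dy, 1 + dx) for dy, dx in word} | {(3 + dy, 3 + dx) for dy, dx in word}
image = [[1 if (y, x) in hits else 0 for x in range(6)] for y in range(6)]

c = correlate(template, image)
print([(y, x) for y in range(6) for x in range(6) if c[y][x] > 2.5])  # [(1, 1), (3, 3)]
```

Each occurrence of the word produces a peak equal to its pixel count (3) at the offset between the template and that occurrence.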
The correlated image above shows three occurrences of the word DESCRIPTION in the inverted binarized rotated image, which agrees with the scanned demo checklist form. The occurrences are marked by bright spots in the correlated image: one at the upper left and two below it, on the same horizontal line.
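Counting the bright spots can also be automated. One simple option, assumed here rather than taken from the original, is to keep pixels that reach a fraction `frac` of the global maximum and are the largest value within a square window of side 2·radius + 1. Both parameters would need tuning on the real correlated image, and a flat-topped peak can be reported more than once.

```python
def find_peaks(cor, frac=0.9, radius=2):
    """Return (row, col) of local maxima that reach frac * global maximum."""
    h, w = len(cor), len(cor[0])
    top = max(max(row) for row in cor)
    peaks = []
    for y in range(h):
        for x in range(w):
            v = cor[y][x]
            if v < frac * top:
                continue  # too dim to be an occurrence
            window = [cor[yy][xx]
                      for yy in range(max(0, y - radius), min(h, y + radius + 1))
                      for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            if v == max(window):
                peaks.append((y, x))
    return peaks


cor = [[0.0] * 9 for _ in range(7)]
cor[1][1], cor[5][2], cor[5][7], cor[3][4] = 10.0, 9.5, 9.7, 4.0
print(find_peaks(cor))  # [(1, 1), (5, 2), (5, 7)]
```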
Although the preprocessing of the handwritten text in the first part is not perfect, I applied various image processing techniques successfully in it, and the correlation of the word DESCRIPTION with the scanned image in the second part also worked; hence, I grade myself 10/10 in this activity.
This activity is successfully completed through my collaboration with Gary and Ed.
Appendix
The following is the source code for this activity.
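stacksize(4e7);  // enlarge Scilab's stack for the large image arrays
// Preprocessing Handwritten Text
text = gray_imread('htext.bmp');  // cropped handwritten text, grayscale
//scf(0);
//imshow(text);
textrot = mogrify(text, ['-rotate', '1']);  // rotate 1 degree clockwise
//scf(1);
//imshow(textrot);
//imwrite(textrot, 'htextrot.bmp');
// Log-magnitude of the centred FT, used as the template for drawing the filter
Ftext = log(abs(fftshift(fft2(textrot))));
//scf(2);
//imshow(Ftext, []);
//imwrite(Ftext/max(Ftext), 'Fhtextrot.bmp');
// Mask drawn on the centred FT: blocks the central vertical line except the centre
// (renamed from "filter" to avoid shadowing Scilab's built-in filter function)
linefilter = gray_imread('filterhtextrot.bmp');
// Shift the mask to the unshifted FT layout, apply it, and return to the spatial domain
filtext = abs(ifft(fftshift(linefilter).*fft2(textrot)));
//invfiltext = max(filtext) - filtext;
//scf(3);
//imshow(filtext, []);
//imwrite(filtext/max(filtext), 'filteredhtextrot.bmp');
// Threshold read from the GIMP histogram of the normalized image saved above
bintext = im2bw(filtext/max(filtext), 0.275);
//imwrite(bintext, 'bintext.bmp');
invbintext = 1 - bintext;  // text becomes the foreground (1), background 0
//scf(4);
//imshow(invbintext);
//imwrite(invbintext, 'invbintext.bmp');
// Closing: dilation followed by erosion with a vertical 3 x 1 structuring element
SE = ones(3, 1);
morphtext = dilate(invbintext, SE);
morphtext = erode(morphtext, SE);
//scf(5);
//imshow(morphtext);
//imwrite(morphtext, 'morphtext.bmp');
[L, n] = bwlabel(morphtext);  // L: label matrix, n: number of clusters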
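// Typewritten Text Correlation
checklist = gray_imread('Untitled_0001.jpg');  // whole scanned checklist form
checklistrot = mogrify(checklist, ['-rotate', '1']);  // same 1 degree clockwise rotation
//scf(6);
//imshow(checklistrot);
//imwrite(checklistrot, 'checklistrot.bmp');
binchecklist = im2bw(checklistrot, 0.49);  // threshold from the GIMP histogram
//scf(7);
//imshow(binchecklist);
//imwrite(binchecklist, 'binchecklist.bmp');
invbinchecklist = 1 - binchecklist;  // text as white foreground
//scf(8);
//imshow(invbinchecklist);
//imwrite(invbinchecklist, 'invbinchecklist.bmp');
// Sample of the word DESCRIPTION on a black canvas the same size as invbinchecklist;
// read as grayscale so that it is a 2-D matrix like invbinchecklist
description = gray_imread('description.bmp');
// A second forward fft2 (instead of an inverse FT) gives the same modulus as the
// standard correlation up to a constant scale factor
cor = abs(fftshift(fft2(fft2(description).*(conj(fft2(invbinchecklist))))));
//scf(9);
//imshow(cor, []);
//imwrite(cor/max(cor), 'correlated text.bmp');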