Converting Documents into Editable Text Using Optical Character Recognition

By Shon Roti

After scouring all the programs on my computer, I came up empty.

I needed a tool to convert printed documents, pictures, and handwriting into editable text, but I didn’t have one. I also did not want to pay for a tool that might already exist for free online.

The ability to convert scanned documents or images into editable text, known as optical character recognition (OCR), has existed for some time, but my experience with it was limited. 

So, the search began. I was motivated by an old project that I wanted to update.

In 2001, my love of blues music and graphic design converged. I created an image of a blues artist composed entirely of text (FIGURE A). It took me nearly a year of researching books, magazines, and online information about various blues bands. I copied, pasted, and typed out the names of more than 2,000 blues bands and musicians, alphabetized them, formatted them, and created a layout in Adobe Illustrator. I positioned a scanned image of Jimmy Rogers, a member of the Muddy Waters Blues Band, beneath the text and changed each letter’s grayscale level to match the small area of the image directly below it, so that the text as a whole mimics the photo itself. A close-up portion of this art is shown in FIGURE B.
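The tint-matching idea can be illustrated with a short Python sketch. This is a hypothetical example, not the process used for the original art. It assumes the photo is a grid of grayscale values (0 for black, 255 for white) and that every letter sits in a fixed-size cell; the made-up `cell_tints` function averages the pixels under each cell and turns that average into an ink percentage for the letter on top.

```python
"""Sketch: give each letter a gray tint based on the photo area beneath it.

Assumptions (for illustration only): the image is a 2D list of 8-bit
grayscale values (0 = black, 255 = white), and the letters sit on a
regular grid of cell_w x cell_h pixel cells.
"""


def cell_tints(image, cell_w, cell_h):
    """Return a grid of tint percentages (0 = white, 100 = black), one per cell."""
    rows = len(image) // cell_h
    cols = len(image[0]) // cell_w
    tints = []
    for r in range(rows):
        row_tints = []
        for c in range(cols):
            # Average every pixel covered by this letter's cell.
            total = 0
            for y in range(r * cell_h, (r + 1) * cell_h):
                for x in range(c * cell_w, (c + 1) * cell_w):
                    total += image[y][x]
            mean = total / (cell_w * cell_h)
            # Invert so dark areas of the photo become heavy ink tints.
            row_tints.append(round(100 * (1 - mean / 255)))
        tints.append(row_tints)
    return tints


if __name__ == "__main__":
    # A tiny 2 x 4 pixel "photo": left half black, right half white.
    photo = [
        [0, 0, 255, 255],
        [0, 0, 255, 255],
    ]
    print(cell_tints(photo, cell_w=2, cell_h=2))  # [[100, 0]]
```

Each number in the result would become the gray tint for one letter; a real layout would also need to map cells to the actual letter positions in the artwork.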

Figure A

Figure B

It was a colossal investment of time in a piece of art I would never sell due to copyright issues connected to the original image. It simply exists for me to enjoy, framed and hanging on my office wall.

Twenty years later, with most of my time dedicated to a daughter, a wife, my own business, a new dog, and home improvement projects, I can’t imagine scraping together enough time to re-create this today. Yet that is exactly what I would have had to do, because I lost the original text file many years ago.

It did occur to me recently, however, that all that effort could be recycled into another art project, if only the art could be converted back into editable text.

My search started with looking for a free, open source option for OCR tools and software. I tested several. Because of the scale of the job, most of those online options failed: whether I used the vector or the raster version of the art, a file containing more than 25,000 characters was going to push any software to its limits. Many websites I tested simply stopped working or estimated that the conversion would take hours. And even if they did complete the task, I had no idea what level of accuracy to expect until the conversion finished.

At last, I stumbled on a website called “Mashtips.com.” I had never heard of it, but I was excited to see that it had a review of the “8 Best Free Online OCR Tools for Extracting Text from an Image.” The first tool I tested from the list worked amazingly fast. And after proofing the new text against the original print, I determined that it was extremely accurate. I was over-the-moon happy.

The website? Onlineocr.net. I didn’t feel the need to test the remaining seven websites, but if you would like to review them yourself, go to www.mashtips.com/online-ocr-tools/.

The OCR Test

For this demonstration, I chose a file (FIGURE C) that I had downloaded a few weeks earlier: a document about a fireplace we had recently installed in our basement. To mimic what I think is a typical scenario, I used a file found online that was, I believe, originally a printed document converted to a PDF. I downloaded that file, printed it again, scanned it at 300 dpi using my EPSON printer/scanner, and saved the scan as a JPEG. I felt these steps would make a good challenge, since the file had gone through several generations of quality loss.

Figure C

Here’s how the website (onlineocr.net) works (FIGURE D).

  1. Upload file
  2. Choose the language
  3. Choose the output format
  4. Click the CONVERT button

Figure D

The converted text appears below (FIGURE E) within a few seconds and is ready to be downloaded and used. The website even shows an online preview of the conversion beneath the output file so that you can judge its accuracy before downloading.

Figure E

After proofing the new text, I found zero errors. I did notice, however, that it ignored my handwriting at the top of the page. Although the website does not claim to recognize handwriting, I decided to test this as well. I uploaded another file, a grocery list in my handwriting (FIGURE F) that was scanned at 300 dpi in grayscale mode and saved as a JPEG. I followed the steps listed above to produce a Text Plain file. It did not do well. I then converted the grayscale image to a black-and-white bitmap and tried again. More text came back, but it was completely incorrect. Finally, I rewrote the grocery list in my best handwriting, saved the scan at 600 dpi, and tried the conversion once more. This time it responded with the words “No Recognized Text!” as if to say, “Stop trying to upload your personal scribbling!”
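Converting a grayscale scan to a black-and-white bitmap usually comes down to picking a cutoff: pixels darker than the cutoff become black and the rest become white. The Python sketch below shows that idea with a hypothetical `to_bitmap` function and an assumed cutoff of 128 on a 0 to 255 scale. The article doesn’t say which tool or setting was used for the grocery-list test, and image editors often offer other conversion methods, such as dithering.

```python
def to_bitmap(image, threshold=128):
    """Convert 8-bit grayscale rows (0 = black, 255 = white) to 1-bit rows.

    In the result, 1 means black ink and 0 means white paper. The default
    threshold of 128 is an assumed midpoint, not a setting from the article.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


if __name__ == "__main__":
    scan = [
        [250, 30, 200],
        [90, 140, 10],
    ]
    print(to_bitmap(scan))  # [[0, 1, 0], [1, 0, 1]]
```

A single fixed cutoff throws away the in-between grays, which is one reason light or uneven pencil strokes can break apart or vanish after the conversion.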

Figure F

So, OK, it didn’t recognize my handwriting. But honestly, I often have trouble reading my handwriting, too. And the website never touted that ability to begin with.

Onlineocr.net’s capabilities are solid, however. It recognizes 46 languages and handles multiple file formats, including TIFF, JPEG/JPG, PDF, PNG, BMP, PCX, and ZIP.

It converts files up to 15 MB in guest (free) mode and up to 200 MB for registered users, and it can export to Microsoft Word (docx), Microsoft Excel (xlsx), or Text Plain (txt).
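If you plan to send many files through the site, a quick check before uploading can save a failed attempt. The Python sketch below encodes the formats and size limits listed above. The `check_upload` helper is hypothetical, the limits may have changed since this article was written, and treating 1 MB as 1,048,576 bytes is an assumption.

```python
import os

# Formats and limits as listed in this article; the site may have changed them.
# Accepting both .tif/.tiff and .jpg/.jpeg spellings is an assumption.
ACCEPTED_EXTENSIONS = {
    ".tif", ".tiff", ".jpg", ".jpeg", ".pdf", ".png", ".bmp", ".pcx", ".zip",
}
MB = 1024 * 1024  # assumed binary megabyte
GUEST_LIMIT = 15 * MB
REGISTERED_LIMIT = 200 * MB


def check_upload(filename, size_bytes, registered=False):
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    limit = REGISTERED_LIMIT if registered else GUEST_LIMIT
    if size_bytes > limit:
        problems.append(f"file is {size_bytes / MB:.1f} MB; limit is {limit // MB} MB")
    return problems


if __name__ == "__main__":
    print(check_upload("fireplace.jpg", 2 * MB))  # []
    print(check_upload("artwork.pdf", 40 * MB))  # too large for guest mode
    print(check_upload("artwork.pdf", 40 * MB, registered=True))  # []
```

For a file on disk, `os.path.getsize(path)` supplies the `size_bytes` argument.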

OCR is built into many programs you may already have, including the software bundled with some scanners, Photoshop, Illustrator, Acrobat Pro, and ABBYY FineReader PDF, to name a few. But if you don’t have those programs, free help is available.

Having used this website’s OCR tool to convert my 25,000-character artwork into editable text, I’m now free to reuse my earlier design work in a new project, one that I will surely document in a future article.

Shon Roti is the owner of 9th Street Designs, a sublimation and graphic design consulting and promotional products business. A graphic designer, Shon has spent more than two decades working as a production artist and instructor in the awards and promotional products industry. In 2014, ARA named him Speaker of the Year. You can find him at www.9thsd.com or contact him at shon@sublimationconsultant.com.

Awards and Personalization Association

The Awards and Personalization Association is the organization for retailers and suppliers of personalized and customized items. By providing education, meetings, and access to a vibrant network of professionals, the Awards and Personalization Association is the one place to ensure the growth of your talent, your business, and your professional community.


© Awards and Personalization Association
Contact Us
Awards and Personalization Association
8735 W. Higgins Road, Suite 300
Chicago, IL 60631

info@awardspersonalization.org
847.375.4800
(Fax) 847.375.6480
