Computer Vision Challenge 4: OCR
This is a challenge we’re working on in the Silicon Valley Computer Vision Meetup. This challenge is to use OCR to read a receipt. Specifically, this receipt:
We’ll be using an OCR engine called Tesseract. To get started with Tesseract:
1. Install Tesseract using the instructions. Be sure to install the appropriate language training data.
2. Download the full-size receipt image.
3. Run this command from the directory that contains the image:
tesseract IMG_2288.jpg out
4. Open the file “out.txt”. You should see (among other things) the text:
SANTA CRUZ HOTEL
Red Restaurant and Bar
Congratulations, you’ve got Tesseract up and running!
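If you'd rather script steps 3 and 4 than eyeball the file, here is a minimal Python sketch. It assumes the `tesseract` binary is on your PATH and that the image sits in the current directory as IMG_2288.jpg. The helper names (`build_command`, `missing_lines`, `run_ocr`) are made up for this example and are not part of Tesseract.

```python
import shutil
import subprocess
from pathlib import Path

# The two lines the walkthrough says should show up in the output.
EXPECTED_LINES = ("SANTA CRUZ HOTEL", "Red Restaurant and Bar")


def build_command(image, out_base):
    """Return the argument list for `tesseract <image> <out_base>`."""
    return ["tesseract", str(image), str(out_base)]


def missing_lines(text, expected=EXPECTED_LINES):
    """Return the expected lines that do not appear anywhere in the OCR text."""
    return [line for line in expected if line not in text]


def run_ocr(image="IMG_2288.jpg", out_base="out"):
    """Run Tesseract and return the text it wrote to <out_base>.txt."""
    subprocess.run(build_command(image, out_base), check=True)
    return Path(f"{out_base}.txt").read_text(encoding="utf-8", errors="replace")


if __name__ == "__main__":
    # Skip quietly when the binary or the image is not available.
    if shutil.which("tesseract") is None or not Path("IMG_2288.jpg").exists():
        print("tesseract or IMG_2288.jpg not found; skipping")
    else:
        print("missing lines:", missing_lines(run_ocr()))
```

An empty "missing lines" list means both lines were recognized exactly. The match is a plain substring test, so a single misread character will report the line as missing.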
Along with the recognizable text, you’ll see a lot of garbage characters. The next step is to tune Tesseract so that it captures all of the text on the receipt.
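One possible starting point for tuning, sketched below, is to try a few page segmentation modes and compare how noisy each result is. This is an assumption on our part, not a recipe from the challenge: the `--psm` flag is the Tesseract 4+ spelling (3.x releases used `-psm`), the modes tried (3, 4, 6) are just a guess at what might suit a receipt, and `noise_ratio` is a hypothetical heuristic that only counts tokens containing no letters or digits.

```python
import subprocess
from pathlib import Path


def noise_ratio(text):
    """Fraction of whitespace-separated tokens that contain no letter or digit."""
    tokens = text.split()
    if not tokens:
        return 1.0  # treat empty output as all noise
    noisy = sum(1 for token in tokens if not any(ch.isalnum() for ch in token))
    return noisy / len(tokens)


def psm_command(image, out_base, psm):
    """Return the argument list for one run with a given page segmentation mode."""
    # Assumes Tesseract 4 or later; on 3.x replace "--psm" with "-psm".
    return ["tesseract", str(image), str(out_base), "--psm", str(psm)]


def sweep(image="IMG_2288.jpg", modes=(3, 4, 6)):
    """Run Tesseract once per mode and return {mode: noise_ratio}."""
    scores = {}
    for psm in modes:
        out_base = f"out_psm{psm}"
        subprocess.run(psm_command(image, out_base, psm), check=True)
        text = Path(f"{out_base}.txt").read_text(encoding="utf-8", errors="replace")
        scores[psm] = noise_ratio(text)
    return scores
```

Calling `sweep()` in the directory with the image writes one output file per mode and returns a score for each. A low score only means fewer pure-symbol tokens; it does not show that all of the receipt text was captured, so you still need to read the output.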