After coronavirus stranded me and my girlfriend in New Zealand in February, we had to make some difficult decisions. Should we come home or apply for a working visa here in NZ? We had only been away from the UK for 2 months so we decided we weren’t ready to go home and applied for the 1-year working holiday visa which we were eventually granted in May of this year. After weeks of job searching and rejection, I was offered a job doing data entry for the Electoral Commission of New Zealand. This role involved processing thousands of handwritten forms to enter that data into a database. I have always sought an easier way of doing a job, so my journey with optical character recognition (the fancy name for handwriting recognition) began.
Optical character recognition (OCR) tools typically take in an image or document (JPEG, PNG, or even PDF) and use pattern recognition, often powered by machine learning, to magically “read” each word and newline. The tool gives each word a confidence score. If the score meets the tool’s threshold, the word is deemed accurate and is returned to the user. If it doesn’t, no word is returned.
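The thresholding idea can be sketched in a few lines of Python. This is only an illustration: the Word type, the 0.8 threshold and the sample values are all made up for the example, and real tools expose confidence in their own formats.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """One recognised word and the tool's confidence in it (0.0 to 1.0)."""
    text: str
    confidence: float


def keep_confident(words, threshold=0.8):
    """Return the text of every word whose confidence meets the threshold."""
    return [w.text for w in words if w.confidence >= threshold]


# Made-up OCR output, for illustration only.
sample = [Word("Rhys", 0.97), Word("Strcet", 0.41), Word("Auckland", 0.88)]
print(keep_confident(sample))  # ['Rhys', 'Auckland']
```

The misread word falls below the threshold and is dropped rather than returned wrong, which is why a blank field is often a better outcome than a confident-looking mistake.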
After researching and experimenting with several tools, I found Google’s Vision API the most accurate, returning about 80–90% of the words in my example text accurately (see example text left). This is most likely due to the processing power that Google has behind this tool, utilising some of the best machine learning in the world to empower its handwriting recognition. By contrast, an alternative tool that I used, SimpleOCR, returned about 10% of the words from my example. Another benefit of using Google Vision was that it had a publicly accessible API, so I would be able to interact with the tool through a simple request and response.
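For a feel of what that request and response look like, here is a minimal sketch of the JSON involved when calling the Vision REST endpoint directly. It is an assumption that you would call REST at all (Google's Python client library is another route), the image URL and the sample response are placeholders, and nothing is actually sent here.

```python
import json

# Public REST endpoint for Vision image annotation (an API key or
# other credentials are needed to actually call it).
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_url):
    """Build the JSON body asking Vision to read dense or handwritten text."""
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_url}},
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
            }
        ]
    }


def full_text(response):
    """Pull the recognised text out of a Vision response, or '' if none."""
    first = (response.get("responses") or [{}])[0]
    return first.get("fullTextAnnotation", {}).get("text", "")


# Placeholder URL and a hand-written response, for illustration only.
body = build_annotate_request("https://example.com/form.jpg")
print(json.dumps(body, indent=2))
print(full_text({"responses": [{"fullTextAnnotation": {"text": "Rhys\nAuckland"}}]}))
```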
As with the start of every project, the optimistic and naive side of me thought this would take me a week, maybe 2 tops. It would be as simple as sending an image to the API, maybe un-mangling the response, and injecting that into the fields. How hard could it be? As work began, I realised how much extra work would be required for this to become a viable and useful tool.
Python attempt no.1
Well, at this stage I thought I should probably commit something to GitHub as I didn’t want to lose this work if my laptop decided to commit seppuku. I committed and pushed my work to this repo and went to sleep (my code is probably ugly and inefficient, but it works). I awoke to a bombardment of emails from Google saying my API key had been compromised and was now being used for crypto-mining. Nice one Rhys. I had committed my API key to a public GitHub repo, where it had promptly been stolen, and now my Google API account had been revoked. After 2 weeks of appeals to get my account reinstated, I was eventually allowed back in (after removing the key from my repo, obviously).
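One common way to avoid this is to keep the key out of the code entirely and read it from an environment variable at runtime. A minimal sketch, where the variable name VISION_API_KEY is just a name I've picked for the example:

```python
import os


def load_api_key(env=os.environ, name="VISION_API_KEY"):
    """Read the API key from the environment instead of from source code.

    `name` is a hypothetical variable name chosen for this example.
    """
    key = env.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first")
    return key
```

Calling load_api_key() in the script means nothing secret ever sits in a file that git can pick up, and a missing key fails loudly at startup rather than halfway through a request.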
This had given me a chance to think about the practical use of this tool. Running a command from the terminal to start a script is not “user-friendly”, and the data entry operators who would be using it are not exactly the most tech-savvy people you’ve ever met. I made a GUI for my script using PySimpleGUI, which did the job, but this still had to be run from the command line and started its own Chrome browser.
Back to Python
Before chucking everything into a Lambda, I wanted to tidy the script up to remove junk and make it cleaner, so I managed to reduce my 200-line script down to about 85 lines. After racking my brains trying to remember how to create the deployment package for the Lambda (why is there a difference between compressing a folder and compressing all of the contents of the folder?!) I altered my script to be able to ingest a JSON request body (just the URL of the image I want to convert to text) and produce a JSON response. Upon testing the Lambda, I kept getting an error saying that one of Google’s modules couldn’t be found (even though it’s right there in the package). Investigation found that for certain projects, particularly ones that use Google Cloud Platform, you can’t just compile the deployment package from your machine, “you have to do it using a specific OS/‘setup’, the same one that AWS Lambda uses to run your code”. Great. Enter the third and hopefully final Amazon Web Service, EC2.
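The JSON-in, JSON-out shape described above might look roughly like the handler below. It is a sketch, not my actual script: it assumes API Gateway's Lambda proxy integration (where the request body arrives as a string in event["body"]), the field name image_url is made up, and extract_text is a stand-in for the real OCR call.

```python
import json


def extract_text(image_url):
    """Placeholder for the real OCR call; returns a canned string here."""
    return f"(text recognised from {image_url})"


def respond(status_code, payload):
    """Shape a response the way the Lambda proxy integration expects."""
    return {"statusCode": status_code, "body": json.dumps(payload)}


def lambda_handler(event, context):
    """Read an image URL from the JSON body and return the recognised text."""
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return respond(400, {"error": "body is not valid JSON"})

    # "image_url" is a hypothetical field name used for this example.
    image_url = payload.get("image_url") if isinstance(payload, dict) else None
    if not image_url:
        return respond(400, {"error": "image_url is required"})

    return respond(200, {"text": extract_text(image_url)})
```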
This Stack Overflow post described my situation perfectly and there is a handy guide to starting up your EC2 instance to create your deployment package. There lay some trouble in the fact that this post was written for Python 2.7, whilst my project and Lambda were configured for Python 3.8. After I installed 3.8 on the EC2 instance using this guide (you can only install up to 3.6 using the normal sudo yum commands) I also ensured the pip version was up to date. This enabled me to create a working deployment package! Yaay! But not yaay. I accidentally posted my API key to GitHub again and my account was blocked (I still don't know how I did this again as it was in the .gitignore, but oh well). Another 3 weeks went by as I appealed to have my account reinstated, but to no avail, so I used another Google account of mine to get a new API key.
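On the earlier question of why compressing a folder differs from compressing its contents: Lambda looks for the handler module at the root of the archive, and zipping the folder itself nests everything one level down. A minimal sketch of building the archive from the contents, using only the standard library (the function name and paths are my own for the example):

```python
import zipfile
from pathlib import Path


def zip_folder_contents(folder, archive_path):
    """Zip the files inside `folder` so they sit at the root of the archive.

    `archive_path` should live outside `folder`, otherwise the archive
    would try to include itself.
    """
    folder = Path(folder)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store paths relative to the folder, not including it.
                archive.write(path, path.relative_to(folder).as_posix())
    return archive_path
```

With this, a file at package/lambda_function.py is stored as lambda_function.py rather than package/lambda_function.py. It does not solve the OS mismatch described above: the dependencies inside the folder still need to have been installed on a Lambda-compatible system.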
I felt so close at this point. My deployment package was working, and testing my endpoint through Postman and API Gateway returned the response I wanted. My final hurdle was CORS. For those of you who don’t know, CORS stands for Cross-Origin Resource Sharing and is a mechanism that lets a server tell the browser which other origins (domains) are allowed to request its resources. The image below describes the mechanism of CORS for browser security.
My problem lay in the bottom right corner of the above image, as I couldn’t get the server to respond with the appropriate headers to allow the request to be made from any origin. After probably 6–7 hours of staring at this incredibly frustrating error, I managed to configure my API to respond with the correct headers and return my values. The returned values are then injected directly into the fields and the work is complete.
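For anyone hitting the same wall, this is roughly what "the correct headers" means when a Lambda sits behind API Gateway. It is a sketch assuming the Lambda proxy integration, where the function has to return the CORS headers itself; the allowed methods and headers listed are assumptions for a simple JSON POST, and with_cors is a helper name I've made up.

```python
import json

# "*" allows requests from any origin, as described above.
CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Headers": "Content-Type",
    "Access-Control-Allow-Methods": "OPTIONS,POST",
}


def with_cors(status_code, payload):
    """Build a proxy-style Lambda response that carries the CORS headers."""
    return {
        "statusCode": status_code,
        "headers": dict(CORS_HEADERS),
        "body": json.dumps(payload),
    }


print(with_cors(200, {"text": "Rhys"})["headers"]["Access-Control-Allow-Origin"])  # *
```

The browser's preflight OPTIONS request needs an answer carrying the same headers too, which is easy to miss because tools like Postman never send one.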
At 3.5 seconds from execution to completion, running my tool significantly speeds up the process of keying the forms. We can allow several more seconds to correct any errors made by the OCR, but this still falls far short of the 35 seconds it took me to manually type the data. Additionally, it is easy to set up as a backend service, or everything could be done client-side.
My tool is not 100% accurate, and will not be able to read much of the illegible handwriting that comes our way. It's smart, but it doesn't have the thousands of years of pattern-recognition training that our brains have. Through improvements in machine learning, this accuracy will increase in the future and the OCR will be able to recognise increasingly unintelligible writing.
This tool uses a third-party API to process the forms. These forms contain sensitive data from the electoral roll, and the Commission would have no say in how this information was maintained or how often it was “torn-down” from Google’s logs. Google is also offshore from New Zealand, so the government would not be able to regulate the use of this PII.
The ideal situation would be that the Electoral Commission develops its own OCR tool that all data would flow through. However, this would require many resources and would likely produce inaccurate results (until the Commission develops pre-trained machine learning models as Google can, I doubt a highly accurate tool could be built). Another alternative would be that the Commission enters into an agreement with Google on the regulation and preservation of private data to enable the use of their Vision API.
Even if this project goes on a shelf to gather dust and the dev team for the Commission does nothing with it, I thoroughly enjoyed this project. I built upon my Python knowledge, gained experience working with multiple AWS tools I had never used before, and had exposure to Google’s powerful APIs. I also learned the importance of the .gitignore file to stop me from pushing private credentials to public spaces (after 7 total weeks of being locked out of my Google Cloud Platform account, I’ve definitely learned my lesson). If you fancy checking out my project feel free to visit my repo here.