Recognize Text in Images with ML Kit on Android (Firecasts)


Looking to bring the power of machine learning to your Android app? I'll show you how to get started with ML Kit on today's Firecast. ML Kit is a mobile SDK that brings Google's machine learning expertise to Android and iOS apps in a powerful yet easy-to-use package. Whether you're new or experienced in machine learning, you can implement the functionality you need in just a few lines of code. There's no need to have deep knowledge of neural networks or model optimization to get started.

I'd like to start out this video series on ML Kit by showcasing one of the models provided by the SDK: ML Kit's text recognition API for Android. You can use this API to recognize text in any Latin-based language. Text recognition can automate tedious data entry for credit cards, receipts, business cards, and more.

I'm using a sample app from the text detection codelab for today's example, so if you'd like to follow along with the steps, you can find the codelab link below. At this point, I've already created a Firebase project, added an Android app to the project, and included the google-services.json file in my project. If you're not familiar with these steps, you can check out the getting started Firecast, the getting started guide, or follow the initial steps of the codelab.

Ready? Let's get started. The Google services plugin uses the google-services.json file to configure your application to use Firebase, and the ML Kit dependencies allow you to integrate the ML Kit SDK into your app. The following lines should already be added to the end of the build.gradle file in the app directory of your project. You'll see that there are other Firebase ML dependencies in this file; we'll talk about those in a future Firecast, but you don't need them right now. To use ML Kit's text detection, this is the dependency we need. To be sure that all dependencies are available to your app, I sync my project with Gradle files.

Now that I've imported the project into Android Studio, configured the Google services plugin with my JSON file, and added the dependencies for ML Kit, I'm ready to run the app for the first time. I start the Android Studio emulator and click Run in the Android Studio toolbar.

At this point, I see a basic layout with a dropdown field that lets me select among three images.

Before I add the code that enables text recognition, I want to explain how the SDK handles it. First, I start with a bitmap of the selected image. Then I create a FirebaseVisionImage object from that bitmap using FirebaseVisionImage.fromBitmap. A FirebaseVisionImage represents an image that can be used by both on-device and cloud API detectors. From there, I process the image using FirebaseVisionTextRecognizer, an on-device or cloud text recognizer that recognizes text in an image: I call processImage on the FirebaseVisionTextRecognizer, passing the FirebaseVisionImage as a parameter. I use an onSuccessListener to determine when detection is complete. If detection succeeds, I can access a FirebaseVisionText object, which holds the text recognized in the image; I'll get into the details of this object a little later. Finally, I pass the FirebaseVisionText from my onSuccess function to a function that processes the text and displays it on the screen.
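To make the data flow above concrete without pulling in Firebase, here is a minimal Java sketch. Every type in it (FakeBitmap, VisionImage, VisionText, TextRecognizer, FakeOnDeviceRecognizer, TextPipeline) is a hypothetical stand-in, not a Firebase ML Kit class. The fake recognizer simply returns text carried by the stand-in bitmap instead of running a model, and the call is synchronous even though the real processImage is asynchronous. The interface shows why one image wrapper can feed either an on-device or a cloud recognizer without changing the calling code.

```java
// Hypothetical stand-ins, NOT the Firebase ML Kit classes. The "bitmap" just
// carries the text printed in it, so the data flow runs without a real model.
record FakeBitmap(String printedText) {}

// Stand-in for FirebaseVisionImage: one image wrapper any recognizer can consume.
record VisionImage(FakeBitmap bitmap) {
    static VisionImage fromBitmap(FakeBitmap bitmap) {
        return new VisionImage(bitmap);
    }
}

// Stand-in for FirebaseVisionText, reduced here to the full recognized string.
record VisionText(String text) {}

// Stand-in for FirebaseVisionTextRecognizer. An on-device and a cloud
// recognizer could both implement this, so callers do not change.
interface TextRecognizer {
    VisionText processImage(VisionImage image);
}

// Fake recognizer: "reads" the text carried by the stand-in bitmap.
final class FakeOnDeviceRecognizer implements TextRecognizer {
    @Override
    public VisionText processImage(VisionImage image) {
        return new VisionText(image.bitmap().printedText());
    }
}

final class TextPipeline {
    // Bitmap -> VisionImage -> recognizer -> VisionText -> processing function.
    // Synchronous here only to keep the data flow visible.
    static String run(FakeBitmap bitmap, TextRecognizer recognizer) {
        VisionImage image = VisionImage.fromBitmap(bitmap);
        VisionText result = recognizer.processImage(image);
        return processTextRecognitionResult(result);
    }

    // Stand-in for the display step: returns the string it would show.
    static String processTextRecognitionResult(VisionText text) {
        return text.text().strip();
    }
}
```

Swapping recognizers only changes the argument; for example, a lambda such as image -> new VisionText("cloud: " + image.bitmap().printedText()) can stand in for a cloud recognizer in TextPipeline.run.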

Let's see what these steps look like in code. I create a FirebaseVisionImage from the selected image. In my case I selected one of the three images, but the process works the same for an image taken with the camera or picked from your photo library. Then I get an instance of FirebaseVisionTextRecognizer. I disable my button while the image is being processed and then call processImage on the recognizer, passing in the vision image. Since processImage is asynchronous, I add an onSuccessListener. When onSuccess is called, I enable the button again and call processTextRecognitionResult on the FirebaseVisionText produced by the recognizer. I'll also add an onFailureListener so I can handle the case where processImage fails; for now, I'll just print the error.
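The listener flow above can be sketched in plain Java, assuming a CompletableFuture as a stand-in for the Firebase Task that processImage returns; a single handle callback plays the part of both the success and the failure listener. RecognitionController and its fields are hypothetical names, and plain strings stand in for the image and the recognized text. One deliberate difference from the narration: the sketch re-enables the button on failure as well, so an error cannot leave it permanently disabled.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

// Hypothetical sketch, NOT Firebase code: CompletableFuture stands in for the
// Task returned by processImage, and handle() stands in for the listeners.
final class RecognitionController {
    // Stand-in for the button's enabled state.
    final AtomicBoolean buttonEnabled = new AtomicBoolean(true);
    volatile String lastResult;
    volatile Throwable lastError;

    // processImage is any function from an image to a future recognition result.
    CompletableFuture<Void> runTextRecognition(
            String image, Function<String, CompletableFuture<String>> processImage) {
        buttonEnabled.set(false); // disable while the image is being processed
        return processImage.apply(image).handle((text, error) -> {
            buttonEnabled.set(true); // re-enable on success AND on failure
            if (error != null) {
                lastError = error; // the video simply prints the error
                System.err.println("Text recognition failed: " + error);
            } else {
                processTextRecognitionResult(text);
            }
            return null;
        });
    }

    // Stand-in for the real processing step; it just records the text.
    void processTextRecognitionResult(String text) {
        lastResult = text;
    }
}
```

In the app itself, the Task returned by processImage takes addOnSuccessListener and addOnFailureListener instead of handle; the point of the sketch is only where the button state changes.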

When I check out processTextRecognitionResult, notice that it's rather empty right now. Let's talk about what this function does. To better understand how it works, I'll examine the FirebaseVisionText object that was passed to the success callback when I called processImage on the text recognizer.

FirebaseVisionText has two functions for getting its properties: getText, which returns a single string of all the text recognized in the image, and getTextBlocks, which returns a list of type FirebaseVisionText.TextBlock. A FirebaseVisionText.TextBlock is a block of text recognized in an image; think of blocks as paragraphs or groups of similar text. Text blocks have several functions for reading properties. Like FirebaseVisionText, they have a getText function for getting the recognized text of the block. They also have functions like getBoundingBox and getCornerPoints that indicate where the block of text is located in the image. There's also a list of the recognized languages and a float indicating the model's confidence level in the text. Finally, there's a function called getLines, which returns a list of FirebaseVisionText.Line objects.

A FirebaseVisionText.Line is a line of text recognized in an image. Noticing a pattern here? FirebaseVisionText.Line objects have properties similar to a text block's. The difference is that they have a list of type FirebaseVisionText.Element, which can be accessed using the getElements function. A FirebaseVisionText.Element is a text element recognized in an image. An element is roughly equivalent to a space-separated word in most Latin-script languages, so in many cases an element is simply a word.
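A toy model can make the block, line, and element hierarchy concrete. The Java sketch below uses hypothetical records (Block, Line, Element) rather than the FirebaseVisionText classes, and it builds the hierarchy from plain text with made-up rules: blank lines separate blocks, newlines separate lines, and whitespace separates elements. ML Kit segments text from the image itself, so these rules only imitate the idea that a block is paragraph-like and an element is roughly a space-separated word.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT FirebaseVisionText.TextBlock/Line/Element. The segmentation
// rules below are hypothetical; ML Kit derives the hierarchy from pixels.
record Element(String text) {}

record Line(String text, List<Element> elements) {}

record Block(String text, List<Line> lines) {}

final class ToyTextModel {
    // Block = paragraph (separated by a blank line), line = one row of text,
    // element = whitespace-separated word, as in most Latin-script languages.
    static List<Block> segment(String fullText) {
        List<Block> blocks = new ArrayList<>();
        if (fullText.isBlank()) {
            return blocks; // no text, no blocks
        }
        for (String paragraph : fullText.strip().split("\\n\\s*\\n")) {
            List<Line> lines = new ArrayList<>();
            for (String row : paragraph.strip().split("\\n")) {
                List<Element> elements = new ArrayList<>();
                for (String word : row.strip().split("\\s+")) {
                    if (!word.isEmpty()) {
                        elements.add(new Element(word));
                    }
                }
                lines.add(new Line(row.strip(), elements));
            }
            blocks.add(new Block(paragraph.strip(), lines));
        }
        return blocks;
    }

    // Walks blocks -> lines -> elements and counts the elements (roughly, words).
    static int countElements(List<Block> blocks) {
        int count = 0;
        for (Block block : blocks) {
            for (Line line : block.lines()) {
                count += line.elements().size();
            }
        }
        return count;
    }
}
```

For a receipt-like string such as "TOTAL 12.50\nTHANK YOU\n\nVisit again soon", this toy model yields two blocks, the first with two lines, and seven elements in total.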

In many cases you may just need the text of a photo, which you can access using the getText function of FirebaseVisionText. Other times you may want to know the specific location of the words in the image, which is when you'll need to access the lists of text blocks, lines, and elements. In processTextRecognitionResult, I want to label words over where they appear in the image, so I iterate through blocks, lines, and finally elements to find the location of each word. A text overlay may not be the use case for your app, but it's good to see how to iterate through those elements.

Let's dive into the code for this. Okay, here I am back at processTextRecognitionResult. First, I get the text blocks from the FirebaseVisionText object. If no text is found, I display a message to the user that no text was detected and return from the function. Next, I clear away any text I'd previously displayed on the screen. Then it's time to iterate through the blocks in the FirebaseVisionText, then the lines, then the elements, to get to the individual frames for individual words. For each element, I create a TextGraphic object with the text so it can be displayed over the image. I'm not going to go into the TextGraphic class right now, but if you'd like to see how it works, check out the code from the codelab, which I've linked below.
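The walkthrough of processTextRecognitionResult can be sketched as follows. It assumes hypothetical records in place of FirebaseVisionText and its nested classes, a plain Box record in place of the Rect that the real getBoundingBox returns, and a list of TextGraphic records in place of the codelab's overlay view. It follows the order described: check for blocks, clear the previous labels, then add one graphic per element.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, NOT the Firebase classes; each level keeps only what
// the overlay needs so the sketch runs off-device.
record Box(int left, int top, int right, int bottom) {}

record TextElement(String text, Box box) {}

record TextLine(List<TextElement> elements) {}

record TextBlock(List<TextLine> lines) {}

record VisionText(List<TextBlock> blocks) {}

// Stand-in for the codelab's TextGraphic: a word plus where to draw it.
record TextGraphic(String text, Box box) {}

final class OverlaySketch {
    final List<TextGraphic> overlay = new ArrayList<>(); // stand-in for the overlay view
    String message; // stand-in for the "no text" message shown to the user

    void processTextRecognitionResult(VisionText texts) {
        List<TextBlock> blocks = texts.blocks();
        if (blocks.isEmpty()) {
            message = "No text found";
            return; // nothing to draw
        }
        overlay.clear(); // remove labels from the previous image
        for (TextBlock block : blocks) {
            for (TextLine line : block.lines()) {
                for (TextElement element : line.elements()) {
                    // One graphic per element, i.e. roughly one per word.
                    overlay.add(new TextGraphic(element.text(), element.box()));
                }
            }
        }
    }
}
```

Because the empty check returns before the overlay is cleared, labels from a previous image stay on screen when the new image has no text; clearing first is a simple alternative if that matters for your app.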

Let's build and run the app so we can see some text recognition. Whoa, it works! Congrats, you've now added text recognition capabilities to an Android app. Feeling inspired to make a new project with ML Kit? I'd love to hear about it. Tell me in the comments below what you'd like to build with text recognition, and if you liked what you saw here, be sure to like, subscribe, and share so you can be the first to know about new ML Kit videos as well as the other great content on the Firebase channel. Thanks for watching, and I'll see you on a future Firecast.
