Scanning Credit Cards with Computer Vision on iOS | by Anupam Chugh


Leverage Vision’s Rectangle Detection and Text Recognizer to detect credit and other business cards in a live camera feed

Photo by Ales Nesetril on Unsplash

Photography has been Apple’s central focus since the inception of the iPhone. Over the years, they’ve introduced amazing new features that make the world a difficult place to live in without an iPhone. Users are able to capture better and better photos, thanks to the state-of-the-art image intelligence features Apple has steadily added.

In particular, Apple has been investing heavily in the field of computer vision, rolling out major updates every year through its Vision framework, which was introduced in 2017.

With face detection, object tracking, capture quality, and image similarity features, Apple has been making it possible for mobile developers to integrate complex computer vision algorithms to build AI-powered photo-based applications.

Among the new releases at WWDC 2019, Vision’s Text Recognition and Saliency features stood out.

But that’s not where we’re heading in this article.

  • The idea of this piece is to dig deep into Vision’s rectangle detection request.
  • We’ll be exploring the various configurations possible with the VNDetectRectanglesRequest.
  • Over the course of this article, we’ll be creating an application that scans credit cards or other business cards of similar dimensions and crops the image out of the application’s live camera feed.
  • Finally, we’ll use Vision’s Text Recognition request to parse only the desired values from the card: the values that the user picks through gestures.

Need for Rectangle Detection

If you’ve had a chance to play with iOS 13’s Document Camera for scanning documents (it’s built into the Notes and Files apps as well), you’ll find that it requires you to set the corners of the document manually.

By leveraging Vision’s Rectangle Detection, we can automatically detect the corners of a document, which is usually rectangular in shape.

Vision’s rectangle detection request is an image-based request that looks for rectangular regions in an image. Apart from specifying the confidence threshold, we can customize this request with the following properties:

  • VNAspectRatio: By specifying the minimum and maximum aspect ratios on the Vision request, we can restrict the type of rectangles we want to detect. Setting minimumAspectRatio and maximumAspectRatio to 1 would set the request to detect square shapes only.
  • Minimum Size: We can specify the minimum size of the rectangle we wish to detect. It needs to be specified between 0 and 1.0, with the default being 0.2.
  • Maximum Observations: An integer property specifying the maximum number of rectangles the Vision request can return.
  • The angle between the edges: By using the property quadratureTolerance, we can specify the amount by which the corners of the rectangle can deviate from 90 degrees.
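As a minimal sketch of how the properties above fit together, the hypothetical helper below (makeSquareRequest is not part of Vision or of this project) builds a request restricted to square shapes. The specific values for minimumConfidence and quadratureTolerance are illustrative assumptions, not recommendations.

```swift
import Vision

// Hypothetical helper: a rectangle request that only accepts square shapes.
func makeSquareRequest(
    completion: @escaping VNRequestCompletionHandler
) -> VNDetectRectanglesRequest {
    let request = VNDetectRectanglesRequest(completionHandler: completion)
    // Both ratios set to 1 restricts detection to squares.
    request.minimumAspectRatio = VNAspectRatio(1.0)
    request.maximumAspectRatio = VNAspectRatio(1.0)
    // Relative size between 0 and 1.0; 0.2 is the default mentioned above.
    request.minimumSize = 0.2
    // Return at most one rectangle per image.
    request.maximumObservations = 1
    // Allow corners to deviate up to 15 degrees from 90 (assumed value).
    request.quadratureTolerance = 15
    // Drop low-confidence detections (assumed threshold).
    request.minimumConfidence = 0.8
    return request
}
```

The completion handler receives the finished request, whose results can be cast to [VNRectangleObservation].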

In the next few sections, we’ll be creating an iOS application that uses the Vision framework with AVFoundation to scan documents from a custom camera.

We’ll do perspective correction before cropping the detected region and saving it as an image. Let’s get started!

Launch Xcode and create a new single view application. Make sure to add a description for the camera’s privacy policy by adding the key NSCameraUsageDescription to your Info.plist file.

In the following code, we’ve set up our back camera, with the media type as video, and added it to the AVCaptureSession:

private let captureSession = AVCaptureSession()

private func setCameraInput() {
    // Pick the first available back-facing camera.
    guard let device = AVCaptureDevice.DiscoverySession(
        deviceTypes: [.builtInWideAngleCamera, .builtInDualCamera, .builtInTrueDepthCamera],
        mediaType: .video,
        position: .back).devices.first else {
        fatalError("No back camera device found.")
    }
    let cameraInput = try! AVCaptureDeviceInput(device: device)
    self.captureSession.addInput(cameraInput)
}

Next up, we need to add the camera feed to the view of our ViewController.

The following code contains the functions that display the live camera feed and set up the output. The output video frames will eventually be fed to the Vision request:

private lazy var previewLayer = AVCaptureVideoPreviewLayer(session: self.captureSession)
private let videoDataOutput = AVCaptureVideoDataOutput()

private func showCameraFeed() {
    self.previewLayer.videoGravity = .resizeAspectFill
    // The preview layer must be part of the view's layer tree to be visible.
    self.view.layer.addSublayer(self.previewLayer)
    self.previewLayer.frame = self.view.frame
}

private func setCameraOutput() {
    self.videoDataOutput.videoSettings = [(kCVPixelBufferPixelFormatTypeKey as NSString) : NSNumber(value: kCVPixelFormatType_32BGRA)] as [String : Any]
    self.videoDataOutput.alwaysDiscardsLateVideoFrames = true
    self.videoDataOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "camera_frame_processing_queue"))
    // The output must be attached to the session before its connection exists.
    self.captureSession.addOutput(self.videoDataOutput)
    guard let connection = self.videoDataOutput.connection(with: .video),
          connection.isVideoOrientationSupported else { return }
    connection.videoOrientation = .portrait
}

To receive the camera frames, we need to conform to the AVCaptureVideoDataOutputSampleBufferDelegate protocol and implement the captureOutput function.

Our camera is ready! Add the three functions to the viewDidLoad method of the ViewController and simply invoke the startRunning function on the AVCaptureSession instance.

override func viewDidLoad() {
    super.viewDidLoad()
    self.setCameraInput()
    self.showCameraFeed()
    self.setCameraOutput()
    self.captureSession.startRunning()
}

Now it’s time to set up our Vision rectangle detection request. In the detectRectangle function, we set up our VNDetectRectanglesRequest and pass it to the image request handler to start processing:
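A minimal sketch of what such a function might look like, written as a free function so it stands alone. The return value, the minimumSize value, and the choice to report only the first observation are assumptions for illustration. The aspect ratios are the ones quoted in this article; as far as I can tell, Apple’s documentation defines the aspect ratio as the shorter side divided by the longer side (a value between 0 and 1, roughly 0.63 for a standard card), so treat those two numbers as something to verify.

```swift
import Vision
import CoreVideo

// Sketch of detectRectangle(in:): runs one rectangle request on a camera frame
// and returns the first detected rectangle, if any.
func detectRectangle(in image: CVPixelBuffer) throws -> VNRectangleObservation? {
    var detected: VNRectangleObservation?

    let request = VNDetectRectanglesRequest { request, _ in
        // The completion handler hands back the finished request.
        detected = (request.results as? [VNRectangleObservation])?.first
    }
    // Values quoted in the article; verify against the documented 0...1 range.
    request.minimumAspectRatio = VNAspectRatio(1.3)
    request.maximumAspectRatio = VNAspectRatio(1.7)
    request.minimumSize = 0.5          // assumed: card fills a good part of the frame
    request.maximumObservations = 1    // we only care about a single card

    // perform(_:) is synchronous, so the handler above has run when it returns.
    let handler = VNImageRequestHandler(cvPixelBuffer: image, options: [:])
    try handler.perform([request])
    return detected
}
```

In the view controller, the returned observation would be used to draw the bounding box and, when the user taps “Scan”, to run the perspective correction.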

A few things to note in the above code:

  • We’ve set the minimumAspectRatio and maximumAspectRatio to 1.3 and 1.7, respectively, since most credit and business cards fall in that range.
  • The above function is invoked in the following function:
func captureOutput(
    _ output: AVCaptureOutput,
    didOutput sampleBuffer: CMSampleBuffer,
    from connection: AVCaptureConnection) {

    // Each delivered sample buffer wraps one video frame.
    guard let frame = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        debugPrint("unable to get image from sample buffer")
        return
    }
    self.detectRectangle(in: frame)
}

  • The result returned by the Vision request in the completion handler is of type VNRectangleObservation, which consists of the boundingBox and the confidence value.
  • Using the bounding box property, we’ll draw a layer on top of the camera feed where the rectangle is detected.
  • The doPerspectiveCorrection function is used to fix the image in case it’s distorted. We’ll look closer at this shortly. This function is invoked when the user taps the “Scan” button to extract the fully-cropped card from the camera feed.

Vision’s bounding box coordinates belong to the normalized coordinate system, which has its origin at the lower-left corner of the screen.

Hence, we need to transform Vision’s bounding box CGRect into the image coordinate system, as shown in the code below:

func drawBoundingBox(rect: VNRectangleObservation) {
    // Flip the y-axis: Vision's origin is bottom-left, the layer's is top-left.
    let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -self.previewLayer.frame.height)
    // Scale the normalized (0...1) rect up to the preview layer's size.
    let scale = CGAffineTransform.identity.scaledBy(x: self.previewLayer.frame.width, y: self.previewLayer.frame.height)
    let bounds = rect.boundingBox.applying(scale).applying(transform)
    createLayer(in: bounds)
}

private func createLayer(in rect: CGRect) {
    maskLayer = CAShapeLayer()
    maskLayer.frame = rect
    maskLayer.cornerRadius = 10
    maskLayer.opacity = 0.75
    maskLayer.borderColor = UIColor.red.cgColor // assumed color; any CGColor works
    maskLayer.borderWidth = 5.0
    previewLayer.insertSublayer(maskLayer, at: 1)
}

Instead of using a CGAffineTransform to scale the bounding box into the image’s coordinate space, we can use the following built-in function available in the Vision framework (its counterpart, VNNormalizedRectForImageRect, converts in the opposite direction):

func VNImageRectForNormalizedRect(_ normalizedRect: CGRect,
                                  _ imageWidth: Int,
                                  _ imageHeight: Int) -> CGRect
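To make the coordinate conversion concrete, here is a minimal sketch in plain arithmetic: scale the normalized rectangle to the layer’s size, then flip the y-axis so the origin moves from the lower-left to the upper-left corner. The function name layerRect(forNormalizedRect:in:) is a hypothetical helper, not a Vision API.

```swift
import Foundation

// Hypothetical helper: converts a normalized, bottom-left-origin rect
// into a top-left-origin rect measured in the layer's points.
func layerRect(forNormalizedRect normalized: CGRect, in layerSize: CGSize) -> CGRect {
    let x = normalized.minX * layerSize.width
    let width = normalized.width * layerSize.width
    let height = normalized.height * layerSize.height
    // The rect's top edge (maxY in Vision space) becomes its y-origin after the flip.
    let y = (1 - normalized.maxY) * layerSize.height
    return CGRect(x: x, y: y, width: width, height: height)
}
```

For example, a normalized rectangle at (0.25, 0.25) with size 0.5 × 0.5 inside a 400 × 800 layer maps to an origin of (100, 200) and a size of 200 × 400.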

When the maskLayer is set on the detected rectangle in the camera feed, you’ll end up with something like this:

The job is only half done! Our next step involves extracting the image within the bounding box. Let’s see how to do that.

The function doPerspectiveCorrection takes the Core Image from the buffer, converts its corners from the normalized to the image space, and applies the perspective correction filter on them to give us the image. The code is given below:
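A minimal sketch of that idea follows. The function name and its parameters are hypothetical: it takes the four normalized corner points (as found on a VNRectangleObservation) rather than the observation itself, and it returns a CGImage instead of saving anything.

```swift
import CoreImage
import Vision

// Hypothetical helper: straightens the quadrilateral described by four
// normalized corners and renders the result into a CGImage.
func perspectiveCorrected(
    _ image: CIImage,
    topLeft: CGPoint, topRight: CGPoint,
    bottomLeft: CGPoint, bottomRight: CGPoint
) -> CGImage? {
    let width = Int(image.extent.width)
    let height = Int(image.extent.height)

    // Vision and Core Image both use a bottom-left origin, so scaling is enough.
    func imagePoint(_ point: CGPoint) -> CIVector {
        CIVector(cgPoint: VNImagePointForNormalizedPoint(point, width, height))
    }

    let corrected = image.applyingFilter("CIPerspectiveCorrection", parameters: [
        "inputTopLeft": imagePoint(topLeft),
        "inputTopRight": imagePoint(topRight),
        "inputBottomLeft": imagePoint(bottomLeft),
        "inputBottomRight": imagePoint(bottomRight)
    ])

    // Render to a CGImage so the pixels are actually backed by bitmap data.
    return CIContext().createCGImage(corrected, from: corrected.extent)
}
```

The returned CGImage can then be wrapped with UIImage(cgImage:) before it is saved or displayed.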

The UIImageWriteToSavedPhotosAlbum function is used to save the image in the Photos library of a user’s device.

The image doesn’t show up in your album if you directly pass the CIImage into the UIImage initializer. Hence, it’s crucial that you convert the CIImage to a CGImage first, and then pass it to the UIImage.

Let’s look at the extracted image with perspective correction applied:

It’s evident from the above illustration that applying a perspective correction filter on the Core Image fixes the orientation of the said image.

Next, let’s look at extracting only the desired text from the scanned image.

iOS 13’s Vision was bolstered with the VNRecognizeTextRequest, which returns the recognized text itself. Previously, Vision’s text detection only told us whether text was present or not, and we had to use Core ML models to parse the values from said text.

To pick out specific values, we’d then have to use regex, which not only is inefficient but also doesn’t work universally. Using a regex pattern would require supporting lots of edge cases for filtering different kinds of credit cards (for instance, not all cards have 16 digits; AMEX has 15).

Instead, we’ll allow the user to use gestures to create a movable rectangle on the image. Subsequently, we’ll parse the text from the region selected by the user. This not only makes it efficient, but it also gives the user control over the data they’re sharing.

To create a movable rectangle, we’ll track the user’s touch on the scanned image and redraw the selected region, as shown below:
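The geometry behind the movable rectangle can be sketched without any UIKit code. Both functions below are hypothetical helpers: the first builds the selection from the point where the drag began and the current touch location, and the second maps that selection to image pixels under the assumption that the image is stretched to fill the view exactly.

```swift
import Foundation

// Hypothetical helper: the rectangle spanned by the drag's start point and
// the current touch point, whichever direction the finger moved.
func selectionRect(from start: CGPoint, to current: CGPoint) -> CGRect {
    CGRect(x: min(start.x, current.x),
           y: min(start.y, current.y),
           width: abs(current.x - start.x),
           height: abs(current.y - start.y))
}

// Hypothetical helper: scales a selection in view points to image pixels,
// assuming the image fills the view with no letterboxing.
func imageRect(for selection: CGRect, viewSize: CGSize, imageSize: CGSize) -> CGRect {
    let scaleX = imageSize.width / viewSize.width
    let scaleY = imageSize.height / viewSize.height
    return CGRect(x: selection.minX * scaleX,
                  y: selection.minY * scaleY,
                  width: selection.width * scaleX,
                  height: selection.height * scaleY)
}
```

In a view controller, selectionRect would typically be recomputed on every touch-moved event, and the scaled rectangle passed to CGImage’s cropping(to:) once the user confirms the region.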

Once the user selects a rectangular region and presses the “Extract” button, we’ll crop the image from the rectangle and pass it on to the Vision request.

Instead of pasting the complete source code of TextExtractorVC.swift, I’ve just shared the relevant snippets below:
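As a rough sketch of the text recognition step, assuming the cropped region is already available as a CGImage: the function name recognizeText(in:) and the chosen settings (accurate recognition, no language correction, which tends to suit card numbers) are illustrative assumptions rather than the project’s actual code.

```swift
import Vision
import CoreGraphics

// Hypothetical helper: returns the best candidate string for each text
// region that Vision finds in the image.
func recognizeText(in image: CGImage) throws -> [String] {
    var lines: [String] = []

    let request = VNRecognizeTextRequest { request, _ in
        let observations = request.results as? [VNRecognizedTextObservation] ?? []
        // Keep only the top-ranked candidate per detected text region.
        lines = observations.compactMap { $0.topCandidates(1).first?.string }
    }
    request.recognitionLevel = .accurate
    // Language correction can "fix" digits and names that are not real words.
    request.usesLanguageCorrection = false

    // perform(_:) is synchronous, so `lines` is populated when it returns.
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
    return lines
}
```

Since the request runs synchronously, it is best called off the main thread and the resulting strings dispatched back for display.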

Here’s the output of the final application in action:

Instead of extracting and giving away the digits of my credit card number, the above illustration does the extraction on the front side of the card.

Vision’s rectangle detection doesn’t work the best on hand-drawn shapes. Here’s a glimpse at an experiment I ran on the PencilKit framework while using the above Vision request:

In this post, we worked through another classic application of computer vision, wherein we used the rectangle detection request in a live camera feed to detect a credit card and crop it out, while taking into account its orientation and distortion.

Moving forward, we implemented the text recognizer on the image we’ve just scanned to extract the credit card’s digits. By letting the user select the region of interest to be parsed, we’re not only giving them control over their information but also eliminating the information that’s not needed.

Setting a bounding box on the recognized texts and doing hit tests on them could be another way of extracting the information you want from a card!

The full source code of this project is available in this GitHub Repository.

That’s it for this one. Thanks for reading.
