Fast Image Processing in Android With Halide | by Minhaz | May, 2022

I’ve written about how Halide permits us to jot down each quick and maintainable code. This one will present its energy with Android purposes.

Picture by Dan Smedley on Unsplash.

Halide is an open-source area particular language designed to make it simpler to jot down and preserve high-performance picture processing or array processing code on trendy machines.

I’ve been writing a sequence on Halide and this text is third one within the sequence. Within the final two articles, I talked about

On this article I can be writing about how you can use Halide with Android and how much efficiency increase it will possibly supply us.

To point out the efficiency advantages I’m going to reuse the issue assertion of YUV to RGB colour format conversion. I’ve written a few articles up to now exhibiting alternative ways to do picture processing in Android with this instance utilizing this instance.

Disclaimer: Any opinion known as out on this article are my very own and don’t mirror opinion or stance of the organizations I work with.

I work on options like HDR Mode and Night time Mode in Camera from Google. It’s a tough downside assertion — we have to course of excessive decision pictures with complicated algorithms on useful resource constrained gadgets. That is the place Halide actually shines!

If you’re purely considering studying how you can use Halide with Android, you possibly can skip this sub-section.

Picture by Kir Simakov on Unsplash.

The issue assertion is to transform an 8MP (8 Mega Pixels = ~ 8,000,000 pixels) picture in a sure format known as YUV_420_888 which has one planar Y channel and two semi-planar subsampled UV channels to ARGB_8888 format which is usually supported with Bitmap in Android. You may learn extra about YUV format on Wikipedia. Additionally, the articles beneath have a greater description of the issue assertion.

The explanation I selected this as the issue assertion, is as a result of YUV_420_888 is likely one of the most typical OUTPUT codecs supported from Android Digital camera APIs and pictures are generally consumed as Bitmap in Android — thus making this a reasonably frequent downside assertion to handle.

I’ve been experimenting with the efficiency of various frameworks/applied sciences to know how they carry out with picture processing algorithms on Android. I’ve used the identical downside assertion to match them with one another. Listed here are some examples of the identical utilizing different methods I’ve examined

I’ve recorded following numbers for the issue assertion with completely different options listed above by working on a Pixel 4A machine

To this point, the RenderScript primarily based strategy was noticed to be the quickest. Nevertheless, RenderScript has been deprecated beginning Android 12. You may learn extra about it here. The event staff has shared some alternate options that are anticipated to be way more performant on new {hardware}.

Within the following part, I’ll share the Halide-based answer for this downside after which have a look at the benchmark consequence utilizing this strategy.

As talked about within the previous article Halide permits us to separate the algorithm from the schedule. So first let’s have a look at the algorithm for YUV to RGB conversion.

On this case we will assume that the enter picture format is YUV_420_888. Some key facets of this picture format are

Luma channel (Y channel) is full-resolution and planar.

  • It means Y-plane is assured to not be interleaved with the U/V airplane.

Chroma channel (UV channel) is subsampled and will be interleaved.

  • By subsampled, it means there’s one UV pixel for 4 Y pixels.
  • By interleaved it means the UV knowledge will be packed in UVUVUVUVUVUV sample within the reminiscence for every row of the picture.

Within the examples thus far, the ARGB outputs have been has every channel (R or G or B or A = alpha) with uint8 knowledge saved in single int32 worth. We’d proceed to do the identical.

The Halide generator could look one thing like this

Halide generator for YUV to RGB colour conversion

In Halide if we don’t write any specific schedule, the whole lot is computed inline. Writing schedule typically contain experience with Halide, the goal {hardware} and a few degree of hit and trial. IMO, step one ought to all the time be to jot down benchmarks and run it on course {hardware}. You should utilize the open supply benchmarking framework at google/benchmark for this goal.

Let’s begin with the very first schedule

Default schedule

Default schedule for YUV to RGB Halide

This may do all of the compute inline i.e. inside the 2 for-loops for every pixel.

Operating it on Pixel 4A, I get following outcomes for the above talked about generator.

This quantity is just like what we bought with normal native code. However I’ve larger expectations with the numbers we will squeeze out with Halide.

Break up, parallelize and vectorize

To be taught extra about these primitives you possibly can the article 2 on this sequence — Understanding the General Concepts of Halide Programming Language.

Optimized schedule for YUV to RGB Halide

This schedule splits the loop into additional components, vectorize the directions in xi loop and parallelize y loop. Let’s have a look at the benchmark outcomes

Increase! Do you see the loopy speedup over the default schedule?

There could also be additional optimizations attainable like lowering the variety of time uv_centered is computed or attempting completely different break up issue however it appears good thus far.

The above said generator will generate a C++ technique like this

Generated technique for YUV to RGB Halide generator

Which can be utilized instantly from C++ library or JNI code.

In future, if wanted I’ll write about how you can setup Halide in Android studio and use it finish to finish. LMK if it could assist over feedback.

To attach the whole lot collectively we have to get the entire finish to finish pipeline linked which suggests

Java --> JNI --> Halide --> Java --> Bitmap

This results in an total latency of ~28ms. So if we have a look at results of completely different approaches thought-about thus far

Newest consequence indicating Halide implementation to be the quickest and up to date comparability with the prevailing outcomes.

This provides us each excessive efficiency + simple to keep up code! What else does an engineering staff need?

For this downside I’ve discovered one other answer which is even sooner (latency of roughly ~12ms), however requires {hardware} particular implementation (leveraging NEON intrinsic) and dealing with parallelization and all. It’s not as simple to jot down or preserve however positively price an article later.

Individuals who write very environment friendly code say they spend at the very least twice as lengthy optimizing code as they spend writing code.

— Some one on the Web

Halide makes it a lot simpler to attempt to tune completely different schedules. Simpler than manually altering loops order, splitting logic, threading and so forth. And removes the ache for writing & sustaining ABI particular vectorized code.

Approaches like Halide, Auto vectorized C++ code are transportable and simpler to keep up.

  • As in comparison with explicitly hand tuned, CPU particular code

At all times, take into account your use-case earlier than optimizing

  • Instance — library developer vs app developer
  • Perceive the actual bottlenecks like if the total operation takes 2s, optimizing between 28 ms and 12 ms won’t give an enormous benefit.

In case your software is efficiency crucial: Benchmark → Breakdown → Optimize (repeat)

More Posts