Write quick and maintainable code

Final time I offered the way it’s attainable to attain excessive efficiency with easy and maintainable code with Halide. This time, I’ll share in regards to the common ideas in Halide.
Halide is an open-source programming language designed to make it simpler to write down high-performance picture processing or array processing code on fashionable machines.
Within the last article of this series, I defined:
- What’s Halide?
- What’s the want for Halide? (Recap: quick code is commonly complicated and exhausting to write down and keep)
- How Halide addresses the necessity.
On this article, I’ll dig deeper and share some common ideas in Halide — key knowledge varieties and totally different schedule primitives. For various schedule primitives — an instance of generated code and a demo of code execution is included.
This text is a summarized model of content material obtainable within the documentation and tutorials shared by the Halide improvement staff.
I’ve tried to summarize among the major ideas for the readers and my very own future reference. This appeared crucial to share earlier than speaking in-depth about among the subjects.
Necessary Disclaimer: Any opinion known as out on this article are my very own and don’t replicate opinion or stance of the organizations I work with.
I’ll be protecting this subject simply on the floor, for studying extra please checkout the tutorials at halide-lang.org/tutorials.
A Halide program sometimes have
- Some
enter(s)
and a number ofoutput(s)
. - Some expressions or algorithms that that expresses the conversion of
enter(s)
tooutput(s)
. - Schedules that outline how these expressions can be evaluated on a sure machine. We will additionally outline a number of schedules every concentrating on a unique subset of {hardware}.
By way of these expressions or algorithms we’ve got
Func
: AFunc
object represents a pipeline stage. It is a pure perform that defines what worth every pixel ought to have. You may consider it as a computed picture.Var
:Var
objects are names to make use of as variables within the definition of aFunc
. They don’t have any which means by themselves. We typically usex
andy
as variables that outline thex axes
andy axes
of photographs.Expr
:Expr
permits us to outline complicated expressions utilizingVar
s.
I’ll use the gradient instance shared within the tutorials as instance right here
This could generate a picture, one thing like this

Schedule is Halide’s manner of defining “ run the algorithm effectively on a selected machine”.
Following content material is abstract of the bigger tutorial at halide-lang.org > tutorials > lesson 5
For the above gradient()
instance, by default the code can be executed in sequence in row-major iteration. The generated code shall look one thing like

Let’s check out totally different schedule choices.
Reorder
This reorders the sequence of iteration, on this the loop shall be executed in column main vogue. The generated code will as a substitute seem like

This type of change often have a big influence on the efficiency based mostly on how the CPU leverages caches.
Cut up
This can be a very highly effective primitive and permits us to separate a loop in sure dimension in a number of components. The final argument right here is known as the “cut up issue”. On this case the generated code adjustments to one thing like
Fuse
That is reverse of splitting, on this case we fuse two variables and merges the 2 loop right into a single for loop over the product of the extents. The generated code seems to be like
Tiling
With each cut up and reorder in hand, we will now do tiled evaluations. The above schedule permits us to separate each x
and y
loop by an element of 4 and reorder the order of execution. The generated code is now complicated however ought to look one thing like

Vectorization
We will instruct the compiler to now generate directions to vectorize the interior loops in x
for loops. This generates code like

Unrolling a loop
We will simply information the compiler to unroll the loop through the use of unroll
primitive.
Fusing, tiling, and parallelizing
We will principally mix all of those constructs relying on the info independence to attain the very best efficiency. That is the place tiling and fusing shine. We will parallelize throughout the tiles.
The above-mentioned schedule generates code like:

Additional combining all of the constructs for a big picture:
This video exhibits a visualization of how the complicated schedule will get computed.
https://halide-lang.org/tutorials/figures/lesson_05_fast.mp4
An enormous shout out to Jonathan Ragan-Kelly and the staff for developing with Halide and making it open supply. A lot of the content material on this article is derived from their work printed as tutorials at halide-lang.org/tutorials.
twentieth April 2022
This text is 2nd article of a collection of articles on this style. In case you missed the primary article, yow will discover it right here:
Within the subsequent few articles, I plan to write down about
- Article 3: Quick picture processing on Android with Halide.
- Article 4: Looking for most optimized schedules — Auto Scheduler.
If there’s something particular about this subject you’d wish to be taught, please let me know
- Halide — halide-lang.org
- Halide tutorials
- Halide: decoupling algorithms from schedules for high-performance image processing — Open entry analysis article by Jonathan Ragan-Kelley et. al.
- Processing images fast with native code in Android
- How to use RenderScript to convert YUV_420_888 YUV Image to Bitmap
- Faster image processing in Android Java using multi threading