Image Processing in Java

Chad Arimura
Aug 5, 2019

Introduction

This is a post about developing jtrack, an image processing application in Java using OpenCV that uses some new and upcoming Java platform and language features.

The application grabs animated images from GIPHY, runs them through an OpenCV frame processor using JavaCV (a JNI-based Java wrapper around OpenCV) to detect a face and recompose the animated image, then posts the results to Slack.
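jtrack does its decoding and recomposition with JavaCV and a GifDecoder class. As a JDK-only sketch of the "split into frames, process each one" half of that loop, assuming nothing beyond javax.imageio, the hypothetical GifFrames helper below reads every frame of an animated GIF and applies a per-frame function, which is where a face detector would plug in:

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper (not part of jtrack): decodes each frame of an animated GIF
// with the JDK's built-in ImageIO GIF reader and applies a per-frame function.
class GifFrames {

    static List<BufferedImage> processFrames(byte[] gifBytes, UnaryOperator<BufferedImage> perFrame) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("gif");
        if (!readers.hasNext()) {
            throw new IllegalStateException("No GIF reader available");
        }
        ImageReader reader = readers.next();
        try (ImageInputStream in = ImageIO.createImageInputStream(new ByteArrayInputStream(gifBytes))) {
            reader.setInput(in);
            // true = scan the whole stream so the reader can count every frame
            int frameCount = reader.getNumImages(true);
            List<BufferedImage> processed = new ArrayList<>(frameCount);
            for (int i = 0; i < frameCount; i++) {
                // Frames are returned as stored in the file; partial frames are
                // not composited onto the full canvas.
                processed.add(perFrame.apply(reader.read(i)));
            }
            return processed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            reader.dispose();
        }
    }
}
```

A call like processFrames(bytes, frame -> detectAndMark(frame)) would run a hypothetical detectAndMark step on every frame. Writing the frames back out as an animated GIF needs an ImageWriter sequence with per-frame metadata, which this sketch leaves out.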

The deploy artifact is a Docker image containing a custom JRE, built using jlink and checked in alongside the source.


Note: There are always easier, better, faster ways of doing things. Please comment if something bothers you. I’m happy to revise the code and post.

My goals were simple:

  1. Portability: despite platform-specific dependencies like OpenCV
  2. Speed: Relatively fast builds
  3. Maintainability: Keep things as “native” as possible without introducing lots of custom scripts, upstream Docker images, etc.

In this post you’ll learn to:

  1. Use Maven to drive simple, automated, and fast Docker builds
  2. Build and include a custom runtime using jlink
  3. Semi-painlessly include and use OpenCV for computer vision
  4. Incorporate a few upcoming Java language features like fibers

But first, since when does Chad write specifically about Java?

My New Role

I’m back from paternity leave, and as I hinted in a tweet a few weeks ago, I have a new role. I’m excited to officially announce that I’ve joined the Java Platform Group, where I’m humbled to call many of the world’s best language designers, architects, stewards, and leaders my teammates.

My “devrel” team will be helping drive a number of ongoing initiatives centered around developer and customer engagement, telling the story of Java while listening to our community and customers as a representative of the group that is, in large part, writing the story.

I’m sure I’ll talk a lot more about this, but for now, let’s dive into jtrack.

Maven and Docker

I wanted a familiar, streamlined, automated workflow that didn’t introduce lots of custom tooling, and a resulting artifact that contains everything needed to run the application on any platform.

The entire jtrack build is a single, familiar mvn package, which compiles the code and builds a Docker image. I didn’t add push-to-registry or deploy steps, but those are fairly straightforward to add.

Starting with a simple Maven POM, I added the exec-maven-plugin which executes a shell script to build the Docker image.

<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <version>1.6.0</version>
  <executions>
    <execution>
      <id>docker-build</id>
      <phase>package</phase>
      <goals>
        <goal>exec</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <executable>docker-build.sh</executable>
    <arguments>
      <argument>${docker.image}</argument>
      <argument>${project.version}</argument>
    </arguments>
    <skip>${skipDockerBuild}</skip>
  </configuration>
</plugin>

I would have preferred to use one of the existing Docker Maven plugins (like Spotify’s), but that added about 40 seconds to my build time. I’m not certain why, but I suspect the plugin’s Docker client was much slower at sending all the necessary files (~300 MB of JavaCV libs) to the daemon as build context.

The Docker build script is quite simple, taking the arguments you see in the plugin config above.

#!/usr/bin/env bash
docker build -t "$1:$2" .
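The script just forwards its two arguments, so ${docker.image} becomes the image name and ${project.version} the tag. If you’d rather keep that step in Java (for example, driven by exec:java instead of exec:exec), a hypothetical DockerBuild class, not part of jtrack, could assemble the same command with ProcessBuilder:

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

// Hypothetical Java equivalent of docker-build.sh: the first argument is the
// image name and the second is the tag, combined as image:tag.
class DockerBuild {

    // Argument list for `docker build -t <image>:<version> .`
    static List<String> command(String image, String version) {
        return List.of("docker", "build", "-t", image + ":" + version, ".");
    }

    // Runs the build in contextDir and streams Docker's output to this console.
    static int run(String image, String version, File contextDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(image, version))
                .directory(contextDir)
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: DockerBuild <image> <version>");
            System.exit(2);
        }
        System.exit(run(args[0], args[1], new File(".")));
    }
}
```

Returning the exit code from main keeps the Maven build failing when docker build fails, just as the shell script does.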

Finally, note the <skip> configuration. It’s an option to skip the image build when you don’t need it; in my case, I didn’t want to build an image when running libs.sh.

Now the actual Dockerfile itself:

FROM oraclelinux:7-slim

# gtk2 for the OpenCV native libraries, tar for unpacking the pre-built binaries
RUN yum update -y && yum install -y gtk2.x86_64 tar && yum clean all
# Download and unpack pre-built OpenCV binaries for Oracle Linux
RUN curl -L https://raw.githubusercontent.com/denismakogon/oraclelinux-opencv/master/apply_binaries.sh | /bin/bash

# Each COPY is its own layer: the jlink'd runtime, dependency JARs, resources, then compiled classes
COPY java-runtime /usr/share/jtrack/java-runtime/
COPY lib/*.jar /usr/share/jtrack/
COPY src/main/resources/* /usr/share/jtrack/
COPY target/classes /usr/share/jtrack/
COPY oci_key.pem /usr/share/jtrack/oci_key.pem

ENTRYPOINT ["/usr/share/jtrack/java-runtime/bin/java", "-cp", "/usr/share/jtrack/*:/usr/share/jtrack", "com.pinealpha.demos.jtrack.App"]

We build from the official upstream Oracle Linux image rather than some unmaintained custom image. To make that work, Denis Makogon helped me by publishing pre-built OpenCV binaries for the platform, along with a script to install them. This is critical, because otherwise building OpenCV with Dockerfile RUN commands takes over 30 minutes!

Next, you’ll notice lib/*.jar, which contains all of our Maven dependencies. I went this route because Docker can cache them as their own layer. Using a Maven dependency assembler that bundles them with the application would change that layer’s hash after every compile, forcing Docker to recreate the layer and adding a lot to the build time.
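The caching argument can be made concrete with a simplified model. Assume, as a simplification rather than Docker’s exact algorithm, that a COPY or ADD layer is reused whenever the checksum of the files it copies is unchanged. The hypothetical LayerCacheModel class hashes a set of files that way and compares a separate dependency layer against a “fat” layer that mixes dependencies with freshly compiled classes:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Simplified model (an assumption, not Docker's exact algorithm): a COPY/ADD layer
// can be reused from cache when the checksum of the files it copies is unchanged.
class LayerCacheModel {

    // SHA-256 over (path, content) pairs in sorted path order, each length-prefixed.
    static String digest(Map<String, byte[]> files) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, byte[]> entry : new TreeMap<>(files).entrySet()) {
                update(sha, entry.getKey().getBytes(StandardCharsets.UTF_8));
                update(sha, entry.getValue());
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : sha.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    // The length prefix keeps ("ab", "c") and ("a", "bc") from hashing identically.
    private static void update(MessageDigest sha, byte[] data) {
        sha.update(ByteBuffer.allocate(Integer.BYTES).putInt(data.length).array());
        sha.update(data);
    }

    // A "fat" layer that bundles dependencies and application classes together.
    static Map<String, byte[]> fatLayer(Map<String, byte[]> deps, Map<String, byte[]> app) {
        Map<String, byte[]> all = new HashMap<>(deps);
        all.putAll(app);
        return all;
    }

    public static void main(String[] args) {
        // Two builds: identical dependency bytes, but App.class changed.
        Map<String, byte[]> depsBuild1 = Map.of("lib/javacv.jar", new byte[] {1, 2, 3});
        Map<String, byte[]> depsBuild2 = Map.of("lib/javacv.jar", new byte[] {1, 2, 3});
        Map<String, byte[]> appBuild1 = Map.of("App.class", new byte[] {10});
        Map<String, byte[]> appBuild2 = Map.of("App.class", new byte[] {11});
        System.out.println("lib layer reusable: " + digest(depsBuild1).equals(digest(depsBuild2)));
        System.out.println("fat layer reusable: "
                + digest(fatLayer(depsBuild1, appBuild1)).equals(digest(fatLayer(depsBuild2, appBuild2))));
    }
}
```

Under this model the dependency layer hashes identically across builds, while the fat layer changes as soon as App.class does, which is the reasoning behind keeping lib/*.jar in its own layer.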

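Finally, the entrypoint doesn’t run an executable JAR. With java -jar, the -cp option is ignored and the classpath comes only from the JAR’s manifest, which doesn’t support wildcards, so including a directory full of JARs would require a hacky script to enumerate each one and build a very long classpath. Launching the main class with -cp "/usr/share/jtrack/*:/usr/share/jtrack" picks up every JAR in the directory instead.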
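To see what the entrypoint’s wildcard actually expands to, a small diagnostic can print java.class.path from inside the container. This ClasspathCheck class is hypothetical; it assumes it is compiled into /usr/share/jtrack alongside App and launched with the same -cp value as the ENTRYPOINT:

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical diagnostic: prints the effective classpath. The java launcher expands
// a "dir/*" entry into the JAR files in that directory before main() runs, so
// java.class.path lists each JAR individually.
class ClasspathCheck {

    // Splits a classpath string on the platform separator, dropping empty entries.
    static List<String> entries(String classPath) {
        return Arrays.stream(classPath.split(Pattern.quote(File.pathSeparator)))
                .filter(entry -> !entry.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> entries = entries(System.getProperty("java.class.path"));
        long jarCount = entries.stream().filter(entry -> entry.endsWith(".jar")).count();
        System.out.println(jarCount + " JARs on the classpath:");
        entries.forEach(entry -> System.out.println("  " + entry));
    }
}
```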

Astute readers will notice that the Dockerfile never installs a Java runtime: Oracle Linux doesn’t include one by default, and none is built in the Dockerfile. Instead, it copies in a java-runtime directory, and that’s where jlink comes in.

Custom Runtime Using jlink

Simply put, the Java runtime is now included with the application code itself. Specifically, I include a custom build of OpenJDK 14’s “fibers” branch:

bash-4.2# /usr/share/jtrack/java-runtime/bin/java --version
openjdk 14-internal 2020-03-17
OpenJDK Runtime Environment (build 14-internal+0-adhoc.opc.loom)
OpenJDK 64-Bit Server VM (build 14-internal+0-adhoc.opc.loom, mixed mode)

The beauty of including the runtime is that it’s bundled right alongside the code on GitHub, so updating it is a snap.

Originally, I created an Oracle Cloud instance with Oracle Linux 7.6, SSH’d in, built the JDK version I wanted, and then ran jlink there.

[edit] Thanks to Alan Bateman for pointing out that jlink can build a runtime for another platform: you simply point --module-path at the jmods of the JDK distribution for that platform. So I skipped the SSH step above, downloaded the latest early-access Linux build of Loom onto my Mac, and ran the following (note the --module-path):

jlink --no-header-files --no-man-pages --compress=2 --strip-debug \
--module-path /Users/chad/lib/jdk-14-loom-linux.jdk/jmods \
--add-modules java.base,java.compiler,java.desktop,java.instrument,java.management,java.naming,java.sql,jdk.attach,jdk.jdi,jdk.unsupported \
--output java-runtime

Then I just checked that folder into the git repo.
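To double-check what ended up in the checked-in runtime, you can run java-runtime/bin/java --list-modules, or, as a Java sketch (the RuntimeModules class is hypothetical), ask the running JVM for its system modules through ModuleFinder.ofSystem():

```java
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReference;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical check: lists the modules linked into the runtime image this program
// runs on. Launch it with java-runtime/bin/java to inspect the jlink output.
class RuntimeModules {

    static Set<String> systemModules() {
        Set<String> names = new TreeSet<>();
        for (ModuleReference ref : ModuleFinder.ofSystem().findAll()) {
            names.add(ref.descriptor().name());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Runtime version: " + Runtime.version());
        systemModules().forEach(System.out::println);
    }
}
```

Run against the linked runtime, it should list the modules passed to --add-modules plus the modules they depend on, since jlink resolves those transitively.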

To figure out which modules I needed to include, I used jdeps as follows:

jdeps --multi-release 14 --ignore-missing-deps --print-module-deps lib/* target/jtrack-2.3.jar
java.base,java.compiler,java.desktop,java.instrument,java.management,java.naming,java.sql,jdk.attach,jdk.jdi,jdk.unsupported

Important caveat: things got a little wonky with jdeps. At first it returned errors, so I kept deleting dependencies from the lib directory until it worked. I still ended up with a comprehensive set of modules, but of course this isn’t ideal, and I’ll come back to figure it out later.
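The jdeps output feeds straight into jlink’s --add-modules, so the two steps can be chained. Here is a sketch of that glue in Java using java.util.spi.ToolProvider. It assumes a full JDK recent enough to expose both jdeps and jlink as tool providers, and one whose release matches the target jmods, because jlink rejects jmods from a different Java release. LinkRuntime is a hypothetical helper; the flags mirror the commands above:

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.spi.ToolProvider;
import java.util.stream.Collectors;

// Hypothetical build helper: runs jdeps in-process, then passes its module list to jlink.
class LinkRuntime {

    // Turns jdeps --print-module-deps output ("a,b,c\n") into a module list.
    static List<String> parseModules(String jdepsOutput) {
        return Arrays.stream(jdepsOutput.trim().split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .collect(Collectors.toList());
    }

    // Same jlink flags as the command in this post.
    static List<String> jlinkArgs(List<String> modules, String jmodsDir, String outputDir) {
        return List.of("--no-header-files", "--no-man-pages", "--compress=2", "--strip-debug",
                "--module-path", jmodsDir,
                "--add-modules", String.join(",", modules),
                "--output", outputDir);
    }

    // Runs a JDK tool in-process and returns its stdout; fails fast with its stderr.
    static String runTool(String name, List<String> args) {
        ToolProvider tool = ToolProvider.findFirst(name)
                .orElseThrow(() -> new IllegalStateException(name + " is not available; run this on a full JDK"));
        StringWriter out = new StringWriter();
        StringWriter err = new StringWriter();
        PrintWriter outWriter = new PrintWriter(out);
        PrintWriter errWriter = new PrintWriter(err);
        int exitCode = tool.run(outWriter, errWriter, args.toArray(new String[0]));
        outWriter.flush();
        errWriter.flush();
        if (exitCode != 0) {
            throw new IllegalStateException(name + " exited with " + exitCode + ": " + err);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        if (args.length < 3) {
            System.err.println("usage: LinkRuntime <target-jdk-jmods> <output-dir> <jar>...");
            System.exit(2);
        }
        List<String> jdepsArgs = new ArrayList<>(List.of(
                "--multi-release", "14", "--ignore-missing-deps", "--print-module-deps"));
        jdepsArgs.addAll(Arrays.asList(args).subList(2, args.length));
        List<String> modules = parseModules(runTool("jdeps", jdepsArgs));
        System.out.println("Linking modules: " + String.join(",", modules));
        runTool("jlink", jlinkArgs(modules, args[0], args[1]));
    }
}
```

A call such as java LinkRuntime <target-jdk>/jmods java-runtime lib/* target/jtrack-2.3.jar would reproduce both steps, with the shell expanding lib/* into individual JAR paths. Surfacing jdeps’ stderr on failure at least makes errors like the ones described above easier to track down.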

So that’s the build process. Let’s cover the OpenCV dependency now.

OpenCV

OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. It has evolved a lot over the years and forms the basis of many interesting use cases, including generalized object detection (faces, cars, license plates, etc.), video analysis (motion detection), and more. It’s also big, platform-dependent, and written in C++, which makes it a good test case for portability via Docker.

This is also the only custom script we chose to use. Building OpenCV takes more than 30 minutes, but checking pre-built binaries into the repo would bloat it considerably. So we pre-built the binaries once, download them during the Docker build, and let Docker cache that layer for fast builds.

The Dockerfile runs apply_binaries.sh, which is as follows:

#!/usr/bin/env bash

set -xe

mkdir -p /usr/local/include/opencv4
curl -L https://github.com/denismakogon/oraclelinux-opencv/raw/master/release/include_opencv4.tar.gz | \
tar xvz -C /

mkdir -p /usr/local/lib64
curl -L https://github.com/denismakogon/oraclelinux-opencv/raw/master/release/lib64.tar.gz | \
tar xvz -C /
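Assuming the lib64 tarball unpacks into /usr/local/lib64, as the script’s mkdir suggests, a quick sanity check is to list the OpenCV shared libraries (conventionally named libopencv_<module>.so, often with a version suffix) before anything tries to load them. OpenCvLibCheck is a hypothetical helper, not part of jtrack:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sanity check: lists OpenCV shared libraries (libopencv_*.so*)
// in a directory, by default /usr/local/lib64.
class OpenCvLibCheck {

    static List<String> findOpenCvLibs(Path dir) {
        try (Stream<Path> files = Files.list(dir)) {
            return files.map(path -> path.getFileName().toString())
                    .filter(name -> name.startsWith("libopencv_") && name.contains(".so"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/usr/local/lib64");
        List<String> libs = findOpenCvLibs(dir);
        if (libs.isEmpty()) {
            System.err.println("No OpenCV libraries found in " + dir);
            System.exit(1);
        }
        libs.forEach(System.out::println);
    }
}
```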

Denis has been experimenting with modularizing OpenCV, but without luck so far. For now, this works just fine, and entropy shouldn’t have much of an impact as long as the base Docker image stays updated (it’s an official Oracle Linux build, so it will be) and the runtime does too, which is straightforward since it’s bundled with the code itself.

And finally, a few pointers in the code.

Java Tidbits

The code should be pretty readable, so have at it, but I’ll point out a few things specifically.

Local-Variable Type Inference

Local-variable type inference, introduced in JDK 10 as part of Project Amber and commonly known as “var”, is, I believe, underrated. It can make code both more succinct and more readable. Take for example:

FileImageOutputStream outputStream = new FileImageOutputStream(finalGIF);
FileInputStream inputStream = new FileInputStream(originalGIF);
GifDecoder gifDecoder = new GifDecoder();

vs

var outputStream = new FileImageOutputStream(finalGIF);
var inputStream = new FileInputStream(originalGIF);
var gifDecoder = new GifDecoder();

The code is more succinct because each variable’s type is inferred from the expression on the right. Also, explicit typing like in the first block often tempts developers into short names like “os” instead of outputStream, which makes every later use of os harder to understand.

Stuart Marks has a great style guide that everyone interested in var (aka everyone) should read: https://openjdk.java.net/projects/amber/LVTIstyle.html.
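A couple more places where var reads well, as illustrative examples rather than jtrack code: try-with-resources, where the resource types are still checked at compile time, and loops. The comment on the TreeMap line reflects one of the cautions in that style guide, namely that var combined with the diamond operator infers Object type arguments:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative examples of local-variable type inference (requires Java 10+).
class VarExamples {

    // var in try-with-resources: in and out are still statically typed streams.
    static byte[] copy(byte[] input) {
        try (var in = new ByteArrayInputStream(input);
             var out = new ByteArrayOutputStream()) {
            in.transferTo(out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // var keeps descriptive names without repeating the type on both sides.
    static Map<String, Integer> countByExtension(List<String> fileNames) {
        // Spell out the type arguments here: `var counts = new TreeMap<>()`
        // would infer TreeMap<Object, Object>.
        var counts = new TreeMap<String, Integer>();
        for (var fileName : fileNames) {
            var dot = fileName.lastIndexOf('.');
            var extension = dot >= 0 ? fileName.substring(dot + 1) : "";
            counts.merge(extension, 1, Integer::sum);
        }
        return counts;
    }
}
```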

Fibers

Project Loom aims to bring fibers to Java and make concurrent programming in Java simpler, faster, and much more efficient. Fibers are lightweight, efficient threads managed by the Java Virtual Machine. Early-access Loom builds are available (oh, and thanks, Stuart, for the correction on availability).

We initially used fibers in jtrack to replace some async code, because nobody actually wants to write async code. The case is pretty straightforward:

private Fiber downloadFiber;

FaceDetect() {
    this.downloadFiber = FiberScope.background().schedule(() -> {
        try {
            this.setupClassifier();
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(1);
        }
    });
}

To make sure the fiber has finished before the classifier is used, we added a validation step: if the classifier isn’t set up yet, we convert the fiber to a CompletableFuture and call its blocking get() to wait for the result:

if (this.classifier == null) {
    this.downloadFiber.toFuture().get();
}
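The Fiber and FiberScope API above comes from the 2019 Loom early-access builds and isn’t available in released JDKs. The underlying pattern (start the classifier download at construction time, and block only if it’s needed before it finishes) can be sketched on any current JDK with CompletableFuture. BackgroundSetup is a hypothetical name, not jtrack code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical sketch (not jtrack's code): start an expensive setup step in the
// background at construction time, and block only if it's needed before it finishes.
class BackgroundSetup<T> {
    private final CompletableFuture<T> ready;

    BackgroundSetup(Supplier<T> setup) {
        // A dedicated daemon thread keeps blocking I/O off the common pool,
        // and a still-running setup won't keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(task -> {
            Thread thread = new Thread(task, "background-setup");
            thread.setDaemon(true);
            return thread;
        });
        this.ready = CompletableFuture.supplyAsync(setup, executor);
        // Release the thread once the single setup task completes, successfully or not.
        ready.whenComplete((value, error) -> executor.shutdown());
    }

    // Plays the role of toFuture().get(): waits for setup; failures surface as CompletionException.
    T get() {
        return ready.join();
    }
}
```

In FaceDetect terms, the constructor would wrap a hypothetical loadClassifier() in a BackgroundSetup, and the null check plus toFuture().get() collapses into a single get() call. On JDK 21 or later, Thread.ofVirtual().start(...) followed by join() is the closest analogue to the fiber.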

Unfortunately, the underlying OpenCV native code that does the frame detection isn’t thread-safe, so we haven’t yet been able to gain much speedup there.

Conclusion

Hopefully this post offered some useful tips on packaging Java applications with Maven and Docker while keeping build times down, and on using some pretty shiny new platform and language features like jlink, local-variable type inference, and fibers.

Java is the most successful development platform in the world, with over 12 million developers worldwide and a majority of enterprises relying on it to run business-critical applications. The history of how this all came together is incredibly interesting, but the story is just getting started, and my team and I are extremely excited to be a part of it!

To Learn More…


Chad Arimura

Former founder & CEO, Iron.io, now VP Serverless Advocacy at Oracle. Programmer, cover band keyboardist.