Each Image Matters, Even Among Millions: Scaling up QoE-driven Delivery of Image-rich Web Applications

Parvez Ahammad

It comes as no surprise that for image-rich web applications, every single image matters in defining the quality of experience (QoE) for the end user. So how does one choose individually tuned settings for optimal delivery of each image while scaling to millions of images across the entire web delivery pipeline? There is some fun math behind the answer; at Instart Logic, we call it SmartVision technology. Today, to coincide with the public release of our first formal academic publication describing SmartVision technology, let me use this blog post to give you the basic ideas behind its technical core: how it enables optimized delivery of image-rich web applications as a whole, while selecting individually tuned settings for each image within a given web application.

Intuitively speaking, the key to optimal delivery of an image is a content-dependent signature (or hash code) that captures the impact of web delivery on that image, which can then be used to prioritize the constituent parts of the image file. In our work, we developed a simple computational signature that captures the impact of the web delivery pipeline on image quality; we call it VoQS (variation of quality signature). In our experiments, we also discovered that large corpora of images can be effectively split into coherent clusters based on VoQS similarity. Taken together, these two simple insights yield an efficient algorithmic approach, SmartVision, for finding adaptive settings for each individual image delivered via a web delivery service.
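The precise definition of VoQS is in the paper; as an illustrative stand-in, here is a minimal sketch of the idea in Python. We sweep a lossy operation (here, simple quantization as a proxy for the delivery pipeline) across several settings and record a quality score at each setting; the resulting vector plays the role of the signature. The function names and the choice of PSNR as the quality metric are assumptions for illustration, not the paper's exact formulation.

```python
import math

def psnr(a, b):
    """Peak signal-to-noise ratio between two equal-length pixel lists (0-255)."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return float("inf") if mse == 0 else 10 * math.log10(255 ** 2 / mse)

def quantize(pixels, step):
    """Stand-in for a lossy delivery operation: a coarser step loses more detail."""
    return [min(255, round(p / step) * step) for p in pixels]

def voqs(pixels, steps=(4, 8, 16, 32, 64)):
    """Illustrative 'variation of quality signature': the image's quality
    scores under a sweep of degradation settings, as one vector."""
    return [psnr(pixels, quantize(pixels, s)) for s in steps]
```

Two images that respond similarly to the delivery pipeline produce nearby vectors, which is what makes clustering on VoQS similarity meaningful.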

For technical details on the algorithm and experimental results on empirical datasets, please see the academic publication that we are presenting today at the ACM (Association for Computing Machinery) Multimedia Conference. While there is a large body of research on image categorization and computer vision-based image content analysis, our paper is, to our knowledge, one of the first publications to directly address quality-dependent image categorization in the context of web delivery.

The following flowchart shows how the SmartVision algorithm works:

As you can see in the flowchart, the categorization step can be done offline (with intermittent updates) to adapt to a changing image corpus pooled across the web delivery service. The real-time aspect simply depends on efficient computation of VoQS and a nearest-neighbor lookup against the pre-stored exemplars that were estimated during the offline categorization step.
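The online step reduces to a nearest-neighbor search in signature space. A minimal sketch, assuming VoQS vectors of equal length and Euclidean distance (the function names are hypothetical):

```python
def nearest_exemplar(signature, exemplars):
    """Return the id of the pre-stored exemplar whose VoQS vector is
    closest (Euclidean distance) to the new image's signature."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(exemplars, key=lambda name: dist(signature, exemplars[name]))

# Usage: a fresh image inherits the delivery settings of its nearest exemplar.
exemplars = {"photo-like": [40.0, 35.0, 30.0], "graphic-like": [50.0, 20.0, 10.0]}
match = nearest_exemplar([41.0, 34.0, 29.0], exemplars)
```

Because only a handful of exemplars survive the offline step, this lookup stays cheap no matter how large the underlying corpus is.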

Message-passing algorithms such as Affinity Propagation [Frey & Dueck, 2007] offer two advantages: one does not need to pre-specify the number of expected clusters, and the cluster-specific exemplars come out as a by-product. However, the algorithmic complexity of Affinity Propagation makes it impractical for really large image datasets (such as the ones we encounter with our Software-Defined Application Delivery service). When the image corpus is very large, one can instead use faster algorithms such as K-means (with appropriate care and safety checks) for clustering, and choose the image exemplars by minimizing aggregate distance in the VoQS metric space.

It is worth noting that the entire algorithmic flow, including the categorization step, is unsupervised, so it is highly amenable to automation in the context of an always-on web delivery service. In our experiments, we could find optimal delivery thresholds for a large corpus of images quickly, while minimizing the loss of visual quality (see Figure 3 in our ACM Multimedia paper). In addition, our approach does not depend on any particular image format, so we can apply it to most of the popular image formats used by the web community.
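To make the K-means fallback concrete, here is a minimal sketch: cluster the VoQS vectors, then pick each cluster's exemplar as the member minimizing aggregate distance to the rest (a medoid). This is an illustration under assumed names, not our production code; the deterministic initialization is for clarity only, and real use needs the "care and safety checks" mentioned above (e.g., better seeding and restarts).

```python
def dist2(a, b):
    """Squared Euclidean distance between two VoQS vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Minimal K-means over VoQS vectors; returns a cluster label per point."""
    centers = [list(p) for p in points[:k]]  # naive deterministic init
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # guard against empty clusters
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

def medoid(members):
    """Exemplar = the cluster member minimizing aggregate distance to the rest."""
    return min(members, key=lambda p: sum(dist2(p, q) for q in members))
```

Unlike a centroid, a medoid is always an actual image in the corpus, which is what allows the exemplar's hand-tuned (or measured) delivery threshold to be reused directly by its cluster.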

At Instart Logic, we use the SmartVision algorithmic pipeline in two related but different contexts. One application scenario (termed True Fidelity Image Streaming) is to divide the image file into parts such that the bits most relevant to the user's quality of experience (QoE) are delivered up front in a first pass. This quick first pass allows an image-rich web application to load quickly and respond to user interaction fast. Meanwhile, Instart Logic's client-cloud architecture continually works in the background to seamlessly backfill the remaining detail into the image, without impacting interaction time, ensuring that the full quality of the original image is ultimately delivered. (Note, though, that such a streaming approach requires the user to have our thin JavaScript-based client, Nanovisor.js, running in their web browser.)
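Conceptually, the streaming scenario is a split of the image byte stream at a per-image threshold (a byte offset that SmartVision picks for that image), with the remainder backfilled later. A deliberately simple sketch, with hypothetical names:

```python
def split_for_streaming(image_bytes, threshold):
    """Split an image file at a per-image byte threshold: the first pass is
    sent immediately for fast interactivity; the rest is backfilled in the
    background so the original full-quality image is eventually assembled."""
    first_pass = image_bytes[:threshold]
    backfill = image_bytes[threshold:]
    return first_pass, backfill
```

The interesting work is of course in choosing `threshold` per image (via VoQS and its cluster exemplar) so that the first pass already renders at acceptable quality; the split itself is trivial.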

So what can you do when the client is not installed on the target device, as is the case with a native mobile application?

For users who do not have an environment that can run our JavaScript client, we can use the SmartVision technology to automatically determine the optimal threshold on the server side, and send only the part of the image file that delivers good QoE relative to the original. In congested mobile networks, on low-complexity user devices, or in other scenarios where network footprint comes at a premium, such a server-side approach can deliver a dramatic improvement in web application interactivity without significantly sacrificing visual QoE. We term this application scenario Image Transcoding with SmartVision; it improves application delivery performance through a purely server-side transformation.
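The server-side decision can be framed as picking the most aggressive setting whose predicted quality still clears a QoE floor, reading the prediction straight off the image's VoQS-style signature. A hedged sketch with assumed names and units (settings ordered from least to most aggressive, with the signature giving the quality at each setting):

```python
def pick_setting(signature, settings, qoe_floor):
    """Choose the most aggressive transcoding setting whose predicted
    quality (from the signature) stays at or above the QoE floor.
    Falls back to the least aggressive setting if none qualifies."""
    best = settings[0]
    for setting, quality in zip(settings, signature):
        if quality >= qoe_floor:
            best = setting  # keep walking toward more aggressive settings
    return best
```

Because the signature is computed (or inherited from a cluster exemplar) ahead of time, this per-request decision costs next to nothing at the edge.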

For further technical details and empirical experimental results, please see our ACM Multimedia publication.

