When you navigate to a website on your expensive new Android device, or try to view an image that someone has sent you on your gorgeous Super AMOLED Quad HD display, the last thing you want is to find yourself standing there, waiting for a progress bar to crawl across the screen, or to squint angrily at the spinning loading icon as it sputters.
(Did you know that the loading icon is called a “throbber”? I just found out, and I’m now stuck on the idea of a “sputtering throbber.” That’s neither here nor there.)
If you have a crummy connection to the internet, you know all too well this pain. You curse your life and your #firstworldproblems, shouting “WHY??” to the computer gods, but the gods aren’t listening.
Well, I suppose that depends on what we mean by computer gods. If you’re imagining some laurel-crowned, besandled, Olympian deity giving life to smartphones from the clay, then no, you’re out of luck. But if you are thinking more metaphorically about a more terrestrial kind of computer superbeing, like Google, then I have good news for you. It has heard you, and it, too, curses the sputtering.
Google pins a large part of the blame for endlessly churning websites and downloads on outdated image compression. The most common form of that compression is the familiar JPEG codec, which has been around since the early 1990s. In 2010 Google announced development of the WebP compression format, which could do the same job as JPEG with roughly 30% smaller file sizes, but the company still isn’t satisfied.
Researchers at Google have figured that there are still better ways to crunch an image to a much smaller size and then un-crunch it so that it retains most of its original visual glory. And this being Google, you can be sure that the path to better compression is one that is meticulously and iteratively laid out by machine learning.
“Images are still too big and slow to load on phones, so we’re hosting a competition to see how much smaller and faster you can make them,” tweeted Google CEO Sundar Pichai. “Are you the next Pied Piper?:)”
The competition to which Pichai is referring is the Workshop and Challenge on Learned Image Compression, which Google is sponsoring in collaboration with the Computer Vision Laboratory at ETH Zurich and Twitter (now isn’t that interesting?). The challenge will be part of the 2018 Computer Vision and Pattern Recognition (CVPR) conference, taking place June 18-21 in Salt Lake City, and aims to bring together experts in “traditional” methods of compression as well as those who are breaking new ground with the use of learning-based compression.
The subject of file compression doesn’t exactly send chills of excitement down one’s spine in the way that, say, the announcement of a fancy new piece of hardware does (like, for example, a toilet you can talk to), but it’s fundamental to the digital lives we lead. The pictures we take with our phones and share to Instagram, the music we listen to on Spotify or rip off of CDs (ask your parents), the videos we watch on every screen from TVs to mobile browsers, all of it relies on compression just to make that content fit through the internet’s series of tubes (again, ask your parents).
As Google research scientist Michele Covell reminds us on the company’s blog, without compression, the 12MP photo you snap with your smartphone would take up a whopping 36MB of storage. Hooray for compression! But the hitch is that compressing a media file almost always means that something of the original is lost. Good compression is therefore not just about making images smaller, but choosing which aspects of the image can be fudged or disregarded in the process, and then how best to reinterpret the information that’s left so that it more closely resembles the original.
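For the curious, the size of an uncompressed photo is simple arithmetic. Here is a rough sketch, assuming 8 bits for each of the red, green, and blue channels (3 bytes per pixel) and decimal megabytes; the function name and defaults are mine, not Google's. Under those assumptions a 12MP image works out to about 36MB, and the exact figure shifts with bit depth and how you count a megabyte.

```python
def raw_image_bytes(megapixels, bytes_per_pixel=3):
    """Uncompressed size in bytes, assuming one byte each for R, G and B."""
    return int(megapixels * 1_000_000 * bytes_per_pixel)


size = raw_image_bytes(12)
print(size / 1_000_000, "MB")  # 36.0 MB
```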
This can be a highly subjective process, making compression almost a kind of art form. Human beings are the ones who have to decide what information in an image makes the most sense for a compression format to keep, modify, and discard, but those decisions are guided by the final stop: the human eye. For example, humans perceive brightness more easily than color, and are more sensitive to some colors than others. If a human is incapable of perceiving some nuance of an image anyway, or perceives it poorly, why bother wasting valuable storage and bandwidth on it?
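One common way codecs such as JPEG cash in on that brightness-versus-color quirk is chroma subsampling: keep brightness (luma) at full resolution and store the color (chroma) planes at reduced resolution. The article's sources don't spell this out, so treat the following as an illustrative sketch that only counts stored samples; the function name and the 4000x3000 frame are assumptions for the example.

```python
def samples_per_frame(width, height, chroma_divisor=1):
    """Count stored samples for one luma plane plus two chroma planes.

    chroma_divisor=1 keeps color at full resolution (often called 4:4:4);
    chroma_divisor=2 halves it in both directions (often called 4:2:0).
    """
    luma = width * height
    # Round up so odd dimensions still cover every pixel.
    chroma_w = -(-width // chroma_divisor)
    chroma_h = -(-height // chroma_divisor)
    return luma + 2 * chroma_w * chroma_h


full = samples_per_frame(4000, 3000)
subsampled = samples_per_frame(4000, 3000, chroma_divisor=2)
print(subsampled / full)  # 0.5
```

In other words, halving the color resolution in each direction cuts the raw data in half before any further compression is applied.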
The same principle applies to music, where full-resolution WAV files are compressed into MP3s, for example, in part by removing what a human ear wouldn’t be able to notice. Of course, there is an army of audiophiles who would disagree, but again, that’s why so much of this is subjective. Each individual brain sees, hears, and processes images and sounds a little bit differently.
When you think of it that way, as presenting an image to a human brain that will do its own “decompression” using memories and the given context to comprehend it, the idea that artificial intelligence is the best means of improving this process begins to make much more sense.
This is where neural networks come in. Google has made some meaningful advances in learning-based compression in just the past couple of years. In 2016 it showed how two sets of neural networks can produce higher-quality compressed images with 25% smaller file sizes. This was accomplished by having a neural network run an initial compression and decompression of an image, and then comparing the result with the original. That comparison yields the “residual”: the errors, where the reconstructed image differs from the original.
That residual is then fed back into the network. That’s right, the errors become the input, from which a higher-quality image is derived. What?!? It works because, as the network compares the errors to the original image, it becomes better able to predict what the next residual will be, which allows for a higher-quality reconstruction each time.
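The feed-the-errors-back-in loop is easier to see in a toy version. The sketch below is not Google's method: it swaps the neural network for a crude stand-in (rounding values to a grid that gets finer on each pass), and every name and number in it is hypothetical. What it does share with the description above is the shape of the loop: reconstruct, measure the residual, code the residual, and add it back.

```python
def coarse_code(values, step):
    """Stand-in for the network's lossy encode/decode: round to a grid."""
    return [round(v / step) * step for v in values]


def progressive_reconstruct(original, passes=4, first_step=64.0):
    """Refine a reconstruction by repeatedly coding what it still gets wrong."""
    reconstruction = [0.0] * len(original)
    step = first_step
    errors = []
    for _ in range(passes):
        # The residual is the error left in the current reconstruction.
        residual = [o - r for o, r in zip(original, reconstruction)]
        # Code the residual lossily and add it to the reconstruction.
        coded = coarse_code(residual, step)
        reconstruction = [r + c for r, c in zip(reconstruction, coded)]
        # Track the worst-case pixel error after this pass.
        errors.append(max(abs(o - r) for o, r in zip(original, reconstruction)))
        step /= 2  # use a finer grid on the next pass
    return reconstruction, errors


pixels = [12.0, 200.0, 37.0, 143.0, 90.0, 255.0]
reconstruction, errors = progressive_reconstruct(pixels)
print(errors)
```

After each pass the worst-case error can be at most half the current grid step, so the printed list shrinks as the passes go on. A learned system replaces the rounding with a network that gets better at predicting the residual itself.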
That brings us back to the challenge laid down by Covell and Pichai. Google believes it has just scratched the surface of what compression based on machine learning can achieve. “This rapid advance in the quality of neural-network-based compression systems, based on the work of a comparatively small number of research labs, leads us to expect even more impressive results when the area is explored by a larger portion of the machine-learning community,” writes Covell.
Google certainly has quite the incentive to push the envelope as far as it can go. The faster your images load and the nicer they look when they appear, the more pleasant your experience on the internet. The more you’re happily using the internet, the more you’re inevitably interacting with Google. But that’s not all the company is after.
“If we can improve image compression, the way we use images could fundamentally change,” Covell told Android Police. “We could enable better (and less expensive) medical diagnoses: just send a lot of pictures of the patient’s issues to the best doctors without a lot of delay or overhead, especially for patients in areas that are underserved (rural areas, for example).”
Learning-based systems for compression can be customized to suit particular needs, favoring the relevant information within an image over something less useful, like aesthetics, as in medical diagnoses. “For example, on mammograms, if the difference in tissue density and the boundaries between dense and not-dense tissues is important to diagnoses, we could include that in the measure in our quality metric but still be able to make x-ray files small enough to allow for longer-term studies,” said Covell, adding, “We could provide more information in emergencies: we could send pictures of situations in disaster zones, even when the available bandwidth is limited.”
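What might it look like to bake that kind of priority into a quality metric? One simple possibility, offered purely as an illustration and not as anything Covell described, is a weighted mean squared error in which pixels flagged as important count for more. The importance mask, the pixel values, and the function name below are all made up for the example.

```python
def weighted_mse(original, reconstructed, weights):
    """Mean squared error where each pixel's error is scaled by a weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not all be zero")
    weighted_sum = sum(
        w * (o - r) ** 2 for o, r, w in zip(original, reconstructed, weights)
    )
    return weighted_sum / total_weight


original = [100, 100, 180, 180]
# Hypothetical mask: the last two pixels sit on a tissue boundary.
importance = [1, 1, 5, 5]
blurs_background = [96, 104, 180, 180]
blurs_boundary = [100, 100, 176, 184]

background_score = weighted_mse(original, blurs_background, importance)
boundary_score = weighted_mse(original, blurs_boundary, importance)
print(background_score < boundary_score)  # True
```

Both reconstructions are off by the same amount in plain pixel terms, but the weighted score penalizes the one that damages the flagged region, which is the behavior a system trained against such a metric would learn to avoid.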
You can see why Twitter would be interested in this as well. Smaller file sizes mean a lighter load on Twitter’s infrastructure (I miss you, Fail Whale), and higher-quality images should, in theory, give users a better experience…but that’s assuming the images they’re seeing are at all pleasant, which is probably more the exception than the rule.
For the Challenge on Learned Image Compression, Google will make available a database of high-resolution images at compression.cc on February 15 for participants’ neural networks to chew on, and final results must be submitted by February 22.
You know, this all takes me back to the old 2400 baud telephone modem days, back when you hoped your mom wouldn’t pick up the receiver and break your connection to the local BBS, and you eagerly awaited your precious time with that dungeon-crawling door game, suffering as ASCII text crept by, line by painful line, and…well, those machines have learned a whole lot since then.