Fundamental problems & open source dread of making a social media image resizer

Last edited: September 1, 2021

On July 1st, Pixelhunter, an AI-based online image resizer, was launched on Product Hunt. Soon, it was upvoted to the #3 Product of the Day. At the core of the service is the Uploadcare image processing engine and its smart crop and smart resize features.

The goal of the service is to become the most convenient image resizer for social media. In just one click, you’ll create covers, profile and post images, Open Graph, and thumbnails for Facebook, Twitter, Instagram, YouTube, TikTok, LinkedIn, Pinterest, and Snapchat.

In this article, Miloslav Voloskov, a software architect and Pixelhunter developer, explains how he solved finicky browser-specific problems.

The origins

I first needed social media images upon launching The Code of Conduct Generator a while ago. I resized them in Photoshop. One by one.

In my later projects, I tried using online tools, but I still needed to crop images one by one. The tools provided presets for common social media, but I had to manually adjust the crop for every one of them.

I really wanted to solve this problem, but at that time I wasn't familiar with what modern products had to offer. Instead, I was instantly thinking of OpenCV, object recognition and other hard machine learning stuff. Of course, as I had zero experience with such problems, the idea of learning something like OpenCV just to crop images made me instantly put the whole thing into the “later” bin.

Fast forward three years and Uploadcare comes to me with the exact same idea. But this time, their Intelligence API is there to tackle the hardest part of the equation: actually cropping images.

I just needed to integrate their API, which is really no more complicated than installing an npm package. I instantly agreed.

The art of creating problems

When designing products, there are usually two completely opposite extremes: explanatory design and encapsulating design.

The explanatory design philosophy states that a thing should provide the whole picture of how it functions upon just looking at it, so you need no instructions to operate it: you already know how it operates and what you can do with it.

The encapsulating design is the opposite: it tries to hide the implementation and essentially gives you a magic button that does the thing you need. I think of coffee makers as the perfect analogy, so I made this infographic:

the evolution of coffee machines as a metaphor for encapsulated design

Cameras, kitchen and home appliances, cars — basically everything that was ever designed — is somewhere on this spectrum. Unix tools were heavily inspired by the former. Iconic Apple products were surely designed with the latter in mind.

Going to the Apple Store for the first time as a kid who learned Pascal was a career-defining experience for me. I still find ad-hocs beautiful — in fact here's a whole article about them — but after being mesmerized by how Apple products looked and worked back in 2009, I still use the memories as a reference. 12 years later, I still stand by this idea: a magic button with the least background required to operate. In fact, both image-butter and fast-image-zoom were made with that very principle in mind.

So, what functionality do we need? You just upload an image, the API crops it, and you display the results. That's basically it. I could do it in about three hours, just like with The Code of Conduct Generator.

But here's the thing.

Pixelhunter was meant to be a Big Party kind of product. A product that is essentially a celebration of its own existence. It needed to be perfect.

Keeping in mind that Uploadcare essentially crops pictures by pressing their own AI magic button, I rolled with encapsulating design, designing a magic button of my own. Unsurprisingly, instead of just three hours, it took forty.

The result

Configuration

I wanted the whole thing to be maintainable without the need for a programmer, so I created the simplest DSL possible (really just a config file syntax) that would allow me to extend the product without writing code, with the only limitation being that you needed to stay inside the domain area.

Here's how it goes:

[{
  "app": "Facebook",
  "description": "Because of the size of the social network, Facebook images are mostly utilitarian. Don't expect people to passionately explore them and pin them to their moodboards. However, Facebook supports a wide variety of sizes, so you can utilize them to build a solid brand image.",
  "logoSrc": "logos/facebook.svg",
  "sizes": [{
    "name": "Cover photo",
    "width": 820,
    "height": 312,
    "simple": true,
    "positionSrc": "positions/facebook-cover-photo.svg",
    "description": "Pro Tip: Treat this like a billboard. Changing this picture regularly is usually a good decision. Experiment with images to see what your audience responds to best."
  }]
}]

Everything in this product obeys the config. The grid, the sizes, the logos, the texts, even the sizes counter in the hero block and its pluralization — everything is recalculated when this config file is changed.
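As a rough illustration of how a counter and its pluralization could follow from the config, here is a minimal sketch. It is not the actual Pixelhunter code: the helper names countSizes and sizesLabel are made up for this example, and the config is trimmed to the fields the sketch needs.

```javascript
// A trimmed-down config in the shape shown above (Facebook entry only).
const config = [{
  app: 'Facebook',
  sizes: [{ name: 'Cover photo', width: 820, height: 312, simple: true }],
}]

// Hypothetical helper: total number of sizes across every app in the config.
function countSizes(config) {
  return config.reduce((total, app) => total + app.sizes.length, 0)
}

// Hypothetical helper: counter text with simple English pluralization.
function sizesLabel(config) {
  const count = countSizes(config)
  return `${count} ${count === 1 ? 'size' : 'sizes'}`
}

const heroCounter = sizesLabel(config)
```

Because the label is derived from the data on every render, adding a size to the file is enough to change both the number and the noun form.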

Just using a browser, you can edit this file right on GitHub, adding logo images there if necessary and clicking “commit.” Everything else is done automatically.

Colors

The background is #f2f1ed — white but not too white. Why? Social media images often have solid color backgrounds, and often that color is white. Dynamically adjusting the product background to act in contrast with the color of the image would be over-engineering. Why not just roll with a white-ish background, so both dark and bright images look good on it? Plus, the additional warmness of the not-quite-gray really adds to it.

not too white background color

The main text and border color is #37352f — again, black but not quite.

I'd like to put a particular emphasis on this color choice. By simply changing colors, I avoided the whole mess of recognizing the main colors and setting them with JavaScript. Just changing colors is robust, minimal, and flexible enough to do the job. It just can't break.

I keep doing this because of the Jira paradox. See, when every task is marked “important,” none of them are. It's like writing everything in caps. I keep true black and true white unoccupied, saving them for a time when I need extra emphasis. When the blackest color you have, #000, is already occupied, you lose the ability to make a color blacker than what you already use for each and every text element of your website.

Our brain seems to perceive “dark gray on bright gray” as “black on white” to simplify things, but still sees the true colors when you add just a touch of them to draw extra attention.

Tabs

Just transitioning between active and non-active states was easy enough to do by just changing the background color from whatever the active color is to transparent. Sure, this is the technique everybody uses.

But I wanted to go further. I wanted the tab background to act as its own entity, sliding around geometrically.

simple vs advanced modes of Pixelhunter

To make this work, you'll need a separate background element that dynamically measures the width of the target tab it's currently transitioning to and its position to make the transition smooth. Thanks to CSS, it just takes the shortest path.
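The measuring step could look roughly like the sketch below. It assumes the tabs and the background element share the same positioned parent, so offsetLeft is directly usable, and that the background element has a CSS transition on width and transform. The name indicatorStyle is hypothetical.

```javascript
// Hypothetical helper: derives the sliding background's style from the target tab.
// `tab` is anything with offsetLeft and offsetWidth, such as a DOM button element.
function indicatorStyle(tab) {
  return {
    width: `${tab.offsetWidth}px`,
    transform: `translateX(${tab.offsetLeft}px)`,
  }
}

// In the browser, with `indicator` and `activeTab` being real elements:
// Object.assign(indicator.style, indicatorStyle(activeTab))
```

With the transition defined in CSS, assigning the new width and transform is all JavaScript has to do; the browser interpolates between the old and new values.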

Challenges arise when we make it responsive. The background element is only connected to its corresponding button through JavaScript, so it's unaware of any layout change unless we tell it.

We need to dynamically recalculate the background element position on a resize event. However, resize events fire very often while you resize the browser window. So, we need debounce. Still, there’s one case when a single resize event is triggered: when a user rotates their mobile device. So, we don't need just any debounce, we need a leading-edge one.

A leading-edge debounce executes the first call instantly and then goes into debouncing. So when a user rotates their device, the tab background starts recalculating its position right away.
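A debounce that runs the first call immediately and then suppresses the burst that follows could be sketched like this. It is a minimal version written for illustration, not the implementation Pixelhunter uses; it drops the suppressed calls entirely, whereas library versions can usually also fire once more at the end of the burst.

```javascript
// Run the first call at once, then ignore further calls until `wait`
// milliseconds have passed without any new call.
function debounceLeading(fn, wait) {
  let timer = null
  return function (...args) {
    const isFirstCall = timer === null
    clearTimeout(timer)
    // Every call restarts the quiet period; when it ends, the next call runs again.
    timer = setTimeout(() => { timer = null }, wait)
    if (isFirstCall) fn.apply(this, args)
  }
}

// In the browser (repositionIndicator is a hypothetical handler):
// window.addEventListener('resize', debounceLeading(repositionIndicator, 200))
```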

Switching from basic to custom mode

Screenreader

I often hear about accessibility “preventing” UX and frontend developers from doing their job properly. As a matter of fact, yes, WCAG colors and placing labels above form fields may confuse less skilled and the most stubborn designers.

But accessibility is not just about that. There’s no excuse for not filling in alt attributes on meaningful images and not putting aria-hidden="true" on purely decorative elements. When done properly, ARIA makes for a solid screenreader experience. Look, VoiceOver reading the meaning of completely custom, complicated control elements that took five paragraphs to describe!

screenreader-friendly capabilities in Pixelhunter

Grid

This product is all about pictures. There are really huge ones. There are long ones and wide ones. So I came up with the idea of occupying the whole available width and decided to choose the widest container I use: the 80rem one.

To display the images beautifully, I use the flexbox grid, which wraps while JavaScript is sorting images in a smart way:

(a, b) => {
  const byRatio = (a.width / a.height) - (b.width / b.height)
  if (byRatio !== 0) return byRatio
  return b.height - a.height
}

Pictures are sorted by their aspect ratio. As larger pictures shrink because of their fundamental responsiveness with max-width: 100% and height: auto, this technique yields better results than just sorting them by their height or area.
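To see the comparator at work, here it is applied to a few sizes. Only the Facebook cover comes from the config shown earlier; the other entries and the name bySize are made up for illustration.

```javascript
// The comparator from above: lowest width / height ratio first,
// and the taller image first when two ratios are equal.
const bySize = (a, b) => {
  const byRatio = (a.width / a.height) - (b.width / b.height)
  if (byRatio !== 0) return byRatio
  return b.height - a.height
}

const sizes = [
  { name: 'Cover photo', width: 820, height: 312 },
  { name: 'Small square', width: 200, height: 200 },
  { name: 'Story', width: 1080, height: 1920 },
  { name: 'Large square', width: 1080, height: 1080 },
]

// Copy before sorting so the original order stays untouched.
const sorted = [...sizes].sort(bySize).map((size) => size.name)
// Story, Large square, Small square, Cover photo
```

The portrait image comes first, the two squares follow with the larger one ahead, and the wide cover ends up last.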

sorting images of various sizes in Pixelhunter

Using just intrinsic grids and flexbox, I was able to achieve full width with no media queries. Yep, you read that correctly. As a matter of fact, I use only one media query in only one place, just to reposition it while keeping logos aligned. This block is entirely optional.

You can absolutely make a beautiful, 100% responsive UI without any width media queries at all. It's also faster and more responsive without them. Here's an amazing video by Heydon Pickering about such techniques.

Animations and perceived performance

Initially I wanted interactive 3D hero images like these:

the initial first screen of the Pixelhunter landing page included 3d animation

I did it and then I noticed something: my computer was getting really hot, and the website was laggy. It's a top-of-the-line Apple M1, so if even this kind of computer is struggling, what would other users experience?

I decided to drop the tilt, and the simple static animations were working as smooth as butter. I'd trade an insignificant UX quirk for a huge performance boost any day.

Also, I designed a nice appear animation for the grid back when the hero block wasn't there. But later, as the hero block grew taller than a common screen height, I decided to turn the appear animation off. You would have to scroll your mouse wheel violently right as the page loads in order to see it. It would also run during the most intensive part of the webpage display lifecycle: the first load, when everything is trying to load simultaneously.

Fallbacks

Blah blah blah, things break, blah blah. In the case of Pixelhunter, things started to break in development. You see, I use an ad blocker, and Pixelhunter supports common ad sizes, so my ad blocker was blocking some images even in the app served from my devserver.

Many people use ad blockers. Other than that, things really do break. What if the API doesn't respond for some reason? What if an image is not displayed because of a random connectivity problem?

A fallback solution is definitely necessary. I chose this gorgeous lush GIF by Erica Anderson:

the lush gif

The problem is that even when they're specified, the width and height of an image don't really do anything when the image itself is display: none. Since we’re guaranteed to know the necessary sizes (because we’re cropping images to fit them), we can always resize the fallback to whatever dimensions we like.

Here's the catch though: if we just apply width and height in pixels with CSS to a div, it will prevent that element from shrinking, breaking the grid. So I used the ancient aspect-ratio trick:

Look,

.ytVideo {
  padding-bottom: 56.25%;
}

a responsive 16 / 9 div!

Using that technique and always knowing the sizes, I can calculate the percentage dynamically:

paddingBottom: `${(props.height / props.width) * 100}%`
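As a quick check of the arithmetic, the same expression can be wrapped in a small function. The name aspectPadding is hypothetical; the formula is the one from the line above.

```javascript
// Percentage padding is resolved against the element's width,
// so height / width * 100 reserves a box with the right aspect ratio.
function aspectPadding(width, height) {
  return `${(height / width) * 100}%`
}

// A 1920x1080 image is 16:9, which gives the familiar 56.25%.
const fallbackStyle = { paddingBottom: aspectPadding(1920, 1080) }
```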

Since we don't use height: 0, there will always be room for the fallback download component:

a screen when there’s a problem displaying parts of images (they’re blurred)

When a pic won't load, I try to load its 10-by-10-pixel version and stretch it while preserving its aspect ratio, this time with background-size: cover, blur it and add some opacity. I also blur the GIF, which results in a gorgeous fallback that I don't want to hide. I'm not ashamed of displaying it in its full glory.

Every fallback dreams of being an easter egg:

lush gif for images that aren’t available for display

I really want to mention other tricks like asymmetrical transitions that have different timings in and out, but then this article would turn into a book.

The dread

Making magic happen is hard. UX magic is particularly fragile, frustrating and finicky under the hood.

I wanted vanilla-tilt and react-medium-image-zoom to work together smoothly. I didn't do all that obsessive research on Bezier timing functions and give that interview for nothing.

After tackling all those bugs with vanilla-tilt alone in Safari, initially just putting transform: translateZ(100px) everywhere but later establishing the system that I currently use with z-indices, I was facing a problem with RMIZ.

The latest published major version of RMIZ is v3. It didn't quite provide the control I needed, so I went to its GitHub and found out about v4. I installed it, and after four hours of straight up code atrocities just to make it work, I found out that Safari just wouldn’t cooperate and the animation was hopelessly slow.

I nosedived into all those issues just to find out about v5. As major versions commonly do, it featured an entirely new API. There was no documentation. So I installed it manually, specifying the version, and figured out how to use it just by reading the code.

It worked smoothly, but it wasn't quite what I needed. To improve the UX in my particular case, I needed to adjust some of the RMIZ code. So I went for it with patch-package.

After a whole new marathon of pure struggling to make it all work in Safari, I was helpless and decided to disable tilt in Safari altogether while keeping RMIZ. With Safari you try it, it works, and then it suddenly doesn't and you don't know why.

After all that I found out that patch-package somehow works in my environment but fails miserably on Netlify when patching RMIZ after successfully patching other packages. After another session of tampering with all this mess, I just downloaded RMIZ and put it there as-is.

Now everything was working smoothly. I definitely wouldn't recommend you try to integrate two complex, event-driven libs that both rely on 3D transitions together.

Summary

Pixelhunter is definitely much more complex than it needs to be. But through struggle, and fighting rapidly arising UX problems one by one, you can create a product that is nothing short of magical.

Of course Bootstrap works and it's much more stable. I just don't want to use it.

Privacy

All the pictures you upload are deleted after 24 hours. The source code is available on GitHub.

Credits

This project was sponsored by Uploadcare and it utilizes their Intelligence API. However, Uploadcare retained no control over this article, the product design, UX, code and other major decisions. They just came to me with the idea.