How to Implement Search By Color When All You Have is A Good Coffee

Hey there! Last week I started working on a rewrite of Shefa Products, a wholesale merchandising e-commerce site for Argentina.

The rewrite is not only to move it from AngularJS (version 1, do you remember it?) to Next.js and the Adonis Framework, but also to be able to add features faster and get the SEO benefits of SSG and great Lighthouse scores.

In planning the rewrite, I also wanted to add some new features.

So after talking to some customers, one of my findings was this:

Clients search for products that match the color of their brand.

That makes total sense. Merchandising is an industry where you basically pick a backpack or a pen, stamp your logo on it, and give it to your (future) customers or best employees. You totally need the backpack or pen colors to match your brand’s.

So “let’s make it easier to find matching products for your brand” — I thought.

(Note to the readers: the rewrite isn’t live yet, but if you want to see the results of this feature I added a video at the end of this blogpost showcasing it. In case you want to learn about the difficulties I had and how I came up with a passable solution then read on:) )

How hard could it be?

What set this problem apart is that the pictures of each product weren’t associated with its color information.

That means that, unlike Amazon, where you pick a product color from a dropdown list and the image of the product in that color automatically comes into focus, here that wasn’t possible: I had both a list of color variations for a product and a list of images for the product, but they didn’t match.

  • I could have more pictures of the item than color variations.
  • I could have more color variations than pictures.
  • Both lists were completely unrelated.

So this was a big problem and there wasn’t an easy heuristic to solve it.

What could I do?

Thinking about the problem domain

The only information we have about colors is the pictures themselves. So we must infer it from there.

We need to tell if a picture is green or not.

For example, of the following two images, the first one would be green and the second one would not.

But, what about this one?

Is the picture a green picture?

Not really, it’s mostly black!

But does it match with a client that has a green logo and brand?

Yes it does!

The zipper is green and would go great with a green logo on the bag.

Then we must reformulate what we stated earlier: we don’t want green images, we want images of products that go well with a green logo on them.

Maybe what makes a product go well with the color of a logo is a little detail, like a green zipper in this case. So counting “green” pixels would not be a good proxy for our goal, because the number of green pixels here would be low.

We could create a deep neural network and teach it to classify which products we believe “would go OK when paired with a green logo”. But training a model and setting up a Flask server seemed like a chore and overkill (maybe this is a great idea for a startup: I’d like something as easy as Fast.ai, but for deploying models to a live server). I wanted this feature out in three days at most.

Can we implement something easier than a Convolutional Neural Network?

Well, there is an amazing package called “color thief” that, given an image, tells you the palette of colors that appear in it.

There we can see that a green color appears in the third position of the palette. Great!

Now instead of working with an image, we can work with a list of its predominant colors.

But how do we tell if a list of colors constitutes an image that goes well with a green color?

A good idea would be to check whether any of the colors in the list is “greenish”.

But how do we tell if a color is “greenish”?

From a classification problem to a regression problem

Telling whether a random color is a kind of green or not is a much harder problem than telling how green it is.

Telling whether it’s green not only requires knowing how close that color is to pure green, but also forces you to define a cutoff that separates colors that count as green from colors that don’t.

So instead of giving a yes or no answer (like “does this specific backpack match a green brand?”), maybe we can formulate a score for the “greenness” of a color, and use it to sort the products by how close their pictures’ colors are to the color we are trying to match.

Now we are on a quest to create a formula that tells us how green a color is. How do we do that?

Well, color theory offers many ways to state that as a mathematical formula.

But for simplicity I just took the Euclidean distance between the given color and (0,255,0), which is pure green in RGB values.
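
A minimal sketch of that score in JavaScript could look like the following. The function name rgbDistance and the GREEN constant are made up for illustration, and colors are assumed to be [r, g, b] arrays with values from 0 to 255.

```javascript
// Euclidean distance between two RGB colors given as [r, g, b] arrays.
// Lower means "closer"; 0 means the two colors are identical.
function rgbDistance(a, b) {
  return Math.hypot(a[0] - b[0], a[1] - b[1], a[2] - b[2]);
}

// Pure green in RGB, the reference color used in this post.
const GREEN = [0, 255, 0];

console.log(rgbDistance([0, 255, 0], GREEN));   // 0
console.log(rgbDistance([10, 220, 10], GREEN)); // about 37.75
console.log(rgbDistance([0, 0, 0], GREEN));     // 255
```

A greenish color like (10,220,10) ends up much closer to pure green than black does, which is exactly the ordering we want.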

Why didn’t I just check the green component of the vector?

Well, because I need a formula that also lets me sort by pink, yellow, or brown, and those colors aren’t represented by a single component of the RGB vector.

I’m pretty sure there are better ways to do this that would give better results, but from what I’ve seen this is simple and good enough.
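
To show how the same distance works for any target color, here is a rough in-memory sketch that ranks images by their palettes. The names paletteScore and rankByColor and the sample data are hypothetical. Summing the distances over the palette is an assumption borrowed from the SQL query later in this post, not the only way to combine them.

```javascript
// Sum of Euclidean distances from every palette color to the target color.
// Lower totals mean the image as a whole sits closer to the target.
function paletteScore(palette, target) {
  return palette.reduce(
    (total, [r, g, b]) =>
      total + Math.hypot(r - target[0], g - target[1], b - target[2]),
    0
  );
}

// Return a new array of images sorted from best to worst match.
function rankByColor(images, target) {
  return [...images].sort(
    (a, b) => paletteScore(a.palette, target) - paletteScore(b.palette, target)
  );
}

// Hypothetical sample data: each palette is a list of [r, g, b] colors.
const images = [
  { url: 'black-bag.jpg', palette: [[20, 20, 20], [60, 60, 60]] },
  { url: 'pink-pen.jpg', palette: [[250, 180, 200], [255, 255, 255]] },
  { url: 'yellow-mug.jpg', palette: [[250, 220, 30], [240, 240, 240]] },
];

const PINK = [255, 192, 203];
console.log(rankByColor(images, PINK).map((image) => image.url));
```

With the pink target the pink pen comes first, and swapping the target for yellow (255,255,0) puts the yellow mug first, with no change to the formula.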

Ok, but there are thousands of products, how do we make it fast?

Ok ok, I’ll give it to you. JavaScript is not the best language for heavy processing.

And given that we have thousands of products, with dozens of images each and with dozens of colors for each image palette, implementing a search of that size in JavaScript would quickly bring down our performance.

My best alternative was doing it directly in PostgreSQL, which is the database we will be using, and it’s blazing fast.

How to Work with Colors in PostgreSQL

Once we’ve decided we will do all this processing on the DB, the next step is finding out how!

PostgreSQL doesn’t really have a color type to represent colors the way it has a varchar type to represent strings.

BUT! There is the cube type!

The cube type lets you work with coordinates in a Euclidean space: calculate distances, check if a point is inside a given region, etc.

It’s pretty amazing to be honest!

So how do we set it up?

The first step is activating the extension. In the PostgreSQL console you can run:

CREATE EXTENSION IF NOT EXISTS cube;

The next step will be to create a colors table and associate many colors with a single image. I won’t go into detail on how to do that, but there are thousands of tutorials on how to create and link tables. I’ve also written a humble guide to database schema design which you may find helpful.

Now, in the colors table you can add a column of type cube which will hold the actual value of our colors, that is, the (10,220,10) RGB vector for our “greenish” color:

ALTER TABLE colors ADD COLUMN pantone cube;

Now, in JavaScript, you can save the list of colors for an image like this:

// Create 4 rows in the colors table, one for each color in the image palette.
// Each color is an [r, g, b] array, bound as the text '(r,g,b)' and cast to cube.
await Promise.all(list_of_colors.slice(0, 4).map((color) =>
  Database.raw('INSERT INTO colors (pantone, image_id) VALUES (?::cube, ?)', [`(${ color.join(',') })`, imageInstance.id])))

Notice the ::cube typecast that we are applying to the color variable which holds an array of length 3 with the RGB components in each respective position.
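
Since the value ends up inside a SQL statement, it may be worth validating the color before building the cube text. Here is a small sketch of such a helper. toCubeLiteral is a hypothetical name, and the 0 to 255 range check assumes plain 8-bit RGB values.

```javascript
// Turn an [r, g, b] array into the text form that PostgreSQL's cube type
// accepts for a point, e.g. [10, 220, 10] -> '(10,220,10)'.
// Throws if the input is not three numbers in the 0-255 range.
function toCubeLiteral(color) {
  const valid =
    Array.isArray(color) &&
    color.length === 3 &&
    color.every((c) => Number.isFinite(c) && c >= 0 && c <= 255);
  if (!valid) {
    throw new TypeError(
      `Expected [r, g, b] with values 0-255, got: ${JSON.stringify(color)}`
    );
  }
  return `(${color.join(',')})`;
}

console.log(toCubeLiteral([10, 220, 10])); // (10,220,10)
```

The returned string is what gets cast with ::cube on the database side.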

We must do this for each one of the images in our database. So create a script that processes them. Personally I just added a custom hook that runs each time a new image is saved to the database.

Finally, to query the data you can use the following SQL query, which will not only return the products sorted by how green they are, but also, for each product, the greenest of its images:

select *
from (
         SELECT DISTINCT ON (p.id) near_images.distance, p.*, pi.url
         from product_images pi
                  join (select sum(cube_distance('(0,255,0)'::cube, pantone)) distance, image_id
                        from colors
                        group by image_id
                        order by 1 asc) near_images on near_images.image_id = pi.id
                  join products p on pi.product_id = p.id
         order by p.id, near_images.distance asc
     ) result
order by result.distance, result.id;

Be sure to change the (0,255,0) to another color when the client requests another color.
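
If the requested color arrives as a hex string (from a color picker, for example), a sketch like the one below could turn it into the value that replaces the hardcoded literal. Both hexToRgb and targetCube are hypothetical helpers, and only 6-digit hex colors are handled here.

```javascript
// Convert a '#rrggbb' (or 'rrggbb') hex string into an [r, g, b] array.
function hexToRgb(hex) {
  const match = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex.trim());
  if (!match) {
    throw new Error(`Not a 6-digit hex color: ${hex}`);
  }
  // match[1..3] hold the red, green, and blue channels as hex pairs.
  return match.slice(1).map((channel) => parseInt(channel, 16));
}

// Build the text to use in place of the hardcoded '(0,255,0)' literal.
function targetCube(hex) {
  return `(${hexToRgb(hex).join(',')})`;
}

console.log(hexToRgb('#00ff00'));   // [ 0, 255, 0 ]
console.log(targetCube('#ff69b4')); // (255,105,180)
```

Passing the result to the query as a bound parameter, for example as cube_distance(?::cube, pantone), keeps user input out of the SQL text, assuming your query builder supports positional bindings.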

Showing the results!

So, the new version of Shefa Products is still in the works, but here is a preview of the feature that I’ve uploaded to YouTube!

Till next time!

If you’ve spotted any errors or have any suggestions, please tell me!

5 thoughts on “How to Implement Search By Color When All You Have is A Good Coffee”

  1. Johannes Schmitz

    What about, additionally to predefined colors, letting the customer choose arbitrary colors from a color picker and/or by entering rgb values into a pop over dialog?

  2. Alex Campbell

    Hi Mike, cool approach, one thing I remember from a computer vision tutorial a while ago was that using a different colourspace such as HSV would allow you to search colours more easily, as the H stands for hue, the colour component, and so you would be able to reference and compare a single value unlike RGB.

    1. +1 to this, I was thinking the same! It probably would be more advanced than just comparing hue (because you’d end up with colors like brown and orange being classified the same, and white could have the hue of any color), but in terms of color similarity, HSV would likely do a better job!
      That being said, perfect is the enemy of good enough, and if it ain’t broke don’t fix it!

  3. Thanks for sharing. I work in e-commerce and it’s great to see this kind of expert insight into one of the many interesting corners of the domain. I was already a postgres fan and once again I come away having an even higher opinion of it!
