Counting Colours in an Image

Counting Colours Sample

Counting the number of distinct colours in an image doesn't sound like a particularly hard thing to do until you try it on a large 24-bit image. This article demonstrates one technique for counting the colours quickly.

Counting Colours

One basic technique for counting the colours in an image would be to create an array or collection to hold the colours and then start counting from the start of the image. However, consider what happens then if you load a 1280 x 1024 pixel image. Firstly, you have 1.3 million pixels to check, and secondly for every single one you need to check whether that colour has been counted before. With an array, that would potentially mean you could have to perform somewhere around 400 billion checks, and with a collection you'll probably find it dies if there are more than 10,000 unique colours.

Clearly you need a quicker method. One method is to use indexes to divide the problem down and some basic algorithmic techniques. For this sample I do the following:

  • Use a sorted list with insert optimisation and binary search to find the item.
  • Create distinct counters for each green component in the image.

Sorted Lists and Binary Search

If a list is sorted, you can find an item in it much more quickly that you can in an unsorted one. This technique is called "Binary Search" and works as follows:

  1. Start in the middle of the list. Compare the middle item to the item you're trying to find. In the (unlikely) event it matches then the search is complete. Otherwise:
  2. If the item is less than the middle item, then since the list is sorted it must reside somewhere between the start and the middle. Likewise, if the item is greater than the middle item, it must reside between the middle and the end. Go back to step 1 and repeat with the appropriate shorter list.

Obviously to use this technique you need the list to be sorted in the first place. You can perform sorting as you add items to a list using the same Binary Search technique to work out where the insertion point is; the only thing that prevents this normally in VB is that it is slow to insert an item into an array. Luckily if you're creating an array of Longs, you can make inserting quicker by using CopyMemory. This technique is described in the article A Fast Index-Based Collection.

Enhancing the code in that article to provide a binary search method is straightforward. Here's the code you need:

Public Function BinarySearch( _
      ByVal lFor As Long, _
      ByRef lInsertIndex As Long _
   ) As Long
   lInsertIndex = 0
   BinarySearch = plBinSearch(lFor, lInsertIndex, 1, m_lCount)
End Function

Private Function plBinSearch( _
      ByVal lFor As Long, _
      ByRef lInsertIndex As Long, _
      ByVal lStart As Long, _
      ByVal lEnd As Long _
   ) As Long
   If lEnd - lStart > 1 Then
      Dim iP As Long
      iP = lStart + (lEnd - lStart) \ 2
      If m_lItem(iP) = lFor Then
         ' Success:
         plBinSearch = iP
      ElseIf m_lItem(iP) > lFor Then
         ' the centre element is greater than the
         ' item we're searching for.  Set the end
         ' to the centre element & repeat:
         lEnd = iP - 1
         plBinSearch = plBinSearch(lFor, lInsertIndex, lStart, lEnd)
      ElseIf m_lItem(iP) < lFor Then
         ' the centre element is less than the
         ' item we're searching for.  Set the start
         ' to the centre element & repeat:
         lStart = iP + 1
         plBinSearch = plBinSearch(lFor, lInsertIndex, lStart, lEnd)
      End If
   Else
      ' 1 or 2 items left.  Check if either
      ' match the search item.
      If (lEnd < lStart) Then
         lInsertIndex = lStart
      Else
         If m_lItem(lEnd) = lFor Then
            plBinSearch = lEnd
         ElseIf m_lItem(lStart) = lFor Then
            plBinSearch = lStart
         ElseIf lFor > m_lItem(lEnd) Then
            lInsertIndex = lEnd + 1
         ElseIf lFor > m_lItem(lStart) Then
            lInsertIndex = lEnd
         Else
            lInsertIndex = lStart
         End If
      End If
   End If
End Function

The function returns the index of the item if found, otherwise it returns 0 (the collection is implemented with 1-based indexes). Adding the second lInsertIndex parameter lets you know where you should insert an item in the array if it isn't found.

So with this in place, counting colours can be implemented more quickly.

Counting by sub-division

Even using this technique, using a single array to store all possible colours results in too slow a count. To speed it up, you can subdivide the colour set, in effect adding an index to the colours. In the sample I've chosen the green component of the colour as the basis for the index, because the green component of any image is the brightest of the RGB components in an image. Creating 255 sorted sub arrays using a class may sound like madness but as you will see if you try modifying the code to use a single array for an image with a large number of colours its much, much quicker.

Performance

This technique is much better than any of the naive implementations. A 1280x1024 image with 264,000 distinct colours can be counted in just over a second on my current Athlon XP 2200 machine (which is at the time of writing having a number of performance difficulties all of its own). However, I note with envy that Paint Shop Pro can do the same task more quickly (less than 0.3s). How do they do it? If you know, or have an ideas then I'd love to know!