Using ColdFusion to Search Images Based on Color

Nov. 18, 2006

Introduction

This page describes a basic method for indexing and searching images (digital photographs) by color using ColdFusion and CFImageHistogram.

The method indexes and searches color in images quickly using ColdFusion. I do not claim that it is the best way to search images by color. There are many more in-depth and precise methods; for more information, I suggest reading about content-based image retrieval systems (CBIRS).

*NOTE: The code examples below range from pseudo-code to about 95% of what you will need to index and search images. Images and their color information can be stored in so many ways that it is hard to demonstrate a one-size-fits-all solution.

Where to Start

To search for colors in an image, we have to know something about the colors that make up that image. A color histogram is a good way to find out, because it tracks how frequently each color value occurs in the image. For every pixel, each of the red, green, and blue components is counted according to its value (0-255).

To get the color histogram of an image you need to inspect each pixel and calculate the red/green/blue components of that pixel. This can be a very slow process, so it may be best to resize a large image before trying to calculate the color histogram.
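As a rough illustration of what that per-pixel inspection involves, here is a minimal sketch. It is written in Python only because the counting is language-neutral; it is not the CFImageHistogram code. The function name `color_histogram` and the representation of an image as a list of (red, green, blue) tuples are assumptions made for the example.

```python
def color_histogram(pixels):
    """Count how often each 0-255 value occurs in each channel.

    `pixels` is an iterable of (red, green, blue) tuples (an assumed,
    simplified stand-in for a decoded image).
    """
    hist = {"r": [0] * 256, "g": [0] * 256, "b": [0] * 256}
    for red, green, blue in pixels:
        hist["r"][red] += 1
        hist["g"][green] += 1
        hist["b"][blue] += 1
    return hist


# A tiny four-pixel "image": two pure red pixels, one black, one white.
sample_pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 0), (255, 255, 255)]
sample_hist = color_histogram(sample_pixels)
print(sample_hist["r"][255])  # 3: three pixels have red = 255
```

Every pixel is visited once, so the cost grows with the pixel count, which is why shrinking a large image first saves time.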

Some Statistics

CFImageHistogram creates an array for each of the three color channels (R-G-B). Each array has 256 cells, one for each possible value (0-255), and each cell stores the number of times that value occurs. While the arrays are nice to have, what matters more for searching is the mean and standard deviation of each channel's array. The mean tells us the average color, and the standard deviation gives us an idea of how spread out the color range is.
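To make the two statistics concrete, the following Python sketch derives them from one channel's count array. It assumes a population standard deviation (dividing by the total count); whether CFImageHistogram uses the population or the sample formula is not stated here, so treat that as an assumption. The name `channel_stats` is hypothetical.

```python
import math


def channel_stats(counts):
    """Return (mean, standard deviation) for one channel's histogram.

    `counts[v]` is the number of pixels whose channel value is `v`.
    Uses the population formula, which is an assumption of this sketch.
    """
    total = sum(counts)
    if total == 0:
        return 0.0, 0.0
    mean = sum(value * count for value, count in enumerate(counts)) / total
    variance = sum(
        count * (value - mean) ** 2 for value, count in enumerate(counts)
    ) / total
    return mean, math.sqrt(variance)


# Four pixels that all have the value 100: no spread at all.
flat = [0] * 256
flat[100] = 4
print(channel_stats(flat))    # (100.0, 0.0)

# Two pixels at 0 and two at 200: same mean, large spread.
spread = [0] * 256
spread[0] = 2
spread[200] = 2
print(channel_stats(spread))  # (100.0, 100.0)
```

The two arrays share a mean of 100 but describe very different regions, which is why the standard deviation is worth storing alongside the mean.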

Splitting the Image into Sub-Images

This statistical color information works well: we can take an image and find its average red, green, and blue. We can store this information in a database and query for images that have, for example, a high occurrence of blue. The problem is that most images contain a mix of colors, so when you create the color histogram of the entire image, the result is the average color of the whole image, and that is most likely some grayish color.

Example image and its mean color, illustrating that the average color of an entire image is not accurate enough to search with.

A solution is to split the image into sub-images. The color histogram, mean, and standard deviation of a smaller region are more likely to reflect the colors actually present in that part of the image, so we can search within the image more effectively:

Example image showing that the mean colors of the 16 sub-images better reflect the colors in the original image.
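The effect can be reproduced with a toy image. The Python sketch below is an illustration only: the image is an assumed list of rows of (red, green, blue) tuples, and `mean_color` and `bin_means` are hypothetical helper names. It compares the mean of a whole half-red, half-blue image with the means of its four quarters, and assumes the image is at least `size` pixels in each direction.

```python
def mean_color(pixels):
    """Average each channel over a list of (red, green, blue) pixels."""
    count = len(pixels)
    return tuple(sum(pixel[c] for pixel in pixels) / count for c in range(3))


def bin_means(image, size):
    """Split `image` (a list of rows of pixels) into size x size bins and
    return the mean color of each bin, row by row."""
    height, width = len(image), len(image[0])
    means = []
    for row in range(size):
        for col in range(size):
            top, bottom = height * row // size, height * (row + 1) // size
            left, right = width * col // size, width * (col + 1) // size
            region = [
                image[y][x]
                for y in range(top, bottom)
                for x in range(left, right)
            ]
            means.append(mean_color(region))
    return means


RED, BLUE = (255, 0, 0), (0, 0, 255)
# 4x4 test image: left half pure red, right half pure blue.
image = [[RED, RED, BLUE, BLUE] for _ in range(4)]

whole = mean_color([pixel for row in image for pixel in row])
quarters = bin_means(image, 2)
print(whole)     # (127.5, 0.0, 127.5): a purple that appears nowhere in the image
print(quarters)  # the four bins come out pure red, blue, red, blue
```

A search for red or blue would miss this image if only the whole-image mean were stored, but would find it through the per-bin means.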

The more sub-images (or bins) you use, the more accurate your searches can be; however, the trade-off is longer processing time when indexing the images and longer query time when searching.

Indexing the Images

The code examples below might be a little unclear, so I've included a basic flow chart that should help in understanding the process of indexing the images' color information.

The first step in getting this color information out of an image is to load it into a buffered image so that we can manipulate it:

<cfscript>
 //image to get color information from
 photo = expandPath("photo.jpg");
 //JPEG codec (a Sun-internal class; newer JVMs may not ship it, in which case javax.imageio.ImageIO.read() is the standard alternative)
 jpegCodec = createObject("java", "com.sun.image.codec.jpeg.JPEGCodec");
 //file input
 fileInputStream = createObject("java", "java.io.FileInputStream").init(photo);
 //decodes the JPEG
 decoder = jpegCodec.createJPEGDecoder(fileInputStream);
 //get buffered image
 bufferedImage = decoder.decodeAsBufferedImage();
 //close the stream now that the image is in memory
 fileInputStream.close();
 imageHeight = bufferedImage.getHeight();//buffered image height
 imageWidth = bufferedImage.getWidth();//buffered image width
</cfscript>

Now that we have a buffered image, we need to isolate the sub-images within it. The getSubimage method of the buffered image does this: it returns a new buffered image for the rectangle defined by its arguments:

subImage = bufferedImage.getSubimage(x,y,width,height);
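The four arguments have to be worked out for every bin. The Python sketch below shows one way to do that arithmetic, truncating to whole pixels and numbering the bins row by row. The function name `bin_rectangles` is hypothetical, and the sketch only computes the rectangles; it does not touch an image.

```python
def bin_rectangles(image_width, image_height, size):
    """Return (bin, x, y, width, height) for each of the size * size bins.

    All values are truncated to whole pixels, so a few edge pixels may
    fall outside every bin when the image does not divide evenly.
    """
    bin_width = image_width // size
    bin_height = image_height // size
    rectangles = []
    bin_number = 0
    for row in range(size):
        for col in range(size):
            bin_number += 1
            x = image_width * col // size
            y = image_height * row // size
            rectangles.append((bin_number, x, y, bin_width, bin_height))
    return rectangles


rects = bin_rectangles(103, 50, 4)
print(rects[0])   # (1, 0, 0, 25, 12)
print(rects[-1])  # (16, 77, 37, 25, 12)
```

Because both the offset and the size are rounded down, x + width never exceeds the image width (and likewise for the height), so every rectangle stays inside the image, which getSubimage requires.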

Once we have an isolated region of an image we need to get the color histogram and color statistics of it. This can be done with the CFImageHistogram:

<cfscript>
 //create image histogram object
 imageHistogram = createObject("component","imageHistogram").init();
 //set the buffered subimage into the image histogram object
 imageHistogram.setBufferedImage(subImage);
 //get the color histogram and color statistics struct
 hist = imageHistogram.getColorHistogram();
</cfscript>

We need to store this color histogram information in a database table so that we can query it. For each sub-image we will store a unique ID for the image (either a file name or a database ID), the bin number, and the means and standard deviations. CFImageHistogram returns a struct containing the histogram and color statistics for the R-G-B channels of the image. Here is an example of what that query might look like:

<!--- histogram struct name = "hist" --->
<cfquery datasource="#DSN#">
INSERT INTO colors (
photoid,
redmean,
redsd,
greenmean,
greensd,
bluemean,
bluesd,
bin
)
VALUES (
<cfqueryparam value="#photoid#" cfsqltype="cf_sql_char" />,
<cfqueryparam value="#hist.r.mean#" cfsqltype="cf_sql_double" />,
<cfqueryparam value="#hist.r.standarddeviation#" cfsqltype="cf_sql_double" />,
<cfqueryparam value="#hist.g.mean#" cfsqltype="cf_sql_double" />,
<cfqueryparam value="#hist.g.standarddeviation#" cfsqltype="cf_sql_double" />,
<cfqueryparam value="#hist.b.mean#" cfsqltype="cf_sql_double" />,
<cfqueryparam value="#hist.b.standarddeviation#" cfsqltype="cf_sql_double" />,
<cfqueryparam value="#bin#" cfsqltype="cf_sql_numeric" />
)
</cfquery>

While looping through the 16 sub-images, we store this information under the unique ID or file name of the image, together with the bin number, so that we can query the data later.

Putting the above code snippets together, we can index images. Below is a more in-depth example of how to index images listed by a database query or a cfdirectory, storing color data for each sub-image within each image:

<!--- cfquery of cfdirectory here --->
<cfoutput query="photos">
	<cfscript>
		//size of sub images (WARNING this number will be squared so 4 = 16 subimages)
		size = 4;
		//max size in pixels of subimage (saves processing time)
		maxSize = 50;
		//Jpeg Codec
		jpegCodec = createObject("java", "com.sun.image.codec.jpeg.JPEGCodec");
		//file open
		fileInputStream = createObject("java", "java.io.FileInputStream").init("#directory#\#photos.photofilename#");
		//decodes the JPEG
		decoder = jpegCodec.createJPEGDecoder(fileInputStream);
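		//give us an image to scale
		image = decoder.decodeAsBufferedImage();
		//release the file handle once the image is decoded
		fileInputStream.close();
		imageHeight = image.getHeight();
		imageWidth = image.getWidth();
		//create the image histogram object used for each subimage below
		imageHistogram = createObject("component","imageHistogram").init();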
	</cfscript>
	<cfset bin = 0 />
	<cfloop from="1" to="#size#" index="i">
		<cfloop from="1" to="#size#" index="ii">
			<cfset bin = bin + 1 />
			<cfscript>
				//get subimage
				x = imageWidth/size*(ii-1);
				y = imageHeight/size*(i-1);
				subImage = image.getSubimage(JavaCast("int",x),JavaCast("int",y),JavaCast("int",imageWidth/size),JavaCast("int",imageHeight/size));
				
				subImageHeight = subImage.getHeight();
				subImageWidth = subImage.getWidth();
				sizeRatio = subImageWidth/subImageHeight;
				
				//work out whether the subimage is portrait or landscape and make the longer side equal to maxSize
				if (subImageWidth gte subImageHeight) {
					subImageWidth = maxSize;
					subImageHeight = round(maxSize/sizeRatio);
				} else {
					subImageWidth = round(maxSize * sizeRatio);
					subImageHeight = maxSize;
				}
				
				//create a scaled image
				scaledImage = subImage.getScaledInstance(JavaCast("int", subImageWidth), JavaCast("int", subImageHeight), subImage.SCALE_SMOOTH);
				BufferedImage = createObject("java", "java.awt.image.BufferedImage").init(JavaCast("int", subImageWidth), JavaCast("int", subImageHeight), subImage.TYPE_INT_RGB);
				//draw the image into the buffered image
				graphics = BufferedImage.createGraphics();
				graphics.drawImage(scaledImage, 0, 0, Javacast("null", ""));
				//release the graphics context
				graphics.dispose();
					
				//get histogram
				imageHistogram.setBufferedImage(BufferedImage);
				hist = imageHistogram.getColorHistogram();
			</cfscript>
	
			<cfquery datasource="#DSN#">
			INSERT INTO colors (
			photoid,
			redmean,
			redsd,
			greenmean,
			greensd,
			bluemean,
			bluesd,
			bin
			)
			VALUES (
			<cfqueryparam value="#photos.photoid#" cfsqltype="cf_sql_char" />,
			<cfqueryparam value="#hist.r.mean#" cfsqltype="cf_sql_double" />,
			<cfqueryparam value="#hist.r.standarddeviation#" cfsqltype="cf_sql_double" />,
			<cfqueryparam value="#hist.g.mean#" cfsqltype="cf_sql_double" />,
			<cfqueryparam value="#hist.g.standarddeviation#" cfsqltype="cf_sql_double" />,
			<cfqueryparam value="#hist.b.mean#" cfsqltype="cf_sql_double" />,
			<cfqueryparam value="#hist.b.standarddeviation#" cfsqltype="cf_sql_double" />,
			<cfqueryparam value="#bin#" cfsqltype="cf_sql_numeric" />
			)
			</cfquery>
		</cfloop>
	</cfloop>
</cfoutput>

This process of storing the data for each sub-image needs to be repeated for every image you would like to search, which you can do within a cfoutput over a cfdirectory listing or over a query result of image locations. The process can be very slow: each image may take 1-5 seconds to process, depending on the size of the original image, the size and number of sub-images, and your server's speed. It can also create a large number of records; for example, 200 images at 16 bins per image = 3200 records.
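For planning purposes, the arithmetic above can be wrapped in a small helper. This Python sketch is only a back-of-the-envelope estimate: `index_estimate` is a hypothetical name, and the default of 1 to 5 seconds per image is simply the rough range quoted above, not a measurement.

```python
def index_estimate(image_count, size, seconds_per_image=(1, 5)):
    """Estimate the row count and the indexing time range in seconds.

    `size` is the number of bins along one side, so each image yields
    size * size rows. `seconds_per_image` is an assumed (fastest, slowest)
    pair taken from the rough figure in the text.
    """
    records = image_count * size ** 2
    fastest = image_count * seconds_per_image[0]
    slowest = image_count * seconds_per_image[1]
    return records, fastest, slowest


records, fastest, slowest = index_estimate(200, 4)
print(records, fastest, slowest)  # 3200 200 1000
```

Doubling `size` from 4 to 8 quadruples the row count (12,800 rows for the same 200 images), which is the storage side of the accuracy trade-off.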

Searching the Indexed Images

Once we have a database table populated with the color information for each image region, we need to figure out how to search for a color. We will search on a single color and look for the images whose sub-images match it (or come close to it) most often. To do that, we query the database for rows whose mean values are closest to the selected r-g-b color, using a SQL 'between' on each channel.
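The bounds for each 'between' are just the selected value minus and plus a tolerance. The Python sketch below computes them for all three channels; `between_bounds` is a hypothetical name, and clamping to 0-255 is an optional tidy-up added for the example (out-of-range bounds would match the same rows).

```python
def between_bounds(color, tolerance):
    """Return {channel: (low, high)} for a target (red, green, blue) color."""
    bounds = {}
    for channel, value in zip(("red", "green", "blue"), color):
        low = max(0, value - tolerance)      # never below 0
        high = min(255, value + tolerance)   # never above 255
        bounds[channel] = (low, high)
    return bounds


bounds = between_bounds((250, 128, 5), 10)
print(bounds["red"])    # (240, 255)
print(bounds["green"])  # (118, 138)
print(bounds["blue"])   # (0, 15)
```

A row matches only when all three of its means fall inside their respective ranges, which is why the conditions are combined with AND.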

A 'between' query returns every row whose colors fall within those ranges. That's fine, but we'd like to know which images have the most occurrences, so we add grouping and sorting to the query so that it returns the images in order, starting with those that have the highest occurrence of the selected color:

<!--- set how far (+/-) a stored mean may be from the selected color --->
<cfset sd = 10 />
<!--- query for colors expecting red,green,blue 0-255 from form --->
<cfquery datasource="#datasource#" name="search">
SELECT Count(colors.photoid) AS CountOfphotoid,
	cfcphotoblogphotos.photoid,
	cfcphotoblogphotos.photoTitle,
	cfcphotoblogphotos.photoThumbFileName,
	avg(colors.redmean) as redmean,
	avg(colors.bluemean) as bluemean,
	avg(colors.greenmean) as greenmean
FROM cfcphotoblogphotos RIGHT JOIN colors ON cfcphotoblogphotos.photoID = colors.photoid
WHERE 1=1
AND colors.redmean between <cfqueryparam value="#form.red-sd#" cfsqltype="cf_sql_numeric" />  and <cfqueryparam value="#form.red+sd#" cfsqltype="cf_sql_numeric" />
AND colors.greenmean between <cfqueryparam value="#form.green-sd#" cfsqltype="cf_sql_numeric" />  and <cfqueryparam value="#form.green+sd#" cfsqltype="cf_sql_numeric" />
AND colors.bluemean between <cfqueryparam value="#form.blue-sd#" cfsqltype="cf_sql_numeric" />  and <cfqueryparam value="#form.blue+sd#" cfsqltype="cf_sql_numeric" />
<!--- every non-aggregated column in the SELECT also appears in the GROUP BY --->
GROUP BY colors.photoid, cfcphotoblogphotos.photoid, cfcphotoblogphotos.photoTitle, cfcphotoblogphotos.photoThumbFileName
ORDER BY Count(colors.photoid) DESC;
</cfquery>

Chances are fairly high that few or no sub-images will match a selected color within a narrow range, so you may need to widen the range that a record's color means can fall into. You may even want to run the query in a loop, expanding the range each time, until you have, say, 10 results.
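The widening loop can be sketched without a database. The Python example below works on an in-memory list standing in for the colors table, so it illustrates the control flow rather than the ColdFusion query itself. The names `search` and `search_widening`, the step of 10, and the sample rows are all assumptions made for the illustration.

```python
from collections import Counter


def search(bins, target, tolerance):
    """Count, per photo, the bins whose mean color is within `tolerance`
    of `target` on every channel; most matches first."""
    hits = Counter()
    for photo_id, mean in bins:
        if all(abs(m - t) <= tolerance for m, t in zip(mean, target)):
            hits[photo_id] += 1
    return hits.most_common()


def search_widening(bins, target, wanted=10, start=10, step=10, limit=255):
    """Repeat the search with a wider tolerance until `wanted` photos are
    found or the tolerance reaches `limit`."""
    tolerance = start
    results = search(bins, target, tolerance)
    while len(results) < wanted and tolerance < limit:
        tolerance = min(limit, tolerance + step)
        results = search(bins, target, tolerance)
    return results, tolerance


# (photo id, (red mean, green mean, blue mean)) for a few made-up bins.
bins = [
    ("a", (200, 10, 10)),
    ("a", (205, 12, 8)),
    ("b", (190, 20, 20)),
    ("c", (90, 90, 90)),
]
print(search_widening(bins, (200, 10, 10), wanted=2))
# ([('a', 2), ('b', 1)], 10)
print(search_widening(bins, (200, 10, 10), wanted=3))
# ([('a', 2), ('b', 1), ('c', 1)], 110)
```

The second call shows the cost of widening: photo "c" only appears once the tolerance reaches 110, by which point the match is loose, so a cap lower than 255 may be sensible in practice.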

Final Notes

This page is meant as a starting point. There is much more that could be done or improved with this method. The search query could definitely be improved to take better account of the standard deviation of the sub-images. Also, since we are storing colors by area of an image, it would be possible to search images based on patterns or drawings, similar to the way retrievr works.

If you have questions or suggestions please email me.

 

Creative Commons License