Counting Moles

Moles are like fingerprints; you can find constellations of moles almost anywhere — on your back, your arms, legs, face. But writing software to track them is not as easy as it sounds. In this article, I'll look at what it takes to write software that can extract moles from pictures, and then track them between images taken at various time periods.

First, some pictures. This is a small section of a back, showing a variety of moles:

Typical back with moles
Typical back with moles

Two weeks later, another picture:

Typical back with moles
Typical back with moles

As you might imagine, it's desirable to be able to spot any new moles, or changes in shape of existing moles. So, here they are side by side:

First mole set
First mole set
Second mole set
Second mole set

As you can guess, there are going to be a few challenges:

In the first part of the development of my program kmole, I'm interested in detecting new moles.

The approach I've taken is to try and identify the major moles and have the program use them as a map to figure out where the moles should be in relation to one another. The first step, therefore, is to come up with an algorithm that detects moles.

I tried various methods, such as sum of squares of differences and triggering when a threshold was detected, or comparing against a well-defined colour (and triggering on a threshold), but the method that seems to work best is an average discrepancy filter.

The average discrepancy filter works by taking an area of pixels, say 30 x 30 (900 pixels) around a given centre pixel, and computing the average colour for that block. Then, the centre pixel is compared against the average, and if it's different from the average value by a certain threshold, it's tagged.

This is what the average colour filter is working with:

First mole image run through averaging filter
First mole image run through averaging filter

This does what you'd intuitively expect — it effectively blurs the image. But it does so in a “square” way — notice how the moles look like they are square instead of round. That's because I used the following algorithm:

for (int y = 0; y < h; y++) {
    for (int x = 0; x < w; x++) {
        int numpoints = 0;
        double  r = 0, g = 0, b = 0;
        for (int ix = -XSIZE; ix < XSIZE; ix++) {
            if (x + ix >= 0 && x + ix < w) {
                for (int iy = -YSIZE; iy < YSIZE; iy++) {
                    if (y + iy >= 0 && y + iy < h) {
                        r += image.r (x + ix, y + iy);
                        g += image.g (x + ix, y + iy);
                        b += image.b (x + ix, y + iy);
                        numpoints++;
                    }
                }
            }
        }
        if (numpoints) {
            r /= numpoints;
            g /= numpoints;
            b /= numpoints;
        }
        average.value (x, y, { (int) r, (int) g, (int) b });
    }
}

Effectively, I coded a square grid of 2 x XSIZE by 2 x YSIZE. A more effective algorithm, which I'm going to explore, is to come up with a list of (x, y) points and use those. The nice thing about that is that the list can be generated as a thick circle; excluding some number of pixels in the centre, and including only pixels between an inner and outer radius. (If you're particularly clever, your list of (x, y) points can actually be a list of (x, y, weight) points, allowing you to specify custom weighting for averaging.) If the list was presented as a vector<xy>, it could be done like so:

...
for (auto &xy : vlist) {
    if (x + xy.x >= 0 && x + xy.x < w && y + xy.y >= 0 && y + xy.y < h) {
        r += image.r (x + xy.x, y + xy.y);
        g += image.g (x + xy.x, y + xy.y);
        b += image.b (x + xy.x, y + xy.y);
        numpoints++;
    }
}
...

Effectively replacing the inner X and Y block with a custom list called xy (a struct with two members, x and y).

Now, computing the discrepancy against a threshold yields an interesting result:

First mole image after discrepancy threshold filtering
First mole image after discrepancy threshold filtering

Some pixels are red (when the red channel's discrepancy was greatest) and some are green (when the green channel's was greatest) — there are no blue pixels (the blue channel's discrepancy was never greater than either of the red or green channels').

There's a lot of noise in the image — places where only a few pixels are considered different, giving the image a “speckly” look. We can eliminate that by counting neighbours. The algorithm is simple: if a given pixel has fewer than one or two neighbours, we kill it. Much better:

First mole image after discrepancy threshold filtering and singleton elimination
First mole image after discrepancy threshold filtering and singleton elimination

The next step in the process is to sort the moles by size. I used a really cheesy algorithm and simply gave each pixel that was not white (so the red and green ones) a unique value. Then I “joined” all the neighbours, effectively coalescing them into clumps. I sorted the results by the number of pixels involved, and marked the top 5 clumps with a red square, and the next top 5 with a green square.

First mole image after discrepancy threshold filtering and singleton elimination
First mole image after discrepancy threshold filtering and singleton elimination

If you now compare this side-by-side with the second original picture, you'll see that we have the basis for comparing the major mole constellations:

First mole constellations
First mole constellations
Second mole constellations
Second mole constellations

Improvements planned for the next coding session are:

Stay tuned as more developments take place.