ResNet-style CNNs To Predict Freshwater Algae Blooms in Satellite Imagery: Mediocre Results

Although I have no domain experience with satellite imagery, I've used convolutional neural nets with aerial photography to recognize marine debris. So when I saw the DrivenData challenge 'Tick Tick Bloom,' I took a glance at the dataset and decided to put a few days of effort into it. One thing I liked about the challenge was that the dataset was very straightforward: you can use satellite visual data, the date, latitude and longitude, and a specific feed of weather data. The target variable is equally straightforward: a classification, on a scale of 1-5, of cyanobacteria density. Rather than building a classifier, though, I predicted a floating-point value and rounded to the nearest integer.

The challenge comes with a baseline solution based on LightGBM that produces a mean error of about 1.5 (which is pretty poor, considering that "5" is a rare result, so mostly you're in the business of guessing 1-4!). The baseline uses visible-spectrum bands, and my first iteration used the same RGB data in a ResNet-style CNN. Depending on the platform and spectral band, the Landsat and Sentinel satellite data come in 5-, 10-, and 20-meter resolutions. On intuition, I grabbed small areas around each sample point, which made the images tiny, the batch sizes big, and training very fast. If I recall correctly, I used a straight ResNet backbone and transfer learning on this first iteration, figuring that the textures and shapes learned by the early layers were probably helpful. This initial model did a little better than the LightGBM model.

Preliminary data evaluation showed the unsurprising result that seasonal imagery varied widely. So in my next iteration, I took the imagery's day of the year, expressed it as a fraction of the year, and calculated its sine and cosine, expecting that this might create a reasonable "positional" encoding of an annual cycle. I also normalized the latitude and longitude and passed those to the head as well.
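The encoding was something like the following sketch (the function name and exact normalization constants here are illustrative, not necessarily what I shipped):

```python
import numpy as np

def encode_date_location(day_of_year: int, lat: float, lon: float) -> np.ndarray:
    """Map the date onto an annual circle and scale lat/lon to roughly [-1, 1]."""
    frac = day_of_year / 365.25                 # fraction of the year elapsed
    season = [np.sin(2 * np.pi * frac),         # Dec 31 and Jan 1 end up adjacent
              np.cos(2 * np.pi * frac)]
    location = [lat / 90.0, lon / 180.0]        # crude global normalization
    return np.array(season + location, dtype=np.float32)
```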
So now I had a model with a ResNet backbone plus "time of year" and "location" features passed to the head, which was just a few fully-connected layers outputting the single floating-point "severity" prediction (sketched below). This model did quite well, putting me in 4th place in the early running and right in the middle of the 2nd-tier competitors (there's one competitor who's a good 15% better than any of the others; I expect them to win!).

It seemed a no-brainer to add additional spectral bands to the model, as well as the available "quality" masks that alert users to missing data or cloud cover (a very common issue when looking at a small spatial area). Downloading all that data was itself a chore that consumed two days of my time budget. I expected to be able to use FastAI's CNN support, basically passing in a [batch_size, channel_count, height, width] tensor with a channel_count of around 8. But it seems that FastAI has a hard-wired expectation of 3 channels (or 1 channel) when it comes to CNNs. So I ended up writing a custom PyTorch module to convolve over the bands. This had the downside that I couldn't get a jumpstart by loading pretrained weights (well, I could, but not easily; more on that below), but the upside that I ended up with weights "dedicated" to each band and resolution. Since the bands differ a little between the Landsat and Sentinel platforms, I created channels for both and added a band-availability mask tensor.

With January flying by and other commitments piling up, I trained the model with a strict admonition to myself: if it didn't train well enough to re-establish me in the top tier of the competition, I would have to walk away. Well, it didn't. Much to my surprise, the addition of all those extra bands hardly budged the final error levels. With more competitors developing competitive models, I dropped out of the top dozen scores. Meanwhile, the naive way I had stored the band data, as NumPy arrays in separate files, left my training bottlenecked by IO. So even though there is some fairly low-hanging fruit in terms of data augmentation, pre-judging image quality, and architecture search, the first thing I'd have to do is a performance iteration. But... well, it's the season for both FIRST Robotics and the annual humpback whale migration, so it's time to move on!
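For reference, here's a minimal sketch of that backbone-plus-tabular-head architecture, assuming a torchvision resnet18 backbone (my actual layer sizes may have differed):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class BloomRegressor(nn.Module):
    """ResNet-18 backbone whose head also sees the date/location features."""

    def __init__(self, n_tabular: int = 4):
        super().__init__()
        backbone = resnet18(weights="IMAGENET1K_V1")
        # Keep everything up through global average pooling; drop the fc layer.
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])
        self.head = nn.Sequential(
            nn.Linear(512 + n_tabular, 128),
            nn.ReLU(),
            nn.Linear(128, 1),  # single floating-point "severity"
        )

    def forward(self, image: torch.Tensor, tabular: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(image).flatten(1)               # [B, 512]
        return self.head(torch.cat([feats, tabular], dim=1))  # [B, 1]
```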
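And for the curious, the "not easily" path for pretrained weights: the usual trick is to tile the pretrained 3-channel first-conv kernels across the extra bands and rescale so the activation magnitudes stay comparable. A sketch, again assuming torchvision's resnet18:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def resnet_with_bands(n_bands: int = 8) -> nn.Module:
    """Adapt a pretrained ResNet-18 to n_bands inputs by tiling the RGB kernels."""
    model = resnet18(weights="IMAGENET1K_V1")
    old = model.conv1  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
    new = nn.Conv2d(n_bands, 64, kernel_size=7, stride=2, padding=3, bias=False)
    with torch.no_grad():
        tiled = old.weight.repeat(1, n_bands // 3 + 1, 1, 1)[:, :n_bands]
        new.weight.copy_(tiled * 3.0 / n_bands)  # keep the output scale roughly constant
    model.conv1 = new
    return model
```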
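As for the IO bottleneck, the obvious first step in that performance iteration would be to pack the per-sample NumPy files into one memory-mapped array, so the DataLoader reads page-cached rows instead of opening thousands of small files. Something like this (paths and shapes are hypothetical):

```python
import glob
import numpy as np

# One-time conversion: pack every per-sample band stack into a single array.
paths = sorted(glob.glob("bands/*.npy"))
sample = np.load(paths[0])                      # e.g. an [8, 32, 32] band stack
packed = np.lib.format.open_memmap(
    "bands_packed.npy", mode="w+",
    dtype=sample.dtype, shape=(len(paths), *sample.shape),
)
for i, p in enumerate(paths):
    packed[i] = np.load(p)
packed.flush()

# In the Dataset, open once with mmap_mode="r" and index rows lazily:
bands = np.load("bands_packed.npy", mmap_mode="r")
```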