ResNet-style CNNs To Predict Freshwater Algae Blooms in Satellite Imagery: Mediocre Results

Although I have no domain experience with satellite imagery, I've used convolutional neural nets with aerial photography to recognize marine debris. So when I saw the DrivenData challenge 'Tick Tick Bloom' I took a glance at the dataset …

more ...

Sentiment Analysis of Mastodon Toots is Very Easy

The Mastodon API is very straightforward, as is the OpenAI API for its NLP models. I wrote a quick proof-of-concept program to do sentiment analysis of "toots."

more ...

Re-Identifying Manta Rays

My current project is re-identifying individual manta rays (Mobula alfredi and Mobula birostris) by their distinct belly patterns … er… ventral markings.

Photo of manta ray 'Queenie' showing distinctive markings

Every night at two spots on the Big Island of Hawai’i where I live, dive boats shine bright lights that attract plankton. Most nights, the plankton in turn …

more ...

What We Talk About When We Talk About Attention

The "attention" in ML is "what you should attend to," not "alertness." In the sentence "They crossed the ___ to get to the other bank." you need to "attend to" the word ___ to disambiguate "bank". If ___ is "street" then it's "bank" as in "financial institution" (most likely). If ___ is "river" then …

more ...
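
As a rough illustration of the "bank" example above, here is a toy version of dot-product attention in plain Python. The two-dimensional word vectors are invented for this sketch (axis 0 is meant to suggest "water-ness" and axis 1 "money-ness"); real models learn their embeddings, and nothing here is taken from the full post.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over several keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


# Hypothetical embeddings, made up for illustration only.
context = {
    "crossed": [0.1, 0.1],
    "river": [2.0, 0.0],
    "other": [0.0, 0.1],
}
bank_query = [1.0, 1.0]  # "bank" is ambiguous: some water-ness, some money-ness

words = list(context)
weights = attention_weights(bank_query, [context[w] for w in words])
most_attended = words[weights.index(max(weights))]

for word, weight in zip(words, weights):
    print(f"{word:8s} {weight:.2f}")
print("bank attends most to:", most_attended)
```

With these made-up vectors, "river" gets the largest weight, which is the sense in which "bank" attends to it.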

How to train/test on a subset of your FastAI data

If you have a large FastAI (v2) DataLoaders object and you're trying to debug something at epoch-scale (such as a custom metric), an easy way to train on a small subset of your data is:

import numpy as np
subset_size = 100 # Or whatever
selected_items = np.random.choice(dls.train_ds.items, subset_size, replace=False)
# Swap in …
more ...
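
The sampling step in the snippet above can be tried without FastAI at all. This is a minimal sketch that swaps in the standard library's random.sample for np.random.choice(..., replace=False), using a made-up list of file names as a stand-in for dls.train_ds.items. The helper name pick_subset and the seed parameter are my own assumptions, and the "swap in" step that the excerpt cuts off is not reproduced here.

```python
import random


def pick_subset(items, subset_size, seed=None):
    """Sample without replacement, capped at the number of available items."""
    rng = random.Random(seed)  # a fixed seed makes debugging runs repeatable
    pool = list(items)
    return rng.sample(pool, min(subset_size, len(pool)))


# Hypothetical stand-in for dls.train_ds.items
all_items = [f"img_{i:04d}.jpg" for i in range(1000)]

subset_size = 100  # Or whatever
selected_items = pick_subset(all_items, subset_size, seed=42)
print(len(selected_items), "items selected")
```

Capping at the pool size avoids the error that sampling without replacement raises when the requested subset is larger than the dataset.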

Large Language Models and the Chinese Room

The Chinese Room is a 1980 thought experiment from the philosopher John Searle. Wikipedia summarizes the setup:

[S]uppose that artificial intelligence research has succeeded in constructing a computer that behaves as if it understands Chinese. It takes Chinese characters as input and, by following the instructions of a …

more ...

How I Failed At Kaggle Happywhale

I just deleted my intermediate data and models for Kaggle’s Happywhale competition. I did terribly, never getting much above pure random guessing. That was frustrating, because it’s a problem in Machine Learning that I’m very interested in (and want to do more work in).

Happywhale, Sad Human …

more ...

Biggest mistake on Kaggle

Don’t join a team too quickly. Once you’re on a Kaggle team:

  • You cannot choose to leave
  • The team leader cannot choose to remove you

Unless you have a very good sense of what exactly your teammates are bringing to the competition, including their:

  • Knowledge level
  • Time commitment …
more ...

Installing Detectron2 on a Mac in CPU mode

At the risk of saying, “Yeah, it’s in the docs,” this is what I did. I think the crucial thing is installing things in the proper order, so I would advise going step-by-step:

  1. Have conda installed (brew install conda if not, I suppose)
  2. Create a conda environment with conda …
more ...

A Simple 3-Step AzureML Pipeline

Get the source code and data on GitHub

Illustration of pipeline graph

This demonstrates how to create a multistep AzureML pipeline using a series of PythonScriptStep objects.

In this case, the calculation is trivial: predicting Iris species using scikit-learn's Gaussian Naive Bayes. The same problem could be solved (very quickly) using this code:

import …
more ...