How to train/test on a subset of your FastAI data

If you have a large FastAI (v2) DataLoaders object and you're trying to debug something at epoch scale (such as a custom metric), an easy way to train on a small subset of your data is:

subset_size = 100 # Or whatever
selected_items = np.random.choice(dls.train_ds.items, subset_size …
more ...
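
The snippet above is cut off in this excerpt, so what follows is not the author's code. It is only a minimal sketch of the sampling step, assuming the items are an ordinary Python sequence. `sample_subset` and `all_items` are hypothetical names, and `all_items` stands in for `dls.train_ds.items`. Rebuilding a DataLoaders from the selected items is not shown.

```python
import numpy as np


def sample_subset(items, subset_size, seed=None):
    """Return up to `subset_size` items drawn at random, without replacement."""
    rng = np.random.default_rng(seed)
    n = min(subset_size, len(items))
    # Sample indices rather than the items themselves, so the items can be
    # paths, tuples, or other objects that NumPy might otherwise coerce.
    idx = rng.choice(len(items), size=n, replace=False)
    return [items[i] for i in idx]


# Hypothetical stand-in for dls.train_ds.items
all_items = [f"img_{i:04d}.jpg" for i in range(1000)]

subset_size = 100  # Or whatever
selected_items = sample_subset(all_items, subset_size, seed=0)
print(len(selected_items))
```

Sampling without replacement avoids duplicate items in the subset, which would otherwise make a small debugging set even smaller in practice.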

Large Language Models and the Chinese Room

The Chinese Room is a 1980 thought experiment from the philosopher John Searle. Wikipedia summarizes the setup:

[S]uppose that artificial intelligence research has succeeded in constructing a computer that behaves as if it understands Chinese. It takes Chinese characters as input and, by following the instructions of a …

more ...

How I Failed At Kaggle Happywhale

I just deleted my intermediate data and models for Kaggle’s Happywhale competition. I did terribly, never getting much above pure random guessing. Which was frustrating, because it’s a problem in Machine Learning that I’m very interested in (and want to do more work in).

Happywhale, Sad Human …

more ...

Biggest mistake on Kaggle

Don’t join a team too quickly. Once you’re on a Kaggle team:

  • You cannot choose to leave
  • The team leader cannot choose to remove you

Unless you have a very good sense of what exactly your teammates are bringing to the competition, including their:

  • Knowledge level
  • Time commitment …
more ...

Installing Detectron2 on a Mac in CPU mode

At the risk of saying, “Yeah, it’s in the docs,” this is what I did. I think the crucial thing is installing things in the proper order, so I would advise going step-by-step:

  1. Have conda installed (brew install --cask miniconda if not, I suppose)
  2. Create a conda environment with conda …
more ...

A Simple 3-Step AzureML Pipeline

Get the source code and data on GitHub

Illustration of pipeline graph

This demonstrates how you create a multistep AzureML pipeline using a series of PythonScriptStep objects.

In this case, the calculation is trivial: predicting Iris species using scikit-learn's Gaussian Naive Bayes. Without a pipeline, the whole problem could be solved (very quickly) using this code:

import …
more ...
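
The author's code is truncated in this excerpt. As a rough idea of what a non-pipeline version might look like, here is a minimal sketch, assuming scikit-learn's bundled copy of the Iris dataset and a simple held-out split. The split size and random seed are arbitrary choices for illustration, not taken from the original post.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load the Iris features (4 measurements per flower) and species labels.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data for evaluation (assumed split, fixed seed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit Gaussian Naive Bayes and report accuracy on the held-out set.
model = GaussianNB()
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.3f}")
```

In a multistep pipeline, the loading, training, and evaluation stages shown here in one script would presumably each live in their own step.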

From Whalesharks To Leopard Spheres

One of my big weekend projects (for longer than I care to think) has been trying to create a pipeline for identifying individual whalesharks from photos. The project had kind of grown moribund as I repeatedly failed to get any decent level of recognition despite using what I thought was …

more ...

Humpback Whale Identification By Deep Learning

OMG I can't believe I missed this competition. It's exactly in line with my whaleshark identification project.

https://www.kaggle.com/c/humpback-whale-identification

Here's a baby humpback whale that swam up to the boat a few weeks ago. In the second photo you can see it popped its head up …

more ...