Image Classification

From Dogs and Cats to Pet Breeds

from fastai.vision.all import *
path = untar_data(URLs.PETS)
Path.BASE_PATH = path
path.ls(), (path/"images").ls()
((#2) [Path('images'),Path('annotations')],
 (#7393) [Path('images/miniature_pinscher_199.jpg'),Path('images/newfoundland_183.jpg'),Path('images/pomeranian_90.jpg'),Path('images/pomeranian_102.jpg'),Path('images/japanese_chin_74.jpg'),Path('images/yorkshire_terrier_45.jpg'),Path('images/chihuahua_34.jpg'),Path('images/american_pit_bull_terrier_150.jpg'),Path('images/wheaten_terrier_160.jpg'),Path('images/staffordshire_bull_terrier_91.jpg')...])
fname = (path/"images").ls()[0]
fname, re.findall(r'(.+)_\d+\.jpg$', fname.name)
(Path('images/miniature_pinscher_199.jpg'), ['miniature_pinscher'])
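The labelling rule can be tried out on a few file names without loading any images. The following is a minimal sketch using only Python's re module; the helper name breed_from_filename and the sample file names are made up for illustration, and the dot before jpg is escaped so that it matches only a literal '.'.

```python
import re

# Breed name, then an underscore, a run of digits, and the literal ".jpg" extension
pattern = re.compile(r'(.+)_\d+\.jpg$')

def breed_from_filename(name):
    """Hypothetical helper: return the breed part of a file name, or None if it does not match."""
    m = pattern.match(name)
    return m.group(1) if m else None

# Illustrative file names (the last one is deliberately not an image)
names = ['miniature_pinscher_199.jpg', 'Birman_12.jpg', 'notes.txt']
labels = [breed_from_filename(n) for n in names]
print(labels)
```

Because the first group is greedy, underscores inside the breed name are kept and only the final underscore-plus-digits suffix is stripped.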
pets = DataBlock(
    blocks = (ImageBlock, CategoryBlock),
    get_items=get_image_files, 
    splitter=RandomSplitter(seed=42),
    get_y=using_attr(RegexLabeller(r'(.+)_\d+\.jpg$'), 'name'),
    item_tfms=Resize(460),
    batch_tfms=aug_transforms(size=224, min_scale=0.75)
)
dls = pets.dataloaders(path/"images")
f = using_attr(RegexLabeller(r'(.+)_\d+\.jpg$'), 'name')
f(fname)
'miniature_pinscher'

Presizing

%cd /notebooks/fastbook
/notebooks/fastbook
dblock1 = DataBlock(
    blocks=(
        ImageBlock(),
        CategoryBlock()
    ),
    get_y=parent_label,
    item_tfms=Resize(460)
)
# Before running this cell, place an image at 'images/grizzly.jpg' relative to the current working directory
dls1 = dblock1.dataloaders([(Path.cwd()/'images'/'grizzly.jpg')]*100, bs=8)
dls1.train.get_idxs = lambda: Inf.ones
x,y = dls1.valid.one_batch()
_,axs = subplots(1, 2)

x1 = TensorImage(x.clone())
x1 = x1.affine_coord(sz=224)
x1 = x1.rotate(draw=30, p=1.)
x1 = x1.zoom(draw=1.2, p=1.)
x1 = x1.warp(draw_x=-0.2, draw_y=0.2, p=1.)

tfms = setup_aug_tfms([
    Rotate(draw=30, p=1, size=224),
    Zoom(draw=1.2, p=1., size=224),
    Warp(draw_x=-0.2, draw_y=0.2, p=1., size=224)
])
x = Pipeline(tfms)(x)
#x.affine_coord(coord_tfm=coord_tfm, sz=size, mode=mode, pad_mode=pad_mode)
TensorImage(x[0]).show(ctx=axs[0])
TensorImage(x1[0]).show(ctx=axs[1]);

How 'parent_label()' works

fname, parent_label(fname)
(Path('images/miniature_pinscher_199.jpg'), 'images')
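The label returned here is 'images' because parent_label uses the name of the folder that contains the file. That works for datasets organised as one folder per class, but not for the Pets images, where every file sits in the same folder. A rough stand-in written with pathlib shows the idea (parent_name is a hypothetical helper, not fastai's function, and the second path is invented):

```python
from pathlib import PurePosixPath

def parent_name(p):
    """Hypothetical stand-in: label a file by the name of its parent folder."""
    return PurePosixPath(p).parent.name

# All Pets images share one folder, so every file would get the same label
pets_label = parent_name('images/miniature_pinscher_199.jpg')

# With a one-folder-per-class layout (invented path) the label is meaningful
bear_label = parent_name('bears/grizzly/001.jpg')

print(pets_label, bear_label)
```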
Path.cwd(), Path.cwd().ls()
(Path('/notebooks/fastbook'),
 (#66) [Path('/notebooks/fastbook/requirements.txt'),Path('/notebooks/fastbook/08_collab.py'),Path('/notebooks/fastbook/CODE_OF_CONDUCT.md'),Path('/notebooks/fastbook/03_ethics.py'),Path('/notebooks/fastbook/README_es.md'),Path('/notebooks/fastbook/11_midlevel_data.py'),Path('/notebooks/fastbook/20_conclusion.ipynb'),Path('/notebooks/fastbook/14_resnet.py'),Path('/notebooks/fastbook/README_vn.md'),Path('/notebooks/fastbook/16_accel_sgd.py')...])

Checking and Debugging a DataBlock

dls.show_batch(nrows=1, ncols=3)

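Note: 'Resize(460)' is added to 'item_tfms' here so that every image has the same size and can be collated into a batch. Without it, 'summary()' fails at the collation step.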

pets1 = DataBlock(blocks = (ImageBlock, CategoryBlock),
                 get_items=get_image_files, 
                 splitter=RandomSplitter(seed=42),
                 get_y=using_attr(RegexLabeller(r'(.+)_\d+\.jpg$'), 'name'),
                 item_tfms=Resize(460)) # same size for GPU
pets1.summary(path/"images")
Setting-up type transforms pipelines
Collecting items from /root/.fastai/data/oxford-iiit-pet/images
Found 7390 items
2 datasets of sizes 5912,1478
Setting up Pipeline: PILBase.create
Setting up Pipeline: partial -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}

Building one sample
  Pipeline: PILBase.create
    starting from
      /root/.fastai/data/oxford-iiit-pet/images/shiba_inu_180.jpg
    applying PILBase.create gives
      PILImage mode=RGB size=500x375
  Pipeline: partial -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}
    starting from
      /root/.fastai/data/oxford-iiit-pet/images/shiba_inu_180.jpg
    applying partial gives
      shiba_inu
    applying Categorize -- {'vocab': None, 'sort': True, 'add_na': False} gives
      TensorCategory(33)

Final sample: (PILImage mode=RGB size=500x375, TensorCategory(33))


Collecting items from /root/.fastai/data/oxford-iiit-pet/images
Found 7390 items
2 datasets of sizes 5912,1478
Setting up Pipeline: PILBase.create
Setting up Pipeline: partial -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}
Setting up after_item: Pipeline: Resize -- {'size': (460, 460), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0} -> ToTensor
Setting up before_batch: Pipeline: 
Setting up after_batch: Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1}

Building one batch
Applying item_tfms to the first sample:
  Pipeline: Resize -- {'size': (460, 460), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0} -> ToTensor
    starting from
      (PILImage mode=RGB size=500x375, TensorCategory(33))
    applying Resize -- {'size': (460, 460), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0} gives
      (PILImage mode=RGB size=460x460, TensorCategory(33))
    applying ToTensor gives
      (TensorImage of size 3x460x460, TensorCategory(33))

Adding the next 3 samples

No before_batch transform to apply

Collating items in a batch

Applying batch_tfms to the batch built
  Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1}
    starting from
      (TensorImage of size 4x3x460x460, TensorCategory([33, 23, 18,  4], device='cuda:0'))
    applying IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} gives
      (TensorImage of size 4x3x460x460, TensorCategory([33, 23, 18,  4], device='cuda:0'))
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(2)
epoch train_loss valid_loss error_rate time
0 1.481196 0.374433 0.122463 00:29
epoch train_loss valid_loss error_rate time
0 0.514027 0.305406 0.100135 00:38
1 0.338844 0.226652 0.073748 00:39

Cross-Entropy Loss

Viewing Activations and Labels

dls.vocab[2]
'Birman'
x,y = dls.one_batch()
preds,y1 = learn.get_preds(dl=[(x,y)])
y, preds[0]
(TensorCategory([14,  6, 24, 33,  2, 15,  7, 10, 19, 34, 16,  8, 10,  0, 34, 13, 31, 13, 26, 30,  2,  9, 17, 32, 26, 14, 14, 32, 24, 34, 28, 16,  6, 14,  3, 10, 34,  5,  5, 11, 36, 24, 35, 11, 30, 35, 23,  3,
         34,  0, 16, 17, 23,  2, 23, 29, 22, 13, 27, 11, 25,  8, 18, 29], device='cuda:0'),
 TensorBase([7.2221e-04, 5.1115e-05, 3.6684e-05, 3.7296e-06, 3.6732e-04, 1.9829e-04, 1.9351e-05, 7.4164e-05, 1.5373e-06, 3.3986e-04, 5.7419e-06, 8.2857e-06, 2.3575e-04, 2.1627e-04, 9.8651e-01, 2.7815e-03,
         3.9936e-05, 2.9327e-06, 3.8810e-03, 7.3829e-05, 4.0817e-04, 4.3759e-04, 1.2597e-05, 7.3710e-06, 2.7713e-06, 1.2360e-04, 5.5445e-05, 1.8330e-04, 3.3123e-05, 2.6849e-05, 9.8329e-05, 3.2327e-05,
         4.6130e-05, 6.5665e-05, 9.1696e-05, 2.7759e-03, 2.4891e-05]))
len(preds[0]),preds[0].sum()
(37, TensorBase(1.))

Softmax

plot_function(torch.sigmoid, min=-4,max=4)
plot_function(torch.exp, min=-4,max=4)
acts = torch.randn((6,2))*2
acts
tensor([[ 0.6734,  0.2576],
        [ 0.4689,  0.4607],
        [-2.2457, -0.3727],
        [ 4.4164, -1.2760],
        [ 0.9233,  0.5347],
        [ 1.0698,  1.6187]])
acts.sigmoid()
tensor([[0.6623, 0.5641],
        [0.6151, 0.6132],
        [0.0957, 0.4079],
        [0.9881, 0.2182],
        [0.7157, 0.6306],
        [0.7446, 0.8346]])
torch.exp(acts[:,0]) / torch.exp(acts).sum(dim=1)
tensor([0.6025, 0.5021, 0.1332, 0.9966, 0.5959, 0.3661])
sm_acts = torch.softmax(acts, dim=1)
sm_acts
tensor([[0.6025, 0.3975],
        [0.5021, 0.4979],
        [0.1332, 0.8668],
        [0.9966, 0.0034],
        [0.5959, 0.4041],
        [0.3661, 0.6339]])
sm_acts.sum(dim=1)
tensor([1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000])
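To see what torch.softmax computes for one row, here is a minimal plain-Python sketch (no PyTorch required). The function name softmax and the max-subtraction trick for numerical stability are choices made for this example, and the input row is the first row of acts as printed above, so results should agree with the tensor output only to about three decimal places. With two classes, the first softmax output equals the sigmoid of the difference between the two activations.

```python
import math

def softmax(row):
    """Exponentiate each activation and divide by the sum of the exponentials."""
    m = max(row)                             # subtracting the max avoids overflow in exp
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

row = [0.6734, 0.2576]                       # first row of acts, as printed above
probs = softmax(row)

# Two-class special case: softmax reduces to a sigmoid of the difference
sigmoid_of_diff = 1 / (1 + math.exp(-(row[0] - row[1])))

print([round(p, 4) for p in probs], round(sum(probs), 4), round(sigmoid_of_diff, 4))
```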

Log Likelihood

targ = tensor([0,1,0,1,1,0])
sm_acts
tensor([[0.6025, 0.3975],
        [0.5021, 0.4979],
        [0.1332, 0.8668],
        [0.9966, 0.0034],
        [0.5959, 0.4041],
        [0.3661, 0.6339]])
idx = range(6)
sm_acts[idx, targ]
tensor([0.6025, 0.4979, 0.1332, 0.0034, 0.4041, 0.3661])
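The expression sm_acts[idx, targ] picks, for each row, the probability in the column given by that row's target. The same selection can be written out in plain Python as a sketch, using the rounded values printed above (the list names are chosen for this example):

```python
# Softmax outputs as printed above: one row per item, one column per class
sm_acts_rows = [
    [0.6025, 0.3975],
    [0.5021, 0.4979],
    [0.1332, 0.8668],
    [0.9966, 0.0034],
    [0.5959, 0.4041],
    [0.3661, 0.6339],
]
targets = [0, 1, 0, 1, 1, 0]

# For row i, take the column named by targets[i]
picked = [row[t] for row, t in zip(sm_acts_rows, targets)]
print(picked)
```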
from IPython.display import HTML
df = pd.DataFrame(sm_acts, columns=["3","7"])
df['targ'] = targ
df['idx'] = idx
df['result'] = sm_acts[range(6), targ]
t = df.style.hide(axis='index')  # replaces the deprecated Styler.hide_index()
#To have html code compatible with our script
html = t._repr_html_().split('</style>')[1]
html = re.sub(r'<table id="([^"]+)"\s*>', r'<table >', html)
display(HTML(html))
3 7 targ idx result
0.602469 0.397531 0 0 0.602469
0.502065 0.497935 1 1 0.497935
0.133188 0.866811 0 2 0.133188
0.996640 0.003360 1 3 0.003360
0.595949 0.404051 1 4 0.404051
0.366118 0.633882 0 5 0.366118
-sm_acts[idx, targ]
tensor([-0.6025, -0.4979, -0.1332, -0.0034, -0.4041, -0.3661])
F.nll_loss(sm_acts, targ, reduction='none')
tensor([-0.6025, -0.4979, -0.1332, -0.0034, -0.4041, -0.3661])

Taking the Log

Recall that cross-entropy loss may involve multiplying many numbers together. Because these numbers are probabilities between 0 and 1, the product of many of them becomes extremely small and can cause numerical underflow. We therefore want to transform the probabilities into values with a wider range that we can add rather than multiply. There is a mathematical function that does exactly this: the logarithm (available as torch.log), since log(a*b) = log(a) + log(b). It is not defined for negative numbers, tends to negative infinity as its input approaches 0, and looks like this between 0 and 1:
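A small plain-Python sketch makes the numerical point concrete. The figure of 2000 probabilities of 0.5 is an arbitrary choice for illustration: their product is too small for a 64-bit float and collapses to zero, while the sum of their logarithms is an ordinary number.

```python
import math

probs = [0.5] * 2000          # illustrative values, not taken from the model

# Multiplying directly: the running product underflows to exactly 0.0
product = 1.0
for p in probs:
    product *= p

# Adding logarithms instead keeps the result in a comfortable range
log_sum = sum(math.log(p) for p in probs)

print(product, round(log_sum, 2))
```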

plot_function(torch.log, min=0,max=1, ty='log(x)', tx='x')
plot_function(lambda x: -1*torch.log(x), min=0,max=1, tx='x', ty='- log(x)', title = 'Log Loss when true label = 1')
from IPython.display import HTML
df['loss'] = -torch.log(tensor(df['result']))
t = df.style.hide(axis='index')  # replaces the deprecated Styler.hide_index()
#To have html code compatible with our script
html = t._repr_html_().split('</style>')[1]
html = re.sub(r'<table id="([^"]+)"\s*>', r'<table >', html)
display(HTML(html))
3 7 targ idx result loss
0.602469 0.397531 0 0 0.602469 0.506720
0.502065 0.497935 1 1 0.497935 0.697285
0.133188 0.866811 0 2 0.133188 2.015990
0.996640 0.003360 1 3 0.003360 5.695763
0.595949 0.404051 1 4 0.404051 0.906213
0.366118 0.633882 0 5 0.366118 1.004798

Negative Log Likelihood

loss_func = nn.CrossEntropyLoss()
loss_func(acts, targ)
tensor(1.8045)
F.cross_entropy(acts, targ)
tensor(1.8045)
nn.CrossEntropyLoss(reduction='none')(acts, targ)
tensor([0.5067, 0.6973, 2.0160, 5.6958, 0.9062, 1.0048])
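The per-item losses can be reproduced by hand as the negative log of the softmax probability assigned to the correct class. The following plain-Python sketch assumes the activations and targets printed in this section (rounded to four decimals, so agreement is only to about three decimal places); log_softmax here is a helper written for this example, not the PyTorch function.

```python
import math

acts_rows = [
    [ 0.6734,  0.2576],
    [ 0.4689,  0.4607],
    [-2.2457, -0.3727],
    [ 4.4164, -1.2760],
    [ 0.9233,  0.5347],
    [ 1.0698,  1.6187],
]
targets = [0, 1, 0, 1, 1, 0]

def log_softmax(row):
    """Log of softmax, computed via log-sum-exp for numerical stability."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum for v in row]

# Negative log likelihood of the correct class for each item
losses = [-log_softmax(row)[t] for row, t in zip(acts_rows, targets)]
mean_loss = sum(losses) / len(losses)    # default reduction is the mean

print([round(l, 4) for l in losses], round(mean_loss, 4))
```

The individual values correspond to reduction='none', and their mean corresponds to the single number returned by the default loss.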
acts, targ
(tensor([[ 0.6734,  0.2576],
         [ 0.4689,  0.4607],
         [-2.2457, -0.3727],
         [ 4.4164, -1.2760],
         [ 0.9233,  0.5347],
         [ 1.0698,  1.6187]]),
 tensor([0, 1, 0, 1, 1, 0]))

Model Interpretation

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)
interp.most_confused(min_val=5)
[('Bengal', 'Egyptian_Mau', 9),
 ('american_pit_bull_terrier', 'staffordshire_bull_terrier', 7),
 ('basset_hound', 'beagle', 5)]

Improving Our Model

The Learning Rate Finder

learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1, base_lr=0.1)
epoch train_loss valid_loss error_rate time
0 2.801034 7.120271 0.574425 00:30
epoch train_loss valid_loss error_rate time
0 3.409698 1.739531 0.510825 00:38
learn = vision_learner(dls, resnet34, metrics=error_rate)
lr_min,lr_steep = learn.lr_find(suggest_funcs=(minimum, steep))
print(f"Minimum/10: {lr_min:.2e}, steepest point: {lr_steep:.2e}")
Minimum/10: 8.32e-03, steepest point: 4.37e-03
(lr_min + lr_steep)/2.
0.006341397855430841
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(3, base_lr=3e-3)
epoch train_loss valid_loss error_rate time
0 1.287065 0.331208 0.102165 00:29
epoch train_loss valid_loss error_rate time
0 0.522278 0.414550 0.127876 00:38
1 0.402920 0.259237 0.082544 00:38
2 0.214323 0.236262 0.073748 00:39
learn.recorder.plot_loss()
plt.plot(L(learn.recorder.values).itemgot(2))
[<matplotlib.lines.Line2D at 0x7f1fd92f8fa0>]

Unfreezing and Transfer Learning

learn.fine_tune??
Signature:
learn.fine_tune(
    epochs,
    base_lr=0.002,
    freeze_epochs=1,
    lr_mult=100,
    pct_start=0.3,
    div=5.0,
    lr_max=None,
    div_final=100000.0,
    wd=None,
    moms=None,
    cbs=None,
    reset_opt=False,
    start_epoch=0,
)
Source:   
@patch
@delegates(Learner.fit_one_cycle)
def fine_tune(self:Learner, epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100,
              pct_start=0.3, div=5.0, **kwargs):
    "Fine tune with `Learner.freeze` for `freeze_epochs`, then with `Learner.unfreeze` for `epochs`, using discriminative LR."
    self.freeze()
    self.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)
    base_lr /= 2
    self.unfreeze()
    self.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div, **kwargs)
File:      ~/mambaforge/lib/python3.9/site-packages/fastai/callback/schedule.py
Type:      method
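Reading the source above, fine_tune trains the frozen model with slice(base_lr), then halves base_lr and trains the unfrozen model with slice(base_lr/lr_mult, base_lr). The arithmetic can be sketched in plain Python; fine_tune_lrs is a hypothetical helper written for this example, and it only reproduces the learning-rate values, not the training itself.

```python
def fine_tune_lrs(base_lr=2e-3, lr_mult=100):
    """Hypothetical helper: learning rates used in the two phases of fine_tune."""
    frozen_lr = base_lr                        # slice(base_lr) while the body is frozen
    base_lr = base_lr / 2                      # base_lr /= 2 before unfreezing
    unfrozen = (base_lr / lr_mult, base_lr)    # slice(base_lr/lr_mult, base_lr)
    return frozen_lr, unfrozen

# Values for the call learn.fine_tune(3, base_lr=3e-3) used earlier
frozen_lr, (lowest, highest) = fine_tune_lrs(base_lr=3e-3)
print(frozen_lr, lowest, highest)
```

So with base_lr=3e-3, the unfrozen phase uses learning rates between roughly 1.5e-5 and 1.5e-3.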
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fit_one_cycle(3, 3e-3)
epoch train_loss valid_loss error_rate time
0 1.136077 0.414258 0.139378 00:29
1 0.509912 0.293178 0.089986 00:29
2 0.330361 0.248266 0.076455 00:29
learn.unfreeze()
learn.lr_find()
SuggestedLRs(valley=6.918309736647643e-06)
learn.fit_one_cycle(6, lr_max=1e-5)
epoch train_loss valid_loss error_rate time
0 0.263112 0.242040 0.077131 00:38
1 0.232440 0.230828 0.069689 00:39
2 0.217150 0.227805 0.069689 00:39
3 0.206957 0.220832 0.071042 00:39
4 0.202025 0.217777 0.070365 00:39
5 0.178928 0.219291 0.066306 00:40
learn.recorder.plot_loss()

Discriminative Learning Rates

learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fit_one_cycle(3, 3e-3)
learn.unfreeze()
learn.fit_one_cycle(12, lr_max=slice(1e-6,1e-4))
epoch train_loss valid_loss error_rate time
0 1.125234 0.340846 0.100135 00:30
1 0.538148 0.255423 0.079161 00:30
2 0.338863 0.227787 0.072395 00:29
epoch train_loss valid_loss error_rate time
0 0.270723 0.225311 0.074425 00:39
1 0.258127 0.216694 0.071042 00:39
2 0.249060 0.215301 0.069689 00:39
3 0.208266 0.211926 0.065629 00:39
4 0.184873 0.213426 0.067659 00:39
5 0.168494 0.208151 0.066306 00:39
6 0.158416 0.201037 0.066306 00:39
7 0.161552 0.205724 0.063599 00:39
8 0.138080 0.204028 0.060217 00:39
9 0.133944 0.204857 0.064276 00:39
10 0.125342 0.200153 0.064276 00:39
11 0.134450 0.201896 0.063599 00:39
plt.plot(L(learn.recorder.values).itemgot(2))
[<matplotlib.lines.Line2D at 0x7f1f430fd670>]
learn.recorder.plot_loss()
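To get a feel for what slice(1e-6,1e-4) means, the sketch below spreads learning rates from the lowest to the highest value by a constant ratio. It assumes three parameter groups and multiplicative spacing, which is our understanding of how fastai treats a slice with both a start and a stop for this kind of model; spread_lrs is a hypothetical helper, not a fastai function.

```python
def spread_lrs(start, stop, n_groups):
    """Hypothetical helper: n_groups learning rates spaced by a constant ratio."""
    if n_groups == 1:
        return [stop]
    ratio = (stop / start) ** (1 / (n_groups - 1))
    return [start * ratio ** i for i in range(n_groups)]

# Assumption: three parameter groups (earliest layers first, head last)
lrs = spread_lrs(1e-6, 1e-4, 3)
print(lrs)
```

Under these assumptions the earliest layers train at about 1e-6, the middle group at about 1e-5, and the head at about 1e-4.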

Selecting the Number of Epochs

Deeper Architectures

from fastai.callback.fp16 import *
learn = vision_learner(dls, resnet50, metrics=error_rate).to_fp16()
learn.fine_tune(6, freeze_epochs=3)
Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
epoch train_loss valid_loss error_rate time
0 1.309392 0.298439 0.094046 00:24
1 0.599204 0.314211 0.095399 00:24
2 0.431089 0.283283 0.087280 00:24
epoch train_loss valid_loss error_rate time
0 0.285965 0.235522 0.073748 00:29
1 0.303820 0.411849 0.108254 00:29
2 0.256097 0.265159 0.077131 00:28
3 0.158060 0.272260 0.075101 00:28
4 0.100597 0.207765 0.060217 00:29
5 0.060010 0.197659 0.061570 00:29
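to_fp16 switches training to mixed precision, in which much of the arithmetic is done with 16-bit floats. As a rough illustration of what half precision can and cannot represent, the sketch below rounds a few numbers to half precision using the 'e' format of Python's struct module. This is plain Python, not fastai code, and to_half is a hypothetical helper.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE half-precision (16-bit) value."""
    return struct.unpack('e', struct.pack('e', x))[0]

approx_tenth = to_half(0.1)      # only about three significant decimal digits survive
tiny = to_half(1e-8)             # too small for half precision: rounds to zero
coarse = to_half(1024.3)         # above 1024 the spacing between values is 1.0

print(approx_tenth, tiny, coarse)
```

The limited range and precision are the reason mixed-precision training keeps some quantities in 32-bit floats rather than converting everything to 16 bits.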

Conclusion

Questionnaire

  1. Why do we first resize to a large size on the CPU, and then to a smaller size on the GPU?
  2. If you are not familiar with regular expressions, find a regular expression tutorial, and some problem sets, and complete them. Have a look on the book's website for suggestions.
  3. What are the two ways in which data is most commonly provided for most deep learning datasets?
  4. Look up the documentation for L and try using a few of the new methods that it adds.
  5. Look up the documentation for the Python pathlib module and try using a few methods of the Path class.
  6. Give two examples of ways that image transformations can degrade the quality of the data.
  7. What method does fastai provide to view the data in a DataLoaders?
  8. What method does fastai provide to help you debug a DataBlock?
  9. Should you hold off on training a model until you have thoroughly cleaned your data?
  10. What are the two pieces that are combined into cross-entropy loss in PyTorch?
  11. What are the two properties of activations that softmax ensures? Why is this important?
  12. When might you want your activations to not have these two properties?
  13. Calculate the exp and softmax columns of the softmax example table yourself (i.e., in a spreadsheet, with a calculator, or in a notebook).
  14. Why can't we use torch.where to create a loss function for datasets where our label can have more than two categories?
  15. What is the value of log(-2)? Why?
  16. What are two good rules of thumb for picking a learning rate from the learning rate finder?
  17. What two steps does the fine_tune method do?
  18. In Jupyter Notebook, how do you get the source code for a method or function?
  19. What are discriminative learning rates?
  20. How is a Python slice object interpreted when passed as a learning rate to fastai?
  21. Why is early stopping a poor choice when using 1cycle training?
  22. What is the difference between resnet50 and resnet101?
  23. What does to_fp16 do?

Further Research

  1. Find the paper by Leslie Smith that introduced the learning rate finder, and read it.
  2. See if you can improve the accuracy of the classifier in this chapter. What's the best accuracy you can achieve? Look on the forums and the book's website to see what other students have achieved with this dataset, and how they did it.