Fast and Reproducible Deep Learning

Best practices for managing deep learning projects, including tools for processing large datasets, managing dependencies, and tracking experiments.
deep learning
ml
computer vision
wildebeest
oss
Author

Greg Gandenberger

Published

March 26, 2020

Our team’s open-source Wildebeest library is one tool for managing deep learning projects: it makes processing large datasets fast and easy. Photo Credit: Gopal Vijayaraghavan cc

There are endless resources for learning to train a deep learning model, but running a successful deep learning project requires managing many additional moving parts that receive far less attention. This talk helps fill that gap.

Thanks to the Chicago ML Meetup for hosting.

Note

The presentation refers to the “Creevey” library. That library has been renamed “Wildebeest.” It also mentions “Tonks”, which has been renamed “Octopod.” Our team previously had a tradition of naming projects with terms or characters from the Harry Potter series, but we renamed them in response to J.K. Rowling’s persistent transphobic comments.

Video

Slides

Abstract

Deep learning projects require managing large datasets, heavy-duty dependencies, complex experiments, and large amounts of code. This talk provides best practices for accomplishing these tasks efficiently and reproducibly. Tools that are covered include:

Additional Resources

Autofocus is a deep learning project that labels animals in images taken by motion-activated “camera traps.” It illustrates many of the ideas discussed in the talk.