Blog - Fast and Reproducible Deep Learning

Our team’s open-source Wildebeest library is one tool for managing deep learning projects — it makes processing large datasets fast and easy. Photo Credit: Gopal Vijayaraghavan cc

There are endless resources for someone who wants to learn to train a deep learning model, but running a successful deep learning project requires managing many additional moving parts that are discussed far less often. This talk helps fill that gap in deep learning education.

Thanks to the Chicago ML Meetup for hosting.

Note

The presentation refers to the “Creevey” library, which has since been renamed “Wildebeest.” It also mentions “Tonks,” which has been renamed “Octopod.” Our team previously had a tradition of naming projects after terms or characters from the Harry Potter series, but we renamed them in response to J.K. Rowling’s persistent transphobic comments.

Video

Slides

Abstract

Deep learning projects require managing large datasets, heavy-duty dependencies, complex experiments, and large amounts of code. This talk presents best practices for accomplishing these tasks efficiently and reproducibly. The tools covered include:

The Wildebeest library for processing large collections of files
pip-tools and nvidia-docker for managing dependencies
MLflow Tracking for tracking experiments
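
To give a flavor of the first item, here is a minimal sketch of the general idea behind processing a large collection of files: run the same step over every file concurrently, and record failures instead of letting one bad file stop the run. This is not Wildebeest's API. It uses only the Python standard library, and the names `process_file`, `process_all`, `n_jobs`, and `demo` are hypothetical, chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import tempfile


def process_file(path: Path) -> dict:
    """Hypothetical per-file step: measure the file, recording any error instead of raising."""
    try:
        return {"path": str(path), "size": len(path.read_bytes()), "error": None}
    except OSError as exc:
        # Keep the failure in the report so the rest of the run can continue.
        return {"path": str(path), "size": None, "error": repr(exc)}


def process_all(paths, n_jobs=4):
    """Process files concurrently; a failure on one file does not stop the others."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # map() yields results in the same order as the inputs.
        return list(pool.map(process_file, paths))


def demo():
    """Run the sketch on one real file and one missing file."""
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "a.bin"
        good.write_bytes(b"12345")
        missing = Path(tmp) / "missing.bin"
        return process_all([good, missing])


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Returning one record per file, with the error stored alongside the result, makes it possible to inspect what went wrong after the run and retry only the failed files.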

Additional Resources

Autofocus is a deep learning project that labels animals in images taken by motion-activated “camera traps.” It illustrates many of the ideas discussed in the talk.