Description

Parallel and distributed computing are becoming necessary in almost all areas of computation. In response to this growing demand, curriculum initiatives have been developed to integrate parallel and distributed computing into traditional undergraduate computer science programs. However, adoption has been slow, and as a result many students lack proper training in parallel and distributed computing. Two potential barriers to adoption are a shortage of example programs that step students through the process of parallelizing serial code, and the lack of access to dedicated machines that can run highly parallel programs at scale within the confines of a course schedule. We have developed course material that uses a simple two-dimensional Lattice-Boltzmann Method (LBM) Computational Fluid Dynamics (CFD) simulation to walk students through shared memory parallelism, distributed memory parallelism, and hybrid parallel execution. We also built a custom mini-cluster of 16 credit-card-sized compute nodes, with a total of 288 cores (18 per node), as an inexpensive solution for testing the scalability of different parallel models that can be deployed in a classroom setting.
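The sketch below illustrates the shared memory step. It performs one time step of a two-dimensional LBM simulation and is not the authors' course code. Several choices are illustrative assumptions:

- the D2Q9 lattice, BGK collision, and periodic boundaries;
- the function names `lbm_init`, `lbm_step`, and `lbm_total_mass`;
- the memory layout and the relaxation parameter `omega`.

A single OpenMP `parallel for` directive parallelizes the outer loop over rows. Each cell reads only from `f`, and each (cell, direction) slot in `f_new` is written by exactly one source cell. As a result, the loop iterations are independent and need no locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical D2Q9 lattice: 9 discrete velocities (rest, 4 axis, 4 diagonal). */
#define Q 9
static const int EX[Q] = {0, 1, 0, -1, 0, 1, -1, -1, 1};
static const int EY[Q] = {0, 0, 1, 0, -1, 1, 1, -1, -1};
static const double W[Q] = {4.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9,
                            1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36};

/* Flat index of direction q at cell (x, y); layout is an assumption. */
static inline size_t idx(int nx, int x, int y, int q) {
    return ((size_t)y * nx + x) * Q + q;
}

/* Second-order equilibrium distribution for direction q. */
static double lbm_feq(int q, double rho, double ux, double uy) {
    double eu = EX[q] * ux + EY[q] * uy;
    double usq = ux * ux + uy * uy;
    return W[q] * rho * (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * usq);
}

/* Initialize every cell to equilibrium with unit density and a small velocity pattern. */
void lbm_init(int nx, int ny, double *f) {
    for (int y = 0; y < ny; y++)
        for (int x = 0; x < nx; x++) {
            double ux = 0.01 * (x % 3), uy = -0.01 * (y % 2);
            for (int q = 0; q < Q; q++)
                f[idx(nx, x, y, q)] = lbm_feq(q, 1.0, ux, uy);
        }
}

/* One BGK collide-and-stream step on a periodic nx-by-ny grid: f -> f_new. */
void lbm_step(int nx, int ny, double omega, const double *f, double *f_new) {
    assert(nx > 0 && ny > 0 && f != f_new);
    /* Rows are independent: each thread reads f and writes disjoint f_new slots. */
    #pragma omp parallel for
    for (int y = 0; y < ny; y++) {
        for (int x = 0; x < nx; x++) {
            /* Macroscopic density and velocity at this cell. */
            double rho = 0.0, ux = 0.0, uy = 0.0;
            for (int q = 0; q < Q; q++) {
                double fq = f[idx(nx, x, y, q)];
                rho += fq;
                ux += fq * EX[q];
                uy += fq * EY[q];
            }
            ux /= rho;
            uy /= rho;
            for (int q = 0; q < Q; q++) {
                /* Relax toward equilibrium, then push to the neighbor (wrapping at edges). */
                double fq = f[idx(nx, x, y, q)];
                double post = fq + omega * (lbm_feq(q, rho, ux, uy) - fq);
                int xd = (x + EX[q] + nx) % nx;
                int yd = (y + EY[q] + ny) % ny;
                f_new[idx(nx, xd, yd, q)] = post;
            }
        }
    }
}

/* Total mass (sum of all distributions); conserved by collision and streaming. */
double lbm_total_mass(int nx, int ny, const double *f) {
    double m = 0.0;
    for (size_t i = 0; i < (size_t)nx * ny * Q; i++) m += f[i];
    return m;
}
```

Compiled with `-fopenmp`, the step spreads rows across the cores of one node. Without that flag the pragma is ignored and the same code runs serially, which gives students a natural serial baseline to compare against. A distributed memory variant would typically divide the rows among processes and exchange boundary rows after each step.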