Because learning sometimes involves sensitive data, machine learning algorithms have been extended to offer differential privacy for training data. In practice, this has been mostly an afterthought, with privacy-preserving models obtained by re-running training with a different optimizer, but using the model architectures that already performed well in a non-privacy-preserving setting. This approach leads to less than ideal privacy/utility tradeoffs, as we show here. To improve these tradeoffs, prior work introduces variants of differential privacy that weaken the privacy guarantee proved to increase model utility. We show this is not necessary and instead propose that utility be improved by choosing activation functions designed explicitly for privacy-preserving training. A crucial operation in differentially private SGD is gradient clipping, which along with modifying the optimization path (at times resulting in not-optimizing a single objective function), may also introduce both significant bias and variance to the learning process. We empirically identify exploding gradients arising from ReLU may be one of the main sources of this. We demonstrate analytically and experimentally how a general family of bounded activation functions, the tempered sigmoids, consistently outperform the currently established choice: unbounded activation functions like ReLU. Using this paradigm, we achieve new state-of-the-art accuracy on MNIST, FashionMNIST, and CIFAR10 without any modification of the learning procedure fundamentals or differential privacy analysis. While the changes we make are simple in retrospect, the simplicity of our approach facilitates its implementation and adoption to meaningfully improve state-of-the-art machine learning while still providing strong guarantees in the original framework of differential privacy.