In distributed training of deep models, the transmission volume of stochastic gradients (SG) imposes a bottleneck in scaling up the number of processing nodes. On the other hand, the existing methods for compression of SGs have two major drawbacks. First, due to the increase in the overall variance of the compressed SG, the hyperparameters of the learning algorithm must be readjusted to ensure the convergence of the training. Further, the convergence rate of the resulting algorithm still would be adversely affected. Second, for those approaches for which the compressed SG values are biased, there is no guarantee for the learning convergence and thus an error feedback is often required. We propose Quantized Compressive Sampling (QCS) of SG that addresses the above two issues while achieving an arbitrarily large compression gain. We introduce two variants of the algorithm: Unbiased-QCS and MMSE-QCS and show their superior performance w.r.t. other approaches. Specifically, we show that for the same number of communication bits, the convergence rate is improved by a factor of 2 relative to state of the art. Next, we propose to improve the convergence rate of the distributed training algorithm via a weighted error feedback. Specifically, we develop and analyze a method to both control the overall variance of the compressed SG and prevent the staleness of the updates. Finally, through simulations, we validate our theoretical results and establish the superior performance of the proposed SG compression in the distributed training of deep models. Our simulations also demonstrate that our proposed compression method expands substantially the region of step-size values for which the learning algorithm converges.