Automatic text summarization focuses on distilling summary information from texts. This research field has been considerably explored over the past decades because of its significant role in many natural language processing tasks; however, two challenging issues block its further development: (1) how to yield a summarization model embedding topic inference rather than extending with a pre-trained one and (2) how to merge the latent topics into diverse granularity levels. In this study, we propose a variational hierarchical model to holistically address both issues, dubbed VHTM. Different from the previous work assisted by a pre-trained single-grained topic model, VHTM is the first attempt to jointly accomplish summarization with topic inference via variational encoder-decoder and merge topics into multi-grained levels through topic embedding and attention. Comprehensive experiments validate the superior performance of VHTM compared with the baselines, accompanying with semantically consistent topics.