We describe an integrated approach for statistical modeling of discourse structure for natural conversational speech. Our model is based on 42 ‘dialog acts’ (e.g., Statement, Question, Backchannel, Agreement, Disagreement, Apology), which were hand-labeled in 1155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We developed several models and algorithms to automatically detect dialog acts from transcribed or automatically recognized words and from prosodic properties of the speech signal, and by using a statistical discourse grammar. All of these components were probabilistic in nature and estimated from data, employing a variety of techniques (hidden Markov models, N-gram language models, maximum entropy estimation, decision tree classifiers, and neural networks). In preliminary studies, we achieved a dialog act labeling accuracy of 65% based on recognized words and prosody, and an accuracy of 72% based on word transcripts. Since humans achieve 84% on this task (with chance performance at 35%), we find these results encouraging.