Raven and Song Scope are two state-of-the-art automated sound analysis tools that use machine learning techniques to detect species vocalisations. Each system has been reviewed individually a number of times; however, to date their relative performance has not been compared. This paper compares the tools on six aspects: theory, software interface, ease of use, detection targets, detection accuracy, and potential applications. Our examination shows that neither tool detects both syllables and call structures: Raven aims to detect only syllables, while Song Scope targets call structures. We therefore propose a Timed Probabilistic Automata (TPA) system that separates syllables and clusters them into complex structures.