AAAI Publications, Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence

Extracting Information Types from Android Layout Code Using Sequence to Sequence Learning
Mitra Bokaei Hosseini, Xue Qin, Xiaoyin Wang, Jianwei Niu

Last modified: 2018-06-22


Mobile apps provide users with functionality and services by collecting information in various ways. The Android app manifest file and the privacy policy are documents that tell users what types of information are being collected. However, the information types mentioned in these documents are often abstract and do not include fine-grained details about the information collected through user input fields in apps. Existing approaches focus only on Android API method calls, which reveal collected information types drawn from a general category of well-defined names. These approaches cannot identify information types that come from direct user input, which is a major source of private information. Such information types contain more sensitive data than API-retrieved information types. Moreover, developers can design user input fields that refer to any kind of information, and these fields vary across apps. To address these problems, we propose applying natural language processing techniques to Android layout code to extract the information types associated with user input fields.
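
As a rough illustration of what "layout code associated with user input fields" can look like as input to a language model, the sketch below parses a small Android layout XML and collects the text attached to each EditText element. It is not the method described in this paper: the sample layout, the function names (split_identifier, extract_input_field_text), and the choice of the android:id and android:hint attributes are assumptions made only for this example. It covers only a plausible preprocessing step, not the sequence-to-sequence model itself.

```python
import re
import xml.etree.ElementTree as ET

# Standard XML namespace used for android:* attributes in layout files.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical layout, invented for this example.
SAMPLE_LAYOUT = """
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent">
    <TextView android:id="@+id/label_title" android:text="Sign up" />
    <EditText android:id="@+id/edit_email_address" android:hint="Email address" />
    <EditText android:id="@+id/editBirthDate" android:inputType="date" />
</LinearLayout>
"""


def split_identifier(resource_id):
    """Turn '@+id/edit_email_address' or '@+id/editBirthDate' into word tokens."""
    name = resource_id.split("/")[-1]
    # Insert a space at camelCase boundaries, then treat underscores as spaces.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).replace("_", " ")
    return spaced.lower().split()


def extract_input_field_text(layout_xml):
    """Collect identifier tokens and hint text for every EditText in a layout."""
    root = ET.fromstring(layout_xml)
    fields = []
    for elem in root.iter("EditText"):
        resource_id = elem.get(f"{{{ANDROID_NS}}}id", "")
        fields.append({
            "id_tokens": split_identifier(resource_id) if resource_id else [],
            "hint": elem.get(f"{{{ANDROID_NS}}}hint"),  # None when absent
        })
    return fields


if __name__ == "__main__":
    for field in extract_input_field_text(SAMPLE_LAYOUT):
        print(field)
```

The token lists and hints gathered this way are the kind of short, noisy text from which an information type such as "email address" or "birth date" would have to be inferred. A field with no hint, like the second one here, shows why the identifier alone is sometimes the only available signal.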


GUI code analysis, natural language processing, deep learning, LSTM, privacy policy

