Flight delays impact airlines, airports and passengers. Delay prediction is crucial during the decision-making process for all players in commercial aviation, and in particular for airlines to meet their on-time performance objectives. Although many machine learning approaches have been experimented with, they fail in (i) predicting delays in minutes with low errors (less than 15 minutes), (ii) being applied to small carriers i.e., low cost companies characterized by a small amount of data. This work presents a Long Short-Term Memory (LSTM) approach to predicting flight delay, modeled as a sequence of flights across multiple airports for a particular aircraft throughout the day. We then suggest a transfer learning approach between heterogeneous feature spaces to train a prediction model for a given smaller airline using the data from another larger airline. Our approach is demonstrated to be robust and accurate for low cost airlines in Europe.