A common approach in machine learning is to use a large amount of labeled data to train a model. Usually this model can then only be used to classify data in the same feature space. However, labeled data is often expensive to obtain. A number of strategies have been developed by the machine learning community in recent years to address this problem, including: semi-supervised learning,domain adaptation,multi-task learning,and self-taught learning. While training data and test may have different distributions, they must remain in the same feature set. Furthermore, all the above methods work in the same feature space. In this paper, we consider an extreme case of transfer learning called heterogeneous transfer learning — where the feature spaces of the source task and the target tasks are disjoint. Previous approaches mostly fall in the multi-view learning category, where co-occurrence data from both feature spaces is required. We generalize the previous work on cross-lingual adaptation and propose a multi-task strategy for the task. We also propose the use of a restricted Boltzmann machine (RBM), a special type of probabilistic graphical models, as an implementation. We present experiments on two tasks: action recognition and cross-lingual sentiment classification.