Potential crowd flow prediction for new planned transportation sites is a fundamental task for urban planners and administrators. Intuitively, the potential crowd flow of the new coming site can be implied by exploring the nearby sites. However, the transportation modes of nearby sites (e.g. bus stations, bicycle stations) might be different from the target site (e.g. subway station), which results in severe data scarcity issues. To this end, we propose a data-driven approach, named MOHER, to predict the potential crowd flow in a certain mode for a new planned site. Specifically, we first identify the neighbor regions of the target site by examining the geographical proximity as well as the urban function similarity. Then, to aggregate these heterogeneous relations, we devise a cross-mode relational GCN, a novel relation-specific transformation model, which can learn not only the correlation but also the differences between different transportation modes. Afterward, we design an aggregator for inductive potential flow representation. Finally, an LTSM module is used for sequential flow prediction. Extensive experiments on real-world data sets demonstrate the superiority of the MOHER framework compared with the state-of-the-art algorithms.