In some domains (e.g., molecular biology), data repositories are large, dynamic, and physically distributed. Consequently, it is neither desirable nor feasible to gather all the data in a centralized location for analysis. Hence, there is a need for efficient distributed learning algorithms that can operate across multiple data sources without transmitting large amounts of data, and for cumulative learning algorithms that can cope with data sets that grow at a rapid rate. We formulate a class of distributed and cumulative learning problems and present a general strategy for transforming a large class of traditional machine learning algorithms into distributed and cumulative learning algorithms. The strategy is based on decomposing the learning task into an information extraction component and a hypothesis generation component. We use this approach to construct provably exact distributed algorithms for support vector machine and decision tree learning. We formalize the treatment of distributed learning by introducing a family of learning, information extraction, and information composition operators, and by establishing sufficient conditions for provably exact distributed and cumulative learning in terms of general algebraic properties of these operators.
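The decomposition described above can be illustrated with a minimal sketch (not taken from the paper; all names and the toy learner are illustrative assumptions): each site applies an information extraction operator that computes sufficient statistics, a composition operator merges the per-site statistics, and hypothesis generation then proceeds exactly as it would on the pooled data. Here the statistics are additive counts, so distributed and centralized learning yield the same hypothesis.

```python
# Illustrative sketch of the information-extraction / hypothesis-generation
# decomposition. Because the extracted statistics (counts) are additive,
# composing per-site statistics reproduces the centralized statistics
# exactly, so the distributed learner is provably exact for this toy case.
from collections import Counter

def extract(site_data):
    """Information extraction: per-site counts of (attribute_value, label)."""
    return Counter((x, y) for x, y in site_data)

def compose(stats_list):
    """Information composition: merge site statistics by addition."""
    total = Counter()
    for stats in stats_list:
        total += stats
    return total

def generate_hypothesis(stats):
    """Hypothesis generation: majority label per attribute value
    (a one-level decision stump) from the composed statistics."""
    per_value = {}
    for (x, y), n in stats.items():
        per_value.setdefault(x, Counter())[y] += n
    return {x: labels.most_common(1)[0][0] for x, labels in per_value.items()}

# Two distributed sites versus the same examples pooled centrally:
site_a = [("sunny", "play"), ("rain", "stay"), ("sunny", "play")]
site_b = [("rain", "stay"), ("sunny", "stay"), ("sunny", "play")]

distributed = generate_hypothesis(compose([extract(site_a), extract(site_b)]))
centralized = generate_hypothesis(extract(site_a + site_b))
assert distributed == centralized  # exactness: same hypothesis either way
```

The key design point is that exactness hinges on an algebraic property of the operators: here, the extracted statistics form a commutative monoid under composition, so the order and partitioning of the data sources do not affect the result.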