Proceedings:
Proceedings of the First International Conference on Intelligent Systems for Molecular Biology, 1993
Abstract:
We introduce a parallel approach, "DT-SELECT," for selecting features used by inductive learning algorithms to predict protein secondary structure. DT-SELECT can rapidly choose small, nonredundant feature sets from pools containing hundreds of thousands of potentially useful features. It does so by building a decision tree, using features from the pool, that classifies a set of training examples. The features included in the tree provide a compact description of the training data and are therefore suitable as inputs to other inductive learning algorithms. In experiments on the protein secondary-structure task, sets of complex features chosen by DT-SELECT are used to augment a standard artificial neural network representation; these yield surprisingly little performance gain, even though the features are selected from very large pools. We discuss some possible reasons for this result.
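The core idea, that the features a decision tree actually splits on form a compact, nonredundant subset of the pool, can be illustrated with a small serial sketch. This abstract does not specify DT-SELECT's split criterion, feature encoding, stopping rule, or how the search is parallelized, so the code below assumes binary features, an ID3-style information-gain criterion, and a fixed depth limit, and it omits parallelism entirely. All function names, feature names, and class labels ("H", "C") are hypothetical.

```python
import itertools
import math
from collections import Counter
from typing import Dict, List, Sequence, Set

Example = Dict[str, bool]  # hypothetical encoding: feature name -> binary value


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(examples: List[Example], labels: List[str], feature: str) -> float:
    """Entropy reduction from splitting the examples on one binary feature."""
    left = [y for x, y in zip(examples, labels) if x[feature]]
    right = [y for x, y in zip(examples, labels) if not x[feature]]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def select_features(examples: List[Example], labels: List[str],
                    pool: List[str], max_depth: int = 5) -> Set[str]:
    """Grow a tree greedily and return the set of features used in its splits.

    Assumption: this serial, information-gain version stands in for
    DT-SELECT; the paper's actual procedure may differ.
    """
    selected: Set[str] = set()

    def grow(idx: List[int], depth: int) -> None:
        ys = [labels[i] for i in idx]
        if depth == max_depth or len(set(ys)) <= 1:
            return  # depth limit reached or node is pure
        xs = [examples[i] for i in idx]
        # max() keeps the first feature in pool order when gains tie.
        best = max(pool, key=lambda f: info_gain(xs, ys, f))
        if info_gain(xs, ys, best) <= 0.0:
            return  # no feature in the pool helps at this node
        selected.add(best)
        grow([i for i in idx if examples[i][best]], depth + 1)
        grow([i for i in idx if not examples[i][best]], depth + 1)

    grow(list(range(len(examples))), 0)
    return selected


if __name__ == "__main__":
    # Toy data: class "H" iff a and b; a_copy duplicates a; noise is irrelevant.
    rows, ys = [], []
    for a, b, noise in itertools.product([False, True], repeat=3):
        rows.append({"a": a, "a_copy": a, "b": b, "noise": noise})
        ys.append("H" if (a and b) else "C")
    print(sorted(select_features(rows, ys, ["a", "a_copy", "b", "noise"])))
    # prints ['a', 'b']
```

In this toy run the duplicated feature `a_copy` is never chosen: once the tree has split on `a`, `a_copy` carries no further information, which is the sense in which the selected set is nonredundant. The irrelevant `noise` feature is likewise skipped because its information gain is zero at every node.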
ISMB