New construction of ensemble classifiers for imbalanced datasets

Research output: Contribution to report/book/conference proceedings › In-proceedings paper

Abstract

Learning from imbalanced data is a major challenge in machine learning: a classifier trained on such data tends to ignore samples from classes that are under-represented in the training set. In this paper, we propose an ensemble-based learning algorithm in the form of a new ensemble classifier model, the SVM-C5.0 Ensemble Classifier Model (SCECM). SCECM adopts a differentiated sampling rate algorithm (DSRA) based on an improved AdaBoost algorithm, and further employs a unique classifier-selection strategy, a novel classifier-integration approach, and an original classification decision-making method. Comparative experimental results show that the proposed approach improves performance on the minority class while preserving the ability to recognize examples from the majority classes.
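
The abstract does not spell out how DSRA works, so the following is only a minimal sketch of the general idea of sampling each class at its own rate when building a training subset. It is not the paper's algorithm. The function name `differentiated_sample`, the `rates` parameter, and the toy data are all hypothetical, and the AdaBoost-style weighting and the SVM and C5.0 base classifiers are left out.

```python
import random
from collections import defaultdict


def differentiated_sample(labels, rates, seed=0):
    """Draw training indices with a separate sampling rate per class.

    labels: list of class labels, one per training example.
    rates:  dict mapping each label to a sampling rate (hypothetical
            parameter, not taken from the paper). A rate of 3.0 draws
            three times as many examples as the class contains; a rate
            of 0.5 draws half as many.
    """
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    sample = []
    for label, indices in by_class.items():
        n_draw = max(1, round(rates[label] * len(indices)))
        # Sampling with replacement lets a rate above 1.0 oversample a class.
        sample.extend(rng.choices(indices, k=n_draw))
    return sample


# Toy data (assumed): 90 majority examples (label 0), 10 minority (label 1).
labels = [0] * 90 + [1] * 10
picked = differentiated_sample(labels, rates={0: 0.5, 1: 3.0})
counts = {c: sum(1 for i in picked if labels[i] == c) for c in (0, 1)}
print(counts)  # {0: 45, 1: 30}
```

With these assumed rates the class ratio in the drawn subset moves from 9:1 to 3:2, which illustrates how a higher minority sampling rate can keep a base classifier from ignoring the smaller class.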

Details

Original language: English
Title of host publication: Proceedings of 2010 IEEE International Conference on Intelligent Systems and Knowledge Engineering
Place of Publication: Beijing, China
Pages: 228-233
State: Published - Nov 2010
Event: 2010 IEEE International Conference on Intelligent Systems and Knowledge Engineering - Hangzhou, China

Conference

Conference: 2010 IEEE International Conference on Intelligent Systems and Knowledge Engineering
Country: China
City: Hangzhou
Period: 2010-11-15 to 2010-11-16

Keywords

  • data mining, classification in imbalanced datasets, heterogeneous classifier, differentiated sampling rate, ensemble model of classifiers

ID: 289689