A theoretical and experimental approach to using information theory for input space selection in modeling and diagnostic applications is examined. The assumptions and test cases used throughout the paper are tailored to diesel engine diagnostic and modeling applications. This work seeks to quantify the amount of information about an output contained within an input space. The information-theoretic quantity conditional entropy is shown to be an accurate predictor of model and diagnostic algorithm performance, and is therefore a good choice of input vector selection metric. Methods of estimating conditional entropy from collected data, including the amount of data required, are also discussed.
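For concreteness, the following is a minimal sketch of one common way to estimate conditional entropy from paired input/output samples, using a histogram approximation of the underlying densities. The function name, the binning scheme, and the default bin count are illustrative assumptions, not necessarily the estimator developed in the paper; a low estimated H(Y|X) flags an input as informative about the output.

```python
import numpy as np

def conditional_entropy(x, y, bins=10):
    """Estimate H(Y|X) = H(X,Y) - H(X) in bits from paired samples.

    x, y : 1-D arrays of scalar input/output observations.
    bins : histogram bins per axis (a hypothetical default; how much
           data such estimates require is a question the paper treats).
    """
    # Joint histogram -> estimate of the joint probability mass p(x, y).
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()

    # Marginal p(x) obtained by summing the joint estimate over y.
    p_x = p_xy.sum(axis=1)

    # H(X,Y) and H(X); zero-probability cells are masked so that the
    # 0 * log(0) terms contribute nothing, as in the usual convention.
    nz_xy = p_xy[p_xy > 0]
    nz_x = p_x[p_x > 0]
    h_xy = -np.sum(nz_xy * np.log2(nz_xy))
    h_x = -np.sum(nz_x * np.log2(nz_x))

    return h_xy - h_x  # H(Y|X): lower means X says more about Y
```

In an input-selection loop of the kind the abstract describes, candidate inputs (or candidate input vectors) would be ranked by ascending estimated H(Y|X), with the lowest-entropy candidates retained for the model or diagnostic algorithm.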
