Amari S. (1995). Information geometry of EM and EM algorithms for neural networks. Neural Netw. 8
Amari S, Nagaoka H. (2000). Methods of information geometry.
Amari SI. (1985). Differential-geometrical methods in statistics.
Barron AR. (1993). Universal approximation bounds for superposition of a sigmoidal function. IEEE Trans Inform Theory. 39
Bartlett P, Freund Y, Schapire RE, Lee WS. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Ann Stat. 26
Bishop C. (1995). Neural Networks for Pattern Recognition.
Eguchi S, Copas J. (2001). Recent developments in discriminant analysis from an information geometric point of view. J Korean Statist Soc. 30
Eguchi S, Copas J. (2002). A class of logistic type discriminant functions. Biometrika. 89
Eguchi S, Kano Y. (2001). Robustifying maximum likelihood estimation (Research memorandum 802).
Freund Y. (1995). Boosting a weak learning algorithm by majority. Information and Computation. 12
Freund Y, Schapire R. (1996). Experiments with a new boosting algorithm. Proc. of the 13th International Conference on Machine Learning.
Freund Y, Schapire R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Sys Sci. 55
Hampel FR, Rousseeuw PJ, Ronchetti EM, Stahel WA. (1986). Robust statistics: The approach based on influence functions.
Kearns M, Valiant LG. (1988). Learning boolean formulae or finite automata is as hard as factoring. Tech Rep TR-14-88, Harvard University Aiken Computation Laboratory.
Kivinen J, Warmuth MK. (1999). Boosting as entropy projection. Proc 12th Ann Conf Comput Learn Theory.
Lafferty J, Lebanon G. (2001). Boosting and maximum likelihood for exponential models. Tech Rep CMU-CS-01-144, School of Computer Science, Carnegie Mellon University.
McLachlan G. (1992). Discriminant analysis and statistical pattern recognition.
Mihoko M, Eguchi S. (2002). Robust blind source separation by beta divergence. Neural Comput. 14
Murata N. (1996). An integral representation of functions using three-layered networks and their approximation bounds. Neural Netw. 9
Murata N, Yoshizawa S, Amari S. (1994). Network information criterion-determining the number of hidden units for an artificial neural network model. IEEE Trans Neural Netw. 5
Schapire RE. (1990). The strength of weak learnability. Machine Learning. 5
Schapire RE, Singer Y, Collins M. (2000). Logistic regression, Adaboost and Bregman distances. Proc 13th Ann Conf Comput Learn Theory.
Takenouchi T, Eguchi S. (2004). Robustifying AdaBoost by adding the naive error rate. Neural Comput. 16
Tibshirani R, Hastie T, Friedman J. (2000). Additive logistic regression: A statistical view of boosting. Ann Stat. 28
Tibshirani R, Hastie T, Friedman J. (2001). The elements of statistical learning.
Vapnik V. (1995). The Nature of Statistical Learning Theory.
Watanabe O, Domingo C. (2000). MadaBoost: A modification of AdaBoost. Proceedings of the 13th Conference on Computational Learning Theory.
Amari S. (2007). Integration of stochastic models by minimizing alpha-divergence. Neural Comput. 19
Kanamori T, Takenouchi T, Eguchi S, Murata N. (2007). Robust loss functions for boosting. Neural Comput. 19