Albert A. (1972). Regression and the Moore-Penrose pseudoinverse.
Amari S, Kawanabe M. (1994). Estimation of network parameters in semiparametric stochastic perceptron. Neural Comput. 6.
Amari S, Murata N. (1993). Statistical theory of learning curves under entropic loss criterion. Neural Comput. 5.
Amari S, Nagaoka H. (2001). Methods of information geometry.
Baum E, Haussler D. (1989). What size net gives valid generalization? Neural Comput. 1.
Biehl M, Watkin T, Rau A. (1993). The statistical mechanics of learning a rule. Rev Mod Phys. 65.
Cox D, Barndorff-Nielsen O. (1989). Asymptotic techniques for use in statistics.
Cox D, Hinkley D. (1974). Theoretical statistics.
Devroye L, Györfi L, Lugosi G. (1996). A probabilistic theory of pattern recognition.
Eguchi S, Copas J. (2001). Information geometry on discriminant analysis and recent development. J Korean Stat Soc. 27.
Gelfand I, Fomin S. (1963). Calculus of variations.
Girolami M, Vinokourov A. (2002). A probabilistic framework for the hierarchic organization and classification of document collections. J Intell Inform Systems. 18.
Gotoh O. (1982). An improved algorithm for matching biological sequences. Journal of molecular biology. 162.
Haussler D, Jaakkola T. (1999). Exploiting generative models in discriminative classifiers. Advances in neural information processing systems. 11.
Jagota A, Müller KR, Rätsch G, Sonnenburg S. (2002). New methods for splice site recognition. Artificial neural networks--ICANN 2002.
Meila M, Jebara T, Jaakkola T. (1999). Maximum entropy discrimination. Tech. Rep. No. AITR-1668.
Meyer C, Campbell S. (1979). Generalized inverse of linear transformations.
Müller KR, Tsuda K, Kawanabe M. (2004). Clustering with the Fisher score. Advances in neural information processing systems. 15.
Müller KR, Finke M, Murata N, Schulten K, Amari S. (1996). A numerical study on learning curves in stochastic multilayer feedforward networks. Neural computation. 8.
Müller KR, Mika S, Rätsch G, Tsuda K, Schölkopf B. (2001). An introduction to kernel-based learning algorithms. IEEE transactions on neural networks. 12.
Opper M, Malzahn D. (2002). A variational approach to learning curves. Advances in neural information processing systems. 14.
Schölkopf B, Smola AJ. (2001). Learning with kernels: Support vector machines, regularization, optimization, and beyond.
Seeger M. (2000). Learning with labeled and unlabeled data. Tech Rep.
Seeger M. (2002). Covariance kernels from Bayesian generative models. Advances in neural information processing systems. 14.
Seung HS, Sompolinsky H, Tishby N. (1992). Statistical mechanics of learning from examples. Physical review. A, Atomic, molecular, and optical physics. 45.
Shawe-Taylor J, Cristianini N. (2000). An introduction to support vector machines.
Smith N, Gales M. (2002). Speech recognition using SVMs. Advances in neural information processing systems. 14.
Sugiyama M. (2001). A theory of model selection and active learning for supervised learning. Unpublished doctoral dissertation.
Tishby N, Haussler D, Seung H, Kearns M. (1996). Rigorous learning curve bounds from statistical mechanics. Mach Learn. 25.
Tsuda K, Kawanabe M. (2002). The leave-one-out kernel. Artificial neural networks--ICANN 2002.
Tsuda K, Kawanabe M, Rätsch G, Sonnenburg S, Müller KR. (2002). A new discriminative kernel from probabilistic models. Neural computation. 14.
Vapnik V. (1998). Statistical Learning Theory.
Watanabe S. (2001). Algebraic analysis for nonidentifiable learning machines. Neural computation. 13.
Zhang T, Oles F. (2000). The value of unlabeled data for classification problems. Proceedings of the Seventeenth International Conference on Machine Learning.
Zien A et al. (2000). Engineering support vector machine kernels that recognize translation initiation sites. Bioinformatics (Oxford, England). 16.
van der Vaart A. (1998). Asymptotic statistics.