Akaike H. (1974). A new look at the statistical model identification. IEEE Trans Automat Contr. 19
Akaike H. (1980). Likelihood and the Bayes procedure. Bayesian statistics.
Aronszajn N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society. 68
Bergman S. (1970). The kernel function and conformal mapping.
Bienenstock E, Geman S, Doursat R. (1992). Neural networks and the bias-variance dilemma. Neural Comput. 4
Bishop C. (1995). Neural Networks For Pattern Recognition.
Bousquet O, Elisseeff A. (2002). Stability and generalization. J Mach Learn Res. 2
Cherkassky V, Shao X, Mulier FM, Vapnik VN. (1999). Model complexity control for regression using VC generalization bounds. IEEE transactions on neural networks. 10
Daubechies I. (1992). Ten lectures on wavelets.
Devroye L, Gyorfi L, Lugosi G. (1996). A probabilistic theory of pattern recognition.
Donoho DL. (1995). De-noising by soft thresholding. IEEE Trans Inform Theory. 41
Donoho DL, Johnstone IM. (1994). Ideal spatial adaptation via wavelet shrinkage. Biometrika. 81
Felsenstein J. (1985). Confidence limits on phylogenies: An approach using the bootstrap. Evolution; international journal of organic evolution. 39
Girosi F. (1998). An Equivalence Between Sparse Approximation and Support Vector Machines. Neural computation. 10
Gu C, Wahba G, Heckman N. (1992). A note on generalized cross-validation with replicates. Statistics and Probability Letters. 14
Henkel RE. (1979). Tests of significance.
Heskes T. (1998). Bias/Variance Decompositions for Likelihood-Based Estimators. Neural computation. 10
Hoerl AE, Kennard RW. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics. 12
Joachims T. (1999). Making large-scale SVM learning practical. Advances in kernel methods-Support vector learning.
Kitagawa G, Konishi S. (1996). Generalized information criteria in model selection. Biometrika. 83
Lehmann E, Casella G. (1998). Theory Of Point Estimation.
Li K. (1986). Asymptotic optimality of CL and generalized cross-validation in ridge regression with application to spline smoothing. Ann Stat. 14
Linhart H. (1988). A test whether two AIC's differ significantly. South African Statistical Journal. 22
Luntz A, Brailovsky V. (1969). On estimation of characters obtained in statistical procedure of recognition. Technicheskaya Kibernetica.
Mallows CL. (1964). Choosing variables in a linear regression. Paper presented at the Central Regional Meeting of the Institute of Mathematical Statistics.
Mallows CL. (1973). Some comments on CP. Technometrics. 15
Müller KR, Kawanabe M, Sugiyama M. (2003). Trading variance reduction with unbiasedness - The regularized subspace information criterion for robust model selection in kernel regression. Tech. Rep. No. TR03-0003 (Available on-line: http://www.cs.titech.ac.jp/).
Müller KR, Sugiyama M. (2002). The subspace information criterion for infinite dimensional hypothesis spaces. J Mach Learn Res. 3
Müller KR, Tsuda K, Sugiyama M. (2002). Subspace information criterion for non-quadratic regularizers-Model selection for sparse regressors. IEEE Trans Neural Networks. 13
Murata N. (1998). Bias of estimators and regularization terms. Proceedings of 1998 Workshop on Information-Based Induction Sciences (IBIS98).
Murata N, Yoshizawa S, Amari S. (1994). Network information criterion-determining the number of hidden units for an artificial neural network model. IEEE transactions on neural networks. 5
Müller KR, Mika S, Rätsch G, Tsuda K, Schölkopf B. (2001). An introduction to kernel-based learning algorithms. IEEE transactions on neural networks. 12
Orr MJL. (1996). Introduction to radial basis function networks. Tech Rep (Available on-line: http://www.anc.ed.ac.uk/~mjo/papers/intro.ps.gz).
Saitoh S. (1988). Theory of reproducing kernels and its applications.
Saitoh S. (1997). Integral transforms, reproducing kernels and their applications.
Schölkopf B, Smola AJ. (2001). Learning with kernels: Support vector machines, regularization, optimization, and beyond.
Schölkopf B, Smola AJ, Williamson RC, Bartlett PL. (2000). New support vector algorithms. Neural computation. 12
Shawe-Taylor J, Cristianini N. (2000). An introduction to support vector machines.
Shimodaira H. (1997). Assessing the error probability of the model selection test. Ann Inst Stat Math. 49
Shimodaira H. (1998). An application of multiple comparison techniques to model selection. Ann Inst Stat Math. 50
Smale S, Cucker F. (2001). On the mathematical foundations of learning. Bull Amer Math Soc. 39
Smola AJ, Schölkopf B, Müller KR. (1998). The connection between regularization operators and support vector kernels. Neural networks : the official journal of the International Neural Network Society. 11
Stein C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proceedings of the 3rd Berkeley Symposium on Mathematical Statistics and Probability. 1
Sugiura N. (1978). Further analysis of the data by Akaike's information criterion and the finite corrections. Communications in Statistics-Theory and Methods. 7
Sugiyama M, Ogawa H. (2001). Subspace information criterion for model selection. Neural computation. 13
Sugiyama M, Ogawa H. (2002). Optimal design of regularization term and regularization parameter by subspace information criterion. Neural networks : the official journal of the International Neural Network Society. 15
Takeuchi K. (1976). Distribution of information statistics and validity criteria of models. Mathematical Science. 153
Tibshirani R. (1996). Regression shrinkage and selection via the LASSO. J Roy Stat Soc B. 58
Tibshirani R et al. (1996). The DELVE manual. Available on-line: http://www.cs.toronto.edu/~delve/.
Vapnik V. (1995). The Nature of Statistical Learning Theory.
Vapnik V. (1998). Statistical Learning Theory.
Vapnik V et al. (1998). Using support vector machines for time series prediction. Advances in kernel methods-Support vector learning.
Vapnik VN. (1982). Estimation of dependencies based on empirical data.
Wahba G. (1985). A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem. Ann Stat. 13
Wahba G. (1990). Spline models for observational data.
Wahba G, Craven P. (1979). Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik. 31
Williams CKI. (1998). Prediction with Gaussian processes: From linear regression to linear prediction and beyond. Learning in graphical models.
Williams CKI, Rasmussen CE. (1996). Gaussian processes for regression. Advances in neural information processing systems. 8