Bishop C. (1995). Neural Networks For Pattern Recognition.
Chen S, Cowan CN, Grant PM. (1991). Orthogonal least squares learning algorithm for radial basis function networks. IEEE transactions on neural networks. 2 [PubMed]
Csató L, Opper M. (2002). Sparse on-line gaussian processes. Neural computation. 14 [PubMed]
Denison D, George E. (2000). Bayesian prediction using adaptive ridge estimators Tech Rep Imperial College Department of Mathematics.
Friedman JH. (1991). Multivariate Adaptive Regression Splines Ann Stat. 19
Golub GH, Wahba G, Heath M. (1979). Generalized cross-validation as a method for choosing a good ridge parameter Technometrics. 21
Herbrich R, Lawrence ND, Seeger M. (2003). Fast sparse gaussian process methods: The informative vector machine Advances in neural information processing systems. 15
Hoerl AE, Kennard RW. (1970). Ridge regression: Biased estimation for nonorthogonal problems Technometrics. 12
Orr MJL. (1995). Local smoothing of radial basis function networks Proc Intl Symp Neural Networks.
Orr MJL. (1995). Regularization in the selection of radial basis function centers Neural Comput. 7
Smola AJ, Bartlett PL. (2000). Sparse greedy gaussian process regression Advances in neural information processing systems. 13
Stone M. (1974). Cross-validatory choice and assesment of statistical predictions J Roy Statist Soc B. 36
Sundararajan S, Keerthi SS. (2001). Predictive approaches for choosing hyperparameters in gaussian processes. Neural computation. 13 [PubMed]
Tipping M. (2001). Sparse Bayesian learning and the relevance vector machine J Mach Learn Res. 1
Tipping ME, Faul A. (2003). Fast marginal likelihood maximisation for sparse Bayesian models Proc 9th Intl Workshop Artif Intell Stat.
Vapnik V, Cortes C. (1995). Support-vector networks Mach Learn. 20
Williams CKI, Lawrence ND, Seeger M. (2003). Fast forward selection to speed up sparse gaussian process regression Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics.