Devroye L, Gyorfi L, Lugosi G. (1996). A probabilistic theory of pattern recognition.
Jacobs RA, Hinton GE, Jordan MI, Nowlan SJ. (1991). Adaptive mixtures of local experts. Neural Comput. 3
Jacobs RA, Jordan MI. (1994). Hierarchical mixtures of experts and the EM algorithm. Neural Comput. 6
Jacobs RA, Tanner MA, Peng F. (1996). Bayesian inference in mixtures-of-experts and hierarchical mixtures-of-experts models with an application to speech recognition. J Am Stat Assoc. 91
Jiang W, Tanner MA. (1999). On the approximation rate of hierarchical mixtures-of-experts for generalized linear models. Neural Comput. 11
Jiang W, Tanner MA. (1999). Hierarchical mixtures-of-experts for exponential family regression models: Approximation and maximum likelihood estimation. Annals of Statistics. 27
Jiang W, Tanner MA, Wood SA, Kohn R. (2005). Spatially adaptive nonparametric binary regression using a mixture of probits. Tech Rep. Available online at: http://newton.stats.northwestern.edu/~jiang/report/binary.probit.pdf
Lee HK. (2000). Consistency of posterior distributions for neural networks. Neural Networks. 13
Xu L, Jordan MI. (1995). Convergence results for the EM approach to mixtures-of-experts architectures. Neural Networks. 8
Jiang W. (2006). On the consistency of Bayesian variable selection for high dimensional binary regression and classification. Neural Comput. 18