Akaike H. (1973). Information theory and an extension of the maximum likelihood principle 2nd Intl Symp in Information Theory.
Banerjee A, Mooney R, Basu S. (2002). Semi-supervised clustering by seeding Intl. Conf. on Machine Learning.
Bar-Hillel A, Hertz T, Shental N, Weinshall D. (2003). Learning distance functions using equivalence relations Proc. of 20th International Conference on Machine Learning.
Besag J. (1986). On the Statistical Analysis of Dirty Pictures J Roy Stat Soc B. 48
Demiriz A, Bennett K. (2001). Optimization approaches to semi-supervised learning Complementarity: Applications, algorithms and extensions.
Dempster AP, Laird NM, Rubin DB. (1977). Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B. 39
Hertz T, Shental N, Weinshall D, Bar-Hillel A. (2003). Learning via equivalence constraints, with applications to the enhancement of image and video retrieval Proc. of IEEE Conference on Computer Vision and Pattern Recognition.
Hertz T, Shental N, Weinshall D, Bar-Hillel A. (2003). Computing gaussian mixture models with EM using equivalence constraints Advances in neural information processing systems. 15
Hinton GE, Neal RM. (1998). A new view of the EM algorithm that justifies incremental, sparse and other variants Learning in graphical models.
Hofmann T, Buhmann JM. (1997). Pairwise data clustering by deterministic annealing IEEE Transactions On Pattern Analysis And Machine Intelligence. 19
Jain A, Law M, Topchy A. (2004). Clustering with soft and group constraints Joint IAPR International Workshop on Syntactical and Structural Pattern Recognition and Statistical Pattern Recognition.
Joachims T. (1999). Transductive inference for text classification using support vector machines Proc. of the Fourteenth Conference on Uncertainty in AI.
Jordan M, Russell S, Xing E, Ng A. (2003). Distance metric learning with application to clustering with side-information Advances in neural information processing systems. 15
Kindermann R, Snell J. (1980). Markov random fields and their applications.
Klein D, Kamvar S, Manning C. (2002). From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering Proc. of 19th International Conference on Machine Learning.
McLachlan GJ, Peel D. (2000). Finite mixture models.
Meila M, Jebara T, Jaakkola T. (1999). Maximum entropy discrimination Advances in neural information processing systems. 12
Miller D, Browning J. (2003). A mixture model and EM-based algorithm for class discovery, robust classification, and outlier rejection in mixed labeled-unlabeled data sets IEEE Trans Pattern Anal Mach Intell. 25
Miller D, Uyar H. (1997). A mixture of experts classifier with learning based on both labelled and unlabelled data Advances in neural information processing systems. 9
Mitchell T, Blum A. (1998). Combining labeled and unlabeled data with co-training Proceedings of Computational Learning Theory (COLT 98).
Mitchell T, McCallum A, Thrun S, Nigam K. (2000). Text classification from labeled and unlabeled documents using EM Mach Learn. 39
Mooney R, Basu S, Bilenko M. (2003). Comparing and unifying search-based and similarity-based approaches to semi-supervised clustering Proc. of ICML-2003 Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining Systems.
Rissanen J. (1978). Modeling by shortest data description Automatica. 14
Rogers S, Wagstaff K, Cardie C, Schroedl S. (2001). Constrained k-means clustering with background knowledge Proc. of 18th International Conference on Machine Learning.
Rose K. (1998). Deterministic annealing for clustering, compression, classification, regression, and related optimization problems Proceedings Of The IEEE. 86
Schwarz G. (1978). Estimating the dimension of a model Ann Stat. 6
Seeger M. (2000). Learning with labeled and unlabeled data Tech Rep.
Shahshahani B, Landgrebe D. (1994). The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon IEEE Trans Geoscience And Remote Sensing. 32
Stork DG. (2001). Toward a computational theory of data acquisition and truthing Proceedings of Computational Learning Theory (COLT 01).
Tibshirani R, Hastie T, Friedman J. (2001). The elements of statistical learning.
Wagstaff K. (2002). Intelligent clustering with instance-level constraints Unpublished doctoral dissertation.
Yu SX, Shi J. (2004). Segmentation given partial grouping constraints. IEEE transactions on pattern analysis and machine intelligence. 26
Yuille A, Stolorz P, Utans J. (1994). Statistical physics, mixtures of distributions, and the EM algorithm Neural Comput. 6