function [err1, err2] = llGrad(p1,p2,type,tolGrad,tolEval)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% [err1,err2] = llGrad(p1, p2, type [, tolGrad ,tolEval])
%
%    Compute the gradient with respect to the means of p1,p2 of the
%       average log-likelihood of p1 evaluated at the locations of p2
%       (p1=p2 => gradient of a resubstitution estimate of negentropy)
%    type: what to take gradient with respect to.
%          0 = means, 1 = bw (variance), 2=weights
%    tolGrad is an acceptable error tolerance on the gradient values
%    tolEval is a *percent* tolerance on the evaluation done as a
%       subroutine in the calculation; it should be tight enough to allow
%       tolGrad to be achieved but beyond that, larger => faster.
%
% Specifically, using the resubstition estimate
%    \hat H = \sum_j v_j \log \sum_i w_i K(x_i - y_j)
% The outputs are
%    err1(:,i) = \sum_j v_j 1/p(y_j) dp(y_j)/dx_i = E_y[ 1/p dp/dx_i ] 
%    err2(:,j) = v_j 1/p(y_j) p'(y_j) 
%
% Note: [err1,err2] = llGrad(p1, type [, tolGrad ,tolEval]) computes the leave-
%   one-out resub. estimate, similar to llGrad(p1,p1,type...)
%
% Some useful estimates which can be made using this function:
%
% ENTROPY GRADIENT
%    [err1, err2] = llGrad(p,p,1e-3,1e-3);
%    p -= (err1 + err2)    moves p in the direction of increasing entropy
%
% KL-DIVERGENCE GRADIENT
%    [errXX1, errXX2] = llGrad(p1,p1,1e-3,1e-3);
%    [errXYY, errXYX] = llGrad(p2,p1,1e-3,1e-3); 
%    err1 = (errXX1 + errXX2 - errXYX);
%    err2 = (-errXYY);
%    p1 += err1, p2 += err2  moves p1,p2 in the direction of increasing KLDiv
%
% See also:  klGrad, miGrad, entropyGrad, adjustPoints
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Copyright (C) 2003 Alexander Ihler; distributable under GPL -- see README.txt

%#mex
error('MEX-file kde/llGrad not found -- please recompile if necessary');

% Incorrect:  Mean shift corresponds to an estimate of 1/p(y) p'(y) where
%    p'(y) is computed using the Epan. kernel while p(y) is computed using
%    a uniform kernel with the same support.  This does *not* correspond to
%    the derivative of log-likelihood for any KDE (that I know of)
%
% MEAN-SHIFT
%    For equal weight uniform bandwidth Epan. kernel densities, 
%      the mean shift algorithm (Comaniciu '99) is given by
%    [err1, err2] = llGrad(p,p,1e-3,1e-3);
%    p += .5*getNpts(p)*getBW(p,1)^2 * err2
%