The STHTD-MP method replaces the standard covariance metric with the symmetric part of the behavior-policy Bellman matrix. This changes the update geometry of the primal-dual saddle-point formulation and simplifies tuning, because a single learning rate serves both the primal and the auxiliary variables. The goal is faster convergence for stable off-policy prediction with linear function approximation.
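
To make the metric concrete, the following is a minimal sketch of how the symmetric part of a behavior-policy Bellman matrix could be formed on a small tabular problem. It assumes the common linear-TD definition A_b = Φᵀ D_b (I − γ P_b) Φ, where Φ is the feature matrix, P_b the transition matrix under the behavior policy, and D_b the diagonal matrix of its stationary distribution; that definition, the function names, and the random test problem are illustrative assumptions, not taken from the method's specification. The sketch only builds the metric and does not implement the STHTD-MP updates themselves.

```python
import numpy as np


def stationary_distribution(P, iters=1000):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by power iteration (assumes the chain is ergodic)."""
    d = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        d = d @ P
    return d / d.sum()


def symmetric_bellman_metric(Phi, P_b, gamma):
    """Return the symmetric part of A_b = Phi^T D_b (I - gamma P_b) Phi.

    Assumed definition (hypothetical helper, not from the source):
    A_b is the linear-TD matrix under the behavior policy.
    """
    d_b = stationary_distribution(P_b)
    D_b = np.diag(d_b)
    A_b = Phi.T @ D_b @ (np.eye(P_b.shape[0]) - gamma * P_b) @ Phi
    return 0.5 * (A_b + A_b.T)


# Hypothetical toy problem: 6 states, 3 features, dense random transitions.
rng = np.random.default_rng(0)
n_states, n_features, gamma = 6, 3, 0.9
P_b = rng.random((n_states, n_states))
P_b /= P_b.sum(axis=1, keepdims=True)  # normalize rows to sum to 1
Phi = rng.standard_normal((n_states, n_features))

M = symmetric_bellman_metric(Phi, P_b, gamma)
min_eig = np.linalg.eigvalsh(M).min()  # smallest eigenvalue of the metric
```

Under these assumptions, the resulting matrix is symmetric and, when the features are linearly independent and D_b holds the behavior policy's stationary distribution, positive definite, which is what allows it to act as a metric in place of a covariance matrix.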