The STHTD-MP method replaces the standard feature-covariance metric with the symmetric part of the behavior-policy Bellman matrix, which reshapes the update geometry of the primal-dual saddle-point formulation. The authors report that this change improves both stability and convergence speed in off-policy prediction, making it a more efficient alternative for practitioners who use linear function approximation.
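To make the substitution concrete, the following is a minimal sketch of the two candidate metrics on a toy problem. It assumes that the "behavior-policy Bellman matrix" means A_mu = Phi^T D_mu (I - gamma P_mu) Phi and that the feature covariance is C = Phi^T D_mu Phi, where D_mu is the diagonal matrix of the behavior policy's stationary distribution. These definitions, the three-state chain, the feature matrix, and every function name are illustrative assumptions, not taken from the source, and the sketch does not implement the STHTD-MP update itself.

```python
import numpy as np


def stationary_distribution(P, iters=10_000):
    """Approximate the stationary distribution of a row-stochastic matrix
    by power iteration (assumes the chain is irreducible and aperiodic)."""
    d = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        d = d @ P
    return d / d.sum()


def bellman_matrix(Phi, P, d, gamma):
    """Assumed definition: A = Phi^T D (I - gamma P) Phi."""
    D = np.diag(d)
    return Phi.T @ D @ (np.eye(P.shape[0]) - gamma * P) @ Phi


def symmetric_part(A):
    """Symmetric part of a square matrix: (A + A^T) / 2."""
    return 0.5 * (A + A.T)


# Hypothetical toy problem: a non-reversible 3-state chain under the
# behavior policy, with two linear features per state.
P_mu = np.array([[0.1, 0.9, 0.0],
                 [0.0, 0.1, 0.9],
                 [0.9, 0.0, 0.1]])
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [0.0, 1.0]])
gamma = 0.9

d_mu = stationary_distribution(P_mu)
A_mu = bellman_matrix(Phi, P_mu, d_mu, gamma)

C = Phi.T @ np.diag(d_mu) @ Phi   # standard feature-covariance metric
H = symmetric_part(A_mu)          # candidate replacement metric

# A metric must be symmetric positive definite to define a valid geometry.
print("eigenvalues of C:", np.linalg.eigvalsh(C))
print("eigenvalues of H:", np.linalg.eigvalsh(H))
```

In this toy setting A_mu is not symmetric, yet its symmetric part H is positive definite, so it can serve as a metric in place of C. This matches the standard on-policy result that Phi^T D (I - gamma P) Phi is positive definite when D holds the stationary distribution of P and Phi has full column rank; whether the method relies on exactly this property is an assumption here.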