The STHTD-MP method replaces the standard feature-covariance metric with the symmetric part of the behavior-policy Bellman matrix. This substitution improves the geometry of the updates in the primal-dual saddle-point formulation. Using behavior-policy transition information in this way was found to stabilize learning with linear function approximation. The approach also lets practitioners use a single learning rate for both the primal and the auxiliary variables.
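
To make the metric swap concrete, the following is a minimal sketch, not the STHTD-MP update itself, which the text above does not spell out. It assumes the usual linear-TD definitions: a Bellman matrix of the form Phi^T D (I - gamma P) Phi with D the behavior policy's stationary distribution, and a GTD2-style expected primal-dual step in which the feature covariance is replaced by the symmetric part of the behavior-policy Bellman matrix. The three-state chain, the features, the reward vector, and all names (`bellman_matrix`, `primal_dual_step`, `alpha`) are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical 3-state example; none of these numbers come from the text.
gamma = 0.9
Phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # one feature row per state
P_mu = np.array([[0.50, 0.50, 0.00],                    # behavior-policy transitions
                 [0.25, 0.50, 0.25],
                 [0.00, 0.50, 0.50]])
P_pi = np.array([[0.1, 0.9, 0.0],                       # target-policy transitions
                 [0.1, 0.1, 0.8],
                 [0.0, 0.2, 0.8]])
r_pi = np.array([0.0, 0.0, 1.0])                        # expected reward per state


def stationary_distribution(P):
    """Solve d^T P = d^T with sum(d) = 1 as a least-squares system."""
    n = P.shape[0]
    lhs = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    rhs = np.concatenate([np.zeros(n), [1.0]])
    d, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return d


def bellman_matrix(Phi, d, P, gamma):
    """Assumed form: Phi^T D (I - gamma P) Phi, with D = diag(d)."""
    n = P.shape[0]
    return Phi.T @ np.diag(d) @ (np.eye(n) - gamma * P) @ Phi


d_mu = stationary_distribution(P_mu)
A_mu = bellman_matrix(Phi, d_mu, P_mu, gamma)   # behavior-policy Bellman matrix
A_pi = bellman_matrix(Phi, d_mu, P_pi, gamma)   # target-policy matrix, states weighted by d_mu
b = Phi.T @ (d_mu * r_pi)

# Metric: symmetric part of the behavior-policy Bellman matrix,
# used here in place of the feature covariance Phi^T D Phi.
M = 0.5 * (A_mu + A_mu.T)


def primal_dual_step(theta, w, alpha):
    """One expected GTD2-style step with metric M and a single step size alpha."""
    w_new = w + alpha * (b - A_pi @ theta - M @ w)   # ascent on the auxiliary variable
    theta_new = theta + alpha * (A_pi.T @ w)         # descent on the primal variable
    return theta_new, w_new


# The TD fixed point (theta*, w = 0) is left unchanged by the step.
theta_star = np.linalg.solve(A_pi, b)
theta_next, w_next = primal_dual_step(theta_star, np.zeros(2), alpha=0.1)
print("min eigenvalue of M:", np.linalg.eigvalsh(M).min())
print("fixed point preserved:", np.allclose(theta_next, theta_star))
```

Because the state weighting is the stationary distribution of the behavior policy itself, the symmetric part `M` is positive definite whenever the features are linearly independent, which is what allows it to serve as a metric. The sketch only checks that the TD solution is a stationary point of the step; it makes no claim about convergence rates or about how the method handles sampled transitions.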