The STHTD-MP method replaces the standard feature covariance metric with the symmetric part of the behavior-policy Bellman matrix. This substitution changes the geometry of the updates in the primal-dual saddle-point formulation. By using behavior-policy transition information, the approach accelerates convergence under linear function approximation and offers a more efficient alternative for stable off-policy prediction.
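To make the substitution concrete, the following minimal sketch builds both candidate metric matrices for a small synthetic Markov chain. It is an illustration under stated assumptions, not the method's implementation: the notation `C = Phi^T D Phi` for the feature covariance, `A_b = Phi^T D (I - gamma P_b) Phi` for the behavior-policy Bellman matrix, and `M = (A_b + A_b^T) / 2` for its symmetric part is assumed here, as are the helper names `stationary_distribution` and `metrics` and the random chain and features.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution d with d @ P = d, found by least squares."""
    n = P.shape[0]
    # Stack the fixed-point equations (P^T - I) d = 0 with the constraint sum(d) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d

def metrics(Phi, P_b, gamma):
    """Return (C, A_b, M) under the assumed notation described in the lead-in."""
    d = stationary_distribution(P_b)
    D = np.diag(d)
    n = P_b.shape[0]
    C = Phi.T @ D @ Phi                                # feature covariance metric
    A_b = Phi.T @ D @ (np.eye(n) - gamma * P_b) @ Phi  # behavior-policy Bellman matrix
    M = 0.5 * (A_b + A_b.T)                            # its symmetric part
    return C, A_b, M

# Hypothetical problem: random behavior-policy chain and random linear features.
rng = np.random.default_rng(0)
n_states, n_features, gamma = 6, 3, 0.9
P_b = rng.random((n_states, n_states))
P_b /= P_b.sum(axis=1, keepdims=True)   # normalize rows into transition probabilities
Phi = rng.standard_normal((n_states, n_features))

C, A_b, M = metrics(Phi, P_b, gamma)
print("eigenvalues of C:", np.linalg.eigvalsh(C))
print("eigenvalues of M:", np.linalg.eigvalsh(M))
```

Because the state weighting is the stationary distribution of the behavior policy itself, `M` is symmetric positive definite whenever `Phi` has full column rank, so it can serve as a metric in the same role as `C` while also carrying transition information through `P_b`.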