Google AI released a new study that measures 12 distinct behavioral dispositions in LLMs. The analysis shows that only 4 of the 12 traits align with human expectations, revealing gaps in current alignment methods. The researchers used a benchmark of 3,000 prompts to quantify behavioral consistency. The findings suggest that developers must refine reward signals to improve model safety.
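The article does not describe how consistency was scored, but one plausible approach is to rate each trait on every prompt and measure how often a trait's rating matches its most frequent value across the benchmark. A minimal sketch (the function name, rating scheme, and toy data below are assumptions, not the study's actual method):

```python
from collections import defaultdict

def consistency_scores(ratings):
    """For each trait, return the fraction of prompts whose rating
    matches the modal (most frequent) rating for that trait."""
    scores = {}
    for trait, values in ratings.items():
        counts = defaultdict(int)
        for v in values:
            counts[v] += 1
        modal = max(counts.values())
        scores[trait] = modal / len(values)
    return scores

# Toy example: per-prompt ratings (-1/0/+1) for two hypothetical traits.
ratings = {
    "honesty":    [1, 1, 1, 0, 1],    # mostly consistent
    "sycophancy": [1, -1, 0, 1, -1],  # inconsistent across prompts
}
print(consistency_scores(ratings))  # {'honesty': 0.8, 'sycophancy': 0.4}
```

A trait whose score falls near chance would be flagged as poorly aligned, which is one way a 4-of-12 result like the one reported could emerge from such a benchmark.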