Analysis of 25,000 agent runs across eight domains reveals that LLM-based scientific agents often produce results without following scientific reasoning norms. The choice of base model accounts for 41.4% of an agent's behavior, outweighing the influence of the agent scaffold. This suggests that current LLM-based systems lack the self-correcting epistemic rigor required for autonomous discovery.
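To make the 41.4% attribution concrete, the sketch below shows one way such a share could be estimated: an eta-squared-style variance decomposition, grouping run scores by base model and by scaffold. The run data, field names, and method here are illustrative assumptions, not the study's actual data or methodology.

```python
# Hypothetical sketch: attributing variation in agent behavior to the base
# model versus the scaffold via eta-squared (between-group sum of squares
# divided by total sum of squares). Scores below are made up for illustration.
from statistics import mean

def eta_squared(runs, key):
    """Fraction of total score variance explained by grouping runs on `key`."""
    grand = mean(r["score"] for r in runs)
    ss_total = sum((r["score"] - grand) ** 2 for r in runs)
    groups = {}
    for r in runs:
        groups.setdefault(r[key], []).append(r["score"])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total

# Toy runs: two base models crossed with two scaffolds (invented values).
runs = [
    {"model": "A", "scaffold": "react", "score": 0.62},
    {"model": "A", "scaffold": "plan",  "score": 0.66},
    {"model": "B", "scaffold": "react", "score": 0.41},
    {"model": "B", "scaffold": "plan",  "score": 0.45},
]

print(f"model share:    {eta_squared(runs, 'model'):.2f}")
print(f"scaffold share: {eta_squared(runs, 'scaffold'):.2f}")
```

In this toy data the base model explains far more of the score variance than the scaffold, mirroring the qualitative finding that the base model outweighs the scaffold.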