Assessing the Stability of LLM-Generated Explanations in Network Security via Repeated Sampling

Document Type

Conference Proceeding

Publication Date

6-3-2026

Abstract

Large language models (LLMs) are increasingly proposed as reasoning agents for network operations and security analysis, where human operators rely on explanations to validate and act on model outputs. However, most evaluations remain single-shot, implicitly assuming that a model produces a single stable decision and explanation per input. In practice, LLM decoding is stochastic, and identical network flows can yield different decisions and substantially different justifications across runs. We present a repeated-sampling evaluation framework that treats LLM outputs as distributions rather than deterministic responses and jointly measures (i) decision-level reliability and (ii) explanation-level semantic stability under identical evidence. Using a controlled subset of 360 flows from the UNSW-NB15 dataset, we evaluate four locally deployed open-source LLMs with 30 independent inferences per flow. Decision reliability is quantified using majority-vote accuracy, coverage, and decision entropy, while explanation stability is measured via embedding-based centroid analysis of generated justifications. This approach enables identification of rare but extreme semantic outliers that are not apparent from aggregate metrics alone. Our results show that strong single-pass accuracy can mask meaningful differences in decision determinism and explanation stability, and that explanation instability often manifests as isolated outlier runs on specific flows rather than uniform degradation. We release a reproducible methodology for assessing explanation trustworthiness in security-critical settings where operator confidence depends on stable, evidence-grounded reasoning.
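
The metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the strict-majority abstention rule used to define coverage, and the use of cosine distance to the centroid are all assumptions made for illustration, since the abstract does not give exact definitions. Embeddings are assumed to be non-zero vectors produced by some sentence-embedding model that is not shown here.

```python
import math
from collections import Counter

def decision_entropy(labels):
    """Shannon entropy (in bits) of the label distribution over repeated runs."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def majority_vote(labels, min_share=0.5):
    """Most frequent label if its share exceeds min_share, else None.
    Abstaining below a strict majority is an assumed rule, not the paper's."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) > min_share else None

def coverage(runs_per_flow):
    """Fraction of flows for which the majority vote does not abstain (assumed definition)."""
    votes = [majority_vote(runs) for runs in runs_per_flow]
    return sum(v is not None for v in votes) / len(votes)

def centroid_distances(embeddings):
    """Cosine distance of each explanation embedding from the centroid of all runs.
    Assumes non-zero vectors of equal length."""
    dim = len(embeddings[0])
    centroid = [sum(v[i] for v in embeddings) / len(embeddings) for i in range(dim)]

    def cos_dist(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / (norm_a * norm_b)

    return [cos_dist(v, centroid) for v in embeddings]
```

Under these assumptions, a flow whose 30 runs all agree has zero decision entropy, while a run whose explanation embedding lies far from the centroid of its siblings would be flagged as a candidate semantic outlier even when the aggregate metrics look healthy.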

Comments

Presented at the 2026 8th International Congress on Human-Computer Interaction, Optimization and Robotic Applications (ICHORA).
