Tag: Safety
All the articles with the tag "Safety".
-
A Statistical Case Against Empirical Human-AI Alignment
This position paper argues against forward empirical human-AI alignment, citing statistical biases and anthropocentric limitations, and advocates prescriptive and backward alignment approaches to ensure transparency and minimize bias, supported by a case study on language model decoding strategies.
-
Belief Injection for Epistemic Control in Linguistic State Space
This paper proposes belief injection as a proactive epistemic control mechanism to shape AI agents' internal linguistic belief states within the Semantic Manifold framework, offering diverse strategies for guiding reasoning and alignment, though it lacks empirical validation.
-
Adversarial Attacks in Multimodal Systems: A Practitioner's Survey
This survey paper provides a comprehensive overview of adversarial attacks on multimodal AI systems across text, image, video, and audio modalities, categorizing threats by attacker knowledge, intention, and execution to equip practitioners with knowledge of vulnerabilities and cross-modal risks.
-
Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale Datasets for Responsible LLMs
This paper proposes a three-dimensional taxonomy and develops the TTP and HarmFormer tools for filtering harmful content from web-scale LLM pretraining datasets, revealing significant toxicity prevalence and persistent safety gaps through benchmarks such as HAVOC.
-
A Large-Scale Empirical Analysis of Custom GPTs' Vulnerabilities in the OpenAI Ecosystem
This paper conducts a large-scale empirical analysis of 14,904 custom GPTs in the OpenAI store, revealing that over 95% lack adequate security against attacks such as roleplay (96.51%) and phishing (91.22%); it also introduces a multi-metric popularity ranking system and highlights the need for enhanced security in both custom GPTs and base models.