Tag: Safety
All articles tagged "Safety".
-
Adversarial Attacks in Multimodal Systems: A Practitioner's Survey
This survey gives practitioners a comprehensive overview of adversarial attacks on multimodal AI systems across the text, image, video, and audio modalities, categorizing threats by attacker knowledge, intent, and execution method to map out vulnerabilities and cross-modal risks.
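To make the attack families concrete, here is a minimal sketch of the fast gradient sign method (FGSM), a classic white-box image-modality attack of the kind such surveys categorize; the model and epsilon are illustrative placeholders, not artifacts of the paper.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, label, epsilon=0.03):
    """Return x plus an epsilon-bounded perturbation that raises the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # Step in the sign of the gradient, then clamp back to valid pixel range.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```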
-
Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale Datasets for Responsible LLMs
This paper proposes a three-dimensional taxonomy of harmful web content and develops the TTP and HarmFormer tools to filter it from web-scale LLM pretraining datasets, revealing widespread toxicity and persistent safety gaps through benchmarks such as HAVOC.
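As a rough illustration of the filtering stage (not the paper's actual TTP or HarmFormer pipeline), a threshold-based corpus filter might look like this, with a deliberately crude keyword scorer standing in for a learned harm classifier:

```python
HARMFUL_TERMS = {"bomb-making", "credit card dump"}  # toy lexicon, assumption

def harm_score(doc: str) -> float:
    """Placeholder scorer: fraction of lexicon terms found in the document."""
    hits = sum(term in doc.lower() for term in HARMFUL_TERMS)
    return hits / len(HARMFUL_TERMS)

def filter_corpus(docs: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only documents whose harm score falls below the threshold."""
    return [d for d in docs if harm_score(d) < threshold]
```

In a real pipeline the scorer would be a trained classifier applied shard by shard over the web crawl.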
-
Enhancing Safety Standards in Automated Systems Using Dynamic Bayesian Networks
This paper proposes a Dynamic Bayesian Network framework for autonomous vehicles that improves safety during cut-in maneuvers by fusing lateral evidence with probabilistic safety assessments, achieving a lower crash rate (9.22%) than baseline models in high-speed scenarios in the JRC-FSM simulator.
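A Dynamic Bayesian Network update reduces to a predict-then-weigh step at each timestep. Below is a minimal two-state filtering sketch for cut-in intent with made-up transition and lateral-evidence probabilities, not the paper's calibrated parameters:

```python
import numpy as np

# States: 0 = cutting_in, 1 = staying (toy model, assumed parameters)
T = np.array([[0.9, 0.1],   # P(next state | cutting_in)
              [0.2, 0.8]])  # P(next state | staying)
E = np.array([[0.8, 0.2],   # P(observation | cutting_in)
              [0.1, 0.9]])  # P(observation | staying); obs 0 = lateral drift

def update_belief(belief, observation):
    """One DBN forward step: propagate through T, then weight by evidence."""
    predicted = belief @ T
    posterior = predicted * E[:, observation]
    return posterior / posterior.sum()

belief = np.array([0.5, 0.5])      # uniform prior over intent
belief = update_belief(belief, 0)  # observe lateral drift
print(belief)                      # mass shifts toward cutting_in (~0.91)
```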
-
MELON: Provable Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison
MELON introduces a training-free defense against indirect prompt injection attacks on LLM agents that detects tool calls independent of the user's input via masked re-execution, achieving superior attack prevention (0.24% attack success rate on GPT-4o) and utility preservation (58.78% UA on GPT-4o) compared to existing methods.
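The core check can be sketched in a few lines, assuming a hypothetical `run_agent` callable that returns the tool calls the agent plans; the real MELON compares calls with more care than simple set intersection:

```python
MASK_PROMPT = "Summarize the retrieved content."  # neutral stand-in task

def detect_injection(run_agent, user_task: str, retrieved: str) -> bool:
    """Flag injection if tool calls survive masking of the user task."""
    original_calls = set(run_agent(user_task, retrieved))
    masked_calls = set(run_agent(MASK_PROMPT, retrieved))
    # Calls emitted even when the user's request is masked are independent
    # of user intent, suggesting they were induced by attacker-controlled
    # retrieved content rather than by the user.
    return bool(original_calls & masked_calls)
```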
-
Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks
ASTRA introduces an efficient defense for Vision Language Models that adaptively steers activations away from adversarial directions identified via image attribution, achieving state-of-the-art jailbreak mitigation with minimal impact on benign utility and high inference efficiency.
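Activation steering of this kind amounts to removing the component of a hidden state that lies along a harmful direction. A minimal sketch, assuming a precomputed direction vector rather than ASTRA's image-attribution-derived ones:

```python
import torch

def steer_away(hidden: torch.Tensor, adv_dir: torch.Tensor,
               strength: float = 1.0) -> torch.Tensor:
    """Subtract the projection of each hidden state onto `adv_dir`."""
    d = adv_dir / adv_dir.norm()                  # unit steering direction
    projection = (hidden @ d).unsqueeze(-1) * d   # component along adv_dir
    return hidden - strength * projection
```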