📢 Exclusive on Gate Square — #PROVE Creative Contest# is Now Live!
CandyDrop × Succinct (PROVE) — Trade to share 200,000 PROVE 👉 https://www.gate.com/announcements/article/46469
Futures Lucky Draw Challenge: Guaranteed 1 PROVE Airdrop per User 👉 https://www.gate.com/announcements/article/46491
🎁 Endless creativity · Rewards keep coming — Post to share 300 PROVE!
📅 Event PeriodAugust 12, 2025, 04:00 – August 17, 2025, 16:00 UTC
📌 How to Participate
1.Publish original content on Gate Square related to PROVE or the above activities (minimum 100 words; any format: analysis, tutorial, creativ
Research reveals the credibility risks of GPT models, with privacy protection and bias issues still needing to be addressed.
Research on the Trustworthiness Assessment of Large Language Models Reveals Potential Vulnerabilities
A study conducted in collaboration by institutions such as the University of Illinois at Urbana-Champaign, Stanford University, and the University of California, Berkeley, has comprehensively evaluated the credibility of the generative pre-trained transformer model (GPT). The research team developed a comprehensive assessment platform and detailed their findings in the recently published paper "Decoding Trust: A Comprehensive Assessment of the Credibility of GPT Models."
The research findings reveal some previously undisclosed vulnerabilities related to reliability. For example, the GPT model is prone to generating toxic and biased outputs, and it may also leak private information from training data and conversation history. Although GPT-4 is generally more reliable than GPT-3.5 in standard tests, it is more susceptible to attacks when faced with malicious prompts designed to circumvent security measures. This may be because GPT-4 adheres more strictly to misleading instructions.
The research team conducted a comprehensive evaluation of the GPT model from eight different perspectives, including adversarial robustness, toxicity and bias, privacy leakage, and more. For example, when assessing the robustness against text adversarial attacks, the researchers designed three scenarios: standard benchmark tests, tests under different task instructions, and self-constructed more challenging adversarial text tests.
In terms of toxicity and bias, research has found that GPT models generally exhibit little bias towards most stereotype topics. However, under misleading system prompts, the model may be induced to agree with biased content. Compared to GPT-3.5, GPT-4 is more susceptible to targeted misleading prompts. The degree of bias in the model also depends on the sensitivity of the specific groups and topics mentioned by the user.
In terms of privacy protection, research has found that GPT models may leak sensitive information from training data, such as email addresses. In some cases, leveraging supplementary knowledge can significantly improve the accuracy of information extraction. Additionally, the model may also leak private information injected into the conversation history. Overall, GPT-4 performs better than GPT-3.5 in protecting personal identity information, but both encounter issues when faced with privacy leak demonstrations.
The research team stated that this assessment work aims to encourage more researchers to participate and work together to create more robust and trustworthy models. To promote collaboration, they have made the evaluation benchmark code publicly available, which is highly scalable and user-friendly. At the same time, researchers have also shared their findings with relevant companies to take timely measures to address potential vulnerabilities.
This research provides a comprehensive perspective on the credibility assessment of GPT models, revealing the strengths and weaknesses of existing models. With the widespread application of large language models in various fields, these findings are significant for enhancing the safety and reliability of AI systems.