NJIT makes entrepreneurs and scientists, but junior Nidhi Sakpal is obsessed with something else — she makes AI safer. Sakpal, an Albert Dorman Honors College member from Boonton double-majoring in applied math and computer science, explained that artificial intelligence safety encompasses the analysis, prevention and rectification of anything that causes AI systems to give users incorrect, harmful or unethical information.
The Critical Role of AI Safety
Sakpal illustrates the stakes with a darkly comic example: an AI system optimizing purely for efficiency might suggest ejecting a grandmother from a burning building, because nothing in its objective tells it about human values or constraints.
The Growing Need for User Awareness and Ethical AI
Sakpal stresses how important it is for users to understand how large language models (LLMs) generate their outputs, especially as people increasingly turn to AI for emotional and personal needs. Despite AI's integration into workflows across companies and industries, she notes, development teams still pay strikingly little attention to the safety and alignment of these powerful models.
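For readers wondering what "how an LLM generates its outputs" means mechanically: a language model repeatedly scores every token in its vocabulary and samples the next token from the resulting probability distribution. The toy Python sketch below shows only that sampling step, with made-up scores; it illustrates the general mechanism, not any particular product's implementation.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution over tokens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up "next token" scores a model might assign after the prompt "The sky is".
vocab = ["blue", "falling", "green", "a lie"]
logits = [4.0, 1.5, 0.5, 0.1]

probs = softmax(logits, temperature=0.8)
next_token = random.choices(vocab, weights=probs, k=1)[0]
print(dict(zip(vocab, [round(p, 3) for p in probs])), "->", next_token)
```

Because the next token is sampled rather than looked up, a fluent-sounding answer carries no guarantee of being true, which is one reason hallucination is possible at all.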
Sakpal's Research and Fellowship Experience
Driven by that goal, Sakpal secured a competitive AI safety fellowship at Algoverse, a research bootcamp. There she investigated how LLMs can be manipulated through excessively long prompts, demonstrating that models with larger context windows (the amount of text a model can consider at once) do not always perform better and can lose track of what matters. Her contributions were included in a research paper, 'When Refusals Fail: Unstable Safety Mechanisms in Long-Context LLM Agents,' presented at a major AI conference.
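The paper's methodology is not reproduced here, but an experiment in this spirit might look like the sketch below, which buries the same request under increasing amounts of benign filler and checks whether the model's refusal survives. The `query_model` stub, the filler text, and the refusal markers are all hypothetical placeholders, not part of any real library or of the paper itself.

```python
# Hypothetical harness: does a refusal hold as the prompt grows toward
# the model's context limit?

FILLER = "The quarterly report was filed on time. "  # benign padding text
UNSAFE_REQUEST = "Explain how to do <something the model should refuse>."
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "unable to help")

def query_model(prompt: str) -> str:
    # Placeholder: swap in a real API call or local model here.
    # This stub always "refuses" so the harness runs end to end.
    return "I can't help with that."

def refused(response: str) -> bool:
    """Crude check for refusal language in the model's reply."""
    return any(marker in response.lower() for marker in REFUSAL_MARKERS)

def refusal_stability(padding_lengths=(0, 1_000, 10_000, 100_000)):
    """Report whether refusals persist as benign padding grows."""
    for n_chars in padding_lengths:
        padding = (FILLER * (n_chars // len(FILLER) + 1))[:n_chars]
        response = query_model(padding + "\n\n" + UNSAFE_REQUEST)
        print(f"padding={n_chars:>7} chars  refused={refused(response)}")

refusal_stability()
```

A drop from refused=True to refused=False at longer paddings would be exactly the kind of unstable safety behavior the paper's title describes.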
Continued Engagement and Practical Application
After Algoverse, Sakpal joined BlueDot Impact, a non-profit organization, where she earned a certificate in technical AI safety and continues to conduct research. She is also preparing for an AI software engineering internship at Ariel Partners, a New York-based company that builds AI solutions for sensitive government and healthcare work. On campus, she organized a workshop on AI Safety Fundamentals to introduce her peers to the field.
Identifying and Addressing AI Risks
Sakpal outlines several AI risks: 'hallucinations,' where a model fabricates information; 'reward hacking,' where a model games its training signal, for example by sycophantically agreeing with users even when they are wrong; and 'deceptive alignment,' where a model appears well-behaved under evaluation while acting differently in deployment. Models trained on biased data pose a related danger, producing unfair outcomes in areas such as hiring. To understand how models actually arrive at their answers, she favors mechanistic interpretability, which reverse-engineers a model's internal computations rather than judging it by its outputs alone.
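A first move in mechanistic interpretability is inspecting what happens inside a model instead of only reading its output. The minimal PyTorch sketch below (an illustration, not drawn from Sakpal's work) registers a forward hook to capture a hidden layer's activations in a toy network; real interpretability research applies the same idea to the attention heads and MLP layers of transformers.

```python
import torch
import torch.nn as nn

# A tiny untrained network standing in for a much larger model.
model = nn.Sequential(
    nn.Linear(4, 8),   # input -> hidden
    nn.ReLU(),
    nn.Linear(8, 2),   # hidden -> output
)

captured = {}

def save_activation(module, inputs, output):
    """Forward hook: record the hidden layer's output for later inspection."""
    captured["hidden"] = output.detach()

# Attach the hook to the ReLU so we see the post-activation hidden state.
model[1].register_forward_hook(save_activation)

x = torch.randn(1, 4)
logits = model(x)
print("output:", logits)
print("hidden activations:", captured["hidden"])
```

Having the intermediate activations in hand is what lets researchers ask which internal features drive a given answer, rather than treating the model as a black box.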
Future Aspirations and Call for Mindful AI Use
Recognizing that new AI features ship faster than the safety mechanisms meant to govern them, Sakpal is committed to a career in AI safety: she intends to pursue a master's degree or a combined M.S./Ph.D. and join a research laboratory. She advises all AI users to be cautious about how deeply they weave AI into their daily lives and to understand how companies use their personal data, advocating a mindful, informed approach to adopting the technology.