Aymara's core mission is to make artificial intelligence safe for humanity. To that end, Aymara designs and implements tests that effectively evaluate bias, misinformation, and inaccuracy in large language models (LLMs) such as ChatGPT.

The current approach of manually reviewing LLM output for bias, misinformation, and inaccuracy does not scale as the number of LLMs and the breadth of their adoption grow. Programmatically measurable tests are essential to evaluate these three critical aspects efficiently and comprehensively.
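As a rough illustration of what "programmatically measurable" could mean, the sketch below scores a model's factual accuracy automatically instead of by manual review. Everything here is hypothetical: `query_model` stands in for a real LLM API call, and the keyword-matching scorer is a deliberately simple example, not Aymara's actual methodology.

```python
# Minimal sketch of a programmatic accuracy test (illustrative only).

def query_model(prompt: str) -> str:
    # Placeholder for a real LLM API call; returns canned answers
    # so the example is self-contained and runnable.
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "Who wrote Hamlet?": "Hamlet was written by Shakespeare.",
    }
    return canned.get(prompt, "I don't know.")

def accuracy_score(test_cases: list[tuple[str, str]]) -> float:
    """Fraction of prompts whose response contains the expected answer."""
    hits = sum(
        1 for prompt, expected in test_cases
        if expected.lower() in query_model(prompt).lower()
    )
    return hits / len(test_cases)

cases = [
    ("What is the capital of France?", "Paris"),
    ("Who wrote Hamlet?", "Shakespeare"),
]
print(accuracy_score(cases))  # prints 1.0 for the canned responses above
```

Because the score is a number computed by code, the same test suite can be rerun across many models and model versions without human reviewers in the loop.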

Furthermore, many model developers, enterprises, and third-party auditors lack the technical capabilities, domain expertise, and staff needed to develop their own tests for bias, misinformation, and inaccuracy in LLMs. As a result, a growing number of specialized companies will emerge to assess the safety of these models.

While certain prominent LLM developers like Google may have the resources to create their own tests, those tests will likely be proprietary and specific to each developer. This lack of standardized tests makes comparisons across different LLMs, such as Google's Bard and OpenAI's ChatGPT, challenging. Additionally, the incentives of LLM developers and their customers may not align the way the incentives of a third-party test developer like Aymara do.

By serving as an independent third-party test developer, Aymara can provide objective assessments that prioritize customer interests and promote transparency. Its role is to develop and deploy robust, comparable tests for bias, misinformation, and inaccuracy in LLMs, thereby advancing the collective goal of making artificial intelligence safe for humanity.