Microsoft announced the baseline OpenAI end-to-end chat reference architecture. The baseline covers the architecture's components, flows, and security, and also provides detailed guidance on performance, monitoring, and deployment. In addition, Microsoft has prepared a reference implementation for deploying and running the solution.
The OpenAI end-to-end chat architecture baseline builds on components similar to those in the baseline App Service web application architecture for hosting the chat UI. On top of those, it adds the components used to orchestrate chat flows, serve data, and access large language models (LLMs): Azure Machine Learning to train, deploy, and manage machine learning models, Azure Storage to hold the prompt flow source files, and Azure Container Registry to manage container images. Azure OpenAI provides access to the LLMs along with enterprise features, while Azure AI Search backs the search feature of the chat application, implementing the retrieval-augmented generation (RAG) pattern in which data relevant to a query is retrieved from an index and used to ground the model's response.
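The RAG flow can be sketched in a few lines of Python. The sketch below is illustrative only and is not taken from the reference implementation: the endpoint URLs, the index name, the `content` field, and the model deployment name are all placeholder assumptions.

```python
# Minimal RAG sketch: fetch context from Azure AI Search, then ground an
# Azure OpenAI chat completion in it. All endpoints, the index name, the
# "content" field, and the deployment name are illustrative placeholders.
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.search.documents import SearchClient
from openai import AzureOpenAI

credential = DefaultAzureCredential()

# 1. Retrieve the documents most relevant to the user's question.
search_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="chat-docs",  # hypothetical index
    credential=credential,
)
question = "What does the baseline recommend for network isolation?"
hits = search_client.search(search_text=question, top=3)
context = "\n".join(hit["content"] for hit in hits)  # assumes a 'content' field

# 2. Ask the model, grounding it in the retrieved context (the RAG pattern).
client = AzureOpenAI(
    azure_endpoint="https://<openai-resource>.openai.azure.com",
    azure_ad_token_provider=get_bearer_token_provider(
        credential, "https://cognitiveservices.azure.com/.default"
    ),
    api_version="2024-02-01",
)
response = client.chat.completions.create(
    model="<chat-deployment>",  # name of your model deployment
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```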
OpenAI End-to-End Chat Architecture Baseline (Source: Microsoft Blog)
The OpenAI end-to-end chat architecture baseline prioritizes network security and identity-based access. Key features include a single secure entry point for chat UI traffic, filtered network traffic, and end-to-end encryption of data in transit using TLS. Data exposure is minimized through the use of Private Link, and network resources are logically segmented and isolated from one another. Calls from the chat UI hosted in App Service are routed to a managed online endpoint in Azure Machine Learning, which forwards them to the servers running the deployed prompt flow. Calls to Azure PaaS services are routed through private endpoints to enhance security.
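To illustrate the call path, here is a minimal sketch of how the chat UI might invoke the Machine Learning managed online endpoint. It assumes the endpoint is configured for Microsoft Entra ID token authentication; the scoring URI and request payload are placeholders, and with Private Link in place the hostname resolves to a private IP inside the virtual network.

```python
# Hypothetical call from the chat UI to the Azure ML managed online endpoint
# that hosts the prompt flow. The scoring URI and payload are placeholders;
# this assumes the endpoint uses Entra ID token authentication.
import requests
from azure.identity import DefaultAzureCredential

scoring_uri = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"

# In App Service, DefaultAzureCredential picks up the app's managed identity.
token = DefaultAzureCredential().get_token("https://ml.azure.com/.default").token

resp = requests.post(
    scoring_uri,
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json={"question": "What plans do you offer?", "chat_history": []},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```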
The architecture restricts access to the Azure Machine Learning workspace to private endpoints, further improving security. Private endpoints are used consistently throughout, allowing the chat UI hosted in App Service to connect securely to PaaS services.
The architecture enforces security at both the network and the identity perimeter. At the network level, only the chat UI is reachable from the internet; at the identity level, every request must be authenticated and authorized. Access to the Azure Machine Learning workspace is managed through its default roles, such as Data Scientist and Compute Operator, as well as roles scoped specifically to workspace secrets and registry access.
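To make the identity layer concrete, the sketch below assigns the built-in AzureML Data Scientist role to a principal at workspace scope using the azure-mgmt-authorization SDK. The subscription, resource group, workspace, and principal IDs are placeholders.

```python
# Sketch: assigning the built-in "AzureML Data Scientist" role on a workspace.
# Subscription, resource group, workspace, and principal IDs are placeholders.
import uuid
from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters

subscription_id = "<subscription-id>"
client = AuthorizationManagementClient(DefaultAzureCredential(), subscription_id)

scope = (
    f"/subscriptions/{subscription_id}/resourceGroups/<rg>"
    "/providers/Microsoft.MachineLearningServices/workspaces/<workspace>"
)

# Look up the built-in role definition by name rather than hardcoding its GUID.
role_def = next(iter(
    client.role_definitions.list(scope, filter="roleName eq 'AzureML Data Scientist'")
))

client.role_assignments.create(
    scope=scope,
    role_assignment_name=str(uuid.uuid4()),  # assignments are keyed by a GUID
    parameters=RoleAssignmentCreateParameters(
        role_definition_id=role_def.id,
        principal_id="<principal-object-id>",
    ),
)
```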
Furthermore, Microsoft has shared recommendations and strategies for deployment, including blue/green deployments and A/B testing, which make releases and changes to flows safer to roll out and easier to assess.
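One concrete way to realize a blue/green rollout is through the traffic-splitting support on Azure Machine Learning managed online endpoints. The sketch below uses the azure-ai-ml SDK; the endpoint, deployment, and resource names are placeholders.

```python
# Blue/green rollout on an Azure ML managed online endpoint via traffic
# splitting with the azure-ai-ml SDK. All resource names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

endpoint = ml_client.online_endpoints.get("chat-flow-endpoint")

# Canary phase: keep 90% of traffic on the existing "blue" deployment and
# send 10% to the new "green" one.
endpoint.traffic = {"blue": 90, "green": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Once green's metrics look healthy, promote it and drain blue.
endpoint.traffic = {"blue": 0, "green": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```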
In terms of monitoring, all services apart from Azure Machine Learning and Azure App Service are configured to capture all logs. Azure Machine Learning diagnostics are configured to capture audit logs, that is, all resource logs that record customer interactions with data or with the settings of the service. For Azure App Service, the configured log categories are AppServiceHTTPLogs, AppServiceConsoleLogs, AppServiceAppLogs, and AppServicePlatformLogs.
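Enabling those categories corresponds to creating an Azure Monitor diagnostic setting on the App Service resource. A minimal sketch using the azure-mgmt-monitor SDK follows, with placeholder resource IDs and a hypothetical Log Analytics workspace as the destination.

```python
# Sketch: enabling the App Service log categories listed above through an
# Azure Monitor diagnostic setting. All resource IDs are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import DiagnosticSettingsResource, LogSettings

subscription_id = "<subscription-id>"
client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)

app_service_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/<rg>"
    "/providers/Microsoft.Web/sites/<chat-ui-app>"
)
log_analytics_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/<rg>"
    "/providers/Microsoft.OperationalInsights/workspaces/<log-workspace>"
)
categories = [
    "AppServiceHTTPLogs",
    "AppServiceConsoleLogs",
    "AppServiceAppLogs",
    "AppServicePlatformLogs",
]

client.diagnostic_settings.create_or_update(
    resource_uri=app_service_id,
    name="chat-ui-diagnostics",
    parameters=DiagnosticSettingsResource(
        workspace_id=log_analytics_id,
        logs=[LogSettings(category=c, enabled=True) for c in categories],
    ),
)
```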
The Azure OpenAI service also provides content filtering to detect and block harmful content, along with abuse monitoring to detect policy violations; customers handling sensitive data or subject to legal compliance requirements can apply for an exemption from abuse monitoring.
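When the filter blocks a prompt, Azure OpenAI returns an error with the code content_filter, which the openai Python SDK surfaces as a BadRequestError. A minimal handling sketch, with placeholder endpoint and deployment names, might look like this:

```python
# Sketch: handling an Azure OpenAI content-filter rejection. Endpoint, key,
# and deployment name are placeholders.
import openai
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<openai-resource>.openai.azure.com",
    api_key="<api-key>",  # placeholder; prefer Entra ID auth in production
    api_version="2024-02-01",
)

user_input = "Some user-supplied message"
try:
    response = client.chat.completions.create(
        model="<chat-deployment>",  # name of your model deployment
        messages=[{"role": "user", "content": user_input}],
    )
    print(response.choices[0].message.content)
except openai.BadRequestError as err:
    # A filtered prompt comes back as HTTP 400 with code "content_filter".
    if getattr(err, "code", None) == "content_filter":
        print("The request was blocked by the content filter.")
    else:
        raise
```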
In a LinkedIn post, Balz Zuerrer asked if it was possible to build this solution on Azure AI Studio. Tobias Kluge replied:
As I understand it, this blueprint is about the security boundary of the whole application, including sensitive, user-related data. AI Studio may be used to experiment with models and some data, but it is not specific about how to build and deploy a whole application in a secure production environment.
In addition to this question, the post drew many positive comments. Rishi Nikhilesh added:
It’s impressive that it’s built on network isolation, with the Azure ML workspace’s public endpoint disabled. It’s interesting to see how App Service maintains secure communication with the deployed ML endpoints and prompt flow.
For this kind of deployment scenario, Microsoft engineers have prepared a reference implementation.
See the original text in English:
https://www.infoq.com/news/2024/02/chat-ref-arch-openai/
Disclaimer: This article was translated by InfoQ and may not be reproduced without permission.