Diff Insight Report - openai

最終更新日: 2025-04-29

利用上の注意

このポストは Microsoft 社の Azure 公式ドキュメント(CC BY 4.0 または MIT ライセンス) をもとに生成AIを用いて翻案・要約した派生作品です。 元の文書は MicrosoftDocs/azure-ai-docs にホストされています。

生成AIの性能には限界があり、誤訳や誤解釈が含まれる可能性があります。 本ポストはあくまで参考情報として用い、正確な情報は必ず元の文書を参照してください。

このポストで使用されている商標はそれぞれの所有者に帰属します。これらの商標は技術的な説明のために使用されており、商標権者からの公式な承認や推奨を示すものではありません。

View Diff on GitHub


# Highlights
このコードの差分は、OpenAIサービスにおけるさまざまなモデルやAPIのドキュメントに関するいくつかの更新と新しい機能の追加を示しており、具体的にはモデル情報のアップデート、新しい記事の追加、レスポンス形式の更新、およびAPIバージョンの改訂を含みます。

New features

  • 新しいファイルrealtime-audio-webrtc.mdrealtime-audio-websockets.mdの追加により、GPT-4oリアルタイムAPIをWebRTCおよびWebSocketで使用する方法に関する詳細ガイドが提供。
  • 新しい画像ephemeral-key-webrtc.pngが追加され、エフェメラルキーの使用方法に関する視覚的説明が提供。

Breaking changes

  • realtime-javascript.mdからのJavaScriptウェブサンプルコードの詳細情報が削除され、関連情報を簡略化。
  • realtime-audio-quickstart.mdで古いバージョンのAPIの使用が廃止され、新しいバージョンの使用が指示。

Other updates

  • GPT-4oオーディオモデルに関するモデル情報がマイナーチェンジ。
  • DALL-Eに関するレスポンス形式が具体的に更新。
  • 複数のAPIドキュメントでAPIバージョンの更新。
  • toc.ymlの目次にオーディオ関連の項目が追加され、より広範囲な文書へのアクセスが容易になった。

Insights

差分を見ると、OpenAIのドキュメントは常に最新かつ有用な情報を提供するために継続的に更新されていることがわかります。特に、リアルタイム音声APIに関する新しいガイドラインや、DALL-Eのレスポンス形式に関する詳細の更新は、開発者のためのガイドとして非常に重要です。

WebRTCとWebSocketを用いたリアルタイムインタラクションのガイドが追加されたことで、開発者はこれらのプロトコルをAzureで実装する手順について具体的に学ぶことができ、低遅延コミュニケーションの実装がスムーズになります。また、APIバージョンの更新や削除されたコード部分については、ドキュメントをより簡潔にしつつ、必要な情報を提供するための調整が行われています。これにより、ユーザーは最新の機能と改善を迅速に利用できるようになります。

さらに、オーディオ関連ドキュメントの目次に追加された項目により、ユーザーは関連する情報をより素早く見つけることができ、開発に集中できるようになるでしょう。全体として、これらの更新はOpenAIサービスを使用する際のユーザー体験を向上させる重要な要素です。

Summary Table

Filename Type Title Status A D M
models.md minor update モデル情報の更新 modified 1 1 2
dall-e.md minor update DALL-Eのレスポンス形式に関する情報の更新 modified 1 5 6
realtime-audio-webrtc.md new feature GPT-4oリアルタイムAPIのWebRTC使用方法に関する新規記事 added 328 0 328
realtime-audio-websockets.md new feature GPT-4oリアルタイムAPIのWebSocket使用方法に関する新規記事 added 120 0 120
realtime-audio.md minor update GPT-4oリアルタイムAPIの音声とオーディオに関する記事の修正 modified 11 75 86
latest-inference-preview.md minor update DALL-E 3の生成画像のレスポンス形式に関する修正 modified 5 5 10
global-batch.md minor update グローバルバッチモデルマトリックスにおけるスペイン中央の備考欄の削除 modified 0 1 1
realtime-javascript.md minor update APIバージョンの更新とサンプルコード削除 modified 2 44 46
realtime-python.md minor update APIバージョンの更新 modified 2 2 4
realtime-typescript.md minor update APIバージョンの更新 modified 2 2 4
ephemeral-key-webrtc.png new feature 新しい画像の追加: ephemeral-key-webrtc.png added 0 0 0
realtime-audio-quickstart.md minor update Realtime APIのバージョン情報の更新 modified 1 2 3
realtime-audio-reference.md minor update Realtime APIリファレンスのタイトル変更と更新 modified 2 28 30
toc.yml minor update 目次にオーディオ関連の項目を追加 modified 9 3 12

Modified Contents

articles/ai-services/openai/concepts/models.md

Diff
@@ -258,7 +258,7 @@ Details about maximum request tokens and training data are available in the foll
 |`gpt-4o-mini-realtime-preview` (2024-12-17) <br> **GPT-4o audio** | **Audio model** for real-time audio processing. |Input: 128,000  <br> Output: 4,096 | Oct 2023 |
 |`gpt-4o-audio-preview` (2024-12-17) <br> **GPT-4o audio** | **Audio model** for audio and text generation. |Input: 128,000  <br> Output: 4,096 | Oct 2023 |
 |`gpt-4o-realtime-preview` (2024-12-17) <br> **GPT-4o audio** | **Audio model** for real-time audio processing. |Input: 128,000  <br> Output: 4,096 | Oct 2023 |
-|`gpt-4o-realtime-preview` (2024-10-01) <br> **GPT-4o audio** | **Audio model** for real-time audio processing. |Input: 128,000  <br> Output: 4,096 | Oct 2023 |
+|`gpt-4o-mini-realtime-preview` (2024-12-17) <br> **GPT-4o audio** | **Audio model** for real-time audio processing. |Input: 128,000  <br> Output: 4,096 | Oct 2023 |
 
 To compare the availability of GPT-4o audio models across all regions, see the [models table](#global-standard-model-availability).
 

Summary

{
    "modification_type": "minor update",
    "modification_title": "モデル情報の更新"
}

Explanation

この変更では、articles/ai-services/openai/concepts/models.mdファイル内のGPT-4oオーディオモデルに関する情報が更新されました。具体的には、テーブルから古いエントリであるgpt-4o-realtime-preview(2024-10-01)の行が削除され、代わりに新しいエントリであるgpt-4o-mini-realtime-preview(2024-12-17)が追加されました。このアップデートは、リアルタイムオーディオ処理に関する最新のモデルを反映するためのものであり、変更前後のトークンに関する詳細情報はそのままにされています。この修正により、モデルの最新情報が正確に提供され、利用者が最新のサービスと機能を理解しやすくなります。

articles/ai-services/openai/how-to/dall-e.md

Diff
@@ -180,10 +180,6 @@ The default value is `high`.
 
 You can generate between one and 10 images in a single API call. The default value is `1`.
 
-#### Response format
-
-The format in which the generated images are returned. Must be either `url` (a URL pointing to the image) or `b64_json` (the base 64-byte code in JSON format). The default is `url`.
-
 #### User ID
 
 Use the *user* parameter to specify a unique identifier for the user making the request. This is useful for tracking and monitoring usage patterns. The value can be any string, such as a user ID or email address.
@@ -236,7 +232,7 @@ With DALL-E 3, you can't generate more than one image in a single API call: the
 
 #### Response format
 
-The format in which the generated images are returned. Must be one of `url` (a URL pointing to the image) or `b64_json` (the base 64-byte code in JSON format). The default is `url`.
+The format in which dall-e-3 generated images are returned. Must be one of `url` or `b64_json`. This parameter isn't supported for gpt-image-1 which will always return base64-encoded images.
 
 ---
 

Summary

{
    "modification_type": "minor update",
    "modification_title": "DALL-Eのレスポンス形式に関する情報の更新"
}

Explanation

この変更では、articles/ai-services/openai/how-to/dall-e.mdファイル内のDALL-Eに関するレスポンス形式の説明が更新されました。具体的には、レスポンス形式のセクションが削除され、新たにDALL-E 3のレスポンス形式に関する情報が正確に提供されるように修正されました。変更前の情報は、画像生成のレスポンス形式がurlまたはb64_jsonであると説明していましたが、更新後はDALL-E 3の生成された画像のレスポンス形式について、urlまたはb64_jsonでなければならないことが明確に示され、さらにgpt-image-1のレスポンス形式については常にbase64エンコードされた画像が返されることが説明されています。この更新により、ユーザーはDALL-EのAPIの使い方についてより明確な理解が得られるようになります。

articles/ai-services/openai/how-to/realtime-audio-webrtc.md

Diff
@@ -0,0 +1,328 @@
+---
+title: 'How to use the GPT-4o Realtime API via WebRTC (Preview)'
+titleSuffix: Azure OpenAI Service
+description: Learn how to use the GPT-4o Realtime API for speech and audio via WebRTC.
+manager: nitinme
+ms.service: azure-ai-openai
+ms.topic: how-to
+ms.date: 4/28/2025
+author: eric-urban
+ms.author: eur
+ms.custom: references_regions
+recommendations: false
+---
+
+# How to use the GPT-4o Realtime API via WebRTC (Preview)
+
+[!INCLUDE [Feature preview](../includes/preview-feature.md)]
+
+Azure OpenAI GPT-4o Realtime API for speech and audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions. 
+
+You can use the Realtime API via WebRTC or WebSocket to send audio input to the model and receive audio responses in real time. Follow the instructions in this article to get started with the Realtime API via WebRTC.
+
+In most cases, we recommend using the WebRTC API for real-time audio streaming. The WebRTC API is a web standard that enables real-time communication (RTC) between browsers and mobile applications. Here are some reasons why WebRTC is preferred for real-time audio streaming:
+- **Lower Latency**: WebRTC is designed to minimize delay, making it more suitable for audio and video communication where low latency is critical for maintaining quality and synchronization.
+- **Media Handling**: WebRTC has built-in support for audio and video codecs, providing optimized handling of media streams.
+- **Error Correction**: WebRTC includes mechanisms for handling packet loss and jitter, which are essential for maintaining the quality of audio streams over unpredictable networks.
+- **Peer-to-Peer Communication**: WebRTC allows direct communication between clients, reducing the need for a central server to relay audio data, which can further reduce latency.
+
+Use the [Realtime API via WebSockets](./realtime-audio-websockets.md) if you need to stream audio data from a server to a client, or if you need to send and receive data in real time between a client and server. WebSockets aren't recommended for real-time audio streaming because they have higher latency than WebRTC.
+
+## Supported models
+
+The GPT 4o real-time models are available for global deployments in [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).
+- `gpt-4o-mini-realtime-preview` (2024-12-17)
+- `gpt-4o-realtime-preview` (2024-12-17)
+
+You should use API version `2025-04-01-preview` in the URL for the Realtime API. The API version is included in the sessions URL.
+
+For more information about supported models, see the [models and versions documentation](../concepts/models.md#audio-models).
+
+## Prerequisites
+
+Before you can use GPT-4o real-time audio, you need:
+
+- An Azure subscription - <a href="https://azure.microsoft.com/free/cognitive-services" target="_blank">Create one for free</a>.
+- An Azure OpenAI resource created in a [supported region](#supported-models). For more information, see [Create a resource and deploy a model with Azure OpenAI](create-resource.md).
+- You need a deployment of the `gpt-4o-realtime-preview` or `gpt-4o-mini-realtime-preview` model in a supported region as described in the [supported models](#supported-models) section. You can deploy the model from the [Azure AI Foundry model catalog](../../../ai-foundry/how-to/model-catalog-overview.md) or from your project in Azure AI Foundry portal. 
+
+## Connection and authentication
+
+You use different URLs to get an ephemeral API key and connect to the Realtime API via WebRTC. The URLs are constructed as follows:
+
+| URL | Description | 
+|---|---|
+| Sessions URL | The `/realtime/sessions` URL is used to get an ephemeral API key. The sessions URL includes the Azure OpenAI resource URL, deployment name, the `/realtime/sessions` path, and the API version.<br/><br/>You should use API version `2025-04-01-preview` in the URL.<br/><br/>For an example and more information, see the [Sessions URL](#sessions-url) section below.|
+| WebRTC URL | The WebRTC URL is used to establish a WebRTC peer connection with the Realtime API. The WebRTC URL includes the region and the `realtimeapi-preview.ai.azure.com/v1/realtimertc` path.<br/><br/>The supported regions are `eastus2` and `swedencentral`.<br/><br/>For an example and more information, see the [Sessions URL](#webrtc-url) section below.|
+
+### Sessions URL
+Here's an example of a well-constructed `realtime/sessions` URL that you use to get an ephemeral API key:
+
+```http
+https://YourAzureOpenAIResourceName.openai.azure.com/openai/realtimeapi/sessions?api-version=2025-04-01-preview
+```
+### WebRTC URL
+Make sure the region of the WebRTC URL matches the region of your Azure OpenAI resource.
+
+For example:
+- If your Azure OpenAI resource is in the swedencentral region,
+the WebRTC URL should be:
+    ```http
+    https://swedencentral.realtimeapi-preview.ai.azure.com/v1/realtimertc
+    ```
+- If your Azure OpenAI resource is in the eastus2 region, the WebRTC URL should be:
+    ```http
+    https://eastus2.realtimeapi-preview.ai.azure.com/v1/realtimertc
+    ```
+
+The sessions URL includes the Azure OpenAI resource URL, deployment name, the `/realtime/sessions` path, and the API version. The Azure OpenAI resource region isn't part of the sessions URL.
+
+### Ephemeral API key
+
+You can use the ephemeral API key to authenticate a WebRTC session with the Realtime API. The ephemeral key is valid for one minute and is used to establish a secure WebRTC connection between the client and the Realtime API.
+
+Here's how the ephemeral API key is used in the Realtime API:
+
+1. Your client requests an ephemeral API key from your server.
+1. Your server mints the ephemeral API key using the standard API key. 
+    
+    > [!WARNING]
+    > Never use the standard API key in a client application. The standard API key should only be used in a secure backend service.
+
+1. Your server returns the ephemeral API key to your client.
+1. Your client uses the ephemeral API key to authenticate a session with the Realtime API via WebRTC.
+1. You send and receive audio data in real time using the WebRTC peer connection.
+
+The following sequence diagram illustrates the process of minting an ephemeral API key and using it to authenticate a WebRTC session with the Realtime API. 
+
+:::image type="content" source="../media/how-to/real-time/ephemeral-key-webrtc.png" alt-text="Diagram of the ephemeral API key to WebRTC peer connection sequence." lightbox="../media/how-to/real-time/ephemeral-key-webrtc.png":::
+
+<!--
+sequenceDiagram
+  participant Your client
+  participant Your server
+  participant /realtime/sessions
+  participant /realtime via WebRTC
+
+  Your client->>Your server: Request to mint an ephemeral API key
+  Your server->>/realtime/sessions: Request ephemeral key using standard API key
+  /realtime/sessions->>Your server: Return ephemeral key (expires in 1 minute)
+  Your server->>Your client: Return ephemeral key
+  Your client->>/realtime via WebRTC: Authenticate session using ephemeral key (WebRTC peer connection) 
+-->
+
+## WebRTC example via HTML and JavaScript
+
+The following code sample demonstrates how to use the GPT-4o Realtime API via WebRTC. The sample uses the [WebRTC API](https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API) to establish a real-time audio connection with the model.
+
+The sample code is an HTML page that allows you to start a session with the GPT-4o Realtime API and send audio input to the model. The model's responses are played back in real-time.
+
+> [!WARNING]
+> The sample code includes the API key hardcoded in the JavaScript. This code isn't recommended for production use. In a production environment, you should use a secure backend service to generate an ephemeral key and return it to the client.
+
+1. Copy the following code into an HTML file and open it in a web browser:
+
+    ```html
+    <!DOCTYPE html>
+    <html lang="en">
+    <head>
+        <meta charset="UTF-8">
+        <meta name="viewport" content="width=device-width, initial-scale=1.0">
+        <title>Azure OpenAI Realtime Session</title>
+    </head>
+    <body>
+        <h1>Azure OpenAI Realtime Session</h1>
+        <p>WARNING: Don't use this code sample in production with the API key hardcoded. Use a protected backend service to call the sessions API and generate the ephemeral key. Then return the ephemeral key to the client.</p>
+        <button onclick="StartSession()">Start Session</button>
+    
+        <!-- Log container for API messages -->
+        <div id="logContainer"></div> 
+    
+        <script>
+            
+            // Make sure the WebRTC URL region matches the region of your Azure OpenAI resource.
+            // For example, if your Azure OpenAI resource is in the swedencentral region,
+            // the WebRTC URL should be https://swedencentral.realtimeapi-preview.ai.azure.com/v1/realtimertc.
+            // If your Azure OpenAI resource is in the eastus2 region, the WebRTC URL should be https://eastus2.realtimeapi-preview.ai.azure.com/v1/realtimertc.
+            const WEBRTC_URL= "https://swedencentral.realtimeapi-preview.ai.azure.com/v1/realtimertc"
+    		
+            // The SESSIONS_URL includes the Azure OpenAI resource URL,
+            // deployment name, the /realtime/sessions path, and the API version.
+            // The Azure OpenAI resource region isn't part of the SESSIONS_URL.
+            const SESSIONS_URL="https://YourAzureOpenAIResourceName.openai.azure.com/openai/realtimeapi/sessions?api-version=2025-04-01-preview"
+    		
+            // The API key of the Azure OpenAI resource.
+            const API_KEY = "YOUR_API_KEY_HERE"; 
+    		
+            // The deployment name might not be the same as the model name.
+            const DEPLOYMENT = "gpt-4o-mini-realtime-preview"
+    		    const VOICE = "verse"
+    
+            async function StartSession() {
+                try {
+    
+                    // WARNING: Don't use this code sample in production
+                    // with the API key hardcoded. 
+                    // Use a protected backend service to call the 
+                    // sessions API and generate the ephemeral key.
+                    // Then return the ephemeral key to the client.
+                    
+                    const response = await fetch(SESSIONS_URL, {
+                        method: "POST",
+                        headers: {
+                            // The Authorization header is commented out because
+                            // currently it isn't supported with the sessions API. 
+                            //"Authorization": `Bearer ${ACCESS_TOKEN}`,
+                            "api-key": API_KEY,
+                            "Content-Type": "application/json"
+                        },
+                        body: JSON.stringify({
+                            model: DEPLOYMENT,
+                            voice: VOICE
+                        })
+                    });
+    
+                    if (!response.ok) {
+                        throw new Error(`API request failed`);
+                    }
+    
+                    const data = await response.json();
+    				
+            				const sessionId = data.id;
+            				const ephemeralKey = data.client_secret?.value; 
+            				console.error("Ephemeral key:", ephemeralKey);
+    				
+                    // Mask the ephemeral key in the log message.
+                    logMessage("Ephemeral Key Received: " + "***");
+    		            logMessage("WebRTC Session Id = " + sessionId );
+                    
+                    // Set up the WebRTC connection using the ephemeral key.
+                    init(ephemeralKey); 
+    
+                } catch (error) {
+                    console.error("Error fetching ephemeral key:", error);
+                    logMessage("Error fetching ephemeral key: " + error.message);
+                }
+            }
+    
+            async function init(ephemeralKey) {
+    
+                let peerConnection = new RTCPeerConnection();
+    
+                // Set up to play remote audio from the model.
+                const audioElement = document.createElement('audio');
+                audioElement.autoplay = true;
+                document.body.appendChild(audioElement);
+    
+                peerConnection.ontrack = (event) => {
+                    audioElement.srcObject = event.streams[0];
+                };
+    
+                // Set up data channel for sending and receiving events
+                const clientMedia = await navigator.mediaDevices.getUserMedia({ audio: true });
+                const audioTrack = clientMedia.getAudioTracks()[0];
+                peerConnection.addTrack(audioTrack);
+    
+                const dataChannel = peerConnection.createDataChannel('realtime-channel');
+    
+                dataChannel.addEventListener('open', () => {
+                    logMessage('Data channel is open');
+                    updateSession(dataChannel);
+                });
+                
+                dataChannel.addEventListener('message', (event) => {
+                    const realtimeEvent = JSON.parse(event.data); 
+                    console.log(realtimeEvent); 
+                    logMessage("Received server event: " + JSON.stringify(realtimeEvent, null, 2));
+                    if (realtimeEvent.type === "session.update") {
+                        const instructions = realtimeEvent.session.instructions;
+                        logMessage("Instructions: " + instructions);
+                    } else if (realtimeEvent.type === "session.error") {
+                        logMessage("Error: " + realtimeEvent.error.message);
+                    } else if (realtimeEvent.type === "session.end") {
+                        logMessage("Session ended.");
+                    }
+                });
+    
+                dataChannel.addEventListener('close', () => {
+                    logMessage('Data channel is closed');
+                });
+    
+    	          // Start the session using the Session Description Protocol (SDP)
+                const offer = await peerConnection.createOffer();
+                await peerConnection.setLocalDescription(offer);
+    
+                const sdpResponse = await fetch(`${WEBRTC_URL}?model=${DEPLOYMENT}`, {
+                    method: "POST",
+                    body: offer.sdp,
+                    headers: {
+                        Authorization: `Bearer ${ephemeralKey}`,
+                        "Content-Type": "application/sdp",
+                    },
+                });
+    
+                const answer = { type: "answer", sdp: await sdpResponse.text() };
+                await peerConnection.setRemoteDescription(answer);
+    
+                const button = document.createElement('button');
+                button.innerText = 'Close Session';
+                button.onclick = stopSession;
+                document.body.appendChild(button);
+    
+                // Send a client event to update the session
+                function updateSession(dataChannel) {
+                    const event = {
+                        type: "session.update",
+                        session: {
+                            instructions: "You are a helpful AI assistant responding in natural, engaging language."
+                        }
+                    };
+                    dataChannel.send(JSON.stringify(event));
+                    logMessage("Sent client event: " + JSON.stringify(event, null, 2));
+                }
+    
+                function stopSession() {
+                    if (dataChannel) dataChannel.close();
+                    if (peerConnection) peerConnection.close();
+                    peerConnection = null;
+                    logMessage("Session closed.");
+                }
+    
+            }
+    
+            function logMessage(message) {
+                const logContainer = document.getElementById("logContainer");
+                const p = document.createElement("p");
+                p.textContent = message;
+                logContainer.appendChild(p);
+            }
+        </script>
+    </body>
+    </html>
+    ```
+
+1. Select **Start Session** to start a session with the GPT-4o Realtime API. The session ID and ephemeral key are displayed in the log container.
+1. Allow the browser to access your microphone when prompted.
+1. Confirmation messages are displayed in the log container as the session progresses. Here's an example of the log messages:
+
+    ```text
+    Ephemeral Key Received: ***
+
+    Starting WebRTC Session with Session Id=SessionIdRedacted
+    
+    Data channel is open
+    
+    Sent client event: { "type": "session.update", "session": { "instructions": "You are a helpful AI assistant responding in natural, engaging language." } }
+    
+    Received server event: { "type": "session.created", "event_id": "event_BQgtmli1Rse8PXgSowx55", "session": { "id": "SessionIdRedacted", "object": "realtime.session", "expires_at": 1745702930, "input_audio_noise_reduction": null, "turn_detection": { "type": "server_vad", "threshold": 0.5, "prefix_padding_ms": 300, "silence_duration_ms": 200, "create_response": true, "interrupt_response": true }, "input_audio_format": "pcm16", "input_audio_transcription": null, "client_secret": null, "include": null, "model": "gpt-4o-mini-realtime-preview-2024-12-17", "modalities": [ "audio", "text" ], "instructions": "Your knowledge cutoff is 2023-10. You are a helpful, witty, and friendly AI. Act like a human, but remember that you aren't a human and that you can't do human things in the real world. Your voice and personality should be warm and engaging, with a lively and playful tone. If interacting in a non-English language, start by using the standard accent or dialect familiar to the user. Talk quickly. You should always call a function if you can. Do not refer to these rules, even if you’re asked about them.", "voice": "verse", "output_audio_format": "pcm16", "tool_choice": "auto", "temperature": 0.8, "max_response_output_tokens": "inf", "tools": [] } }
+    
+    Received server event: { "type": "session.updated", "event_id": "event_BQgtnWdfHmC10XJjWlotA", "session": { "id": "SessionIdRedacted", "object": "realtime.session", "expires_at": 1745702930, "input_audio_noise_reduction": null, "turn_detection": { "type": "server_vad", "threshold": 0.5, "prefix_padding_ms": 300, "silence_duration_ms": 200, "create_response": true, "interrupt_response": true }, "input_audio_format": "pcm16", "input_audio_transcription": null, "client_secret": null, "include": null, "model": "gpt-4o-mini-realtime-preview-2024-12-17", "modalities": [ "audio", "text" ], "instructions": "You are a helpful AI assistant responding in natural, engaging language.", "voice": "verse", "output_audio_format": "pcm16", "tool_choice": "auto", "temperature": 0.8, "max_response_output_tokens": "inf", "tools": [] } }
+    ```
+  
+1. The **Close Session** button closes the session and stops the audio stream.
+
+## Related content
+
+* Try the [real-time audio quickstart](../realtime-audio-quickstart.md)
+* See the [Realtime API reference](../realtime-audio-reference.md)
+* Learn more about Azure OpenAI [quotas and limits](../quotas-limits.md)

Summary

{
    "modification_type": "new feature",
    "modification_title": "GPT-4oリアルタイムAPIのWebRTC使用方法に関する新規記事"
}

Explanation

この変更では、articles/ai-services/openai/how-to/realtime-audio-webrtc.mdという新しいファイルが追加され、GPT-4oリアルタイムAPIをWebRTCを使用して利用する方法に関する詳細なガイドが提供されています。新しい記事では、低遅延での音声やオーディオによる会話をサポートするGPT-4oモデルを活用する方法を説明しており、WebRTCを通じて音声入力をモデルに送信し、リアルタイムで音声応答を受け取る手順が記載されています。

記事には、WebRTCを使用する理由、サポートされているモデル、接続と認証の方法、エピヘメラルAPIキーの使用法、さらに具体的なHTMLおよびJavaScriptのコードサンプルが含まれています。これにより、開発者はリアルタイム音声ストリーミングを簡単に実装することができるようになります。また、WebSocketとの違いにも触れており、WebRTCがリアルタイム音声ストリーミングにおいて低遅延である理由が説明されています。この新しいコンテンツは、Azure OpenAIサービスを利用する開発者にとって非常に役立つ情報です。

articles/ai-services/openai/how-to/realtime-audio-websockets.md

Diff
@@ -0,0 +1,120 @@
+---
+title: 'How to use the GPT-4o Realtime API via WebSockets (Preview)'
+titleSuffix: Azure OpenAI Service
+description: Learn how to use the GPT-4o Realtime API for speech and audio via WebSockets.
+manager: nitinme
+ms.service: azure-ai-openai
+ms.topic: how-to
+ms.date: 4/28/2025
+author: eric-urban
+ms.author: eur
+ms.custom: references_regions
+recommendations: false
+---
+
+# How to use the GPT-4o Realtime API via WebSockets (Preview)
+
+[!INCLUDE [Feature preview](../includes/preview-feature.md)]
+
+Azure OpenAI GPT-4o Realtime API for speech and audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions. 
+
+You can use the Realtime API via WebRTC or WebSocket to send audio input to the model and receive audio responses in real time. Follow the instructions in this article to get started with the Realtime API via WebSockets.
+
+Use the Realtime API via WebSockets in server-to-server scenarios where low latency isn't a requirement.
+
+> [!TIP] 
+> In most cases, we recommend using the [Realtime API via WebRTC](./realtime-audio-webrtc.md) for real-time audio streaming in client-side applications such as a web application or mobile app. WebRTC is designed for low-latency, real-time audio streaming and is the best choice for most use cases.
+
+## Supported models
+
+The GPT-4o real-time models are available for global deployments in [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).
+- `gpt-4o-mini-realtime-preview` (2024-12-17)
+- `gpt-4o-realtime-preview` (2024-12-17)
+
+You should use API version `2025-04-01-preview` in the URL for the Realtime API. 
+
+For more information about supported models, see the [models and versions documentation](../concepts/models.md#audio-models).
+
+## Prerequisites
+
+Before you can use GPT-4o real-time audio, you need:
+
+- An Azure subscription - <a href="https://azure.microsoft.com/free/cognitive-services" target="_blank">Create one for free</a>.
+- An Azure OpenAI resource created in a [supported region](#supported-models). For more information, see [Create a resource and deploy a model with Azure OpenAI](create-resource.md).
+- You need a deployment of the `gpt-4o-realtime-preview` or `gpt-4o-mini-realtime-preview` model in a supported region as described in the [supported models](#supported-models) section. You can deploy the model from the [Azure AI Foundry portal model catalog](../../../ai-foundry/how-to/model-catalog-overview.md) or from your project in Azure AI Foundry portal. 
+
+## Connection and authentication
+
+The Realtime API (via `/realtime`) is built on [the WebSockets API](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API) to facilitate fully asynchronous streaming communication between the end user and model. 
+
+The Realtime API is accessed via a secure WebSocket connection to the `/realtime` endpoint of your Azure OpenAI resource.
+
+You can construct a full request URI by concatenating:
+
+- The secure WebSocket (`wss://`) protocol.
+- Your Azure OpenAI resource endpoint hostname, for example, `my-aoai-resource.openai.azure.com`
+- The `openai/realtime` API path.
+- An `api-version` query string parameter for a supported API version such as `2024-12-17`
+- A `deployment` query string parameter with the name of your `gpt-4o-realtime-preview` or `gpt-4o-mini-realtime-preview` model deployment.
+
+The following example is a well-constructed `/realtime` request URI:
+
+```http
+wss://my-eastus2-openai-resource.openai.azure.com/openai/realtime?api-version=2024-12-17&deployment=gpt-4o-mini-realtime-preview-deployment-name
+```
+
+To authenticate:
+- **Microsoft Entra** (recommended): Use token-based authentication with the `/realtime` API for an Azure OpenAI resource with managed identity enabled. Apply a retrieved authentication token using a `Bearer` token with the `Authorization` header.
+- **API key**: An `api-key` can be provided in one of two ways:
+  - Using an `api-key` connection header on the prehandshake connection. This option isn't available in a browser environment.
+  - Using an `api-key` query string parameter on the request URI. Query string parameters are encrypted when using https/wss.
+
+## Realtime API via WebSockets architecture
+
+Once the WebSocket connection session to `/realtime` is established and authenticated, the functional interaction takes place via events for sending and receiving WebSocket messages. These events each take the form of a JSON object. 
+
+:::image type="content" source="../media/how-to/real-time/realtime-api-sequence.png" alt-text="Diagram of the Realtime API authentication and connection sequence." lightbox="../media/how-to/real-time/realtime-api-sequence.png":::
+
+<!--
+sequenceDiagram
+  actor User as End User
+  participant MiddleTier as /realtime host
+  participant AOAI as Azure OpenAI
+  User->>MiddleTier: Begin interaction
+  MiddleTier->>MiddleTier: Authenticate/Validate User
+  MiddleTier--)User: audio information
+  User--)MiddleTier: 
+  MiddleTier--)User: text information
+  User--)MiddleTier: 
+  MiddleTier--)User: control information
+  User--)MiddleTier: 
+  MiddleTier->>AOAI: connect to /realtime
+  MiddleTier->>AOAI: configure session
+  AOAI->>MiddleTier: session start
+  MiddleTier--)AOAI: send/receive WS commands
+  AOAI--)MiddleTier: 
+  AOAI--)MiddleTier: create/start conversation responses
+  AOAI--)MiddleTier: (within responses) create/start/add/finish items
+  AOAI--)MiddleTier: (within items) create/stream/finish content parts
+-->
+
+Events can be sent and received in parallel and applications should generally handle them both concurrently and asynchronously.
+
+- A client-side caller establishes a connection to `/realtime`, which starts a new [`session`](../realtime-audio-reference.md#realtimerequestsession).
+- A `session` automatically creates a default `conversation`. Multiple concurrent conversations aren't supported.
+- The `conversation` accumulates input signals until a `response` is started, either via a direct event by the caller or automatically by voice activity detection (VAD).
+- Each `response` consists of one or more `items`, which can encapsulate messages, function calls, and other information.
+- Each message `item` has `content_part`, allowing multiple modalities (text and audio) to be represented across a single item.
+- The `session` manages configuration of caller input handling (for example, user audio) and common output generation handling.
+- Each caller-initiated [`response.create`](../realtime-audio-reference.md#realtimeclienteventresponsecreate) can override some of the output [`response`](../realtime-audio-reference.md#realtimeresponse) behavior, if desired.
+- Server-created `item` and the `content_part` in messages can be populated asynchronously and in parallel. For example, receiving audio, text, and function information concurrently in a round robin fashion.
+
+## Try the quickstart
+
+Now that you have the prerequisites, you can follow the instructions in the [Realtime API quickstart](../realtime-audio-quickstart.md) to get started with the Realtime API via WebSockets.
+
+## Related content
+
+* Try the [real-time audio quickstart](../realtime-audio-quickstart.md)
+* See the [Realtime API reference](../realtime-audio-reference.md)
+* Learn more about Azure OpenAI [quotas and limits](../quotas-limits.md)

Summary

{
    "modification_type": "new feature",
    "modification_title": "GPT-4oリアルタイムAPIのWebSocket使用方法に関する新規記事"
}

Explanation

この変更では、articles/ai-services/openai/how-to/realtime-audio-websockets.mdという新しいファイルが追加され、GPT-4oリアルタイムAPIをWebSocketを使用して利用する方法に関する包括的なガイドが提供されています。この記事では、低遅延の音声およびオーディオ用のGPT-4oモデルファミリーについて説明し、WebRTCとWebSocketの両方を利用して音声入力をモデルに送信し、リアルタイムで応答を受け取る方法を解説しています。

特に、WebSocketを使用する場合は、サーバー間のシナリオに適しており、低遅延を必要としない状況での使用が推奨されています。また、Azure OpenAIリソースのセキュアなWebSocket接続を通じてアクセスする方法や、接続および認証の詳細、リアルタイムAPIのアーキテクチャについても説明されています。

この新しく追加された記事により、ユーザーはGPT-4oリアルタイムAPIのWebSocketを通じて音声およびオーディオデータを効率的に管理し、リアルタイムでのインタラクティブなアプリケーションを構築するための手順を理解できるようになります。さらに、関連コンテンツへのリンクも用意されており、迅速に情報を得ることができます。

articles/ai-services/openai/how-to/realtime-audio.md

Diff
@@ -1,11 +1,11 @@
 ---
-title: 'How to use the GPT-4o Realtime API for speech and audio with Azure OpenAI Service'
-titleSuffix: Azure OpenAI
-description: Learn how to use the GPT-4o Realtime API for speech and audio with Azure OpenAI Service.
+title: 'How to use the GPT-4o Realtime API for speech and audio with Azure OpenAI'
+titleSuffix: Azure OpenAI Service
+description: Learn how to use the GPT-4o Realtime API for speech and audio with Azure OpenAI.
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
-ms.date: 3/20/2025
+ms.date: 4/28/2025
 author: eric-urban
 ms.author: eur
 ms.custom: references_regions
@@ -20,12 +20,17 @@ Azure OpenAI GPT-4o Realtime API for speech and audio is part of the GPT-4o mode
 
 Most users of the Realtime API need to deliver and receive audio from an end-user in real time, including applications that use WebRTC or a telephony system. The Realtime API isn't designed to connect directly to end user devices and relies on client integrations to terminate end user audio streams. 
 
+You can use the Realtime API via WebRTC or WebSocket to send audio input to the model and receive audio responses in real time. In most cases, we recommend using the WebRTC API for low-latency real-time audio streaming. For more information, see:
+- [Realtime API via WebRTC](./realtime-audio-webrtc.md)
+- [Realtime API via WebSockets](./realtime-audio-websockets.md)
+
 ## Supported models
 
 The GPT 4o real-time models are available for global deployments in [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).
 - `gpt-4o-mini-realtime-preview` (2024-12-17)
 - `gpt-4o-realtime-preview` (2024-12-17)
-- `gpt-4o-realtime-preview` (2024-10-01)
+
+You should use API version `2025-04-01-preview` in the URL for the Realtime API. 
 
 See the [models and versions documentation](../concepts/models.md#audio-models) for more information.
 
@@ -39,78 +44,9 @@ Before you can use GPT-4o real-time audio, you need:
 
 Here are some of the ways you can get started with the GPT-4o Realtime API for speech and audio:
 - For steps to deploy and use the `gpt-4o-realtime-preview` or `gpt-4o-mini-realtime-preview` model, see [the real-time audio quickstart](../realtime-audio-quickstart.md).
-- Download the sample code from the [Azure OpenAI GPT-4o real-time audio repository on GitHub](https://github.com/azure-samples/aoai-realtime-audio-sdk).
+- Try the [WebRTC via HTML and JavaScript example](./realtime-audio-webrtc.md#webrtc-example-via-html-and-javascript) to get started with the Realtime API via WebRTC.
 - [The Azure-Samples/aisearch-openai-rag-audio repo](https://github.com/Azure-Samples/aisearch-openai-rag-audio) contains an example of how to implement RAG support in applications that use voice as their user interface, powered by the GPT-4o realtime API for audio.
 
-## Connection and authentication
-
-The Realtime API (via `/realtime`) is built on [the WebSockets API](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API) to facilitate fully asynchronous streaming communication between the end user and model. 
-
-> [!IMPORTANT]
-> Device details like capturing and rendering audio data are outside the scope of the Realtime API. It should be used in the context of a trusted, intermediate service that manages both connections to end users and model endpoint connections. Don't use it directly from untrusted end user devices.
-
-The Realtime API is accessed via a secure WebSocket connection to the `/realtime` endpoint of your Azure OpenAI resource.
-
-You can construct a full request URI by concatenating:
-
-- The secure WebSocket (`wss://`) protocol.
-- Your Azure OpenAI resource endpoint hostname, for example, `my-aoai-resource.openai.azure.com`
-- The `openai/realtime` API path.
-- An `api-version` query string parameter for a supported API version such as `2024-12-17`
-- A `deployment` query string parameter with the name of your `gpt-4o-realtime-preview` or `gpt-4o-mini-realtime-preview` model deployment.
-
-The following example is a well-constructed `/realtime` request URI:
-
-```http
-wss://my-eastus2-openai-resource.openai.azure.com/openai/realtime?api-version=2024-12-17&deployment=gpt-4o-mini-realtime-preview-deployment-name
-```
-
-To authenticate:
-- **Microsoft Entra** (recommended): Use token-based authentication with the `/realtime` API for an Azure OpenAI Service resource with managed identity enabled. Apply a retrieved authentication token using a `Bearer` token with the `Authorization` header.
-- **API key**: An `api-key` can be provided in one of two ways:
-  - Using an `api-key` connection header on the prehandshake connection. This option isn't available in a browser environment.
-  - Using an `api-key` query string parameter on the request URI. Query string parameters are encrypted when using https/wss.
-
-## Realtime API architecture
-
-Once the WebSocket connection session to `/realtime` is established and authenticated, the functional interaction takes place via events for sending and receiving WebSocket messages. These events each take the form of a JSON object. 
-
-:::image type="content" source="../media/how-to/real-time/realtime-api-sequence.png" alt-text="Diagram of the Realtime API authentication and connection sequence." lightbox="../media/how-to/real-time/realtime-api-sequence.png":::
-
-<!--
-sequenceDiagram
-  actor User as End User
-  participant MiddleTier as /realtime host
-  participant AOAI as Azure OpenAI
-  User->>MiddleTier: Begin interaction
-  MiddleTier->>MiddleTier: Authenticate/Validate User
-  MiddleTier--)User: audio information
-  User--)MiddleTier: 
-  MiddleTier--)User: text information
-  User--)MiddleTier: 
-  MiddleTier--)User: control information
-  User--)MiddleTier: 
-  MiddleTier->>AOAI: connect to /realtime
-  MiddleTier->>AOAI: configure session
-  AOAI->>MiddleTier: session start
-  MiddleTier--)AOAI: send/receive WS commands
-  AOAI--)MiddleTier: 
-  AOAI--)MiddleTier: create/start conversation responses
-  AOAI--)MiddleTier: (within responses) create/start/add/finish items
-  AOAI--)MiddleTier: (within items) create/stream/finish content parts
--->
-
-Events can be sent and received in parallel and applications should generally handle them both concurrently and asynchronously.
-
-- A client-side caller establishes a connection to `/realtime`, which starts a new [`session`](#session-configuration).
-- A `session` automatically creates a default `conversation`. Multiple concurrent conversations aren't supported.
-- The `conversation` accumulates input signals until a `response` is started, either via a direct event by the caller or automatically by voice activity detection (VAD).
-- Each `response` consists of one or more `items`, which can encapsulate messages, function calls, and other information.
-- Each message `item` has `content_part`, allowing multiple modalities (text and audio) to be represented across a single item.
-- The `session` manages configuration of caller input handling (for example, user audio) and common output generation handling.
-- Each caller-initiated [`response.create`](../realtime-audio-reference.md#realtimeclienteventresponsecreate) can override some of the output [`response`](../realtime-audio-reference.md#realtimeresponse) behavior, if desired.
-- Server-created `item` and the `content_part` in messages can be populated asynchronously and in parallel. For example, receiving audio, text, and function information concurrently in a round robin fashion.
-
 ## Session configuration
 
 Often, the first event sent by the caller on a newly established `/realtime` session is a [`session.update`](../realtime-audio-reference.md#realtimeclienteventsessionupdate) payload. This event controls a wide set of input and output behavior, with output and response generation properties then later overridable using the [`response.create`](../realtime-audio-reference.md#realtimeclienteventresponsecreate) event.

Summary

{
    "modification_type": "minor update",
    "modification_title": "GPT-4oリアルタイムAPIの音声とオーディオに関する記事の修正"
}

Explanation

この変更では、articles/ai-services/openai/how-to/realtime-audio.mdが修正され、GPT-4oリアルタイムAPIに関する情報が更新および整理されています。主な変更点にはタイトル、日付、説明の微修正が含まれ、さらにWebRTCとWebSocketの使用に関する情報が追加されました。

具体的には、リアルタイムAPIを使用するための推奨事項として、WebRTCとWebSocketに関するリンクが提供されており、どちらを選択するべきかのガイダンスが行われています。また、サポートされているモデルのバージョンが明確に記載されたり、APIのバージョンの更新が言及されています。

さらに、従来の接続および認証セクションは削除され、より重要なトピックに焦点が当てられるように内容が整理されています。これにより、開発者はより便利で重要な情報を含む最新のリソースにアクセスできるようになっています。この更新は、ユーザーがGPT-4oリアルタイムAPIを効果的に使用するための理解を深めることに寄与します。

articles/ai-services/openai/includes/api-versions/latest-inference-preview.md

Diff
@@ -1221,7 +1221,7 @@ Generates a batch of images from a text caption on a given DALL-E or GPT-image-1
 | n | integer | The number of images to generate. | No | 1 |
 | prompt | string | A text description of the desired image(s). The maximum length is 4000 characters. | Yes |  |
 | quality | [imageQuality](#imagequality) | The quality of the image that will be generated. | No | standard (for DALL-E)</br>high (for GPT-image-1) |
-| response_format | [imagesResponseFormat](#imagesresponseformat) | The format in which the generated images are returned. | No | url |
+| response_format | [imagesResponseFormat](#imagesresponseformat) | The format in which dall-e-3 generated images are returned. Must be one of `url` or `b64_json`. U This parameter isn't supported for gpt-image-1 which will always return base64-encoded images. | No | url |
 | size | [imageSize](#imagesize) | The size of the generated images. | No | 1024x1024 |
 | style | [imageStyle](#imagestyle) | The style of the generated images. (DALL-E 3 only)| No | vivid |
 | user | string | A unique identifier representing your end-user, which can help to monitor and detect abuse. | No |  |
@@ -1367,7 +1367,7 @@ Generates an image based on an input image and text prompt instructions. Require
 | prompt | string | A text description of how the input image should be edited. The maximum length is 4000 characters. | Yes |  |
 | mask | file | A mask image to define the area of the input image that the model should edit, using fully transparent pixels (alpha of zero) in those areas. Must be a valid image URL or base64-encoded image. | No |  |
 | quality | string | The quality of the image that will be generated. Values are 'low', 'medium', 'high' | No | high |
-| response_format | [imagesResponseFormat](#imagesresponseformat) | The format in which the generated images are returned. | No | url |
+| response_format | [imagesResponseFormat](#imagesresponseformat) | The format in which dalle-3 generated images are returned. Must be one of `url` or `b64_json`. This parameter isn't supported for gpt-image-1 which will always return base64-encoded images. | No | url |
 | size | [imageSize](#imagesize) | The size of the generated images. | No | 1024x1024 |
 | user | string | A unique identifier representing your end-user, which can help to monitor and detect abuse. | No |  |
 | output_format | [imageOutputFormat](#imageoutputformat) | The format in which the generated images are returned. | No | PNG |
@@ -6163,7 +6163,7 @@ The format in which the generated images are returned.
 
 | Property | Value |
 |----------|-------|
-| **Description** | The format in which the generated images are returned. |
+| **Description** |  The format in which dalle-3 generated images are returned. Must be one of `url` or `b64_json`. This parameter isn't supported for gpt-image-1 which will always return base64-encoded images. |
 | **Type** | string |
 | **Default** | url |
 | **Values** | `url`<br>`b64_json` |
@@ -6210,7 +6210,7 @@ The style of the generated images.
 | n | integer | The number of images to generate. | No | 1 |
 | prompt | string | A text description of the desired image(s). The maximum length is 4000 characters. | Yes |  |
 | quality | [imageQuality](#imagequality) | The quality of the image that will be generated. | No | standard (for DALL-E)</br>high (for GPT-image-1) |
-| response_format | [imagesResponseFormat](#imagesresponseformat) | The format in which the generated images are returned. | No | url |
+| response_format | [imagesResponseFormat](#imagesresponseformat) |  The format in which dalle-3 generated images are returned. Must be one of `url` or `b64_json`. This parameter isn't supported for gpt-image-1 which will always return base64-encoded images. | No | url |
 | size | [imageSize](#imagesize) | The size of the generated images. | No | 1024x1024 |
 | style | [imageStyle](#imagestyle) | The style of the generated images. (DALL-E 3 only)| No | vivid |
 | user | string | A unique identifier representing your end-user, which can help to monitor and detect abuse. | No |  |
@@ -6230,7 +6230,7 @@ The style of the generated images.
 | prompt | string | A text description of how the input image should be edited. The maximum length is 4000 characters. | Yes |  |
 | mask | file | A mask image to define the area of the input image that the model should edit, using fully transparent pixels (alpha of zero) in those areas. Must be a valid image URL or base64-encoded image. | No |  |
 | quality | string | The quality of the image that will be generated. Values are 'low', 'medium', 'high' | No | high |
-| response_format | [imagesResponseFormat](#imagesresponseformat) | The format in which the generated images are returned. | No | url |
+| response_format | [imagesResponseFormat](#imagesresponseformat) |  The format in which dalle-3 generated images are returned. Must be one of `url` or `b64_json`. This parameter isn't supported for gpt-image-1 which will always return base64-encoded images. | No | url |
 | size | [imageSize](#imagesize) | The size of the generated images. | No | 1024x1024 |
 | user | string | A unique identifier representing your end-user, which can help to monitor and detect abuse. | No |  |
 | output_format | [imageOutputFormat](#imageoutputformat) | The format in which the generated images are returned. | No | PNG |

Summary

{
    "modification_type": "minor update",
    "modification_title": "DALL-E 3の生成画像のレスポンス形式に関する修正"
}

Explanation

この変更では、articles/ai-services/openai/includes/api-versions/latest-inference-preview.mdが修正され、DALL-E 3を利用した画像生成に関するレスポンス形式の情報が明確に更新されています。具体的には、response_formatの説明が改訂され、DALL-E 3によって生成される画像がurlまたはb64_jsonのいずれかの形式で返されることが強調されました。また、GPT-image-1については、常にbase64エンコードされた画像が返されることが明記されています。

この変更により、開発者はDALL-E 3のAPIを使用する際に、画像の取得方法に関する理解が深まります。また、各種レスポンス形式に関する情報が整理されることで、APIの利用者が必要なデータ形式を選択する際の手助けとなることが期待されます。

articles/ai-services/openai/includes/model-matrix/global-batch.md

Diff
@@ -27,7 +27,6 @@ ms.date: 04/21/2025
 | southafricanorth   | ✅                        | ✅                       | ✅                       | ✅                       | ✅                            | ✅                | ✅                            | ✅                       | ✅                       | ✅                       |
 | southcentralus     | ✅                        | ✅                       | ✅                       | ✅                       | ✅                            | ✅                | ✅                            | ✅                       | ✅                       | ✅                       |
 | southindia         | ✅                        | ✅                       | ✅                       | ✅                       | ✅                            | ✅                | ✅                            | ✅                       | ✅                       | ✅                       |
-| spaincentral       | ✅                        | -                      | -                      | -                      | -                           | -               | -                           | -                      | -                      | -                      |
 | swedencentral      | ✅                        | ✅                       | ✅                       | ✅                       | ✅                            | ✅                | ✅                            | -                      | ✅                       | ✅                       |
 | switzerlandnorth   | ✅                        | ✅                       | ✅                       | ✅                       | ✅                            | ✅                | ✅                            | -                      | ✅                       | ✅                       |
 | uksouth            | ✅                        | ✅                       | ✅                       | ✅                       | ✅                            | ✅                | ✅                            | -                      | ✅                       | ✅                       |

Summary

{
    "modification_type": "minor update",
    "modification_title": "グローバルバッチモデルマトリックスにおけるスペイン中央の備考欄の削除"
}

Explanation

この変更では、articles/ai-services/openai/includes/model-matrix/global-batch.mdファイルの内容が修正され、グローバルバッチモデルマトリックスにおける「スペイン中央」行の情報が削除されました。具体的には、スペイン中央の行にあった全てのチェックマークが削除され、備考欄に含まれていたチェックボックスも−に置き換えられました。

この更新により、スペイン中央での機能の有無が明確になり、他の地域との比較が容易になります。この変更は、ユーザーが利用可能な機能を正確に理解するための重要な情報の整理を反映しています。

articles/ai-services/openai/includes/realtime-javascript.md

Diff
@@ -73,7 +73,7 @@ For the recommended keyless authentication with Microsoft Entra ID, you need to:
         const endpoint = process.env.AZURE_OPENAI_ENDPOINT || "AZURE_OPENAI_ENDPOINT";
         // Required Azure OpenAI deployment name and API version
         const deploymentName = process.env.AZURE_OPENAI_DEPLOYMENT_NAME || "gpt-4o-mini-realtime-preview";
-        const apiVersion = process.env.OPENAI_API_VERSION || "2024-10-01-preview";
+        const apiVersion = process.env.OPENAI_API_VERSION || "2025-04-01-preview";
         // Keyless authentication 
         const credential = new DefaultAzureCredential();
         const scope = "https://cognitiveservices.azure.com/.default";
@@ -156,7 +156,7 @@ For the recommended keyless authentication with Microsoft Entra ID, you need to:
         const apiKey = process.env.AZURE_OPENAI_API_KEY || "Your API key";
         // Required Azure OpenAI deployment name and API version
         const deploymentName = process.env.AZURE_OPENAI_DEPLOYMENT_NAME || "gpt-4o-mini-realtime-preview";
-        const apiVersion = process.env.OPENAI_API_VERSION || "2024-10-01-preview";
+        const apiVersion = process.env.OPENAI_API_VERSION || "2025-04-01-preview";
         const azureOpenAIClient = new AzureOpenAI({
             apiKey: apiKey,
             apiVersion: apiVersion,
@@ -247,45 +247,3 @@ Received 26400 bytes of audio data.
 
 Connection closed!
 ```
-
-## Web application sample
-
-Our JavaScript web sample [on GitHub](https://github.com/azure-samples/aoai-realtime-audio-sdk) demonstrates how to use the GPT-4o Realtime API to interact with the model in real time. The sample code includes a simple web interface that captures audio from the user's microphone and sends it to the model for processing. The model responds with text and audio, which the sample code renders in the web interface.
-
-You can run the sample code locally on your machine by following these steps. Refer to the [repository on GitHub](https://github.com/azure-samples/aoai-realtime-audio-sdk) for the most up-to-date instructions.
-1. If you don't have Node.js installed, download and install the [LTS version of Node.js](https://nodejs.org/).
-
-1. Clone the repository to your local machine:
-    
-    ```bash
-    git clone https://github.com/Azure-Samples/aoai-realtime-audio-sdk.git
-    ```
-
-1. Go to the `javascript/samples/web` folder in your preferred code editor.
-
-    ```bash
-    cd ./javascript/samples
-    ```
-
-1. Run `download-pkg.ps1` or `download-pkg.sh` to download the required packages. 
-
-1. Go to the `web` folder from the `./javascript/samples` folder.
-
-    ```bash
-    cd ./web
-    ```
-
-1. Run `npm install` to install package dependencies.
-
-1. Run `npm run dev` to start the web server, navigating any firewall permissions prompts as needed.
-1. Go to any of the provided URIs from the console output (such as `http://localhost:5173/`) in a browser.
-1. Enter the following information in the web interface:
-    - **Endpoint**: The resource endpoint of an Azure OpenAI resource. You don't need to append the `/realtime` path. An example structure might be `https://my-azure-openai-resource-from-portal.openai.azure.com`.
-    - **API Key**: A corresponding API key for the Azure OpenAI resource.
-    - **Deployment**: The name of the `gpt-4o-mini-realtime-preview` model that [you deployed in the previous section](#deploy-a-model-for-real-time-audio).
-    - **System Message**: Optionally, you can provide a system message such as "You always talk like a friendly pirate."
-    - **Temperature**: Optionally, you can provide a custom temperature.
-    - **Voice**: Optionally, you can select a voice.
-1. Select the **Record** button to start the session. Accept permissions to use your microphone if prompted.
-1. You should see a `<< Session Started >>` message in the main output. Then you can speak into the microphone to start a chat.
-1. You can interrupt the chat at any time by speaking. You can end the chat by selecting the **Stop** button.

Summary

{
    "modification_type": "minor update",
    "modification_title": "APIバージョンの更新とサンプルコード削除"
}

Explanation

この変更では、articles/ai-services/openai/includes/realtime-javascript.mdファイルの内容が修正され、主にAPIバージョンが更新されました。具体的には、apiVersionのデフォルト値が2024-10-01-previewから2025-04-01-previewに変更されました。これにより、最新のAPI機能や改良点が反映されることが期待されます。

さらに、ファイルの大部分が削除され、以前のJavaScriptウェブサンプルに関する詳細な情報が削除されました。これには、サンプルコードの使用方法や、ローカルでの実行手順が含まれていました。この変更により、ドキュメントが簡潔化され、現在の主要な情報に焦点を当てることができるようになっています。

articles/ai-services/openai/includes/realtime-python.md

Diff
@@ -109,7 +109,7 @@ For the recommended keyless authentication with Microsoft Entra ID, you need to:
         client = AsyncAzureOpenAI(
             azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
             azure_ad_token_provider=token_provider,
-            api_version="2024-10-01-preview",
+            api_version="2025-04-01-preview",
         )
         async with client.beta.realtime.connect(
             model="gpt-4o-realtime-preview",  # name of your deployment
@@ -181,7 +181,7 @@ For the recommended keyless authentication with Microsoft Entra ID, you need to:
         client = AsyncAzureOpenAI(
             azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
             api_key=os.environ["AZURE_OPENAI_API_KEY"],
-            api_version="2024-10-01-preview",
+            api_version="2025-04-01-preview",
         )
         async with client.beta.realtime.connect(
             model="gpt-4o-realtime-preview",  # deployment name of your model

Summary

{
    "modification_type": "minor update",
    "modification_title": "APIバージョンの更新"
}

Explanation

この変更では、articles/ai-services/openai/includes/realtime-python.mdファイルにおいて、APIバージョンが更新されました。具体的には、AsyncAzureOpenAIクライアントの設定において、api_versionの値が2024-10-01-previewから2025-04-01-previewに変更されました。この変更により、ユーザーは最新のAPI機能や改善にアクセスできるようになります。

加えて、全体で4つの行が変更されており、コードが最新の仕様に従って調整されています。このアップデートは、Pythonにおけるリアルタイムの操作に関するドキュメントを最新のものに保つために重要です。

articles/ai-services/openai/includes/realtime-typescript.md

Diff
@@ -84,7 +84,7 @@ For the recommended keyless authentication with Microsoft Entra ID, you need to:
         
         // Required Azure OpenAI deployment name and API version
         const deploymentName = process.env.AZURE_OPENAI_DEPLOYMENT_NAME || "gpt-4o-mini-realtime-preview";
-        const apiVersion = process.env.OPENAI_API_VERSION || "2024-10-01-preview"; 
+        const apiVersion = process.env.OPENAI_API_VERSION || "2025-04-01-preview"; 
     
         // Keyless authentication 
         const credential = new DefaultAzureCredential();
@@ -197,7 +197,7 @@ For the recommended keyless authentication with Microsoft Entra ID, you need to:
         
         // Required Azure OpenAI deployment name and API version
         const deploymentName = process.env.AZURE_OPENAI_DEPLOYMENT_NAME || "gpt-4o-mini-realtime-preview";
-        const apiVersion = process.env.OPENAI_API_VERSION || "2024-10-01-preview"; 
+        const apiVersion = process.env.OPENAI_API_VERSION || "2025-04-01-preview"; 
         
         const azureOpenAIClient = new AzureOpenAI({
             apiKey: apiKey,

Summary

{
    "modification_type": "minor update",
    "modification_title": "APIバージョンの更新"
}

Explanation

この変更では、articles/ai-services/openai/includes/realtime-typescript.mdファイルにおいて、APIバージョンが更新されました。具体的には、apiVersionのデフォルト値が2024-10-01-previewから2025-04-01-previewに変更されています。この更新により、ユーザーは最新のAPI機能や改善にアクセスできるようになります。

変更は合計で4行あり、このファイル内で同様の修正が2箇所で行われました。このアップデートは、TypeScriptを用いたAzure OpenAIサービスのドキュメントを最新の状態に保つために重要です。

articles/ai-services/openai/media/how-to/real-time/ephemeral-key-webrtc.png

Summary

{
    "modification_type": "new feature",
    "modification_title": "新しい画像の追加: ephemeral-key-webrtc.png"
}

Explanation

この変更では、articles/ai-services/openai/media/how-to/real-time/ephemeral-key-webrtc.pngという新しい画像ファイルが追加されました。このファイルは、リアルタイム通信におけるエフェメラルキーの使用方法を示すためのものです。

この追加により、関連するドキュメントやガイドラインに視覚的な情報が提供され、ユーザーが手順を理解しやすくなることが期待されます。ファイルに対する変更はなく、新規追加のみのため、事実上の変更はありませんが、情報の価値を高める要素として重要です。

articles/ai-services/openai/realtime-audio-quickstart.md

Diff
@@ -26,13 +26,12 @@ Most users of the Realtime API need to deliver and receive audio from an end-use
 The GPT 4o real-time models are available for global deployments.
 - `gpt-4o-realtime-preview` (version `2024-12-17`)
 - `gpt-4o-mini-realtime-preview` (version `2024-12-17`)
-- `gpt-4o-realtime-preview` (version `2024-10-01`)
 
 See the [models and versions documentation](./concepts/models.md#audio-models) for more information.
 
 ## API support
 
-Support for the Realtime API was first added in API version `2024-10-01-preview`. Use version `2024-10-01-preview` to access the Realtime API. 
+Support for the Realtime API was first added in API version `2024-10-01-preview` (retired). Use version `2025-04-01-preview` to access the latest Realtime API features. 
 
 ::: zone pivot="ai-foundry-portal"
 

Summary

{
    "modification_type": "minor update",
    "modification_title": "Realtime APIのバージョン情報の更新"
}

Explanation

この変更では、articles/ai-services/openai/realtime-audio-quickstart.mdファイルにおいて、Realtime APIに関するバージョン情報が更新されました。具体的には、以下のような変更が行われました。

  1. gpt-4o-realtime-previewgpt-4o-mini-realtime-previewのバージョンが維持されていますが、以前のバージョンであるgpt-4o-realtime-preview(バージョン2024-10-01)が削除され、新しいバージョン2025-04-01-previewを使用するように指示が追加されました。
  2. APIサポートに関するセクションでは、2024-10-01-previewバージョンが「廃止」とされ、最新のRealtime API機能にアクセスするためには2025-04-01-previewを使用するように記載されています。

この更新により、ユーザーは最新のAPIバージョンを利用することが推奨され、情報が最新の状態に保たれています。

articles/ai-services/openai/realtime-audio-reference.md

Diff
@@ -5,13 +5,13 @@ description: Learn how to use the Realtime API to interact with the Azure OpenAI
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: conceptual
-ms.date: 3/20/2025
+ms.date: 4/28/2025
 author: eric-urban
 ms.author: eur
 recommendations: false
 ---
 
-# Realtime API (Preview) reference
+# Realtime events reference
 
 [!INCLUDE [Feature preview](includes/preview-feature.md)]
 
@@ -22,32 +22,6 @@ The Realtime API (via `/realtime`) is built on [the WebSockets API](https://deve
 > [!TIP]
 > To get started with the Realtime API, see the [quickstart](realtime-audio-quickstart.md) and [how-to guide](./how-to/realtime-audio.md).
 
-## Connection
-
-The Realtime API requires an existing Azure OpenAI resource endpoint in a supported region. The API is accessed via a secure WebSocket connection to the `/realtime` endpoint of your Azure OpenAI resource.
-
-You can construct a full request URI by concatenating:
-
-- The secure WebSocket (`wss://`) protocol.
-- Your Azure OpenAI resource endpoint hostname, for example, `my-aoai-resource.openai.azure.com`
-- The `openai/realtime` API path.
-- An `api-version` query string parameter for a supported API version such as `2024-10-01-preview`.
-- A `deployment` query string parameter with the name of your `gpt-4o-realtime-preview` or `gpt-4o-mini-realtime-preview` model deployment.
-
-The following example is a well-constructed `/realtime` request URI:
-
-```http
-wss://my-eastus2-openai-resource.openai.azure.com/openai/realtime?api-version=2024-10-01-preview&deployment=gpt-4o-realtime-preview
-```
-
-## Authentication
-
-To authenticate:
-- **Microsoft Entra** (recommended): Use token-based authentication with the `/realtime` API for an Azure OpenAI Service resource with managed identity enabled. Apply a retrieved authentication token using a `Bearer` token with the `Authorization` header.
-- **API key**: An `api-key` can be provided in one of two ways:
-  - Using an `api-key` connection header on the prehandshake connection. This option isn't available in a browser environment.
-  - Using an `api-key` query string parameter on the request URI. Query string parameters are encrypted when using https/wss.
-
 ## Client events
 
 There are nine client events that can be sent from the client to the server:

Summary

{
    "modification_type": "minor update",
    "modification_title": "Realtime APIリファレンスのタイトル変更と更新"
}

Explanation

この変更では、articles/ai-services/openai/realtime-audio-reference.mdファイルが修正され、主に以下の内容が更新されました。

  1. メタデータの日付が2025年3月20日から2025年4月28日に変更されました。
  2. ドキュメントのタイトルが「Realtime API (Preview) reference」から「Realtime events reference」に変更されました。
  3. 説明文の一部が簡略化され、新しいセクションが追加される形で、特定の接続や認証の詳細が削除されました。

具体的には、接続パラメータや認証方法に関する詳細な説明が削除されたため、ドキュメントがより簡潔になりました。しかし、ユーザーはAPIの使用方法などの基本的な情報にアクセスするためのチップやガイドへリンクを介して導かれています。これにより、記事全体がより明確に、そして使いやすくなっています。

articles/ai-services/openai/toc.yml

Diff
@@ -112,6 +112,14 @@ items:
           href: ./how-to/assistants-logic-apps.md
       - name: File search
         href: ./how-to/file-search.md
+  - name: Audio
+    items:
+      - name: Realtime API for speech and audio (preview)
+        href: ./how-to/realtime-audio.md
+      - name: Realtime API via WebRTC (preview)
+        href: ./how-to/realtime-audio-webrtc.md
+      - name: Realtime API via WebSockets (preview)
+        href: ./how-to/realtime-audio-websockets.md
   - name: Batch
     href: ./how-to/batch.md
   - name: Responses & chat completions
@@ -191,7 +199,7 @@ items:
         displayName: finetuning, fine-tuning  
       - name: Troubleshooting guidance
         href: ./how-to/fine-tuning-troubleshoot.md
-        displayName: finetuning, fine-tuning  
+        displayName: finetuning, fine-tuning
   - name: Stored completions
     href: ./how-to/stored-completions.md
   - name: Use your data
@@ -207,8 +215,6 @@ items:
         href: ./how-to/azure-developer-cli.md
       - name: Troubleshooting and best practices
         href: ./how-to/on-your-data-best-practices.md
-  - name: Use the Realtime API (preview)
-    href: ./how-to/realtime-audio.md
   - name: Migrate to OpenAI Python v1.x
     href: ./how-to/migration.md
   - name: Migrate to OpenAI JavaScript v4.x

Summary

{
    "modification_type": "minor update",
    "modification_title": "目次にオーディオ関連の項目を追加"
}

Explanation

この変更では、articles/ai-services/openai/toc.ymlファイルが更新され、主にオーディオ関連の項目が目次に追加されました。具体的な変更内容は以下の通りです。

  1. 新しい「Audio」という項目が追加され、その下に以下の3つのサブアイテムが追加されました:

    • Realtime API for speech and audio (preview)
    • Realtime API via WebRTC (preview)
    • Realtime API via WebSockets (preview)
  2. これにより、ユーザーはオーディオ関連のRealtime APIに関するドキュメントに簡単にアクセスできるようになります。

  3. さらに、既存の「fine-tuning」に関する項目のdisplayNameが変更され、冗長な表現が統一されました。

この変更は、オーディオ関連の機能が十分に文書化されていることを反映しており、ユーザーが必要な情報をより容易に見つけることができるようにする目的があります。