@@ -0,0 +1,891 @@
+---
+title: Integrated Vectorization Using REST APIs
+titleSuffix: Azure AI Search
+description: Learn how to use skills to automate data chunking and vectorization during indexing and query execution.
+manager: nitinme
+author: haileytap
+ms.author: haileytapia
+ms.service: azure-ai-search
+ms.topic: how-to
+ms.date: 04/29/2025
+---
+
+# Set up integrated vectorization in Azure AI Search using REST
+
+In this article, you learn how to use a skillset to chunk and vectorize content from a [supported data source](#supported-data-sources). The skillset calls the [Text Split skill](cognitive-search-skill-textsplit.md) or [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) for chunking and an embedding skill that's attached to a [supported embedding model](#supported-embedding-models) for chunk vectorization. You also learn how to store the chunked and vectorized content in a [vector index](vector-search-how-to-create-index.md).
+
+This article describes the end-to-end workflow for [integrated vectorization](vector-search-integrated-vectorization.md) using REST<!--or Python-->. For portal-based instructions, see [Quickstart: Vectorize text and images in the Azure portal](search-get-started-portal-import-vectors.md).
+
+## Prerequisites
+
++ An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F).
+
++ An [Azure AI Search service](search-create-service-portal.md). We recommend the Basic tier or higher.
+
++ A [supported data source](#supported-data-sources).
+
++ A [supported embedding model](#supported-embedding-models).
+
++ Completion of [Quickstart: Connect without keys](search-get-started-rbac.md) and [Configure a system-assigned managed identity](search-howto-managed-identities-data-sources.md#create-a-system-managed-identity). Although you can use key-based authentication for data plane operations, this article assumes [roles and managed identities](#role-based-access), which are more secure.
+
++ [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client)<!--or the [Python extension](https://marketplace.visualstudio.com/items?itemName=ms-python.python) and [Jupyter package](https://pypi.org/project/jupyter/)-->.
+
+### Supported data sources
+
+Integrated vectorization works with [all supported data sources](search-indexer-overview.md#supported-data-sources). However, this article focuses on the most commonly used data sources, which are described in the following table.
+
+| Supported data source | Description |
+|--|--|
+| [Azure Blob Storage](search-howto-indexing-azure-blob-storage.md) | This data source works with blobs and tables. You must use a standard performance (general-purpose v2) account. Access tiers can be hot, cool, or cold. |
+| [Azure Data Lake Storage (ADLS) Gen2](/azure/storage/blobs/create-data-lake-storage-account) | This is an Azure Storage account with a hierarchical namespace enabled. To confirm that you have Data Lake Storage, check the **Properties** tab on the **Overview** page.<br><br> :::image type="content" source="media/search-how-to-integrated-vectorization/data-lake-storage-account.png" alt-text="Screenshot of an Azure Data Lake Storage account in the Azure portal." border="true" lightbox="media/search-how-to-integrated-vectorization/data-lake-storage-account.png"::: |
+<!--| [OneLake](search-how-to-index-onelake-files.md) | This data source is currently in preview. For information about limitations and supported shortcuts, see [OneLake indexing](search-how-to-index-onelake-files.md). |-->
+
+### Supported embedding models
+
+For integrated vectorization, you must use one of the following embedding models on an Azure AI platform in the [same region as Azure AI Search](search-create-service-portal.md#regions-with-the-most-overlap). Deployment instructions are provided in a [later section](#prepare-your-embedding-model).
+
+| Provider | Supported models |
+|--|--|
+| [Azure OpenAI Service](/azure/ai-services/openai/how-to/create-resource) <sup>1, 2</sup> | text-embedding-ada-002<br>text-embedding-3-small<br>text-embedding-3-large |
+| [Azure AI services multi-service resource](/azure/ai-services/multi-service-resource#azure-ai-services-resource-for-azure-ai-search-skills) <sup>3</sup> | For text and images: [Azure AI Vision multimodal](/azure/ai-services/computer-vision/how-to/image-retrieval) <sup>4</sup> |
+<!--| [Azure AI Foundry model catalog](/azure/ai-foundry/what-is-azure-ai-foundry) | For text:<br>Cohere-embed-v3-english<br>Cohere-embed-v3-multilingual<br><br>For images:<br>Facebook-DinoV2-Image-Embeddings-ViT-Base<br>Facebook-DinoV2-Image-Embeddings-ViT-Giant |-->
+
+<sup>1</sup> The endpoint of your Azure OpenAI resource must have a [custom subdomain](/azure/ai-services/cognitive-services-custom-subdomains), such as `https://my-unique-name.openai.azure.com`. If you created your resource in the [Azure portal](https://portal.azure.com/), this subdomain was automatically generated during resource setup.
+
+<sup>2</sup> Azure OpenAI resources (with access to embedding models) that were created in the [Azure AI Foundry portal](https://ai.azure.com/) aren't supported. Only Azure OpenAI resources created in the Azure portal are compatible with the [Azure OpenAI Embedding skill](cognitive-search-skill-azure-openai-embedding.md) integration.
+
+<sup>3</sup> For billing purposes, you must [attach your Azure AI multi-service resource](cognitive-search-attach-cognitive-services.md) to the skillset in your Azure AI Search service. Unless you use a [keyless connection (preview)](cognitive-search-attach-cognitive-services.md#bill-through-a-keyless-connection) to create the skillset, both resources must be in the same region.
+
+<sup>4</sup> The Azure AI Vision multimodal embedding model is available in [select regions](/azure/ai-services/computer-vision/overview-image-analysis#region-availability).
+
+### Role-based access
+
+You can use Microsoft Entra ID with role assignments or key-based authentication with full-access connection strings. For Azure AI Search connections to other resources, we recommend role assignments.
+
+To configure role-based access for integrated vectorization:
+
+1. On your search service, [enable roles](search-security-enable-roles.md) and [configure a system-assigned managed identity](search-howto-managed-identities-data-sources.md#create-a-system-managed-identity).
+
+1. On your data source platform and embedding model provider, create role assignments that allow your search service to access data and models. See [Prepare your data](#prepare-your-data) and [Prepare your embedding model](#prepare-your-embedding-model).
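+
+    For example, you can also create the Azure Storage role assignment from the command line instead of following the portal steps later in this article. The following Azure CLI sketch assumes hypothetical resource names and the principal ID of your search service's system-assigned managed identity:
+
+    ```azurecli
+    # Assign Storage Blob Data Reader to the search service's managed identity (hypothetical names)
+    az role assignment create \
+      --role "Storage Blob Data Reader" \
+      --assignee "<search-service-principal-id>" \
+      --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
+    ```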
+
+> [!NOTE]
+> Free search services support role-based connections to Azure AI Search, but they don't support managed identities on outbound connections to Azure Storage or Azure AI Vision. As a result, connections between free search services and other Azure resources must use key-based authentication.
+>
+> For more secure connections, use the Basic tier or higher. You can then enable roles and configure a managed identity for authorized access.
+
+## Get connection information for Azure AI Search
+
+In this section, you retrieve the endpoint and Microsoft Entra token for your Azure AI Search service. Both values are necessary to establish connections in REST<!--and Python--> requests.
+
+> [!TIP]
+> The following steps assume that you're using [role-based access](#role-based-access) for proof-of-concept testing. If you want to use integrated vectorization for app development, see [Connect your app to Azure AI Search using identities](keyless-connections.md).
+
+1. Sign in to the [Azure portal](https://portal.azure.com/) and select your Azure AI Search service.
+
+1. To obtain your search endpoint, copy the URL on the **Overview** page. An example search endpoint is `https://my-service.search.windows.net`.
+
+1. To obtain your Microsoft Entra token, run the following command on your local system. This step requires completion of [Quickstart: Connect without keys](search-get-started-rbac.md).
+
+    ```azurecli
+ az account get-access-token --scope https://search.azure.com/.default --query accessToken --output tsv
+ ```
+
+## Prepare your data
+
+In this section, you prepare your data for integrated vectorization by uploading files to a [supported data source](#supported-data-sources), assigning roles, and obtaining connection information.
+
+### [Azure Blob Storage](#tab/prepare-data-storage)
+
+1. Sign in to the [Azure portal](https://portal.azure.com/) and select your Azure Storage account.
+
+1. From the left pane, select **Data storage** > **Containers**.
+
+1. Create a container or select an existing container, and then upload your files to the container.
+
+1. To assign roles:
+
+ 1. From the left pane, select **Access Control (IAM)**.
+
+ 1. Select **Add** > **Add role assignment**.
+
+ 1. Under **Job function roles**, select **[Storage Blob Data Reader](search-howto-managed-identities-data-sources.md#assign-a-role)**, and then select **Next**.
+
+ 1. Under **Members**, select **Managed identity**, and then select **Select members**.
+
+ 1. Select your subscription and the managed identity of your search service.
+
+1. To obtain a connection string:
+
+ 1. From the left pane, select **Security + networking** > **Access keys**.
+
+ 1. Copy either connection string, which you specify later in [Set variables](#set-variables).
+
+1. (Optional) Synchronize deletions in your container with deletions in the search index. To configure your indexer for deletion detection:
+
+ 1. [Enable soft delete](/azure/storage/blobs/soft-delete-blob-enable?tabs=azure-portal#enable-blob-soft-delete-hierarchical-namespace) on your storage account. If you're using [native soft delete](search-howto-index-changed-deleted-blobs.md#native-blob-soft-delete), the next step isn't required.
+
+ 1. [Add custom metadata](search-howto-index-changed-deleted-blobs.md#soft-delete-strategy-using-custom-metadata) that an indexer can scan to determine which blobs are marked for deletion. Give your custom property a descriptive name. For example, you can name the property "IsDeleted" and set it to false. Repeat this step for every blob in the container. When you want to delete the blob, change the property to true. For more information, see [Change and delete detection when indexing from Azure Storage](search-howto-index-changed-deleted-blobs.md).
+
+### [ADLS Gen2](#tab/prepare-data-adlsgen2)
+
+1. Sign in to the [Azure portal](https://portal.azure.com/) and select your Azure Storage account.
+
+1. From the left pane, select **Data storage** > **Containers**.
+
+1. Create a container or select an existing container, and then upload your files to the container.
+
+1. To assign roles:
+
+ 1. From the left pane, select **Access Control (IAM)**.
+
+ 1. Select **Add** > **Add role assignment**.
+
+ 1. Under **Job function roles**, select **[Storage Blob Data Reader](search-howto-managed-identities-data-sources.md#assign-a-role)**, and then select **Next**.
+
+ 1. Under **Members**, select **Managed identity**, and then select **Select members**.
+
+ 1. Select your subscription and the managed identity of your search service.
+
+1. To obtain a connection string:
+
+ 1. From the left pane, select **Security + networking** > **Access keys**.
+
+ 1. Copy either connection string, which you specify later in [Set variables](#set-variables).
+
+1. (Optional) Synchronize deletions in your container with deletions in the search index. To configure your indexer for deletion detection:
+
+ 1. [Enable soft delete](/azure/storage/blobs/soft-delete-blob-enable?tabs=azure-portal#enable-blob-soft-delete-hierarchical-namespace) on your storage account.
+
+    1. [Add custom metadata](search-howto-index-changed-deleted-blobs.md#soft-delete-strategy-using-custom-metadata) that an indexer can scan to determine which blobs are marked for deletion. Give your custom property a descriptive name. For example, you can name the property "IsDeleted" and set it to false. Repeat this step for every blob in the container. When you want to delete the blob, change the property to true. For more information, see [Change and delete detection when indexing from Azure Storage](search-howto-index-changed-deleted-blobs.md).
+
+<!--### [OneLake](#tab/prepare-data-onelake)
+
+1. Sign in to [Power BI](https://powerbi.com/) and [create a workspace](/fabric/data-engineering/tutorial-lakehouse-get-started).
+
+1. From the left pane, select your new workspace.
+
+1. To assign roles to your workspace:
+
+ 1. In the upper-right corner, select **Manage access**.
+
+ 1. Select **Add people or groups**.
+
+ 1. Enter the name of your search service. For example, if the URL is `https://my-demo-service.search.windows.net`, the service name is `my-demo-service`.
+
+ 1. Select a role. The default is **Viewer**, but you need **Contributor** to pull data into a search index.
+
+1. To create a lakehouse and upload your data:
+
+ 1. In the upper-left corner, select **New item**.
+
+ 1. Select the **Lakehouse** tile.
+
+ 1. Enter a name for your lakehouse, and then select **Create**.
+
+ 1. On the **Home** tab of your lakehouse, select **Upload files**.
+
+1. To obtain connection IDs:
+
+ 1. At the top of your browser, locate the lakehouse URL, which has the following format: `https://msit.powerbi.com/groups/00000000-0000-0000-0000-000000000000/lakehouses/11111111-1111-1111-1111-111111111111?experience=power-bi`.
+
+ 1. Copy the workspace ID, which is listed after "groups" in the URL. You specify this ID later in [Set variables](#set-variables). In our example, the workspace ID is `00000000-0000-0000-0000-000000000000`.
+
+ 1. Copy the lakehouse ID, which is listed after "lakehouses" in the URL. You specify this ID later in [Set variables](#set-variables). In our example, the lakehouse ID is `11111111-1111-1111-1111-111111111111`.-->
+
+---
+
+## Prepare your embedding model
+
+In this section, you prepare your Azure AI resource for integrated vectorization by assigning roles, obtaining an endpoint, and deploying a [supported embedding model](#supported-embedding-models).
+
+### [Azure OpenAI](#tab/prepare-model-aoai)
+
+Azure AI Search supports text-embedding-ada-002, text-embedding-3-small, and text-embedding-3-large. Internally, Azure AI Search calls the [Azure OpenAI Embedding skill](cognitive-search-skill-azure-openai-embedding.md) to connect to Azure OpenAI.
+
+1. Sign in to the [Azure portal](https://portal.azure.com/) and select your Azure OpenAI resource.
+
+1. To assign roles:
+
+ 1. From the left pane, select **Access control (IAM)**.
+
+ 1. Select **Add** > **Add role assignment**.
+
+ 1. Under **Job function roles**, select **[Cognitive Services OpenAI User](/azure/ai-services/openai/how-to/role-based-access-control#azure-openai-roles)**, and then select **Next**.
+
+ 1. Under **Members**, select **Managed identity**, and then select **Select members**.
+
+ 1. Select your subscription and the managed identity of your search service.
+
+1. To obtain an endpoint:
+
+ 1. From the left pane, select **Resource Management** > **Keys and Endpoint**.
+
+ 1. Copy the endpoint for your Azure OpenAI resource. You specify this URL later in [Set variables](#set-variables).
+
+1. To deploy an embedding model:
+
+ 1. Sign in to the [Azure AI Foundry portal](https://ai.azure.com/) and select your Azure OpenAI resource.
+
+ 1. From the left pane, select **Model catalog**.
+
+ 1. Deploy a [supported embedding model](#supported-embedding-models).
+
+ 1. Copy the deployment and model names, which you specify later in [Set variables](#set-variables). The deployment name is the custom name you chose, while the model name is the model you deployed, such as `text-embedding-ada-002`.
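+
+    You can also list the deployments for your resource from the command line. This Azure CLI sketch assumes hypothetical resource and resource group names:
+
+    ```azurecli
+    # List model deployments on a hypothetical Azure OpenAI resource
+    az cognitiveservices account deployment list --name <your-azure-openai-resource> --resource-group <your-resource-group> --output table
+    ```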
+
+### [Azure AI Vision](#tab/prepare-model-ai-vision)
+
+Azure AI Search supports Azure AI Vision image retrieval through multimodal embeddings (version 4.0). Internally, Azure AI Search calls the [multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md) to connect to Azure AI Vision.
+
+1. Sign in to the [Azure portal](https://portal.azure.com/) and select your Azure AI multi-service resource.
+
+1. To assign roles:
+
+ 1. From the left pane, select **Access control (IAM)**.
+
+ 1. Select **Add** > **Add role assignment**.
+
+ 1. Under **Job function roles**, select **Cognitive Services User**, and then select **Next**.
+
+ 1. Under **Members**, select **Managed identity**, and then select **Select members**.
+
+ 1. Select your subscription and the managed identity of your search service.
+
+1. To obtain an endpoint:
+
+ 1. From the left pane, select **Resource Management** > **Keys and Endpoint**.
+
+ 1. Copy the endpoint for your Azure AI multi-service resource. You specify this URL later in [Set variables](#set-variables).
+
+ > [!NOTE]
+ > The multimodal embeddings are built into your Azure AI multi-service resource, so there's no model deployment step.
+
+<!--### [Azure AI Foundry model catalog](#tab/prepare-model-catalog)
+
+Azure AI Search supports Azure, Cohere, and Facebook embedding models in the [Azure AI Foundry](https://ai.azure.com/) model catalog, but it doesn't currently support the OpenAI CLIP models. Internally, Azure AI Search calls the [Azure Machine Learning (AML) skill](cognitive-search-aml-skill.md) to connect to the catalog.
+
+For the model catalog, you should have an [Azure AI Foundry project](/azure/ai-foundry/how-to/create-projects) with a [hub that's connected to an Azure OpenAI resource and an Azure AI Search service](/azure/ai-foundry/how-to/create-projects#create-a-project).
+
+1. Sign in to the [Azure portal](https://portal.azure.com/) and select your Azure OpenAI resource.
+
+1. To assign roles:
+
+ 1. From the left pane, select **Access control (IAM)**.
+
+ 1. Select **Add** > **Add role assignment**.
+
+ 1. Under **Job function roles**, select **Cognitive Services User**, and then select **Next**.
+
+ 1. Under **Members**, select **Managed identity**, and then select **Select members**.
+
+ 1. Select your subscription and the managed identity of your search service.
+
+1. To deploy an embedding model:
+
+ 1. Sign in to the [Azure AI Foundry portal](https://ai.azure.com/) and select your project.
+
+ 1. From the left pane, select **Model catalog**.
+
+ 1. Deploy a [supported embedding model](#supported-embedding-models).
+
+ 1. Copy the deployment and model names, which you specify later in [Set variables](#set-variables). The deployment name is the custom name you chose, while the model name is the model you deployed, such as `Cohere-embed-v3-english`.
+
+1. To obtain an endpoint:
+
+ 1. From the left pane, select **My assets** > **Models + endpoints**.
+
+ 1. Select the model you deployed.
+
+ 1. On the **Details** tab, copy the endpoint. You specify this URL later in [Set variables](#set-variables).-->
+
+---
+
+## Set variables
+
+In this section, you specify the connection information for your Azure AI Search service, your [supported data source](#supported-data-sources), and your [supported embedding model](#supported-embedding-models).
+
+<!--### [REST](#tab/set-endpoints-rest)-->
+
+1. In Visual Studio Code, paste the following placeholders into your `.rest` or `.http` file.
+
+ ```HTTP
+ @baseUrl = PUT-YOUR-SEARCH-SERVICE-URL-HERE
+ @token = PUT-YOUR-MICROSOFT-ENTRA-TOKEN-HERE
+ ```
+
+1. Replace `@baseUrl` with the search endpoint and `@token` with the Microsoft Entra token you obtained in [Get connection information for Azure AI Search](#get-connection-information-for-azure-ai-search).
+
+1. Depending on your data source, add the following variables.
+
+ | Data source | Variables | Enter this information |
+ |--|--|--|
+ | Azure Blob Storage | `@storageConnectionString` and `@blobContainer` | The connection string and the name of the container you created in [Prepare your data](#prepare-your-data). |
+ | ADLS Gen2 | `@storageConnectionString` and `@blobContainer` | The connection string and the name of the container you created in [Prepare your data](#prepare-your-data). |
+ <!--| OneLake | `@workspaceId` and `@lakehouseId` | The workspace and lakehouse IDs you obtained in [Prepare your data](#prepare-your-data). |-->
+
+1. Depending on your embedding model provider, add the following variables.
+
+ | Embedding model provider | Variables | Enter this information |
+ |--|--|--|
+ | Azure OpenAI | `@aoaiEndpoint`, `@aoaiDeploymentName`, and `@aoaiModelName` | The endpoint, deployment name, and model name you obtained in [Prepare your embedding model](#prepare-your-embedding-model). |
+ | Azure AI Vision | `@aiMultiServiceEndpoint` | The endpoint you obtained in [Prepare your embedding model](#prepare-your-embedding-model). |
+ <!--| Azure AI Foundry model catalog | `@aoaiEndpoint`, `@aiFoundryDeploymentName`, and `@aiFoundryModelName` | The endpoint, deployment name, and model name you obtained in [Prepare your embedding model](#prepare-your-embedding-model). |-->
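+
+    For example, a complete variable block for Azure Blob Storage and Azure OpenAI might look like the following sketch, where every value is a hypothetical placeholder:
+
+    ```HTTP
+    @baseUrl = https://my-service.search.windows.net
+    @token = PUT-YOUR-MICROSOFT-ENTRA-TOKEN-HERE
+    @storageConnectionString = PUT-YOUR-STORAGE-CONNECTION-STRING-HERE
+    @blobContainer = my-container
+    @aoaiEndpoint = https://my-openai-resource.openai.azure.com
+    @aoaiDeploymentName = my-embedding-deployment
+    @aoaiModelName = text-embedding-3-small
+    ```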
+
+1. To verify the variables, send the following request.
+
+ ```HTTP
+ ### List existing indexes by name
+    GET {{baseUrl}}/indexes?api-version=2024-07-01&$select=name HTTP/1.1
+ Content-Type: application/json
+ Authorization: Bearer {{token}}
+ ```
+
+ A response should appear in an adjacent pane. If you have existing indexes, they're listed. Otherwise, the list is empty. If the HTTP code is `200 OK`, you're ready to proceed.
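+
+    For example, if your service doesn't have any indexes yet, the response body is similar to this illustrative sketch:
+
+    ```HTTP
+    {
+      "@odata.context": "https://my-service.search.windows.net/$metadata#indexes(name)",
+      "value": []
+    }
+    ```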
+
+<!--
+### [Python](#tab/set-endpoints-python)
+
+1. In Visual Studio Code, paste the following placeholder into your Jupyter notebook.
+
+ ```Python
+ AZURE_SEARCH_SERVICE: str = "PUT YOUR SEARCH SERVICE URL HERE"
+ ```
+
+1. Replace `AZURE_SEARCH_SERVICE` with the endpoint you obtained in [Get connection information for Azure AI Search](#get-connection-information-for-azure-ai-search).
+
+1. Depending on your data source, add the following variables.
+
+ | Data source | Variables | Enter this information |
+ |--|--|--|
+ | Azure Blob Storage | `AZURE_STORAGE_CONNECTION` | The connection string you obtained in [Prepare your data](#prepare-your-data). |
+ | ADLS Gen2 | `AZURE_STORAGE_CONNECTION` | The connection string you obtained in [Prepare your data](#prepare-your-data). |
+ | OneLake | `XYZ` | ... |
+
+1. Depending on your embedding model provider, add the following variables.
+
+ | Embedding model provider | Variables | Enter this information |
+ |--|--|--|
+ | Azure OpenAI | `AZURE_OPENAI_ACCOUNT` and `AZURE_DEPLOYMENT_MODEL` | The endpoint, model name, and deployment name you obtained in [Prepare your embedding model](#prepare-your-embedding-model). |
+ | Azure AI Vision | `XYZ` | ... |
+ | Azure AI Foundry model catalog | `XYZ` | ... |
+
+---
+-->
+
+## Connect to your data
+
+In this section, you connect to a [supported data source](#supported-data-sources) for indexer-based indexing. An [indexer](search-indexer-overview.md) in Azure AI Search requires a data source that specifies the type, credentials, and container.
+
+<!--### [REST](#tab/connect-data-rest)-->
+
+1. Use [Create Data Source](/rest/api/searchservice/data-sources/create) to define a data source that provides connection information during indexing.
+
+ ```HTTP
+ ### Create a data source
+ POST {{baseUrl}}/datasources?api-version=2024-07-01 HTTP/1.1
+ Content-Type: application/json
+ Authorization: Bearer {{token}}
+
+ {
+ "name": "my-data-source",
+ "type": "azureblob",
+ "subtype": null,
+ "credentials": {
+ "connectionString": "{{storageConnectionString}}"
+ },
+ "container": {
+ "name": "{{blobContainer}}",
+ "query": null
+ },
+ "dataChangeDetectionPolicy": null,
+ "dataDeletionDetectionPolicy": null
+ }
+ ```
+
+1. Set `type` to your data source: `azureblob` or `adlsgen2`<!--`azureblob`, `adlsgen2`, or `onelake`-->.
+
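+1. (Optional) If you configured the custom soft-delete metadata described in [Prepare your data](#prepare-your-data), replace the null `dataDeletionDetectionPolicy` with a policy definition. The following sketch assumes the hypothetical "IsDeleted" property from that section:
+
+    ```HTTP
+    "dataDeletionDetectionPolicy": {
+      "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
+      "softDeleteColumnName": "IsDeleted",
+      "softDeleteMarkerValue": "true"
+    }
+    ```
+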
+1. To create the data source, select **Send request**.
+
+<!--1. If you're using OneLake, set `credentials.connectionString` to `ResourceId={{workspaceId}}` and `container.name` to `{{lakehouseId}}`.-->
+
+<!--
+### [Python](#tab/connect-data-python)
+
+1. Define a data source that provides connection information during indexing.
+
+ ```Python
+ from azure.search.documents.indexes import SearchIndexerClient
+ from azure.search.documents.indexes.models import (
+ SearchIndexerDataContainer,
+ SearchIndexerDataSourceConnection
+ )
+
+ # Create a data source
+ indexer_client = SearchIndexerClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)
+ container = SearchIndexerDataContainer(name="PUT YOUR CONTAINER NAME OR LAKEHOUSE ID HERE")
+ data_source_connection = SearchIndexerDataSourceConnection(
+ name="mydatasource",
+ type="azureblob",
+ connection_string=AZURE_STORAGE_CONNECTION,
+ container=container
+ )
+ data_source = indexer_client.create_or_update_data_source_connection(data_source_connection)
+
+ print(f"Data source '{data_source.name}' created or updated")
+ ```
+
+1. Set `type` to your data source: `azureblob`, `azureadlsgen2`, or `onelake`.
+
+1. If you're using OneLake, set `connection_string` to `XYZ` and...
+
+---
+-->
+
+## Create a skillset
+
+In this section, you create a [skillset](cognitive-search-working-with-skillsets.md) that calls a built-in skill to chunk your content and an embedding skill to create vector representations of the chunks. The skillset is executed during indexing in a [later section](#create-an-indexer).
+
+### Call a built-in skill to chunk your content
+
+Partitioning your content into chunks helps you meet the requirements of your embedding model and prevents data loss due to truncation. For more information about chunking, see [Chunk large documents for vector search solutions](vector-search-how-to-chunk-documents.md).
+
+For built-in data chunking, Azure AI Search offers the [Text Split skill](cognitive-search-skill-textsplit.md) and [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md). The Text Split skill breaks text into sentences or pages of a particular length, while the Document Layout skill breaks content based on paragraph boundaries.
+
+<!--### [REST](#tab/built-in-skill-rest)-->
+
+1. Use [Create Skillset](/rest/api/searchservice/skillsets/create) to define a skillset.
+
+ ```HTTP
+ ### Create a skillset
+ POST {{baseUrl}}/skillsets?api-version=2024-07-01 HTTP/1.1
+ Content-Type: application/json
+ Authorization: Bearer {{token}}
+
+ {
+ "name": "my-skillset",
+ "skills": []
+ }
+ ```
+
+1. In the `skills` array, call the Text Split skill or Document Layout skill. You can paste one of the following definitions.
+
+ ```HTTP
+ "skills": [
+ {
+ "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
+ "name": "my-text-split-skill",
+ "textSplitMode": "pages",
+ "maximumPageLength": 2000,
+ "pageOverlapLength": 500,
+ "maximumPagesToTake": 0,
+ "unit": "characters",
+ "defaultLanguageCode": "en",
+ "inputs": [
+ {
+ "name": "text",
+ "source": "/document/text",
+ "inputs": []
+ }
+ ],
+ "outputs": [
+ {
+ "name": "textItems"
+ }
+ ]
+ },
+ {
+ "@odata.type": "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
+ "name": "my-document-layout-skill",
+ "context": "/document",
+ "outputMode": "oneToMany",
+ "markdownHeaderDepth": "h3",
+ "inputs": [
+ {
+ "name": "file_data",
+ "source": "/document/file_data"
+ }
+ ],
+ "outputs": [
+ {
+ "name": "markdown_document"
+ }
+ ]
+ }
+ ]
+ ```
+
+ > [!NOTE]
+ > The Document Layout skill is in public preview. If you want to call this skill, use a preview API, such as [`2025-03-01-preview`](/rest/api/searchservice/skillsets/create?view=rest-searchservice-2025-03-01-preview&preserve-view=true).
+
+<!--### [Python](#tab/built-in-skill-python)
+
+---
+-->
+
+### Call an embedding skill to vectorize the chunks
+
+To vectorize your chunked content, the skillset needs an embedding skill that points to a [supported embedding model](#supported-embedding-models).
+
+<!--### [REST](#tab/embedding-skill-rest)-->
+
+1. After the built-in chunking skill in the `skills` array, call the [Azure OpenAI Embedding skill](cognitive-search-skill-azure-openai-embedding.md)
+or [Azure AI Vision skill](cognitive-search-skill-vision-vectorize.md)<!--[Azure OpenAI Embedding skill](cognitive-search-skill-azure-openai-embedding.md), [Azure AI Vision skill](cognitive-search-skill-vision-vectorize.md), or [AML skill](cognitive-search-aml-skill.md) (for the Azure AI Foundry model catalog)-->. You can paste one of the following definitions.
+
+ ```HTTP
+ {
+ "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
+ "resourceUri": "{{aoaiEndpoint}}",
+ "deploymentId": "{{aoaiDeploymentName}}",
+ "modelName": "{{aoaiModelName}}",
+ "dimensions": 1536,
+ "inputs": [
+ {
+ "name": "text",
+ "source": "/document/text"
+ }
+ ],
+ "outputs": [
+ {
+ "name": "embedding"
+ }
+ ]
+ },
+ {
+ "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill",
+ "context": "/document",
+ "modelVersion": "2023-04-15",
+ "inputs": [
+ {
+ "name": "url",
+ "source": "/document/metadata_storage_path"
+ },
+ {
+ "name": "queryString",
+ "source": "/document/metadata_storage_sas_token"
+ }
+ ],
+ "outputs": [
+ {
+ "name": "vector"
+ }
+ ]
+ }
+ ```
+
+ > [!NOTE]
+ > The Azure AI Vision skill is in public preview. If you want to call this skill, use a preview API, such as [`2025-03-01-preview`](/rest/api/searchservice/skillsets/create?view=rest-searchservice-2025-03-01-preview&preserve-view=true).
+
+1. If you're using the Azure OpenAI Embedding skill, set `dimensions` to the [number of embeddings generated by your embedding model](cognitive-search-skill-azure-openai-embedding.md#supported-dimensions-by-modelname).
+
+1. If you're using the Azure AI Vision skill, [attach your Azure AI multi-service resource](cognitive-search-attach-cognitive-services.md) after the `skills` array. This attachment is for billing purposes.
+
+ ```HTTP
+ "skills": [ ... ],
+ "cognitiveServices": {
+ "@odata.type": "#Microsoft.Azure.Search.AIServicesByIdentity",
+ "subdomainUrl": "{{aiMultiServiceEndpoint}}"
+ }
+ ```
+
+1. To create the skillset, select **Send request**.
+
+<!--
+ ```HTTP
+ {
+ "@odata.type": "#Microsoft.Skills.Custom.AmlSkill",
+ "resourceId": "{{modelEndpoint}}",
+ "context": "/document",
+ "inputs": [
+ {
+ "name": "text",
+ "source": "/document/text"
+ }
+ ],
+ "outputs": [
+ {
+ "name": "detected_language_code"
+ }
+ ]
+ }
+ ```
+
+The Azure AI Vision skill and AML skill (for indexer connections to the Azure AI Foundry model catalog) are in public preview. If you want to call these skills, use a preview API, such as [`2025-03-01-preview`](/rest/api/searchservice/skillsets/create?view=rest-searchservice-2025-03-01-preview&preserve-view=true).
+
+### [Python](#tab/embedding-skill-python)
+
+---
+-->
+
+## Create a vector index
+
+In this section, you set up physical data structures on your Azure AI Search service by creating a [vector index](vector-store.md). The schema of a vector index requires the following:
+
++ Name
++ Key field (string)
++ One or more vector fields
++ Vector configuration
+
+Vector fields store numerical representations of your chunked data. They must be searchable, but they can't be filterable, facetable, or sortable. They also can't have analyzers, normalizers, or synonym map assignments.
+
+In addition to vector fields, the sample index in the following steps contains nonvector fields for human-readable content. It's common to include plain-text equivalents of the content you want to vectorize. For more information, see [Create a vector index](vector-search-how-to-create-index.md).
+
+<!--### [REST](#tab/vector-index-rest)-->
+
+1. Use [Create Index](/rest/api/searchservice/indexes/create) to define the schema of a vector index.
+
+ ```HTTP
+ ### Create a vector index
+ POST {{baseUrl}}/indexes?api-version=2024-07-01 HTTP/1.1
+ Content-Type: application/json
+ Authorization: Bearer {{token}}
+
+ {
+ "name": "my-vector-index",
+ "fields": [],
+ "vectorSearch": []
+ }
+ ```
+
+1. Add a [vector search configuration](vector-search-how-to-create-index.md#add-a-vector-search-configuration) to the `vectorSearch` section.
+
+ ```HTTP
+ "vectorSearch": {
+ "algorithms": [
+ {
+ "name": "hnsw-algorithm",
+ "kind": "hnsw",
+ "hnswParameters": {
+ "m": 4,
+ "efConstruction": 400,
+ "efSearch": 100,
+ "metric": "cosine"
+ }
+ }
+ ],
+ "profiles": [
+ {
+ "name": "vector-profile-hnsw",
+ "algorithm": "hnsw-algorithm",
+ }
+ ]
+ }
+ ```
+
+ `vectorSearch.algorithms` specifies the algorithm used for indexing and querying vector fields, while `vectorSearch.profiles` links the algorithm configuration to a profile you can assign to vector fields.
+
+1. Depending on your embedding model, update `vectorSearch.algorithms.metric`. [Valid values for distance metrics](/rest/api/searchservice/indexes/create-or-update#vectorsearchalgorithmmetric) are `cosine`, `dotproduct`, `euclidean`, and `hamming`.
+
+1. Add fields to the `fields` array. Include a key field for document identification, nonvector fields for human-readable content, and vector fields for embeddings.
+
+ ```HTTP
+ "fields": [
+ {
+ "name": "id",
+ "type": "Edm.String",
+ "key": true,
+ "filterable": true
+ },
+ {
+ "name": "title",
+ "type": "Edm.String",
+ "searchable": true,
+ "filterable": true,
+ "sortable": true,
+ "retrievable": true
+ },
+ {
+ "name": "titleVector",
+ "type": "Collection(Edm.Single)",
+ "searchable": true,
+ "retrievable": false,
+ "stored": true,
+ "dimensions": 1536,
+ "vectorSearchProfile": "vector-profile-hnsw"
+ },
+ {
+ "name": "content",
+ "type": "Edm.String",
+ "searchable": true,
+ "retrievable": true
+ },
+ {
+ "name": "contentVector",
+ "type": "Collection(Edm.Single)",
+ "searchable": true,
+ "retrievable": false,
+ "stored": false,
+ "dimensions": 1536,
+ "vectorSearchProfile": "vector-profile-hnsw"
+ }
+ ]
+ ```
+
+1. Depending on your embedding skill, set `dimensions` for each vector field to the following value.
+
+ | Embedding skill | Enter this value |
+ |--|--|
+ | Azure OpenAI | The [number of embeddings generated by your embedding model](cognitive-search-skill-azure-openai-embedding.md#supported-dimensions-by-modelname). |
+ | Azure AI Vision | `1024` |
+ <!--| AML | The [number of embeddings generated by your embedding model](vector-search-vectorizer-azure-machine-learning-ai-studio-catalog.md#expected-field-dimensions). |-->
+
+<!--### [Python](#tab/vector-index-python)
+
+---
+-->
+
+## Add a vectorizer to the index
+
+In this section, you enable vectorization at query time by [defining a vectorizer](vector-search-how-to-configure-vectorizer.md) in your index. The vectorizer uses the same embedding model that indexed your data to convert a search string or image into a vector during query execution.
+
+<!--### [REST](#tab/vectorizer-rest)-->
+
+1. Add the [Azure OpenAI vectorizer](vector-search-vectorizer-azure-open-ai.md) or [Azure AI Vision vectorizer](vector-search-vectorizer-ai-services-vision.md)<!--[Azure OpenAI vectorizer](vector-search-vectorizer-azure-open-ai.md), [Azure AI Vision vectorizer](vector-search-vectorizer-ai-services-vision.md), or [Azure AI Foundry model catalog vectorizer](vector-search-vectorizer-azure-machine-learning-ai-studio-catalog.md)--> after `vectorSearch.profiles`. You can paste one of the following definitions.
+
+ ```HTTP
+ "profiles": [ ... ],
+ "vectorizers": [
+ {
+ "name": "my-openai-vectorizer",
+ "kind": "azureOpenAI",
+ "azureOpenAIParameters": {
+ "resourceUri": "{{aoaiEndpoint}}",
+ "deploymentId": "{{aoaiDeploymentName}}",
+ "modelName": "{{aoaiModelName}}"
+ }
+ },
+ {
+ "name": "my-ai-services-vision-vectorizer",
+ "kind": "aiServicesVision",
+ "aiServicesVisionParameters": {
+ "resourceUri": "{{aiMultiServiceEndpoint}}",
+ "modelVersion": "2023-04-15"
+ }
+ }
+ ]
+ ```
+
+ > [!NOTE]
+ > The Azure AI Vision vectorizer is in public preview. If you want to call this vectorizer, use a preview API, such as [`2025-03-01-preview`](/rest/api/searchservice/indexes/create?view=rest-searchservice-2025-03-01-preview&preserve-view=true).
+
+1. Specify your vectorizer in `vectorSearch.profiles`.
+
+ ```HTTP
+ "profiles": [
+ {
+ "name": "vector-profile-hnsw",
+ "algorithm": "hnsw-algorithm",
+ "vectorizer": "my-openai-vectorizer"
+ }
+ ]
+ ```
+
+1. To create the vector index, select **Send request**.
+
+<!--
+ ```HTTP
+ "vectorizers": [
+ {
+ "name": "my-model-catalog-vectorizer",
+ "kind": "aml",
+ "amlParameters": {
+ "uri": "{{aoaiEndpoint}}",
+ "modelName": "{{aiFoundryModelName}}",
+ }
+ }
+ ]
+ ```
+
+ > [!NOTE]
+ > The Azure AI Vision vectorizer and Azure AI Foundry model catalog vectorizer are in public preview. If you want to call these vectorizers, use a preview API, such as [`2025-03-01-preview`](/rest/api/searchservice/indexes/create?view=rest-searchservice-2025-03-01-preview&preserve-view=true).
+
+<!--### [Python](#tab/vectorizer-python)
+
+---
+-->
+
+## Create an indexer
+
+In this section, you create an [indexer](search-indexer-overview.md) to drive the entire vectorization pipeline, from data retrieval to skillset execution to indexing. We recommend that you [run the indexer on a schedule](search-howto-schedule-indexers.md) to process changes or documents that were missed due to throttling.
+
+<!--### [REST](#tab/indexer-rest)-->
+
+1. Use [Create Indexer](/rest/api/searchservice/indexers/create) to define an indexer that executes the vectorization pipeline.
+
+ ```HTTP
+ ### Create an indexer
+ POST {{baseUrl}}/indexers?api-version=2024-07-01 HTTP/1.1
+ Content-Type: application/json
+ Authorization: Bearer {{token}}
+
+ {
+ "name": "my-indexer",
+ "dataSourceName": "my-data-source",
+ "targetIndexName": "my-vector-index",
+ "skillsetName": "my-skillset",
+ "schedule": {
+ "interval": "PT2H"
+ },
+ "parameters": {
+ "batchSize": null,
+ "maxFailedItems": null,
+ "maxFailedItemsPerBatch": null
+ }
+ }
+ ```
+
+1. To create the indexer, select **Send request**.
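+
+1. (Optional) Creating the indexer also runs it. To monitor progress, send a [Get Indexer Status](/rest/api/searchservice/indexers/get-status) request.
+
+    ```HTTP
+    ### Check indexer status
+    GET {{baseUrl}}/indexers('my-indexer')/search.status?api-version=2024-07-01 HTTP/1.1
+    Content-Type: application/json
+    Authorization: Bearer {{token}}
+    ```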
+
+<!--### [Python](#tab/indexer-python)
+
+---
+-->
+
+## Run a vector query to confirm indexing
+
+In this section, you verify that your content was successfully indexed by [creating a vector query](vector-search-how-to-query.md). Because you configured a vectorizer in a [previous section](#add-a-vectorizer-to-the-index), the search engine can convert plain text or an image into a vector for query execution.
+
+<!--### [REST](#tab/vector-queries-rest)-->
+
+1. Use [Documents - Search Post](/rest/api/searchservice/documents/search-post) to define a query that's vectorized at query time.
+
+ ```HTTP
+ ### Run a vector query
+ POST {{baseUrl}}/indexes('my-vector-index')/docs/search.post.search?api-version=2024-07-01 HTTP/1.1
+ Content-Type: application/json
+ Authorization: Bearer {{token}}
+
+ {
+ "count": true,
+ "select": "title, content",
+ "vectorQueries": [
+ {
+ "kind": "text",
+ "text": "a sample text string for integrated vectorization",
+ "fields": "titleVector, contentVector",
+ "k": "3"
+ }
+ ]
+ }
+ ```
+
+ > [!NOTE]
+ > The Azure AI Vision vectorizer is in public preview. If you previously called this vectorizer, use a preview API, such as [`2025-03-01-preview`](/rest/api/searchservice/documents/search-post?view=rest-searchservice-2025-03-01-preview&preserve-view=true).
+
+ For queries that invoke integrated vectorization, `kind` must be set to `text`, and `text` must specify a text string. This string is passed to the vectorizer assigned to the vector field. For more information, see [Query with integrated vectorization](vector-search-how-to-query.md#query-with-integrated-vectorization).
+
+1. To run the vector query, select **Send request**.
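+
+    The response returns the `title` and `content` fields for up to three matches, along with the total count and a relevance score for each match. The following truncated example is illustrative only:
+
+    ```HTTP
+    {
+      "@odata.count": 3,
+      "value": [
+        {
+          "@search.score": 0.83,
+          "title": "A sample document title",
+          "content": "A sample chunk of content from the document..."
+        }
+      ]
+    }
+    ```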
+
+<!--
+
+ > [!NOTE]
+ > The Azure AI Vision vectorizer and Azure AI Foundry model catalog vectorizer are in public preview. If you previously called these vectorizers, use a preview API, such as [`2025-03-01-preview`](/rest/api/searchservice/documents/search-post?view=rest-searchservice-2025-03-01-preview&preserve-view=true).
+
+### [Python](#tab/vector-queries-python)
+
+---
+-->
+
+## Related content
+
++ [Integrated vectorization in Azure AI Search](vector-search-integrated-vectorization.md)
++ [Quickstart: Vectorize text and images in the Azure portal](search-get-started-portal-import-vectors.md)
++ [Python sample for integrated vectorization](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/integrated-vectorization/azure-search-integrated-vectorization-sample.ipynb)