@@ -1,7 +1,7 @@
---
title: Document layout analysis - Document Intelligence
titleSuffix: Azure AI services
-description: Extract text, tables, selections, titles, section headings, page headers, page footers, and more with layout analysis model from Document Intelligence.
+description: Extract text, tables, selections, titles, section headings, page headers, page footers, and more with the layout analysis model from Document Intelligence.
author: laujan
manager: nitinme
ms.service: azure-ai-document-intelligence
@@ -13,131 +13,122 @@ ms.author: lajanuar
<!-- markdownlint-disable MD051 -->
<!-- markdownlint-disable MD024 -->
-# What is Document Intelligence layout model?
+# What is the Document Intelligence layout model?
<!---------------------- v4.0 content ---------------------->
:::moniker range="doc-intel-4.0.0"
[!INCLUDE [applies to v4.0](../includes/applies-to-v40.md)]
-Document Intelligence layout model is an advanced machine-learning based document analysis API available in the Document Intelligence cloud. It enables you to take documents in various formats and return structured data representations of the documents. It combines an enhanced version of our powerful [Optical Character Recognition (OCR)](../../../ai-services/computer-vision/overview-ocr.md) capabilities with deep learning models to extract text, tables, selection marks, and document structure.
+The Azure AI Document Intelligence layout model is an advanced document-analysis API based on machine learning. The model is available in the Document Intelligence cloud. You can use it to take documents in various formats and return structured data representations of the documents. The model combines an enhanced version of the powerful [optical character recognition (OCR)](../../../ai-services/computer-vision/overview-ocr.md) capabilities with deep learning models to extract text, tables, selection marks, and document structure.
## Document structure layout analysis
-Document structure layout analysis is the process of analyzing a document to extract regions of interest and their inter-relationships. The goal is to extract text and structural elements from the page to build better semantic understanding models. There are two types of roles in a document layout:
+Document structure layout analysis is the process of analyzing a document to extract regions of interest and their interrelationships. The goal is to extract text and structural elements from the page to build better semantic understanding models. There are two types of roles in a document layout:
* **Geometric roles**: Text, tables, figures, and selection marks are examples of geometric roles.
* **Logical roles**: Titles, headings, and footers are examples of logical roles of texts.
The following illustration shows the typical components in an image of a sample page.
-:::image type="content" source="../media/document-layout-example-new.png" alt-text="Illustration of document layout example.":::
+:::image type="content" source="../media/document-layout-example-new.png" alt-text="Illustration that shows a document layout example.":::
## Development options
-Document Intelligence v4.0: **2024-11-30** (GA) supports the following tools, applications, and libraries:
+Document Intelligence v4.0: 2024-11-30 (GA) supports the following tools, applications, and libraries.
| Feature | Resources | Model ID |
|----------|-------------|-----------|
-|**Layout model**|• [**Document Intelligence Studio**](https://documentintelligence.ai.azure.com)</br>• [**REST API**](/rest/api/aiservices/operation-groups?view=rest-aiservices-v4.0%20(2024-11-30)&preserve-view=true)</br>• [**C# SDK**](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-4.0.0&preserve-view=true)</br>• [**Python SDK**](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-4.0.0&preserve-view=true)</br>• [**Java SDK**](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-4.0.0&preserve-view=true)</br>• [**JavaScript SDK**](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-4.0.0&preserve-view=true)|**prebuilt-layout**|
+|Layout model|• [Document Intelligence Studio](https://documentintelligence.ai.azure.com)</br>• [REST API](/rest/api/aiservices/operation-groups?view=rest-aiservices-v4.0%20(2024-11-30)&preserve-view=true)</br>• [C# SDK](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-4.0.0&preserve-view=true)</br>• [Python SDK](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-4.0.0&preserve-view=true)</br>• [Java SDK](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-4.0.0&preserve-view=true)</br>• [JavaScript SDK](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-4.0.0&preserve-view=true)|`prebuilt-layout`|
## Supported languages
-See [Language Support—document analysis models](../language-support/ocr.md) for a complete list of supported languages.
+For a complete list of supported languages, see [Language support: Document analysis models](../language-support/ocr.md).
## Supported file types
-Document Intelligence v4.0: **2024-11-30** (GA) layout model supports the following file formats:
+The Document Intelligence v4.0 2024-11-30 (GA) layout model supports the following file formats:
-|Model | PDF |Image: </br>`JPEG/JPG`, `PNG`, `BMP`, `TIFF`, `HEIF` | Microsoft Office: </br> Word (`DOCX`), Excel (`XLSX`), PowerPoint (`PPTX`), HTML|
+|Model | PDF |Image: </br>JPEG/JPG, PNG, BMP, TIFF, HEIF | Office: </br> Word (DOCX), Excel (XLSX), PowerPoint (PPTX), HTML|
|--------|:----:|:-----:|:---------------:|
|Layout | ✔ | ✔ | ✔ |
## Input requirements
-* For best results, provide one clear photo or high-quality scan per document.
+* **Photos and scans**: For best results, provide one clear photo or high-quality scan per document.
+* **PDFs and TIFFs**: For PDFs and TIFFs, up to 2,000 pages can be processed. (With a free-tier subscription, only the first two pages are processed.)
+* **Password locks**: If your PDFs are password-locked, you must remove the lock before submission.
+* **File size**: The file size for analyzing documents is 500 MB for the paid (S0) tier and 4 MB for the free (F0) tier.
+* **Image dimensions**: The image dimensions must be between 50 pixels x 50 pixels and 10,000 pixels x 10,000 pixels.
+* **Text height**: The minimum height of the text to be extracted is 12 pixels for a 1024 x 768 pixel image. This dimension corresponds to about 8-point text at 150 dots per inch.
+* **Custom model training**: The maximum number of pages for training data is 500 for the custom template model and 50,000 for the custom neural model.
+* **Custom extraction model training**: The total size of training data is 50 MB for the template model and 1 GB for the neural model.
+* **Custom classification model training**: The total size of training data is 1 GB with a maximum of 10,000 pages. For 2024-11-30 (GA), the total size of training data is 2 GB with a maximum of 10,000 pages.
+* **Office file types (DOCX, XLSX, PPTX)**: The maximum string length limit is 8 million characters.
-* For PDF and TIFF, up to 2,000 pages can be processed (with a free tier subscription, only the first two pages are processed).
+For more information on model usage, quotas, and service limits, see [Service limits](../service-limits.md).
-* If your PDFs are password-locked, you must remove the lock before submission.
+### Get started with the layout model
-* The file size for analyzing documents is 500 MB for paid (S0) tier and `4` MB for free (F0) tier.
+See how data, including text, tables, table headers, selection marks, and structure information, is extracted from documents by using Document Intelligence. You need the following resources:
-* Image dimensions must be between 50 pixels x 50 pixels and 10,000 pixels x 10,000 pixels.
+* An Azure subscription. You can [create one for free](https://azure.microsoft.com/free/cognitive-services/).
+* A [Document Intelligence instance](https://portal.azure.com/#create/Microsoft.CognitiveServicesFormRecognizer) in the Azure portal. You can use the free pricing tier (F0) to try the service. After your resource deploys, select **Go to resource** to get your key and endpoint.
-* The minimum height of the text to be extracted is 12 pixels for a 1024 x 768 pixel image. This dimension corresponds to about `8` point text at 150 dots per inch (DPI).
+ :::image type="content" source="../media/containers/keys-and-endpoint.png" alt-text="Screenshot that shows the keys and endpoint location in the Azure portal.":::
-* For custom model training, the maximum number of pages for training data is 500 for the custom template model and 50,000 for the custom neural model.
-
- * For custom extraction model training, the total size of training data is 50 MB for template model and `1` GB for the neural model.
-
- * For custom classification model training, the total size of training data is `1` GB with a maximum of 10,000 pages. For `2024-11-30` (GA), the total size of training data is `2` GB with a maximum of 10,000 pages.
-
-For more information on model usage, quotas, and service limits, *see* [service limits](../service-limits.md).
-
-### Get started with Layout model
-
-See how data, including text, tables, table headers, selection marks, and structure information is extracted from documents using Document Intelligence. You need the following resources:
-
-* An Azure subscription—you can [create one for free](https://azure.microsoft.com/free/cognitive-services/).
-
-* A [Document Intelligence instance](https://portal.azure.com/#create/Microsoft.CognitiveServicesFormRecognizer) in the Azure portal. You can use the free pricing tier (`F0`) to try the service. After your resource deploys, select **Go to resource** to get your key and endpoint.
-
- :::image type="content" source="../media/containers/keys-and-endpoint.png" alt-text="Screenshot of keys and endpoint location in the Azure portal.":::
-
-After you retrieve your key and endpoint, use the following development options to build and deploy your Document Intelligence applications:
+After you retrieve your key and endpoint, use the following development options to build and deploy your Document Intelligence applications.
### [REST API](#tab/rest)
* [Document Intelligence REST API](/rest/api/aiservices/document-models/analyze-document?view=rest-aiservices-v4.0%20(2024-11-30)&preserve-view=true&tabs=HTTP&)
-* [How to guide](../how-to-guides/use-sdk-rest-api.md#use-document-intelligence-models)
+* [How-to guide: Use Document Intelligence models](../how-to-guides/use-sdk-rest-api.md#use-document-intelligence-models)
# [Client libraries](#tab/sdks)
-* [**C# SDK**](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-4.0.0&preserve-view=true#layout-model)
-* [**Python SDK**](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-4.0.0&preserve-view=true#layout-model)
-* [**Java SDK**](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-4.0.0&preserve-view=true#layout-model)
-* [**JavaScript**](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-4.0.0&preserve-view=true#layout-model)
+* [C# SDK](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-4.0.0&preserve-view=true#layout-model)
+* [Python SDK](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-4.0.0&preserve-view=true#layout-model)
+* [Java SDK](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-4.0.0&preserve-view=true#layout-model)
+* [JavaScript](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-4.0.0&preserve-view=true#layout-model)
### [Document Intelligence Studio](#tab/studio)
-* [Studio](https://documentintelligence.ai.azure.com/studio)
-* [How-to guide](../quickstarts/get-started-studio.md#authentication-in-studio)
+* [Document Intelligence Studio](https://documentintelligence.ai.azure.com/studio)
+* [Get started: Document Intelligence Studio](../quickstarts/get-started-studio.md#authentication-in-document-intelligence-studio)
---
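+
+The preceding tabs link to full quickstarts for each development option. As a quick orientation, the following minimal Python sketch shows roughly what a layout analysis call looks like with the `azure-ai-documentintelligence` client library. The endpoint and key are placeholders for the values from your own resource, and exact parameter names can vary slightly between SDK versions.
+
+```python
+from azure.core.credentials import AzureKeyCredential
+from azure.ai.documentintelligence import DocumentIntelligenceClient
+from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
+
+# Placeholders: use the endpoint and key from your Document Intelligence resource.
+endpoint = "https://<your-resource-name>.cognitiveservices.azure.com/"
+key = "<your-key>"
+
+client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
+
+# Analyze a publicly accessible sample document with the prebuilt layout model.
+url = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/layout-page-001.jpg"
+poller = client.begin_analyze_document("prebuilt-layout", AnalyzeDocumentRequest(url_source=url))
+result = poller.result()
+
+print(f"Pages analyzed: {len(result.pages)}")
+```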
## Data extraction
-The layout model extracts structural elements from your documents. To follow are descriptions of these structural elements with guidance on how to extract them from your document input:
-
-* [**Pages**](#pages)
-* [**Paragraphs**](#paragraphs)
-* [**Text, lines, and words**](#text-lines-and-words)
-* [**Selection marks**](#selection-marks)
-* [**Tables**](#tables)
-* [**Output response to markdown**](#output-response-to-markdown-format)
-* [**Figures**](#figures)
-* [**Sections**](#sections)
+The layout model extracts structural elements from your documents. The following structural elements are described in the remainder of this article along with guidance on how to extract them from your document input:
-Run the sample layout document analysis within [Document Intelligence Studio](https://documentintelligence.ai.azure.com/studio), then navigate to the results tab and access the full JSON output.
+* [Pages](#pages)
+* [Paragraphs](#paragraphs)
+* [Text, lines, and words](#text-lines-and-words)
+* [Selection marks](#selection-marks)
+* [Tables](#tables)
+* [Output response to markdown](#output-response-to-markdown-format)
+* [Figures](#figures)
+* [Sections](#sections)
- :::image type="content" source="../media/studio/json-output-tab.png" alt-text="Screenshot of results JSON output tab in the Document Intelligence Studio.":::
+Run the sample layout document analysis within [Document Intelligence Studio](https://documentintelligence.ai.azure.com/studio). Then go to the results tab and access the full JSON output.
+ :::image type="content" source="../media/studio/json-output-tab.png" alt-text="Screenshot that shows results on the JSON output tab in Document Intelligence Studio.":::
### Pages
-The pages collection is a list of pages within the document. Each page is represented sequentially within the document and includes the orientation angle indicating if the page is rotated and the width and height (dimensions in pixels). The page units in the model output are computed as shown:
+The `pages` collection is a list of pages within the document. Each page is represented sequentially within the document and includes the orientation angle, which indicates if the page is rotated, and the width and height (dimensions in pixels). The page units in the model output are computed as shown in the following table.
-| **File format** | **Computed page unit** | **Total pages** |
+| File format | Computed page unit | Total pages |
| --- | --- | --- |
-|Images (JPEG/JPG, PNG, BMP, HEIF) | Each image = 1 page unit | Total images |
-|PDF | Each page in the PDF = 1 page unit | Total pages in the PDF |
-|TIFF | Each image in the TIFF = 1 page unit | Total images in the TIFF |
-|Word (DOCX) | Up to 3,000 characters = 1 page unit, embedded or linked images not supported | Total pages of up to 3,000 characters each |
-|Excel (XLSX) | Each worksheet = 1 page unit, embedded or linked images not supported | Total worksheets |
-|PowerPoint (PPTX) | Each slide = 1 page unit, embedded or linked images not supported | Total slides |
-|HTML | Up to 3,000 characters = 1 page unit, embedded or linked images not supported | Total pages of up to 3,000 characters each |
+|Images (JPEG/JPG, PNG, BMP, HEIF) | Each image = 1 page unit. | Total images |
+|PDF | Each page in the PDF = 1 page unit. | Total pages in the PDF |
+|TIFF | Each image in the TIFF = 1 page unit. | Total images in the TIFF |
+|Word (DOCX) | Up to 3,000 characters = 1 page unit. Embedded or linked images aren't supported. | Total pages of up to 3,000 characters each |
+|Excel (XLSX) | Each worksheet = 1 page unit. Embedded or linked images aren't supported. | Total worksheets |
+|PowerPoint (PPTX) | Each slide = 1 page unit. Embedded or linked images aren't supported. | Total slides |
+|HTML | Up to 3,000 characters = 1 page unit. Embedded or linked images aren't supported. | Total pages of up to 3,000 characters each |
#### [Sample code](#tab/sample-code)
@@ -173,11 +164,11 @@ print(f"Page has width: {page.width} and height: {page.height}, measured with un
#### Extract selected pages
-For large multi-page documents, use the `pages` query parameter to indicate specific page numbers or page ranges for text extraction.
+For large multipage documents, use the `pages` query parameter to indicate specific page numbers or page ranges for text extraction.
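+
+As a minimal sketch, assuming the `client`, `url`, and `AnalyzeDocumentRequest` import from the earlier initialization snippet, the same option is exposed in the Python SDK as the `pages` keyword argument:
+
+```python
+# Sketch: analyze only pages 1-3 and page 5 of a large document.
+poller = client.begin_analyze_document(
+    "prebuilt-layout",
+    AnalyzeDocumentRequest(url_source=url),
+    pages="1-3,5",  # comma-separated page numbers and ranges
+)
+result = poller.result()
+print(f"Pages returned: {len(result.pages)}")
+```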
### Paragraphs
-The Layout model extracts all identified blocks of text in the `paragraphs` collection as a top level object under `analyzeResults`. Each entry in this collection represents a text block and ../includes the extracted text as`content`and the bounding `polygon` coordinates. The `span` information points to the text fragment within the top level `content` property that contains the full text from the document.
+The layout model extracts all identified blocks of text in the `paragraphs` collection as a top-level object under `analyzeResults`. Each entry in this collection represents a text block and includes the extracted text as `content` and the bounding `polygon` coordinates. The `spans` information points to the text fragment within the top-level `content` property that contains the full text from the document.
```json
@@ -192,16 +183,18 @@ The Layout model extracts all identified blocks of text in the `paragraphs` coll
#### Paragraph roles
-The new machine-learning based page object detection extracts logical roles like titles, section headings, page headers, page footers, and more. The Document Intelligence Layout model assigns certain text blocks in the `paragraphs` collection with their specialized role or type predicted by the model. It's best to use paragraph roles with unstructured documents to help understand the layout of the extracted content for a richer semantic analysis. The following paragraph roles are supported:
+The new page object detection based on machine learning extracts logical roles like titles, section headings, page headers, page footers, and more. The Document Intelligence layout model assigns certain text blocks in the `paragraphs` collection a specialized role or type that the model predicts.
-| **Predicted role** | **Description** | **Supported file types** |
+It's best to use paragraph roles with unstructured documents to help understand the layout of the extracted content for a richer semantic analysis. The following paragraph roles are supported.
+
+| Predicted role | Description | Supported file types |
| --- | --- | --- |
-| `title` | The main headings in the page | pdf, image, docx, pptx, xlsx, html |
-| `sectionHeading` | One or more subheadings on the page | pdf, image, docx, xlsx, html |
-| `footnote` | Text near the bottom of the page | pdf, image |
-| `pageHeader` | Text near the top edge of the page | pdf, image, docx |
-| `pageFooter` | Text near the bottom edge of the page | pdf, image, docx, pptx, html |
-| `pageNumber` | Page number | pdf, image |
+| `title` | The main headings on the page | PDF, Image, DOCX, PPTX, XLSX, HTML |
+| `sectionHeading` | One or more subheadings on the page | PDF, Image, DOCX, XLSX, HTML |
+| `footnote` | Text near the bottom of the page | PDF, Image |
+| `pageHeader` | Text near the top edge of the page | PDF, Image, DOCX |
+| `pageFooter` | Text near the bottom edge of the page | PDF, Image, DOCX, PPTX, HTML |
+| `pageNumber` | Page number | PDF, Image |
```json
{
@@ -225,9 +218,9 @@ The new machine-learning based page object detection extracts logical roles like
### Text, lines, and words
-The document layout model in Document Intelligence extracts print and handwritten style text as `lines` and `words`. The `styles` collection includes any handwritten style for lines if detected along with the spans pointing to the associated text. This feature applies to [supported handwritten languages](../language-support/prebuilt.md).
+The document layout model in Document Intelligence extracts print and handwritten-style text as `lines` and `words`. The `styles` collection includes any handwritten style for lines, if detected, along with the spans that point to the associated text. This feature applies to [supported handwritten languages](../language-support/prebuilt.md).
-For Microsoft Word, Excel, PowerPoint, and HTML, Document Intelligence v4.0 `2024-11-30` (GA) Layout model extract all embedded text as is. Texts are extracted as words and paragraphs. Embedded images aren't supported.
+For Microsoft Word, Excel, PowerPoint, and HTML, the Document Intelligence v4.0 2024-11-30 (GA) layout model extracts all embedded text as is. Text is extracted as words and paragraphs. Embedded images aren't supported.
#### [Sample code](#tab/sample-code)
@@ -274,7 +267,7 @@ if page.lines:
#### Handwritten style for text lines
-The response ../includes classifying whether each text line is of handwriting style or not, along with a confidence score. For more information. See [Handwritten language support](../language-support/ocr.md). The following example shows an example JSON snippet.
+The response includes whether each text line is in a handwritten style, along with a confidence score. For more information, see [Handwritten language support](../language-support/ocr.md). The following JSON snippet shows an example.
```json
"styles": [
@@ -290,11 +283,11 @@ The response ../includes classifying whether each text line is of handwriting st
}
```
-If you enable the [font/style addon capability](../concept-add-on-capabilities.md#font-property-extraction), you also get the font/style result as part of the `styles` object.
+If you enable the [font/style add-on capability](../concept-add-on-capabilities.md#font-property-extraction), you also get the font/style result as part of the `styles` object.
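+
+A minimal sketch of requesting the add-on follows, assuming the `client`, `url`, and `AnalyzeDocumentRequest` import from the earlier snippet; the feature enum name reflects the current Python SDK and might differ in other versions.
+
+```python
+from azure.ai.documentintelligence.models import DocumentAnalysisFeature
+
+# Sketch: request the font/style add-on along with layout analysis.
+poller = client.begin_analyze_document(
+    "prebuilt-layout",
+    AnalyzeDocumentRequest(url_source=url),
+    features=[DocumentAnalysisFeature.STYLE_FONT],
+)
+result = poller.result()
+
+# Each style entry describes handwriting or font properties for the text in its spans.
+for style in result.styles or []:
+    print(style.is_handwritten, style.font_style, style.font_weight, style.confidence)
+```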
### Selection marks
-The Layout model also extracts selection marks from documents. Extracted selection marks appear within the `pages` collection for each page. They include the bounding `polygon`, `confidence`, and selection `state` (`selected/unselected`). The text representation (that is, `:selected:` and `:unselected`) is also included as the starting index (`offset`) and `length` that references the top level `content` property that contains the full text from the document.
+The layout model also extracts selection marks from documents. Extracted selection marks appear within the `pages` collection for each page. They include the bounding `polygon`, `confidence`, and selection `state` (`selected/unselected`). The text representation (that is, `:selected:` and `:unselected`) is also included as the starting index (`offset`) and `length` that references the top-level `content` property that contains the full text from the document.
#### [Sample code](#tab/sample-code)
@@ -334,22 +327,20 @@ if page.selection_marks:
### Tables
-Extracting tables is a key requirement for processing documents containing large volumes of data typically formatted as tables. The Layout model extracts tables in the `pageResults` section of the JSON output. Extracted table information ../includes the number of columns and rows, row span, and column span. Each cell with its bounding polygon is output along with information whether the area is recognized as a `columnHeader` or not. The model supports extracting tables that are rotated. Each table cell contains the row and column index and bounding polygon coordinates. For the cell text, the model outputs the `span` information containing the starting index (`offset`). The model also outputs the `length` within the top-level content that contains the full text from the document.
+Extracting tables is a key requirement for processing documents that contain large volumes of data typically formatted as tables. The layout model extracts tables in the `pageResults` section of the JSON output. Extracted table information includes the number of columns and rows, row span, and column span.
-Here are a few factors to consider when using the Document Intelligence bale extraction capability:
+Each cell with its bounding polygon is output along with information about whether the area is recognized as a `columnHeader`. The model supports extracting tables that are rotated. Each table cell contains the row and column index and bounding polygon coordinates. For the cell text, the model outputs the `span` information that contains the starting index (`offset`). The model also outputs the `length` within the top-level content that contains the full text from the document.
-* Is the data that you want to extract presented as a table, and is the table structure meaningful?
+Here are a few factors to consider when you use the Document Intelligence table extraction capability:
+* Is the data that you want to extract presented as a table, and is the table structure meaningful?
* Can the data fit in a two-dimensional grid if the data isn't in a table format?
-
-* Do your tables span multiple pages? If so, to avoid having to label all the pages, split the PDF into pages before sending it to Document Intelligence. After the analysis, post-process the pages to a single table.
-
-* Refer to [Tabular fields](../train/custom-labels.md#tabular-fields) if you're creating custom models. Dynamic tables have a variable number of rows for each column. Fixed tables have a constant number of rows for each column.
+* Do your tables span multiple pages? If so, to avoid having to label all the pages, split the PDF into pages before you send it to Document Intelligence. After the analysis, post-process the pages to a single table.
+* See [Tabular fields](../train/custom-labels.md#tabular-fields) if you create custom models. Dynamic tables have a variable number of rows for each column. Fixed tables have a constant number of rows for each column.
> [!NOTE]
>
-> * Table analysis isn't supported if the input file is XLSX.
-> * For `2024-11-30` (GA), the bounding regions for figures and tables cover only the core content and exclude associated caption and footnotes.
+> Table analysis isn't supported if the input file is XLSX. For 2024-11-30 (GA), the bounding regions for figures and tables cover only the core content and exclude the associated caption and footnotes.
#### [Sample code](#tab/sample-code)
@@ -399,13 +390,13 @@ if result.tables:
---
-### Output response to markdown format
+### Output response to Markdown format
-The Layout API can output the extracted text in markdown format. Use the `outputContentFormat=markdown` to specify the output format in markdown. The markdown content is output as part of the `content` section.
+The layout API can output the extracted text in Markdown format. Use `outputContentFormat=markdown` to specify Markdown as the output format. The Markdown content is output as part of the `content` section.
> [!NOTE]
->
-> For v4.0 `2024-11-30` (GA), the representation of tables is changed to HTML tables to enable rendering of merged cells, multi-row headers, etc. Another related change is to use Unicode checkbox characters ☒ and ☐ for selection marks instead of `:selected:` and `:unselected:`. This update means that the content of selection mark fields contains `:selected:` even though their spans refer to Unicode characters in the top-level span. Refer to the [Markdown Output Format](../concept/markdown-elements.md) for full definition of Markdown elements.
+>
+> For v4.0 2024-11-30 (GA), the representation of tables is changed to HTML tables to enable rendering of items like merged cells and multirow headers. Another related change is to use the Unicode checkbox characters ☒ and ☐ for selection marks instead of `:selected:` and `:unselected:`. This update means that the content of selection-mark fields contains `:selected:` even though their spans refer to Unicode characters in the top-level span. For a full definition of Markdown elements, see [Markdown output format](../concept/markdown-elements.md).
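+
+A minimal sketch of requesting Markdown output, assuming the `client`, `url`, and `AnalyzeDocumentRequest` import from the earlier snippet (the content-format enum name can differ between SDK versions):
+
+```python
+from azure.ai.documentintelligence.models import DocumentContentFormat
+
+# Sketch: return the extracted content as Markdown instead of plain text.
+poller = client.begin_analyze_document(
+    "prebuilt-layout",
+    AnalyzeDocumentRequest(url_source=url),
+    output_content_format=DocumentContentFormat.MARKDOWN,
+)
+result = poller.result()
+
+print(result.content)  # Markdown, with tables rendered as HTML tables
+```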
#### [Sample code](#tab/sample-code)
@@ -472,13 +463,16 @@ PageFooter="1 | Page"
### Figures
-Figures (charts, images) in documents play a crucial role in complementing and enhancing the textual content, providing visual representations that aid in the understanding of complex information. The figures object detected by the Layout model has key properties like `boundingRegions` (the spatial locations of the figure on the document pages, including the page number and the polygon coordinates that outline the figure's boundary), `spans` (details the text spans related to the figure, specifying their offsets and lengths within the document's text. This connection helps in associating the figure with its relevant textual context), `elements` (the identifiers for text elements or paragraphs within the document that are related to or describe the figure) and `caption` if there's any.
+Figures (charts and images) in documents play a crucial role in complementing and enhancing the textual content. They provide visual representations that aid in the understanding of complex information. The `figures` object detected by the layout model has key properties like:
-When *output=figures* is specified during the initial analyze operation, the service generates cropped images for all detected figures that can be accessed via `/analyeResults/{resultId}/figures/{figureId}`.
-`FigureId` is included in each figure object, following an undocumented convention of `{pageNumber}.{figureIndex}` where `figureIndex` resets to one per page.
+* `boundingRegions`: The spatial locations of the figure on the document pages, including the page number and the polygon coordinates that outline the figure's boundary.
+* `spans`: The text spans related to the figure that specify their offsets and lengths within the document's text. This connection helps in associating the figure with its relevant textual context.
+* `elements`: The identifiers for text elements or paragraphs within the document that are related to or describe the figure.
+* `caption`: The figure's caption, if one is detected.
-> [!NOTE]
-> For v4.0 `2024-11-30` (GA), the bounding regions for figures and tables cover only the core content and exclude associated caption and footnotes.
+When `output=figures` is specified during the initial analyze operation, the service generates cropped images for all detected figures that can be accessed via `/analyzeResults/{resultId}/figures/{figureId}`. A figure ID is included in each figure object, following an undocumented convention of `{pageNumber}.{figureIndex}`, where `figureIndex` resets to one per page.
+
+For v4.0 2024-11-30 (GA), the bounding regions for figures and tables cover only the core content and exclude the associated caption and footnotes.
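+
+As a minimal sketch, assuming the `client`, `url`, and `AnalyzeDocumentRequest` import from the earlier snippet, figure crops can be requested and downloaded like this; the output option, `poller.details`, and `get_analyze_result_figure` names reflect the current Python SDK and might differ in other versions.
+
+```python
+from azure.ai.documentintelligence.models import AnalyzeOutputOption
+
+# Sketch: ask the service to generate cropped images for detected figures.
+poller = client.begin_analyze_document(
+    "prebuilt-layout",
+    AnalyzeDocumentRequest(url_source=url),
+    output=[AnalyzeOutputOption.FIGURES],
+)
+result = poller.result()
+operation_id = poller.details["operation_id"]  # result ID used to fetch figure crops
+
+for figure in result.figures or []:
+    if figure.id:
+        # Stream the cropped image for this figure and save it locally.
+        image = client.get_analyze_result_figure("prebuilt-layout", operation_id, figure.id)
+        with open(f"figure-{figure.id}.png", "wb") as f:
+            for chunk in image:
+                f.write(chunk)
+```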
#### [Sample code](#tab/sample-code)
@@ -526,7 +520,9 @@ if result.figures:
### Sections
-Hierarchical document structure analysis is pivotal in organizing, comprehending, and processing extensive documents. This approach is vital for semantically segmenting long documents to boost comprehension, facilitate navigation, and improve information retrieval. The advent of [retrieval-augmented generation (RAG)](../concept/retrieval-augmented-generation.md) in document generative AI underscores the significance of hierarchical document structure analysis. The Layout model supports sections and subsections in the output, which identifies the relationship of sections and object within each section. The hierarchical structure is maintained in `elements` of each section. You can use [output response to markdown format](#output-response-to-markdown-format) to easily get the sections and subsections in markdown.
+Hierarchical document structure analysis is pivotal in organizing, comprehending, and processing extensive documents. This approach is vital for semantically segmenting long documents to boost comprehension, facilitate navigation, and improve information retrieval. The advent of [retrieval-augmented generation (RAG)](../concept/retrieval-augmented-generation.md) in document-generative AI underscores the significance of hierarchical document structure analysis.
+
+The layout model supports sections and subsections in the output, which identifies the relationship of sections and objects within each section. The hierarchical structure is maintained in `elements` for each section. You can use the [output response to Markdown format](#output-response-to-markdown-format) to easily get the sections and subsections in Markdown.
#### [Sample code](#tab/sample-code)
@@ -561,7 +557,7 @@ poller = document_intelligence_client.begin_analyze_document(
}
```
-:::image type="content" source="../media/document-layout-example-section.png" alt-text="Screenshot of examples of document sections.":::
+:::image type="content" source="../media/document-layout-example-section.png" alt-text="Screenshot that shows examples of document sections.":::
---
@@ -583,32 +579,32 @@ poller = document_intelligence_client.begin_analyze_document(
:::moniker range="<=doc-intel-3.1.0"
-Document Intelligence layout model is an advanced machine-learning based document analysis API available in the Document Intelligence cloud. It enables you to take documents in various formats and return structured data representations of the documents. It combines an enhanced version of our powerful [Optical Character Recognition (OCR)](../../../ai-services/computer-vision/overview-ocr.md) capabilities with deep learning models to extract text, tables, selection marks, and document structure.
+The Document Intelligence layout model is an advanced document-analysis API. The model is based on machine learning and is available in the Document Intelligence cloud. You can use it to take documents in various formats and return structured data representations of the documents. It combines an enhanced version of the powerful [OCR](../../../ai-services/computer-vision/overview-ocr.md) capabilities with deep learning models to extract text, tables, selection marks, and document structure.
## Document layout analysis
-Document structure layout analysis is the process of analyzing a document to extract regions of interest and their inter-relationships. The goal is to extract text and structural elements from the page to build better semantic understanding models. There are two types of roles in a document layout:
+Document structure layout analysis is the process of analyzing a document to extract regions of interest and their interrelationships. The goal is to extract text and structural elements from the page to build better semantic understanding models. There are two types of roles in a document layout:
* **Geometric roles**: Text, tables, figures, and selection marks are examples of geometric roles.
* **Logical roles**: Titles, headings, and footers are examples of logical roles of texts.
The following illustration shows the typical components in an image of a sample page.
-:::image type="content" source="../media/document-layout-example-new.png" alt-text="Illustration of document layout example.":::
+:::image type="content" source="../media/document-layout-example-new.png" alt-text="Illustration that shows a document layout example.":::
## Supported languages and locales
-*See* our [Language Support—document analysis models](../language-support/ocr.md) page for a complete list of supported languages.
+For a complete list of supported languages, see [Language support: Document analysis models](../language-support/ocr.md).
:::moniker-end
:::moniker range="doc-intel-2.1.0"
-Document Intelligence v2.1 supports the following tools, applications, and libraries:
+Document Intelligence v2.1 supports the following tools, applications, and libraries.
| Feature | Resources |
|----------|-------------------------|
-|**Layout model**|• [**Document Intelligence labeling tool**](https://fott-2-1.azurewebsites.net/prebuilts-analyze)</br>• [**REST API**](../how-to-guides/use-sdk-rest-api.md?pivots=programming-language-rest-api&view=doc-intel-2.1.0&preserve-view=true&tabs=windows)</br>• [**Client-library SDK**](../how-to-guides/use-sdk-rest-api.md?view=doc-intel-2.1.0&preserve-view=true)</br>• [**Document Intelligence Docker container**](../containers/install-run.md?tabs=id-document#run-the-container-with-the-docker-compose-up-command)|
+|Layout model|• [Document Intelligence labeling tool](https://fott-2-1.azurewebsites.net/prebuilts-analyze)</br>• [REST API](../how-to-guides/use-sdk-rest-api.md?pivots=programming-language-rest-api&view=doc-intel-2.1.0&preserve-view=true&tabs=windows)</br>• [Client-library SDK](../how-to-guides/use-sdk-rest-api.md?view=doc-intel-2.1.0&preserve-view=true)</br>• [Document Intelligence Docker container](../containers/install-run.md?tabs=id-document#run-the-container-with-the-docker-compose-up-command)|
:::moniker-end
@@ -618,83 +614,74 @@ Document Intelligence v2.1 supports the following tools, applications, and libra
Supported file formats:
-|Model | PDF |Image: </br>`JPEG/JPG`, `PNG`, `BMP`, `TIFF`, `HEIF` | Microsoft Office: </br> Word (`DOCX`), Excel (`XLSX`), PowerPoint (`PPTX`), HTML|
+|Model | PDF |Image: </br>JPEG/JPG, PNG, BMP, TIFF, HEIF | Office: </br> Word (DOCX), Excel (XLSX), PowerPoint (PPTX), HTML|
|--------|:----:|:-----:|:---------------:|
|Read | ✔ | ✔ | ✔ |
|Layout | ✔ | ✔ | |
-|General Document| ✔ | ✔ | |
+|General document| ✔ | ✔ | |
|Prebuilt | ✔ | ✔ | |
|Custom extraction | ✔ | ✔ | |
|Custom classification | ✔ | ✔ | ✔ |
-* For best results, provide one clear photo or high-quality scan per document.
-
-* For PDF and TIFF, up to 2,000 pages can be processed (with a free tier subscription, only the first two pages are processed).
-
-* The file size for analyzing documents is 500 MB for paid (S0) tier and `4` MB for free (F0) tier.
-
-* Image dimensions must be between 50 pixels x 50 pixels and 10,000 pixels x 10,000 pixels.
-
-* If your PDFs are password-locked, you must remove the lock before submission.
-
-* The minimum height of the text to be extracted is 12 pixels for a 1024 x 768 pixel image. This dimension corresponds to about `8` point text at 150 dots per inch (DPI).
-
-* For custom model training, the maximum number of pages for training data is 500 for the custom template model and 50,000 for the custom neural model.
-
- * For custom extraction model training, the total size of training data is 50 MB for template model and `1` GB for the neural model.
-
- * For custom classification model training, the total size of training data is `1` GB with a maximum of 10,000 pages. For `2024-11-30` (GA), the total size of training data is `2` GB with a maximum of 10,000 pages.
+* **Photos and scans**: For best results, provide one clear photo or high-quality scan per document.
+* **PDFs and TIFFs**: For PDFs and TIFFs, up to 2,000 pages can be processed. (With a free-tier subscription, only the first two pages are processed.)
+* **File size**: The file size for analyzing documents is 500 MB for the paid (S0) tier and 4 MB for the free (F0) tier.
+* **Image dimensions**: The image dimensions must be between 50 pixels x 50 pixels and 10,000 pixels x 10,000 pixels.
+* **Password locks**: If your PDFs are password-locked, you must remove the lock before submission.
+* **Text height**: The minimum height of the text to be extracted is 12 pixels for a 1024 x 768 pixel image. This dimension corresponds to about 8-point text at 150 dots per inch.
+* **Custom model training**: The maximum number of pages for training data is 500 for the custom template model and 50,000 for the custom neural model.
+* **Custom extraction model training**: The total size of training data is 50 MB for the template model and 1 GB for the neural model.
+* **Custom classification model training**: The total size of training data is 1 GB with a maximum of 10,000 pages. For 2024-11-30 (GA), the total size of training data is 2 GB with a maximum of 10,000 pages.
+* **Office file types (DOCX, XLSX, PPTX)**: The maximum string length limit is 8 million characters.
:::moniker-end
:::moniker range="doc-intel-2.1.0"
## Input guide
-* Supported file formats: JPEG, PNG, PDF, and TIFF.
-* Supported number of pages: For PDF and TIFF, up to 2,000 pages are processed. For free tier subscribers, only the first two pages are processed.
-* Supported file size: the file size must be less than 50 MB and dimensions at least 50 x 50 pixels and at most 10,000 x 10,000 pixels.
+* **Supported file formats**: JPEG, PNG, PDF, and TIFF.
+* **Supported number of pages**: For PDF and TIFF, up to 2,000 pages are processed. For free tier subscribers, only the first two pages are processed.
+* **Supported file size**: The file size must be less than 50 MB, and the dimensions must be at least 50 x 50 pixels and at most 10,000 x 10,000 pixels.
:::moniker-end
:::moniker range="<=doc-intel-3.1.0"
### Get started
-See how data, including text, tables, table headers, selection marks, and structure information is extracted from documents using Document Intelligence. You need the following resources:
-
-* An Azure subscription—you can [create one for free](https://azure.microsoft.com/free/cognitive-services/).
+You can use Document Intelligence to extract data such as text, tables, table headers, selection marks, and structure information from documents. You need the following resources:
-* A [Document Intelligence instance](https://portal.azure.com/#create/Microsoft.CognitiveServicesFormRecognizer) in the Azure portal. You can use the free pricing tier (`F0`) to try the service. After your resource deploys, select **Go to resource** to get your key and endpoint.
+* An Azure subscription. You can [create one for free](https://azure.microsoft.com/free/cognitive-services/).
+* A [Document Intelligence instance](https://portal.azure.com/#create/Microsoft.CognitiveServicesFormRecognizer) in the Azure portal. You can use the free pricing tier (F0) to try the service. After your resource deploys, select **Go to resource** to get your key and endpoint.
-:::image type="content" source="../media/containers/keys-and-endpoint.png" alt-text="Screenshot of keys and endpoint location in the Azure portal.":::
+:::image type="content" source="../media/containers/keys-and-endpoint.png" alt-text="Screenshot that shows the keys and endpoint location in the Azure portal.":::
:::moniker-end
:::moniker range="doc-intel-3.1.0 || doc-intel-3.0.0"
-After you retrieve you key and endpoint, you can use the following development options to build and deploy your Document Intelligence applications:
+After you retrieve your key and endpoint, you can use the following development options to build and deploy your Document Intelligence applications.
> [!NOTE]
> Document Intelligence Studio is available with v3.0 APIs and later versions.
### [REST API](#tab/rest)
-
-* [`2023-07-31` GA (v3.1)](/rest/api/aiservices/document-models/analyze-document?view=rest-aiservices-2023-07-31&preserve-view=true&tabs=HTTP)
-* [`2022-08-31` GA (v3.0)](/rest/api/aiservices/document-models/analyze-document?view=rest-aiservices-v3.0%20(2022-08-31)&preserve-view=true&tabs=HTTP)
+* [2023-07-31 GA (v3.1)](/rest/api/aiservices/document-models/analyze-document?view=rest-aiservices-2023-07-31&preserve-view=true&tabs=HTTP)
+* [2022-08-31 GA (v3.0)](/rest/api/aiservices/document-models/analyze-document?view=rest-aiservices-v3.0%20(2022-08-31)&preserve-view=true&tabs=HTTP)
# [Client libraries](#tab/sdks)
-* [**C# SDK**](/dotnet/api/overview/azure/ai.documentintelligence-readme?view=azure-dotnet-preview&preserve-view=true)
-* [**Java SDK**](/java/api/overview/azure/ai-documentintelligence-readme?view=azure-java-preview&preserve-view=true)
-* [**JavaScript**](/javascript/api/overview/azure/ai-document-intelligence-rest-readme?view=azure-node-preview&preserve-view=true)
-* [**Python SDK**](/python/api/overview/azure/ai-documentintelligence-readme?view=azure-python-preview&preserve-view=true)
+* [C# SDK](/dotnet/api/overview/azure/ai.documentintelligence-readme?view=azure-dotnet-preview&preserve-view=true)
+* [Java SDK](/java/api/overview/azure/ai-documentintelligence-readme?view=azure-java-preview&preserve-view=true)
+* [JavaScript](/javascript/api/overview/azure/ai-document-intelligence-rest-readme?view=azure-node-preview&preserve-view=true)
+* [Python SDK](/python/api/overview/azure/ai-documentintelligence-readme?view=azure-python-preview&preserve-view=true)
### [Document Intelligence Studio](#tab/studio)
-* [Studio](https://documentintelligence.ai.azure.com/studio)
-* [How-to guide](../quickstarts/get-started-studio.md#authentication-in-studio)
+* [Document Intelligence Studio](https://documentintelligence.ai.azure.com/studio)
+* [Get started: Document Intelligence Studio](../quickstarts/get-started-studio.md#authentication-in-document-intelligence-studio)
---
@@ -708,88 +695,88 @@ After you retrieve you key and endpoint, you can use the following development o
## Document Intelligence Sample Labeling tool
-1. Navigate to the [Document Intelligence sample tool](https://fott-2-1.azurewebsites.net/).
+1. Go to the [Document Intelligence Sample Labeling tool](https://fott-2-1.azurewebsites.net/).
1. On the sample tool home page, select **Use Layout to get text, tables and selection marks**.
- :::image type="content" source="../media/label-tool/layout-1.jpg" alt-text="Screenshot of connection settings for the Document Intelligence layout process.":::
+ :::image type="content" source="../media/label-tool/layout-1.jpg" alt-text="Screenshot that shows connection settings for the Document Intelligence layout process.":::
1. In the **Document Intelligence service endpoint** field, paste the endpoint that you obtained with your Document Intelligence subscription.
-1. In the **key** field, paste the key you obtained from your Document Intelligence resource.
+1. In the **key** field, paste the key that you obtained from your Document Intelligence resource.
-1. In the **Source** field, select **URL** from the dropdown menu You can use our sample document:
+1. In the **Source** field, select **URL** from the dropdown menu. You can use the sample document:
- * [**Sample document**](https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/layout-page-001.jpg).
+ * [Sample document](https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/layout-page-001.jpg).
- * Select the **Fetch** button.
+ * Select **Fetch**.
-1. Select **Run Layout**. The Document Intelligence Sample Labeling tool calls the `Analyze Layout` API to analyze the document.
+1. Select **Run Layout**. The Document Intelligence Sample Labeling tool calls the Analyze Layout API to analyze the document.
- :::image type="content" source="../media/fott-layout.png" alt-text="Screenshot of `Layout` dropdown window.":::
+ :::image type="content" source="../media/fott-layout.png" alt-text="Screenshot that shows the Layout dropdown pane.":::
-1. View the results - see the highlighted extracted text, detected selection marks, and detected tables.
+1. View the results. See the highlighted extracted text, detected selection marks, and detected tables.
- :::image type="content" source="../media/label-tool/layout-3.jpg" alt-text="Screenshot of connection settings for the Document Intelligence Sample Labeling tool.":::
+ :::image type="content" source="../media/label-tool/layout-3.jpg" alt-text="Screenshot that shows connection settings for the Document Intelligence Sample Labeling tool.":::
:::moniker-end
:::moniker range="doc-intel-2.1.0"
-Document Intelligence v2.1 supports the following tools, applications, and libraries:
+Document Intelligence v2.1 supports the following tools, applications, and libraries.
| Feature | Resources |
|----------|-------------------------|
-|**Layout API**| • [**Document Intelligence labeling tool**](https://fott-2-1.azurewebsites.net/layout-analyze)</br>• [**REST API**](../how-to-guides/use-sdk-rest-api.md?pivots=programming-language-rest-api&tabs=windows&view=doc-intel-2.1.0&preserve-view=true)</br>• [**Client-library SDK**](../how-to-guides/use-sdk-rest-api.md?view=doc-intel-2.1.0&preserve-view=true)</br>• [**Document Intelligence Docker container**](../containers/install-run.md?branch=main&tabs=layout#run-the-container-with-the-docker-compose-up-command)|
+|Layout API| • [Document Intelligence labeling tool](https://fott-2-1.azurewebsites.net/layout-analyze)</br>• [REST API](../how-to-guides/use-sdk-rest-api.md?pivots=programming-language-rest-api&tabs=windows&view=doc-intel-2.1.0&preserve-view=true)</br>• [Client-library SDK](../how-to-guides/use-sdk-rest-api.md?view=doc-intel-2.1.0&preserve-view=true)</br>• [Document Intelligence Docker container](../containers/install-run.md?branch=main&tabs=layout#run-the-container-with-the-docker-compose-up-command)|
:::moniker-end
:::moniker range="doc-intel-3.0.0 || doc-intel-3.1.0"
## Extract data
-The layout model extracts structural elements from your documents. To follow are descriptions of these structural elements with guidance on how to extract them from your document input:
+The layout model extracts structural elements from your documents. The following structural elements are described here, along with guidance on how to extract them from your document input:
-* [**Page**](#page)
-* [**Paragraph**](#paragraph)
-* [**Text, line, and word**](#text-lines-and-words)
-* [**Selection mark**](#selection-marks)
-* [**Table**](#tables)
-* [**Annotations**](#annotations)
+* [Page](#page)
+* [Paragraph](#paragraph)
+* [Text, line, and word](#text-lines-and-words)
+* [Selection mark](#selection-marks)
+* [Table](#tables)
+* [Annotations](#annotations)
:::moniker-end
:::moniker range="doc-intel-2.1.0"
## Extract data
-The layout model extracts structural elements from your documents. To follow are descriptions of these structural elements with guidance on how to extract them from your document input:
+The layout model extracts structural elements from your documents. The following structural elements are described here, along with guidance on how to extract them from your document input:
-* [**Page**](#page)
-* [**Paragraph**](#paragraph)
-* [**Text, line, and word**](#text-lines-and-words)
-* [**Selection mark**](#selection-marks)
-* [**Table**](#tables)
-* [**Natural reading order**](#natural-reading-order-output-latin-only)
-* [**Select page number or range**](#select-page-number-or-range-for-text-extraction)
+* [Page](#page)
+* [Paragraph](#paragraph)
+* [Text, line, and word](#text-lines-and-words)
+* [Selection mark](#selection-marks)
+* [Table](#tables)
+* [Natural reading order](#natural-reading-order-output-latin-only)
+* [Select page number or range](#select-page-number-or-range-for-text-extraction)
:::moniker-end
:::moniker range="<=doc-intel-3.1.0"
### Page
-The pages collection is a list of pages within the document. Each page is represented sequentially within the document and ../includes the orientation angle indicating if the page is rotated and the width and height (dimensions in pixels). The page units in the model output are computed as shown:
+The `pages` collection is a list of pages within the document. Each page is represented sequentially within the document and includes the orientation angle, which indicates if the page is rotated, and the width and height (dimensions in pixels). The page units in the model output are computed as shown in the following table.
-| **File format** | **Computed page unit** | **Total pages** |
+| File format | Computed page unit | Total pages |
| --- | --- | --- |
-|Images (JPEG/JPG, PNG, BMP, HEIF) | Each image = 1 page unit | Total images |
-|PDF | Each page in the PDF = 1 page unit | Total pages in the PDF |
-|TIFF | Each image in the TIFF = 1 page unit | Total images in the TIFF |
-|Word (DOCX) | Up to 3,000 characters = 1 page unit, embedded or linked images not supported | Total pages of up to 3,000 characters each |
-|Excel (XLSX) | Each worksheet = 1 page unit, embedded or linked images not supported | Total worksheets |
-|PowerPoint (PPTX) | Each slide = 1 page unit, embedded or linked images not supported | Total slides |
-|HTML | Up to 3,000 characters = 1 page unit, embedded or linked images not supported | Total pages of up to 3,000 characters each |
+|Images (JPEG/JPG, PNG, BMP, HEIF) | Each image = 1 page unit. | Total images |
+|PDF | Each page in the PDF = 1 page unit. | Total pages in the PDF |
+|TIFF | Each image in the TIFF = 1 page unit. | Total images in the TIFF |
+|Word (DOCX) | Up to 3,000 characters = 1 page unit. Embedded or linked images aren't supported. | Total pages of up to 3,000 characters each |
+|Excel (XLSX) | Each worksheet = 1 page unit. Embedded or linked images aren't supported. | Total worksheets |
+|PowerPoint (PPTX) | Each slide = 1 page unit. Embedded or linked images aren't supported. | Total slides |
+|HTML | Up to 3,000 characters = 1 page unit. Embedded or linked images aren't supported. | Total pages of up to 3,000 characters each |
:::moniker-end
@@ -853,11 +840,11 @@ for page in result.pages:
### Extract selected pages from documents
-For large multi-page documents, use the `pages` query parameter to indicate specific page numbers or page ranges for text extraction.
+For large multipage documents, use the `pages` query parameter to indicate specific page numbers or page ranges for text extraction.
### Paragraph
-The Layout model extracts all identified blocks of text in the `paragraphs` collection as a top level object under `analyzeResults`. Each entry in this collection represents a text block and ../includes the extracted text as`content`and the bounding `polygon` coordinates. The `span` information points to the text fragment within the top level `content` property that contains the full text from the document.
+The layout model extracts all identified blocks of text in the `paragraphs` collection as a top-level object under `analyzeResults`. Each entry in this collection represents a text block and includes the extracted text as `content` and the bounding `polygon` coordinates. The `span` information points to the text fragment within the top-level `content` property that contains the full text from the document.
```json
@@ -872,16 +859,16 @@ The Layout model extracts all identified blocks of text in the `paragraphs` coll
#### Paragraph role
-The new machine-learning based page object detection extracts logical roles like titles, section headings, page headers, page footers, and more. The Document Intelligence Layout model assigns certain text blocks in the `paragraphs` collection with their specialized role or type predicted by the model. It's best to use paragraph roles with unstructured documents to help understand the layout of the extracted content for a richer semantic analysis. The following paragraph roles are supported:
+The new page object detection based on machine learning extracts logical roles like titles, section headings, page headers, page footers, and more. The Document Intelligence layout model assigns certain text blocks in the `paragraphs` collection a specialized role or type that the model predicts. It's best to use paragraph roles with unstructured documents to help understand the layout of the extracted content for a richer semantic analysis. The following paragraph roles are supported.
-| **Predicted role** | **Description** | **Supported file types** |
+| Predicted role | Description | Supported file types |
| --- | --- | --- |
-| `title` | The main headings in the page | pdf, image, docx, pptx, xlsx, html |
-| `sectionHeading` | One or more subheadings on the page | pdf, image, docx, xlsx, html |
-| `footnote` | Text near the bottom of the page | pdf, image |
-| `pageHeader` | Text near the top edge of the page | pdf, image, docx |
-| `pageFooter` | Text near the bottom edge of the page | pdf, image, docx, pptx, html |
-| `pageNumber` | Page number | pdf, image |
+| `title` | The main headings on the page | PDF, Image, DOCX, PPTX, XLSX, HTML |
+| `sectionHeading` | One or more subheadings on the page | PDF, Image, DOCX, XLSX, HTML |
+| `footnote` | Text near the bottom of the page | PDF, Image |
+| `pageHeader` | Text near the top edge of the page | PDF, Image, DOCX |
+| `pageFooter` | Text near the bottom edge of the page | PDF, Image, DOCX, PPTX, HTML |
+| `pageNumber` | Page number | PDF, Image |
```json
{
@@ -905,9 +892,9 @@ The new machine-learning based page object detection extracts logical roles like
### Text, line, and word
-The document layout model in Document Intelligence extracts print and handwritten style text as `lines` and `words`. The `styles` collection ../includes any handwritten style for lines if detected along with the spans pointing to the associated text. This feature applies to [supported handwritten languages](../language-support/prebuilt.md).
+The document layout model in Document Intelligence extracts print and handwritten-style text as `lines` and `words`. The `styles` collection includes any handwritten style for lines, if detected, along with the spans that point to the associated text. This feature applies to [supported handwritten languages](../language-support/prebuilt.md).
-For Microsoft Word, Excel, PowerPoint, and HTML, Document Intelligence v4.0 `2024-11-30` (GA) Layout model extract all embedded text as is. Texts are extracted as words and paragraphs. Embedded images aren't supported.
+For Word, Excel, PowerPoint, and HTML, the Document Intelligence v4.0 2024-11-30 (GA) layout model extracts all embedded text as is. Text is extracted as words and paragraphs. Embedded images aren't supported.
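+As a sketch, assuming `result` holds the analysis result returned by a prior `prebuilt-layout` call with the Python SDK, you can walk pages, lines, and words like this:
+```python
+# `result` is the analysis result from a prior layout analysis call.
+for page in result.pages:
+    print(f"Page {page.page_number}: {len(page.lines or [])} lines, {len(page.words or [])} words")
+    for line in page.lines or []:
+        print(f"  line: {line.content}")
+    for word in page.words or []:
+        print(f"  word: {word.content} (confidence {word.confidence})")
+```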
:::moniker-end
@@ -984,7 +971,7 @@ for line_idx, line in enumerate(page.lines):
### Handwritten style
-The response ../includes classifying whether each text line is of handwriting style or not, along with a confidence score. For more information. See [Handwritten language support](../language-support/ocr.md). The following example shows an example JSON snippet.
+The response includes a classification of whether each text line is in a handwriting style, along with a confidence score. For more information, see [Handwritten language support](../language-support/ocr.md). The following JSON snippet shows an example.
```json
"styles": [
@@ -1000,11 +987,11 @@ The response ../includes classifying whether each text line is of handwriting st
}
```
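+As a sketch, assuming `result` holds the analysis result from a prior layout analysis call with the Python SDK, you can map each style entry back to the text it describes through the spans and the top-level `content` string:
+```python
+# `result` is the analysis result from a prior layout analysis call.
+for style in result.styles or []:
+    if style.is_handwritten:
+        for span in style.spans:
+            snippet = result.content[span.offset : span.offset + span.length]
+            print(f"Handwritten (confidence {style.confidence}): {snippet}")
+```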
-If you enable the [font/style addon capability](../concept-add-on-capabilities.md#font-property-extraction), you also get the font/style result as part of the `styles` object.
+If you enable the [font/style add-on capability](../concept-add-on-capabilities.md#font-property-extraction), you also get the font/style result as part of the `styles` object.
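+As a hedged sketch of requesting the add-on with the Python SDK: the `DocumentAnalysisFeature.STYLE_FONT` value and the font properties shown here reflect the azure-ai-documentintelligence package, and names differ in the older azure-ai-formrecognizer package. The endpoint, key, and document URL are placeholders.
+```python
+from azure.core.credentials import AzureKeyCredential
+from azure.ai.documentintelligence import DocumentIntelligenceClient
+from azure.ai.documentintelligence.models import AnalyzeDocumentRequest, DocumentAnalysisFeature
+
+client = DocumentIntelligenceClient("https://<resource>.cognitiveservices.azure.com/", AzureKeyCredential("<key>"))
+poller = client.begin_analyze_document(
+    "prebuilt-layout",
+    AnalyzeDocumentRequest(url_source="https://contoso.example/sample.pdf"),  # placeholder document URL
+    features=[DocumentAnalysisFeature.STYLE_FONT],  # opt in to font/style extraction
+)
+result = poller.result()
+
+for style in result.styles or []:
+    if style.similar_font_family:
+        print(style.similar_font_family, style.font_weight, style.font_style, style.color)
+```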
### Selection mark
-The Layout model also extracts selection marks from documents. Extracted selection marks appear within the `pages` collection for each page. They include the bounding `polygon`, `confidence`, and selection `state` (`selected/unselected`). The text representation (that is, `:selected:` and `:unselected`) is also included as the starting index (`offset`) and `length` that references the top level `content` property that contains the full text from the document.
+The layout model also extracts selection marks from documents. Extracted selection marks appear within the `pages` collection for each page. They include the bounding `polygon`, `confidence`, and selection `state` (`selected/unselected`). The text representation (that is, `:selected:` and `:unselected:`) is also included as the starting index (`offset`) and `length` that references the top-level `content` property that contains the full text from the document.
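+A short sketch of reading selection marks, assuming `result` holds the analysis result from a prior layout analysis call with the Python SDK:
+```python
+# `result` is the analysis result from a prior layout analysis call.
+for page in result.pages:
+    for mark in page.selection_marks or []:
+        print(f"Page {page.page_number}: {mark.state} (confidence {mark.confidence}), polygon {mark.polygon}")
+```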
:::moniker-end
@@ -1070,22 +1057,20 @@ for selection_mark in page.selection_marks:
### Table
-Extracting tables is a key requirement for processing documents containing large volumes of data typically formatted as tables. The Layout model extracts tables in the `pageResults` section of the JSON output. Extracted table information ../includes the number of columns and rows, row span, and column span. Each cell with its bounding polygon is output along with information whether the area is recognized as a `columnHeader` or not. The model supports extracting tables that are rotated. Each table cell contains the row and column index and bounding polygon coordinates. For the cell text, the model outputs the `span` information containing the starting index (`offset`). The model also outputs the `length` within the top-level content that contains the full text from the document.
+Extracting tables is a key requirement for processing documents that contain large volumes of data typically formatted as tables. The layout model extracts tables in the `pageResults` section of the JSON output. Extracted table information includes the number of columns and rows, row span, and column span. Each cell with its bounding polygon is output, along with information about whether the area is recognized as a `columnHeader`.
-Here are a few factors to consider when using the Document Intelligence bale extraction capability:
+The model supports extracting tables that are rotated. Each table cell contains the row and column index and bounding polygon coordinates. For the cell text, the model outputs the `span` information that contains the starting index (`offset`). The model also outputs the `length` within the top-level content that contains the full text from the document.
-* Is the data that you want to extract presented as a table, and is the table structure meaningful?
+Here are a few factors to consider when you use the Document Intelligence table extraction capability:
+* Is the data that you want to extract presented as a table, and is the table structure meaningful?
* Can the data fit in a two-dimensional grid if the data isn't in a table format?
-
-* Do your tables span multiple pages? If so, to avoid having to label all the pages, split the PDF into pages before sending it to Document Intelligence. After the analysis, post-process the pages to a single table.
-
-* Refer to [Tabular fields](../train/custom-labels.md#tabular-fields) if you're creating custom models. Dynamic tables have a variable number of rows for each column. Fixed tables have a constant number of rows for each column.
+* Do your tables span multiple pages? If so, to avoid having to label all the pages, split the PDF into pages before you send it to Document Intelligence. After the analysis, post-process the pages to a single table.
+* See [Tabular fields](../train/custom-labels.md#tabular-fields) if you create custom models. Dynamic tables have a variable number of rows for each column. Fixed tables have a constant number of rows for each column.
> [!NOTE]
>
-> * Table analysis isn't supported if the input file is XLSX.
- > * Document Intelligence v4.0 `2024-11-30` (GA) supports bounding regions for figures and tables that cover only the core content and exclude associated caption and footnotes.
+> Table analysis isn't supported if the input file is XLSX. Document Intelligence v4.0 2024-11-30 (GA) supports bounding regions for figures and tables that cover only the core content and exclude the associated caption and footnotes.
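+To illustrate the table output, here's a sketch that assumes `result` holds the analysis result from a prior layout analysis call with the Python SDK and prints each cell with its position and header flag:
+```python
+# `result` is the analysis result from a prior layout analysis call.
+for table_idx, table in enumerate(result.tables or []):
+    print(f"Table {table_idx}: {table.row_count} rows x {table.column_count} columns")
+    for cell in table.cells:
+        header = " (columnHeader)" if cell.kind == "columnHeader" else ""
+        print(f"  [{cell.row_index}, {cell.column_index}]{header}: {cell.content}")
+```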
:::moniker-end
@@ -1177,7 +1162,7 @@ for table_idx, table in enumerate(result.tables):
### Annotations
-The Layout model extracts annotations in documents, such as checks and crosses. The response ../includes the kind of annotation, along with a confidence score and bounding polygon.
+The layout model extracts annotations in documents, such as checks and crosses. The response includes the kind of annotation, along with a confidence score and bounding polygon.
```json
{
@@ -1201,73 +1186,74 @@ The Layout model extracts annotations in documents, such as checks and crosses.
### Natural reading order output (Latin only)
-You can specify the order in which the text lines are output with the `readingOrder` query parameter. Use `natural` for a more human-friendly reading order output as shown in the following example. This feature is only supported for Latin languages.
+You can specify the order in which the text lines are output with the `readingOrder` query parameter. Use `natural` for a more human-friendly reading order output, as shown in the following example. This feature is supported only for Latin languages.
-:::image type="content" source="../media/layout-reading-order-example.png" alt-text="Screenshot of `layout` model reading order processing." lightbox="../../../ai-services/Computer-vision/Images/ocr-reading-order-example.png":::
+:::image type="content" source="../media/layout-reading-order-example.png" alt-text="Screenshot of the layout model reading order processing." lightbox="../../../ai-services/Computer-vision/Images/ocr-reading-order-example.png":::
### Select page number or range for text extraction
-For large multi-page documents, use the `pages` query parameter to indicate specific page numbers or page ranges for text extraction. The following example shows a document with 10 pages, with text extracted for both cases - all pages (1-10) and selected pages (3-6).
+For large multipage documents, use the `pages` query parameter to indicate specific page numbers or page ranges for text extraction. The following example shows a document with 10 pages, with text extracted for both cases: all pages (1-10) and selected pages (3-6).
-:::image type="content" source="../media/layout-select-pages.png" alt-text="Screen shot of the layout model selected pages output.":::
+:::image type="content" source="../media/layout-select-pages.png" alt-text="Screenshot that shows the layout model selected pages output.":::
## The Get Analyze Layout Result operation
-The second step is to call the [Get Analyze Layout Result](https://westcentralus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-1/operations/GetAnalyzeLayoutResult) operation. This operation takes as input the Result ID the `Analyze Layout` operation created. It returns a JSON response that contains a **status** field with the following possible values.
+The second step is to call the [Get Analyze Layout Result](https://westcentralus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-1/operations/GetAnalyzeLayoutResult) operation. This operation takes as input the Result ID that the `Analyze Layout` operation created. It returns a JSON response that contains a **status** field with the following possible values.
|Field| Type | Possible values |
|:-----|:----:|:----|
-|status | string | `notStarted`: The analysis operation isn't started.</br></br>`running`: The analysis operation is in progress.</br></br>`failed`: The analysis operation failed.</br></br>`succeeded`: The analysis operation succeeded.|
+|**status** | string | `notStarted`: The analysis operation isn't started.</br></br>`running`: The analysis operation is in progress.</br></br>`failed`: The analysis operation failed.</br></br>`succeeded`: The analysis operation succeeded.|
-Call this operation iteratively until it returns the `succeeded` value. To avoid exceeding the requests per second (RPS) rate, use an interval of 3 to 5 seconds.
+Call this operation iteratively until it returns the `succeeded` value. To avoid exceeding the requests-per-second rate, use an interval of three to five seconds.
-When the **status** field has the `succeeded` value, the JSON response ../includes the extracted layout, text, tables, and selection marks. The extracted data ../includes extracted text lines and words, bounding boxes, text appearance with handwritten indication, tables, and selection marks with selected/unselected indicated.
+When the **status** field has the `succeeded` value, the JSON response includes the extracted layout, text, tables, and selection marks. The extracted data includes text lines and words, bounding boxes, text appearance with handwritten indication, tables, and selection marks with their selected/unselected state indicated.
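+A minimal polling sketch, continuing from the `operation_url` and `key` values captured in the earlier `Analyze Layout` example and using a three-second interval:
+```python
+import time
+import requests
+
+# `operation_url` and `key` come from the Analyze Layout call shown earlier.
+while True:
+    result_json = requests.get(operation_url, headers={"Ocp-Apim-Subscription-Key": key}).json()
+    if result_json["status"] in ("succeeded", "failed"):
+        break
+    time.sleep(3)  # stay under the requests-per-second limit
+
+if result_json["status"] == "succeeded":
+    analyze_result = result_json["analyzeResult"]
+```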
### Handwritten classification for text lines (Latin only)
-The response ../includes classifying whether each text line is of handwriting style or not, along with a confidence score. This feature is only supported for Latin languages. The following example shows the handwritten classification for the text in the image.
+The response includes a classification of whether each text line is in a handwriting style, along with a confidence score. This feature is supported only for Latin languages. The following example shows the handwritten classification for the text in the image.
-:::image type="content" source="../media/layout-handwriting-classification.png" alt-text="Screenshot of `layout` model handwriting classification process.":::
+:::image type="content" source="../media/layout-handwriting-classification.png" alt-text="Screenshot that shows the layout model handwriting classification process.":::
### Sample JSON output
-The response to the *Get Analyze Layout Result* operation is a structured representation of the document with all the information extracted.
-See here for a [sample document file](https://github.com/Azure-Samples/cognitive-services-REST-api-samples/tree/master/curl/form-recognizer/sample-layout.pdf) and its structured output [sample layout output](https://github.com/Azure-Samples/cognitive-services-REST-api-samples/tree/master/curl/form-recognizer/sample-layout-output.json).
+The response to the `Get Analyze Layout Result` operation is a structured representation of the document with all the information extracted.
+See a [sample document file](https://github.com/Azure-Samples/cognitive-services-REST-api-samples/tree/master/curl/form-recognizer/sample-layout.pdf) and its structured output in the [sample layout output](https://github.com/Azure-Samples/cognitive-services-REST-api-samples/tree/master/curl/form-recognizer/sample-layout-output.json).
The JSON output has two parts:
-* `readResults` node contains all of the recognized text and selection mark. The text presentation hierarchy is page, then line, then individual words.
-* `pageResults` node contains the tables and cells extracted with their bounding boxes, confidence, and a reference to the lines and words in "readResults" field.
+* The `readResults` node contains all the recognized text and selection marks. The text presentation hierarchy is page, then line, and then individual words.
+* The `pageResults` node contains the tables and cells extracted with their bounding boxes, confidence, and a reference to the lines and words in the `readResults` field.
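+Continuing the earlier polling sketch, `analyze_result` exposes these two nodes directly:
+```python
+# `analyze_result` is the "analyzeResult" node from the Get Analyze Layout Result response.
+read_results = analyze_result["readResults"]    # text, lines, words, and selection marks per page
+page_results = analyze_result["pageResults"]    # tables and cells per page
+print(f"{len(read_results)} page(s) of text, {len(page_results)} page(s) of table results")
+```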
-## Example Output
+## Example output
### Text
-Layout API extracts text from documents and images with multiple text angles and colors. It accepts photos of documents, faxes, printed and/or handwritten (English only) text, and mixed modes. Text is extracted with information provided on lines, words, bounding boxes, confidence scores, and style (handwritten or other). All the text information is included in the `readResults` section of the JSON output.
+The layout API extracts text from documents and images with multiple text angles and colors. It accepts photos of documents, faxes, printed or handwritten (English-only) text, and mixed modes. Text is extracted with information provided on lines, words, bounding boxes, confidence scores, and style (handwritten or other). All the text information is included in the `readResults` section of the JSON output.
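+As a sketch, assuming `analyze_result` is the `analyzeResult` node from the polling example, you can read lines and words, including the handwritten indication when the service returns an `appearance` node:
+```python
+for read_result in analyze_result["readResults"]:
+    for line in read_result["lines"]:
+        style = (line.get("appearance") or {}).get("style", {})
+        print(f"Page {read_result['page']}: {line['text']} (style: {style.get('name', 'unknown')})")
+        for word in line["words"]:
+            print(f"  {word['text']} (confidence {word['confidence']})")
+```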
### Tables with headers
-Layout API extracts tables in the `pageResults` section of the JSON output. Documents can be scanned, photographed, or digitized. Tables can be complex with merged cells or columns, with or without borders, and with odd angles. Extracted table information ../includes the number of columns and rows, row span, and column span. Each cell with its bounding box is output along with whether the area is recognized as part of a header or not. The model predicted header cells can span multiple rows and aren't necessarily the first rows in a table. They also work with rotated tables. Each table cell also ../includes the full text with references to the individual words in the `readResults` section.
+The layout API extracts tables in the `pageResults` section of the JSON output. You can scan, photograph, or digitize documents. Tables can be complex with merged cells or columns, with or without borders, and with odd angles.
-
+Extracted table information includes the number of columns and rows, row span, and column span. Each cell with its bounding box is output, along with whether the area is recognized as part of a header or not. Model-predicted header cells can span multiple rows and aren't necessarily the first rows in a table. Header detection also works with rotated tables. Each table cell also includes the full text with references to the individual words in the `readResults` section.
+
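+A sketch of walking the extracted tables in `pageResults`, again assuming `analyze_result` from the polling example; the header flag is read defensively because not every cell carries it:
+```python
+for page_result in analyze_result["pageResults"]:
+    for table in page_result["tables"]:
+        print(f"Page {page_result['page']}: table with {table['rows']} rows x {table['columns']} columns")
+        for cell in table["cells"]:
+            header = " (header)" if cell.get("isHeader") else ""
+            print(f"  [{cell['rowIndex']}, {cell['columnIndex']}]{header}: {cell['text']}")
+```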
### Selection marks (documents)
-Layout API also extracts selection marks from documents. Extracted selection marks include the bounding box, confidence, and state (selected/unselected). Selection mark information is extracted in the `readResults` section of the JSON output.
+The layout API also extracts selection marks from documents. Extracted selection marks include the bounding box, confidence, and state (selected/unselected). Selection mark information is extracted in the `readResults` section of the JSON output.
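+A short sketch of reading selection marks from `readResults`, assuming `analyze_result` from the polling example:
+```python
+for read_result in analyze_result["readResults"]:
+    for mark in read_result.get("selectionMarks", []):
+        print(f"Page {read_result['page']}: {mark['state']} (confidence {mark['confidence']})")
+```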
### Migration guide
-* Follow our [**Document Intelligence v3.1 migration guide**](../v3-1-migration-guide.md) to learn how to use the v3.1 version in your applications and workflows.
+* To learn how to use the v3.1 version in your applications and workflows, follow the steps in the [Document Intelligence v3.1 migration guide](../v3-1-migration-guide.md).
:::moniker-end
-## Next steps
+## Related content
:::moniker range="doc-intel-4.0.0 || doc-intel-3.1.0"
-* [Learn how to process your own forms and documents](../studio-overview.md) with the [Document Intelligence Studio](https://documentintelligence.ai.azure.com/studio).
-
-* Complete a [Document Intelligence quickstart](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-3.1.0&preserve-view=true) and get started creating a document processing app in the development language of your choice.
+* Learn how to [process your own forms and documents](../studio-overview.md) with the [Document Intelligence Studio](https://documentintelligence.ai.azure.com/studio).
+* Finish a [Document Intelligence quickstart](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-3.1.0&preserve-view=true), and create a document processing app in the development language of your choice.
:::moniker-end
@@ -1285,8 +1271,7 @@ Layout API also extracts selection marks from documents. Extracted selection mar
:::moniker range="doc-intel-2.1.0"
-* [Learn how to process your own forms and documents](../quickstarts/try-sample-label-tool.md) with the [Document Intelligence Sample Labeling tool](https://fott-2-1.azurewebsites.net/).
+* Learn how to [process your own forms and documents](../quickstarts/try-sample-label-tool.md) with the [Document Intelligence Sample Labeling tool](https://fott-2-1.azurewebsites.net/).
+* Finish a [Document Intelligence quickstart](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-2.1.0&preserve-view=true), and create a document processing app in the development language of your choice.
-* Complete a [Document Intelligence quickstart](../quickstarts/get-started-sdks-rest-api.md?view=doc-intel-2.1.0&preserve-view=true) and get started creating a document processing app in the development language of your choice.
-
-:::moniker-end
\ No newline at end of file
+:::moniker-end