Retrieve Results

Overview

The documents/* endpoints return the extraction results for specific documents previously submitted to Neurolinker through the Request endpoint.


POST /api/v1/documents/markdown

Parameters
| Name | In | Type | Description |
|---|---|---|---|
| document_ids | body | array of strings | IDs of the documents |
| content_types | body | array of strings | (Optional) Filter by content type: text, formula, tables, images. If omitted, all types are returned. |

{
  "document_ids": ["string"],
  "content_types": ["text", "formula", "tables", "images"]
}
Response

200

Response Schema

| Name | Type | Description |
|---|---|---|
| success | boolean | |
| results | array of result objects | |
| total | int | Total number of documents requested |
| successful | int | Number of successfully retrieved documents |
| failed | int | Number of failed retrievals |
| message | string | Human-readable status message |

Result object Schema

| Name | Type | Description |
|---|---|---|
| document_id | string | |
| filename | string | |
| storage_path | string | |
| download_url | string | |
| file_size | int | |
| content | string | Markdown content |

{
  "success": true,
  "results": [
    {
      "document_id": "string",
      "filename": "string",
      "storage_path": "string",
      "download_url": "string",
      "file_size": 0,
      "content": "string"
    }
    //...
  ],
  "total": 0,
  "successful": 0,
  "failed": 0,
  "message": "string"
}
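Because a batch can partially fail (failed > 0), a client typically indexes the returned results by document_id and checks which requested IDs are missing. The following is a minimal Python sketch under one stated assumption: failed documents are simply absent from results (the schema above does not say how failures are reported). index_results is a hypothetical helper name.

```python
def index_results(response, requested_ids):
    """Map document_id -> content and list requested IDs that were not returned.

    Assumption: failed retrievals do not appear in `results`. The exact meaning
    of `success` for a partially failed batch is not specified, so this helper
    only compares IDs and leaves the decision to the caller.
    """
    contents = {r["document_id"]: r.get("content") for r in response.get("results", [])}
    missing = [doc_id for doc_id in requested_ids if doc_id not in contents]
    return contents, missing


# Hand-written response shaped like the schema above (not real API output).
sample = {
    "success": True,
    "results": [{"document_id": "a", "content": "# Title"}],
    "total": 2,
    "successful": 1,
    "failed": 1,
    "message": "partial success",
}
contents, missing = index_results(sample, ["a", "b"])
```

If missing is non-empty, the top-level message field is the natural thing to log alongside the IDs.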

The content field is a Markdown string that can include the following HTML tags, added during extraction:

  • <header>
  • <caption>
  • <figure>
  • <figurecaption>
  • <table>
  • <tablecaption>
  • <thead>
  • <tbody>
  • <tfoot>
  • <tr>
  • <td>
  • <th>

Table representation in Markdown:

  • Table structure is preserved as in the source (including rowspan and colspan).

  • A merged header/value usually appears once in the table structure, because span attributes define how many rows/columns it covers.
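If you need the same regular grid that the JSON endpoint produces, the spans in a Markdown table can be expanded on the client side. This is a minimal sketch using Python's standard html.parser, assuming the input is a single well-formed table fragment taken from content and that spans are expressed with the standard rowspan and colspan attributes; table_to_grid is a hypothetical helper, not part of the API.

```python
from html.parser import HTMLParser


class _TableParser(HTMLParser):
    """Collect rows of (text, rowspan, colspan) cells from <tr>/<td>/<th> tags."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # [text_parts, rowspan, colspan] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            self.rows[-1].append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)


def table_to_grid(html):
    """Expand rowspan/colspan so every covered position repeats the cell text."""
    parser = _TableParser()
    parser.feed(html)
    parser.close()
    grid = {}  # (row, col) -> text
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip positions already filled by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]
```

Positions that no cell covers (a ragged source table) are filled with an empty string so the result is always rectangular.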

Example "Try it out!"

    curl -X 'POST' \
    'https://neurolinker.api.ainexxo.com/api/v1/documents/markdown' \
    -H 'accept: application/json' \
    -H 'Authorization: Bearer nl_********************************' \
    -H 'Content-Type: application/json' \
    -d '{
    "document_ids": [
        "1d6986aa-80ac-4a4d-9009-5a323c9046b1"
    ]
    }'
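The same call can be made from Python with only the standard library. This sketch builds the request separately from sending it, so the payload can be checked offline. It assumes the full URL is the host from the curl example plus the /api/v1 prefix from the endpoint heading, and reads the key from a hypothetical NEUROLINKER_API_KEY environment variable; build_markdown_request and fetch_markdown are illustrative names.

```python
import json
import os
import urllib.request

# Host from the curl example; the /api/v1 prefix follows the endpoint heading.
BASE_URL = "https://neurolinker.api.ainexxo.com/api/v1"


def build_markdown_request(document_ids, content_types=None, api_key=None):
    """Build (but do not send) a POST request for /documents/markdown."""
    payload = {"document_ids": list(document_ids)}
    # content_types is optional: omitting it returns all content types.
    if content_types is not None:
        payload["content_types"] = list(content_types)
    # NEUROLINKER_API_KEY is a hypothetical environment variable name.
    key = api_key or os.environ.get("NEUROLINKER_API_KEY", "")
    return urllib.request.Request(
        f"{BASE_URL}/documents/markdown",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_markdown(document_ids, content_types=None, api_key=None):
    """Send the request and return the decoded JSON response."""
    req = build_markdown_request(document_ids, content_types, api_key)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, fetch_markdown(["1d6986aa-80ac-4a4d-9009-5a323c9046b1"], ["text", "tables"]) would request only text and tables for that document.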

POST /api/v1/documents/json

Parameters
| Name | In | Type | Description |
|---|---|---|---|
| document_ids | body | array of strings | IDs of the documents |
| content_types | body | array of strings | (Optional) Filter by content type: text, formula, tables, images. If omitted, all types are returned. |

{
  "document_ids": ["string"],
  "content_types": ["text", "formula", "tables", "images"]
}
Response

200

Response Schema

| Name | Type | Description |
|---|---|---|
| success | boolean | |
| results | array of result objects | |
| total | int | Total number of documents requested |
| successful | int | Number of successfully retrieved documents |
| failed | int | Number of failed retrievals |
| message | string | Human-readable status message |

Result object Schema

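| Name | Type | Description |
|---|---|---|
| document_id | string | |
| filename | string | |
| storage_path | string | |
| download_url | string | |
| file_size | int | |
| content | string | JSON-encoded document structure (see the structure below) |
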
{
  "success": true,
  "results": [
    {
      "document_id": "string",
      "filename": "string",
      "storage_path": "string",
      "download_url": "string",
      "file_size": 0,
      "content": "string"
    }
    //...
  ],
  "total": 0,
  "successful": 0,
  "failed": 0,
  "message": "string"
}

The content field follows the structure below, derived from an example PDF document. Placeholders such as PAGE_WIDTH_PX, FONT_SIZE, and [x0, y0, x1, y1] stand for actual values:

{
  "pdf_name": "DOCUMENT_ID",
  "category": "DOCUMENT_CATEGORY",
  "pages": {
    "1": {
      "page_metadata": {
        "page_dim": [PAGE_WIDTH_PX, PAGE_HEIGHT_PX]
      },
      "page_header": [
        {
          "id": "header-element-id",
          "label": "Page-header",
          "content": "Header text",
          "bbox": [x0, y0, x1, y1],
          "images": [],
          "tables": []
        }
      ],
      "page_footer": [
        {
          "id": "footer-element-id",
          "label": "Page-footer",
          "content": "Footer text",
          "bbox": [x0, y0, x1, y1],
          "images": [],
          "tables": []
        }
      ],
      "body": {
        "sections": {
          "0": {
            "title": {
              "id": "section-title-id",
              "label": "Section-header",
              "content": "Section Title",
              "bbox": [x0, y0, x1, y1],
              "title_feats": {
                "font": "FONT_NAME",
                "height": FONT_SIZE
              }
            },
            "texts": [
              {
                "id": "text-id",
                "label": "Text",
                "content": "Paragraph text",
                "bbox": [x0, y0, x1, y1]
              }
            ],
            "tables": [
              {
                "id": "table-id",
                "label": "Table",
                "columns": ["Column A", "Column B"],
                "data": [
                  ["Value A1", "Value B1"],
                  ["Value A2", "Value B2"]
                ],
                "description": "Table description",
                "bbox": [x0, y0, x1, y1]
              }
            ],
            "images": [
              {
                "id": "image-id",
                "label": "Picture",
                "description": "Image description",
                "extracted_text": "Raw text found inside the image",
                "src": "IMAGE_URL",
                "bbox": [x0, y0, x1, y1]
              }
            ],
            "subsections": [],
            "supersection": [],
            "level": 1
          }
        }
      }
    }
  }
}
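As an illustration of how this structure nests, the sketch below walks pages and top-level body sections in order and yields section titles and paragraph texts. It assumes content arrives as a JSON-encoded string (as the result schema types it) and that page and section keys are integer strings, as in the example. It does not follow subsections or supersection, because the shape of their entries is not documented here; iter_texts is a hypothetical helper.

```python
import json


def iter_texts(content):
    """Yield (page_number, section_id, text) for section titles and body texts."""
    # Assumption: content is a JSON string; already-decoded dicts are accepted too.
    doc = json.loads(content) if isinstance(content, str) else content
    # Page keys are strings such as "1", "2"; sort numerically, not lexically.
    for page_key in sorted(doc.get("pages", {}), key=int):
        page = doc["pages"][page_key]
        sections = page.get("body", {}).get("sections", {})
        for section_id in sorted(sections, key=int):
            section = sections[section_id]
            title = section.get("title")
            if title:
                yield int(page_key), section_id, title["content"]
            for text in section.get("texts", []):
                yield int(page_key), section_id, text["content"]
```

Page headers and footers live outside body, so they are skipped here; read page_header and page_footer separately if you need them.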

Table representation in JSON:

  • Tables are normalized into a regular grid (columns + data), so each row has the same number of cells.

  • Cells coming from rowspan/colspan are repeated to fill the covered grid positions.

  • This is intentional: duplication keeps row/column relationships explicit in a fixed grid, making downstream parsing and indexing simpler and more consistent.
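Because every row of data has the same length as columns, a table converts directly into one record per row. One consequence of the duplication described above is that a colspan header can produce repeated column names, which would collapse if used as dictionary keys. This sketch suffixes repeats (Region, Region_2, ...); the suffixing scheme and the helper names are illustrative choices, not part of the API.

```python
def unique_columns(columns):
    """Suffix repeated header names (e.g. from a colspan) so they stay distinct."""
    counts = {}
    names = []
    for name in columns:
        counts[name] = counts.get(name, 0) + 1
        names.append(name if counts[name] == 1 else f"{name}_{counts[name]}")
    return names


def table_to_records(table):
    """Turn a JSON table element (columns + data) into one dict per row."""
    columns = unique_columns(table["columns"])
    records = []
    for row in table["data"]:
        # The grid is regular, so each row should have exactly len(columns) cells.
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} cells, got {len(row)}")
        records.append(dict(zip(columns, row)))
    return records
```

The length check turns any unexpected ragged row into an explicit error instead of silently dropping cells in zip.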

Example "Try it out!"

    curl -X 'POST' \
    'https://neurolinker.api.ainexxo.com/api/v1/documents/json' \
    -H 'accept: application/json' \
    -H 'Authorization: Bearer nl_********************************' \
    -H 'Content-Type: application/json' \
    -d '{
    "document_ids": [
        "1d6986aa-80ac-4a4d-9009-5a323c9046b1"
    ]
    }'

POST /api/v1/documents/images

Parameters
| Name | In | Type | Description |
|---|---|---|---|
| document_ids | body | array of strings | IDs of the documents |

{
  "document_ids": ["string"]
}
Response

200

Response Schema

| Name | Type | Description |
|---|---|---|
| success | boolean | |
| results | array of result objects | |
| total | int | Total number of documents requested |
| successful | int | Number of successfully retrieved documents |
| failed | int | Number of failed retrievals |
| total_images | int | Total number of images retrieved across all documents |
| message | string | Human-readable status message |

Result object Schema

| Name | Type | Description |
|---|---|---|
| document_id | string | |
| format | string | Equals "images" |
| total_images | int | |
| images | array of objects | See the ImageResult schema |

ImageResult Schema

| Name | Type | Description |
|---|---|---|
| filename | string | |
| storage_path | string | Equals "images" |
| url | string | |

{
  "success": true,
  "results": [
    {
      "document_id": "string",
      "format": "images",
      "total_images": 0,
      "images": [
        {
          "filename": "string",
          "storage_path": "string",
          "url": "string"
        }
      ]
    }
    //...
  ],
  "total": 0,
  "successful": 0,
  "failed": 0,
  "total_images": 0,
  "message": "string"
}

Example "Try it out!"

    curl -X 'POST' \
    'https://neurolinker.api.ainexxo.com/api/v1/documents/images' \
    -H 'accept: application/json' \
    -H 'Authorization: Bearer nl_********************************' \
    -H 'Content-Type: application/json' \
    -d '{
    "document_ids": [
        "1d6986aa-80ac-4a4d-9009-5a323c9046b1"
    ]
    }'
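The images response nests one list of images per document. This sketch groups (filename, url) pairs by document and saves them under dest_dir/<document_id>/. It assumes each url can be fetched with a plain GET and no Authorization header; if your URLs require authentication, add the header to the download request. image_urls and download_images are hypothetical helpers.

```python
import pathlib
import urllib.request


def image_urls(response):
    """Return {document_id: [(filename, url), ...]} from an images response."""
    return {
        result["document_id"]: [(img["filename"], img["url"]) for img in result.get("images", [])]
        for result in response.get("results", [])
    }


def download_images(response, dest_dir):
    """Save every image under dest_dir/<document_id>/<filename>."""
    for doc_id, images in image_urls(response).items():
        folder = pathlib.Path(dest_dir) / doc_id
        folder.mkdir(parents=True, exist_ok=True)
        for filename, url in images:
            # Keep only the last path component in case filename contains slashes.
            target = folder / pathlib.PurePath(filename).name
            with urllib.request.urlopen(url) as resp:
                target.write_bytes(resp.read())
```

Comparing the number of collected pairs with the top-level total_images is a cheap consistency check before downloading.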

POST /api/v1/documents/page-summaries

Parameters
| Name | In | Type | Description |
|---|---|---|---|
| document_ids | body | array of strings | IDs of the documents |

{
  "document_ids": ["string"]
}
Response

200

Response Schema

| Name | Type | Description |
|---|---|---|
| success | boolean | |
| results | array of result objects | |
| total | int | Total number of documents requested |
| successful | int | Number of successfully retrieved documents |
| failed | int | Number of failed retrievals |
| message | string | Human-readable status message |

Result object Schema

| Name | Type | Description |
|---|---|---|
| document_id | string | |
| filename | string | |
| storage_path | string | |
| download_url | string | |
| file_size | int | |
| content | string | Markdown content |

{
  "success": true,
  "results": [
    {
      "document_id": "string",
      "filename": "string",
      "storage_path": "string",
      "download_url": "string",
      "file_size": 0,
      "content": "string"
    }
    //...
  ],
  "total": 0,
  "successful": 0,
  "failed": 0,
  "message": "string"
}

Example "Try it out!"

    curl -X 'POST' \
    'https://neurolinker.api.ainexxo.com/api/v1/documents/page-summaries' \
    -H 'accept: application/json' \
    -H 'Authorization: Bearer nl_********************************' \
    -H 'Content-Type: application/json' \
    -d '{
    "document_ids": [
        "1d6986aa-80ac-4a4d-9009-5a323c9046b1"
    ]
    }'

POST /api/v1/documents/document-summary

Returns a global document summary as a Markdown string. Use summary_type to choose between a page-based or section-based summary.

Parameters
| Name | In | Type | Description |
|---|---|---|---|
| document_ids | body | array of strings | IDs of the documents |
| summary_type | body | string | page (page-based) or section (section-based) |

{
  "document_ids": ["string"],
  "summary_type": "page"
}
Response

200

Response Schema

| Name | Type | Description |
|---|---|---|
| success | boolean | |
| results | array of result objects | |
| total | int | Total number of documents requested |
| successful | int | Number of successfully retrieved documents |
| failed | int | Number of failed retrievals |
| message | string | Human-readable status message |

Result object Schema

| Name | Type | Description |
|---|---|---|
| document_id | string | |
| filename | string | |
| storage_path | string | |
| download_url | string | |
| file_size | int | |
| content | string | Markdown content |

{
  "success": true,
  "results": [
    {
      "document_id": "string",
      "filename": "string",
      "storage_path": "string",
      "download_url": "string",
      "file_size": 0,
      "content": "string"
    }
    //...
  ],
  "total": 0,
  "successful": 0,
  "failed": 0,
  "message": "string"
}

Example "Try it out!"

    curl -X 'POST' \
    'https://neurolinker.api.ainexxo.com/api/v1/documents/document-summary' \
    -H 'accept: application/json' \
    -H 'Authorization: Bearer nl_********************************' \
    -H 'Content-Type: application/json' \
    -d '{
    "document_ids": [
        "1d6986aa-80ac-4a4d-9009-5a323c9046b1"
    ],
    "summary_type": "page"
    }'
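Since summary_type accepts only two values, it can help to validate it before sending the request rather than waiting for a server-side error. A minimal sketch; document_summary_body is a hypothetical helper, and the accepted values are the two listed in the parameters table.

```python
import json

# Accepted values for summary_type, as listed in the parameters table.
SUMMARY_TYPES = ("page", "section")


def document_summary_body(document_ids, summary_type):
    """Build the JSON body for POST /api/v1/documents/document-summary."""
    if summary_type not in SUMMARY_TYPES:
        raise ValueError(f"summary_type must be one of {SUMMARY_TYPES}, got {summary_type!r}")
    return json.dumps({"document_ids": list(document_ids), "summary_type": summary_type})
```

The returned string can be passed as the -d argument in the curl example or as the request body in any HTTP client.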

POST /api/v1/documents/section-summaries

Returns a summary for each individual section of the document.

How section-based summarization works:

  • Sections are processed in document order (which follows the detected heading hierarchy and reading flow).

  • Example hierarchy: # section 1 -> ## section 2 -> ### section 3, then # section 4, # section 5 -> ## section 6, then # section 7.

  • In this case, summaries are generated section by section in that sequence (section 1, section 2, section 3, and so on).

  • The keys of the content object are section IDs taken from the extracted document structure.
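To read the summaries back in sequence, sort the content keys numerically: sorting them as strings would place "10" before "2". This sketch assumes the section IDs are integer strings that increase in document order, as in the sample response below; ordered_section_summaries is a hypothetical helper.

```python
def ordered_section_summaries(content):
    """Return (section_id, summary) pairs sorted by numeric section ID."""
    # Assumption: IDs are integer strings ("0", "1", ...) that increase in
    # document order, as in the sample response. Sorting numerically avoids
    # the string-order trap where "10" sorts before "2".
    return sorted(content.items(), key=lambda item: int(item[0]))
```

Python's json module preserves key order from the response text, so sorting is only a safeguard in case the keys are not already in order.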

Parameters
| Name | In | Type | Description |
|---|---|---|---|
| document_ids | body | array of strings | IDs of the documents |

{
  "document_ids": ["string"]
}
Response

200

Response Schema

| Name | Type | Description |
|---|---|---|
| success | boolean | |
| results | array of result objects | |
| total | int | Total number of documents requested |
| successful | int | Number of successfully retrieved documents |
| failed | int | Number of failed retrievals |
| message | string | Human-readable status message |

Result object Schema

| Name | Type | Description |
|---|---|---|
| document_id | string | |
| filename | string | |
| storage_path | string | |
| download_url | string | |
| file_size | int | |
| content | object | JSON object where each key is a section ID and the value is a Markdown-formatted summary of that section |

{
  "success": true,
  "results": [
    {
      "document_id": "string",
      "filename": "string",
      "storage_path": "string",
      "download_url": "string",
      "file_size": 0,
      "content": {
        "0": "**Main Topic and Purpose**\nSummary of the first section...",
        "1": "**Main Topic and Purpose**\nSummary of the second section...",
        "2": "..."
      }
    }
    //...
  ],
  "total": 0,
  "successful": 0,
  "failed": 0,
  "message": "string"
}

Example "Try it out!"

    curl -X 'POST' \
    'https://neurolinker.api.ainexxo.com/api/v1/documents/section-summaries' \
    -H 'accept: application/json' \
    -H 'Authorization: Bearer nl_********************************' \
    -H 'Content-Type: application/json' \
    -d '{
    "document_ids": [
        "1d6986aa-80ac-4a4d-9009-5a323c9046b1"
    ]
    }'

POST /api/v1/make-zip

Creates a ZIP archive of the processed output for a given request and returns a signed download URL (valid for 1 hour).

Parameters
| Name | In | Type | Description |
|---|---|---|---|
| request_id | body | string | The job/request identifier |
| document_id | body | string | (Optional) Restricts the archive to a single document within the request |
| local_images | body | boolean | (Optional) If true, rewrites Firebase Storage URLs to local ./filename references. Defaults to false. |
| content_types | body | array of strings | (Optional) Filter by content type: text, formula, tables, images. PNG image files are included only if images is selected. If omitted, all content is included. |

{
  "request_id": "string",
  "document_id": "string",
  "local_images": false,
  "content_types": ["text", "formula", "tables", "images"]
}
Response

200

| Name | Type | Description |
|---|---|---|
| success | boolean | |
| url | string | Signed URL to download the ZIP (expires in 1 hour) |
| message | string | Human-readable status message |

{
  "success": true,
  "url": "string",
  "message": "ZIP archive created successfully"
}

400: Request or document not yet completed

404: No files found for the given request/document

Example "Try it out!"

    curl -X 'POST' \
    'https://neurolinker.api.ainexxo.com/api/v1/make-zip' \
    -H 'accept: application/json' \
    -H 'Authorization: Bearer nl_********************************' \
    -H 'Content-Type: application/json' \
    -d '{
    "request_id": "1d6986aa-80ac-4a4d-9009-5a323c9046b1",
    "local_images": false
    }'
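A Python version of this call can omit optional fields that are not set and translate the documented 400 and 404 responses into readable errors. This is a sketch under the same URL assumption as the curl example; make_zip_payload and request_zip_url are hypothetical helpers. Because the signed URL expires after 1 hour, download the archive soon after calling it.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://neurolinker.api.ainexxo.com/api/v1"

# Error meanings taken from the status codes documented above.
ZIP_ERRORS = {
    400: "request or document not yet completed",
    404: "no files found for the given request/document",
}


def make_zip_payload(request_id, document_id=None, local_images=False, content_types=None):
    """Build the make-zip body, leaving out optional fields that are not set."""
    payload = {"request_id": request_id, "local_images": local_images}
    if document_id is not None:
        payload["document_id"] = document_id
    if content_types is not None:
        payload["content_types"] = list(content_types)
    return payload


def request_zip_url(api_key, request_id, **options):
    """Call make-zip and return the signed URL (valid for 1 hour)."""
    req = urllib.request.Request(
        f"{BASE_URL}/make-zip",
        data=json.dumps(make_zip_payload(request_id, **options)).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["url"]
    except urllib.error.HTTPError as err:
        reason = ZIP_ERRORS.get(err.code)
        if reason is None:
            raise  # Undocumented status: let the original error propagate.
        raise RuntimeError(f"make-zip failed ({err.code}): {reason}") from err
```

A 400 means processing is still running, so a caller would typically wait and retry; a 404 usually means the request_id or document_id is wrong.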