Retrieve Results

Overview

The documents/* endpoints return the extraction results for specific documents that were sent to Neurolinker through the Request endpoint.

POST /api/v1/documents/markdown

Parameters

Name	In	Type	Description
`document_ids`	body	array of strings	IDs of the documents
`content_types`	body	array of strings (optional)	Filter by content type: `text`, `formula`, `tables`, `images`. If omitted, all types are returned.

{
  "document_ids": ["string"],
  "content_types": ["text", "formula", "tables", "images"]
}

Response

200

Response Schema

Name	Type	Description
`success`	boolean
`results`	array of result objects
`total`	int	Total number of documents requested
`successful`	int	Number of successfully retrieved documents
`failed`	int	Number of failed retrievals
`message`	string	Human-readable status message

Result object Schema

Name	Type	Description
`document_id`	string
`filename`	string
`storage_path`	string
`download_url`	string
`file_size`	int
`content`	string	Markdown Content

{
  "success": true,
  "results": [
    {
      "document_id": "string",
      "filename": "string",
      "storage_path": "string",
      "download_url": "string",
      "file_size": 0,
      "content": "string"
    }
    //...
  ],
  "total": 0,
  "successful": 0,
  "failed": 0,
  "message": "string"
}
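The counters in this envelope make it possible to detect a partially failed batch before reading `results`. The sketch below assumes the response body has already been decoded into a Python dict; the helper name `summarize_envelope` is illustrative and not part of the API, and whether `success` stays `true` when some documents fail is not stated here, so both fields are checked.

```python
def summarize_envelope(body: dict) -> dict:
    """Pull the batch counters out of a documents/* response body."""
    results = body.get("results", [])
    return {
        # Treat the batch as fully OK only if the flag is set and nothing failed
        # (assumption: `success` alone may not reflect per-document failures).
        "ok": bool(body.get("success")) and body.get("failed", 0) == 0,
        "retrieved_ids": [r["document_id"] for r in results],
        "total": body.get("total", 0),
        "failed": body.get("failed", 0),
    }


# Hand-written sample body, not real API output.
sample = {
    "success": True,
    "results": [{"document_id": "doc-1", "content": "# Title"}],
    "total": 2,
    "successful": 1,
    "failed": 1,
    "message": "sample message",
}
print(summarize_envelope(sample))
```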

The content field is a Markdown string that may contain the following HTML tags, added during the extraction process:

<header>
<caption>
<figure>
<figurecaption>
<table>
<tablecaption>
<thead>
<tbody>
<tfoot>
<tr>
<td>
<th>

Table representation in Markdown:

Table structure is preserved as in the source (including rowspan and colspan).
A merged header/value usually appears once in the table structure, because span attributes define how many rows/columns it covers.
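To illustrate the point about merged cells, the sketch below reads the cells of a table embedded in a `content` string together with their span attributes. It uses only Python's standard `html.parser`; the class name `SpanReader` and the sample table are made up for this example, and real content may contain additional tags or attributes.

```python
from html.parser import HTMLParser


class SpanReader(HTMLParser):
    """Collect (text, rowspan, colspan) for every cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # [text, rowspan, colspan] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            a = dict(attrs)
            # Missing span attributes default to 1.
            self._cell = ["", int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append(tuple(self._cell))
            self._cell = None


# Hand-written sample: "EU" is a merged cell covering two rows.
html = (
    "<table><tbody>"
    '<tr><td rowspan="2">EU</td><td>France</td></tr>'
    "<tr><td>Italy</td></tr>"
    "</tbody></table>"
)
reader = SpanReader()
reader.feed(html)
print(reader.rows)
# [[('EU', 2, 1), ('France', 1, 1)], [('Italy', 1, 1)]]
```

The merged cell appears once, in the first row only, so rows can have different numbers of cells in the Markdown output.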

Example "Try it out!"

    curl -X 'POST' \
    'https://neurolinker.api.ainexxo.com/api/v1/documents/markdown' \
    -H 'accept: application/json' \
    -H 'Authorization: Bearer nl_********************************' \
    -H 'Content-Type: application/json' \
    -d '{
    "document_ids": [
        "1d6986aa-80ac-4a4d-9009-5a323c9046b1"
    ]
    }'
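The same request can be prepared from Python. This is a minimal sketch using only the standard library; the function name `build_markdown_request` and the placeholder key are illustrative, and the path is taken from the endpoint heading above. The request is only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://neurolinker.api.ainexxo.com"


def build_markdown_request(api_key, document_ids, content_types=None):
    """Build (but do not send) a POST request for documents/markdown."""
    payload = {"document_ids": list(document_ids)}
    if content_types is not None:
        # Omitting content_types asks for all types, per the parameter table.
        payload["content_types"] = list(content_types)
    return urllib.request.Request(
        BASE_URL + "/api/v1/documents/markdown",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_markdown_request(
    "nl_example_key",  # placeholder, not a real key
    ["1d6986aa-80ac-4a4d-9009-5a323c9046b1"],
    content_types=["text", "tables"],
)
print(req.get_method(), req.full_url)
```

To send it, pass `req` to `urllib.request.urlopen` and decode the body with `json.load`.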

POST /api/v1/documents/json

Parameters

Name	In	Type	Description
`document_ids`	body	array of strings	IDs of the documents
`content_types`	body	array of strings (optional)	Filter by content type: `text`, `formula`, `tables`, `images`. If omitted, all types are returned.

{
  "document_ids": ["string"],
  "content_types": ["text", "formula", "tables", "images"]
}

Response

200

Response Schema

Name	Type	Description
`success`	boolean
`results`	array of result objects
`total`	int	Total number of documents requested
`successful`	int	Number of successfully retrieved documents
`failed`	int	Number of failed retrievals
`message`	string	Human-readable status message

Result object Schema

Name	Type	Description
`document_id`	string
`filename`	string
`storage_path`	string
`download_url`	string
`file_size`	int
`content`	string	JSON Schema

{
  "success": true,
  "results": [
    {
      "document_id": "string",
      "filename": "string",
      "storage_path": "string",
      "download_url": "string",
      "file_size": 0,
      "content": "string"
    }
    //...
  ],
  "total": 0,
  "successful": 0,
  "failed": 0,
  "message": "string"
}

The JSON schema of the content field is shown below. It was derived from an example PDF document:

{
  "pdf_name": "DOCUMENT_ID",
  "category": "DOCUMENT_CATEGORY",
  "pages": {
    "1": {
      "page_metadata": {
        "page_dim": [PAGE_WIDTH_PX, PAGE_HEIGHT_PX]
      },
      "page_header": [
        {
          "id": "header-element-id",
          "label": "Page-header",
          "content": "Header text",
          "bbox": [x0, y0, x1, y1],
          "images": [],
          "tables": []
        }
      ],
      "page_footer": [
        {
          "id": "footer-element-id",
          "label": "Page-footer",
          "content": "Footer text",
          "bbox": [x0, y0, x1, y1],
          "images": [],
          "tables": []
        }
      ],
      "body": {
        "sections": {
          "0": {
            "title": {
              "id": "section-title-id",
              "label": "Section-header",
              "content": "Section Title",
              "bbox": [x0, y0, x1, y1],
              "title_feats": {
                "font": "FONT_NAME",
                "height": FONT_SIZE
              }
            },
            "texts": [
              {
                "id": "text-id",
                "label": "Text",
                "content": "Paragraph text",
                "bbox": [x0, y0, x1, y1]
              }
            ],
            "tables": [
              {
                "id": "table-id",
                "label": "Table",
                "columns": ["Column A", "Column B"],
                "data": [
                  ["Value A1", "Value B1"],
                  ["Value A2", "Value B2"]
                ],
                "description": "Table description",
                "bbox": [x0, y0, x1, y1]
              }
            ],
            "images": [
              {
                "id": "image-id",
                "label": "Picture",
                "description": "Image description",
                "extracted_text": "Raw text found inside the image",
                "src": "IMAGE_URL",
                "bbox": [x0, y0, x1, y1]
              }
            ],
            "subsections": [],
            "supersection": [],
            "level": 1
          }
        }
      }
    }
  }
}
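As an illustration of how this structure can be traversed, the sketch below collects every table under `pages` → `body` → `sections` and turns its rows into dicts keyed by column name. It assumes `content` arrives as a JSON-encoded string (a dict is also accepted); the function name `iter_tables` and the sample data are hypothetical. Since the shape of `subsections` entries is not shown in the schema, they are not descended into.

```python
import json


def iter_tables(content):
    """Yield (page_no, section_id, table) for each table in a content value."""
    doc = json.loads(content) if isinstance(content, str) else content
    for page_no, page in doc.get("pages", {}).items():
        sections = page.get("body", {}).get("sections", {})
        for section_id, section in sections.items():
            for table in section.get("tables", []):
                yield page_no, section_id, table


# Hand-written sample that follows the schema, trimmed to the fields used here.
content = json.dumps({
    "pdf_name": "doc-1",
    "pages": {
        "1": {
            "body": {
                "sections": {
                    "0": {
                        "title": {"content": "Prices"},
                        "tables": [
                            {
                                "id": "t1",
                                "columns": ["Item", "Price"],
                                "data": [["Pen", "1.20"], ["Ink", "3.50"]],
                            }
                        ],
                    }
                }
            }
        }
    },
})

records = [
    dict(zip(table["columns"], row))
    for _, _, table in iter_tables(content)
    for row in table["data"]
]
print(records)
```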

Table representation in JSON:

Tables are normalized into a regular grid (columns + data), so each row has the same number of cells.
Cells coming from rowspan/colspan are repeated to fill the covered grid positions.
This is intentional: duplication keeps row/column relationships explicit in a fixed grid, making downstream parsing and indexing simpler and more consistent.
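The normalization described above can be sketched as follows. This is an illustration of the idea, not Neurolinker's actual implementation; the function name `expand_spans` and the input format (rows of `(text, rowspan, colspan)` tuples) are assumptions made for the example.

```python
def expand_spans(rows):
    """Expand cells with rowspan/colspan into a regular grid.

    rows: list of rows, each a list of (text, rowspan, colspan) tuples.
    Returns a list of equal-length rows with spanned values repeated.
    """
    grid = {}  # (row, col) -> text
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip positions already filled by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    # Positions never covered by any cell become empty strings.
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


# "EU" spans two rows in the source table.
rows = [
    [("EU", 2, 1), ("France", 1, 1)],
    [("Italy", 1, 1)],
]
print(expand_spans(rows))
# [['EU', 'France'], ['EU', 'Italy']]
```

Every output row has the same length, so each value can be paired with its column name by position.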

Example "Try it out!"

    curl -X 'POST' \
    'https://neurolinker.api.ainexxo.com/api/v1/documents/json' \
    -H 'accept: application/json' \
    -H 'Authorization: Bearer nl_********************************' \
    -H 'Content-Type: application/json' \
    -d '{
    "document_ids": [
        "1d6986aa-80ac-4a4d-9009-5a323c9046b1"
    ]
    }'

POST /api/v1/documents/images

Parameters

Name	In	Type	Description
`document_ids`	body	array of strings	IDs of the documents

{
  "document_ids": ["string"]
}

Response

200

Response Schema

Name	Type	Description
`success`	boolean
`results`	array of result objects
`total`	int	Total number of documents requested
`successful`	int	Number of successfully retrieved documents
`failed`	int	Number of failed retrievals
`total_images`	int	Total number of images retrieved across all documents
`message`	string	Human-readable status message

Result object Schema

Name	Type	Description
`document_id`	string
`format`	string	equals "images"
`total_images`	int
`images`	array of objects	See Schema ImageResult

ImageResult Schema

Name	Type	Description
`filename`	string
`storage_path`	string	equals "images"
`url`	string

{
  "success": true,
  "results": [
    {
      "document_id": "string",
      "format": "images",
      "total_images": 0,
      "images": [
        {
          "filename": "string",
          "storage_path": "string",
          "url": "string"
        }
      ]
    }
    //...
  ],
  "total": 0,
  "successful": 0,
  "failed": 0,
  "total_images": 0,
  "message": "string"
}
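A small sketch of how a client might flatten this response into a list of URLs per document. It assumes the body has already been decoded into a dict; the helper name `image_urls` and the sample values are illustrative only.

```python
def image_urls(body):
    """Map each document_id to the list of image URLs in the response body."""
    return {
        result["document_id"]: [img["url"] for img in result.get("images", [])]
        for result in body.get("results", [])
    }


# Hand-written sample body, not real API output.
sample = {
    "success": True,
    "results": [
        {
            "document_id": "doc-1",
            "format": "images",
            "total_images": 2,
            "images": [
                {"filename": "a.png", "storage_path": "p/a.png", "url": "https://example.com/a.png"},
                {"filename": "b.png", "storage_path": "p/b.png", "url": "https://example.com/b.png"},
            ],
        },
        {"document_id": "doc-2", "format": "images", "total_images": 0, "images": []},
    ],
    "total": 2,
    "successful": 2,
    "failed": 0,
    "total_images": 2,
    "message": "sample message",
}

urls = image_urls(sample)
print(urls)
# Cross-check the top-level counter against what was actually collected.
print(sum(len(v) for v in urls.values()) == sample["total_images"])
```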

Example "Try it out!"

    curl -X 'POST' \
    'https://neurolinker.api.ainexxo.com/api/v1/documents/images' \
    -H 'accept: application/json' \
    -H 'Authorization: Bearer nl_********************************' \
    -H 'Content-Type: application/json' \
    -d '{
    "document_ids": [
        "1d6986aa-80ac-4a4d-9009-5a323c9046b1"
    ]
    }'

POST /api/v1/documents/page-summaries

Parameters

Name	In	Type	Description
`document_ids`	body	array of strings	IDs of the documents

{
  "document_ids": ["string"]
}

Response

200

Response Schema

Name	Type	Description
`success`	boolean
`results`	array of result objects
`total`	int	Total number of documents requested
`successful`	int	Number of successfully retrieved documents
`failed`	int	Number of failed retrievals
`message`	string	Human-readable status message

Result object Schema

Name	Type	Description
`document_id`	string
`filename`	string
`storage_path`	string
`download_url`	string
`file_size`	int
`content`	string	Markdown Content

{
  "success": true,
  "results": [
    {
      "document_id": "string",
      "filename": "string",
      "storage_path": "string",
      "download_url": "string",
      "file_size": 0,
      "content": "string"
    }
    //...
  ],
  "total": 0,
  "successful": 0,
  "failed": 0,
  "message": "string"
}

Example "Try it out!"

    curl -X 'POST' \
    'https://neurolinker.api.ainexxo.com/api/v1/documents/page-summaries' \
    -H 'accept: application/json' \
    -H 'Authorization: Bearer nl_********************************' \
    -H 'Content-Type: application/json' \
    -d '{
    "document_ids": [
        "1d6986aa-80ac-4a4d-9009-5a323c9046b1"
    ]
    }'

POST /api/v1/documents/document-summary

Returns a global document summary as a Markdown string. Use summary_type to choose between a page-based or section-based summary.

Parameters

Name	In	Type	Description
`document_ids`	body	array of strings	IDs of the documents
`summary_type`	body	string	`page` (page-based) or `section` (section-based)

{
  "document_ids": ["string"],
  "summary_type": "page"
}

Response

200

Response Schema

Name	Type	Description
`success`	boolean
`results`	array of result objects
`total`	int	Total number of documents requested
`successful`	int	Number of successfully retrieved documents
`failed`	int	Number of failed retrievals
`message`	string	Human-readable status message

Result object Schema

Name	Type	Description
`document_id`	string
`filename`	string
`storage_path`	string
`download_url`	string
`file_size`	int
`content`	string	Markdown content

{
  "success": true,
  "results": [
    {
      "document_id": "string",
      "filename": "string",
      "storage_path": "string",
      "download_url": "string",
      "file_size": 0,
      "content": "string"
    }
    //...
  ],
  "total": 0,
  "successful": 0,
  "failed": 0,
  "message": "string"
}

Example "Try it out!"

    curl -X 'POST' \
    'https://neurolinker.api.ainexxo.com/api/v1/documents/document-summary' \
    -H 'accept: application/json' \
    -H 'Authorization: Bearer nl_********************************' \
    -H 'Content-Type: application/json' \
    -d '{
    "document_ids": [
        "1d6986aa-80ac-4a4d-9009-5a323c9046b1"
    ],
    "summary_type": "page"
    }'
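Because `summary_type` accepts only two values, a client may want to validate it before sending the request. The sketch below builds the JSON body for this endpoint; the function name `document_summary_payload` is hypothetical, and the client-side check is a convenience, not something the API requires.

```python
import json

ALLOWED_SUMMARY_TYPES = ("page", "section")


def document_summary_payload(document_ids, summary_type):
    """Return the JSON request body for documents/document-summary."""
    if summary_type not in ALLOWED_SUMMARY_TYPES:
        raise ValueError(
            f"summary_type must be one of {ALLOWED_SUMMARY_TYPES}, got {summary_type!r}"
        )
    return json.dumps({"document_ids": list(document_ids), "summary_type": summary_type})


print(document_summary_payload(["1d6986aa-80ac-4a4d-9009-5a323c9046b1"], "section"))
```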

POST /api/v1/documents/section-summaries

Returns a summary for each individual section of the document.

How section-based summarization works:

Sections are processed in document order (which follows the detected heading hierarchy and reading flow).
Example hierarchy: # section 1 -> ## section 2 -> ### section 3, then # section 4, # section 5 -> ## section 6, then # section 7.
In this case, summaries are generated section by section in that sequence (section 1, section 2, section 3, and so on).
The response keys are section IDs from the extracted document structure.
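Since JSON objects do not guarantee a key order on every client, it can help to sort the section summaries explicitly. The sketch below assumes section IDs are integer strings such as "0", "1", "2" and that numeric order matches document order; neither is guaranteed by this page. IDs that are not numeric are placed last, in alphabetical order. The function name `ordered_summaries` is illustrative.

```python
def ordered_summaries(content):
    """Return (section_id, summary) pairs sorted by numeric section ID."""

    def sort_key(item):
        section_id = item[0]
        if section_id.isdigit():
            return (0, int(section_id), "")
        return (1, 0, section_id)  # non-numeric IDs go last

    return sorted(content.items(), key=sort_key)


# Hand-written sample; plain string sorting would put "10" before "2".
content = {
    "10": "Summary of section 10",
    "2": "Summary of section 2",
    "0": "Summary of section 0",
}
for section_id, summary in ordered_summaries(content):
    print(section_id, summary)
```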

Parameters

Name	In	Type	Description
`document_ids`	body	array of strings	IDs of the documents

{
  "document_ids": ["string"]
}

Response

200

Response Schema

Name	Type	Description
`success`	boolean
`results`	array of result objects
`total`	int	Total number of documents requested
`successful`	int	Number of successfully retrieved documents
`failed`	int	Number of failed retrievals
`message`	string	Human-readable status message

Result object Schema

Name	Type	Description
`document_id`	string
`filename`	string
`storage_path`	string
`download_url`	string
`file_size`	int
`content`	object	JSON object where each key is a section ID and the value is a Markdown-formatted summary of that section

{
  "success": true,
  "results": [
    {
      "document_id": "string",
      "filename": "string",
      "storage_path": "string",
      "download_url": "string",
      "file_size": 0,
      "content": {
        "0": "**Main Topic and Purpose**\nSummary of the first section...",
        "1": "**Main Topic and Purpose**\nSummary of the second section...",
        "2": "..."
      }
    }
    //...
  ],
  "total": 0,
  "successful": 0,
  "failed": 0,
  "message": "string"
}

Example "Try it out!"

    curl -X 'POST' \
    'https://neurolinker.api.ainexxo.com/api/v1/documents/section-summaries' \
    -H 'accept: application/json' \
    -H 'Authorization: Bearer nl_********************************' \
    -H 'Content-Type: application/json' \
    -d '{
    "document_ids": [
        "1d6986aa-80ac-4a4d-9009-5a323c9046b1"
    ]
    }'

POST /api/v1/make-zip

Creates a ZIP archive of the processed output for a given request and returns a signed download URL (valid for 1 hour).

Parameters

Name	In	Type	Description
`request_id`	body	string	The job/request identifier
`document_id`	body	string (optional)	Restricts the archive to a single document within the request
`local_images`	body	boolean	If `true`, rewrites Firebase Storage URLs to local `./filename` references. Defaults to `false`
`content_types`	body	array of strings (optional)	Filter by content type: `text`, `formula`, `tables`, `images`. PNGs are included only if `images` is selected. If omitted, all content is included.

{
  "request_id": "string",
  "document_id": "string",
  "local_images": false,
  "content_types": ["text", "formula", "tables", "images"]
}

Response

200

Name	Type	Description
`success`	boolean
`url`	string	Signed URL to download the ZIP (expires in 1 hour)
`message`	string	Human-readable status message

{
  "success": true,
  "url": "string",
  "message": "ZIP archive created successfully"
}

400 — Request or document not yet completed

404 — No files found for the given request/document
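The status codes above can be handled in one place on the client. The sketch below takes an HTTP status code and an already-decoded body; the function name `interpret_make_zip` is hypothetical, and the shape of the error body is not documented here, so the code falls back to the descriptions listed above rather than reading fields from it.

```python
# Descriptions copied from the status codes listed for this endpoint.
ZIP_ERRORS = {
    400: "Request or document not yet completed",
    404: "No files found for the given request/document",
}


def interpret_make_zip(status, body):
    """Return the signed download URL, or raise with a readable reason."""
    if status == 200 and body.get("success"):
        return body["url"]
    reason = ZIP_ERRORS.get(status) or body.get("message") or "unexpected response"
    raise RuntimeError(f"make-zip failed ({status}): {reason}")


ok_body = {"success": True, "url": "https://example.com/archive.zip", "message": "ZIP archive created successfully"}
print(interpret_make_zip(200, ok_body))
```

Because the signed URL expires after 1 hour, the download should start soon after the URL is returned; a 400 response suggests retrying once processing has completed.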

Example "Try it out!"

    curl -X 'POST' \
    'https://neurolinker.api.ainexxo.com/api/v1/make-zip' \
    -H 'accept: application/json' \
    -H 'Authorization: Bearer nl_********************************' \
    -H 'Content-Type: application/json' \
    -d '{
    "request_id": "1d6986aa-80ac-4a4d-9009-5a323c9046b1",
    "local_images": false
    }'