Retrieve Results
Overview
The documents/* endpoints return the extraction results for specific documents that were submitted to Neurolinker through the Request endpoint.
POST /api/v1/documents/markdown
Parameters
| Name | In | Type | Description |
|---|---|---|---|
| document_ids | body | array of strings | IDs of the documents |
| content_types | body | array of strings (optional) | Filter by content type: text, formula, tables, images. If omitted, all types are returned. |
Response
200
Response Schema
| Name | Type | Description |
|---|---|---|
| success | boolean | |
| results | array of result objects | |
| total | int | Total number of documents requested |
| successful | int | Number of successfully retrieved documents |
| failed | int | Number of failed retrievals |
| message | string | Human-readable status message |
Result object Schema
| Name | Type | Description |
|---|---|---|
| document_id | string | |
| filename | string | |
| storage_path | string | |
| download_url | string | |
| file_size | int | |
| content | string | Markdown content |
{
"success": true,
"results": [
{
"document_id": "string",
"filename": "string",
"storage_path": "string",
"download_url": "string",
"file_size": 0,
"content": "string"
}
//...
],
"total": 0,
"successful": 0,
"failed": 0,
"message": "string"
}
The content field is a Markdown string that may also contain the following HTML tags, emitted during extraction:
`<header>`, `<caption>`, `<figure>`, `<figurecaption>`, `<table>`, `<tablecaption>`, `<thead>`, `<tbody>`, `<tfoot>`, `<tr>`, `<td>`, `<th>`
Table representation in Markdown:
- Table structure is preserved as in the source (including `rowspan` and `colspan`).
- A merged header/value usually appears once in the table structure, because span attributes define how many rows/columns it covers.
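Because merged cells appear only once in the Markdown output, a consumer has to read the span attributes to know how much of the table each cell covers. The following is a minimal sketch, using only Python's standard `html.parser`, that collects each cell together with its `rowspan` and `colspan`. The class name, function name, and sample string are illustrative assumptions, not part of the Neurolinker API.

```python
from html.parser import HTMLParser


class SpanTableParser(HTMLParser):
    """Collect table cells as (text, rowspan, colspan) tuples, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cells per <tr>
        self._cell = None  # [text_parts, rowspan, colspan] while inside <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            # A missing span attribute means the cell covers one row/column.
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            if not self.rows:  # tolerate a cell that appears without a <tr>
                self.rows.append([])
            self.rows[-1].append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None


def parse_table(markdown_content):
    """Return the rows of every HTML table found in a Markdown string."""
    parser = SpanTableParser()
    parser.feed(markdown_content)
    parser.close()
    return parser.rows


# Hypothetical content value with one merged header and one merged row label.
sample_content = (
    "# Report\n\n"
    '<table><thead><tr><th colspan="2">Region</th></tr></thead>'
    '<tbody><tr><td rowspan="2">EU</td><td>Rome</td></tr>'
    "<tr><td>Paris</td></tr></tbody></table>"
)
rows = parse_table(sample_content)
```

Text outside table cells (such as the heading in the sample) is ignored by this sketch; it only targets the table structure.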
Example "Try it out!"
curl -X 'POST' \
'https://neurolinker.api.ainexxo.com/api/v1/documents/markdown' \
-H 'accept: application/json' \
-H 'Authorization: Bearer nl_********************************' \
-H 'Content-Type: application/json' \
-d '{
"document_ids": [
"1d6986aa-80ac-4a4d-9009-5a323c9046b1"
]
}'
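For callers who prefer Python over curl, the same request can be assembled with the standard library. This is a sketch under assumptions: the host is copied from the curl example, the path follows the endpoint heading (`/api/v1/documents/markdown`), and the function name and the `api_key` argument are hypothetical. The function only builds the request; sending it is left to the caller.

```python
import json
import urllib.request

BASE_URL = "https://neurolinker.api.ainexxo.com"  # host taken from the curl example


def build_markdown_request(api_key, document_ids, content_types=None):
    """Build (but do not send) a POST request for /api/v1/documents/markdown."""
    body = {"document_ids": list(document_ids)}
    if content_types is not None:
        # Optional filter; when it is left out, all content types are returned.
        body["content_types"] = list(content_types)
    return urllib.request.Request(
        BASE_URL + "/api/v1/documents/markdown",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and decode the response body with `json.load`.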
POST /api/v1/documents/json
Parameters
| Name | In | Type | Description |
|---|---|---|---|
| document_ids | body | array of strings | IDs of the documents |
| content_types | body | array of strings (optional) | Filter by content type: text, formula, tables, images. If omitted, all types are returned. |
Response
200
Response Schema
| Name | Type | Description |
|---|---|---|
| success | boolean | |
| results | array of result objects | |
| total | int | Total number of documents requested |
| successful | int | Number of successfully retrieved documents |
| failed | int | Number of failed retrievals |
| message | string | Human-readable status message |
Result object Schema
| Name | Type | Description |
|---|---|---|
| document_id | string | |
| filename | string | |
| storage_path | string | |
| download_url | string | |
| file_size | int | |
| content | string | JSON Schema |
{
"success": true,
"results": [
{
"document_id": "string",
"filename": "string",
"storage_path": "string",
"download_url": "string",
"file_size": 0,
"content": "string"
}
//...
],
"total": 0,
"successful": 0,
"failed": 0,
"message": "string"
}
The content field follows the JSON structure below, which was derived from an example PDF document:
{
"pdf_name": "DOCUMENT_ID",
"category": "DOCUMENT_CATEGORY",
"pages": {
"1": {
"page_metadata": {
"page_dim": [PAGE_WIDTH_PX, PAGE_HEIGHT_PX]
},
"page_header": [
{
"id": "header-element-id",
"label": "Page-header",
"content": "Header text",
"bbox": [x0, y0, x1, y1],
"images": [],
"tables": []
}
],
"page_footer": [
{
"id": "footer-element-id",
"label": "Page-footer",
"content": "Footer text",
"bbox": [x0, y0, x1, y1],
"images": [],
"tables": []
}
],
"body": {
"sections": {
"0": {
"title": {
"id": "section-title-id",
"label": "Section-header",
"content": "Section Title",
"bbox": [x0, y0, x1, y1],
"title_feats": {
"font": "FONT_NAME",
"height": FONT_SIZE
}
},
"texts": [
{
"id": "text-id",
"label": "Text",
"content": "Paragraph text",
"bbox": [x0, y0, x1, y1]
}
],
"tables": [
{
"id": "table-id",
"label": "Table",
"columns": ["Column A", "Column B"],
"data": [
["Value A1", "Value B1"],
["Value A2", "Value B2"]
],
"description": "Table description",
"bbox": [x0, y0, x1, y1]
}
],
"images": [
{
"id": "image-id",
"label": "Picture",
"description": "Image description",
"extracted_text": "Raw text found inside the image",
"src": "IMAGE_URL",
"bbox": [x0, y0, x1, y1]
}
],
"subsections": [],
"supersection": [],
"level": 1
}
}
}
}
}
}
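To show how the nesting above (pages, then `body.sections`, then `texts`) can be traversed, here is a small sketch that yields every paragraph with its page and section. It assumes that page and section keys are numeric strings as in the example, and that `content` may arrive either as a JSON-encoded string or as an already decoded object. The function name and sample data are hypothetical.

```python
import json


def iter_section_texts(content):
    """Yield (page_key, section_key, section_title, paragraph) for every text element."""
    doc = json.loads(content) if isinstance(content, str) else content
    # Keys such as "1", "2", "10" are strings, so sort them numerically.
    for page_key in sorted(doc.get("pages", {}), key=int):
        sections = doc["pages"][page_key].get("body", {}).get("sections", {})
        for section_key in sorted(sections, key=int):
            section = sections[section_key]
            title = (section.get("title") or {}).get("content", "")
            for text in section.get("texts", []):
                yield page_key, section_key, title, text["content"]


# Hypothetical, heavily trimmed content value.
sample = {
    "pdf_name": "DOCUMENT_ID",
    "pages": {
        "2": {"body": {"sections": {"1": {
            "title": {"content": "Methods"},
            "texts": [{"content": "Second page."}],
        }}}},
        "1": {"body": {"sections": {"0": {
            "title": {"content": "Intro"},
            "texts": [{"content": "First."}, {"content": "Next."}],
        }}}},
    },
}
paragraphs = list(iter_section_texts(sample))
```

The same pattern extends to the `tables` and `images` arrays of each section.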
Table representation in JSON:
- Tables are normalized into a regular grid (`columns` + `data`), so each row has the same number of cells.
- Cells coming from `rowspan`/`colspan` are repeated to fill the covered grid positions.
- This is intentional: duplication keeps row/column relationships explicit in a fixed grid, making downstream parsing and indexing simpler and more consistent.
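The normalization described above can be illustrated with a short sketch that expands span-based cells into a rectangular grid. It is an illustration of the idea only, not the service's actual implementation; the input format of `(text, rowspan, colspan)` tuples and the function name are assumptions made for this example.

```python
def expand_spans(rows):
    """Turn rows of (text, rowspan, colspan) cells into a rectangular grid.

    Every grid position covered by a span receives a copy of the cell text.
    """
    grid = []
    pending = {}  # (row_index, col_index) -> text carried down by a rowspan
    for r, cells in enumerate(rows):
        out, c = [], 0
        queue = list(cells)
        while queue or (r, c) in pending:
            if (r, c) in pending:
                # This position is covered by a cell that started in an earlier row.
                out.append(pending.pop((r, c)))
                c += 1
                continue
            text, rowspan, colspan = queue.pop(0)
            for dc in range(colspan):
                out.append(text)  # repeat across the covered columns
                for dr in range(1, rowspan):
                    pending[(r + dr, c + dc)] = text  # repeat in the rows below
            c += colspan
        grid.append(out)
    return grid


# One merged header spanning two columns, one label spanning two rows.
spanned = [
    [("Region", 1, 2)],
    [("EU", 2, 1), ("Rome", 1, 1)],
    [("Paris", 1, 1)],
]
grid = expand_spans(spanned)
# grid == [["Region", "Region"], ["EU", "Rome"], ["EU", "Paris"]]
```

After expansion every row has the same length, so a cell can be addressed by row and column index without tracking spans.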
Example "Try it out!"
curl -X 'POST' \
'https://neurolinker.api.ainexxo.com/api/v1/documents/json' \
-H 'accept: application/json' \
-H 'Authorization: Bearer nl_********************************' \
-H 'Content-Type: application/json' \
-d '{
"document_ids": [
"1d6986aa-80ac-4a4d-9009-5a323c9046b1"
]
}'
POST /api/v1/documents/images
Parameters
| Name | In | Type | Description |
|---|---|---|---|
| document_ids | body | array of strings | IDs of the documents |
Response
200
Response Schema
| Name | Type | Description |
|---|---|---|
| success | boolean | |
| results | array of result objects | |
| total | int | Total number of documents requested |
| successful | int | Number of successfully retrieved documents |
| failed | int | Number of failed retrievals |
| total_images | int | Total number of images retrieved across all documents |
| message | string | Human-readable status message |
Result object Schema
| Name | Type | Description |
|---|---|---|
| document_id | string | |
| format | string | equals "images" |
| total_images | int | |
| images | array of objects | See the ImageResult Schema |
ImageResult Schema
| Name | Type | Description |
|---|---|---|
| filename | string | |
| storage_path | string | equals "images" |
| url | string | |
{
"success": true,
"results": [
{
"document_id": "string",
"format": "images",
"total_images": 0,
"images": [
{
"filename": "string",
"storage_path": "string",
"url": "string"
}
]
}
//...
],
"total": 0,
"successful": 0,
"failed": 0,
"total_images": 0,
"message": "string"
}
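As a sketch of how the nested `results[].images[]` structure above might be consumed, the helper below flattens an images response into a mapping from document ID to `(filename, url)` pairs. The function name and the sample values are hypothetical; only the field names come from the schema.

```python
def image_urls_by_document(response):
    """Return {document_id: [(filename, url), ...]} from an images response."""
    out = {}
    for result in response.get("results", []):
        out[result["document_id"]] = [
            (img["filename"], img["url"]) for img in result.get("images", [])
        ]
    return out


# Hypothetical response with made-up file names and URLs.
sample_response = {
    "success": True,
    "results": [
        {
            "document_id": "doc-1",
            "format": "images",
            "total_images": 2,
            "images": [
                {"filename": "fig1.png", "storage_path": "p/fig1.png", "url": "https://example.com/fig1.png"},
                {"filename": "fig2.png", "storage_path": "p/fig2.png", "url": "https://example.com/fig2.png"},
            ],
        }
    ],
    "total": 1,
    "successful": 1,
    "failed": 0,
    "total_images": 2,
    "message": "ok",
}
urls = image_urls_by_document(sample_response)
```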
Example "Try it out!"
curl -X 'POST' \
'https://neurolinker.api.ainexxo.com/api/v1/documents/images' \
-H 'accept: application/json' \
-H 'Authorization: Bearer nl_********************************' \
-H 'Content-Type: application/json' \
-d '{
"document_ids": [
"1d6986aa-80ac-4a4d-9009-5a323c9046b1"
]
}'
POST /api/v1/documents/page-summaries
Parameters
| Name | In | Type | Description |
|---|---|---|---|
| document_ids | body | array of strings | IDs of the documents |
Response
200
Response Schema
| Name | Type | Description |
|---|---|---|
| success | boolean | |
| results | array of result objects | |
| total | int | Total number of documents requested |
| successful | int | Number of successfully retrieved documents |
| failed | int | Number of failed retrievals |
| message | string | Human-readable status message |
Result object Schema
| Name | Type | Description |
|---|---|---|
| document_id | string | |
| filename | string | |
| storage_path | string | |
| download_url | string | |
| file_size | int | |
| content | string | Markdown content |
{
"success": true,
"results": [
{
"document_id": "string",
"filename": "string",
"storage_path": "string",
"download_url": "string",
"file_size": 0,
"content": "string"
}
//...
],
"total": 0,
"successful": 0,
"failed": 0,
"message": "string"
}
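Since one call can cover several documents and the envelope reports `total`, `successful`, and `failed` counts, it can help to index the results by `document_id` and see which requested IDs did not come back. The sketch below assumes that a failed document is simply absent from `results`; the documentation does not state this, so treat it as an assumption. The function name and sample data are hypothetical.

```python
def index_results(response, requested_ids):
    """Map document_id -> content and list requested IDs missing from results."""
    by_id = {r["document_id"]: r["content"] for r in response.get("results", [])}
    # Assumption: documents that could not be retrieved do not appear in results.
    missing = [doc_id for doc_id in requested_ids if doc_id not in by_id]
    return by_id, missing


# Hypothetical response for two requested documents, one of which failed.
sample_response = {
    "success": True,
    "results": [
        {
            "document_id": "doc-1",
            "filename": "doc-1.md",
            "storage_path": "p/doc-1.md",
            "download_url": "https://example.com/doc-1.md",
            "file_size": 42,
            "content": "## Page 1\nSummary text",
        }
    ],
    "total": 2,
    "successful": 1,
    "failed": 1,
    "message": "1 of 2 retrieved",
}
contents, missing = index_results(sample_response, ["doc-1", "doc-2"])
```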
Example "Try it out!"
curl -X 'POST' \
'https://neurolinker.api.ainexxo.com/api/v1/documents/page-summaries' \
-H 'accept: application/json' \
-H 'Authorization: Bearer nl_********************************' \
-H 'Content-Type: application/json' \
-d '{
"document_ids": [
"1d6986aa-80ac-4a4d-9009-5a323c9046b1"
]
}'
POST /api/v1/documents/document-summary
Returns a global document summary as a Markdown string. Use summary_type to choose between a page-based or section-based summary.
Parameters
| Name | In | Type | Description |
|---|---|---|---|
| document_ids | body | array of strings | IDs of the documents |
| summary_type | body | string | page (page-based) or section (section-based) |
Response
200
Response Schema
| Name | Type | Description |
|---|---|---|
| success | boolean | |
| results | array of result objects | |
| total | int | Total number of documents requested |
| successful | int | Number of successfully retrieved documents |
| failed | int | Number of failed retrievals |
| message | string | Human-readable status message |
Result object Schema
| Name | Type | Description |
|---|---|---|
| document_id | string | |
| filename | string | |
| storage_path | string | |
| download_url | string | |
| file_size | int | |
| content | string | Markdown content |
{
"success": true,
"results": [
{
"document_id": "string",
"filename": "string",
"storage_path": "string",
"download_url": "string",
"file_size": 0,
"content": "string"
}
//...
],
"total": 0,
"successful": 0,
"failed": 0,
"message": "string"
}
Example "Try it out!"
curl -X 'POST' \
'https://neurolinker.api.ainexxo.com/api/v1/documents/document-summary' \
-H 'accept: application/json' \
-H 'Authorization: Bearer nl_********************************' \
-H 'Content-Type: application/json' \
-d '{
"document_ids": [
"1d6986aa-80ac-4a4d-9009-5a323c9046b1"
],
"summary_type": "page"
}'
POST /api/v1/documents/section-summaries
Returns a summary for each individual section of the document.
How section-based summarization works:
- Sections are processed in document order (which follows the detected heading hierarchy and reading flow).
- Example hierarchy: `# section 1` -> `## section 2` -> `### section 3`, then `# section 4`, `# section 5` -> `## section 6`, then `# section 7`.
- In this case, summaries are generated section by section in that sequence (`section 1`, `section 2`, `section 3`, and so on).
- The response keys are section IDs from the extracted document structure.
Parameters
| Name | In | Type | Description |
|---|---|---|---|
| document_ids | body | array of strings | IDs of the documents |
Response
200
Response Schema
| Name | Type | Description |
|---|---|---|
| success | boolean | |
| results | array of result objects | |
| total | int | Total number of documents requested |
| successful | int | Number of successfully retrieved documents |
| failed | int | Number of failed retrievals |
| message | string | Human-readable status message |
Result object Schema
| Name | Type | Description |
|---|---|---|
| document_id | string | |
| filename | string | |
| storage_path | string | |
| download_url | string | |
| file_size | int | |
| content | object | JSON object where each key is a section ID and the value is a Markdown-formatted summary of that section |
{
"success": true,
"results": [
{
"document_id": "string",
"filename": "string",
"storage_path": "string",
"download_url": "string",
"file_size": 0,
"content": {
"0": "**Main Topic and Purpose**\nSummary of the first section...",
"1": "**Main Topic and Purpose**\nSummary of the second section...",
"2": "..."
}
}
//...
],
"total": 0,
"successful": 0,
"failed": 0,
"message": "string"
}
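Because the keys of `content` are strings, a plain string sort would place `"10"` before `"2"`. The sketch below orders the summaries numerically instead. It assumes section IDs are numeric strings, as in the example above, and that numeric order matches document order; neither is guaranteed by this page. The function name is hypothetical.

```python
def ordered_section_summaries(content):
    """Return [(section_id, summary), ...] sorted by numeric section ID."""
    return sorted(content.items(), key=lambda item: int(item[0]))


# Hypothetical content object with keys in arbitrary order.
sample_content = {
    "10": "Summary of section 10",
    "2": "Summary of section 2",
    "0": "Summary of section 0",
}
ordered = ordered_section_summaries(sample_content)
```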
Example "Try it out!"
curl -X 'POST' \
'https://neurolinker.api.ainexxo.com/api/v1/documents/section-summaries' \
-H 'accept: application/json' \
-H 'Authorization: Bearer nl_********************************' \
-H 'Content-Type: application/json' \
-d '{
"document_ids": [
"1d6986aa-80ac-4a4d-9009-5a323c9046b1"
]
}'
POST /api/v1/make-zip
Creates a ZIP archive of the processed output for a given request and returns a signed download URL (valid for 1 hour).
Parameters
| Name | In | Type | Description |
|---|---|---|---|
| request_id | body | string | The job/request identifier |
| document_id | body | string (optional) | Restricts the archive to a single document within the request |
| local_images | body | boolean | If true, rewrites Firebase Storage URLs to local ./filename references. Defaults to false. |
| content_types | body | array of strings (optional) | Filter by content type: text, formula, tables, images. PNG files are included only if images is selected. If omitted, all content is included. |
{
"request_id": "string",
"document_id": "string",
"local_images": false,
"content_types": ["text", "formula", "tables", "images"]
}
Response
200
| Name | Type | Description |
|---|---|---|
| success | boolean | |
| url | string | Signed URL to download the ZIP (expires in 1 hour) |
| message | string | Human-readable status message |
400 — Request or document not yet completed
404 — No files found for the given request/document
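A possible client-side helper for this endpoint is sketched below: it builds the request body, leaving out optional fields that were not supplied, and maps the documented status codes to a next step. The function names and the returned labels are hypothetical, and treating a 400 as "retry later" is an interpretation of "not yet completed" rather than documented behaviour.

```python
ALLOWED_CONTENT_TYPES = {"text", "formula", "tables", "images"}


def build_make_zip_body(request_id, document_id=None, local_images=False, content_types=None):
    """Build the JSON body for POST /api/v1/make-zip, omitting unset optional fields."""
    body = {"request_id": request_id, "local_images": local_images}
    if document_id is not None:
        body["document_id"] = document_id
    if content_types is not None:
        unknown = set(content_types) - ALLOWED_CONTENT_TYPES
        if unknown:
            raise ValueError(f"unknown content types: {sorted(unknown)}")
        body["content_types"] = list(content_types)
    return body


def next_step(status_code):
    """Map a make-zip HTTP status code to a suggested client action."""
    if status_code == 200:
        return "download"     # the body carries the signed URL (valid for 1 hour)
    if status_code == 400:
        return "retry_later"  # request or document not yet completed
    if status_code == 404:
        return "give_up"      # no files found for the request/document
    return "unexpected"


body = build_make_zip_body("req-1", content_types=["text", "images"])
```

Since the signed URL expires after one hour, it is safer to download the archive right after a successful call than to store the URL for later use.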