ocrapi API Reference

The powerful Optical Character Recognition (OCR) APIs let you convert scanned images of pages into recognized text.

Swagger OpenAPI Specification | .NET Framework Client | .NET Core Client | Java Client | Node.JS Client | Python Client | Drupal Client

API Endpoint
https://api.cloudmersive.com
Schemes: https
Version: v1

Authentication

Apikey

API Key Authentication

type: apiKey
name: Apikey
in: header
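
As a minimal sketch of the scheme above, the key travels in a request header named Apikey. The helper name `auth_headers` and the placeholder key value are illustrative assumptions, not part of the API.

```python
def auth_headers(api_key: str) -> dict:
    """Return a header dict carrying the API key under the documented name."""
    if not api_key:
        raise ValueError("an API key is required")
    return {"Apikey": api_key}


# Placeholder value for illustration; substitute a real key.
headers = auth_headers("YOUR-API-KEY")
print(headers)
```

The returned dict can be merged with any per-call headers (such as language) before the request is sent.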

ImageOcr

Convert a scanned image into text

POST /ocr/image/toText


Converts an uploaded image in common formats such as JPEG, PNG into text via Optical Character Recognition. This API is intended to be run on scanned documents. If you want to OCR photos (e.g. taken with a smart phone camera), be sure to use the photo/toText API instead, as it is designed to unskew the image first.



imageFile: file
in formData

Image file to perform OCR on. Common file formats such as PNG, JPEG are supported.

language: string
in header

Optional. Language of the input document; the default is English (ENG). Possible values are ENG (English), ARA (Arabic), ZHO (Chinese - Simplified), ZHO-HANT (Chinese - Traditional), ASM (Assamese), AFR (Afrikaans), AMH (Amharic), AZE (Azerbaijani), AZE-CYRL (Azerbaijani - Cyrillic), BEL (Belarusian), BEN (Bengali), BOD (Tibetan), BOS (Bosnian), BUL (Bulgarian), CAT (Catalan; Valencian), CEB (Cebuano), CES (Czech), CHR (Cherokee), CYM (Welsh), DAN (Danish), DEU (German), DZO (Dzongkha), ELL (Greek), ENM (Archaic/Middle English), EPO (Esperanto), EST (Estonian), EUS (Basque), FAS (Persian), FIN (Finnish), FRA (French), FRK (Frankish), FRM (Middle French), GLE (Irish), GLG (Galician), GRC (Ancient Greek), HAT (Haitian), HEB (Hebrew), HIN (Hindi), HRV (Croatian), HUN (Hungarian), IKU (Inuktitut), IND (Indonesian), ISL (Icelandic), ITA (Italian), ITA-OLD (Old Italian), JAV (Javanese), JPN (Japanese), KAN (Kannada), KAT (Georgian), KAT-OLD (Old Georgian), KAZ (Kazakh), KHM (Central Khmer), KIR (Kirghiz), KOR (Korean), KUR (Kurdish), LAO (Lao), LAT (Latin), LAV (Latvian), LIT (Lithuanian), MAL (Malayalam), MAR (Marathi), MKD (Macedonian), MLT (Maltese), MSA (Malay), MYA (Burmese), NEP (Nepali), NLD (Dutch), NOR (Norwegian), ORI (Oriya), PAN (Panjabi), POL (Polish), POR (Portuguese), PUS (Pushto), RON (Romanian), RUS (Russian), SAN (Sanskrit), SIN (Sinhala), SLK (Slovak), SLV (Slovenian), SPA (Spanish), SPA-OLD (Old Spanish), SQI (Albanian), SRP (Serbian), SRP-LAT (Latin Serbian), SWA (Swahili), SWE (Swedish), SYR (Syriac), TAM (Tamil), TEL (Telugu), TGK (Tajik), TGL (Tagalog), THA (Thai), TIR (Tigrinya), TUR (Turkish), UIG (Uighur), UKR (Ukrainian), URD (Urdu), UZB (Uzbek), UZB-CYR (Cyrillic Uzbek), VIE (Vietnamese), YID (Yiddish).

preprocessing: string
in header

Optional, preprocessing mode, default is 'Auto'. Possible values are None (no preprocessing of the image), and Auto (automatic image enhancement of the image before OCR is applied; this is recommended).

Response Content-Types: application/json, text/json, application/xml, text/xml
Response Example (200 OK)
{
  "MeanConfidenceLevel": "number (float)",
  "TextResult": "string"
}
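
The parameters above can be sketched as a request built with the Python standard library. This is a minimal illustration, not an official client: the function name `build_image_to_text_request`, the file name, and the placeholder key are assumptions, and the request is only constructed here, not sent.

```python
import urllib.request
import uuid

API_BASE = "https://api.cloudmersive.com"


def build_image_to_text_request(image_bytes, filename, api_key,
                                language="ENG", preprocessing="Auto"):
    """Build (but do not send) a POST request for /ocr/image/toText."""
    boundary = uuid.uuid4().hex
    # Multipart body with a single form field named imageFile.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="imageFile"; '
        f'filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        image_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Apikey": api_key,               # API key header from the Authentication section
        "language": language,            # optional, defaults to ENG
        "preprocessing": preprocessing,  # optional, None or Auto
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Accept": "application/json",
    }
    return urllib.request.Request(
        API_BASE + "/ocr/image/toText", data=body, headers=headers, method="POST")


# Placeholder bytes and key for illustration only.
request = build_image_to_text_request(b"<image bytes>", "scan.png", "YOUR-API-KEY")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the JSON body of a successful response should then carry the MeanConfidenceLevel and TextResult fields shown above.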

Convert a scanned image into words with location

POST /ocr/image/to/words-with-location


Converts an uploaded image in common formats such as JPEG, PNG into words/text with location information and other metadata via Optical Character Recognition. This API is intended to be run on scanned documents. If you want to OCR photos (e.g. taken with a smart phone camera), be sure to use the photo/toText API instead, as it is designed to unskew the image first.



imageFile: file
in formData

Image file to perform OCR on. Common file formats such as PNG, JPEG are supported.

language: string
in header

Optional. Language of the input document; the default is English (ENG). Possible values are ENG (English), ARA (Arabic), ZHO (Chinese - Simplified), ZHO-HANT (Chinese - Traditional), ASM (Assamese), AFR (Afrikaans), AMH (Amharic), AZE (Azerbaijani), AZE-CYRL (Azerbaijani - Cyrillic), BEL (Belarusian), BEN (Bengali), BOD (Tibetan), BOS (Bosnian), BUL (Bulgarian), CAT (Catalan; Valencian), CEB (Cebuano), CES (Czech), CHR (Cherokee), CYM (Welsh), DAN (Danish), DEU (German), DZO (Dzongkha), ELL (Greek), ENM (Archaic/Middle English), EPO (Esperanto), EST (Estonian), EUS (Basque), FAS (Persian), FIN (Finnish), FRA (French), FRK (Frankish), FRM (Middle French), GLE (Irish), GLG (Galician), GRC (Ancient Greek), HAT (Haitian), HEB (Hebrew), HIN (Hindi), HRV (Croatian), HUN (Hungarian), IKU (Inuktitut), IND (Indonesian), ISL (Icelandic), ITA (Italian), ITA-OLD (Old Italian), JAV (Javanese), JPN (Japanese), KAN (Kannada), KAT (Georgian), KAT-OLD (Old Georgian), KAZ (Kazakh), KHM (Central Khmer), KIR (Kirghiz), KOR (Korean), KUR (Kurdish), LAO (Lao), LAT (Latin), LAV (Latvian), LIT (Lithuanian), MAL (Malayalam), MAR (Marathi), MKD (Macedonian), MLT (Maltese), MSA (Malay), MYA (Burmese), NEP (Nepali), NLD (Dutch), NOR (Norwegian), ORI (Oriya), PAN (Panjabi), POL (Polish), POR (Portuguese), PUS (Pushto), RON (Romanian), RUS (Russian), SAN (Sanskrit), SIN (Sinhala), SLK (Slovak), SLV (Slovenian), SPA (Spanish), SPA-OLD (Old Spanish), SQI (Albanian), SRP (Serbian), SRP-LAT (Latin Serbian), SWA (Swahili), SWE (Swedish), SYR (Syriac), TAM (Tamil), TEL (Telugu), TGK (Tajik), TGL (Tagalog), THA (Thai), TIR (Tigrinya), TUR (Turkish), UIG (Uighur), UKR (Ukrainian), URD (Urdu), UZB (Uzbek), UZB-CYR (Cyrillic Uzbek), VIE (Vietnamese), YID (Yiddish).

Response Content-Types: application/json, text/json, application/xml, text/xml
Response Example (200 OK)
{
  "Successful": "boolean",
  "Words": [
    {
      "WordText": "string",
      "LineNumber": "integer (int32)",
      "WordNumber": "integer (int32)",
      "XLeft": "integer (int32)",
      "YTop": "integer (int32)",
      "Width": "integer (int32)",
      "Height": "integer (int32)",
      "ConfidenceLevel": "number (double)",
      "BlockNumber": "integer (int32)",
      "ParagraphNumber": "integer (int32)",
      "PageNumber": "integer (int32)"
    }
  ]
}
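
One way to consume a response of this shape is to keep only the confident words and turn each into a pixel bounding box. The sketch below works on an already-decoded JSON dict; the function name, the 0.5 cutoff, and the sample values are illustrative assumptions rather than anything the API prescribes.

```python
def confident_word_boxes(result, min_confidence=0.5):
    """Return (text, left, top, right, bottom) for words at or above min_confidence."""
    boxes = []
    for word in result.get("Words") or []:
        if word["ConfidenceLevel"] >= min_confidence:
            left, top = word["XLeft"], word["YTop"]
            # Right and bottom edges follow from the documented Width and Height.
            boxes.append((word["WordText"], left, top,
                          left + word["Width"], top + word["Height"]))
    return boxes


# Made-up sample data in the documented response shape.
sample = {
    "Successful": True,
    "Words": [
        {"WordText": "Invoice", "XLeft": 10, "YTop": 20, "Width": 70,
         "Height": 15, "ConfidenceLevel": 0.97},
        {"WordText": "t0tal", "XLeft": 90, "YTop": 20, "Width": 40,
         "Height": 15, "ConfidenceLevel": 0.31},
    ],
}
print(confident_word_boxes(sample))
```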

Convert a scanned image into lines with location

POST /ocr/image/to/lines-with-location


Converts an uploaded image in common formats such as JPEG, PNG into lines/text with location information and other metadata via Optical Character Recognition. This API is intended to be run on scanned documents. If you want to OCR photos (e.g. taken with a smart phone camera), be sure to use the photo/toText API instead, as it is designed to unskew the image first.



imageFile: file
in formData

Image file to perform OCR on. Common file formats such as PNG, JPEG are supported.

language: string
in header

Optional. Language of the input document; the default is English (ENG). Possible values are ENG (English), ARA (Arabic), ZHO (Chinese - Simplified), ZHO-HANT (Chinese - Traditional), ASM (Assamese), AFR (Afrikaans), AMH (Amharic), AZE (Azerbaijani), AZE-CYRL (Azerbaijani - Cyrillic), BEL (Belarusian), BEN (Bengali), BOD (Tibetan), BOS (Bosnian), BUL (Bulgarian), CAT (Catalan; Valencian), CEB (Cebuano), CES (Czech), CHR (Cherokee), CYM (Welsh), DAN (Danish), DEU (German), DZO (Dzongkha), ELL (Greek), ENM (Archaic/Middle English), EPO (Esperanto), EST (Estonian), EUS (Basque), FAS (Persian), FIN (Finnish), FRA (French), FRK (Frankish), FRM (Middle French), GLE (Irish), GLG (Galician), GRC (Ancient Greek), HAT (Haitian), HEB (Hebrew), HIN (Hindi), HRV (Croatian), HUN (Hungarian), IKU (Inuktitut), IND (Indonesian), ISL (Icelandic), ITA (Italian), ITA-OLD (Old Italian), JAV (Javanese), JPN (Japanese), KAN (Kannada), KAT (Georgian), KAT-OLD (Old Georgian), KAZ (Kazakh), KHM (Central Khmer), KIR (Kirghiz), KOR (Korean), KUR (Kurdish), LAO (Lao), LAT (Latin), LAV (Latvian), LIT (Lithuanian), MAL (Malayalam), MAR (Marathi), MKD (Macedonian), MLT (Maltese), MSA (Malay), MYA (Burmese), NEP (Nepali), NLD (Dutch), NOR (Norwegian), ORI (Oriya), PAN (Panjabi), POL (Polish), POR (Portuguese), PUS (Pushto), RON (Romanian), RUS (Russian), SAN (Sanskrit), SIN (Sinhala), SLK (Slovak), SLV (Slovenian), SPA (Spanish), SPA-OLD (Old Spanish), SQI (Albanian), SRP (Serbian), SRP-LAT (Latin Serbian), SWA (Swahili), SWE (Swedish), SYR (Syriac), TAM (Tamil), TEL (Telugu), TGK (Tajik), TGL (Tagalog), THA (Thai), TIR (Tigrinya), TUR (Turkish), UIG (Uighur), UKR (Ukrainian), URD (Urdu), UZB (Uzbek), UZB-CYR (Cyrillic Uzbek), VIE (Vietnamese), YID (Yiddish).

Response Content-Types: application/json, text/json, application/xml, text/xml
Response Example (200 OK)
{
  "Successful": "boolean",
  "Lines": [
    {
      "LineText": "string",
      "Words": [
        {
          "WordText": "string",
          "LineNumber": "integer (int32)",
          "WordNumber": "integer (int32)",
          "XLeft": "integer (int32)",
          "YTop": "integer (int32)",
          "Width": "integer (int32)",
          "Height": "integer (int32)",
          "ConfidenceLevel": "number (double)",
          "BlockNumber": "integer (int32)",
          "ParagraphNumber": "integer (int32)",
          "PageNumber": "integer (int32)"
        }
      ]
    }
  ]
}
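
Because each line carries its words with pixel positions, a bounding box for the whole line can be derived as the union of its word boxes. The following is a sketch under that reading of the fields; the function name and the sample values are made up for illustration.

```python
def line_bounding_boxes(result):
    """Return (line_text, left, top, right, bottom) for each line that has words."""
    boxes = []
    for line in result.get("Lines") or []:
        words = line.get("Words") or []
        if not words:
            continue  # nothing to measure for an empty line
        left = min(w["XLeft"] for w in words)
        top = min(w["YTop"] for w in words)
        right = max(w["XLeft"] + w["Width"] for w in words)
        bottom = max(w["YTop"] + w["Height"] for w in words)
        boxes.append((line["LineText"], left, top, right, bottom))
    return boxes


# Made-up sample data in the documented response shape.
sample = {
    "Successful": True,
    "Lines": [
        {"LineText": "Hello world",
         "Words": [
             {"WordText": "Hello", "XLeft": 10, "YTop": 12, "Width": 50, "Height": 14},
             {"WordText": "world", "XLeft": 66, "YTop": 10, "Width": 52, "Height": 16},
         ]},
    ],
}
print(line_bounding_boxes(sample))
```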

Convert a photo of a document into text

POST /ocr/photo/toText


Converts an uploaded photo of a document in common formats such as JPEG, PNG into text via Optical Character Recognition. This API is intended to be run on photos of documents (e.g. taken with a smartphone) and supports cases where other content, such as a desk, is in the frame and the camera is crooked. If you want to OCR a scanned image, use the image/toText API call instead, as it is designed for scanned images.



imageFile: file
in formData

Image file to perform OCR on. Common file formats such as PNG, JPEG are supported.

language: string
in header

Optional. Language of the input document; the default is English (ENG). Possible values are ENG (English), ARA (Arabic), ZHO (Chinese - Simplified), ZHO-HANT (Chinese - Traditional), ASM (Assamese), AFR (Afrikaans), AMH (Amharic), AZE (Azerbaijani), AZE-CYRL (Azerbaijani - Cyrillic), BEL (Belarusian), BEN (Bengali), BOD (Tibetan), BOS (Bosnian), BUL (Bulgarian), CAT (Catalan; Valencian), CEB (Cebuano), CES (Czech), CHR (Cherokee), CYM (Welsh), DAN (Danish), DEU (German), DZO (Dzongkha), ELL (Greek), ENM (Archaic/Middle English), EPO (Esperanto), EST (Estonian), EUS (Basque), FAS (Persian), FIN (Finnish), FRA (French), FRK (Frankish), FRM (Middle French), GLE (Irish), GLG (Galician), GRC (Ancient Greek), HAT (Haitian), HEB (Hebrew), HIN (Hindi), HRV (Croatian), HUN (Hungarian), IKU (Inuktitut), IND (Indonesian), ISL (Icelandic), ITA (Italian), ITA-OLD (Old Italian), JAV (Javanese), JPN (Japanese), KAN (Kannada), KAT (Georgian), KAT-OLD (Old Georgian), KAZ (Kazakh), KHM (Central Khmer), KIR (Kirghiz), KOR (Korean), KUR (Kurdish), LAO (Lao), LAT (Latin), LAV (Latvian), LIT (Lithuanian), MAL (Malayalam), MAR (Marathi), MKD (Macedonian), MLT (Maltese), MSA (Malay), MYA (Burmese), NEP (Nepali), NLD (Dutch), NOR (Norwegian), ORI (Oriya), PAN (Panjabi), POL (Polish), POR (Portuguese), PUS (Pushto), RON (Romanian), RUS (Russian), SAN (Sanskrit), SIN (Sinhala), SLK (Slovak), SLV (Slovenian), SPA (Spanish), SPA-OLD (Old Spanish), SQI (Albanian), SRP (Serbian), SRP-LAT (Latin Serbian), SWA (Swahili), SWE (Swedish), SYR (Syriac), TAM (Tamil), TEL (Telugu), TGK (Tajik), TGL (Tagalog), THA (Thai), TIR (Tigrinya), TUR (Turkish), UIG (Uighur), UKR (Ukrainian), URD (Urdu), UZB (Uzbek), UZB-CYR (Cyrillic Uzbek), VIE (Vietnamese), YID (Yiddish).

Response Content-Types: application/json, text/json, application/xml, text/xml
Response Example (200 OK)
{
  "MeanConfidenceLevel": "number (float)",
  "TextResult": "string"
}
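
The guidance on choosing between the scanned-image and photo endpoints can be captured in a small helper. This is only a sketch: the function name and the `is_photo` flag are hypothetical, and deciding whether an input is a photo is left to the caller.

```python
SCANNED_IMAGE_PATH = "/ocr/image/toText"
PHOTO_PATH = "/ocr/photo/toText"


def text_endpoint(is_photo: bool) -> str:
    """Pick the OCR path: photo/toText for camera photos, image/toText for scans."""
    return PHOTO_PATH if is_photo else SCANNED_IMAGE_PATH


print(text_endpoint(is_photo=True))
print(text_endpoint(is_photo=False))
```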

PdfOcr

Convert a PDF file into text

POST /ocr/pdf/toText


imageFile: file
in formData

PDF file to perform OCR on.

language: string
in header

Optional. Language of the input document; the default is English (ENG). Possible values are ENG (English), ARA (Arabic), ZHO (Chinese - Simplified), ZHO-HANT (Chinese - Traditional), ASM (Assamese), AFR (Afrikaans), AMH (Amharic), AZE (Azerbaijani), AZE-CYRL (Azerbaijani - Cyrillic), BEL (Belarusian), BEN (Bengali), BOD (Tibetan), BOS (Bosnian), BUL (Bulgarian), CAT (Catalan; Valencian), CEB (Cebuano), CES (Czech), CHR (Cherokee), CYM (Welsh), DAN (Danish), DEU (German), DZO (Dzongkha), ELL (Greek), ENM (Archaic/Middle English), EPO (Esperanto), EST (Estonian), EUS (Basque), FAS (Persian), FIN (Finnish), FRA (French), FRK (Frankish), FRM (Middle French), GLE (Irish), GLG (Galician), GRC (Ancient Greek), HAT (Haitian), HEB (Hebrew), HIN (Hindi), HRV (Croatian), HUN (Hungarian), IKU (Inuktitut), IND (Indonesian), ISL (Icelandic), ITA (Italian), ITA-OLD (Old Italian), JAV (Javanese), JPN (Japanese), KAN (Kannada), KAT (Georgian), KAT-OLD (Old Georgian), KAZ (Kazakh), KHM (Central Khmer), KIR (Kirghiz), KOR (Korean), KUR (Kurdish), LAO (Lao), LAT (Latin), LAV (Latvian), LIT (Lithuanian), MAL (Malayalam), MAR (Marathi), MKD (Macedonian), MLT (Maltese), MSA (Malay), MYA (Burmese), NEP (Nepali), NLD (Dutch), NOR (Norwegian), ORI (Oriya), PAN (Panjabi), POL (Polish), POR (Portuguese), PUS (Pushto), RON (Romanian), RUS (Russian), SAN (Sanskrit), SIN (Sinhala), SLK (Slovak), SLV (Slovenian), SPA (Spanish), SPA-OLD (Old Spanish), SQI (Albanian), SRP (Serbian), SRP-LAT (Latin Serbian), SWA (Swahili), SWE (Swedish), SYR (Syriac), TAM (Tamil), TEL (Telugu), TGK (Tajik), TGL (Tagalog), THA (Thai), TIR (Tigrinya), TUR (Turkish), UIG (Uighur), UKR (Ukrainian), URD (Urdu), UZB (Uzbek), UZB-CYR (Cyrillic Uzbek), VIE (Vietnamese), YID (Yiddish).

preprocessing: string
in header

Optional, preprocessing mode, default is 'Auto'. Possible values are None (no preprocessing of the image), and Auto (automatic image enhancement of the image before OCR is applied; this is recommended).

Response Content-Types: application/json, text/json, application/xml, text/xml
Response Example (200 OK)
{
  "Successful": "boolean",
  "OcrPages": [
    {
      "PageNumber": "integer (int32)",
      "MeanConfidenceLevel": "number (float)",
      "TextResult": "string"
    }
  ]
}
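
A per-page response like the one above usually needs to be stitched back into one document. The sketch below joins the page texts in PageNumber order. Treating a false Successful flag as an error is an assumption (the flag's exact semantics are not described here), and the function name and sample values are illustrative.

```python
def pdf_text(response, separator="\n\n"):
    """Join the TextResult of every page, ordered by PageNumber."""
    # ASSUMPTION: a falsy Successful flag means the result should not be used.
    if not response.get("Successful"):
        raise ValueError("OCR operation did not report success")
    pages = sorted(response.get("OcrPages") or [], key=lambda p: p["PageNumber"])
    return separator.join(page["TextResult"] or "" for page in pages)


# Made-up sample data; pages are deliberately out of order.
sample = {
    "Successful": True,
    "OcrPages": [
        {"PageNumber": 2, "MeanConfidenceLevel": 0.88, "TextResult": "second"},
        {"PageNumber": 1, "MeanConfidenceLevel": 0.91, "TextResult": "first"},
    ],
}
print(pdf_text(sample))
```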

Preprocessing

Detect and unrotate a document image

POST /ocr/preprocessing/image/unrotate


Detects and corrects the rotation of a document image (e.g. one that was scanned at an angle). Useful for document scanning applications: once unrotated, the image is ready for conversion to PDF using the Convert API or for optical character recognition using the OCR API.



imageFile: file
in formData

Image file to unrotate. Common file formats such as PNG, JPEG are supported.

Response (200 OK): OK

type: object
Response Content-Types: application/json, text/json, application/xml, text/xml
Response Example (200 OK)
"object"

Detect and unskew a photo of a document

POST /ocr/preprocessing/image/unskew


Detect and unskew a photo of a document (e.g. taken on a cell phone) into a perfectly square image. Great for document scanning applications; once unskewed, this image is perfect for converting to PDF using the Convert API or optical character recognition using the OCR API.



imageFile: file
in formData

Image file to unskew. Common file formats such as PNG, JPEG are supported.

Response (200 OK): OK

type: object
Response Content-Types: application/json, text/json, application/xml, text/xml
Response Example (200 OK)
"object"
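
The preprocessing endpoints are described as a step before OCR, which suggests a two-call pipeline. The sketch below chains the two calls through a caller-supplied transport so that it runs without network access. It rests on a loud assumption: that the unskew response body is the processed image itself, which the "object" response example above does not confirm. The `Sender` type, `unskew_then_ocr`, and `fake_send` are all hypothetical names.

```python
from typing import Callable

UNSKEW_PATH = "/ocr/preprocessing/image/unskew"
TO_TEXT_PATH = "/ocr/image/toText"

# send(path, image_bytes) -> raw response body; supplied by the caller.
Sender = Callable[[str, bytes], bytes]


def unskew_then_ocr(photo_bytes: bytes, send: Sender) -> bytes:
    """Unskew a photo, then pass the result to the scanned-image OCR endpoint."""
    # ASSUMPTION: the unskew response body is the processed image bytes.
    unskewed = send(UNSKEW_PATH, photo_bytes)
    return send(TO_TEXT_PATH, unskewed)


def fake_send(path: str, payload: bytes) -> bytes:
    """Stand-in transport that records the call instead of using the network."""
    return path.encode() + b" <- " + payload


print(unskew_then_ocr(b"photo", fake_send))
```

In real use, `send` would wrap an authenticated multipart POST to https://api.cloudmersive.com; keeping it injectable makes the call order easy to verify in isolation.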

Schema Definitions

ImageToTextResponse: object

Response from an OCR to text operation. Includes the confidence rating and converted text result.

MeanConfidenceLevel: number (float)

Confidence level rating of the OCR operation; ratings above 80% are strong.

TextResult: string

Converted text string from the image input.

Example
{
  "MeanConfidenceLevel": "number (float)",
  "TextResult": "string"
}
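
A decoded response of this shape can be wrapped in a small typed object that applies the "above 80%" guidance. This is a sketch only: the class and method names are illustrative, and it assumes the level is reported as a fraction between 0.0 and 1.0 (so 80% is 0.8), which this schema does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class ImageToTextResponse:
    mean_confidence_level: float
    text_result: str

    @classmethod
    def from_dict(cls, data: dict) -> "ImageToTextResponse":
        """Map the documented PascalCase JSON fields onto Python attributes."""
        return cls(float(data["MeanConfidenceLevel"]), data["TextResult"])

    def is_strong(self, threshold: float = 0.8) -> bool:
        # ASSUMPTION: the level is a fraction in 0.0-1.0; pass threshold=80
        # instead if the service turns out to report percentages.
        return self.mean_confidence_level > threshold


# Made-up sample values.
result = ImageToTextResponse.from_dict(
    {"MeanConfidenceLevel": 0.93, "TextResult": "Hello"})
print(result.text_result, result.is_strong())
```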

ImageToWordsWithLocationResult: object

Result of an image to words-with-location OCR operation

Successful: boolean
Words: array of OcrWordElement
Example
{
  "Successful": "boolean",
  "Words": [
    {
      "WordText": "string",
      "LineNumber": "integer (int32)",
      "WordNumber": "integer (int32)",
      "XLeft": "integer (int32)",
      "YTop": "integer (int32)",
      "Width": "integer (int32)",
      "Height": "integer (int32)",
      "ConfidenceLevel": "number (double)",
      "BlockNumber": "integer (int32)",
      "ParagraphNumber": "integer (int32)",
      "PageNumber": "integer (int32)"
    }
  ]
}

OcrWordElement: object

A single word in an OCR document

WordText: string

Text of the word

LineNumber: integer (int32)

Line number of the word

WordNumber: integer (int32)

Index of the word in the line

XLeft: integer (int32)

X location of the left edge of the word in pixels

YTop: integer (int32)

Y location of the top edge of the word in pixels

Width: integer (int32)

Width of the word in pixels

Height: integer (int32)

Height of the word in pixels

ConfidenceLevel: number (double)

Confidence level of the machine learning result; values range from 0.0 (lowest accuracy) to 1.0 (highest accuracy)

BlockNumber: integer (int32)

Index of the containing block

ParagraphNumber: integer (int32)

Index of the containing paragraph

PageNumber: integer (int32)

Index of the containing page

Example
{
  "WordText": "string",
  "LineNumber": "integer (int32)",
  "WordNumber": "integer (int32)",
  "XLeft": "integer (int32)",
  "YTop": "integer (int32)",
  "Width": "integer (int32)",
  "Height": "integer (int32)",
  "ConfidenceLevel": "number (double)",
  "BlockNumber": "integer (int32)",
  "ParagraphNumber": "integer (int32)",
  "PageNumber": "integer (int32)"
}
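
The positional indices on each word make it possible to put a flat word list back into reading order. The sketch below sorts by page, block, paragraph, line, and word index. It assumes these indices increase in reading order and that LineNumber is meaningful within that hierarchy, neither of which is spelled out above; the function names and sample values are illustrative.

```python
def reading_order_key(word):
    """Sort key built from the documented positional indices."""
    return (word["PageNumber"], word["BlockNumber"], word["ParagraphNumber"],
            word["LineNumber"], word["WordNumber"])


def words_in_reading_order(words):
    """Return the WordText values sorted by the positional indices."""
    return [w["WordText"] for w in sorted(words, key=reading_order_key)]


# Made-up sample words, deliberately shuffled.
sample_words = [
    {"WordText": "world", "PageNumber": 1, "BlockNumber": 1,
     "ParagraphNumber": 1, "LineNumber": 1, "WordNumber": 2},
    {"WordText": "Next", "PageNumber": 1, "BlockNumber": 1,
     "ParagraphNumber": 1, "LineNumber": 2, "WordNumber": 1},
    {"WordText": "Hello", "PageNumber": 1, "BlockNumber": 1,
     "ParagraphNumber": 1, "LineNumber": 1, "WordNumber": 1},
]
print(" ".join(words_in_reading_order(sample_words)))
```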

ImageToLinesWithLocationResult: object

Result of an image to lines-with-location OCR operation

Successful: boolean
Lines: array of OcrLineElement

Lines in the image
Example
{
  "Successful": "boolean",
  "Lines": [
    {
      "LineText": "string",
      "Words": [
        {
          "WordText": "string",
          "LineNumber": "integer (int32)",
          "WordNumber": "integer (int32)",
          "XLeft": "integer (int32)",
          "YTop": "integer (int32)",
          "Width": "integer (int32)",
          "Height": "integer (int32)",
          "ConfidenceLevel": "number (double)",
          "BlockNumber": "integer (int32)",
          "ParagraphNumber": "integer (int32)",
          "PageNumber": "integer (int32)"
        }
      ]
    }
  ]
}

OcrLineElement: object

A contiguous line of text in an OCR document

LineText: string

Text of the line

Words: array of OcrWordElement

Word objects in the line
Example
{
  "LineText": "string",
  "Words": [
    {
      "WordText": "string",
      "LineNumber": "integer (int32)",
      "WordNumber": "integer (int32)",
      "XLeft": "integer (int32)",
      "YTop": "integer (int32)",
      "Width": "integer (int32)",
      "Height": "integer (int32)",
      "ConfidenceLevel": "number (double)",
      "BlockNumber": "integer (int32)",
      "ParagraphNumber": "integer (int32)",
      "PageNumber": "integer (int32)"
    }
  ]
}

PdfToTextResponse: object

Response from a PDF OCR to text operation. Includes the confidence rating and converted text result for each page.

Successful: boolean
OcrPages: array of OcrPageResult
Example
{
  "Successful": "boolean",
  "OcrPages": [
    {
      "PageNumber": "integer (int32)",
      "MeanConfidenceLevel": "number (float)",
      "TextResult": "string"
    }
  ]
}

OcrPageResult: object

PageNumber: integer (int32)

Page number of the page that was OCR-ed, starting with 1 for the first page in the PDF file

MeanConfidenceLevel: number (float)

Confidence level rating of the OCR operation; ratings above 80% are strong.

TextResult: string

Converted text string from the image input.

Example
{
  "PageNumber": "integer (int32)",
  "MeanConfidenceLevel": "number (float)",
  "TextResult": "string"
}