Speech and Voice Recognition API Reference

Speech APIs enable you to recognize speech and convert it to text using advanced machine learning, and also to convert text to speech.

Swagger OpenAPI Specification | .NET Framework Client | .NET Core Client | Java Client | Node.JS Client | Python Client | Drupal Client

API Endpoint
https://api.cloudmersive.com
Schemes: https
Version: v1

Authentication

Apikey

API Key Authentication

type: apiKey
name: Apikey
in header
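
The scheme above places the key in a request header named Apikey. A minimal sketch of attaching it, assuming Python's standard library; the helper name `build_authenticated_request` and the placeholder key are illustrative and not part of the API:

```python
import urllib.request

API_BASE = "https://api.cloudmersive.com"  # endpoint listed in this reference


def build_authenticated_request(path, api_key, data=None, method="POST"):
    """Return a urllib Request that carries the key in the 'Apikey' header.

    Hypothetical helper: only the header name and the base URL are taken
    from this reference. The request is built but not sent.
    """
    request = urllib.request.Request(API_BASE + path, data=data, method=method)
    request.add_header("Apikey", api_key)
    return request
```

Calling `build_authenticated_request("/speech/recognize/file", "YOUR-API-KEY")` yields a request object that can later be passed to `urllib.request.urlopen`.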

Recognize

Recognize audio input as text using machine learning

POST /speech/recognize/file


Uses advanced machine learning to convert input audio, which can be mp3 or wav, into text.



speechFile: file
in formData

Audio file to transcribe. Common file formats such as WAV and MP3 are supported.

Response Content-Types: application/json, text/json, application/xml, text/xml
Response Example (200 OK)
{
  "TextResult": "string"
}
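
Since speechFile is a formData parameter, the upload is presumably sent as multipart/form-data. The following is a rough sketch using only Python's standard library; the function name `build_recognize_request`, the part's `application/octet-stream` content type, and the boundary scheme are assumptions made for illustration, not details confirmed by this reference:

```python
import uuid
import urllib.request

RECOGNIZE_URL = "https://api.cloudmersive.com/speech/recognize/file"


def build_recognize_request(audio_bytes, filename, api_key):
    """Build (but do not send) a multipart/form-data POST for speechFile."""
    boundary = uuid.uuid4().hex  # random token separating the form parts
    body = b"".join([
        b"--" + boundary.encode("ascii") + b"\r\n",
        b'Content-Disposition: form-data; name="speechFile"; filename="'
        + filename.encode("utf-8") + b'"\r\n',
        # Assumed part type; the reference does not state one.
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes,
        b"\r\n--" + boundary.encode("ascii") + b"--\r\n",
    ])
    request = urllib.request.Request(RECOGNIZE_URL, data=body, method="POST")
    request.add_header("Apikey", api_key)
    request.add_header("Content-Type", "multipart/form-data; boundary=" + boundary)
    return request
```

To send it, pass the request to `urllib.request.urlopen`; if the service answers with JSON, the transcript should be available under the `TextResult` key of the decoded body.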

Speak

Perform text-to-speech on a string

POST /speech/speak/text/voice/basic/audio


Takes a string and a file format (mp3 or wav) as input and returns the synthesized audio in that format.



Request body: the text to convert and the output format (see TextToSpeechRequest).

Request Content-Types: application/json, text/json, application/xml, text/xml, application/x-www-form-urlencoded
Request Example
{
  "Format": "string",
  "Text": "string"
}
200 OK

type: object
Response Content-Types: application/json, text/json, application/xml, text/xml
Response Example (200 OK)
"object"

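As an illustration of the request shown above for POST /speech/speak/text/voice/basic/audio, here is a hedged sketch that serializes the Format and Text fields as JSON. The function name `build_speak_request` and the client-side format check are assumptions added for the example:

```python
import json
import urllib.request

SPEAK_URL = "https://api.cloudmersive.com/speech/speak/text/voice/basic/audio"


def build_speak_request(text, audio_format, api_key):
    """Build (but do not send) a JSON text-to-speech request."""
    # Client-side guard (an assumption): the reference lists only mp3 and wav.
    if audio_format not in ("mp3", "wav"):
        raise ValueError("audio_format must be 'mp3' or 'wav'")
    payload = json.dumps({"Format": audio_format, "Text": text}).encode("utf-8")
    request = urllib.request.Request(SPEAK_URL, data=payload, method="POST")
    request.add_header("Apikey", api_key)
    request.add_header("Content-Type", "application/json")
    return request
```

The request can then be sent with `urllib.request.urlopen`; how the audio is encoded in the response is not spelled out here, so inspect the response before relying on a particular shape.
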
Perform text-to-speech on a string

POST /speech/speak/text/basicVoice/{format}


Takes a string and a file format (mp3 or wav) as input and returns the synthesized audio in that format.



The text you would like to convert to speech. Be sure to surround it with quotes, e.g. "The quick brown fox jumps over the lazy dog."

format: string
in path

File format of the generated response; possible values are "mp3" and "wav".

Request Content-Types: application/json, text/json, application/xml, text/xml, application/x-www-form-urlencoded
Request Example
"string"
200 OK

type: object
Response Content-Types: application/octet-stream
Response Example (200 OK)
"object"

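For POST /speech/speak/text/basicVoice/{format}, the format goes into the URL path and the body is a single quoted string. A minimal sketch under those assumptions; `build_basic_voice_request` is a hypothetical name, and using `json.dumps` to produce the surrounding quotes is one possible reading of the instruction to quote the text:

```python
import json
import urllib.request

BASIC_VOICE_URL = "https://api.cloudmersive.com/speech/speak/text/basicVoice/{format}"


def build_basic_voice_request(text, audio_format, api_key):
    """Build (but do not send) a request with the format in the URL path."""
    if audio_format not in ("mp3", "wav"):
        raise ValueError("audio_format must be 'mp3' or 'wav'")
    url = BASIC_VOICE_URL.format(format=audio_format)
    # json.dumps on a str adds the surrounding double quotes and escapes
    # any quotes inside the text.
    body = json.dumps(text).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Apikey", api_key)
    request.add_header("Content-Type", "application/json")
    return request
```

Because the response content type is application/octet-stream, the bytes returned by `urllib.request.urlopen(request).read()` can be written directly to a file opened in binary mode.
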
Schema Definitions

SpeechRecognitionResult: object

Result of recognizing speech

TextResult: string

Recognition result in text format

Example
{
  "TextResult": "string"
}
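
A short sketch of reading this schema on the client side, assuming the response was requested as JSON. The dataclass and the function `parse_recognition_result` are illustrative names, not part of any official client:

```python
import json
from dataclasses import dataclass


@dataclass
class SpeechRecognitionResult:
    """Client-side mirror of the SpeechRecognitionResult schema."""
    text_result: str


def parse_recognition_result(raw_json):
    """Decode a JSON response body into a SpeechRecognitionResult."""
    data = json.loads(raw_json)
    return SpeechRecognitionResult(text_result=data["TextResult"])
```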

TextToSpeechRequest: object

Input to a Text To Speech request

Format: string

File format of the output audio file: wav or mp3. Defaults to mp3.

Text: string

Text to be converted to speech

Example
{
  "Format": "string",
  "Text": "string"
}
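
The schema could be mirrored on the client roughly as follows, with the mp3 default applied when no format is given. The class layout and the `to_json` method are assumptions made for illustration:

```python
import json
from dataclasses import dataclass


@dataclass
class TextToSpeechRequest:
    """Client-side mirror of the TextToSpeechRequest schema."""
    text: str
    audio_format: str = "mp3"  # default named in the schema description

    def to_json(self):
        """Serialize using the field names the schema defines."""
        return json.dumps({"Format": self.audio_format, "Text": self.text})
```

For example, `TextToSpeechRequest("Hello").to_json()` produces a body with `Format` set to `mp3` and `Text` set to `Hello`.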