API Reference
This page describes the API request and response protocols for the ASR, MT, TTS and Video Interpolation services.
ASR (Speech-to-Text)
The endpoint for the ASR service is https://asr.iitm.ac.in/internal/asr/decode
Request keys
| Key | Description |
|---|---|
| `file` | the media file to be transcribed |
| `language` | the language of the source audio/video, in all lowercase (e.g. `hindi`, `english`) |
| `vtt` (optional) | whether a WebVTT caption file should be generated, for captioning purposes. Accepts the string values `true` or `false`; defaults to `false`. |
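As a minimal sketch, the non-file request keys can be assembled as plain form fields in Python. The helper name `build_asr_fields` is hypothetical; it only encodes the rules stated in the table (lowercase language name, `vtt` sent as the string `true` or `false`) and leaves attaching the `file` key to whatever HTTP client is used.

```python
def build_asr_fields(language: str, vtt: bool = False) -> dict[str, str]:
    """Build the non-file request keys for the ASR endpoint.

    Hypothetical helper: the `file` key is attached separately by the HTTP client.
    """
    if not language.strip():
        raise ValueError("language must be a non-empty string")
    return {
        # The API expects the language name in all lowercase, e.g. "hindi".
        "language": language.strip().lower(),
        # The optional vtt key takes the string "true" or "false", not a boolean.
        "vtt": "true" if vtt else "false",
    }


print(build_asr_fields("Hindi", vtt=True))
# {'language': 'hindi', 'vtt': 'true'}
```

Sending `vtt` explicitly as `"false"` is redundant given the documented default, but it keeps the request self-describing.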
Response keys
When the request is served successfully, the API returns a JSON response with the following keys:
| Key | Description |
|---|---|
| `status` | `success` |
| `time_taken` | time taken to transcribe the given audio/video, in seconds |
| `transcript` | the transcription of the given speech input, as inferred by the deployed model |
| `vtt` | the WebVTT caption, returned only if the `vtt` key was set to `true` in the request |
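A successful response can be decoded into a small typed record. This is a sketch under two assumptions not stated in the table: `time_taken` arrives as a number (or numeric string), and the `vtt` key is simply absent when no caption was requested. The names `ASRResult` and `parse_asr_success` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ASRResult:
    """Fields of a successful ASR response (key names from the table above)."""
    time_taken: float          # seconds spent transcribing
    transcript: str            # text inferred by the deployed model
    vtt: Optional[str] = None  # WebVTT caption; assumed absent unless vtt=true was sent


def parse_asr_success(body: str) -> ASRResult:
    """Decode a JSON response body whose status is "success"."""
    data = json.loads(body)
    if data.get("status") != "success":
        raise ValueError(f"not a success response: {data.get('status')!r}")
    return ASRResult(
        time_taken=float(data["time_taken"]),
        transcript=data["transcript"],
        vtt=data.get("vtt"),  # None when the key is missing
    )
```

If a caption was returned, `result.vtt` can be written unchanged to a `.vtt` file for use with an HTML5 `<track>` element or a video player.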
If the service fails to serve the request, the API returns a JSON response with the following keys:
| Key | Description |
|---|---|
| `status` | `failure` |
| `reason` | the reason the request could not be served |
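Because both outcomes share the `status` key, a client can branch on it once and surface `reason` as an exception. The sketch below assumes the failure payload is delivered as a JSON body; `ASRServiceError` and `check_asr_response` are hypothetical names.

```python
import json


class ASRServiceError(RuntimeError):
    """Raised when the ASR API reports status "failure" (or an unknown status)."""


def check_asr_response(body: str) -> dict:
    """Return the decoded JSON payload, or raise ASRServiceError with the API's reason."""
    data = json.loads(body)
    status = data.get("status")
    if status == "failure":
        # The failure payload carries a human-readable "reason" key.
        raise ASRServiceError(data.get("reason", "no reason given"))
    if status != "success":
        raise ASRServiceError(f"unexpected status: {status!r}")
    return data
```

Callers can catch `ASRServiceError` and log its message, which is the server's `reason` string. If the server also signals failures with a non-2xx HTTP status, `urllib.request.urlopen` raises `urllib.error.HTTPError` first; the JSON body can then be read from that exception with `err.read()`.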
Supported Languages
- Bengali
- English
- Gujarati
- Hindi
- Kannada
- Malayalam
- Marathi
- Odia
- Punjabi
- Sanskrit
- Tamil
- Telugu
- Urdu
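Since the `language` key must be the lowercase language name, a client can validate user input against the list above before uploading anything. This sketch assumes the accepted values are exactly the lowercase forms of the names listed (for example `odia`, not an alternative spelling); the list may change, so the constant should be kept in sync with this page. `SUPPORTED_LANGUAGES` and `to_language_key` are hypothetical names.

```python
# Language names from the "Supported Languages" list, in the lowercase form
# that the `language` request key expects (assumed to match exactly).
SUPPORTED_LANGUAGES = frozenset({
    "bengali", "english", "gujarati", "hindi", "kannada", "malayalam",
    "marathi", "odia", "punjabi", "sanskrit", "tamil", "telugu", "urdu",
})


def to_language_key(name: str) -> str:
    """Normalise a display name such as "Tamil" to the API value "tamil".

    Raises ValueError for languages not on the documented list, so the
    mistake is caught before a media file is uploaded.
    """
    key = name.strip().lower()
    if key not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {name!r}")
    return key


print(to_language_key("Tamil"))  # tamil
```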
Usage
Sample audio files for testing the API: English speech, Tamil speech.
The ASR API accepts media files in most common formats, such as .mp3, .mp4, .wav and .ogg.
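The page does not state the HTTP method or body encoding, so the following end-to-end sketch assumes a `POST` with a standard `multipart/form-data` upload, the usual way to send a file together with form fields. It uses only the Python standard library; with the third-party `requests` package the same call would be roughly `requests.post(url, files={"file": fh}, data={...})`. The function names, the MIME fallback and the 300-second default timeout are illustrative choices, not part of the API.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

ASR_ENDPOINT = "https://asr.iitm.ac.in/internal/asr/decode"


def build_asr_request(path: str, language: str, vtt: bool = False) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the file, language and vtt keys.

    Assumes the endpoint accepts a standard HTML-form style upload.
    """
    file_path = Path(path)
    boundary = uuid.uuid4().hex
    # Guess the media type from the extension; fall back to generic binary.
    mime = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
    fields = {"language": language.strip().lower(), "vtt": "true" if vtt else "false"}

    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The media file itself goes in the "file" part, with its original filename.
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
         f'filename="{file_path.name}"\r\nContent-Type: {mime}\r\n\r\n').encode()
        + file_path.read_bytes()
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing delimiter

    return urllib.request.Request(
        ASR_ENDPOINT,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def transcribe(path: str, language: str, vtt: bool = False, timeout: float = 300) -> str:
    """Send the request and return the raw JSON response body as text."""
    request = build_asr_request(path, language, vtt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A call such as `transcribe("sample.wav", "english", vtt=True)` (the file name is hypothetical) returns the JSON document described under Response keys. Building the request separately from sending it makes the encoding easy to inspect without touching the network.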
A web demo interface is available at https://asr.iitm.ac.in/demo/asr
Created: March 24, 2023