The entire workflow is as follows:
- Obtain an accessible URL for your audio/video file. You can do so through any type of online storage, or upload the file to AssemblyAI’s server.
- Make a POST call to the transcription API with both the audio URL and the content_safety parameter. This will start the transcription.
- Make a GET call to the transcription API to check if the process is complete. You should get the final output when the transcription is completed.
Upload a Local Audio File (Optional)
If you do not have an accessible URL, you can use the following step to upload your local audio file to the online storage provided by AssemblyAI.
In your working directory, create a new file called upload_file.py. Then, fill it with the following code snippet (replace the filename variables based on your use case):
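A minimal sketch of such a script is given here, using only Python's standard library. The v2 upload endpoint and the authorization header reflect AssemblyAI's public API as commonly documented; the ASSEMBLYAI_API_KEY environment variable, the 5 MB chunk size, and the helper names read_file and upload are assumptions made for illustration.

```python
import json
import os
import urllib.request

# Assumption: the API key is read from an environment variable.
api_key = os.environ.get("ASSEMBLYAI_API_KEY", "")
filename = "audio.mp3"  # placeholder: path to your local audio file
upload_endpoint = "https://api.assemblyai.com/v2/upload"


def read_file(path, chunk_size=5_242_880):
    """Yield the file's bytes in chunks of at most chunk_size (about 5 MB)."""
    with open(path, "rb") as audio_file:
        while True:
            chunk = audio_file.read(chunk_size)
            if not chunk:
                break
            yield chunk


def upload(path, key):
    """Stream the file to the upload endpoint and return the parsed JSON reply."""
    request = urllib.request.Request(
        upload_endpoint,
        data=read_file(path),  # an iterable body is sent with chunked encoding
        headers={"authorization": key, "content-type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    if not api_key:
        print("Set ASSEMBLYAI_API_KEY before running this script.")
    else:
        print(upload(filename, api_key))
```

The reply is expected to carry the uploaded file's address under an upload_url key; treat that key name as an assumption and check it against the JSON you actually receive.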
Continue by running the script in your terminal, for example with python upload_file.py.
It will upload your audio file in chunks to the server and return a JSON output once the upload is completed.
You will use the accessible URL from that output later on for transcription.
Make POST Call to Transcript API
Next, proceed by creating a new Python file called transcribe.py. Add the following code, which makes a POST call via AssemblyAI’s transcript API. Remember to replace the audio_url variables. One thing to note is that the content_safety parameter must be explicitly set to True to enable content moderation.
Apart from that, you can control the threshold for content moderation. By default, it is set to 50, but you can easily modify it by adding an additional content_safety_confidence parameter to the request.
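A minimal sketch of transcribe.py, again using only the standard library. The audio_url, content_safety, and content_safety_confidence fields come from the text above; the v2 transcript endpoint reflects AssemblyAI's public API as commonly documented, while the ASSEMBLYAI_API_KEY environment variable, the example URL, and the helper names build_payload and submit are assumptions for illustration.

```python
import json
import os
import urllib.request

# Assumption: the API key is read from an environment variable.
api_key = os.environ.get("ASSEMBLYAI_API_KEY", "")
audio_url = "https://example.com/audio.mp3"  # placeholder: your accessible URL
transcript_endpoint = "https://api.assemblyai.com/v2/transcript"


def build_payload(url, confidence=None):
    """Build the request body with content moderation switched on."""
    payload = {"audio_url": url, "content_safety": True}
    if confidence is not None:
        # Optional threshold; the default of 50 applies when it is omitted.
        payload["content_safety_confidence"] = confidence
    return payload


def submit(payload, key):
    """POST the payload to the transcript endpoint and return the JSON reply."""
    request = urllib.request.Request(
        transcript_endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": key, "content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    if not api_key:
        print("Set ASSEMBLYAI_API_KEY before running this script.")
    else:
        print(json.dumps(submit(build_payload(audio_url), api_key), indent=2))
```

To raise the moderation threshold, you could call build_payload(audio_url, confidence=75) instead; the value 75 is only an example.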
Start the transcription process by running the script, for example with python transcribe.py.
You should get a JSON-formatted output as follows:
"content_safety": true, ...
The most important key-value pairs are:
- id: represents the unique identifier for your process. This id is required when calling the GET API later on to get the final output.
- status: indicates the progress of your transcription.
In the event that you receive error as the status, it might be due to one of the following reasons:
- Unsupported audio file format
- Audio file did not contain audio data
- Audio file was too short (<200 milliseconds)
- URL of audio file is unreachable
- An error on the API side
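The id and status handling described above can be sketched as a small helper. The function name check_submission is hypothetical, and the presence of an error field carrying a message in a failed reply is an assumption; adapt it to the JSON you actually receive.

```python
def check_submission(response):
    """Return (id, status) from a transcript reply, raising if it failed."""
    status = response.get("status")
    if status == "error":
        # Assumption: a failed reply carries a human-readable "error" message.
        message = response.get("error", "no error message returned")
        raise RuntimeError(f"Transcription failed: {message}")
    return response["id"], status


# Illustrative reply; the id value is made up.
transcript_id, status = check_submission({"id": "abc123", "status": "queued"})
print(transcript_id, status)
```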
Make GET Call to Transcript API
You need to make another API call to the same transcript API via HTTP GET. This checks the transcription process and, if it is completed, you will get the final output, which comes with information for content moderation.
Also, the transcription process may take up to 10 minutes depending on the length of your file. Create a new Python file called transcribe_file.py with the following code:
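A minimal sketch of such a script, under a few assumptions: it polls the transcript endpoint until the status is completed or error, rather than issuing a single GET; the 5-second interval is arbitrary; the 600-second timeout simply mirrors the 10-minute figure above; and the names fetch_transcript and wait_for_transcript are hypothetical. The id returned by the POST call goes into transcript_id.

```python
import json
import os
import time
import urllib.request

# Assumption: the API key is read from an environment variable.
api_key = os.environ.get("ASSEMBLYAI_API_KEY", "")
transcript_id = "YOUR_TRANSCRIPT_ID"  # placeholder: the id from the POST call
transcript_endpoint = "https://api.assemblyai.com/v2/transcript"


def fetch_transcript(identifier, key):
    """GET the current state of one transcript as parsed JSON."""
    request = urllib.request.Request(
        f"{transcript_endpoint}/{identifier}",
        headers={"authorization": key},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_transcript(fetch, interval=5.0, timeout=600.0):
    """Call fetch() repeatedly until the status is final or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result.get("status") in ("completed", "error"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("transcription did not finish in time")
        time.sleep(interval)


if __name__ == "__main__":
    if not api_key:
        print("Set ASSEMBLYAI_API_KEY before running this script.")
    else:
        final = wait_for_transcript(lambda: fetch_transcript(transcript_id, api_key))
        print(json.dumps(final, indent=2))
```

Passing the fetch step in as a callable keeps the waiting logic separate from the HTTP call, so the loop can be exercised with canned replies before pointing it at the real API.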
Set the id variables accordingly and run the script, for example with python transcribe_file.py.
The JSON output will contain a content_safety_labels key with the following items:
"text": "Yes, that's it. Why does that happen? By calling off the Hunt, your brain can stop persevering on the ugly sister. Giving the correct set of neurons a chance to be activated. Tip of the tongue, especially blocking on a person's name, is totally normal. 25 year olds can experience several tip of the tongues a week, but young people don't sweat them, in part because old age, memory loss and Alzheimer's are nowhere on their radars.",
- results: a list of dictionaries with the following items (text, labels, timestamp). text represents the transcription of the audio under content moderation. Meanwhile, labels is a list of flagged content with the following key-value pairs (label, confidence, severity). timestamp contains the start and end time (in milliseconds) for the corresponding transcription.
- summary: a dictionary keyed by detected label. Each label maps to a floating-point confidence score in relation to the entire audio file.
- severity_score_summary: a dictionary keyed by detected label. Each label maps to floating-point values representing the severity score in relation to the entire audio file.
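To make the structure concrete, here is a small sketch that flattens the results list into one row per flagged label. The function name flagged_segments and the sample data are made up for illustration; only the key names (results, text, labels, label, confidence, severity, timestamp) come from the description above, and the start and end keys inside timestamp are an assumption.

```python
def flagged_segments(transcript):
    """Flatten content_safety_labels results into one row per flagged label."""
    safety = transcript.get("content_safety_labels") or {}
    rows = []
    for result in safety.get("results", []):
        timestamp = result.get("timestamp", {})
        for label in result.get("labels", []):
            rows.append({
                "label": label["label"],
                "confidence": label["confidence"],
                "severity": label["severity"],
                "start_ms": timestamp.get("start"),  # assumed key name
                "end_ms": timestamp.get("end"),      # assumed key name
                "text": result.get("text", ""),
            })
    return rows


# Made-up sample with the same shape as the keys described above.
sample = {
    "content_safety_labels": {
        "results": [
            {
                "text": "Example sentence.",
                "labels": [
                    {"label": "health_issues", "confidence": 0.9, "severity": 0.2}
                ],
                "timestamp": {"start": 1000, "end": 5000},
            }
        ],
        "summary": {"health_issues": 0.9},
    }
}

for row in flagged_segments(sample):
    print(row["label"], row["confidence"], row["severity"], row["start_ms"], row["end_ms"])
```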
At the time of this writing, the API supports the following labels:
- Company Financials
- Crime Violence
- Hate Speech
- Health Issues
- Natural Disasters
- Negative News
- NSFW (Adult Content)
- Sensitive Social Issues
Please note that the confidence score and the severity score are different, although both range from 0 to 1.
The confidence score represents the perceived accuracy of the prediction made by the AI model, whereas the severity score is the extremity value for the label. For example, natural disasters or accidents with mass casualties will result in a higher severity score (0.8 to 1.0), whereas a minor car accident might only yield a low score (0.1 to 0.2).
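One way to combine the two scores is sketched here: confidence decides whether a label is trusted at all, and severity decides how strongly to react. This is a hypothetical policy, not something the API prescribes; the function name moderation_action and both thresholds (0.5 and 0.8) are assumptions you would tune for your own platform.

```python
def moderation_action(labels, min_confidence=0.5, block_severity=0.8):
    """Map one segment's labels to "allow", "review", or "block"."""
    action = "allow"
    for label in labels:
        if label["confidence"] < min_confidence:
            continue  # prediction too uncertain to act on
        severity = label.get("severity") or 0.0  # treat a missing severity as 0
        if severity >= block_severity:
            return "block"  # confident and severe
        action = "review"  # confident but mild
    return action


# Made-up labels: a confident, low-severity hit is sent for review.
print(moderation_action([{"label": "accidents", "confidence": 0.9, "severity": 0.15}]))
```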
As a result, you can use the information provided by the API to moderate the audio/video content on your platform.