Your entire workflow is as follows:
- Receive an accessible URL for your audio/video file. You can obtain one through any type of online storage, or upload the file to AssemblyAI's server.
- Make a POST call to the transcription API with both the audio_url and content_safety parameters. This will begin the transcription.
- Make a GET call to the transcription API to check whether the process is complete. You should get the final output once the transcription is done.
Upload a Local Audio File (Optional)
If you don't have an accessible URL, you can use the following step to upload your local audio file to the online storage provided by AssemblyAI.
In your working directory, create a new file called upload_file.py. Then, fill it with the following code snippet (replace the api_key and filename variables based on your use case):
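The original snippet is not reproduced here; below is a minimal sketch of what upload_file.py could look like, assuming AssemblyAI's v2 upload endpoint and the requests library. The read_file helper and the 5 MB chunk size are illustrative choices, not mandated by the API:

```python
import requests

api_key = "<your-api-key>"  # replace with your AssemblyAI API key
filename = "audio.mp3"      # replace with the path to your local audio file

def read_file(path, chunk_size=5_242_880):
    # Yield the file in ~5 MB chunks so large files are streamed
    # instead of being loaded into memory all at once.
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk

def upload_file():
    response = requests.post(
        "https://api.assemblyai.com/v2/upload",
        headers={"authorization": api_key},
        data=read_file(filename),
    )
    response.raise_for_status()
    return response.json()  # contains the "upload_url" key on success

if __name__ == "__main__":
    print(upload_file())
```

Passing a generator as the request body makes requests stream the upload, which is why the file is read lazily in chunks.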
Proceed by running the following command in your terminal:
python upload_file.py
It will upload your audio file in chunks to the server and return a JSON output once the upload is completed:
{
    "upload_url": "https://cdn.assemblyai.com/upload/ccbbbfaf-f319-4455-9556-272d48faaf7f"
}
You'll use this accessible URL later on for transcription.
Make a POST Call to the Transcript API
Next, proceed by creating a new Python file called transcribe.py.
Append the following code, which makes a POST call via AssemblyAI's transcript API. Remember to replace the api_key and audio_url variables. One thing to note is that the content_safety parameter must be explicitly set to True to enable content moderation.
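As a rough sketch of transcribe.py, assuming the v2 transcript endpoint and the requests library (the build_payload helper is a hypothetical convenience, not part of the API):

```python
import requests

api_key = "<your-api-key>"                    # replace with your AssemblyAI API key
audio_url = "https://cdn.assemblyai.com/..."  # replace with your accessible audio URL

def build_payload(url, threshold=None):
    # content_safety must be explicitly True to enable content moderation.
    data = {"audio_url": url, "content_safety": True}
    if threshold is not None:
        # Optional moderation threshold (defaults to 50 server-side).
        data["content_safety_confidence"] = threshold
    return data

def request_transcription():
    response = requests.post(
        "https://api.assemblyai.com/v2/transcript",
        headers={"authorization": api_key, "content-type": "application/json"},
        json=build_payload(audio_url),
    )
    response.raise_for_status()
    return response.json()  # includes the job "id" and its initial "status"

if __name__ == "__main__":
    print(request_transcription())
```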
Besides that, you can control the threshold for content moderation. By default, it is set to 50, but you can easily modify it by adding an additional content_safety_confidence parameter to the data dictionary:
data = {
    "audio_url": "...",
    "content_safety": True,
    "content_safety_confidence": 75
}
Start the transcription process by running the following command:
python transcribe.py
You should get a JSON-formatted output as follows:
{
    "id": "o6x66w9882-8075-4f74-b6c0-b63e9ce8596a",
    "language_model": "assemblyai_default",
    "acoustic_model": "assemblyai_default",
    "language_code": "en_us",
    "status": "queued",
    "audio_url": "https://bit.ly/3qDXLG8",
    "content_safety": true,
    ...
}
The most important key-value pairs are:
- id: represents the unique identifier for your process. This id is required when calling the GET API later on to get the final output.
- status: indicates the progress of your transcription
In the event that you received error as the status, it could be due to one of the following reasons:
- Unsupported audio file format
- Audio file didn't contain audio data
- Audio file was too short (<200 milliseconds)
- URL of the audio file is unreachable
- An error on the API side
Make a GET Call to the Transcript API
You need to make another API call to the same transcript API via HTTP GET. This is to check on the transcription process; if it is completed, you will get the final output, which includes the content moderation information.
Also, the transcription process may take up to 10 minutes depending on the length of your file. Create a new Python file called transcribe_file.py with the following code:
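A minimal sketch of what transcribe_file.py could contain, again assuming the v2 transcript endpoint; the polling loop, the 5-second interval, and the variable and helper names are illustrative:

```python
import time
import requests

api_key = "<your-api-key>"         # replace with your AssemblyAI API key
transcript_id = "<transcript-id>"  # the "id" returned by the POST call

def endpoint_for(tid):
    # GET uses the same transcript API, with the job id appended to the path.
    return f"https://api.assemblyai.com/v2/transcript/{tid}"

def poll_transcript(tid, interval=5):
    """Poll the GET endpoint until the job reports completed or error."""
    headers = {"authorization": api_key}
    while True:
        result = requests.get(endpoint_for(tid), headers=headers).json()
        if result["status"] in ("completed", "error"):
            return result
        time.sleep(interval)  # long files can take several minutes

if __name__ == "__main__":
    print(poll_transcript(transcript_id))
```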
Replace the api_key and id variables accordingly and run the following command:
python transcribe_file.py
The JSON output will contain a content_safety_labels key with the following items:
"content_safety_labels": {
    "status": "success",
    "results": [
        {
            "text": "Yes, that's it. Why does that happen? By calling off the Hunt, your brain can stop persevering on the ugly sister. Giving the correct set of neurons a chance to be activated. Tip of the tongue, especially blocking on a person's name, is totally normal. 25 year olds can experience several tip of the tongues a week, but young people don't sweat them, in part because old age, memory loss and Alzheimer's are nowhere on their radars.",
            "labels": [
                {
                    "label": "health_issues",
                    "confidence": 0.8165678381919861,
                    "severity": 0.1607203334569931
                }
            ],
            "timestamp": {
                "start": 390066,
                "end": 420714
            }
        }
    ],
    "summary": {
        "health_issues": 0.8558240338157502
    },
    "severity_score_summary": {
        "health_issues": {
            "low": 0.5602263590800425,
            "medium": 0.4397736409199575,
            "high": 0
        }
    }
}, ...
- status: either success or unavailable
- results: a list of dictionaries with the following items (text, labels, timestamp). text represents the transcription of the audio under content moderation. Meanwhile, labels is a list of flagged content with the following key-value pairs (label, confidence, severity). timestamp contains the start and end time (in milliseconds) for the corresponding transcription.
- summary: a dictionary for each detected label. Each label contains a floating point value representing the confidence score with respect to the entire audio file.
- severity_score_summary: a dictionary for each detected label. Each label contains a floating point value representing the severity score with respect to the entire audio file.
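Once the final output is in hand, the nested structure above can be flattened for moderation decisions. Below is a sketch that uses only the keys documented above; the flagged_segments helper and its default threshold are hypothetical, not part of the API:

```python
def flagged_segments(transcript, min_confidence=0.75):
    """Return (label, confidence, start_ms, end_ms, text) tuples for every
    moderation label whose confidence meets the threshold."""
    hits = []
    safety = transcript.get("content_safety_labels", {})
    for result in safety.get("results", []):
        for label in result["labels"]:
            if label["confidence"] >= min_confidence:
                hits.append((
                    label["label"],
                    label["confidence"],
                    result["timestamp"]["start"],
                    result["timestamp"]["end"],
                    result["text"],
                ))
    return hits
```

A platform could, for example, hold any segment returned by this helper for human review while publishing the rest automatically.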
At the time of this writing, the API supports the following labels:
- Accidents
- Alcohol
- Company Financials
- Crime Violence
- Drugs
- Gambling
- Hate Speech
- Health Issues
- Manga
- Marijuana
- Natural Disasters
- Negative News
- NSFW (Grownup Content material)
- Pornography
- Profanity
- Sensitive Social Issues
- Terrorism
- Tobacco
- Weapons
Please note that the confidence score and the severity score are different, although both scale from 0 to 1.
The confidence score represents the perceived accuracy of the prediction made by the AI model, while the severity score measures the extremity of the label. For example, natural disasters or accidents with mass casualties will result in a higher severity score (0.8 to 1.0), while a minor car accident might only yield a score of 0.1 to 0.2.
As a result, you can use the information provided by the API to moderate audio- and video-related content on your platform.