AI transcription

Robots and machines can’t truly understand what they see or hear. They can interpret it in ways that make it seem like they do, increasingly appearing just like us (sometimes a little too much – see ‘What is the uncanny valley?’, National Geographic).

Say you’ve got a recording that you want converted into text. You might be thinking of using AI software to produce that text. It’s a great solution for some, not so much for others looking for accuracy. Let’s take a look at the ins and outs of this, as it’s not quite the simple solution it may seem to be.

Humans have been using language in one form or another to communicate for a very long time (around two million years, by some estimates). Robots have been programmed to recognise speech only since the 1950s, and voice recognition technology has been slow to progress, particularly when compared with advancement in other areas of tech such as quantum computing, the cloud, or GPS.

Ever had trouble communicating with your voice-enabled personal assistant, or with frustrating voice recognition phone menus asking you to say your query aloud? These niggling issues are still commonplace because AI software cannot – and probably never will – understand the meaning behind what is being said. It simply recognises the sounds being made and converts them into what it ‘thinks’ are the correct spoken words. AI doesn’t understand cause and effect, and it cannot connect what is being said (for example, the words ‘lamp’ or ‘onion’) with real objects the way a human brain can.

AI voice recognition software also struggles with the following:

1. Background noise in the real world

The human ear can extract the pertinent parts of the speech from ambient and background noise. A robot cannot, and will take all of the sounds into consideration. This is extremely unhelpful if you have a difficult audio recording, perhaps recorded in a busy café, or with people speaking over the top of one another, road noise, dogs barking, children screaming and crying, doors banging, planes flying overhead, the constant buzz of air conditioning units in large buildings, and so on (you get the idea).

In my 25-year career of transcribing (typing), the vast majority of the recorded audio I’ve received has been a challenge to complete because of background noise. I fully appreciate it’s not always possible to create a perfectly sound-proofed studio environment (this is a challenging task even for the experts, like podcasters and TV production sound engineers).

2. Speech rate

Humans are capable of understanding slow and fast speech, and high- and low-pitched voices laced with emotion and expression. Most ASR (automatic speech recognition) systems struggle with speech faster than about 200 words per minute.
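If you want a rough idea of whether a recording falls into that fast-speech territory, you can work out the speaking rate from a transcript and the length of the audio. The snippet below is only a minimal sketch: the function names are made up for illustration, and the 200 words per minute threshold is simply the rough figure quoted above, not a hard limit of any particular software.

```python
# Rough threshold taken from the figure quoted in the text (an assumption,
# not a documented limit of any specific ASR product).
FAST_SPEECH_WPM = 200


def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Return the average speaking rate in words per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    word_count = len(transcript.split())  # words = whitespace-separated chunks
    return word_count * 60 / duration_seconds


def is_fast_speech(transcript: str, duration_seconds: float,
                   threshold: float = FAST_SPEECH_WPM) -> bool:
    """Return True when the speaking rate is above the threshold."""
    return words_per_minute(transcript, duration_seconds) > threshold
```

For example, a 10-minute recording that comes out at 2,400 words averages 240 words per minute, which is above the threshold and likely to give an ASR system trouble.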

A secretary or transcriber (aka a real live human being) can understand what is being said and, as such, is able to punctuate properly and edit appropriately. For example, if you’ve used an article or preposition incorrectly, an experienced secretary or transcriber can easily edit this for you; a robot cannot do this – the AI will literally spit out everything that it thinks you have said verbatim, meaning more editing for you later. (Have you got time for this?)

3. Colloquialisms and slang

Your local staff familiar with the vernacular versus a robot: local staff for the win.

4. Local place names

As above! With so many local place names that might crop up from time to time within, say, property work or just everyday correspondence or medical reporting, the need for a secretary or transcriber to reference the correct spellings is frequent. They can actually do this for you; a robot cannot and all it can do is guess, and it will frequently be wrong!

5. Formatting issues

If you have complicated templates for reports or documents, numbered headings, etc., AI software cannot ‘profile’ this for you (profiling means inserting text into a specifically allocated part of a previously prepared document template).
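To make the term concrete, here is a minimal sketch of what profiling amounts to: dictated text is slotted into named placeholders in a template prepared beforehand. The template, its headings, and the `profile` function are all hypothetical and purely illustrative. Note that the sketch only works because somebody has already decided which piece of text belongs under which heading, and that is exactly the judgement a raw AI transcript does not supply.

```python
from string import Template

# A hypothetical, previously prepared report template with named slots.
REPORT_TEMPLATE = Template(
    "1. Background\n$background\n\n"
    "2. Findings\n$findings\n"
)


def profile(template: Template, sections: dict) -> str:
    """Insert each piece of text into its allocated slot in the template.

    Raises KeyError if a slot in the template has no matching text.
    """
    return template.substitute(sections)


report = profile(REPORT_TEMPLATE, {
    "background": "The property was inspected on site.",
    "findings": "No structural defects were noted.",
})
```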

6. Subtleties and nuance

Because a robot will never be able to comprehend and interpret the meaning behind what is being said on a human level, it cannot pick up on subtleties and nuance, e.g. inserting paragraph breaks at natural or appropriate places within the text. You will need to do this yourself. (Do you have time for this?)

AI software can produce simple numbered paragraphs, 1. 2. 3., etc., but only on the page as you start dictating. It cannot follow your pre-existing formatting or insert text into a particular place in the document, meaning you would require to cut and paste the text under the numbering or paragraph headings later. (Do you have time for this?)

7. Naming individual speakers

A lot of the time it’s not only what was said but who said it within the recording that needs to be captured. At present, AI software isn’t able to name individual speakers with a great deal of accuracy, but it can kind of separate them in a very rough and ready way. This means, again, you will require to edit the transcript to add in the individual speaker names. (Do you have the time for this?!)
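As a rough illustration of that editing step, suppose the software hands back a transcript with generic labels such as ‘Speaker 1’ and ‘Speaker 2’. Once you have listened through and worked out who is who, swapping the labels for names is the easy part. The sketch below assumes a simple list of (label, text) pairs; that layout, the labels, and the function name are assumptions for illustration, not the output format of any particular product.

```python
def name_speakers(lines, names):
    """Swap generic speaker labels for real names where they are known.

    lines: list of (label, text) pairs, e.g. ("Speaker 1", "Good morning.")
    names: dict mapping a generic label to a real name
    Labels with no entry in names are left unchanged.
    """
    return [(names.get(label, label), text) for label, text in lines]


rough_draft = [
    ("Speaker 1", "Shall we make a start?"),
    ("Speaker 2", "Yes, go ahead."),
    ("Speaker 3", "Sorry I'm late."),
]

# The mapping still has to come from a human listening to the recording.
named = name_speakers(rough_draft, {"Speaker 1": "Chair", "Speaker 2": "Secretary"})
```

The code cannot tell you whether the software split the speakers correctly in the first place, so the listening-through still has to happen.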

What happens when you choose AI to transcribe (i.e. the robot types for you)

To sum up the above, if you find yourself in a position of using AI for your written transcripts, these common fundamental issues may emerge:

  • Receipt of a very basic rough draft of your transcript text, potentially containing several errors that will require correction. Be aware that time (and a great deal of patience) will be spent on editing.
  • Manually inputting the formatting, because the AI cannot do this for you. This can be a very time-consuming process, not to mention quite technical, potentially throwing up formatting issues, e.g. numbering, line spacing, alignment – margins, tabs, etc.
  • Editing will be required to add or check each speaker name, because the AI cannot do this for you, or do it accurately. This will require you to listen through to the entire audio recording in order to capture each individual speaker, if total accuracy is required and if not only what was said but who said it is of importance to you or your work, project, or research.

When AI transcription might suffice

Here are a few instances of where AI software is a possible or viable option:

1. Simple, single speaker dictation that is clear

If you can dictate extremely clearly, without background noise, without a strong accent or mumbling, clipped audio, issues with recording equipment, etc., then by all means have a crack at it. There WILL still be a degree of editing required. (Don’t say I didn’t warn you.)

2. Affordability factor 

For those for whom it’s not possible or feasible to pay for a human transcription service, it can be cheaper to use AI (but obviously more time-consuming in the long run while you edit later, and you WILL require to edit later, depending on how much language accuracy, factual detail, and formatting you are looking for).

3. Helping people with disabilities to communicate

For people with dysgraphia or those unable to write due to disability, AI transcription software can be a helpful way to get thoughts and ideas down on paper for editing by themselves or others later. (However, it’s worth noting that people with speech, language, and communication difficulties often struggle with AI voice recognition software – the standard commercial packages and apps available on the market simply aren’t inclusive for them.)

4. When accuracy is not too important

Again, if simply getting the basic thoughts and ideas from brainstorming sessions is all you need, and when accuracy in terms of general grammar, punctuation, proper English usage (and potentially some factual information within the document due to errors) is not so important to you, then AI might be the option for you.

 

Tell us about your experiences with AI software in the comments below. It’s certainly an interesting topic and we’d like to hear from you.

If you need assistance with an audio recording, ask us for our rates. We can assist with large and small projects and on both a regular and an ad hoc basis. There are no contracts and no minimum requirements.

We can also help to edit work that AI has completed poorly for you, or take over from AI entirely!  

Further reading:

AI Is Powerless Without HI: Why Human Intelligence Is Irreplaceable (Forbes) 

I'm Fiona, owner and founder of Outsource Typing. Based in the UK, we provide Virtual PA, audio typing and transcription services to businesses and individuals worldwide. Follow us on Twitter & Instagram.
