How to Transcribe Audio to Text (Video Transcription Tutorial!)
- Are you looking to transcribe your videos or your podcasts so you can easily repurpose your content? Well in this video, we're gonna take a look at how to transcribe audio to text and the best options to do it, both free and paid, so that you're covered, no matter what your budget is. (electronic music) Hey it's Justin Brown here from PrimalVideo where we help entrepreneurs and business owners amplify their business and brand with video. If you're new here then make sure you click that subscribe button and all the links to everything we mention in this video you can find linked in the description box below. So let's jump into it.
Transcribing videos or podcasts is a great way to easily repurpose your existing content or simplify the process of creating video or blog descriptions.
Now there's a lot of options out there when you are looking to transcribe speech to text in any format, ranging from free transcription software to paid automated services and manual transcriptions as well, and there can also be big differences in the accuracy of the output that you'll receive back as well as the cost of each option. Depending on your project and your budget, each solution does have its place. I use a mix for different projects, depending on the level of accuracy required and how fast we actually need the transcription returned to us. So in this video, we'll run through our favorite options for transcribing videos on every budget and any project, no matter how accurate you need your results. While we're running through our picks, let me know in the comments if you get your audio or video transcribed and what your favorite transcription services are and why. There's new ones popping up all the time so your suggestion might help the rest of the community and I'd definitely be interested in seeing what else is out there, as well. So we're gonna start off with the paid services first and then move into the free options. Now it is important to note across all transcribing options and services, that transcribing works best the clearer the audio is. So ideally you're providing audio or video with minimal background noise and no music, so just talking videos, ideally to get the best results. Now obviously, that doesn't cover off every transcribing solution, but wherever possible, try to have your audio, the spoken parts, as clear as possible. So with the paid options, in most cases, it is the case of the more you pay or the more expensive the service offering is, the higher the accuracy of the results. So looking at the paid options, they really fall into two categories: manual and automated. Now in this automated category, there's also two subcategories.
So you can actually automate your transcribing through websites or you can actually use computer software. Obviously the websites are platforms where you just upload your files to a website and that will process everything for you and you will be sent back your transcription. Software, on the other hand, is something that you'll need to pay for and download and install on your computer and run your transcribing through that way. So we'll start out, then, looking at the websites and with the websites you've got things like Temi or Spext and how these websites work is that once you've created an account, you can send through either a link to your video or you can upload your entire video or you can upload an MP3 or an audio version of your video for the website or web platform to transcribe your audio from there. So generally, again, with these software solutions, the turnaround time is normally really, really fast as there's no human element in here, you're not gonna be waiting for someone to sit there, download, and play through your audio or your video file to type it all out. It is all automated, so in most cases with these services, your transcribing will actually start immediately. So they're really, really fast to get the file back to you at the end. But a big thing to note with these software platforms or services, is that they really need clear audio to be able to get better results or good results. If you've got things like background music, or there's a heap of background noise, if the people talking have strong accents or there's a lot of wind, or anything like that that's really gonna start to drown out your audio, the spoken parts, in your video, then it's gonna make it really hard for the software to decipher what you're actually saying. But on the plus side, besides being really fast, these software platforms are normally really cheap.
So you've got Temi that starts at around 10 cents per minute and you've got Spext that starts around 25 cents per minute and there's a heap of other solutions out there as well, but these two would be my top recommendations of where you should start out. So for me, I'm a big fan of Temi and I'll use it on projects where accuracy isn't too important, but where it's handy, especially on long-form editing projects like documentaries or corporate work, to have a full transcription of the videos that you're working on, the videos that you're editing and cutting down, to be able to quickly find things. Where did this person say this? Okay that's here, find that in the timeline really, really quickly. So for those sorts of things where it doesn't matter if there are a few typos or even a few sentences that are out of whack in there somewhere, then that's where Temi is perfect for those types of projects, but I wouldn't use it if I really needed a high level of accuracy on the transcription or if the videos I had already had background music or a heap of people speaking in there, it's just gonna be too complicated and the results aren't gonna be fantastic. The next option you've got for automatically transcribing your videos is to use software that you can install on your computer. Now really this software is just an interface between some of these web platforms, but it will give you a great amount of control.
In a lot of cases, it will also give you a higher level of accuracy, as well.
Now once again, there are a few applications out there for doing this, but the standout and the best one that I've seen in a long time is a plugin for Adobe Premiere Pro so, sorry but this is only for Adobe Premiere Pro users, but it's called Transcriptive and it's from a company called Digital Anarchy. It sells for $299, which may seem expensive, but stick with me here because this will give you up to 95% accuracy, it can transcribe 60 minutes of video in around 10 minutes which is insane, and because it's built into Adobe Premiere, it means that the whole exporting and uploading and everything is automated through Adobe Premiere.
And it means once you get your transcribed video back, it's already inside of Adobe Premiere, where you can use it as either markers in your video or reference points, which makes it so much easier to find segments in your video because it can be linked directly with timecode to your video file, so it's an amazing tool.
So how it works is it connects through to either IBM's A.I. engine called Watson, or a platform called Speechmatics. Now out of the two of those, Watson is less accurate but it actually gives you 1,000 minutes free transcribing per month and after that, if you're doing a heap, then it's only two cents per minute for any additional transcribing. The other option, Speechmatics, is the much better solution. Now this is where you get up to that 95% accuracy and in all of our tests it actually works on noisy video files as well, or video files with a heap of background noise and background music, and I was really blown away with the accuracy on this.
And Speechmatics only works out at around seven cents per minute to transcribe your video files.
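To put those two rates side by side, here's a rough Python sketch of what a month of transcribing might cost on each engine. It only uses the approximate figures mentioned above (1,000 free minutes then two cents per minute for Watson, around seven cents per minute for Speechmatics). The function names are made up for illustration, current pricing may well be different, and the $299 plugin itself isn't included.

```python
# Rough monthly cost comparison using the approximate rates quoted in the video.
# Amounts are kept in whole cents; the function names are illustrative only.

WATSON_FREE_MINUTES = 1000      # free minutes per month
WATSON_CENTS_PER_MIN = 2        # rate after the free allowance is used up
SPEECHMATICS_CENTS_PER_MIN = 7  # approximate flat rate


def watson_cost_cents(minutes: int) -> int:
    """Only minutes beyond the free monthly allowance are billed."""
    billable = max(0, minutes - WATSON_FREE_MINUTES)
    return billable * WATSON_CENTS_PER_MIN


def speechmatics_cost_cents(minutes: int) -> int:
    """Every minute is billed at the flat rate."""
    return minutes * SPEECHMATICS_CENTS_PER_MIN


for minutes in (60, 1000, 1500):
    watson = watson_cost_cents(minutes) / 100
    speechmatics = speechmatics_cost_cents(minutes) / 100
    print(f"{minutes} min: Watson ${watson:.2f}, Speechmatics ${speechmatics:.2f}")
```

So Watson is the cheaper one on paper, but Speechmatics is where you get that higher accuracy.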
So when you're using this Transcriptive plugin, you can use it to easily create captions, subtitles, Word documents, it works with multiple presenters as well, and it is really, really fast. We did a test on both Watson and on Speechmatics with one of our YouTube videos, just to see how it would go, so it had the background music and everything in there just as one of these complete videos does, and the results were almost chalk and cheese. For any quiet parts where there was no background music, Watson did okay, but across the board, Speechmatics was far more accurate and it was actually much faster to do the transcribing as well.
It actually returned the 22 or 24 minute video back in under a couple of minutes, so that's just crazy.
So obviously at that price point of $299 for this plugin and having it only work in Adobe Premiere, it's a great solution if you're using Adobe Premiere and doing a heap of long-form editing projects, ones where, literally, with a click of a button, you can transcribe your entire sequence or project that you're working on in a matter of minutes.
So it's an amazing tool for transcribing your edits or your dailies for things like documentaries, short films, long-form video projects or long-form sales videos where it's much easier to work off a paper edit with clients, or your team, than trying to find things manually inside of your editing project. But obviously, if you're not using Adobe Premiere Pro, or you don't wanna spend the $299 upfront on your transcribing, then this won't be the best solution for you. So those are the paid options that we recommend you check out and there's a couple of free options as well. Now, obviously as I said in the beginning, the free ones definitely don't have the accuracy that you will have in the paid solutions, but they still might be enough to do exactly what you want. So first off, on the free side, is YouTube's Auto Transcribe function. Now how this works is when you upload your video to YouTube, YouTube will automatically transcribe your videos. It's not always instant, it can take up to 12 hours, and in some cases I've heard of it taking longer. Once that's actually done, you can log in and you can download that text that it's actually transcribed from your video and copy and paste it into a Word document or anywhere else that you wanna use it from there. So that's one way where you actually don't even need to do anything. The other way is to use something like Google Voice or Siri to do the transcribing for you.
Now we did do a video quite a long time ago, it was one of the first videos on this channel I think, talking about transcribing your videos using Google Voice and Google Docs. If you haven't seen it yet, I'll put a link up in the cards.
So all you need to do is open some sort of text app, a notes app or some sort of word processing app on your smartphone, and instead of typing in on the keyboard, you'll press the microphone button, which is for voice input.
So all you need to do then is press that microphone button, hold your phone up to a computer or something that is playing through the video that you wanna transcribe and the transcribing will start. Now obviously here, this is again, really dependent on the amount of background noise in the area that you're going to be recording through, but also in the video, itself.
If there is a heap of music in it, then this really isn't going to work too well, but if the music is pretty quiet, or there's no music at all, and minimal background noise, then you can actually get some pretty decent results.
So that's using your phone, but in most cases you can actually get better results using your desktop computer, as well.
Now this is something that we've found works best in Google Chrome and all you need to do is open up Google Drive, create a new Google Document, select Tools at the top and then go down to Voice typing.
Then all you need to do is hit that microphone and then hit play on a video that is also open on your computer and the transcribing will start. Now depending on how your computer is set up and how your microphones are set up, in some cases, you might get better results playing your video file through your smartphone and holding that up near your microphone on your computer and transcribing that way. One of the biggest things we've seen in comments on our previous video talking through this, was that if you have any sort of accent or if your Google account isn't set to English as your primary language, then you may not get good results with this process or in some cases, you may not have access to the feature at all. So again, you need to be using Chrome on Mac or PC and you need to have your primary language setting set to English in your Google account. So those are our top recommendations for automated solutions when it comes to transcribing. So now we're gonna look at the manual options. These are the ones that are done by a human. The biggest benefit here is the higher level of accuracy.
But the downside is that generally these will take longer to do.
So for this, your first option is to head over to Fiverr or Upwork.com and list a job and there'll be heaps of people on there that are willing to transcribe your video for not a lot of money, but really in our experience, we found some really, really good ones, and really cheap ones, and in other cases, the communication has been terrible and the end result hasn't been that great either. So they definitely can be hit and miss.
Now there are lots of websites dedicated to this as well, where your transcribing will be done by a human and with some sort of quality control in there as well and our pick is currently Rev.com.
So once again, how it works is that you'll create an account with Rev.com. You'll either link to your video, you'll upload your video, or you will upload an audio file that you wanna transcribe. You'll submit that to Rev.com as a job and within 12 hours, you'll have your transcribed audio or video back to you. So the big, big pros with this is that it is done by a human, so we've already said the accuracy is much, much better.
There is quality control in here.
There is a review process, if you're not happy you can go back and get them to make changes. The range or the options that you have on outputs, you can request different versions. Some with time codes, some as closed captions, some as subtitles, the options that you have to receive your transcribed files back is huge. Even if you want a Word document that's laid out in a particular way, then they would do that as well, because you actually get to provide a brief with your project or task telling them exactly what you want. So the cost for Rev.com is $1 per minute. So it is a little bit more expensive than the other automated options, but the fact that there is that quality control, and that you are actually getting your audio or your video transcribed by a real person that can understand English and different accents, is huge. And your end product is, in most cases, going to be a lot better.
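To give a feel for what that price difference looks like on a real video, here's a small Python sketch that works out the cost of transcribing one video on each of the services mentioned so far. The per-minute rates are just the approximate starting prices quoted in this video (they may have changed since), and the names in the code are made up for illustration.

```python
# Approximate per-minute starting rates quoted in the video, in cents.
# These are assumptions taken from the transcript, not current price lists.
RATES_CENTS_PER_MIN = {
    "Speechmatics (via Transcriptive)": 7,
    "Temi": 10,
    "Spext": 25,
    "Rev.com": 100,
}


def cost_in_cents(minutes: int) -> dict[str, int]:
    """Return the cost in cents of transcribing `minutes` of audio on each service."""
    return {name: rate * minutes for name, rate in RATES_CENTS_PER_MIN.items()}


# Example: a 24-minute video.
for name, cents in cost_in_cents(24).items():
    print(f"{name}: ${cents / 100:.2f}")
```

So on a 24-minute video you're looking at a couple of dollars for the automated options versus $24 for Rev.com, and that gap is what you're paying for the human accuracy and the quality control.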
And also, for any video or audio that has multiple speakers or multiple presenters, I'll hands down go with Rev.com because of that much better accuracy. The moment you're throwing in different speakers and different accents and things, it does make a huge difference, even though some of those automated services did say they supported multiple people, multiple speakers, the output definitely isn't as good when you've got them in your video or audio. Now we do actually have a full walk-through on how we use Rev.com for our YouTube videos and I will put a link up in the cards as well.
So as you can see there's quite a few options out there to get your videos or your audio files transcribed either automatically, manually, or done through your phone.
So for us, we really just use a mix of those solutions depending on what we need, the accuracy that we need and how fast we need that transcription back. So for something really, really quick, where if the accuracy isn't 100%, doesn't really matter, we can make those minor changes, I would either use my smartphone, or use Google Docs to do the transcribing, or I would use Temi for 10 cents per minute. For really large video editing projects where the accuracy doesn't need to be 100%, but we're gonna be constantly changing and updating those documents to be able to create new ones really, really fast off the edits as we're going, I would hands down use the Transcriptive plugin for Adobe Premiere Pro and I would use Speechmatics inside of that. And for anything where we 100% need the accuracy, for anything that's going out public, instead of just internal editing or working with teams, then it hands down goes to Rev.com. So while it does take a little bit longer than some of these really fast online solutions, I'm happy to wait because it's not that long, but the level of accuracy, and also the level of formats and control that you have over the transcribing process is really second to none and you can really get almost anything you want created by them once they've actually done the transcribing.
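That rule of thumb can be written down as a tiny Python function. This is only a sketch of the decision process described here, with made-up function and parameter names, so treat it as a starting point and not a hard rule.

```python
def pick_service(needs_full_accuracy: bool,
                 large_editing_project: bool,
                 uses_premiere_pro: bool) -> str:
    """Pick a transcription option following the rule of thumb from the video."""
    # Anything going out publicly, where accuracy really matters: human transcription.
    if needs_full_accuracy:
        return "Rev.com"
    # Big editing projects with constantly changing edits, if you're on Premiere Pro.
    if large_editing_project and uses_premiere_pro:
        return "Transcriptive plugin with Speechmatics"
    # Quick jobs where a few typos don't matter.
    return "Smartphone voice input, Google Docs voice typing, or Temi"


print(pick_service(needs_full_accuracy=False,
                   large_editing_project=True,
                   uses_premiere_pro=True))
```

The accuracy check comes first on purpose: if the transcript is going out public, it goes to Rev.com no matter what editing software you're using.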
So whether it is a subtitle, captions, Word document, formatted exactly how you want it, then, yeah, it's Rev.com. And even for every video on this YouTube channel all our subtitles or captions are actually done through Rev.com as well. Now transcribing your YouTube content can also help you boost your YouTube rankings. We put together a video which is linked on screen that explains how it works and why you should be doing it and exactly how we create subtitles for our YouTube content to help you get started.
So make sure you check it out and I'll see you soon.