I wanted to share the general workflow for this particular client's set of courses. I'll stress that this is not necessarily typical for most voiceover work, but it's what has proved most efficient for everyone involved in this project.
The overall goal here is to replace and improve the audio for a series of tech course videos. We (unfortunately) can't change the actual content, just the presentation.
I describe the process below as if the entire course were being handed from one party to the next, but in fact we've found that it's much more efficient, and far better for managing audio quality, to divide everything up into very small chunks of about 12 minutes each. There's also a great motivation factor here, as each party tends to gently nudge the next (even if not explicitly) to "keep the line moving".
The client had originally hired an IT professional to create and record several video courses last year. The courses seem to be selling well on Udemy, but suffice it to say that students had raised various concerns about being able to understand the instructor, and the client ultimately decided to retain the videos and replace the corresponding audio with my voice.
(This is a rather stunning parallel to how I managed to get my very first professional job, teaching Computer Science at Rutgers University in 1983, when I was 18, under essentially identical circumstances.)
The client hired various outsourced transcriptionists to convert the audio to written scripts, but for a few reasons, these end up being just the first pass of several. When the work gets handed over to us, Cat Lady works a second pass: she makes transcript corrections, cleans up spelling and grammar, and generally tries to weave in some sense of meaning and order through the judicious use of punctuation and common sense. Then it's my turn; she and I bounce ideas back and forth about how to interpret difficult portions of the audio, and my third pass will often catch some tech concepts that I can recognize from my own experience with Linux and other software technologies.
Finally, the script is sufficiently comprehensible that I can actually record the audio, although I'll be the first to admit that I often still have very little grip on the actual meaning of the sentences that I am speaking. My job's just to make them sound good. :-) I also fine-tune the script (a fourth pass) as I read it aloud into Audacity, since some things just become clearer when you hear yourself actually speak the words.
I then outsource the audio editing of my recording sessions. The editor cleans up the sound, removes cat noises and bad takes, evens out the volume over time, and stitches everything together into a nearly final version of the audio.
Then it comes back to us. Cat Lady now reviews the audio for quality control, listening for any glitches or errors. At this point, we're (collectively) pretty consistent, and the audio is usually 99.9% correct, with an average of about one mistake per thousand words: either I added, removed, or swapped a word, or I just said it wrong. Oftentimes the word in question is a bridge word or otherwise inconsequential to the meaning of what's being said, so in those cases we have the liberty to simply change the script to match the audio. (I'll also typically add a few commas or whatever to the script, a fifth pass, to match the final audio's cadence.)
Otherwise, the audio has to be fixed, and that can usually be done with a very careful cut-and-paste of a few tenths of a second, using Audacity, rather than having to re-record it. (Re-recording brings its own issues, as it's sometimes hard to smoothly match the original audio's sound.)
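For the technically curious, here's a rough sketch of how the kinds of slips described above (an added, dropped, or swapped word) could be flagged automatically. To be clear, this is not what we actually do; Cat Lady finds them by ear. It assumes you somehow had a plain-text version of what was actually spoken, and the `word_diffs` function and the sample sentences are made up for illustration. It uses Python's standard `difflib` module to compare the two texts word by word.

```python
import difflib


def word_diffs(script: str, spoken: str):
    """Return (kind, script_words, spoken_words) for each mismatch.

    kind is one of difflib's opcode tags: 'delete' (a script word was
    dropped), 'insert' (an extra word was spoken), or 'replace' (swapped).
    """
    a = script.lower().split()
    b = spoken.lower().split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((tag, a[i1:i2], b[j1:j2]))
    return diffs


# Hypothetical example sentences, not taken from the real course.
script = "click the start button and then open the terminal window"
spoken = "click the start button then open a terminal window"

for kind, expected, heard in word_diffs(script, spoken):
    print(kind, expected, heard)
```

Both of the differences in this example are bridge words, which is exactly the case where we'd just change the script rather than touch the audio.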
Once we're (finally) satisfied that we've done as good a job with the audio as we possibly can, I ship the audio and corresponding (final) transcript back to the client.
The client will then take those two files and outsource to yet more workers the job of stitching them back into the original video files. Thankfully, there's no live action, so they already know to slow down the video as needed, so that the words I record line up with the points in the video where the original instructor said them.
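I don't know exactly how the client's team handles the slowdown, but the arithmetic behind it is simple enough to sketch. Assuming a video segment and my new narration for that same segment have known lengths, the playback rate is just the ratio of the two. The `playback_rate` function and the durations below are hypothetical, purely for illustration.

```python
def playback_rate(original_seconds: float, new_audio_seconds: float) -> float:
    """Rate at which to play a video segment so it lasts as long as the new audio.

    A result below 1.0 means the video must be slowed down; above 1.0,
    sped up.
    """
    if original_seconds <= 0 or new_audio_seconds <= 0:
        raise ValueError("durations must be positive")
    return original_seconds / new_audio_seconds


# Hypothetical segment: 60 seconds of original video, 75 seconds of new narration.
rate = playback_rate(60.0, 75.0)
print(f"play the video at {rate:.2f}x speed")
```

In this made-up case the segment would play at 0.80x, stretching 60 seconds of video to cover 75 seconds of narration.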
Finally, the client has a separate outsourced team that's converting the corrected transcriptions into closed captions for the new versions of the videos.
Whew! I sure do hope that all of this is worth it! :-) We've been doing this since November, and at the rate we're going, it'll probably extend until around August, or possibly longer.
Update: Here's the spreadsheet we're using to manage this.