Gray squares, gray text and fixing misaligned transcription

Reduct synchronizes your media with its transcript - word for word, syllable for syllable, to the millisecond.

Each gray square represents one second of silence or unrecognized audio. This is most often silence, but it can also be a period of background sound, music, or crosstalk between speakers. Reduct will also sometimes insert gray squares, rather than give you an incorrect piece of transcript, if it senses that the AI has "hallucinated" incorrect words. Clicking a square jumps to that point in your media, just as clicking a word does.

Gray text indicates misalignment, meaning that the text and the audio are not perfectly synced for that portion of the video. This is usually due to poor audio quality, background noise, mis-transcription, people speaking over each other, or a lack of enunciation by the speaker. If you see an animated gray treatment on a paragraph, alignment is in progress and the treatment will clear up shortly.

If you see a misaligned transcript, you can often get a better alignment by correcting the transcript to closely match the video (e.g. re-inserting false starts, ums, ahs, etc.). For the transcript and the media to align correctly, the transcript should match the audio in your media as closely as possible.

Unwanted speech can be removed by adding your content to a reel, and removing the offending portions of your footage with strikethrough editing. Select the text you'd like to eliminate and click 'Cut'.

If you notice the transcript playhead rushing ahead of the video player during a portion of your media, this can often be fixed by selecting the affected portion of the transcript and making a small correction, such as adding punctuation or changing a couple of words. This causes the media and transcript to be re-processed and the alignment to be attempted again. In some cases, it can help to combine paragraphs before making these changes, especially when there are rapid (and/or inaccurate) speaker changes.