It’s quite common to find interviews, podcasts or other audio tracks full of noise.
The noise may come from a bad mic, poor mic isolation, or an outdoor recording with wind, rain, etc.
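The frequencies that matter most for intelligible speech lie roughly between 300 Hz and 3000 Hz.
There are a few ways to isolate that range. The easiest is to apply a lowpass and a highpass filter, which cuts the noise outside that band and makes voices stand out. Noise that falls inside the band is not removed.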
To do so we can use ffmpeg. First, preview the result with ffplay:
ffplay INPUT -af lowpass=3000,highpass=200
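You can modify the two cutoff values (in Hz) as you wish: lowpass keeps everything below its value, highpass keeps everything above its value. Here the highpass is set to 200 rather than 300 to keep a bit more of the low end.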
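If you tweak the cutoffs a lot, a tiny helper can build the filter string for you. This is only a sketch: voice_filter is a made-up name, the defaults are the 300 and 3000 mentioned earlier, and it assumes whole-number values in Hz.

```shell
# Hypothetical helper: build the -af filter string from two cutoffs in Hz.
# Assumes both arguments are integers.
voice_filter() {
    vf_low=${1:-300}     # highpass cutoff: frequencies below this are attenuated
    vf_high=${2:-3000}   # lowpass cutoff: frequencies above this are attenuated
    if [ "$vf_low" -ge "$vf_high" ]; then
        echo "voice_filter: low cutoff must be below high cutoff" >&2
        return 1
    fi
    printf 'lowpass=%s,highpass=%s\n' "$vf_high" "$vf_low"
}

# Usage (needs ffplay installed):
# ffplay INPUT -af "$(voice_filter 200 3000)"
```

The check on the two values is there because swapping them by mistake would filter out almost everything.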
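With youtube-dl you can also preview your filters without downloading the track first (the -g option prints the direct media URLs, and sed 1d drops the first one, normally the video stream, leaving the audio URL):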
ffplay $(youtube-dl -g VIDEO-URL |sed 1d) -af lowpass=3000,highpass=200
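One thing to watch for: sed 1d leaves nothing if youtube-dl prints a single URL, which can happen when the selected format is one combined stream. A possible workaround is to take the last line instead of dropping the first. The helper name pick_audio_url is made up, and the sketch assumes that when two URLs are printed the audio one comes second.

```shell
# Hypothetical helper: read URLs on stdin and print only the last one.
# Assumption: with two URLs the audio stream is the second line;
# with one URL that single line is returned unchanged.
pick_audio_url() {
    tail -n 1
}

# Usage (needs youtube-dl and ffplay installed):
# ffplay "$(youtube-dl -g VIDEO-URL | pick_audio_url)" -af lowpass=3000,highpass=200
```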
Then apply the desired modifications to a new file with ffmpeg:
ffmpeg -i INPUT -af lowpass=3000,highpass=200 OUTPUT
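If you have several recordings to clean, the same command can be wrapped in a loop. This is a rough sketch, not a polished tool: clean_voice and the ".clean" output naming are my own choices, and it assumes every file name has an extension.

```shell
# Hypothetical helper: run the same filter chain over several files.
# Naming is an assumption: "talk.mp3" becomes "talk.clean.mp3".
clean_voice() {
    cv_filter="lowpass=3000,highpass=200"
    for cv_in in "$@"; do
        # strip the extension, add ".clean", then put the extension back
        cv_out="${cv_in%.*}.clean.${cv_in##*.}"
        # -n makes ffmpeg refuse to overwrite an existing output file
        ffmpeg -n -i "$cv_in" -af "$cv_filter" "$cv_out" || return 1
    done
}

# Usage (needs ffmpeg installed):
# clean_voice interview.mp3 podcast.wav
```

The loop stops at the first file that fails, so an error does not go unnoticed in a long batch.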