To download subtitles from YouTube, you can use youtube-dl (it can also fetch the audio, thumbnails, and metadata):
youtube-dl -o ./data/mXC3xGZWo_M"/%(id)s.%(ext)s" -x --sub-lang en \
--write-sub --sub-format vtt --convert-subtitles srt --write-auto-sub \
--continue --write-info-json --write-description --write-annotations \
--min-filesize 50k --ignore-errors --write-all-thumbnails --no-call-home \
--audio-format mp3 mXC3xGZWo_M
This set of arguments tries its best to get you something: it requests the subtitles in WebVTT format and asks youtube-dl to convert them to SRT. It also tries to get real closed captions if they are available, and if not, it pulls the automatically generated ones from the speech-to-text software YouTube runs.
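The command above repeats the video ID in both the output template and the final argument. Below is a minimal sketch of a wrapper that takes the ID once. The function name fetch_subs and the optional second argument (an output root defaulting to ./data) are my own illustrative choices; the youtube-dl flags are copied unchanged from the command above.

```sh
#!/bin/sh
# Hypothetical wrapper: fetch_subs VIDEO_ID [OUTPUT_ROOT]
# Runs the same youtube-dl command as above, with the video ID typed once.
# OUTPUT_ROOT defaults to ./data (an assumption matching the -o template above).
fetch_subs() {
  if [ -z "$1" ]; then
    echo "usage: fetch_subs VIDEO_ID [OUTPUT_ROOT]" >&2
    return 1
  fi
  # Files land in OUTPUT_ROOT/VIDEO_ID/, named after the video ID.
  youtube-dl -o "${2:-./data}/$1/%(id)s.%(ext)s" -x --sub-lang en \
    --write-sub --sub-format vtt --convert-subtitles srt --write-auto-sub \
    --continue --write-info-json --write-description --write-annotations \
    --min-filesize 50k --ignore-errors --write-all-thumbnails --no-call-home \
    --audio-format mp3 "$1"
}
```

Usage: fetch_subs mXC3xGZWo_M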
I would much prefer the SRT format, since it's much simpler to parse, but when I ran the youtube-dl command above, it did not handle the conversion properly:
[youtube] mXC3xGZWo_M: Downloading webpage
[youtube] mXC3xGZWo_M: Downloading video info webpage
[youtube] mXC3xGZWo_M: Extracting video information
[youtube] mXC3xGZWo_M: Looking for automatic captions
[youtube] mXC3xGZWo_M: Searching for annotations.
[youtube] mXC3xGZWo_M: Downloading MPD manifest
[info] Writing video description to: data\mXC3xGZWo_M\mXC3xGZWo_M.description
[info] Writing video annotations to: data\mXC3xGZWo_M\mXC3xGZWo_M.annotations.xml
[info] Writing video subtitles to: data\mXC3xGZWo_M\mXC3xGZWo_M.en.vtt
[info] Writing video description metadata as JSON to: data\mXC3xGZWo_M\mXC3xGZWo_M.info.json
[youtube] mXC3xGZWo_M: Downloading thumbnail ...
[youtube] mXC3xGZWo_M: Writing thumbnail to: data\mXC3xGZWo_M\mXC3xGZWo_M.jpg
[download] Destination: data\mXC3xGZWo_M\mXC3xGZWo_M.m4a
[download] 100% of 46.63MiB in 00:12
[ffmpeg] Correcting container in "data\mXC3xGZWo_M\mXC3xGZWo_M.m4a"
[ffmpeg] Destination: data\mXC3xGZWo_M\mXC3xGZWo_M.mp3
Deleting original file data\mXC3xGZWo_M\mXC3xGZWo_M.m4a (pass -k to keep)
[ffmpeg] Converting subtitles
WARNING: video doesn't have subtitles
ERROR: file:data\mXC3xGZWo_M\mXC3xGZWo_M.en.vtt: Invalid data found when processing input
Instead, you can invoke ffmpeg directly to convert the file (I found this worked better than relying on youtube-dl):
ffmpeg.exe -i mXC3xGZWo_M.en.vtt mXC3xGZWo_M.en.srt
This works like a charm:
ffmpeg version N-80386-g5f5a97d Copyright (c) 2000-2016 the FFmpeg developers
built with gcc 5.4.0 (GCC)
configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-nvenc --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmfx --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libzimg --enable-lzma --enable-decklink --enable-zlib
libavutil 55. 24.100 / 55. 24.100
libavcodec 57. 46.100 / 57. 46.100
libavformat 57. 38.100 / 57. 38.100
libavdevice 57. 0.101 / 57. 0.101
libavfilter 6. 46.101 / 6. 46.101
libswscale 4. 1.100 / 4. 1.100
libswresample 2. 1.100 / 2. 1.100
libpostproc 54. 0.100 / 54. 0.100
Input #0, webvtt, from 'mXC3xGZWo_M.en.vtt':
Duration: N/A, bitrate: N/A
Stream #0:0: Subtitle: webvtt
File 'mXC3xGZWo_M.en.srt' already exists. Overwrite ? [y/N] y
[srt @ 05140720] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
Output #0, srt, to 'mXC3xGZWo_M.en.srt':
Metadata:
encoder : Lavf57.38.100
Stream #0:0: Subtitle: subrip (srt)
Metadata:
encoder : Lavc57.46.100 srt
Stream mapping:
Stream #0:0 -> #0:0 (webvtt (native) -> subrip (srt))
Press [q] to stop, [?] for help
size= 261kB time=00:51:16.69 bitrate= 0.7kbits/s speed=1.26e+004x
video:0kB audio:0kB subtitle:143kB other streams:0kB global headers:0kB muxing overhead: 83.253769%
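If you have downloaded several videos, a small loop can run the same ffmpeg conversion on every VTT file. This is a sketch that assumes the data/ID/ID.en.vtt layout produced by the -o template above and a POSIX shell with ffmpeg on the PATH (on Windows, presumably something like Git Bash); convert_all_vtt is a hypothetical name. It skips any SRT that already exists, which sidesteps the "Overwrite ? [y/N]" prompt seen in the output above.

```sh
#!/bin/sh
# Hypothetical helper: convert_all_vtt [ROOT]
# Converts ROOT/*/*.vtt (the layout the -o template above produces)
# to .srt next to each input. ROOT defaults to ./data.
convert_all_vtt() {
  root="${1:-./data}"
  for vtt in "$root"/*/*.vtt; do
    # With no matches the glob stays literal; skip it.
    [ -e "$vtt" ] || continue
    srt="${vtt%.vtt}.srt"
    # Skip finished files rather than hit ffmpeg's overwrite prompt.
    if [ -e "$srt" ]; then
      echo "skip: $srt already exists" >&2
      continue
    fi
    # -nostdin: never wait for keyboard input; -loglevel error: print errors only.
    ffmpeg -nostdin -loglevel error -i "$vtt" "$srt" ||
      echo "failed: $vtt" >&2
  done
}
```

Usage: convert_all_vtt ./data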
The only downside is that you then get duplicated lines of text in the SRT file.
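The duplicates most likely come from YouTube's automatic captions, which scroll: a cue typically repeats the line shown in the previous cue before adding a new one. If what you want is the plain text, a rough sketch is to drop cue numbers and timing lines and skip any text line identical to the one just printed. The function name srt_to_text is hypothetical, and this only catches exact repeats of the immediately preceding line; it produces a transcript, not a valid SRT file.

```sh
#!/bin/sh
# Hypothetical helper: srt_to_text FILE.srt
# Prints only the caption text, skipping cue numbers and timing lines,
# and drops any text line identical to the previously printed one.
srt_to_text() {
  awk '
    BEGIN { expect_index = 1 }
    # Trim trailing spaces, tabs and CRs (Windows line endings).
    { sub(/[ \t\r]+$/, "") }
    # A blank line ends a cue; the next line should be its number.
    $0 == "" { expect_index = 1; next }
    expect_index && /^[0-9]+$/ { expect_index = 0; next }
    # Timing lines look like 00:00:01,000 --> 00:00:02,000
    /^[0-9][0-9:,.]* --> / { expect_index = 0; next }
    { expect_index = 0 }
    # Print text only when it differs from the last printed line.
    $0 != prev { print; prev = $0 }
  ' "$1"
}
```

Usage: srt_to_text mXC3xGZWo_M.en.srt > mXC3xGZWo_M.en.txt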