ccextractor

A fast closed captions extractor for MPEG files.

ccextractor Ranking & Summary


  • License: GPL
  • Price: FREE
  • Publisher Name: Carlos Fernandez


ccextractor Description

ccextractor is a fast closed captions extractor for MPEG files. It is mostly a mildly optimized C port of McPoodle's excellent but painfully slow Perl script SCC_RIP. It lets you rip the raw closed captions (read: subtitles) data from a number of sources, such as DVD or ReplayTV.

As an added bonus compared to the original SCC_RIP, ccextractor can extract subtitles from the HDTV transport streams that are becoming more common.

At this point ccextractor extracts the line 21 captions (which must legally be present for a number of years until the transition to digital is complete). Note that most .ts files you can find carry subtitle data for both analog (EIA-608) and digital (EIA-708) decoders. AFAIK there are no freely available EIA-708 rippers. Anyway, since line 21 captions will be available for some time, we have time to build a decent 708 ripper.

Basic Usage:

For details on CC, please go to McPoodle's page:

http://www.geocities.com/mcpoodle43/SCC_TOOLS/DOCS/SCC_TOOLS.HTML

You will need his tools to use ccextractor's output. The basic idea is that you get the raw closed caption dump from ccextractor. Then you need other tools (which vary depending on what you want to do) to continue processing.

To get a transcript from a .ts file in .srt format (I assume this will be the most common use), do this:

    ccextractor -12 input_file

-12 means "extract both subtitle tracks" (the technical name is fields, but tracks is easier to understand). 1 is almost always English. 2 is Spanish on HBO (at least in the few samples I've seen) but could be anything. Just extract both of them and check.
Example:

    ccextractor -12 house315.ts

ccextractor will create two files, called house315_1.bin and house315_2.bin.

Then use McPoodle's RAW2SCC to create a temporary SCC file (SCC refers to Scenarist, the program whose native format this originally was; the details are not important here):

    raw2scc house315_1.bin

This creates house315_1.scc. From this .scc file, you can get the final .srt by using McPoodle's CCASDI:

    ccasdi -s house315_1.srt

Which looks like this (just 3 random lines shown):

514
00:24:07,400 --> 00:24:09,300
They've got another trial
going on at Duke.

515
00:24:09,367 --> 00:24:12,567
15% extend their lives
beyond five years.

516
00:24:12,634 --> 00:24:13,701
If you're positive
for protein PHF--

What's New in This Release:

  • Force generated RCWT files to have the same length as the source file.
  • Fix documentation for the -startat / -endat switches.
  • Make -startat / -endat work with all output formats.
  • Fix sync check for raw/rcwt files.
  • Improve timing of dvr-ms NTSC captions.
  • Add -in=bin switch to read CCExtractor's own binary format.
  • Fix problem with short input files (smaller than 1 MB).
  • Clean up regular and debug output.
  • Add --no_progress_bar switch to help readability of redirected output.
  • Add -out=bin switch to write RCWT data.
  • Remove -bo/--bufferoutput switch and functionality.
  • Added new generic binary format (RCWT, for Raw Captions With Time). This new format allows one file to contain all the available closed caption data instead of just one stream.
  • Added --no_progress_bar to disable status information (mostly used when debugging, as the progress information is annoying in the middle of debug logs).
  • The Windows GUI was reported to freeze in some conditions. Fixed.
  • The Windows GUI is now targeted at .NET 2.0 instead of 3.5. This allows Windows 2000 to run it (there is no .NET 3.5 for Windows 2000), as requested by a couple of key users.
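
The .srt excerpt shown under Basic Usage follows the usual SubRip layout: a cue number, a timing line, then one or more caption lines, with a blank line between cues. If you want to check the final output programmatically, the following Python sketch reads that layout. It is not part of ccextractor or McPoodle's tools; the function name parse_srt and the TIMING pattern are made up here for illustration, and the sketch assumes well-formed input like the sample above.

```python
import re
from datetime import timedelta

# Matches an SRT timing line such as "00:24:07,400 --> 00:24:09,300".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def parse_srt(text):
    """Return a list of (index, start, end, caption) tuples.

    Hypothetical helper: assumes each cue is a number line, a timing
    line, and caption lines, with cues separated by blank lines.
    """
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue  # not a complete cue
        match = TIMING.fullmatch(lines[1].strip())
        if match is None:
            continue  # skip blocks without a valid timing line
        h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
        start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
        end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
        cues.append((int(lines[0].strip()), start, end, "\n".join(lines[2:])))
    return cues


# Two cues taken from the sample output above.
SAMPLE = """514
00:24:07,400 --> 00:24:09,300
They've got another trial
going on at Duke.

515
00:24:09,367 --> 00:24:12,567
15% extend their lives
beyond five years.
"""

cues = parse_srt(SAMPLE)
for index, start, end, caption in cues:
    # Print the cue number, its duration in seconds, and the caption text.
    print(index, (end - start).total_seconds(), caption.replace("\n", " "))
```

Keeping the times as timedelta values makes it easy to spot cues with zero or negative duration, which is a quick sanity check on the timing of an extracted track.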


ccextractor Related Software