[PROPOSAL] Add SCC support to CEA-708 decoder #1426

Closed
PunitLodha opened this issue Mar 23, 2022 · 18 comments · Fixed by #1595

Comments

@PunitLodha
Member

PunitLodha commented Mar 23, 2022

Add support for the SCC format to the CEA-708 decoder.
Currently, only the SRT, SAMI and Transcript formats are supported: https://github.com/CCExtractor/ccextractor/blob/master/src/rust/src/decoder/tv_screen.rs#L126-L134

SCC format details: http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_FORMAT.HTML

#1423

@voidash

voidash commented Mar 26, 2022

Just to be clear, I looked at the similar function write_sami(). Basically, it writes to a file, and the contents should look like the image I have embedded.

So if I want to add support for the SCC format, the extracted subtitles should look like this, right?

Scenarist_SCC V1.0

01:02:53:14	94ae 94ae 9420 9420 947a 947a 97a2 97a2 a820 68ef f26e 2068 ef6e 6be9 6e67 2029 942c 942c 8080 8080 942f 942f

01:02:55:14	942c 942c

01:03:27:29	94ae 94ae 9420 9420 94f2 94f2 c845 d92c 2054 c845 5245 ae80 942c 942c 8080 8080 942f 942f

I am working on this problem. I will be sure to read the contributor guidelines and contact you if I get stuck.
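
For readers unfamiliar with the hex words in the sample above: each SCC word holds two 7-bit bytes, and bit 7 of each byte is set so that the byte has odd parity. The following is a minimal sketch of that encoding, not ccextractor's code. The function names are made up for illustration, and it assumes plain ASCII input (the CEA-608 character set differs from ASCII at a few code points, which this sketch ignores).

```rust
/// Set bit 7 so that the byte has an odd number of 1 bits (odd parity).
fn with_odd_parity(byte: u8) -> u8 {
    let low7 = byte & 0x7f;
    if low7.count_ones() % 2 == 0 {
        low7 | 0x80
    } else {
        low7
    }
}

/// Encode ASCII text as space-separated, four-hex-digit SCC words.
/// Non-ASCII characters are outside the scope of this sketch and become '?'.
fn encode_text_as_scc_words(text: &str) -> String {
    let mut bytes: Vec<u8> = text
        .chars()
        .map(|c| if c.is_ascii() { c as u8 } else { b'?' })
        .collect();
    // Each word carries two bytes, so pad odd-length text with a null byte
    // (0x00 becomes 0x80 once the parity bit is set).
    if bytes.len() % 2 == 1 {
        bytes.push(0x00);
    }
    bytes
        .chunks(2)
        .map(|pair| {
            format!(
                "{:02x}{:02x}",
                with_odd_parity(pair[0]),
                with_odd_parity(pair[1])
            )
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

With this sketch, `encode_text_as_scc_words("HEY, THERE.")` gives `c845 d92c 2054 c845 5245 ae80`, which matches the text portion of the 01:03:27:29 line in the sample.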

@shazbot666

shazbot666 commented Mar 26, 2022

Here's an SCC extract from the sample WhackedOutVideos_short.mov, produced with a commercial tool.

Sample video:
https://drive.google.com/file/d/13p6HBxGXlm0BGpaS15JwCJjfnBdm_Qbm/view?usp=sharing

Scenarist_SCC V1.0

00:58:56:14 e96e 2043 616e 6164 61ae

00:58:58:19 9426 94ad 9470 4ff2 20e9 7320 f468 e973 2073 796e e368 f2ef 6ee9 7ae5 6480

00:58:59:23 9426 94ad 9470 73f4 e9e3 6b20 70ef 6be9 6e67 bf80

00:59:02:03 9426 94ad 9470 c1e3 f475 61ec ec79 2c20 f468 ef73 e520 61f2 e520 f468 e520 f2ef 6473

00:59:03:09 9426 94ad 9470 f468 e579 2075 73e5 20f4 ef20 ecef e361 f4e5 20ec ef73 f420 70e5 ef70 ece5

00:59:04:29 9426 94ad 9470 eff2 20ef 62ea e5e3 f473 20e9 6e20 7570 20f4 ef20 3132 20e6 e5e5 f480

00:59:06:17 9426 94ad 9470 efe6 2070 eff7 64e5 f2ae

00:59:08:18 9426 94ad 9470 496e 20f4 68e9 7320 e361 73e5 2c20 f468 e579 20e6 e96e 6420 f468 e973

00:59:09:19 9426 94ad 9470 6475 64e5 a773 2076 e964 e5ef 20e3 616d e5f2 61ae

00:59:12:03 9426 94ad 9470 c16e 6420 f468 e520 73f7 e561 f220 62ec e97a 7a61 f264

00:59:13:14 9426 94ad 9470 73f4 61f2 f473 2061 6761 e96e ae80

00:59:14:27 9426 94ad 9470 a862 ece5 e570 e96e 6729

00:59:18:26 9426 94ad 9470 54f7 efad f468 e9f2 6473 20ef e620 f468 e520 f7ef f2ec 6480

00:59:20:22 9426 94ad 9470 e973 20e3 ef76 e5f2 e564 2062 7920 f761 f4e5 f280

00:59:22:16 9426 94ad 9470 616e 6420 f468 e520 f2e5 73f4 20e9 7320 e3ef 76e5 f2e5 6420 6279 2075 73ae

00:59:24:23 9426 94ad 9470 54e9 6de5 20f4 ef20 6761 f468 e5f2 2075 7020 61ec ec20 f468 e520 67ef efe6 7980

00:59:26:04 9426 94ad 9470 67ef e96e 6773 adef 6e20 e6f2
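
To check a sample like this by eye, it helps to turn the payload back into text. The sketch below is an illustration only, not ccextractor's decoder. It strips the parity bit from each byte, skips two-byte control codes (first byte in 0x10..=0x1f after stripping), drops null padding, and treats the remaining bytes as ASCII. The function name is hypothetical, and the few CEA-608 characters that differ from ASCII are not mapped.

```rust
/// Decode the printable text of one SCC payload (the part after the timecode).
/// Control-code pairs are skipped and null padding is dropped.
fn decode_scc_text(payload: &str) -> String {
    let mut text = String::new();
    for word in payload.split_whitespace() {
        // Each word is four hex digits (two bytes); ignore malformed words.
        let value = match u16::from_str_radix(word, 16) {
            Ok(v) if word.len() == 4 => v,
            _ => continue,
        };
        // Strip the parity bit (bit 7) from both bytes.
        let first = ((value >> 8) as u8) & 0x7f;
        let second = (value as u8) & 0x7f;
        // A first byte in 0x10..=0x1f starts a two-byte control code.
        if (0x10..=0x1f).contains(&first) {
            continue;
        }
        for &byte in [first, second].iter() {
            // Keep printable bytes; this drops the 0x00 padding byte.
            if byte >= 0x20 {
                text.push(byte as char);
            }
        }
    }
    text
}
```

Applied to the first line of the extract, `decode_scc_text("e96e 2043 616e 6164 61ae")` gives `in Canada.`, and the 00:58:59:23 line decodes to `stick poking?`.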

@voidash

voidash commented Apr 7, 2022

I took a shot at adding SCC support to the 708 decoder. I added a function write_scc to tv_screen.rs; here is the commit on my fork: https://github.com/CCExtractor/ccextractor/compare/master...voidash:master

I ran ccextractor in debug mode with these flags on the video https://drive.google.com/file/d/13p6HBxGXlm0BGpaS15JwCJjfnBdm_Qbm/view.

-in=mp4
-out=scc
-nofc
-dru
/home/cdjk/Downloads/WhackedOutVideos_short.mov
-o
/home/cdjk/Downloads/main.scc
-708

Here is the complete output: https://pastebin.com/58ieUtfY

Without the -708 flag, the output is a little different from #1423: https://pastebin.com/PygNqWRh

My major concern is that the Writer object is only being created for the last three lines.

[CEA-708] 00:00:30,030 00:00:30,029
[CEA-708] First: 0, Last: 29
[CEA-708] 9426 94ad 9470 616e 6420 f468 e520 f2e5 73f4 20e9 7320 e3ef 76e5 f2e5 6420 6279 2075 73ae


[CEA-708] 00:00:30,030 00:00:30,029
[CEA-708] First: 0, Last: 30
[CEA-708] 9426 94ad 9470 54e9 6de5 20f4 ef20 6761 f468 e5f2 2075 7020 61ec ec20 f468 e520 67ef efe6 7980


[CEA-708] 00:00:30,030 00:00:30,029
[CEA-708] First: 0, Last: 30
[CEA-708] 9426 94ad 9470 67ef e96e 6773 adef 6e20 e6f2 ef6d 2061 f2ef 756e 6420 f468 e520 67ec ef62 e580

For those three lines, the start and end times are the same, and the output file main.scc contains only the header line Scenarist_SCC V1.0.

However, the file main.p0.svc01.scc has those last three lines.
Note: I wrote the write_scc function by looking at how write_srt and write_transcript work. If there is something I need to understand, please let me know.

@cfsmp3
Contributor

cfsmp3 commented Apr 8, 2022

@PunitLodha, can you take a look at @voidash's work?

@PunitLodha
Member Author

Yes, I will in some time

@PunitLodha
Member Author

So, for some reason, mp4 still uses the C decoder, and changing it to Rust is not straightforward. I am working on it.

Meanwhile, @voidash, could you replicate the changes in C here: https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ccx_decoders_708_output.c#L370-L392

@voidash

voidash commented Apr 12, 2022

OK, I will take a look at it.

@voidash

voidash commented Apr 12, 2022

I tried replicating the changes in C. Here is the diff: voidash@fb5dbe2

Here is the output when I passed the following parameters:
-in=mp4 -out=scc -nofc -dru /home/cdjk/Downloads/WhackedOutVideos_short.mov -o /home/cdjk/Downloads/main.scc
https://pastebin.com/VeY4BmbK

The temp file main.p0.svc01.scc is being written, and its contents look like this:
https://pastebin.com/xq6Jwfuv

But main.scc is still not written. Judging by the console output, the caption type appears to be roll-up:

00:00:15:982 --> 00:00:17:350
In this case, they find this
dude's video camera.
And the swear blizzard
00:00:15:29	9426 94ad 9470 496e 20f4 68e9 7320 e361 73e5 2c20 f468 e579 20e6 e96e 6420 f468 e973 

9426 94ad 9470 6475 64e5 a773 2076 e964 e5ef 20e3 616d e5f2 61ae 

9426 94ad 9470 c16e 6420 f468 e520 73f7 e561 f220 62ec e97a 7a61 f264 


00:00:17:351 --> 00:00:18:784
dude's video camera.
And the swear blizzard
starts again.
00:00:17:10	9426 94ad 9470 6475 64e5 a773 2076 e964 e5ef 20e3 616d e5f2 61ae 

9426 94ad 9470 c16e 6420 f468 e520 73f7 e561 f220 62ec e97a 7a61 f264 

9426 94ad 9470 73f4 61f2 f473 2061 6761 e96e ae80

Any suggestions on what I should do next?
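
The roll-up observation can be checked from the hex itself. With the parity bit stripped, 9426 is 0x14 0x26, which is the CEA-608 RU3 (roll-up, 3 rows) command, and 94ad is 0x14 0x2d, a carriage return. The lookup below is a small sketch covering only the channel 1 commands that appear in this thread. The function name is made up for illustration and the table is deliberately incomplete.

```rust
/// Name a few CEA-608 miscellaneous control codes, given one SCC word
/// such as 0x9426. Returns None for text, preamble address codes, and
/// anything this sketch does not list.
fn control_code_name(word: u16) -> Option<&'static str> {
    // Strip the odd-parity bit (bit 7) from both bytes.
    let first = ((word >> 8) as u8) & 0x7f;
    let second = (word as u8) & 0x7f;
    // 0x14 is the first byte of the channel 1 commands seen in this thread.
    if first != 0x14 {
        return None;
    }
    match second {
        0x20 => Some("RCL"), // resume caption loading (pop-on)
        0x25 => Some("RU2"), // roll-up, 2 rows
        0x26 => Some("RU3"), // roll-up, 3 rows
        0x27 => Some("RU4"), // roll-up, 4 rows
        0x29 => Some("RDC"), // resume direct captioning (paint-on)
        0x2c => Some("EDM"), // erase displayed memory
        0x2d => Some("CR"),  // carriage return
        0x2e => Some("ENM"), // erase non-displayed memory
        0x2f => Some("EOC"), // end of caption
        _ => None,
    }
}
```

Here `control_code_name(0x9426)` returns `Some("RU3")`, while the 942c word used on the standalone lines of the earlier samples maps to `EDM`.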

@PunitLodha
Member Author

main.scc will be empty because it is supposed to contain the 608 subs, which are not present here. main.p0.svc01.scc is the file that is supposed to have the 708 subs, so that is correct.
But I can see some issues with the output. One is that there are multiple timestamps on the same line. The other is that I think the clear caption command is missing; it should be present at the end time of each subtitle.
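
A minimal sketch of the layout described here: one timecode per line, with the caption payload at the start time and a clear command (942c 942c) on its own line at the end time. This is not ccextractor's implementation. The function names are hypothetical, and the timecode conversion assumes 29.97 fps, non-drop-frame, with the frame number truncated. That assumption is consistent with the two pairs visible in the console output above (15:982 against 00:00:15:29, and 17:351 against 00:00:17:10), but real SCC files may use drop-frame timecode.

```rust
/// Convert milliseconds to an SCC timecode (HH:MM:SS:FF).
/// Assumes 29.97 fps, non-drop-frame, with the frame number truncated.
fn ms_to_scc_timecode(ms: u64) -> String {
    let hours = ms / 3_600_000;
    let minutes = (ms / 60_000) % 60;
    let seconds = (ms / 1_000) % 60;
    // frames = floor(millis_within_second * 29.97 / 1000), in integers
    let frames = (ms % 1_000) * 2_997 / 100_000;
    format!("{:02}:{:02}:{:02}:{:02}", hours, minutes, seconds, frames)
}

/// Format one caption as two SCC entries: the payload at the start time and
/// a clear command (942c 942c, erase displayed memory) at the end time.
fn format_scc_caption(start_ms: u64, end_ms: u64, payload: &str) -> String {
    format!(
        "{}\t{}\n\n{}\t942c 942c\n\n",
        ms_to_scc_timecode(start_ms),
        payload,
        ms_to_scc_timecode(end_ms)
    )
}
```

Each call produces exactly two timecoded lines, so a caption never shares a line with another timestamp.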

@cfsmp3
Contributor

cfsmp3 commented Apr 12, 2022

The mp4 code has a different flow. We use libgpac to actually open the mp4 file and the entry point into the decoders is different than the usual general loop.

It should be easy to change though and call the rust code.

@voidash

voidash commented Apr 13, 2022

@PunitLodha, main.p0.svc01.scc now looks like this:

00:00:02:15	94ae 9420 9470 4ff2 20e9 7320 f468 e973 2073 796e e368 f2ef 6ee9 7ae5 6480

00:00:03:18	942c 942c 

00:00:03:19	94ae 9420 9470 4ff2 20e9 7320 f468 e973 2073 796e e368 f2ef 6ee9 7ae5 648094ae 9420 9470 73f4 e9e3 6b20 70ef 6be9 6e67 bf80

00:00:05:28	942c 942c 

You can take a look at my approach here: voidash@e449557

Here is the pastebin for main.p0.svc01.scc: https://pastebin.com/aMiaEStY
So the 708 decoder found SCC subs, which means the Scenarist_SCC V1.0 header should be added at the top of main.p0.svc01.scc, and I guess I should also remove the Rust code that just appends the last three captions.

@cfsmp3
Contributor

cfsmp3 commented Apr 13, 2022

I'd recommend looking into this - @PunitLodha

dtvcc_process_data(dec_ctx->dtvcc, (unsigned char *)temp);

If you can just call Rust from there, you're good to go. After that, everything is the same.

@PunitLodha
Member Author

I did look at that. But because of how the code is structured, it's not as easy as just calling the Rust function from there. I'll have to change some things on the Rust side first.

@PunitLodha
Member Author

@voidash

> So the 708 decoder found SCC subs, which means the Scenarist_SCC V1.0 header should be added at the top of main.p0.svc01.scc

Check out how the SAMI header is added, and do it the same way.

> and I guess I should also remove the Rust code that just appends the last three captions

The last captions are added by the code you added in Rust, which is called by the flush function. So you should correct the Rust code too, and send a PR.
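
As a rough illustration of the header point, the sketch below writes the Scenarist_SCC V1.0 line once, just before the first caption line, and never again. It is not how ccextractor adds the SAMI header (that code is not shown in this thread). `SccWriter` and its methods are hypothetical names used only for this example.

```rust
use std::io::{self, Write};

/// Hypothetical writer that emits the SCC header once, before the first line.
struct SccWriter<W: Write> {
    out: W,
    header_written: bool,
}

impl<W: Write> SccWriter<W> {
    fn new(out: W) -> Self {
        SccWriter { out, header_written: false }
    }

    /// Write one already-formatted SCC line (timecode, tab, payload),
    /// followed by the blank line that separates SCC entries.
    fn write_line(&mut self, line: &str) -> io::Result<()> {
        if !self.header_written {
            // The header appears exactly once, at the top of the file.
            self.out.write_all(b"Scenarist_SCC V1.0\n\n")?;
            self.header_written = true;
        }
        self.out.write_all(line.as_bytes())?;
        self.out.write_all(b"\n\n")
    }
}
```

Because the header is tied to the first written line, a file that receives no captions stays empty instead of containing only the header.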

voidash added a commit to voidash/ccextractor that referenced this issue Apr 14, 2022
voidash added a commit to voidash/ccextractor that referenced this issue Apr 15, 2022
@ArchitBhonsle
Contributor

If this issue has been abandoned, I could start working on this.

> The mp4 code has a different flow. We use libgpac to actually open the mp4 file and the entry point into the decoders is different than the usual general loop.
>
> It should be easy to change though and call the rust code.

Is there a video with 708 captions which is not an MP4? This might help me avoid implementing this in C and/or changing the current MP4 flow.

@cfsmp3
Contributor

cfsmp3 commented Mar 8, 2023

> If this issue has been abandoned, I could start working on this.

Sure, go for it.

> Is there a video with 708 captions which is not an MP4? This might help me avoid implementing this in C and/or changing the current MP4 flow.

Yes, almost any US Transport Stream.
You can find plenty on our website.

@PunitLodha
Member Author

#1499 details the issue with the mp4 code flow and how to fix it.

@IshanGrover2004
Contributor

IshanGrover2004 commented Dec 17, 2023

Hi,
I would like to work on this issue and continue from where @voidash left off.
I just wanted to know what the current progress is and what is still needed to complete the feature, plus a little about how I could resolve it.

If there is any other information I should know, please tell me that as well.
@PunitLodha @cfsmp3
