Update HapVideoDRAFT.md #3
Conversation
Proposed extension to allow the snappy data to be split up into smaller partitions for separate decompression, and to allow for future potential options using a bit-mask. Discussion welcome, although I would like to move fast on this since I need it for a project. I'm just looking to augment the format specification for now; I don't need the sample code/QuickTime plugin to incorporate it anytime soon.
I'm curious about the motivation for this - have you measured the time saving for common video frame sizes? At a level above the frame format, have you considered snappy-decompressing entire frames in parallel? My gut feeling is that this adds complexity only to benefit edge-case scenarios - which isn't to say I'm against it, because some of those edge cases are the most interesting - I'd just like to be certain of its benefit. As a tiny niggle, could you take out the line changes which are only formatting?
On a 4Kx2K video the snappy decode portion takes 8ms on my machine. I took the same file and encoded it into twelve partitions, and the decode time goes down to 1.5ms on the same 6-core CPU. In my case I see the main usage for Hap to be ultra-high-res (4K and higher) and high-frame-rate (60FPS+) video playback. For more common resolutions and frame rates I would still tend to use H264. Frame-based decoding is something I've considered, but it is more complex to add. It took me 2.5 hours to add both encode and decode support for this extension, while frame-based encoding would probably take a few days to fit nicely into the code base. Also this way of decoding reduces the movie seek time, while frame-based decoding doesn't help with that. Do you mean you want me to revert the ##s I used to control the headings and leave my sub-heading just as regular text? Happy to do so, just let me know.
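For anyone curious how the partitioned decode works in practice, here is a minimal sketch of the idea. It uses Python's zlib as a stand-in for snappy (which isn't in the standard library) and a thread pool for the parallel decompression; the chunking scheme here is illustrative only, not the layout the spec settled on.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_chunked(frame, num_chunks):
    """Split a frame into roughly equal partitions, compressing each independently."""
    step = (len(frame) + num_chunks - 1) // num_chunks
    return [zlib.compress(frame[i:i + step]) for i in range(0, len(frame), step)]

def decompress_chunked(chunks, workers=None):
    """Decompress every partition on a thread pool and reassemble the frame.
    zlib releases the GIL during decompression, so threads give real parallelism."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, chunks))

frame = bytes(range(256)) * 4096          # ~1 MiB of sample data
chunks = compress_chunked(frame, 12)      # twelve partitions, as in the test above
assert decompress_chunked(chunks) == frame
```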
I'm happy to accept a way of breaking frames into chunks, but...
Probably a nomenclature difference here, but what do you mean by 'define it as a data format'? Where would that go? I don't see how a decoder ignorant of this addition would be able to do anything useful. You can't put the batch of partitioned compressed data through the snappy decompressor and have it decompress properly. You need to know the size of each partition. I have thought about changing my definition from 'Snappy With Options' to simply 'With Options', and having the snappy portion be a flag in the options field. This would allow for different compression formats and other options like split compressed/uncompressed sections. That makes sense to me.
I simply mean place the options, chunk count and chunk sizes together, and consider them part of the data section, not the header section. Define the steps to decode this blob (options, chunk count, chunk sizes and combined chunk data) as an appendix to the Hap spec. Obviously I don't intend ignorant decompressors to make sense of it; I'd just like to limit it to a single ignorable blob. By-the-by, snappy already has a framing format which splits it into chunks - I assume you consider it too verbose for this purpose? It specifies a fairly low limit on chunk size. I think if we allow other formats, we should probably permit mixing them (e.g. alpha-clear areas could be compressed differently, to almost nothing), in which case a list of compression formats would have to accompany the list of chunk sizes (a flag would not be adequate).
It seems strange to me that the information that describes the frames would be in the data section. If this was in the original spec it would have been in the header section, no? For compromise and clarity, how about we describe it as the 'Options Section' (Header Section, Options Section, Data Section)? I don't think the data section size field should include the 'options section' as part of its byte total - what do you think? I've looked at the framing spec but it seems needless for this, and if we plan to support other compressors in future this options feature should be agnostic to any compressor-specific features like this. Once we decide on the above, I'll update the spec to include per-partition (or do you want me to rename it to chunks?) information for compression type etc.
Hey Tom, sorry to bug you. Any further thoughts on this?
I agree that snappy's framing format isn't suitable. Let's focus on the technical aspects and I can adapt the nomenclature afterwards - perhaps the data required to unpack the new format could be named an extended header? This is mostly recap plus some clarification:
One thought is that perhaps we should use an atom structure so this layout can be extended in the future. Thoughts on any/all?
No technical comments as I'm not across anything at this level.
Re-written as per discussion
Sounds good to me, I've updated the spec. With regards to the atom structure: I'm not too familiar with it, but is that not a QuickTime/.mov-specific feature? I don't think ffmpeg has great support for it and I wouldn't want to tie the codec down to only .mov containers. Correct me if I'm wrong though.
You still restrict the interpretation of the "simple" header for your new format - I want to keep that as it is for all formats, so remove your requirement that the first three bytes be zero for the new format, and describe the "simple" header once - in other words your section titled "If the second-stage compression format is None (0xA) or Snappy (0xB):" should apply to all frames. By atoms I mean start each entry of the extended header with eight bytes, of which four are a code denoting that entry's meaning, and four are its size. QuickTime does use (an extended version of) the same layout, but this would have no impact on choice of container format. You are already close, as you begin each entry with its size. Adding a code to indicate meaning would relax requirements for ordering sections and allow additional optional tables if other compressors needed more information (so for snappy-only frames we could do away with the reserved byte and perhaps the uncompressed size, and add those as separate tables if other compressors required them). Or perhaps you think it's overkill. I still haven't had time to look this over as closely as I'd like to - hope these points are of some use.
Updated with notes from Tom
Ok, I've updated it with these notes in mind. Let me know what you think.
Apologies for being a little vague before - I was suggesting something like this more compact version which splits chunk size and format information. Each section of the header is prefixed with three bytes of size and one byte indicating the content of that section (my original eight bytes seem a waste of space), and "options" are indicated by the presence of particular sections (presently the only case being a pair of chunk-size and chunk-format sections). The order of sections is not prescribed. The number of chunks is calculated from the known size and per-chunk size of the sections, rather than stored. A frame split into 4 chunks could be (each line is one byte)
Comments?
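Since the byte diagram itself hasn't survived in this excerpt, here is a rough sketch of the proposed layout for a frame split into four chunks. The section type codes and the snappy compressor code are hypothetical placeholders; this comment doesn't fix their actual values.

```python
import struct

# Hypothetical codes for illustration only -- the draft doesn't fix
# the actual values in this comment.
SECTION_CHUNK_FORMATS = 0x02
SECTION_CHUNK_SIZES = 0x03
COMPRESSOR_SNAPPY = 0x0B

def section(kind, payload):
    """Prefix a payload with three bytes of size (little-endian) and one
    type byte; the size covers the whole section including this prefix."""
    return struct.pack("<I", len(payload) + 4)[:3] + bytes([kind]) + payload

chunk_sizes = [8192, 8192, 8192, 4096]  # a frame split into four chunks
formats = section(SECTION_CHUNK_FORMATS, bytes([COMPRESSOR_SNAPPY] * 4))
sizes = section(SECTION_CHUNK_SIZES,
                b"".join(struct.pack("<I", s) for s in chunk_sizes))

# Sections may appear in either order; the chunk count is recovered from
# the section size rather than stored explicitly.
extended_header = formats + sizes
assert (len(sizes) - 4) // 4 == 4
```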
# Hap Video

## Introduction

Hap is an open video codec. It stores frames in a format that can be decoded in part by dedicated graphics hardware on modern computer systems. The aim of Hap is to enable playback of a greater number of simultaneous streams of higher resolution video than is possible using alternative codecs.

## Scope

This document describes the content of Hap frames. It makes no recommendation on container format or the practical details of implementing an encoder or decoder.

## External References

The correct encoding and decoding of Hap frames depends on compression schemes defined outside of this document. Adherence to these schemes is required to produce conforming Hap frames.
## Hap Frames

A Hap frame is always formed of a header of four or eight bytes and a frame data section. Frames may also have a variable length set of tables to inform decoding, known as decode instructions. The presence of decode instructions will be indicated in the frame header and if present, they will always be positioned after the frame header and before the frame data.

### Frame Header

The header will be four or eight bytes in size and records the format and combined size of any decode instructions and the frame data. The size of the header is determined by the value of the first three bytes. When the first three bytes of the header each have the value zero, the header is eight bytes in size. The fifth, sixth, seventh and eighth bytes are an unsigned integer stored in little-endian byte order. This is the combined size of any decode instructions and the frame data in bytes. When any of the first three bytes of the header have a non-zero value, the header is four bytes in size. The first three bytes are an unsigned integer stored in little-endian byte order. This is the combined size of any decode instructions and the frame data in bytes. The fourth byte of the header is an unsigned integer denoting the S3 and second-stage compression formats in which the data is stored, as well as indicating the presence of decode instructions. Its value and meaning will be one of the following:
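The two header variants described above can be illustrated with a minimal parser sketch (the format code 0xB0 used in the examples is a placeholder, not one of the defined values):

```python
def parse_frame_header(buf):
    """Parse the four- or eight-byte frame header described above.
    Returns (payload_size, format_byte, header_size)."""
    size = int.from_bytes(buf[0:3], "little")
    fmt = buf[3]
    if size == 0:
        # First three bytes all zero: eight-byte header, with the size in
        # bytes five to eight as a 32-bit little-endian unsigned integer.
        return int.from_bytes(buf[4:8], "little"), fmt, 8
    # Otherwise the first three bytes are a 24-bit little-endian size.
    return size, fmt, 4

# A four-byte header describing 0x500 bytes of payload:
assert parse_frame_header(bytes([0x00, 0x05, 0x00, 0xB0])) == (0x500, 0xB0, 4)
# An eight-byte header for the same payload size:
long_hdr = bytes([0, 0, 0, 0xB0]) + (0x500).to_bytes(4, "little")
assert parse_frame_header(long_hdr) == (0x500, 0xB0, 8)
```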
### Decode Instructions

Only if indicated by the value of the fourth byte of the header, decode instructions will immediately follow the frame header. Decode instructions are stored in a sectioned layout, where each section starts with fields indicating that section's size and type. A section may itself contain other sections, or data to inform decoding. Decoders encountering a section of an unknown type should attempt to continue decoding the frame if other sections provide adequate information to do so. Sections may occur in any order. The first, second and third bytes of a section are an unsigned integer stored in little-endian byte order. They represent the size of that section in bytes, including the size and type fields. The fourth byte is a code denoting the type of that section, and will be one of the following:
#### Decode Instructions Container

#### Chunk Second-Stage Compressor Table
The number of chunks can be calculated from the size of this section discounting the size of the section size and type fields. If second-stage compression is indicated, each chunk is to be passed to the second-stage decompressor independently. This section, if present, must be accompanied by a Chunk Size Table.

#### Chunk Size Table

The presence of this section indicates that frame data is split into chunks. Following the size and type fields comes a series of four-byte fields being unsigned integers stored in little-endian byte order, and indicating the byte size of each chunk. If second-stage compression is used, each chunk should be passed to the second-stage decompressor independently. This section, if present, must be accompanied by a Chunk Second-Stage Compressor Table.

### Frame Data

The remainder of the frame is the frame data, starting immediately after the header and any decode instructions. The data is to be treated as indicated by the header and decode instructions. If a second-stage compressor is indicated then the frame data is to be decompressed accordingly. The result of that decompression will be data in the indicated S3 format. If no second-stage compressor is indicated, the frame data is in the indicated S3 format.
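The sectioned layout above can be sketched as follows. The section type code is a hypothetical placeholder since the real code table isn't reproduced in this excerpt; section sizes include the four bytes of size and type fields, as specified above.

```python
# Hypothetical section type code -- the real code table isn't reproduced here.
SECTION_CHUNK_SIZE_TABLE = 0x03

def walk_sections(instructions):
    """Yield (type, payload) for each section in the decode instructions."""
    pos = 0
    while pos < len(instructions):
        size = int.from_bytes(instructions[pos:pos + 3], "little")
        kind = instructions[pos + 3]
        yield kind, instructions[pos + 4:pos + size]
        pos += size

def chunk_sizes(instructions):
    """Read the Chunk Size Table, ignoring sections of unknown type.
    The chunk count is implied by the table's size."""
    for kind, payload in walk_sections(instructions):
        if kind == SECTION_CHUNK_SIZE_TABLE:
            return [int.from_bytes(payload[i:i + 4], "little")
                    for i in range(0, len(payload), 4)]
    return None

payload = b"".join(s.to_bytes(4, "little") for s in (8192, 8192, 4096))
table = (len(payload) + 4).to_bytes(3, "little") + bytes([SECTION_CHUNK_SIZE_TABLE]) + payload
unknown = (6).to_bytes(3, "little") + bytes([0x7F]) + b"\x00\x00"  # skipped over
assert chunk_sizes(unknown + table) == [8192, 8192, 4096]
```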
One thing we are losing with this spec, though, is the ability to have compressors that require more information than just compressed size, unless the new compressor specifies a new type of chunk table, which seems messy. I'm also personally not a fan of the 3+1 byte size+info way of storing headers. I find the complexity it adds, albeit small, and the low hard limit it places on maximum sizes, not worth the memory savings. Especially since this is already a very uncompressed video format, the memory saving is a tiny fraction of a packet's size.
The intention is that any additional information required can be added as additional tables - I'm not sure I get why that's any more or less messy than a single table with variable-length entries, but feel free to put me right. For the decode instructions three bytes is surely ample for recording table sizes, but you're right re complexity, perhaps you have a point. The original 3+1 byte header is an artefact of Hap's history. If using four bytes for section-size fields, I'd probably use four bytes for section-type fields as well, to keep section content on 32-bit boundaries. I think a byte is adequate to express any likely future number of entries in the Chunk Second-Stage Compressor Table, and am inclined to leave its definition as above. Thoughts?
A further thought - storing chunk start offsets as well as sizes would allow chunks to be repeated if a codec wanted to identify repeated areas to reduce bit-rate - might make a difference for animation-style frames. Could be optional - use chunk sizes from offset zero if not present.
Hey, I just thought it would be good if the compressor type and the table format were totally orthogonal. All compressor types would use a common chunk table description. As it is right now, if I wanted to add another compressor with a decompressed size that wasn't determinable, I'd need to describe a different chunk table format. Yes, I like the idea of having everything as 32-bit ints, just for simplicity if nothing else. Chunks + offsets sounds good to me too.
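The optional-offsets idea under discussion could look something like this sketch (a hypothetical helper, not part of the spec as written):

```python
def chunk_regions(sizes, offsets=None):
    """Resolve (offset, size) pairs for each chunk. Without an offset table,
    chunks are assumed contiguous from offset zero; an explicit table would
    let two chunks point at the same bytes to exploit repeated areas."""
    if offsets is not None:
        return list(zip(offsets, sizes))
    regions, pos = [], 0
    for s in sizes:
        regions.append((pos, s))
        pos += s
    return regions

# Contiguous layout when no offset table is present:
assert chunk_regions([10, 20, 30]) == [(0, 10), (10, 20), (30, 30)]
# A repeated chunk via explicit offsets:
assert chunk_regions([10, 10], offsets=[0, 0]) == [(0, 10), (0, 10)]
```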
After some thought, I'm minded against 32-bit ints: a decoder is already required to correctly handle 24-bit ints to read the four-byte version of the basic header, so their use doesn't introduce any complexity not already present. They provide ample space to describe any likely byte count within the extended header, so the extra range of a 32-bit int seems redundant. As above, I do intend to use 32 bits to express chunk sizes. I see where you're coming from re a single table (and the word "orthogonal" needs way more usage) - but chunk layout is meaningful prior to handing data off to the secondary decompressor(s). One motivation for splitting your single table into its columns was the ease of traversing fixed-width entries - for any chunk you can find its entry by address arithmetic rather than stepping through each prior entry. I might drop the last entry in the Chunk Size Table as it is simply the remaining frame size. Seem reasonable? Holler if any of the above still seems grossly misguided - I often am...
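The fixed-width point can be shown in one line: with four bytes per entry, entry `i` of the column sits at byte `4 * i`, so no walk over prior entries is needed (a sketch, not spec text):

```python
def chunk_size_at(table, index):
    """Fixed four-byte entries allow direct address arithmetic."""
    return int.from_bytes(table[4 * index:4 * index + 4], "little")

table = b"".join(s.to_bytes(4, "little") for s in (8192, 8192, 4096))
assert chunk_size_at(table, 2) == 4096
```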
Sorry, I'm out of the office this week but I'll get back onto this next week
No problem
If we do go with a 3 + 1 byte header for decode instruction section headers, for simplicity they should probably match the rules for the whole-frame header: size excludes the size of the header for each section, and they can be extended to 3 + 1 + 4 bytes in the same way. |
Hey, sorry for the delay in answering. I see what you are saying about the tables/columns and I agree now. Thinking of it as columns makes it cleaner in my head. Are you saying the Chunk Size Table will have one entry fewer than the number of chunks, and the size of the last chunk will be inferred from the total size minus the consumed size?
Further to conversation between @mbechard and myself, changes to the above should be:
I'd suggest the spec should be arranged roughly as my version above, with the following (fairly major) edits:
If you can observe the painfully pedantic tone of the current spec I'd be grateful - eg specify the byte ordering every time you describe a multi-byte int. The hope is to limit the mistakes of anyone referring to a paragraph in isolation. |
Updated as per discussions with Tom. Hope I got everything correct.
Apologies for the delay, turns out GitHub doesn't notify new commits on pull requests. I've merged this to a new branch (chunks) and made some edits which I hope you're happy with, none of which change the layout of frames - comments here or new pull request on that branch welcome if there's anything you'd like to change. I will update the source code on the chunks branch then merge to master once I've done that. Perhaps we could exchange sample movies to verify we are both interpreting the spec the same way. |
Sorry for the delay also, I was on vacation. This looks good to me. One comment I was wrestling with was this paragraph: “Decoders encountering a section of an unknown type should attempt to continue decoding the frame if other sections provide adequate information to do so. Some sections are only permitted inside other sections, but at any given hierarchical level sections may occur in any order.” I don’t think there is any way for a decoder to know if there is adequate info to decode when it encounters an unknown section type. Already with the new section types we have added, attempting to blindly decompress will cause unknown issues, possibly a crash.
This is basically a requirement not to give up if you have all the information you need to decode as well as some you can't make sense of. A decoder which understands every entry in the secondary compressor table and has a chunk size table knows it has adequate information to proceed, even if other unrecognised section types are present. This permits a decoder written to this spec to decode frames after the addition of future section types, which may be required by future secondary compressors and which may be present even if those compressors haven't been used in a given frame. The spec effectively limits the addition of new section types to be permitted only within the decode instructions container.
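The rule described here amounts to a presence check rather than full understanding: proceed when every table you need is there, regardless of what else accompanies it. A sketch, with hypothetical section type codes:

```python
# Hypothetical type codes for the two tables a chunked frame requires.
SECTION_COMPRESSOR_TABLE = 0x02
SECTION_CHUNK_SIZE_TABLE = 0x03
REQUIRED = {SECTION_COMPRESSOR_TABLE, SECTION_CHUNK_SIZE_TABLE}

def can_decode(section_types):
    """Proceed when every required table is present, ignoring any
    accompanying sections of unknown type."""
    return REQUIRED <= set(section_types)

assert can_decode([0x02, 0x03, 0x7F])   # the unknown 0x7F section is ignored
assert not can_decode([0x02, 0x7F])     # chunk size table missing: give up
```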
Hi, now that the decode side of this has been merged, do you guys have any reference file that you've used to test this? Thanks!
All files created with the latest experimental version of TouchDesigner (using the Movie Out TOP) will make use of it.
Thanks. This helped me validate my implementation. At this resolution the added complexity doesn't really pay off, with 0.6ms decode time for the simpler serial implementation vs. 0.4ms for the parallel implementation on a quad-core, but I imagine with larger resolutions it will make more sense.
Indeed