
Update HapVideoDRAFT.md #3

Merged: 4 commits into Vidvox:master, May 5, 2014
Conversation

mbechard
Contributor

Proposed extension to allow the snappy data to be split up into smaller partitions for separate decompression. Also allows for future potential options using a bit-mask. Discussion welcome, although I would like to move fast on this since I need it for a project. I'm just looking to augment the format specification, though; I don't need the sample code/QuickTime plugin to incorporate it anytime soon.

@bangnoise
Collaborator

I'm curious about the motivation for this - have you measured the time saving for common video frame sizes? At a level above the frame format, have you considered snappy-decompressing entire frames in parallel?

My gut feeling is that this adds complexity only to benefit edge-case scenarios - which isn't to say I'm against it, because some of those edge cases are the most interesting - I'd just like to be certain of its benefit.

As a tiny niggle, could you take out the line changes which are only formatting?

@mbechard
Contributor Author

On a 4Kx2K video the snappy decode portion takes 8 ms on my machine. I took the same file and encoded it into twelve partitions, and the decode time goes down to 1.5 ms on the same six-core CPU. In my case I see the main usage for Hap being ultra-high-res (4K and higher) and high-frame-rate (60 FPS+) video playback. For more common resolutions and frame rates I would still tend to use H.264.

Frame-based decoding is something I've considered, but it is more complex to add. It took me 2.5 hours to add both encode and decode support for this extension, while frame-based decoding would probably take a few days to fit nicely into the code base. Also, this way of decoding reduces movie seek time, while frame-based decoding doesn't help with that.

Do you mean you want me to revert the ##s I used to control the headings and leave my sub-heading just as regular text? Happy to do so, just let me know.

@bangnoise
Collaborator

I'm happy to accept a way of breaking frames into chunks, but...

  • Can we define chunked snappy as a data format and, apart from the addition of the new codes for the fourth byte, leave the header format unchanged? This allows decoders ignorant of this addition to still calculate the entire frame size.
  • Have you considered positioning the chunking format above the compression format, to allow mixed compressed and uncompressed frame chunks (or perhaps in the future mixing compressors)?
  • I hadn't spotted the addition of a third-level heading - ignore my comment re formatting - or ideally avoid more than two levels of heading and leave the formatting unchanged

@mbechard
Contributor Author

Probably a nomenclature difference here, but what do you mean by 'define it as a data format'? Where would that go? I don't see how a decoder ignorant of this addition would be able to do anything useful. You can't put the batch of partitioned compressed data through the snappy decompressor and have it decompress properly. You need to know the size of each partition.

I have thought about changing my definition from 'Snappy With Options' to simply 'With Options', and have the snappy portion be a flag in the options field. This would allow for different compression formats and other options like split compressed/uncompressed sections. That makes sense to me.

@bangnoise
Collaborator

I simply mean place the options, chunk count and chunk sizes together, and consider them part of the data section, not the header section. Define the steps to decode this blob (options, chunk count, chunk sizes and combined chunk data) as an appendix to the Hap spec. Obviously I don't intend ignorant decompressors to make sense of it, I'd just like to limit it to a single ignorable blob.

By-the-by, snappy already has a framing format which splits it into chunks - I assume you consider it too verbose for this purpose? It specifies a fairly low limit on chunk size.

I think if we allow other formats, we should probably permit mixing them (eg alpha-clear areas could be compressed differently to almost nothing) in which case a list of compression formats would have to accompany the list of chunk sizes (a flag would not be adequate).

@mbechard
Contributor Author

It seems strange to me that the information that describes the frames would be in the data section. If this had been in the original spec it would have been in the header section, no? As a compromise, and for clarity, how about we describe it as the 'Options Section' (Header Section, Options Section, Data Section)? I don't think the data section size field should include the 'options section' as part of its byte total - what do you think?

I've looked at the framing spec but it seems needless for this, and if we plan to support other compressors in the future, this options feature should be agnostic to any compressor-specific features like that.

Once we decide on the above, I'll update the spec to include per-partition (or do you want me to rename it to chunks?) information for compression type etc.

@mbechard
Contributor Author

mbechard commented Jul 4, 2013

Hey Tom, sorry to bug you. Any further thoughts on this?

@bangnoise
Collaborator

I agree that snappy's framing format isn't suitable.

Let's focus on the technical aspects and I can adapt the nomenclature afterwards - perhaps the data required to unpack the new format could be named an extended header?

This is mostly recap plus some clarification:

  • Apart from the addition of new codes, leave the first four/eight bytes unchanged
  • Permit mixing compressors in one frame, so add a compressor table for the new format
  • Consider the combined chunk count and offset and compressor tables of the new format as part of the data section as far as recording the size, so the "simple" header records a size and type for what follows. You could change references to "the frame data" to "any extended header and associated frame data" or somesuch.

One thought is that perhaps we should use an atom structure so this layout can be extended in the future.

Thoughts on any/all?

@sheridanis

No technical comments as I'm not across anything at this level, but both myself and the company I work for would be keen to see decreased latency; amongst other things, we're using the codec for VJ work at 9600x1080, so we are constantly scrubbing and switching clips.


Re-written as per discussion
@mbechard
Contributor Author

mbechard commented Jul 9, 2013

Sounds good to me, I've updated the spec.
One thing I'm not sure about is whether the table should be constant size per entry, the way I've written it, or variable size per entry, determined by the compressor type.
My reasoning for adding the uncompressed size for each entry is for future compressors that may not be able to determine uncompressed size in constant time the way Snappy can (which is important to know in order to avoid a memcpy after decompression).
Thoughts?

With regards to the atom structure: I'm not too familiar with it, but is that not a QuickTime/.mov-specific feature? I don't think FFmpeg has great support for it, and I wouldn't want to tie the codec down to only .mov containers. Correct me if I'm wrong though.

@bangnoise
Collaborator

You still restrict the interpretation of the "simple" header for your new format - I want to keep that as it is for all formats, so remove your requirement that the first three bytes be zero for the new format, and describe the "simple" header once - in other words, your section titled "If the second-stage compression format is None (0xA) or Snappy (0xB):" should apply to all frames.

By atoms I mean start each entry of the extended header with eight bytes, of which four are a code denoting that entry's meaning, and four are its size. QuickTime does use (an extended version of) the same layout, but this would have no impact on choice of container format. You are already close as you begin each entry with its size. Adding a code to indicate meaning would relax requirements for ordering sections and allow additional optional tables if other compressors needed more information (so for snappy-only frames we could do away with the reserved byte and perhaps the uncompressed size, and add those as separate tables if other compressors required them). Or perhaps you think it's overkill.

I still haven't had time to look this over as closely as I'd like to - hope these points are of some use.

Updated with notes from Tom
@mbechard
Contributor Author

Ok I've updated it with these notes in mind. Let me know what you think

@bangnoise
Collaborator

Apologies for being a little vague before - I was suggesting something like this more compact version which splits chunk size and format information. Each section of the header is prefixed with three bytes of size and one byte indicating the content of that section (my original eight bytes seem a waste of space), and "options" are indicated by the presence of particular sections (presently the only case being a pair of chunk-size and chunk-format sections). The order of sections is not prescribed. The number of chunks is calculated from the known size and per-chunk size of the sections, rather than stored.

A frame split into 4 chunks could be (each line is one byte)

zero
zero
zero
frame-format
frame-size
frame-size
frame-size
frame-size
    section-size
    section-size
    section-size (32)
    section-type (decode instructions)
        section-size
        section-size
        section-size (20)
        section-type (chunk sizes)
            chunk-1-size
            chunk-1-size
            chunk-1-size
            chunk-1-size
            chunk-2-size
            chunk-2-size
            chunk-2-size
            chunk-2-size
            chunk-3-size
            chunk-3-size
            chunk-3-size
            chunk-3-size
            chunk-4-size
            chunk-4-size
            chunk-4-size
            chunk-4-size
        section-size
        section-size
        section-size (8)
        section-type (chunk formats)
            chunk-1-format
            chunk-2-format
            chunk-3-format
            chunk-4-format
frame-data
...

Comments?

@bangnoise
Collaborator

Hap Video

Introduction

Hap is an open video codec. It stores frames in a format that can be decoded in part by dedicated graphics hardware on modern computer systems. The aim of Hap is to enable playback of a greater number of simultaneous streams of higher resolution video than is possible using alternative codecs.

Scope

This document describes the content of Hap frames. It makes no recommendation on container format or the practical details of implementing an encoder or decoder.

External References

The correct encoding and decoding of Hap frames depends on compression schemes defined outside of this document. Adherence to these schemes is required to produce conforming Hap frames.

  1. S3 Texture Compression: described in the OpenGL S3TC Extension
  2. Snappy Compression: described in the Snappy Format Description
  3. Scaled YCoCg DXT5 Texture Compression: described in Real-Time YCoCg-DXT Compression, JMP van Waveren and Ignacio Castaño, September 2007

Hap Frames

A Hap frame is always formed of a header of four or eight bytes and a frame data section. Frames may also have a variable length set of tables to inform decoding, known as decode instructions. The presence of decode instructions will be indicated in the frame header and if present, they will always be positioned after the frame header and before the frame data.

Frame Header

The header will be four or eight bytes in size and records the format and combined size of any decode instructions and the frame data. The size of the header is determined by the value of the first three bytes.

When the first three bytes of the header each have the value zero, the header is eight bytes in size. The fifth, sixth, seventh and eighth bytes are an unsigned integer stored in little-endian byte order. This is the combined size of any decode instructions and the frame data in bytes.

When any of the first three bytes of the header have a non-zero value, the header is four bytes in size. The first three bytes are an unsigned integer stored in little-endian byte order. This is the combined size of any decode instructions and the frame data in bytes.

The fourth byte of the header is an unsigned integer denoting the S3 and second-stage compression formats in which the data is stored, as well as indicating the presence of decode instructions. Its value and meaning will be one of the following:

| Hexadecimal Byte Value | S3 Format | Second-Stage Compressor |
| --- | --- | --- |
| 0xAB | RGB DXT1 | None |
| 0xBB | RGB DXT1 | Snappy |
| 0xCB | RGB DXT1 | Consult decode instructions |
| 0xAE | RGBA DXT5 | None |
| 0xBE | RGBA DXT5 | Snappy |
| 0xCE | RGBA DXT5 | Consult decode instructions |
| 0xAF | Scaled YCoCg DXT5 | None |
| 0xBF | Scaled YCoCg DXT5 | Snappy |
| 0xCF | Scaled YCoCg DXT5 | Consult decode instructions |
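
For illustration, a minimal C sketch of how a decoder might read this header (function and variable names are illustrative, not part of the spec):

```c
#include <stddef.h>
#include <stdint.h>

/* Read an n-byte (n <= 4) little-endian unsigned integer. */
static uint32_t read_le(const uint8_t *p, size_t n)
{
    uint32_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}

/*
 * Parse the four- or eight-byte frame header described above.
 * Returns the header size (4 or 8), or 0 if the buffer is too small.
 * On success, *data_size is the combined size of any decode
 * instructions and the frame data, and *format is the fourth byte
 * (e.g. 0xBB = RGB DXT1 + Snappy, 0xCB = consult decode instructions).
 */
static size_t parse_frame_header(const uint8_t *buf, size_t buf_size,
                                 uint32_t *data_size, uint8_t *format)
{
    if (buf_size < 4)
        return 0;
    *format = buf[3];
    uint32_t size24 = read_le(buf, 3);
    if (size24 != 0) {                 /* any of the first three bytes non-zero */
        *data_size = size24;           /* four-byte header */
        return 4;
    }
    if (buf_size < 8)                  /* eight-byte header */
        return 0;
    *data_size = read_le(buf + 4, 4);
    return 8;
}
```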

Decode Instructions

Decode instructions, when present, immediately follow the frame header; their presence is indicated by the value of the fourth byte of the header.

Decode instructions are stored in a sectioned layout, where each section starts with fields indicating that section's size and type.

A section may itself contain other sections, or data to inform decoding. Decoders encountering a section of an unknown type should attempt to continue decoding the frame if other sections provide adequate information to do so. Sections may occur in any order.

The first, second and third bytes of a section are an unsigned integer stored in little-endian byte order. They represent the size of that section in bytes, including the size and type fields.

The fourth byte is a code denoting the type of that section, and will be one of the following:

| Hexadecimal Byte Value | Meaning |
| --- | --- |
| 0x01 | Decode Instructions Container |
| 0x02 | Chunk Second-Stage Compressor Table |
| 0x03 | Chunk Size Table |

Decode Instructions Container

The Decode Instructions Container is the only permitted top-level section. Following the size and type fields it will contain any other required sections.

Chunk Second-Stage Compressor Table

The presence of this section indicates that frame data is split into chunks. Following the size and type fields comes a series of single-byte fields indicating the second-stage compressor for each chunk, with one of the following values:

| Hexadecimal Byte Value | Second-Stage Compressor |
| --- | --- |
| 0x0A | None |
| 0x0B | Snappy |

The number of chunks can be calculated from the size of this section discounting the size of the section size and type fields. If second-stage compression is indicated, each chunk is to be passed to the second-stage decompressor independently. This section, if present, must be accompanied by a Chunk Size Table.

Chunk Size Table

The presence of this section indicates that frame data is split into chunks. Following the size and type fields comes a series of four-byte fields, each an unsigned integer stored in little-endian byte order, indicating the byte size of each chunk. If second-stage compression is used, each chunk should be passed to the second-stage decompressor independently. This section, if present, must be accompanied by a Chunk Second-Stage Compressor Table.
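
As a sketch of the sectioned layout (assuming this draft, where the three-byte section size includes the size and type fields, and reusing read_le() from the header sketch above), a decoder might locate the two chunk tables like this, skipping sections of unknown type as the spec asks:

```c
/* Section type codes from the table above. */
enum {
    SECTION_DECODE_INSTRUCTIONS = 0x01,
    SECTION_CHUNK_COMPRESSORS   = 0x02,
    SECTION_CHUNK_SIZES         = 0x03
};

/*
 * Walk the sections inside a Decode Instructions Container and locate
 * the two chunk tables. `p` points just past the container's own size
 * and type fields; `len` is the number of bytes remaining in the
 * container. Returns 0 on success, -1 on malformed input.
 */
static int find_chunk_tables(const uint8_t *p, size_t len,
                             const uint8_t **compressors,
                             const uint8_t **chunk_sizes,
                             size_t *chunk_count)
{
    *compressors = NULL;
    *chunk_sizes = NULL;
    *chunk_count = 0;
    while (len >= 4) {
        uint32_t section_size = read_le(p, 3); /* includes these 4 bytes */
        uint8_t  section_type = p[3];
        if (section_size < 4 || section_size > len)
            return -1;
        if (section_type == SECTION_CHUNK_COMPRESSORS) {
            *compressors = p + 4;
            *chunk_count = section_size - 4;   /* one byte per chunk */
        } else if (section_type == SECTION_CHUNK_SIZES) {
            *chunk_sizes = p + 4;              /* four bytes per chunk */
        }
        /* any other type: skip over it and keep going */
        p += section_size;
        len -= section_size;
    }
    return (*compressors != NULL && *chunk_sizes != NULL) ? 0 : -1;
}
```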

Frame Data

The remainder of the frame is the frame data, starting immediately after the header and any decode instructions. The data is to be treated as indicated by the header and decode instructions. If a second-stage compressor is indicated then the frame data is to be decompressed accordingly. The result of that decompression will be data in the indicated S3 format. If no second-stage compressor is indicated, the frame data is in the indicated S3 format.
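
A sketch of per-chunk decoding under this draft, using the Snappy C API (snappy_uncompress from snappy-c.h); buffer names and error handling are illustrative. Because each chunk decompresses independently, the loop body could equally be farmed out to worker threads, which is the motivation for chunking:

```c
#include <string.h>
#include <snappy-c.h>

/*
 * Decode chunked frame data into `out`, which must be large enough for
 * the full S3-format texture. `compressors` and `chunk_sizes` are the
 * tables located by find_chunk_tables() above. Returns 0 on success.
 */
static int decode_chunks(const uint8_t *frame_data,
                         const uint8_t *compressors,
                         const uint8_t *chunk_sizes,
                         size_t chunk_count,
                         uint8_t *out, size_t out_capacity)
{
    size_t in_off = 0, out_off = 0;
    for (size_t i = 0; i < chunk_count; i++) {
        uint32_t chunk_size = read_le(chunk_sizes + 4 * i, 4);
        if (compressors[i] == 0x0B) {            /* Snappy */
            size_t out_len = out_capacity - out_off;
            if (snappy_uncompress((const char *)(frame_data + in_off),
                                  chunk_size,
                                  (char *)(out + out_off),
                                  &out_len) != SNAPPY_OK)
                return -1;
            out_off += out_len;
        } else if (compressors[i] == 0x0A) {     /* None: raw S3 data */
            if (out_off + chunk_size > out_capacity)
                return -1;
            memcpy(out + out_off, frame_data + in_off, chunk_size);
            out_off += chunk_size;
        } else {
            return -1;                           /* unknown compressor */
        }
        in_off += chunk_size;
    }
    return 0;
}
```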

@mbechard
Contributor Author

One thing we are losing with this spec, though, is the ability to have compressors that require more information than just compressed size - unless the new compressor specifies a new type of chunk table, which seems messy. I'm also personally not a fan of the 3+1 byte size+info way of storing headers. I find the complexity it adds (albeit small), and the low hard limit it places on maximum sizes, not worth it for the memory savings. Especially since this is already a very uncompressed video format, the memory savings are a tiny fraction of a packet's size.
But this is functional for my immediate needs, so if this is what you want then that's cool. Thanks

@bangnoise
Collaborator

The intention is that any additional information required can be added as additional tables - I'm not sure I get why that's any more or less messy than a single table with variable-length entries, but feel free to put me right.

For the decode instructions three bytes is surely ample for recording table sizes, but re complexity, perhaps you have a point. The original 3+1 byte header is an artefact of Hap's history.

If using four bytes for section-size fields, I'd probably use four bytes for section type fields as well, to keep section content on 32-bit boundaries. I think a byte is adequate to express any likely future number of entries in the Chunk Second-Stage Compressor table, and am inclined to leave its definition as above. Thoughts?

@bangnoise
Collaborator

A further thought - storing chunk start offsets as well as sizes would allow chunks to be repeated if a codec wanted to identify repeated areas to reduce bit-rate - might make a difference for animation-style frames. Could be optional - calculate offsets from the chunk sizes, starting at zero, if not present.
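
To make the idea concrete, a hypothetical example (all numbers invented) of how an optional offset table could encode a repeated chunk:

```c
/* Hypothetical tables for a four-chunk frame in which chunk 3 repeats
 * chunk 1's data:
 *
 *   chunk sizes:   4096  2048  4096  1024
 *   chunk offsets:    0  4096     0  6144
 *
 * Chunks 1 and 3 both read 4096 bytes at offset 0, so only 7168 bytes
 * of chunk data are stored instead of 11264. If the offset table is
 * absent, offsets are the running sum of the chunk sizes from zero. */
```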

@mbechard
Contributor Author

Hey, I just thought it would be good if the compressor type and the table format were totally orthogonal. All compressor types would use a common chunk table description. As it is right now, if I wanted to add another compressor with a decompressed size that wasn't determinable, I'd need to describe a different chunk table format.

Yes I like the idea of having everything as 32-bit ints, just for simplicity if nothing else.

Chunks + Offsets sounds good to me too.

@bangnoise
Collaborator

After some thought, I'm minded against 32-bit ints: a decoder is already required to correctly handle 24-bit ints to read the four-byte version of the basic header, so their use doesn't introduce any complexity not already present. They provide ample space to describe any likely byte count within the extended header, so the extra range of a 32-bit int seems redundant. As above, I do intend to use 32 bits to express chunk sizes.

I see where you're coming from re a single table (and the word "orthogonal" needs way more usage) - but chunk layout is meaningful prior to handing data off to the secondary decompressor(s). One motivation for splitting your single table into its columns was the ease of traversing fixed-width entries - for any chunk you can find its entry by address arithmetic rather than stepping through each prior entry.

I might drop the last entry in the Chunk Size Table as it is simply the remaining frame size. Seem reasonable?

Holler if any of the above still seems grossly misguided - I often am...

@mbechard
Contributor Author

Sorry, I'm out of the office this week but I'll get back onto this next week

@bangnoise
Collaborator

No problem

@bangnoise
Collaborator

If we do go with a 3 + 1 byte header for decode instruction section headers, for simplicity they should probably match the rules for the whole-frame header: size excludes the size of the header for each section, and they can be extended to 3 + 1 + 4 bytes in the same way.

@mbechard
Contributor Author

mbechard commented Aug 8, 2013

Hey, sorry for the delay in answering.
I don't agree that the 3 + 1 + 4 standard should be used beyond the initial size. It's only there for backwards compatibility and doesn't add any value to the new spec, while adding complexity. Just going straight 4-byte makes everyone's job easier in the future, and any new compressor never has to deal with the 3+1 encoding. I still think 32-bit size straight up is the way to go.

I see what you are saying about the tables/column and I agree now. Thinking of it as columns makes it more clean in my head.

Are you saying the Chunk Size table will have 1 entry less than the number of chunks, and the size of the last chunk will be inferred by the total size - consumed size?

@bangnoise
Collaborator

Further to conversation between @mbechard and myself, changes to the above should be:

  1. Add an offsets table which is optional
    • The spec should describe calculating offsets from the start of the data section if such a table is absent
  2. The chunk size table should not omit the last chunk size
  3. Section sizes should exclude the section header size to match the rules for the "simple" frame header (I'm not set on the word "section" to describe the different parts of the header if you can come up with something better)
  4. Section headers should match exactly the frame header: 3 bytes section size plus 1 byte type, with the option to zero the first three bytes and use the subsequent four for the size.

I'd suggest the spec should be arranged roughly as my version above, with the following (fairly major) edits:

  1. Before describing any specifics of the header, describe the sectioned frame layout in general terms roughly as in the four paragraphs starting "Decode instructions are stored in a sectioned layout" with the addition of the instructions for extension to 3+1+4 bytes from the "Frame Header" section
  2. Describe the existing formats as other sorts of section (or whatever terminology you settle on) - requiring no changes to their layout, obviously

If you can observe the painfully pedantic tone of the current spec I'd be grateful - eg specify the byte ordering every time you describe a multi-byte int. The hope is to limit the mistakes of anyone referring to a paragraph in isolation.

Updated as per discussions with Tom. Hope I got everything correct.
@bangnoise
Collaborator

Apologies for the delay, turns out GitHub doesn't notify new commits on pull requests.

I've merged this to a new branch (chunks) and made some edits which I hope you're happy with, none of which change the layout of frames - comments here or new pull request on that branch welcome if there's anything you'd like to change.

I will update the source code on the chunks branch then merge to master once I've done that.

Perhaps we could exchange sample movies to verify we are both interpreting the spec the same way.

@mbechard
Contributor Author

Sorry for the delay also, I was on vacation. This looks good to me. One comment I was wrestling with was this paragraph

“Decoders encountering a section of an unknown type should attempt to continue decoding the frame if other sections provide adequate information to do so. Some sections are only permitted inside other sections, but at any given hierarchical level sections may occur in any order.”

I don’t think there is any way for a decoder to know if there is adequate info to decode when it encounters an unknown section type. Already with the new section types we have added, attempting to blindly decompress will cause unknown issues, possibly a crash.


@bangnoise
Collaborator

This is basically a requirement to not give up if you have all the information you need to decode as well as some you can't make sense of.

A decoder which understands every entry in the secondary compressor table and has a chunk size table knows it has adequate information to proceed, even if other unrecognised section types are present.

This permits a decoder written to this spec to decode frames after the addition of future section types, which may be required by future secondary compressors and which may be present even if those compressors haven't been used in a given frame.

The spec effectively limits the addition of new section types to be permitted only within the decode instructions container.

@bangnoise merged commit 77828f8 into Vidvox:master on May 5, 2014
@ncruces

ncruces commented Jul 25, 2014

Hi,

Now that the decode side of this has been merged, do you guys have any reference file that you've used to test this?

Thanks!

@mbechard
Contributor Author

All files created with the latest experimental version of TouchDesigner (using the Movie Out TOP) will make use of it.
http://www.derivative.ca/088/Downloads/Experimental.asp

@bangnoise
Collaborator

@ncruces

ncruces commented Jul 30, 2014

Thanks. This helped me validate my implementation.

At this resolution the added complexity doesn't really pay off, with 0.6 ms decode time for the simpler serial implementation vs. 0.4 ms for the parallel implementation on a quad-core, but I imagine with larger resolutions it will make more sense.

@bangnoise
Collaborator

Indeed
