Identify proprietary DRM schemes #68

Open
HadrienGardeur opened this issue Aug 12, 2020 · 11 comments

Comments

@HadrienGardeur
Collaborator

While we're already capable of identifying LCP using the EPUB profile and its scheme element, there are other DRM schemes out there which we're currently not capable of identifying.

A user could import an EPUB file that's protected by other schemes such as:

  • Adobe ACS
  • Kobo DRM
  • Google DRM
  • Apple DRM

IMO we should be able to identify all these schemes in RWPM since they could prohibit users from accessing their files. Without this info, Readium Mobile and Desktop could return an unknown error (or worse) which is never ideal.

For each of these DRM schemes we should document:

  • how to identify them (URI)
  • and how to detect them (this probably belongs more in architecture than here)

This information should be added to an appendix of the EPUB Profile.
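
As a rough illustration of what such an appendix could drive, the sketch below guesses a scheme from marker files found in the EPUB container. Everything in it is an assumption made for illustration: the `guess_scheme` helper, the marker table, the Adobe marker file (`META-INF/rights.xml`), and the ACS URI, which is only the placeholder value that appears later in this thread and not an official identifier.

```python
from typing import Iterable, Optional

LCP_SCHEME = "http://readium.org/2014/01/lcp"
# Placeholder taken from an example later in this thread, not an official URI.
ACS_SCHEME = "http://adobe.com/drm/acs"

# Hypothetical table: container entry -> scheme URI.
# The rights.xml entry for Adobe ACS is an assumption.
MARKERS = {
    "META-INF/license.lcpl": LCP_SCHEME,
    "META-INF/rights.xml": ACS_SCHEME,
}


def guess_scheme(entries: Iterable[str]) -> Optional[str]:
    """Return the scheme URI of the first known marker found, else None."""
    names = set(entries)
    for marker, scheme in MARKERS.items():
        if marker in names:
            return scheme
    return None
```

A real detector would probably need to look inside `encryption.xml` as well, since a marker file alone may not be conclusive.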

Any thoughts on this @llemeurfr @danielweck @qnga @mickael-menu ?

@mickael-menu
Member

This sounds like a better user experience for sure. It would also solve our dilemma about when to resolve the scheme between the EPUB Parser and Content Protection, if we consider that it's the responsibility of the EPUB parser to detect as many DRMs as possible.

@qnga
Contributor

qnga commented Aug 12, 2020

I think attempting to support everything is always challenging and each DRM seems to be very specific. To implement ACS support I had to write a custom encryption.xml parser to retrieve non-standard data required by the ACS Connector. Do we want the EPUB parser and the Encryption model to take care of this? I don't think so.

On the contrary, I was going to suggest removing the scheme and profile properties because they are LCP-specific, not link-specific (as far as I know) and, as Mickaël said, we didn't know when and how to fill them in. Or at least, we might not use them in non-native RWPM.

Actually, there is another way to return helpful errors to users. Readium allows only one ContentProtection per publication, so that single instance could be responsible for checking that no resource is encrypted with an unknown algorithm. If one is, Resource.read would return a Forbidden error carrying the algorithm URI. When no ContentProtection is built during parsing, we could use a fallback one that checks that no resource is encrypted and, if one is, attempts to guess which DRM is used.
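
A minimal sketch of that fallback idea, not the actual Readium API: `FallbackContentProtection`, `ForbiddenError` and `Link` are hypothetical names, and treating the two font obfuscation algorithms as always readable is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Algorithms assumed to need no DRM (font obfuscation only).
OBFUSCATION_ALGORITHMS = {
    "http://www.idpf.org/2008/embedding",  # IDPF font obfuscation
    "http://ns.adobe.com/pdf/enc#RC",      # Adobe font obfuscation
}


class ForbiddenError(Exception):
    """Hypothetical error carrying the URI of the unsupported algorithm."""

    def __init__(self, algorithm: str):
        super().__init__(f"Unsupported encryption algorithm: {algorithm}")
        self.algorithm = algorithm


@dataclass
class Link:
    href: str
    algorithm: Optional[str] = None  # None means the resource is not encrypted


class FallbackContentProtection:
    """Hypothetical stand-in used when parsing built no real ContentProtection."""

    def read(self, link: Link, data: bytes) -> bytes:
        # Refuse anything encrypted with an algorithm we cannot handle.
        if link.algorithm is not None and link.algorithm not in OBFUSCATION_ALGORITHMS:
            raise ForbiddenError(link.algorithm)
        # Deobfuscation of fonts is assumed to happen elsewhere.
        return data
```

The caller can then turn `ForbiddenError.algorithm` into a message for the user instead of an unknown error.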

I may have drifted away from the initial topic because I cannot completely figure out what the issue is.

@llemeurfr
Contributor

I've been convinced from the start that having scheme and profile at the level of each encrypted resource is not useful, and that this information should be held in the global manifest metadata.

If we go the route of having a "drm" / "protectedBy" property with values like "lcp", "acs", ..., the value currently in "scheme" would go there, along with a sub-property "profile" for dealing with the LCP case today, and maybe other types of DRM tomorrow.

"drm": { "scheme":"http://readium.org/2014/01/lcp", "profile":"http://readium.org/lcp/basic-profile"}
"drm": { "scheme":"http://adobe.com/drm/acs"}

Now, apart from LCP and ACS, we won't find other types of DRM-protected ebooks in the wild, will we? Ebooks protected by Kobo, Apple, Google or Amazon are not easily exported from their walled gardens.

@HadrienGardeur
Collaborator Author

> Do we want the EPUB parser and the Encryption model to take care of this?

We have no plans to support these DRMs on the decryption side, but IMO there's a real use case in being able to identify them and provide better error handling.

> On the contrary, I was going to suggest removing the scheme and profile properties because they are LCP-specific, not link-specific (as far as I know) and, as Mickaël said, we didn't know when and how to fill them in. Or at least, we might not use them in non-native RWPM.

That's incorrect: scheme identifies a DRM and is therefore not LCP-specific. The same goes for profiles, which are widely used across many different serialization formats and media types.

> I've been convinced from the start that having scheme and profile at the level of each encrypted resource is not useful, and that this information should be held in the global manifest metadata.

That's not the same level of information. Your average LCP-protected file will have:

  • fonts obfuscated with the IDPF or Adobe algorithms
  • LCP protected resources (most of the content)
  • non-protected resources (cover, table of contents)

We need to have that information at a resource level, which is consistent with how this information is expressed in EPUB as well (albeit poorly expressed in the case of EPUB).

While I'm certainly in favor of a helper method at a publication level that will process resource-level information and return a value for the publication, this is quite different from adopting a lossy information model like the one you're suggesting.
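
Such a helper could look like the sketch below. It assumes each link carries its encryption info under `properties.encrypted` with `algorithm`, `scheme` and `profile` keys; `publication_schemes` is a hypothetical name, and the sample links are made up for illustration.

```python
from typing import Iterable, Set


def publication_schemes(links: Iterable[dict]) -> Set[str]:
    """Collect every scheme URI declared by the resources of a publication."""
    schemes = set()
    for link in links:
        encrypted = link.get("properties", {}).get("encrypted")
        # Obfuscated fonts have an algorithm but no scheme, so they are skipped.
        if encrypted and "scheme" in encrypted:
            schemes.add(encrypted["scheme"])
    return schemes


# Made-up links mirroring the three kinds of resources listed above.
links = [
    {"href": "font.otf", "properties": {"encrypted": {
        "algorithm": "http://www.idpf.org/2008/embedding"}}},
    {"href": "chapter1.xhtml", "properties": {"encrypted": {
        "algorithm": "http://www.w3.org/2001/04/xmlenc#aes256-cbc",
        "scheme": "http://readium.org/2014/01/lcp",
        "profile": "http://readium.org/lcp/basic-profile"}}},
    {"href": "cover.jpg"},
]
print(publication_schemes(links))
```

The resource-level data stays intact; the publication-level value is only derived from it.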

@mickael-menu
Member

> To implement ACS support I had to write a custom encryption.xml parser to retrieve non-standard data required by the ACS Connector. Do we want the EPUB parser and the Encryption model to take care of this? I don't think so.

I don't think the proposal is meant to replace any DRM-specific custom parsing (with LCP, for example, we still need to parse the LCPL). It's more about indicating which DRM is used, for the well-known ones.

@HadrienGardeur
Collaborator Author

> I don't think the proposal is meant to replace any DRM-specific custom parsing (with LCP, for example, we still need to parse the LCPL). It's more about indicating which DRM is used, for the well-known ones.

Correct. I see this as a DRM counterpart to the work that has been done to better identify formats in our SDK.

@mickael-menu
Member

Although I think this is doable for the scheme, I'm still not sure that the EPUBParser should fill in the profile. With LCP, for example, that means parsing the license.lcpl.

@qnga
Contributor

qnga commented Aug 12, 2020

> We need to have that information at a resource level, which is consistent with how this information is expressed in EPUB as well (albeit poorly expressed in the case of EPUB).

You're talking about the algorithm property. The fact remains that scheme and profile are not link-specific. We can imagine that encryption.algorithm would say whether the link is encrypted or obfuscated, and a more global piece of information would describe the encryption scheme.

If scheme and profile are widely used, I think making the EPUB parser responsible for filling them in would go against the extensibility provided by the ContentProtection API.

I'd like to see a more concrete specification of the better error handling that DRM information inside RWPM would enable. It seems to me that the ContentProtection API already fits the need.

@llemeurfr
Contributor

> (obfuscated, encrypted, non-encrypted) We need to have that information at a resource level

Isn't the algorithm property giving this information?

> and a more global piece of information would describe the encryption scheme.

Just a clarification: it is not an encryption scheme; the algorithm says everything about the encryption of the resource. It is a DRM scheme, i.e. the way to find the decryption key.

@HadrienGardeur
Collaborator Author

> Isn't the algorithm property giving this information?

It only provides partial information and answers two specific questions:

  • is it encrypted?
  • if so, which algorithm is used?

You need to know the scheme and the profile as well to have the whole picture and answer the final question:

  • how can I obtain the key necessary to decrypt this resource?

> Just a clarification: it is not an encryption scheme; the algorithm says everything about the encryption of the resource. It is a DRM scheme, i.e. the way to find the decryption key.

In most cases that's correct. But one could imagine use cases that are not DRM-related.
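
To make the three questions concrete, here is a small sketch that answers them for a single resource. It assumes the resource's encryption info is a dict with optional `algorithm`, `scheme` and `profile` keys; `describe_protection` and the returned field names are hypothetical.

```python
from typing import Optional


def describe_protection(encrypted: Optional[dict]) -> dict:
    """Answer the three questions for one resource's encryption info."""
    if not encrypted:
        return {"is_encrypted": False, "algorithm": None, "key_source": None}
    scheme = encrypted.get("scheme")
    return {
        "is_encrypted": True,                   # is it encrypted?
        "algorithm": encrypted.get("algorithm"),  # which algorithm is used?
        # How to obtain the key: only scheme (and profile) can tell.
        "key_source": (
            {"scheme": scheme, "profile": encrypted.get("profile")}
            if scheme else None
        ),
    }
```

With only `algorithm` present, `key_source` stays `None`, which is the gap that scheme and profile are meant to fill.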

For example a stronger approach to protect fonts, without going full DRM on it.

@qnga
Contributor

qnga commented Aug 12, 2020

Let me remind everyone of the existence of the ContentProtectionService, designed to be exposed both as a native API and served over HTTP.

4 participants