feat: add ClipSet to schema #90
base: main
Conversation
thought (non-blocking): The `from-bodyxml` transformer currently supports transforming `Text` nodes as children of `Body`, you can try with 3e535c8f-a0db-58ba-b797-3933bc45187c as an example.
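For reference, a minimal sketch (in TypeScript) of the kind of tree that implies; the field names here are assumed from the discussion rather than copied from the transformer output:

```ts
// Assumed shape for illustration only: a transcript body whose direct
// children include a bare text node, which the from-bodyxml transformer
// is said to support.
export const transcriptBody = {
  type: "body",
  version: 1,
  children: [
    {
      type: "text",
      value: "Are you getting sacked for telling the truth, home secretary?",
    },
  ],
} as const;
```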
### `ClipSet`

```ts
interface ClipSet extends Node {
```
question (non-blocking): How feasible would it be to support `alternativeText` and `alternativeImage` now or in a future iteration? I'm wondering if we could use the poster image as an alternative.
So based on what's in Spark Clips:
- `description` is the equivalent of `alternativeText` - "Describe this clip (for those who cannot see it)"
- `poster` should be usable as an `alternativeImage`. I'm now wondering why it's a `string` and not `ImageSet` (as it is in the CAPI response) 🤔
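For concreteness, a rough sketch of how those fields could hang together; apart from `description` and `poster`, the names here are assumptions for illustration, not the agreed schema:

```ts
// Sketch only - names other than `description` and `poster` are assumptions.
export interface ImageSetRef {
  type: "image-set";
  id: string;
}

export interface ClipSetSketch {
  type: "clip-set";
  id: string;
  // "Describe this clip (for those who cannot see it)" - candidate alternativeText
  description?: string;
  // today a string URL; the CAPI response models this as an ImageSet
  poster?: string | ImageSetRef;
}
```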
> How feasible would it be to support alternativeText and alternativeImage now or in a future iteration?

Is this for distributable reasons? If the answer is yes and the objective is to produce a valid HTML tag that is renderable in every context, then with Clipset data it's possible to create a basic HTML `<video>` tag that is playable by every browser.
The following is a simplified example extracted from this article. It could be simplified even further by using just one source as `src` instead of multiple sources:
```html
<video poster="<POSTER_URL>">
<source id="video-source-0-daecfa57-a12a-468b-8045-ad32cfa79b3b"
src="https://spark-clips-prod.s3.eu-west-1.amazonaws.com/optimised-media-files/16984229396750/640x360.mp4"
type="video/mp4">
<source id="video-source-1-daecfa57-a12a-468b-8045-ad32cfa79b3b"
src="https://spark-clips-prod.s3.eu-west-1.amazonaws.com/optimised-media-files/16984229396750/1280x720.mp4"
type="video/mp4">
<source id="video-source-2-daecfa57-a12a-468b-8045-ad32cfa79b3b"
src="https://spark-clips-prod.s3.eu-west-1.amazonaws.com/optimised-media-files/16984229396750/1920x1080.mp4"
type="video/mp4">
<source id="video-source-3-daecfa57-a12a-468b-8045-ad32cfa79b3b"
src="https://spark-clips-prod.s3.eu-west-1.amazonaws.com/optimised-media-files/16984229396750/0x0.mp3"
type="audio/mpeg">
<track label="English" kind="captions" srclang="en" src="https://next-media-api.ft.com/clips/captions/32065539">
</video>
```
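If it helps, a hedged sketch of how such a tag could be assembled from clipset-like data; the `ClipsetLike` shape below is assumed for illustration and is not the actual model:

```ts
// Assumed input shape - not the real Clipset model.
type ClipSource = {
  url: string;       // e.g. .../640x360.mp4
  mediaType: string; // e.g. "video/mp4" or "audio/mpeg"
};

type ClipsetLike = {
  posterUrl: string;
  sources: ClipSource[];
  captionsUrl?: string;
};

// Build a minimal <video> tag string from the clipset data,
// mirroring the simplified example above.
export function toVideoTag({ posterUrl, sources, captionsUrl }: ClipsetLike): string {
  const sourceTags = sources
    .map((s) => `<source src="${s.url}" type="${s.mediaType}">`)
    .join("\n  ");
  const trackTag = captionsUrl
    ? `\n  <track label="English" kind="captions" srclang="en" src="${captionsUrl}">`
    : "";
  return `<video poster="${posterUrl}">\n  ${sourceTags}${trackTag}\n</video>`;
}
```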
@epavlova I suspect you may want to use XML for the bodyXML field. In that case you should be able to easily map the data from the Clipset model into an XML format, something like the following (not one of our clips):

```xml
<video id="abc123" xmlns="https://example.com/video/1.0">
<title>Building a Birdhouse</title>
<description>Step-by-step guide.</description>
<language>en</language> <!-- BCP 47 -->
<published>2025-08-01T10:30:00Z</published> <!-- RFC 3339 -->
<duration>PT4M12S</duration> <!-- ISO 8601 -->
<people>
<creator role="host">Pat Lee</creator>
<contributor role="editor">R. Singh</contributor>
</people>
<content>
<container>mp4</container>
<videoCodec>h264</videoCodec>
<audioCodec>aac</audioCodec>
<width>1920</width>
<height>1080</height>
<frameRate>29.97</frameRate>
<bitrate unit="bps">3500000</bitrate>
<aspectRatio>16:9</aspectRatio>
</content>
<files>
<file role="main" bytes="184563210" checksum="sha256:...">
<url>https://cdn.example.com/v/abc123/master.mp4</url>
</file>
<file role="1080p" bitrate="3500000">
<url>https://cdn.example.com/v/abc123/1080p.mp4</url>
</file>
<file role="720p" bitrate="1800000">
<url>https://cdn.example.com/v/abc123/720p.mp4</url>
</file>
</files>
<tracks>
<captions lang="en" kind="subtitles" format="vtt">
<url>https://cdn.example.com/v/abc123/en.vtt</url>
</captions>
<audio lang="en" channels="2"/>
</tracks>
<chapters>
<chapter start="PT0S" title="Intro"/>
<chapter start="PT1M10S" title="Tools"/>
<chapter start="PT2M45S" title="Assembly"/>
</chapters>
<thumbnails>
<image width="1280" height="720">https://cdn.example.com/v/abc123/cover.jpg</image>
<sprite columns="10" rows="10">https://cdn.example.com/v/abc123/sprite.jpg</sprite>
</thumbnails>
<rights>
<license>CC-BY-4.0</license>
<drm scheme="fairplay" keyId="..."/>
</rights>
<tags>
<tag>DIY</tag><tag>woodwork</tag>
</tags>
</video>
```
Some background on Transcript. In the `transcript` field from CAPI, Spark stores an HTML fragment (Clipset id …):

```html
<p>Are you getting sacked for telling the truth, home secretary? </p>
<p>[ALARM BLARING IN DISTANCE] </p>
<p></p>
<p>Thank you. </p>
<p>Morning. </p>
<p>Are you going to be the next foreign secretary? </p>
<p>The new foreign secretary, David Cameron? </p>
```

I believe, but Amir or Ash may know better, that this was because it was the payload received by the tool that automatically generated the transcript, so it was easiest for Spark to store the payload as it was. To simplify the rendering and do everything server-side, we added the following steps:
This seems like a convoluted solution and tech debt we should repay by moving Transcript away from RichTextSource, which doesn't seem meant for this purpose. I see some possible options:
There are a few changes to the current ClipSet workaround in cp-content-pipeline:
- dataLayout -> layoutWidth (for consistency with the other nodes)
- changes to how the Body for transcripts is represented (pending further discussion with CP)

This commit also updates the from-bodyxml transformer for clipset.

(force-pushed from 47e08af to 8d7b087)
This is a draft PR for what I think the Clips Schema should be (based on earlier work #86 and what is currently in Workarounds). While the `externals` are still subject to change when we start working on cp-content-pipeline, I think the `transit-tree` properties are in a good place and can be merged to help with the publishing/migration aspects.

There are a few considerations / changes to how it currently works:

1. `dataLayout` to `layoutWidth` for consistency with other nodes (this will require a change in cp-content-pipeline / spark / possibly next-home-page / possibly ft-app).
   a. ClipSet only allows a limited number of layouts, I assume that is intentional?
2. cp-content-pipeline currently represents the transcript as the GraphQL `RichText` type (`raw`, `structured`, `references`). I'm not really sure how we model that in content-tree, or if we should. If we need content-tree to be different to cp-content-pipeline (i.e. maintain a workaround), that would also mean the UI component itself isn't really transferable.
   a. DECISION IRL - we should not replicate the graphql structure in content-tree, but instead make cp-content-pipeline work with this somehow. Some ideas below, but still a bit hazy.
3. `transcript` currently is a `<body>` with only text. Our current `Body` node in content-tree doesn't allow Text as a top-level node - should it? Or should it be a separate thing?
   a. note - live events also do this, but I've been ignoring them for now!
   b. DECISION IRL - Add Text to the allowed bodyNodes (will do this separately; rough sketch after this list)
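A rough sketch of what decision 3b could look like; the type names are illustrative and not the actual content-tree source:

```ts
// Illustrative only - not the real content-tree definitions.
export interface Text {
  type: "text";
  value: string;
}

export interface Paragraph {
  type: "paragraph";
  children: Text[]; // simplified phrasing content
}

// Before: Text is not a permitted top-level body node.
// After the decision, it is added to the allowed bodyNodes:
export type BodyBlock = Paragraph | Text;

export interface Body {
  type: "body";
  version: number;
  children: BodyBlock[];
}
```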
Appendix - Nested Bodies
As mentioned, there is a challenge around the mismatch between how cp-content-pipeline represents nested bodies (the GraphQL `RichText` type) and how it is in content tree (`Body` as an attribute). I have a couple of ideas around how we can make cp-content-pipeline work with the Clip Transcript, but someone that knows it better than me might have more thoughts! (A rough sketch of the two shapes follows the options below.)

1. Return a `body` instead of the `RichText` type:
   - ✅ `bodyTree` in cp-content-pipeline-api more accurately reflects content-tree
   - ⚠️ would be a breaking api change
   - ❌ I think it works okay for this case, because a transcript is simple, but I don't know if it works as well for nested bodies that might have `references` (e.g. if we had the same logic for a CCC fallback that includes images). Maybe there's something around merging the references with the top level ones?? but sounds complicated
2. `cp-content-pipeline-ui` has a Clip workaround that expects a different format:
   - ✅ not a breaking change maybe?
   - ✅ still have a shared `Clip` component that can be used in e.g. Spark Preview that expects content-tree format
   - 🤔 how does it work with references?
   - 🤔 not sure if it's logically the right place??
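To make the mismatch concrete, here is a hedged sketch of the two shapes being compared. The GraphQL field names (`raw`, `structured`, `references`) come from the discussion above; everything else is assumed for illustration:

```ts
// Assumed shapes for illustration only.

// cp-content-pipeline-api today: the transcript exposed via a RichText-like type.
export type RichTextLike = {
  raw: string;           // original markup
  structured: unknown;   // api-specific structured representation
  references: unknown[]; // referenced content (images, links, ...)
};

// content-tree in this PR: the transcript is a Body carried as an attribute of the node.
export type TranscriptBody = {
  type: "body";
  version: number;
  children: Array<{ type: "text"; value: string }>;
};

export type ClipSetWithTranscript = {
  type: "clip-set";
  id: string;
  transcript?: TranscriptBody;
};
```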
Testing Notes
I have tested the transformer with an article that contains a clip that is currently failing: