From 4ed601346e0aaa794cfdfbdf12923c179ccdad62 Mon Sep 17 00:00:00 2001
From: ccl-core
Date: Mon, 14 Jul 2025 10:27:40 +0000
Subject: [PATCH 1/2] Update specs on AudioObjects

---
 docs/croissant-spec-draft.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/docs/croissant-spec-draft.md b/docs/croissant-spec-draft.md
index 1e18661bc..246967497 100644
--- a/docs/croissant-spec-draft.md
+++ b/docs/croissant-spec-draft.md
@@ -1789,6 +1789,25 @@ Bounding boxes are common annotations in computer vision. They describe imaginar
 }
 ```
 
+
+### AudioObject
+
+Croissant uses Schema.org [AudioObject](https://schema.org/AudioObject) to represent audio features, where each value is a segment of audio stored as a digital sound recording. For audio, Croissant also provides the `cr:samplingRate` attribute, which can be specified in the audio field's `source`:
+
+```json
+{
+  "@type": "cr:Field",
+  "@id": "recordset/audio",
+  "dataType": "sc:AudioObject",
+  "source": {
+    "fileSet": { "@id": "files" },
+    "extract": { "fileProperty": "content" },
+    "samplingRate": "16000"
+  }
+}
+```
+
+
 ### SegmentationMask
 
 Segmentation masks are common annotations in computer vision. They describe pixel-perfect zones that outline objects or groups of objects in images or videos. Croissant defines `cr:SegmentationMask` with two manners to describe them:

From 2885dfc12f00debc667cebc1a9c2cc1cdcd542af Mon Sep 17 00:00:00 2001
From: ccl-core
Date: Mon, 14 Jul 2025 10:42:22 +0000
Subject: [PATCH 2/2] Update data type

---
 docs/croissant-spec-draft.md | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/docs/croissant-spec-draft.md b/docs/croissant-spec-draft.md
index 246967497..4dab66ad5 100644
--- a/docs/croissant-spec-draft.md
+++ b/docs/croissant-spec-draft.md
@@ -1201,14 +1201,58 @@ Commonly used atomic data types:
 sc:Float
 Describes a float.
+cr:Float16
+Describes a half-precision (16-bit) floating-point number.
+cr:Float32
+Describes a single-precision (32-bit) floating-point number.
+cr:Float64
+Describes a double-precision (64-bit) floating-point number.
 sc:Integer
 Describes an integer.
+cr:Int8
+Describes an 8-bit signed integer.
+cr:Int16
+Describes a 16-bit signed integer.
+cr:Int32
+Describes a 32-bit signed integer.
+cr:Int64
+Describes a 64-bit signed integer.
 sc:Text
 Describes a string.
+cr:UInt8
+Describes an 8-bit unsigned integer.
+cr:UInt16
+Describes a 16-bit unsigned integer.
+cr:UInt32
+Describes a 32-bit unsigned integer.
+cr:UInt64
+Describes a 64-bit unsigned integer.
 Other data types commonly used in ML datasets:
@@ -1219,13 +1263,17 @@ Other data types commonly used in ML datasets:
 Usage
-sc:ImageObject
-Describes a field containing the content of an image (pixels).
+sc:AudioObject
+Describes a field containing a segment of audio stored as a digital sound recording. Refer to the section "ML-specific features > AudioObject".
 cr:BoundingBox
 Describes the coordinates of a bounding box (4-number array). Refer to the section "ML-specific features > Bounding boxes".
+sc:ImageObject
+Describes a field containing the content of an image (pixels).
 cr:Split
 Describes a RecordSet used to divide data into multiple sets according to intended usage with regards to models. Refer to the section "ML-specific features > Splits".
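To illustrate how a consumer might use `cr:samplingRate`, the Python sketch below takes the audio field from the first patch as a plain dict and checks a WAV recording against the declared rate using the standard-library `wave` module. The helpers `declared_rate`, `check_sampling_rate`, and `make_silent_wav` are hypothetical, not part of mlcroissant or any Croissant API. The sketch assumes WAV input and accepts the rate serialized as a string (`"16000"`), as in the spec example.

```python
import io
import wave
from typing import Optional

# The audio field from the spec example, as a plain Python dict.
AUDIO_FIELD = {
    "@type": "cr:Field",
    "@id": "recordset/audio",
    "dataType": "sc:AudioObject",
    "source": {
        "fileSet": {"@id": "files"},
        "extract": {"fileProperty": "content"},
        "samplingRate": "16000",
    },
}


def declared_rate(field: dict) -> Optional[int]:
    """Hypothetical helper: the declared sampling rate in Hz, or None if absent."""
    rate = field.get("source", {}).get("samplingRate")
    return int(rate) if rate is not None else None


def check_sampling_rate(field: dict, wav_bytes: bytes) -> bool:
    """Hypothetical check: True if no rate is declared or the WAV matches it."""
    expected = declared_rate(field)
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        actual = wav.getframerate()
    return expected is None or expected == actual


def make_silent_wav(rate: int, seconds: float = 0.1) -> bytes:
    """Build a mono, 16-bit, silent WAV file in memory for demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    return buf.getvalue()


if __name__ == "__main__":
    print(check_sampling_rate(AUDIO_FIELD, make_silent_wav(16000)))  # True
    print(check_sampling_rate(AUDIO_FIELD, make_silent_wav(44100)))  # False
```

Rejecting a mismatch is only one possible policy; a real loader might instead resample the audio to the declared rate.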
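The fixed-width types added in the second patch correspond to standard machine representations. As a rough illustration (not part of the spec or of any Croissant library), the Python sketch below maps each new `cr:` type to a format code of the standard-library `struct` module, assuming IEEE 754 floats and two's-complement integers, and derives each type's size in bytes and, for integers, its value range. `CROISSANT_TO_STRUCT`, `byte_size`, and `int_range` are hypothetical names.

```python
import struct

# Hypothetical mapping from the fixed-width Croissant data types to `struct`
# format codes. The "<" prefix selects standard sizes (little-endian, no
# padding), so the sizes do not depend on the platform.
CROISSANT_TO_STRUCT = {
    "cr:Float16": "<e",  # IEEE 754 half precision
    "cr:Float32": "<f",  # IEEE 754 single precision
    "cr:Float64": "<d",  # IEEE 754 double precision
    "cr:Int8": "<b",
    "cr:Int16": "<h",
    "cr:Int32": "<i",
    "cr:Int64": "<q",
    "cr:UInt8": "<B",
    "cr:UInt16": "<H",
    "cr:UInt32": "<I",
    "cr:UInt64": "<Q",
}


def byte_size(data_type: str) -> int:
    """Number of bytes one value of `data_type` occupies."""
    return struct.calcsize(CROISSANT_TO_STRUCT[data_type])


def int_range(data_type: str) -> tuple:
    """Inclusive (min, max) for the integer types, assuming two's complement."""
    bits = 8 * byte_size(data_type)
    if data_type.startswith("cr:UInt"):
        return 0, 2**bits - 1
    if data_type.startswith("cr:Int"):
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    raise ValueError(f"{data_type} is not an integer type")


if __name__ == "__main__":
    for name in CROISSANT_TO_STRUCT:
        print(name, byte_size(name), "bytes")
    print(int_range("cr:Int16"))  # (-32768, 32767)
    print(int_range("cr:UInt8"))  # (0, 255)
    # Packing a value outside the type's range raises struct.error.
    try:
        struct.pack(CROISSANT_TO_STRUCT["cr:UInt8"], 256)
    except struct.error as exc:
        print("rejected:", exc)
```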