16 changes: 8 additions & 8 deletions CRAMv2.1.tex
@@ -1668,11 +1668,11 @@ \subsection{\textbf{Choosing the container size}}

$\bullet$ Applications typically buffer containers into memory

-We recommend 1MB containers. They are small enough to provide good random access
+We recommend 1MiB containers. They are small enough to provide good random access
and streaming performance while being large enough to provide good compression.
-1MB containers are also small enough to fit into the L2 cache of most modern CPUs.
+1MiB containers are also small enough to fit into the L2 cache of most modern CPUs.

-Some simplified examples are provided below to fit data into 1MB containers.
+Some simplified examples are provided below to fit data into 1MiB containers.

\textbf{Unmapped short reads with bases, read names, recalibrated and original
quality scores}
@@ -1681,27 +1681,27 @@ \subsection{\textbf{Choosing the container size}}
quality scores. We estimate 0.4 bits/base (read names) + 0.4 bits/base (bases)
+ 3 bits/base (recalibrated quality scores) + 3 bits/base (original quality scores)
=\textasciitilde{} 7 bits/base. Space estimate is (10,000 * 100 * 7) / 8 / 1024
-/ 1024 =\textasciitilde{} 0.9 MB. Data could be stored in a single container.
+/ 1024 =\textasciitilde{} 0.9 MiB. Data could be stored in a single container.

\textbf{Unmapped long reads with bases, read names and quality scores}

We have 10,000 unmapped long reads (10kb) with read names and quality scores. We
estimate: 0.4 bits/base (bases) + 3 bits/base (original quality scores) =\textasciitilde{}
3.5 bits/base. Space estimate is (10,000 * 10,000 * 3.5) / 8 / 1024 / 1024 =\textasciitilde{}
-42 MB. Data could be stored in 42 x 1MB containers.
+42 MiB. Data could be stored in 42 x 1MiB containers.

\textbf{Mapped short reads with bases, pairing and mapping information}

We have 250,000 mapped short reads (100bp) with bases, pairing and mapping information.
We estimate the compression to be 0.2 bits/base. Space estimate is (250,000 * 100
-* 0.2) / 8 / 1024 / 1024 =\textasciitilde{} 0.6 MB. Data could be stored in a single
+* 0.2) / 8 / 1024 / 1024 =\textasciitilde{} 0.6 MiB. Data could be stored in a single
container.

\textbf{Embedded reference sequences}

We have a reference sequence (10Mb). We estimate the compression to be 2 bits/base.
-Space estimate is (10000000 * 2 / 8 / 1024 / 1024) =\textasciitilde{} 2.4MB. Data
-could be written into three containers: 1MB + 1MB + 0.4MB.
+Space estimate is (10000000 * 2 / 8 / 1024 / 1024) =\textasciitilde{} 2.4MiB. Data
+could be written into three containers: 1MiB + 1MiB + 0.4MiB.

\newpage
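
The four estimates in this hunk all use the same formula: reads × bases per read × bits per base, divided by 8, 1024 and 1024. The following is a minimal sketch for checking them, not part of the specification; the function name `estimate_mib` and the dictionary keys are hypothetical. The first example evaluates to roughly 0.83 MiB, which the text gives as about 0.9 MiB; the other three agree with the stated figures after rounding.

```python
MIB = 1024 * 1024  # bytes per mebibyte, the unit this change switches to


def estimate_mib(reads: int, bases_per_read: int, bits_per_base: float) -> float:
    """Estimated compressed size in MiB: total bits / 8 / 1024 / 1024."""
    return reads * bases_per_read * bits_per_base / 8 / MIB


# Inputs are taken from the four worked examples in the text.
examples = {
    "unmapped short reads": estimate_mib(10_000, 100, 7),
    "unmapped long reads": estimate_mib(10_000, 10_000, 3.5),
    "mapped short reads": estimate_mib(250_000, 100, 0.2),
    # The reference is treated as a single 10 Mb sequence.
    "embedded reference": estimate_mib(1, 10_000_000, 2),
}

for name, size in examples.items():
    print(f"{name}: {size:.2f} MiB")
```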

16 changes: 8 additions & 8 deletions CRAMv3.tex
@@ -26,7 +26,7 @@
\renewcommand{\footrulewidth}{0pt}

\newcommand\bits{\,\mbox{bits}}
-\newcommand\MB{\,\mbox{MB}}
+\newcommand\MiB{\,\mbox{MiB}}

\setlength{\parindent}{0cm}
\setlength{\parskip}{0.18cm}
@@ -2573,9 +2573,9 @@ \subsection{\textbf{Choosing the container size}}

We recommend 1 megabyte containers. They are small enough to provide good random access
and streaming performance while being large enough to provide good compression.
-1\MB\ containers are also small enough to fit into the L2 cache of most modern CPUs.
+1\MiB\ containers are also small enough to fit into the L2 cache of most modern CPUs.

-Some simplified examples are provided below to fit data into 1\MB\ containers.
+Some simplified examples are provided below to fit data into 1\MiB\ containers.

\textbf{Unmapped short reads with bases, read names, recalibrated and original
quality scores}
@@ -2584,27 +2584,27 @@ \subsection{\textbf{Choosing the container size}}
quality scores. We estimate 0.4 bits/base (read names) + 0.4 bits/base (bases)
+ 3 bits/base (recalibrated quality scores) + 3 bits/base (original quality scores)
$\approx$ 7 bits/base. Space estimate is $10\,000 \times 100 \times 7 \bits
-\approx 0.9 \MB$. Data could be stored in a single container.
+\approx 0.9 \MiB$. Data could be stored in a single container.

\textbf{Unmapped long reads with bases, read names and quality scores}

We have 10,000 unmapped long reads (10kb) with read names and quality scores. We
estimate: 0.4 bits/base (bases) + 3 bits/base (original quality scores) $\approx$
3.5 bits/base. Space estimate is $10\,000 \times 10\,000 \times 3.5 \bits
-\approx 42 \MB$. Data could be stored in $42 \times 1\MB$ containers.
+\approx 42 \MiB$. Data could be stored in $42 \times 1\MiB$ containers.

\textbf{Mapped short reads with bases, pairing and mapping information}

We have 250,000 mapped short reads (100bp) with bases, pairing and mapping information.
We estimate the compression to be 0.2 bits/base. Space estimate is $250\,000 \times 100
-\times 0.2 \bits \approx 0.6 \MB$. Data could be stored in a single
+\times 0.2 \bits \approx 0.6 \MiB$. Data could be stored in a single
container.

\textbf{Embedded reference sequences}

We have a reference sequence (10Mb). We estimate the compression to be 2 bits/base.
-Space estimate is $10\,000\,000 \times 2 \bits \approx 2.4 \MB$. Data
-could be written into three containers: $1\MB + 1\MB + 0.4\MB$.
+Space estimate is $10\,000\,000 \times 2 \bits \approx 2.4 \MiB$. Data
+could be written into three containers: $1\MiB + 1\MiB + 0.4\MiB$.

\newpage
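
The unit change matters once an estimate is split across containers, because a 1 MiB container holds 1,048,576 bytes rather than 1,000,000. The following is a minimal sketch of that split, assuming containers are simply filled to a fixed byte budget; the helper `split_into_containers` is hypothetical and is not how any particular CRAM writer chooses container boundaries.

```python
MIB = 1024 * 1024  # 1 MiB = 1,048,576 bytes (1 MB would be 1,000,000)


def split_into_containers(total_bytes: int, container_bytes: int = MIB) -> list[int]:
    """Return container sizes in bytes: full containers, then any remainder."""
    full, remainder = divmod(total_bytes, container_bytes)
    sizes = [container_bytes] * full
    if remainder:
        sizes.append(remainder)
    return sizes


# Embedded reference example: 10,000,000 bases at 2 bits/base.
reference_bytes = 10_000_000 * 2 // 8  # 2,500,000 bytes
sizes = split_into_containers(reference_bytes)
print(len(sizes), [round(s / MIB, 2) for s in sizes])

# Long-read example: 10,000 reads x 10,000 bases at 3.5 bits/base.
long_read_bytes = int(10_000 * 10_000 * 3.5 / 8)
print(len(split_into_containers(long_read_bytes)))
```

Under this assumption the reference sequence occupies two full containers plus a partial third, and the long-read example needs 42 containers, in line with the figures in the text.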
