samtools · d-cameron · Sep 9, 2025 · Aug 11, 2025 · Aug 11, 2025
diff --git a/VCFv4.1.tex b/VCFv4.1.tex
@@ -517,7 +517,7 @@ \subsection{Encoding Structural Variants}
   \item An imprecise deletion of approximately 105 bp.
   \item An imprecise deletion of an ALU element relative to the reference.
   \item An imprecise insertion of an L1 element relative to the reference.
-  \item An imprecise duplication of approximately 21Kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
+  \item An imprecise duplication of approximately 21kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
   \item An imprecise tandem duplication of 76bp. The sample genotype is copy number 5 (but the two haplotypes are not known).
 \end{enumerate}
 
@@ -957,8 +957,8 @@ \section{BCF specification}
 VCF is very expressive, accommodates multiple samples, and is widely used
 in the community.  Its biggest drawback is that it is big and slow.
 Files are text and therefore require a lot of space on disk.  A normal batch
-of \~100~exomes is a few GB, but large-scale VCFs with thousands of exome
-samples quickly become hundreds of GBs.  Because the file is text, it is
+of $\sim$100~exomes is a few gigabytes, but large-scale VCFs with thousands of exome
+samples quickly become hundreds of gigabytes.  Because the file is text, it is
 extremely slow to parse.
 
 Overall, the idea behind is BCF2 is simple.  BCF2 is a binary, compressed

diff --git a/VCFv4.2.tex b/VCFv4.2.tex
@@ -534,7 +534,7 @@ \subsection{Encoding Structural Variants}
   \item An imprecise deletion of approximately 205 bp.
   \item An imprecise deletion of an ALU element relative to the reference.
   \item An imprecise insertion of an L1 element relative to the reference.
-  \item An imprecise duplication of approximately 21Kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
+  \item An imprecise duplication of approximately 21kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
   \item An imprecise tandem duplication of 76bp. The sample genotype is copy number 5 (but the two haplotypes are not known).
 \end{enumerate}
 
@@ -974,8 +974,8 @@ \section{BCF specification}
 VCF is very expressive, accommodates multiple samples, and is widely used
 in the community.  Its biggest drawback is that it is big and slow.
 Files are text and therefore require a lot of space on disk.  A normal batch
-of \~100~exomes is a few GB, but large-scale VCFs with thousands of exome
-samples quickly become hundreds of GBs.  Because the file is text, it is
+of $\sim$100~exomes is a few gigabytes, but large-scale VCFs with thousands of exome
+samples quickly become hundreds of gigabytes.  Because the file is text, it is
 extremely slow to parse.
 
 Overall, the idea behind is BCF2 is simple.  BCF2 is a binary, compressed

diff --git a/VCFv4.3.tex b/VCFv4.3.tex
@@ -874,7 +874,7 @@ \subsection{Encoding Structural Variants}
   \item An imprecise deletion of approximately 205 bp.
   \item An imprecise deletion of an ALU element relative to the reference.
   \item An imprecise insertion of an L1 element relative to the reference.
-  \item An imprecise duplication of approximately 21Kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
+  \item An imprecise duplication of approximately 21kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
   \item An imprecise tandem duplication of 76bp. The sample genotype is copy number 5 (but the two haplotypes are not known).
 \end{enumerate}
 
@@ -1454,7 +1454,7 @@ \section{BCF specification}
 VCF is very expressive, accommodates multiple samples, and is widely used in the community.
 Its biggest drawback is that it is big and slow.
 Files are text and therefore require a lot of space on disk.
-A normal batch of a hundred exomes is a few GB, but large-scale VCFs with thousands of exome samples quickly become hundreds of GBs.
+A normal batch of a hundred exomes is a few gigabytes, but large-scale VCFs with thousands of exome samples quickly become hundreds of gigabytes.
 Because the file is text, it is extremely slow to parse.
 
 Overall, the idea behind is BCF2 is simple.

diff --git a/VCFv4.4.tex b/VCFv4.4.tex
@@ -1882,7 +1882,7 @@ \section{BCF specification}
 VCF is very expressive, accommodates multiple samples, and is widely used in the community.
 Its biggest drawback is that it is big and slow.
 Files are text and therefore require a lot of space on disk.
-A normal batch of a hundred exomes is a few GB, but large-scale VCFs with thousands of exome samples quickly become hundreds of GBs.
+A normal batch of a hundred exomes is a few gigabytes, but large-scale VCFs with thousands of exome samples quickly become hundreds of gigabytes.
 Because the file is text, it is extremely slow to parse.
 
 Overall, the idea behind is BCF2 is simple.

diff --git a/VCFv4.5.tex b/VCFv4.5.tex
@@ -2050,7 +2050,7 @@ \section{BCF specification}
 VCF is very expressive, accommodates multiple samples, and is widely used in the community.
 Its biggest drawback is that it is big and slow.
 Files are text and therefore require a lot of space on disk.
-A normal batch of a hundred exomes is a few GB, but large-scale VCFs with thousands of exome samples quickly become hundreds of GBs.
+A normal batch of a hundred exomes is a few gigabytes, but large-scale VCFs with thousands of exome samples quickly become hundreds of gigabytes.
 Because the file is text, it is extremely slow to parse.
 
 Overall, the idea behind is BCF2 is simple.