diff --git a/VCFv4.1.tex b/VCFv4.1.tex
index 41b4e069..d1dc3246 100644
--- a/VCFv4.1.tex
+++ b/VCFv4.1.tex
@@ -517,7 +517,7 @@ \subsection{Encoding Structural Variants}
  \item An imprecise deletion of approximately 105 bp.
  \item An imprecise deletion of an ALU element relative to the reference.
  \item An imprecise insertion of an L1 element relative to the reference.
- \item An imprecise duplication of approximately 21Kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
+ \item An imprecise duplication of approximately 21kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
  \item An imprecise tandem duplication of 76bp. The sample genotype is copy number 5 (but the two haplotypes are not known).
 \end{enumerate}
 
@@ -957,8 +957,8 @@ \section{BCF specification}
 VCF is very expressive, accommodates multiple samples, and is widely used
 in the community. Its biggest drawback is that it is big and slow. Files
 are text and therefore require a lot of space on disk. A normal batch
-of \~100~exomes is a few GB, but large-scale VCFs with thousands of exome
-samples quickly become hundreds of GBs. Because the file is text, it is
+of \~100~exomes is a few gigabytes, but large-scale VCFs with thousands of exome
+samples quickly become hundreds of gigabytes. Because the file is text, it is
 extremely slow to parse.
 
 Overall, the idea behind is BCF2 is simple. BCF2 is a binary, compressed
diff --git a/VCFv4.2.tex b/VCFv4.2.tex
index 477b91ea..ee4544f8 100644
--- a/VCFv4.2.tex
+++ b/VCFv4.2.tex
@@ -534,7 +534,7 @@ \subsection{Encoding Structural Variants}
  \item An imprecise deletion of approximately 205 bp.
  \item An imprecise deletion of an ALU element relative to the reference.
  \item An imprecise insertion of an L1 element relative to the reference.
- \item An imprecise duplication of approximately 21Kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
+ \item An imprecise duplication of approximately 21kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
  \item An imprecise tandem duplication of 76bp. The sample genotype is copy number 5 (but the two haplotypes are not known).
 \end{enumerate}
 
@@ -974,8 +974,8 @@ \section{BCF specification}
 VCF is very expressive, accommodates multiple samples, and is widely used
 in the community. Its biggest drawback is that it is big and slow. Files
 are text and therefore require a lot of space on disk. A normal batch
-of \~100~exomes is a few GB, but large-scale VCFs with thousands of exome
-samples quickly become hundreds of GBs. Because the file is text, it is
+of \~100~exomes is a few gigabytes, but large-scale VCFs with thousands of exome
+samples quickly become hundreds of gigabytes. Because the file is text, it is
 extremely slow to parse.
 
 Overall, the idea behind is BCF2 is simple. BCF2 is a binary, compressed
diff --git a/VCFv4.3.tex b/VCFv4.3.tex
index 78fa8d03..ac7a84fd 100644
--- a/VCFv4.3.tex
+++ b/VCFv4.3.tex
@@ -874,7 +874,7 @@ \subsection{Encoding Structural Variants}
  \item An imprecise deletion of approximately 205 bp.
  \item An imprecise deletion of an ALU element relative to the reference.
  \item An imprecise insertion of an L1 element relative to the reference.
- \item An imprecise duplication of approximately 21Kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
+ \item An imprecise duplication of approximately 21kb. The sample genotype is copy number 3 (one extra copy of the duplicated sequence).
  \item An imprecise tandem duplication of 76bp. The sample genotype is copy number 5 (but the two haplotypes are not known).
 \end{enumerate}
 
@@ -1454,7 +1454,7 @@ \section{BCF specification}
 VCF is very expressive, accommodates multiple samples, and is widely used in the community.
 Its biggest drawback is that it is big and slow.
 Files are text and therefore require a lot of space on disk.
-A normal batch of a hundred exomes is a few GB, but large-scale VCFs with thousands of exome samples quickly become hundreds of GBs.
+A normal batch of a hundred exomes is a few gigabytes, but large-scale VCFs with thousands of exome samples quickly become hundreds of gigabytes.
 Because the file is text, it is extremely slow to parse.
 
 Overall, the idea behind is BCF2 is simple.
diff --git a/VCFv4.4.tex b/VCFv4.4.tex
index e7b277c3..0bfbe70b 100644
--- a/VCFv4.4.tex
+++ b/VCFv4.4.tex
@@ -1882,7 +1882,7 @@ \section{BCF specification}
 VCF is very expressive, accommodates multiple samples, and is widely used in the community.
 Its biggest drawback is that it is big and slow.
 Files are text and therefore require a lot of space on disk.
-A normal batch of a hundred exomes is a few GB, but large-scale VCFs with thousands of exome samples quickly become hundreds of GBs.
+A normal batch of a hundred exomes is a few gigabytes, but large-scale VCFs with thousands of exome samples quickly become hundreds of gigabytes.
 Because the file is text, it is extremely slow to parse.
 
 Overall, the idea behind is BCF2 is simple.
diff --git a/VCFv4.5.tex b/VCFv4.5.tex
index 0530513b..e98c1b6f 100644
--- a/VCFv4.5.tex
+++ b/VCFv4.5.tex
@@ -2050,7 +2050,7 @@ \section{BCF specification}
 VCF is very expressive, accommodates multiple samples, and is widely used in the community.
 Its biggest drawback is that it is big and slow.
 Files are text and therefore require a lot of space on disk.
-A normal batch of a hundred exomes is a few GB, but large-scale VCFs with thousands of exome samples quickly become hundreds of GBs.
+A normal batch of a hundred exomes is a few gigabytes, but large-scale VCFs with thousands of exome samples quickly become hundreds of gigabytes.
 Because the file is text, it is extremely slow to parse.
 
 Overall, the idea behind is BCF2 is simple.