Skip to content

PennChopMicrobiomeProgram/marc_db

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

marc_db

A simple database and query interface for the microbial archive

Schema description

Isolates

isolate_id: needs to be Unique. The sequencing lab is currently in charge of the numbering system, but we need to double check to make sure they are unique (Throw and error if there are duplicates for now.)

subject_id: Unique ID created for each MRN, obtained from program 1.

specimen_id: Unique ID created for each specimen barcode, obtained from program 1.

source: Blood, oral swab etc

suspected_organism: Can be labeled as "unknown" which is distinct from NA or left blank

special_collection: can be empty. If these isolates are collected under the purview of a project, specify it here (like bacteremia / NICU)

received_date

cryobanking_date

Aliquots

The isolates are aliquoted and kept in tubes in the freezer.

The table will be populated from the columns Tube barcode a/b/c and Box-name_position a/b/c. We don’t need the bead tube barcodes. These hold the barcodes of the glycerol stocks of the isolates. You need both tube barcode and box name to locate an isolate. An isolate can have more than one aliquot (a/b/c). For example old isolates have 3 aliquots, the newer ones only have 2. These would be multiple entries in the database with the same isolate_id.

isolate_id

tube_barcode

box_name

Once the above database is set, let’s touch base and talk about next steps. But just as a preview, we eventually want to add more tables:

Sequencing

This will point us to which run the isolates are in. Some isolates may be run more than once. Those would be new entries in the database.

isolate_id

sample_registry_id (one isolate may be run multiple times) The runs are also registered per lane. We only need to register the first lane

AssemblyStats

These are summary results from the pipeline. This is probably the most rough draft of the schema. These results will most likely be represented in multiple tables.

Isolate_id

sga_output_id (create a unique ID by concating isolateID and entry number or timestamp (?) e.g. marc.bacteremia.1017.a_1)

time_stamp (when was it created)

config_file (information on databases on softwares sga_extension_version (add it to config file)

assembly_fasta_path: Where do we store the assembly fastq file?

N50

of contigs

species

contaminations

Other outputs as we finalize the sga pipeline

MLST

sga_output_id

schema

sequence_type

alleles

AbxResistance

sga_output_id

antibiotics

gene_name

And most likely a few more tables to contain the data from other software.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages