This code is designed to upload integration sites, PCR breakpoints, and multihits identified by intSiteCaller and upload them to the intsitesdev database. The database is currently located at microbxxx.med.upenn.edu:3306 and is described by the included schema, integration_site_schema.sql.
intSiteUploader expects to be passed the primaryAnalysisDirectory as defined when intSiteCaller was run. This directory must have the following minimal file structure:
primaryAnalysisDirectory
├── sample1
│ ├── sites.final.RData
│ ├── multihitData.RData
│ └── allSites.RData
├─miseqid.txt
└─completeMetadata.Rdata
There can be as few or as many samples as the user desires in the
primaryAnalysisDirectory, so long as each sample is represented in completeMetadata.Rdata
. See intSiteCaller's
Documentation for a description of
the values contained in these two metadata files.
Code example:
cd run20150505 # a recent processed run folder
Rscript path/to/intSiteUploader.R
Rscript intSiteUploader.R <primaryAnalysisDir> [mysql_group]
at present default group for mysql_group is intSitesDev237
Note:
- Run intSiteUploader.R only after running intSiteCaller,
- Only run one instance at a time,
- Code to initialize the database is located in the
helperfolder.
intSiteUploader.Rdoes not directly output any commands or data, as all MySQL commands are generated and executed internally.- A script to check that the data is correctly uploaded is to be coded later.
This code is dependent on the RMySQL and stats R packages as well as the mysql executable.
Additionally, the database connection information is expected to be stored in ~/.my.cnf in the following format:
[intSitesDev]
user=foo
password=bar
host=microbxxx.med.upenn.edu
port=3306
database=intsitesdev
user and password coorespond to the account you wish to use to log into the MySQL database.
To distinguish different runs we read
run id from myseqid.txt that should be present in primary analysis folder.
sqllite version of db can be created by:
sqlite3 db.sqlite3 < integration_site_schema.sql