Patient bam processing by pj-sullivan · Pull Request #7 · rokitalab/ggsashimi

pj-sullivan · 2025-09-30T15:46:54Z

Closes #3

Control bams will be processed separately; this is just for the patient samples.
Once sbfs is configured following the README instructions, run:

cd analyses
bash 01-cram-to-bam.sh -i input/test-input.tsv -m input/manifest.tsv
md5sum -c results/bams/md5sum.txt

rjcorb · 2025-11-03T14:26:02Z

As discussed in Slack, the shell script should be updated to take the input and manifest files as command-line arguments to make it more robust. The input files provided here could be included as test files in the repo.

pj-sullivan · 2025-11-11T18:46:26Z

The input files are now command-line arguments. I also added optional arguments for specifying which columns hold the information the script needs, so input files with a different format can be used.
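A minimal sketch of how this kind of argument handling could look, not the script's actual code. The `-i` and `-m` flags and the column defaults come from this PR; the `-k`, `-c`, and `-p` flag letters and the `parse_args` function name are hypothetical, chosen only for illustration.

```bash
#!/bin/bash

## Default column indices (values taken from the diff in this PR)
kf_id_col=1   # ID column
chr_col=3     # Chromosome
pos_col=4     # Position

# parse_args: read the required -i/-m files and optional column overrides.
# The -k/-c/-p letters are assumptions, not the script's real flags.
parse_args() {
  local opt OPTIND=1
  input_file=""
  manifest_file=""
  while getopts "i:m:k:c:p:" opt; do
    case "$opt" in
      i) input_file="$OPTARG" ;;
      m) manifest_file="$OPTARG" ;;
      k) kf_id_col="$OPTARG" ;;
      c) chr_col="$OPTARG" ;;
      p) pos_col="$OPTARG" ;;
      *) echo "Usage: -i INPUT -m MANIFEST [-k COL] [-c COL] [-p COL]" >&2
         return 1 ;;
    esac
  done
  # Both files are required; fail early if either is missing.
  if [ -z "$input_file" ] || [ -z "$manifest_file" ]; then
    echo "ERROR: both -i and -m are required" >&2
    return 1
  fi
}
```

In a script this would be called as `parse_args "$@" || exit 1` before any processing starts.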

Also added an md5sum for the resulting files to confirm that the script is working as expected.
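For illustration, one way such a checksum file could be written and verified with GNU `md5sum`. The function names are hypothetical and the real script may do this differently; the sketch assumes it is run from the same directory as the later `md5sum -c` call, since the paths are stored as given.

```bash
#!/bin/bash

# write_checksums DIR: record an md5 checksum for every BAM in DIR.
# Paths are stored exactly as passed, so verify from the same working directory.
write_checksums() {
  local bam_dir="$1"
  md5sum "$bam_dir"/*.bam > "$bam_dir/md5sum.txt"
}

# verify_checksums DIR: return non-zero if any file no longer matches its checksum.
verify_checksums() {
  local bam_dir="$1"
  md5sum -c --quiet "$bam_dir/md5sum.txt"
}
```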

rjcorb

This ran successfully for me and the md5sums match yours. I made a few suggestions that should make the script more robust to different input data.

rjcorb · 2025-11-26T18:45:51Z

analyses/01-cram-to-bam.sh

+#!/bin/bash
+
+## Define default variables
+kf_id_col=1   # KF patient ID column


I think we should be doing these queries by BS_ID rather than patient ID, because there are often multiple RNA-seq cram files from different sample collections from the same patient.
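A rough sketch of what an exact-match lookup by sample ID could look like. The manifest layout is an assumption here (tab-separated, with the ID column index passed in), and `manifest_rows_for_id` is a hypothetical helper name.

```bash
#!/bin/bash

# manifest_rows_for_id MANIFEST ID COL
# Print every manifest row whose column COL equals ID exactly.
# Appending "" forces awk to compare as strings rather than numbers.
manifest_rows_for_id() {
  local manifest="$1" id="$2" col="$3"
  awk -F'\t' -v id="$id" -v col="$col" '($col "") == (id "")' "$manifest"
}
```

With a manifest where one patient has two samples, querying the sample-ID column returns a single row, while querying the patient-ID column returns both, which is the ambiguity described above.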

rjcorb · 2025-11-26T18:51:04Z

analyses/01-cram-to-bam.sh

+chr_col=3     # Chromosome
+pos_col=4     # Position
+label_col=11  # Additional label to add to plot for identification, i.e. gene
+window=10000  # Bases to plot either side of the position given


A few notes here:

To make this more robust, I think this should consider both the start and end positions and build the window around that interval. For variant cases only one position is relevant, but for splice events, where the exons might be far apart, we want the plotted window to cover the whole interval.
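As a sketch of the interval idea, assuming the region is passed on as a `chr:start-end` string (the format samtools accepts) and that coordinates are 1-based. `region_for` is a hypothetical helper, not part of the script.

```bash
#!/bin/bash

# region_for CHR START END WINDOW
# Print "CHR:lo-hi", where lo/hi extend the interval by WINDOW bases on each side.
# For a single-position variant, pass the same value as START and END.
region_for() {
  local chr="$1" start="$2" end="$3" window="$4"
  local lo hi tmp
  # Tolerate reversed coordinates by swapping them.
  if (( start > end )); then
    tmp=$start; start=$end; end=$tmp
  fi
  lo=$(( start - window ))
  hi=$(( end + window ))
  # Clamp the lower bound so it never falls below position 1.
  if (( lo < 1 )); then
    lo=1
  fi
  printf '%s:%d-%d\n' "$chr" "$lo" "$hi"
}
```

For example, `region_for chr1 15000 20000 10000` prints `chr1:5000-30000`. The upper bound is not clamped to the chromosome length here.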

I think we should define a standardized format for these input files so that the column indices are always correct. The input file should include: ID, chr, start_pos, end_pos, gene_name (anything else?). You can then build in a check that any supplied input file has these columns in this order.
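One possible shape for that check, assuming a tab-separated file whose first line is a header with exactly the column names listed above. The `check_header` name and the exact header spelling are assumptions for illustration.

```bash
#!/bin/bash

# Expected header, in order (column names taken from the suggestion above).
expected_header=$'ID\tchr\tstart_pos\tend_pos\tgene_name'

# check_header FILE: succeed only if FILE's first line is the expected header.
check_header() {
  local file="$1" header
  header=$(head -n 1 "$file" 2>/dev/null)
  # Drop a trailing carriage return in case the file has Windows line endings.
  header="${header%$'\r'}"
  if [ "$header" != "$expected_header" ]; then
    echo "ERROR: $file must start with columns: ID, chr, start_pos, end_pos, gene_name" >&2
    return 1
  fi
}
```

The script could call `check_header "$input_file" || exit 1` before reading any rows, so a misordered file fails immediately instead of producing wrong regions.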

patient bam processing

36c337f

pj-sullivan requested a review from rjcorb September 30, 2025 15:46

pj-sullivan self-assigned this Sep 30, 2025

restructure analyses

c16e470

pj-sullivan marked this pull request as ready for review October 28, 2025 14:41

Base automatically changed from pj-sullivan/docker to master November 11, 2025 17:20

pj-sullivan added 3 commits November 11, 2025 12:54

make inputs arguments

0530ee6

download reference genome

011d7ec

index reference genome and create result md5sum

94f7d9f

pj-sullivan added the ready for review label Nov 11, 2025

rjcorb requested changes Nov 26, 2025

pj-sullivan removed the ready for review label Dec 9, 2025

pj-sullivan wants to merge 5 commits into master from pj-sullivan/bams

3 participants
