Work with remote file in pileup_bam? #2

Open
cmdcolin opened this issue Dec 23, 2017 · 6 comments

@cmdcolin

Hi there. Great work on this library.
I was curious whether you have any ideas on how remote files could be handled in a context similar to pileup_bam.

When we use the "open file" dialog, we get a nice File (a type of Blob), which works well with the web worker. I was having trouble carrying that idea over to remote files, though. Are there other HTSlib functions that could help, or should the idea of a remote Blob be implemented?

Thanks!

@pjb7687
Member

pjb7687 commented Dec 27, 2017

Hi cmdcolin,

Since browsers block cross-domain requests unless the server explicitly allows them (CORS), I guess your question is about opening files remotely within the same domain. As you already noticed, this library does not yet contain a module for reading remote files. I expect it would be tricky to implement, since the JavaScript APIs that retrieve a remote blob are asynchronous, while HTSlib's C code expects a read to return its data immediately. Well, I think there are two possible ways:

  1. Writing a C addon using Emterpreter

Emscripten has a feature called the 'Emterpreter' that supports asynchronous operations: it automatically saves and restores the stack around async calls, so C code can call them as if they were synchronous. For example, 'wget' is supported by Emscripten via the Emterpreter:
https://kripken.github.io/emscripten-site/docs/api_reference/emscripten.h.html#c.emscripten_wget

Maybe an addon module for HTSlib could be written using such APIs; however, I recently found that the Emterpreter is still not very stable... So I don't recommend this approach for now, but in the future it could be the easiest solution.

  2. Writing a JavaScript library that reads remote files

BAM files are basically bgzip (BGZF) files, which means they are composed of independently compressed blocks of at most 64 KiB each. Each block is a valid gzip file, so it can be retrieved using JavaScript and processed by HTSlib. JavaScript would then hold the results from HTSlib and show everything together to the user.
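To make option 2 a little more concrete: in BGZF, each block's gzip header carries an extra subfield tagged "BC" whose 2-byte little-endian value is BSIZE, the total compressed block length minus 1. That means block boundaries can be found without decompressing anything. The TypeScript sketch below uses hypothetical helper names (`bgzfBlockSize`, `splitBgzfBlocks`, not part of pileup_bam or HTSlib) and assumes the buffer, for example one obtained with a range request, starts exactly at a block boundary.

```typescript
// Location of one complete BGZF block inside a byte buffer.
interface BgzfBlock {
  offset: number; // byte offset of the block's first byte within the buffer
  size: number; // total compressed block length in bytes (BSIZE + 1)
}

// Returns the total length of the BGZF block starting at `pos`, or null if the
// buffer ends before the header (including its extra field) is complete.
// Throws if the bytes at `pos` are not a BGZF block header.
function bgzfBlockSize(buf: Uint8Array, pos: number): number | null {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  // Fixed gzip header: ID1 ID2 CM FLG MTIME(4) XFL OS XLEN(2) = 12 bytes.
  if (pos + 12 > buf.length) return null;
  const isGzip = view.getUint8(pos) === 31 && view.getUint8(pos + 1) === 139;
  const isDeflate = view.getUint8(pos + 2) === 8;
  const hasExtra = (view.getUint8(pos + 3) & 4) !== 0; // FLG.FEXTRA bit
  if (!isGzip || !isDeflate || !hasExtra) {
    throw new Error(`not a BGZF block at offset ${pos}`);
  }
  const xlen = view.getUint16(pos + 10, true); // little-endian
  const extraEnd = pos + 12 + xlen;
  if (extraEnd > buf.length) return null;
  // Walk the extra-field subfields: SI1 SI2 SLEN(2) DATA(SLEN).
  let p = pos + 12;
  while (p + 4 <= extraEnd) {
    const slen = view.getUint16(p + 2, true);
    // 'B' = 66, 'C' = 67: the BGZF subfield holding BSIZE as a uint16.
    const isBC = view.getUint8(p) === 66 && view.getUint8(p + 1) === 67;
    if (isBC && slen === 2 && p + 6 <= extraEnd) {
      return view.getUint16(p + 4, true) + 1;
    }
    p += 4 + slen;
  }
  throw new Error(`no BC subfield in gzip header at offset ${pos}`);
}

// Lists the complete BGZF blocks in `buf`, which must start on a block boundary.
// A trailing partial block is left out so the caller can fetch more bytes.
function splitBgzfBlocks(buf: Uint8Array): BgzfBlock[] {
  const blocks: BgzfBlock[] = [];
  let pos = 0;
  while (pos < buf.length) {
    const size = bgzfBlockSize(buf, pos);
    if (size === null || pos + size > buf.length) break;
    blocks.push({ offset: pos, size });
    pos += size;
  }
  return blocks;
}
```

Each returned block could then be handed on its own to HTSlib (or any gzip inflater); the bytes after the last complete block would be kept and prepended to the next fetch.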

I personally prefer solution 1) since it would be cleaner than 2), but I will wait until the Emterpreter becomes stable. Still, 2) is also a possible solution, I guess.

Happy new year!

Jeongbin

@cmdcolin
Author

Thanks a bunch for the info! Yes, it would just be the same domain. That first option does sound enticing. I wanted to get a small demo going so it could be used in a genome browser. I was looking at post.js.in to see whether option 2 could be implemented there, but maybe I'll wait for option 1. Thanks again for the info! I had started on a similar idea with https://GitHub.com/cmdcolin/htslib_emscripten, but I hardly had any examples, so pileup_bam was very exciting to see.

@cmdcolin
Author

cmdcolin commented Jan 6, 2018

Could Emscripten's Fetch API be used? The docs say it can operate synchronously and can also do byte-range requests: https://kripken.github.io/emscripten-site/docs/api_reference/fetch.html
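For comparison with Emscripten Fetch, a byte-range request from plain browser JavaScript might look like the following TypeScript sketch. The names `FetchLike`, `rangeHeader` and `readRange` are hypothetical (not part of pileup_bam, HTSlib or Emscripten), and the fetch function is passed in so that either the browser's global `fetch` or a test stub can be used. Because `readRange` can only hand back a Promise, it also shows the asynchrony problem mentioned earlier in this thread: synchronous C code in HTSlib cannot simply wait for it.

```typescript
// Minimal shape of fetch() that this sketch relies on. The browser's global
// fetch satisfies it, and so does a simple stub in tests.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// HTTP byte ranges are inclusive on both ends: "bytes=0-99" asks for 100 bytes.
function rangeHeader(start: number, length: number): string {
  if (!Number.isInteger(start) || !Number.isInteger(length) || start < 0 || length <= 0) {
    throw new RangeError("start must be an integer >= 0 and length an integer > 0");
  }
  return `bytes=${start}-${start + length - 1}`;
}

// Fetches `length` bytes starting at `start`. The bytes only become available
// on a later turn of the event loop, via the returned Promise.
async function readRange(
  url: string,
  start: number,
  length: number,
  fetchFn: FetchLike,
): Promise<Uint8Array> {
  const res = await fetchFn(url, { headers: { Range: rangeHeader(start, length) } });
  // 206 Partial Content means the server honoured the Range header; a plain
  // 200 would mean it ignored the header and is sending the whole file.
  if (res.status !== 206) {
    throw new Error(`expected 206 Partial Content, got ${res.status}`);
  }
  return new Uint8Array(await res.arrayBuffer());
}
```

In a page or worker it could be called as `readRange("/data/example.bam", 0, 65536, fetch)` (the path is made up); the server has to support range requests for the 206 check to pass.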

@pjb7687
Member

pjb7687 commented Jan 8, 2018

It looks like synchronous operation requires pthreads, which are currently only supported by Firefox Nightly (https://kripken.github.io/emscripten-site/docs/porting/pthreads.html?highlight=pthread). However, I think it could also be usable in the future.

@cmdcolin
Author

cmdcolin commented Jan 8, 2018

Hmm. I am not sure why that comment exists, or whether what I was doing is the wrong path, but I was able to get a small fetch example working.

I first tried to get it working based on this repo, but I had trouble enabling USE_PTHREADS when the LINKABLE flag is also enabled in CMakeLists.txt, so I ported some code from this repo back to https://github.com/cmdcolin/htslib_emscripten and got a small fetch example to work there.

It required some restructuring so that it didn't launch a web worker from inside a web worker, and it meant passing filenames instead of the regular File blobs. Nevertheless, I think it is starting to look hopeful.

Here is some console log output I got when fetching a file:

[Screenshot: console log output from fetching a file]

I didn't get the Range header working to fetch only part of the BAM file, but maybe that depends on this issue: emscripten-core/emscripten#5865
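One way to tell from the JavaScript side whether a Range header actually took effect is to inspect the response: a server that honours it replies with 206 Partial Content and a `Content-Range` header such as `bytes 0-99/1000`, while one that ignores it sends 200, normally without `Content-Range`, and the whole file. The TypeScript sketch below, with hypothetical helper names `parseContentRange` and `rangeWasHonoured`, performs that check; it does not address the Emscripten issue itself.

```typescript
// A parsed Content-Range response header, e.g. "bytes 0-99/1000".
interface ContentRange {
  start: number;
  end: number; // inclusive, as in the HTTP header
  total: number | null; // null when the server reports "*" (length unknown)
}

// Returns null for a missing or unparseable header.
function parseContentRange(value: string | null): ContentRange | null {
  if (value === null) return null;
  const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value.trim());
  if (m === null) return null;
  const start = Number(m[1]);
  const end = Number(m[2]);
  if (end < start) return null;
  const total = m[3] === "*" ? null : Number(m[3]);
  return { start, end, total };
}

// True if the response's Content-Range starts where the requested range
// [start, start + length) starts and does not run past its end. A server may
// return fewer bytes than asked near the end of the file, so a shorter range
// is accepted.
function rangeWasHonoured(contentRange: string | null, start: number, length: number): boolean {
  const r = parseContentRange(contentRange);
  return r !== null && r.start === start && r.end <= start + length - 1;
}
```

With the browser's fetch, the header value would come from `response.headers.get("Content-Range")`, which returns null when the header is absent.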

@cmdcolin
Author

I'm hoping this PR can help get part of this issue addressed :) emscripten-core/emscripten#6207
