removing sample(s) from the talon database #104
Comments
Also interested to know. When this happened, I just deleted the database and initialized a new one, as there isn't a `--resume` flag.
Yes, that has been my approach in the past too, but my new dataset is HUGE and had already been running for nearly 2 weeks before the interruption, so I really don't want to do that this time!
You can use this Python code to check whether the dataset has been added to your database:

```python
import sqlite3

import pandas as pd

db = 'database_name.db'  # path to your TALON database
with sqlite3.connect(db) as conn:
    # List the names of all datasets recorded in the database
    q = 'SELECT dataset_name FROM dataset'
    datasets = pd.read_sql_query(q, conn)
print(datasets.dataset_name.tolist())
```

TALON is pretty good about discarding, or not pushing, incomplete changes to the database, but this is not a surefire method. What I typically do is make a backup copy of my TALON database before trying to add new datasets to it. That way, if the run fails, I can simply restart using the backup. I'm sorry there's not a better way to do this, but this is definitely something I learned from getting burned in the past as well.
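The backup-before-adding approach can be scripted with Python's standard library alone. This is a minimal sketch, not part of TALON: the helper names and file names (`backup_database`, `restore_database`, `talon.db`, `talon.backup.db`) are hypothetical. It uses `sqlite3.Connection.backup`, SQLite's online backup API, which copies a consistent snapshot and is safer than a plain file copy if something else might have the database open.

```python
import shutil
import sqlite3
from pathlib import Path


def backup_database(db_path: str, backup_path: str) -> None:
    """Copy a SQLite database to backup_path using SQLite's online backup API."""
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(backup_path)
    try:
        src.backup(dst)  # copies every page of src into dst
    finally:
        dst.close()
        src.close()


def restore_database(backup_path: str, db_path: str) -> None:
    """Overwrite db_path with a copy of the backup, keeping the backup for retries.

    Only run this when no TALON process is using db_path.
    """
    shutil.copy2(backup_path, db_path)


if __name__ == "__main__":
    # Hypothetical file names: adjust to your own database.
    if Path("talon.db").exists():
        backup_database("talon.db", "talon.backup.db")
        print("Backup written to talon.backup.db")
```

Restoring copies the backup rather than renaming it, so the same snapshot can be reused if the rerun is interrupted again.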
That's simple and ingenious. Not sure why I didn't think to do that. Thanks for the replies, as always.
Thanks for your input, everyone. Does anyone know if there is a way to remove an individual dataset from a database? If so, I could just remove the dataset that TALON was partway through processing and then re-add it. The partially processed dataset definitely exists in the database, but I'm not convinced that it has been fully added. I used
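One way to judge how much of a partially processed dataset made it in is a read-only audit: for every table that has a column named `dataset`, count the rows that mention a given dataset name. This is a sketch under an assumption, not TALON's documented schema: it assumes per-dataset tables store the dataset name in a column literally called `dataset`, so check the column names in your own database first. The function name and the example names are hypothetical. It deletes nothing; note that a hand-rolled removal would also have to deal with any novel genes or transcripts the partial run added, which row counts alone cannot reveal.

```python
import sqlite3
from pathlib import Path


def count_rows_per_table(db_path: str, dataset_name: str,
                         column: str = "dataset") -> dict:
    """Return {table_name: matching_row_count} for every table that has `column`.

    Assumption (hypothetical): tables holding per-dataset data store the
    dataset name in a column literally called `column`. Nothing is modified.
    """
    # mode=ro opens the file read-only and fails instead of silently
    # creating a new, empty database if the path is wrong.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        counts = {}
        for table in tables:
            # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
            cols = [info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')]
            if column in cols:
                query = f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" = ?'
                counts[table] = conn.execute(query, (dataset_name,)).fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    db = "database_name.db"         # hypothetical path
    partial = "my_partial_dataset"  # hypothetical dataset name
    if Path(db).exists():
        for table, n in count_rows_per_table(db, partial).items():
            print(f"{table}: {n} rows")
```

Comparing these counts before and after a rerun (or against a complete dataset of similar size) gives a rough sense of whether rows are still missing.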
Hi, thanks for the above suggestions. I am also interested in removing a sample from the database: has there been any update on whether this is possible? Thanks!
Is it possible to extract the sample description (the second column in the config file)? I've been playing around with the sqlite3 module, trying to get the column headers of the dataset table in the database, but it's beyond me.
You should be able to pull that info out using the following SQL query:

As an aside, if you're interested in navigating the contents of the TALON database, I'd definitely recommend downloading a DB viewer such as this one. You can browse the tables and write and test queries on them, so you don't have to open up Python and sqlite3 every time you want to poke around.

As another aside, I am much more comfortable in pandas than I am manipulating these tables through sqlite3. If that's more your speed, there are functions that will dump a table from a database straight into a pandas table (see here for example), which makes it easy to work with.
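For the column-header question specifically, here is a sketch that does not depend on knowing the column names in advance. `PRAGMA table_info(dataset)` lists the columns of the `dataset` table (the table queried in the earlier snippet), and `SELECT *` returns every column, including whichever one holds the sample description. The helper names are hypothetical, and the file name is the same placeholder as before.

```python
import sqlite3
from pathlib import Path


def dataset_columns(conn: sqlite3.Connection) -> list:
    """Column names of the `dataset` table, in schema order."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute("PRAGMA table_info(dataset)")]


def dataset_rows(conn: sqlite3.Connection) -> list:
    """Every row of the `dataset` table as a dict keyed by column name."""
    cursor = conn.execute("SELECT * FROM dataset")
    # cursor.description holds one 7-tuple per result column; index 0 is the name.
    names = [desc[0] for desc in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


if __name__ == "__main__":
    db = "database_name.db"  # hypothetical path to your TALON database
    if Path(db).exists():
        conn = sqlite3.connect(db)
        try:
            print(dataset_columns(conn))
            for row in dataset_rows(conn):
                print(row)
        finally:
            conn.close()
```

If you prefer pandas, `pd.read_sql_query("SELECT * FROM dataset", conn)` returns the same table as a DataFrame, with the column headers as `df.columns`.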
Hi
How does TALON process datasets that have only partially been added to the input database? When running TALON again using the same config file and input database, will it continue from partway through the partial dataset, or skip the dataset since part of it is already present? My HPC job was interrupted during processing, and I want to know how I can tell whether the sample that was being added at the time has been completed, or whether data is still missing after running TALON again.
Thanks