Currently, the nf-core module `mergemetaphlantables` uses the script `merge_metaphlan_tables.py` that ships with MetaPhlAn. This script takes a number of MetaPhlAn profiles as input and merges them using basic merge functionality from Python's pandas module.
Before merging, the script derives each profile's sample name from its filename by removing the file extension and the suffix `_profile`: https://github.com/biobakery/MetaPhlAn/blob/b7e6670831f4842afdf3b0a8531a6f676ed56c45/metaphlan/utils/merge_metaphlan_tables.py#L36
Applied to the file naming scheme used by taxprofiler, where MetaPhlAn profiles are named `<sample name>_<database name>.metaphlan_profile.txt`, this yields sample names of the form `<sample name>_<database name>.metaphlan`.
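The renaming can be reproduced in a few lines. This is a minimal sketch that mimics the behaviour described above (drop the directory and the last extension, then remove `_profile`); it is not copied from `merge_metaphlan_tables.py`, and the helper name `derive_sample_name` and the example paths are hypothetical.

```python
import os


def derive_sample_name(path: str) -> str:
    """Hypothetical helper mimicking the described naming rule.

    Drops the directory and the final extension, then removes '_profile'.
    """
    stem = os.path.splitext(os.path.basename(path))[0]  # removes only the last ".txt"
    return stem.replace("_profile", "")


# Plain MetaPhlAn naming yields a clean sample name...
print(derive_sample_name("results/sample1_profile.txt"))  # sample1
# ...but taxprofiler's naming keeps the database name and ".metaphlan".
print(derive_sample_name("results/sample1_db1.metaphlan_profile.txt"))  # sample1_db1.metaphlan
```

Because `os.path.splitext` only strips the final extension, the `.metaphlan` part and the `_<database name>` part both survive into the column name.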
While the nf-core module `mergemetaphlantables` does the job of merging the tables, I, as the user, have to manually edit the merged table to clean up the sample names whenever I don't want the database name and the `.metaphlan` suffix in them.
Therefore, I suggest either replacing the MetaPhlAn script `merge_metaphlan_tables.py` with a custom script that can handle the divergent filename pattern introduced by nf-core/taxprofiler, or adding some code (e.g. `sed`) to remove the additional suffix.
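For the second option, the clean-up could look roughly like the sketch below. It assumes the database name is known when the post-processing runs; the helpers `strip_taxprofiler_suffix` and `clean_header` and the example header are hypothetical and not part of the nf-core module.

```python
def strip_taxprofiler_suffix(name: str, db_name: str) -> str:
    """Turn '<sample>_<db>.metaphlan' back into '<sample>'.

    Hypothetical helper: assumes the database name is known; names that
    don't end with the taxprofiler suffix are returned unchanged.
    """
    suffix = f"_{db_name}.metaphlan"
    if name.endswith(suffix):
        return name[: -len(suffix)]
    return name


def clean_header(columns: list[str], db_name: str) -> list[str]:
    """Apply the clean-up to every column name of the merged table header."""
    return [strip_taxprofiler_suffix(col, db_name) for col in columns]


# Illustrative header only; the real merged table may contain other columns.
header = ["clade_name", "sample1_db1.metaphlan", "sample2_db1.metaphlan"]
print(clean_header(header, "db1"))  # ['clade_name', 'sample1', 'sample2']
```

Matching only an anchored, literal suffix avoids touching sample names that happen to contain the database name elsewhere; a naive `sed` substitution would also need the `.` escaped, since in a regular expression it matches any character.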
Yes, I have. However, taxpasta currently seems to fail on MetaPhlAn output, at least for the tables I tried it on. This is related to the bug in taxprofiler/taxpasta#140.
`merge_metaphlan_tables.py` does a simple join on the clade names and ignores the NCBI tax IDs. For now I am happier with this kind of merging than with the one discussed in the issue above, in which all taxa without a tax ID would be summed into a new category called "unclassified".
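The clade-name join can be illustrated without pandas. This is a sketch of an outer join keyed only on clade names, assuming clades missing from a sample are reported as 0; it is not the script's actual code, and `merge_profiles` and the example abundances are made up.

```python
def merge_profiles(profiles: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Outer-join per-sample profiles on clade name.

    `profiles` maps sample name -> {clade_name: relative_abundance}.
    NCBI tax IDs are not used as keys. Clades absent from a sample
    get 0.0 (an assumption made for this sketch).
    """
    samples = list(profiles)
    clades = sorted({clade for profile in profiles.values() for clade in profile})
    return {clade: {s: profiles[s].get(clade, 0.0) for s in samples} for clade in clades}


profiles = {
    "sample1": {"k__Bacteria": 100.0, "k__Bacteria|p__Firmicutes": 60.0},
    "sample2": {"k__Bacteria": 100.0, "k__Bacteria|p__Bacteroidetes": 40.0},
}
merged = merge_profiles(profiles)
print(merged["k__Bacteria|p__Firmicutes"])  # {'sample1': 60.0, 'sample2': 0.0}
```

Because the key is the clade name alone, every clade keeps its own row and nothing is pooled into an "unclassified" category.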