- generate_dendrogram.py: finished implementing the dendrogram subcommand
- generate_dendrogram.py: Began to create the logic to construct dendrograms from the DRIVE output files
- network_algorithm.py: moved relevant code for the network algorithm into its own folder
- dendrogram: Added the ability to generate a dendrogram from the DRIVE networks
- dendrogram_subcommand.py: fixed a type annotation to be compatible with Python 3.9
- testing-and-parser/init.py: added an integration test for the dendrogram functionality and fixed a bug where isort had broken the imports
- pvalues.py: updated the size of the network to reflect the removal of the proband
- pyproject.toml: updated the logging module dependency so that it is pulled from PyPI
- logging: added a log message to report the DRIVE version
- cluster.py: Changed the minimum network size to be more intuitive and to allow reporting of 2-person networks
- cluster.py: removed the print statements that were clogging the screen
- filter.py: removed the chromosome check
- filter.py: The original fix didn't work because the attribute is a named tuple and therefore immutable. The value doesn't need to be updated anyway, since the downstream filter that used it was redundant, so only the check and the logging message were updated
- filter.py-and-pyproject.toml: Adjusted the code so that chromosomes can also use the chr prefix
- filter.py: Fixed an issue where build 38 data could not be read in when the chromosome had the chr prefix
- Dockerfile: Fixed the Dockerfile to use the debian:12.6 image
- DRIVE now writes the exclusion IDs to an output column
- case_file_parser.py: Made sure fillna inserts an integer rather than a string, which had been affecting the exclusion counts
- case_file_parser.py: Fixed an issue where the dictionary was built with all keys even when only one phenotype was requested
- filter.py: added a message that tells how many individuals from the phenotype file were actually found in the IBD files
- drive.py: remove unnecessary logging statements
- case_file_parser.py: removed print statement that was used for debugging
- drive.py: added a flag to show the version of DRIVE being used
- data_container.py: adjusted the type hint to reflect the fact that the grid list is actually a set and not a list
- filter.py: Changed the message describing how many pairs were found
- fixed issue where no segments were being found
- case_file_parser.py: Added a way to specify a specific phecode column
- case_file_parser.py: Changed the parser to use xopen for better performance
- fixed styling errors and a type annotation
- case_file_parser.py: changed to using pandas in the parser because it is more efficient
- removed the xopen dependency because it may have been slower than the built-in open
- pyproject.toml: updated the version number
- case_file_parser.py: added ability to specify one phenotype column in a matrix
- ClusterHandler.redo_clustering: fixed a bug that occurred when no individuals shared pairwise IBD segments and the graph was improperly formed
- IbdFilter: forced the ID columns to be read in as strings when reading IBD file chunks
- case_file_parser.py: Fixed a bug where the case file parser was not identifying the appropriate index if the user only wished to find a specific haplotype
- drive.py: fixed a typo in the recluster flag help message
- switched from typer to argparse
- case_file_parser.py: fixed a bug where the wrong method was called on a set
- linted the filter.py file
- Made sure that the program supports build 38 IBD files
- cluster.py: fixed a bug that caused the IDs to mismatch
- pvalues.py: added an output column that describes which cases are in the network for a specific phenotype
- cluster: fixed a bug in the clustering where the incorrect ids were being used to cluster
- config.json: corrected the module paths so that they import from drive first
- pvalues.py: updated the factory import to use the module
- pvalues.py: when the network members and the cohort cases do not overlap, N/A is now written instead of a blank, to be consistent with prior design decisions
- callbacks.py: moved config.json into drive src folder so that it will be included in the pypi build (hopefully).
- filter.py: removed unnecessary print statements
- Added an overlap filter
- segment-filtering: started adding functionality to filter.py for filtering overlapping segments
- segment-filtering: Added an option to keep segments that overlap the target loci, not just those that fully contain them
- generate_indices.py: fixed the index for the cM column in ILASH model to be correct
- network_writer: corrected how the output file name is built when it contains a period
- pvalue: fixed a bug in pvalue.py where the phenotype percentage was not calculated properly
- fixed bug where the loglevel was not being reset to the original one in the record_inputs method
- fixed the handling of cases where there are no controls
- Fixed a bug where the header line was appending a new line incorrectly
- logged the situation where ibd_pd is empty and the program terminates early
- Added a check for whether the dataframe is empty after filtering for individuals; if so, the loop continues to the next iteration
- logging: Added more informative debug logging to the case_file_parser.py
- switched the lists in the phenotype dictionary to sets
- adding more informative logging statements
- made sure that the IBD file matches the chromosome of the target
- updated the exclusion criteria so it includes blank spaces
- updated set operations
- Fixed typos that were messing with the pvalue calculation
- logging: Fixed the record_inputs method of the logger so that now it records the inputs and writes it to a file
- logging: Changed the code to adapt to the new OOP logger style
- Removed duplicate values from the ibd_vs attribute of the IbdFilter
- Finished refactoring drive
- refactored filter to remove an unnecessary function and to determine all individuals in the cohort
- plugins: Added a config file for the plugin system to specify what plugins exist
- plugins: Set up the plugin architecture
- Updated poetry lock file for dependencies
- clustering: Finished refactoring the clustering module
- Created models for the Data class and the network class
- removed the accidental file
- moved callbacks into the utilities module
- Added the load_phenotype_descriptions to the init.py
- Added a function called load_phenotype_descriptions that will read in the phecode descriptions file if the user passes it
- Added attribute to keep a list of all individuals in cohort
- phenotyping: adjusted the phenotype parser to support multiple phenotypes
- phenotyping: refactored how phenotype files are read in to determine case, control, or exclusion individuals
- fixed merge conflict in .gitmodules
- merging the proper changes
- clustering: finished first refactor of the clustering algorithm
- Added Jupyter Notebook to the test dependency group.
- igraph-clustering: Fixed clustering bug
- logging: changed log levels to be 1 or 2 to reflect typer's use of -v or -vv
- started adding in the redo clustering steps to the code
- Adjusted the imports
- Made sure to reset the index so that the dataframe is the same as the original code
- PhenotypeFileParser: added functionality to the parser so it can determine cases/controls/exclusions
- gitignore: Added the /tests/test_input directory to version control
- logging: Changed the logging in the filter.py file
- name-change: Changed the filter module to filters to avoid name collision with filter function
- Added a directory for utility functions.
- clustering: Broke the Networks class into two classes
- clustering: Finished the initial cluster step and fixed the missing attribute in igraph 0.10.4
- Created the clustering class and wrote functionality for the initial clustering step
- Started refactoring the clustering and reorganizing the models
- type-hints: used type hints from the typing module
- Error-Handling: updated the Filter._remove_dups method to raise a KeyError if columns aren't present
- logging: added a logging submodule
- logging: Added a submodule for logging
- filtering-ibd-file: Restructured hung-hsin's filtering into a module
- logging: added logging to the program
- log: removed the messed up log folder
- removed the log from the .gitmodules
- debugging: added a str method to indices classes
- DRIVE.py: moved DRIVE.py to drive.py and added a pre-commit hook
- global-variables: removed more global variables
- global-variable-removal: removed some global variables that were constants and converted them to user options in the commandline
- drive/generate_indices.py: remove branching in create_indices
- file-indices: Switched to a strategy pattern
- split_target_string: Added a check to the split_target_string function
- drive/DRIVE.py: refactored how the target string is split and how the indices are created
- drive/callbacks.py: created a callback function to make sure that the input IBD file exists
- DRIVE.py: Refactored CLI to use typer-cli
- profiles/python_3_8_drive_dcm_hh.prof: added a profile for Python 3.8
- pyproject.toml: Added static type checkers and linters
- profiles: added directory to keep track of profiles