Pandas is my starting point for new projects dealing with data, but it needs "High-Speed Intuitive Features": pandas has many great features, yet I keep having to rebuild the same tools for each job.
Pandas needs an upgrade:
- A datatype 'detection' function so it can operate more independently
- A tool for importing, extracting, transforming, and loading cleaned data into a database
- A database selection function
- A long-term memory pulling from GitHub
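The first item, a dtype "detection" helper, might look something like this. `detect_dtypes` is a hypothetical name, built from pandas' own `to_numeric`, `to_datetime`, and `convert_dtypes` (a sketch, not a finished design):

```python
import pandas as pd

def detect_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Best-effort dtype detection (hypothetical helper, not part of pandas)."""
    out = df.copy()
    for col in out.columns:
        if out[col].dtype == object:
            # Try numeric first, then datetime; otherwise leave the strings alone.
            numeric = pd.to_numeric(out[col], errors="coerce")
            if numeric.notna().all():
                out[col] = numeric
                continue
            parsed = pd.to_datetime(out[col], errors="coerce")
            if parsed.notna().all():
                out[col] = parsed
    # Let pandas pick nullable extension dtypes for whatever remains.
    return out.convert_dtypes()

raw = pd.DataFrame({"n": ["1", "2"], "d": ["2021-01-01", "2021-02-01"], "s": ["a", "b"]})
clean = detect_dtypes(raw)
```

The idea is that a DataFrame read as all-object columns comes back with numeric, datetime, and string dtypes without per-column hand-holding.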
I took inspiration from Guido van Rossum's approach to Python, Joel Grus's approach to Jupyter notebooks, the 12-factor app, and Joel Spolsky.
- Begin with the end in mind, so start with a template
- Establish your .env variables
- Create .gitignore file
- Add your name
- What is your database preference?
- Where are we accessing data from?
- Decide which statistical tests to run
- Enable feature selection for ML training
- What else?
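The questions above could be captured in a `.env` template; these variable names are illustrative, not fixed:

```shell
# .env -- illustrative variable names; keep this file out of git via .gitignore
AUTHOR_NAME="Your Name"
DATABASE_URL="postgresql://user:password@localhost:5432/projectdb"
DATA_DIRTY_PATH="./data_dirty"
STAT_TESTS="shapiro,ttest"
```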
There will be bugs; we will save those to a SQLite DB stored in the repo.
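A minimal sketch of that bug log, assuming a simple `bugs` table (the schema and the `log_bug` helper name are my own):

```python
import sqlite3

def log_bug(db_path: str, title: str, detail: str) -> None:
    """Append a bug report to the repo's SQLite file (schema is illustrative)."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS bugs (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   title TEXT NOT NULL,
                   detail TEXT,
                   created_at TEXT DEFAULT CURRENT_TIMESTAMP
               )"""
        )
        conn.execute("INSERT INTO bugs (title, detail) VALUES (?, ?)", (title, detail))

# e.g. log_bug("data/my_lite_store.db", "AttributeError in PG_fetch.py",
#              "module 'cursor' has no attribute 'execute'")
```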
Known bugs in PG_fetch.py:
- `compare_methods_to_insert_bulk_data` gives `AttributeError: module 'cursor' has no attribute 'execute'`
- `my_lite_store.db` is duplicated in code and belongs in /data
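That AttributeError usually means the code calls `.execute` on a module-level `cursor` name instead of a cursor created from the live connection. The shape of the fix, shown here with stdlib sqlite3 since its cursor API is the same (in PG_fetch.py the connection would come from `psycopg2.connect`, with `%s` placeholders instead of `?`):

```python
import sqlite3

def bulk_insert(conn, rows):
    """Ask the connection for a cursor -- never use a module-level `cursor` name,
    which is what triggers AttributeError: module 'cursor' has no attribute 'execute'."""
    cur = conn.cursor()  # the cursor comes from the connection object
    cur.execute("CREATE TABLE IF NOT EXISTS t (a INTEGER, b TEXT)")
    cur.executemany("INSERT INTO t (a, b) VALUES (?, ?)", rows)
    conn.commit()
    cur.execute("SELECT COUNT(*) FROM t")
    return cur.fetchone()[0]

conn = sqlite3.connect(":memory:")
inserted = bulk_insert(conn, [(1, "x"), (2, "y")])
```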
PRs are accepted:
- Complete the GitHub workflows
- Label 3 bugs and fix one
- Suggest 3 features and submit one as a PR
- SOURCE: [Muhd-Shahid](https://github.com/Muhd-Shahid/Learn-Python-Data-Access/blob/main/PostgreSQL/Part%204%20%20Comparison%20of%20Methods%20for%20Importing%20bulk%20CSV%20data%20Into%20PostgreSQL%20Using%20Python.ipynb)
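For reference, the fastest method in comparisons like the one linked above is typically PostgreSQL's COPY. A sketch, assuming a psycopg2 connection, a CSV with a header row, and a table that already exists (`copy_csv` is my own name):

```python
def copy_csv(conn, table: str, csv_path: str) -> None:
    """Bulk-load a CSV with COPY via psycopg2's copy_expert (sketch; needs a live
    PostgreSQL connection, so it is not exercised here)."""
    with open(csv_path) as f, conn.cursor() as cur:
        cur.copy_expert(f"COPY {table} FROM STDIN WITH CSV HEADER", f)
    conn.commit()
```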
In no particular order:
- .env loaded from # TODO https://pypi.org/project/python-dotenv/
- .gitignore setup for the repo
- data_dirty/ PATH setup from .env
- data_dirty/ added somehow to .bash_profile
- PostgreSQL database setup via # TODO https://medium.com/analytics-vidhya/part-4-pandas-dataframe-to-postgresql-using-python-8ffdb0323c09
- maybe do pandas logging # TODO https://towardsdatascience.com/introducing-pandas-log-3240a5e57e21
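For the .env items, python-dotenv's `load_dotenv()` does the real work; here is a minimal stdlib stand-in just to show the idea (`load_env` is my own helper, not the library's API):

```python
import os

def load_env(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines and export them -- a toy version of dotenv's load_dotenv."""
    loaded = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            key, value = key.strip(), value.strip().strip('"')
            loaded[key] = value
            os.environ.setdefault(key, value)
    return loaded
```

In the project itself, prefer `from dotenv import load_dotenv; load_dotenv()` so quoting, interpolation, and edge cases are handled properly.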