These are my personal notes from the specialization IBM - Data Engineering and Machine Learning Using Spark. I have switched from OneNote to Jupyter notebooks, hosted on Google Colab, for my note taking, mainly because I realized it is a better format for technical notes. I can include code block examples and run them, which turns the notes into a form of active rather than passive learning. I have also found that it is much easier to get the formatting I want with Markdown than with the autoformatting in Microsoft Office products, which continues to be a mystery to me. Though this class did use Python, most of the code in the notebook cannot be run as-is, because it is there to illustrate examples and is not complete code. This is the first of the IBM Data Engineering classes for which I took notes in a Jupyter notebook. The rest are all in OneNote.
My overall impression of the class is that it was fairly poor. This is also reflected in its rating on Coursera, which is 3.8 out of 5 stars, one of the lowest I have seen there. It is the 12th of 13 classes in the IBM Data Engineering Specialization, and the others are generally much better rated. The course is plagued by too little information and has no real practice labs before diving straight into a graded lab, part of which was impossible to complete due to a bug. The class seems to have been created in a rush, but it is required for anyone who wants to receive the IBM Data Engineering Certificate. The final graded assignment required a lot of guesswork on my end, as the course material did not cover large parts of the final lab. Finally, IBM tends to suggest its own proprietary technology to solve IT issues in its classes, and this particular class is no exception, despite the fact that most of its solutions would not be considered the best among the range of options to choose from. It also often turns the exam answers needed to pass into semi-marketing material ("What are the benefits of IBM Cloudant?"). I decided to take the IBM classes on Coursera because, at 13 courses total, they seemed like a good way to see if there was anything I had missed and to reinforce my older knowledge of Data Engineering. I would not say it was useless in that regard, but in the future I believe I will take classes from Amazon and Google instead, which I felt were of better quality (and are cheaper as well).
In these notes, I tried to draw from my own personal and professional experiences to relate to the subject.
The class is estimated to take about 7 hours to complete, making it one of the shorter classes in the IBM Data Engineering Specialization. It took me much longer, mainly because of the troubleshooting I had to do to pass the class.
I have made some personal annotations for myself through the comments feature of Google Colab, which I do not believe will transfer to GitHub. Others are welcome to use these notes as they see fit. None of the code snippets will actually run, since they are generally incomplete and are meant only to illustrate examples.