Titanic-Dataset-Tutorial/todo.txt at master · MakeSchool-Tutorials/Titanic-Dataset-Tutorial · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102


Concepts

inquery-based tutorials

mean, median, mode

variance & covariance

percent

Correlation, line of best fit prediction

DONE Visualization with matplotlib

PDFs

-no partial derivatives


Standard deviation

Linear Regression


Probability—https://www.dataquest.io/blog/basic-statistics-in-python-probability/


Let's use our little function to take a bigger average.

1. Go to the US Census website [Educational Attainment page](https://www.census.gov/data/tables/2019/demo/educational-attainment/cps-detailed-tables.html)
2. Click on "Both Sexes" and download that spreadsheet
3. Look at the row labeled "Total"
4. Write a function that finds out the average educational attainment of an American in 2019

This last step is a bit of a leap so let's break it down.

1. Let's assign a coefficient to each level of attainment (1,2,3,4, etc)
1. Make a new list where each element is the number of people who achieved that level of education times the level coefficient.
1. Then take the average of that new list.
1. Then divide that average by the total people
1. We should, in the end have a coefficient number that stands for the average level of educational attainment.


```py

data = np.array([8603,	13372,	62259,	34690,	22738,	49937,	22214,	3136,	4529])

def mean_edu_attainment(dataset, total):
  # create list weighted to index of position
  weighted_list = []

  for n, i in dataset:
    weighted_list.push(n*i)

  mean = compute_mean(weighted_list)

  # calculate total
  total = 0
  for n in dataset
    total += n

  mean_attainment = mean/total

  return mean_attainment

mean_edu_attainment(data, total)

```

Linear Regression
  - Remove outliers


Mean and standard divination, what was the chance someone got >80

Data detective—tempurature of two cities, difference and sum.

Nan values

Titanic stowaways, try to determine their ages, what class they hid in etc.

Excellent article on types of bar charts
https://towardsdatascience.com/histograms-and-density-plots-in-python-f6bda88f5ac0

Amazing article on this Dataset
https://towardsdatascience.com/predicting-the-survival-of-titanic-passengers-30870ccc7e8


.astype('category').cat.codes)


tips dataset—https://github.com/mwaskom/seaborn-data/blob/master/tips.csv
Visualizing tips dataset—https://seaborn.pydata.org/tutorial/relational.html


Normal distribution and z-score
https://www.youtube.com/watch?v=4Fta6KQ1QHQ