diff --git "a/Day01-20/02.\347\254\254\344\270\200\344\270\252Python\347\250\213\345\272\217.md" "b/Day01-20/02.\347\254\254\344\270\200\344\270\252Python\347\250\213\345\272\217.md" index c1458234a..c2fade7c9 100755 --- "a/Day01-20/02.\347\254\254\344\270\200\344\270\252Python\347\250\213\345\272\217.md" +++ "b/Day01-20/02.\347\254\254\344\270\200\344\270\252Python\347\250\213\345\272\217.md" @@ -96,7 +96,7 @@ Visual Studio Code 是由微软开发能够在 Windows、 Linux 和 macOS 等操 按照行业惯例,我们学习任何一门编程语言写的第一个程序都是输出`hello, world`,因为这段代码是伟大的丹尼斯·里奇(C 语言之父,和肯·汤普森一起开发了 Unix 操作系统)和布莱恩·柯尼汉(awk 语言的发明者)在他们的不朽著作《*The C Programming Language*》中写的第一段代码,下面是对应的 Python 语言的版本。 -```Python +```python print('hello, world') ``` @@ -104,7 +104,7 @@ print('hello, world') 上面的代码只有一个语句,在这个语句中,我们用到了一个名为`print`的函数,它可以帮助我们输出指定的内容;`print`函数圆括号中的`'hello, world'`是一个字符串,它代表了一段文本内容;在 Python 语言中,我们可以用单引号或双引号来表示一个字符串。不同于 C、C++ 或 Java 这样的编程语言,Python 代码中的语句不需要用分号来表示结束,也就是说,如果我们想再写一条语句,只需要回车换行即可,代码如下所示。此外,Python 代码也不需要通过编写名为`main`的入口函数来使其运行,提供入口函数是编写可执行的 C、C++ 或 Java 代码必须要做的事情,这一点很多程序员都不陌生,但是在 Python 语言中它并不是必要的。 -```Python +```python print('hello, world') print('goodbye, world') ``` @@ -134,7 +134,7 @@ Python 中有两种形式的注释: 1. 单行注释:以`#`和空格开头,可以注释掉从`#`开始后面一整行的内容。 2. 多行注释:三个引号(通常用双引号)开头,三个引号结尾,通常用于添加多行说明性内容。 -```Python +```python """ 第一个Python程序 - hello, world diff --git "a/Day01-20/03.Python\350\257\255\350\250\200\344\270\255\347\232\204\345\217\230\351\207\217.md" "b/Day01-20/03.Python\350\257\255\350\250\200\344\270\255\347\232\204\345\217\230\351\207\217.md" index 977f446a5..924516ff6 100755 --- "a/Day01-20/03.Python\350\257\255\350\250\200\344\270\255\347\232\204\345\217\230\351\207\217.md" +++ "b/Day01-20/03.Python\350\257\255\350\250\200\344\270\255\347\232\204\345\217\230\351\207\217.md" @@ -16,7 +16,7 @@ 1. 
整型(`int`):Python 中可以处理任意大小的整数,而且支持二进制(如`0b100`,换算成十进制是4)、八进制(如`0o100`,换算成十进制是64)、十进制(`100`)和十六进制(`0x100`,换算成十进制是256)的表示法。运行下面的代码,看看会输出什么。 - ```Python + ```python print(0b100) # 二进制整数 print(0o100) # 八进制整数 print(100) # 十进制整数 @@ -25,7 +25,7 @@ 2. 浮点型(`float`):浮点数也就是小数,之所以称为浮点数,是因为按照科学记数法表示时,一个浮点数的小数点位置是可变的,浮点数除了数学写法(如`123.456`)之外还支持科学计数法(如`1.23456e2`,表示$\small{1.23456 \times 10^{2}}$)。运行下面的代码,看看会输出什么。 - ```Python + ```python print(123.456) # 数学写法 print(1.23456e2) # 科学计数法 ``` @@ -53,7 +53,7 @@ 下面通过例子来说明变量的类型和变量的使用。 -```Python +```python """ 使用变量保存数据并进行加减乘除运算 @@ -71,7 +71,7 @@ print(a / b) # 3.75 在 Python 中可以使用`type`函数对变量的类型进行检查。程序设计中函数的概念跟数学上函数的概念非常类似,数学上的函数相信大家并不陌生,它包括了函数名、自变量和因变量。如果暂时不理解函数这个概念也不要紧,我们会在后续的内容中专门讲解函数的定义和使用。 -```Python +```python """ 使用type函数检查变量的类型 @@ -98,7 +98,7 @@ print(type(d)) # 下面的例子为大家演示了Python中类型转换的操作。 -```Python +```python """ 变量的类型转换操作 diff --git "a/Day01-20/07.\345\210\206\346\224\257\345\222\214\345\276\252\347\216\257\347\273\223\346\236\204\345\256\236\346\210\230.md" "b/Day01-20/07.\345\210\206\346\224\257\345\222\214\345\276\252\347\216\257\347\273\223\346\236\204\345\256\236\346\210\230.md" index 8c1808e20..5c06f09b3 100755 --- "a/Day01-20/07.\345\210\206\346\224\257\345\222\214\345\276\252\347\216\257\347\273\223\346\236\204\345\256\236\346\210\230.md" +++ "b/Day01-20/07.\345\210\206\346\224\257\345\222\214\345\276\252\347\216\257\347\273\223\346\236\204\345\256\236\346\210\230.md" @@ -6,7 +6,7 @@ > **说明**:素数指的是只能被 1 和自身整除的正整数(不包括 1),之前我们写过判断素数的代码,这里相当于是一个升级版本。 -```Python +```python """ 输出100以内的素数 @@ -29,7 +29,7 @@ for num in range(2, 100): > **说明**:斐波那契数列(Fibonacci sequence),通常也被称作黄金分割数列,是意大利数学家莱昂纳多·斐波那契(Leonardoda Fibonacci)在《计算之书》中研究理想假设条件下兔子成长率问题而引入的数列,因此这个数列也常被戏称为“兔子数列”。斐波那契数列的特点是数列的前两个数都是 1,从第三个数开始,每个数都是它前面两个数的和。按照这个规律,斐波那契数列的前 10 个数是:`1, 1, 2, 3, 5, 8, 13, 21, 34, 55`。斐波那契数列在现代物理、准晶体结构、化学等领域都有直接的应用。 -```Python +```python """ 输出斐波那契数列中的前20个数 @@ -125,7 +125,7 @@ for x in range(0, 21): > 
**说明**:CRAPS又称花旗骰,是美国拉斯维加斯非常受欢迎的一种的桌上赌博游戏。该游戏使用两粒骰子,玩家通过摇两粒骰子获得点数进行游戏。简化后的规则是:玩家第一次摇骰子如果摇出了 7 点或 11 点,玩家胜;玩家第一次如果摇出 2 点、3 点或 12 点,庄家胜;玩家如果摇出其他点数则游戏继续,玩家重新摇骰子,如果玩家摇出了 7 点,庄家胜;如果玩家摇出了第一次摇的点数,玩家胜;其他点数玩家继续摇骰子,直到分出胜负。为了增加代码的趣味性,我们设定游戏开始时玩家有 1000 元的赌注,每局游戏开始之前,玩家先下注,如果玩家获胜就可以获得对应下注金额的奖励,如果庄家获胜,玩家就会输掉自己下注的金额。游戏结束的条件是玩家破产(输光所有的赌注)。 -```Python +```python """ Craps赌博游戏 diff --git "a/Day01-20/09.\345\270\270\347\224\250\346\225\260\346\215\256\347\273\223\346\236\204\344\271\213\345\210\227\350\241\250-2.md" "b/Day01-20/09.\345\270\270\347\224\250\346\225\260\346\215\256\347\273\223\346\236\204\344\271\213\345\210\227\350\241\250-2.md" index 59476a0a0..c28c4cc8a 100644 --- "a/Day01-20/09.\345\270\270\347\224\250\346\225\260\346\215\256\347\273\223\346\236\204\344\271\213\345\210\227\350\241\250-2.md" +++ "b/Day01-20/09.\345\270\270\347\224\250\346\225\260\346\215\256\347\273\223\346\236\204\344\271\213\345\210\227\350\241\250-2.md" @@ -115,7 +115,7 @@ print(nums2) 场景三: 有一个整数列表`nums1`,创建一个新的列表`nums2`,将`nums1`中大于`50`的元素放到`nums2`中。 -```Python +```python nums1 = [35, 12, 97, 64, 55] nums2 = [] for num in nums1: @@ -126,7 +126,7 @@ print(nums2) 使用列表生成式做同样的事情,代码如下所示。 -```Python +```python nums1 = [35, 12, 97, 64, 55] nums2 = [num for num in nums1 if num > 50] print(nums2) @@ -148,7 +148,7 @@ print(scores[0][1]) 如果想通过键盘输入的方式来录入5个学生3门课程的成绩并保存在列表中,可以使用如下所示的代码。 -```Python +```python scores = [] for _ in range(5): temp = [] diff --git "a/Day01-20/11.\345\270\270\347\224\250\346\225\260\346\215\256\347\273\223\346\236\204\344\271\213\345\255\227\347\254\246\344\270\262.md" "b/Day01-20/11.\345\270\270\347\224\250\346\225\260\346\215\256\347\273\223\346\236\204\344\271\213\345\255\227\347\254\246\344\270\262.md" index 2c5c38639..d71877425 100755 --- "a/Day01-20/11.\345\270\270\347\224\250\346\225\260\346\215\256\347\273\223\346\236\204\344\271\213\345\255\227\347\254\246\344\270\262.md" +++ 
"b/Day01-20/11.\345\270\270\347\224\250\346\225\260\346\215\256\347\273\223\346\236\204\344\271\213\345\255\227\347\254\246\344\270\262.md" @@ -130,7 +130,7 @@ print(len('goodbye, world')) # 14 #### 索引和切片 -字符串的索引和切片操作跟列表几乎区别,因为字符串也是一种有序序列,可以通过正向或反向的整数索引访问其中的元素。但是有一点需要注意,因为**字符串是不可变类型**,所以**不能通过索引运算修改字符串中的字符**。 +字符串的索引和切片操作跟列表、元组几乎没有区别,因为字符串也是一种有序序列,可以通过正向或反向的整数索引访问其中的元素。但是有一点需要注意,因为**字符串是不可变类型**,所以**不能通过索引运算修改字符串中的字符**。 ```python s = 'abc123456' diff --git "a/Day01-20/12.\345\270\270\347\224\250\346\225\260\346\215\256\347\273\223\346\236\204\344\271\213\351\233\206\345\220\210.md" "b/Day01-20/12.\345\270\270\347\224\250\346\225\260\346\215\256\347\273\223\346\236\204\344\271\213\351\233\206\345\220\210.md" index 967b5ef77..a4b745806 100755 --- "a/Day01-20/12.\345\270\270\347\224\250\346\225\260\346\215\256\347\273\223\346\236\204\344\271\213\351\233\206\345\220\210.md" +++ "b/Day01-20/12.\345\270\270\347\224\250\346\225\260\346\215\256\347\273\223\346\236\204\344\271\213\351\233\206\345\220\210.md" @@ -1,6 +1,6 @@ ## 常用数据结构之集合 -在学习了列表和元组之后,我们再来学习一种容器型的数据类型,它的名字叫集合(set)。说到集合这个词大家一定不会陌生,在数学课本上就有这个概念。如果我们**把一定范围的、确定的、可以区别的事物当作一个整体来看待**,那么这个整体就是集合,集合中的各个事物称为集合的**元素**。通常,集合需要满足以下特性: +在学习了列表和元组之后,我们再来学习一种容器型的数据类型,它的名字叫集合(set)。说到集合这个词大家一定不会陌生,在数学课本上就有这个概念。如果我们**把一定范围的、确定的、可以区别的事物当作一个整体来看待**,那么这个整体就是集合,集合中的各个事物称为集合的**元素**。通常,集合需要满足以下要求: 1. **无序性**:一个集合中,每个元素的地位都是相同的,元素之间是无序的。 2. 
**互异性**:一个集合中,任何两个元素都是不相同的,即元素在集合中只能出现一次。 @@ -8,7 +8,7 @@ Python 程序中的集合跟数学上的集合没有什么本质区别,需要强调的是上面所说的无序性和互异性。无序性说明集合中的元素并不像列中的元素那样存在某种次序,可以通过索引运算就能访问任意元素,**集合并不支持索引运算**。另外,集合的互异性决定了**集合中不能有重复元素**,这一点也是集合区别于列表的地方,我们无法将重复的元素添加到一个集合中。集合类型必然是支持`in`和`not in`成员运算的,这样就可以确定一个元素是否属于集合,也就是上面所说的集合的确定性。**集合的成员运算在性能上要优于列表的成员运算**,这是集合的底层存储特性决定的,此处我们暂时不做讨论,大家记住这个结论即可。 -> **说明**:集合底层使用了哈希存储(散列存储),对哈希存储感兴趣的读者可以看看维基百科上“散列表”这个词条。 +> **说明**:集合底层使用了哈希存储(散列存储),对哈希存储不了解的读者可以先看看“Hello 算法”网站对[哈希表](https://www.hello-algo.com/chapter_hashing/)的讲解,感谢作者的开源精神。 ### 创建集合 @@ -31,7 +31,9 @@ set5 = {num for num in range(1, 20) if num % 3 == 0 or num % 7 == 0} print(set5) ``` -需要提醒大家,集合中的元素必须是`hashable`类型,使用哈希存储的容器都会对元素提出这一要求。所谓`hashable`类型指的是能够计算出哈希码的数据类型,通常不可变类型都是`hashable`类型,如整数(`int`)、浮点小数(`float`)、布尔值(`bool`)、字符串(`str`)、元组(`tuple`)等。可变类型都不是`hashable`类型,因为可变类型无法计算出确定的哈希码,所以它们不能放到集合中。例如:我们不能将列表作为集合中的元素;同理,由于集合本身也是可变类型,所以集合也不能作为集合中的元素。我们可以创建出嵌套的列表,但是我们不能创建出嵌套的集合,这一点在使用集合的时候一定要引起注意。 +需要提醒大家,集合中的元素必须是`hashable`类型,所谓`hashable`类型指的是能够计算出哈希码的数据类型,通常不可变类型都是`hashable`类型,如整数(`int`)、浮点小数(`float`)、布尔值(`bool`)、字符串(`str`)、元组(`tuple`)等。可变类型都不是`hashable`类型,因为可变类型无法计算出确定的哈希码,所以它们不能放到集合中。例如:我们不能将列表作为集合中的元素;同理,由于集合本身也是可变类型,所以集合也不能作为集合中的元素。我们可以创建出嵌套列表(列表的元素也是列表),但是我们不能创建出嵌套的集合,这一点在使用集合的时候一定要引起注意。 + +> **温馨提示**:如果不理解上面提到的哈希码、哈希存储这些概念,可以先放放,因为它并不影响你继续学习和使用 Python 语言。当然,如果是计算机专业的小伙伴,不理解哈希存储是很难被原谅的,要赶紧去补课了。 ### 元素的遍历 @@ -47,7 +49,7 @@ for elem in set1: ### 集合的运算 -Python 为集合类型提供了非常丰富的运算符,主要包括:成员运算、交集运算、并集运算、差集运算、比较运算(相等性、子集、超集)等。 +Python 为集合类型提供了非常丰富的运算,主要包括:成员运算、交集运算、并集运算、差集运算、比较运算(相等性、子集、超集)等。 #### 成员运算 @@ -130,7 +132,7 @@ print(set2.issuperset(set1)) # True ### 集合的方法 -刚才我们说过,Python 中的集合是可变类型,我们可以通过集合类型的方法向集合添加元素或从集合中删除元素。 +刚才我们说过,Python 中的集合是可变类型,我们可以通过集合的方法向集合添加元素或从集合中删除元素。 ```python set1 = {1, 10, 100} @@ -151,7 +153,7 @@ set1.clear() print(set1) # set() ``` -> 
**说明**:删除集合元素的`remove`方法在元素不存在时会引发`KeyError`错误,所以上面的代码中我们先通过成员运算判断元素是否在集合中。集合类型还有一个`pop`方法可以从集合中随机删除一个元素,该方法在删除元素的同时会获得被删除的元素,而`remove`和`discard`方法仅仅是删除元素,不会获得被删除的元素。 +> **说明**:删除元素的`remove`方法在元素不存在时会引发`KeyError`错误,所以上面的代码中我们先通过成员运算判断元素是否在集合中。集合类型还有一个`pop`方法可以从集合中删除任意一个元素,该方法在删除元素的同时会返回(获得)被删除的元素,而`remove`和`discard`方法仅仅是删除元素,不会返回(获得)被删除的元素。 集合类型还有一个名为`isdisjoint`的方法可以判断两个集合有没有相同的元素,如果没有相同元素,该方法返回`True`,否则该方法返回`False`,代码如下所示。 diff --git "a/Day01-20/13.\345\270\270\347\224\250\346\225\260\346\215\256\347\273\223\346\236\204\344\271\213\345\255\227\345\205\270.md" "b/Day01-20/13.\345\270\270\347\224\250\346\225\260\346\215\256\347\273\223\346\236\204\344\271\213\345\255\227\345\205\270.md" index e548323a9..7d5849f67 100755 --- "a/Day01-20/13.\345\270\270\347\224\250\346\225\260\346\215\256\347\273\223\346\236\204\344\271\213\345\255\227\345\205\270.md" +++ "b/Day01-20/13.\345\270\270\347\224\250\346\225\260\346\215\256\347\273\223\346\236\204\344\271\213\345\255\227\345\205\270.md" @@ -1,8 +1,8 @@ ## 常用数据结构之字典 -迄今为止,我们已经为大家介绍了Python中的三种容器型数据类型(列表、元组、集合),但是这些数据类型仍然不足以帮助我们解决所有的问题。例如,我们需要一个变量来保存一个人的多项信息,包括:姓名、年龄、身高、体重、家庭住址、本人手机号、紧急联系人手机号,此时你会发现,我们之前学过的列表、元组和集合类型都不够好使。 +迄今为止,我们已经为大家介绍了 Python 中的三种容器型数据类型(列表、元组、集合),但是这些数据类型仍然不足以帮助我们解决所有的问题。例如,我们需要一个变量来保存一个人的多项信息,包括:姓名、年龄、身高、体重、家庭住址、本人手机号、紧急联系人手机号,此时你会发现,我们之前学过的列表、元组和集合类型都不够好使。 -```Python +```python person1 = ['王大锤', 55, 168, 60, '成都市武侯区科华北路62号1栋101', '13122334455', '13800998877'] person2 = ('王大锤', 55, 168, 60, '成都市武侯区科华北路62号1栋101', '13122334455', '13800998877') person3 = {'王大锤', 55, 168, 60, '成都市武侯区科华北路62号1栋101', '13122334455', '13800998877'} @@ -20,7 +20,7 @@ Python 程序中的字典跟现实生活中的字典很像,它以键值对( Python 中创建字典可以使用`{}`字面量语法,这一点跟上一节课讲的集合是一样的。但是字典的`{}`中的元素是以键值对的形式存在的,每个元素由`:`分隔的两个值构成,`:`前面是键,`:`后面是值,代码如下所示。 -```Python +```python xinhua = { '麓': '山脚下', '路': 
'道,往来通行的地方;方面,地区:南~货,外~货;种类:他俩是一~人', @@ -44,7 +44,7 @@ print(person) 当然,如果愿意,我们也可以使用内置函数`dict`或者是字典的生成式语法来创建字典,代码如下所示。 -```Python +```python # dict函数(构造器)中的每一组参数就是字典中的一组键值对 
person = dict(name='王大锤', age=55, height=168, weight=60, addr='成都市武侯区科华北路62号1栋101') print(person) # {'name': '王大锤', 'age': 55, 'height': 168, 'weight': 60, 'addr': '成都市武侯区科华北路62号1栋101'} @@ -60,10 +60,16 @@ items3 = {x: x ** 3 for x in range(1, 6)} print(items3) # {1: 1, 2: 8, 3: 27, 4: 64, 5: 125} ``` -想知道字典中一共有多少组键值对,仍然是使用`len`函数;如果想对字典进行遍历,可以用`for`循环,但是需要注意,`for`循环只是对字典的键进行了遍历,不过没关系,在学习了字典的索引运算后,我们可以通过字典的键获取到和这个键对应的值。 +想知道字典中一共有多少组键值对,仍然是使用`len`函数;如果想对字典进行遍历,可以用`for`循环,但是需要注意,`for`循环只是对字典的键进行了遍历,不过没关系,在学习了字典的索引运算后,我们可以通过字典的键访问它对应的值。 -```Python -person = {'name': '王大锤', 'age': 55, 'height': 168, 'weight': 60, 'addr': '成都市武侯区科华北路62号1栋101'} +```python +person = { + 'name': '王大锤', + 'age': 55, + 'height': 168, + 'weight': 60, + 'addr': '成都市武侯区科华北路62号1栋101' +} print(len(person)) # 5 for key in person: print(key) @@ -71,9 +77,30 @@ for key in person: ### 字典的运算 -对于字典类型来说,成员运算和索引运算肯定是很重要的,前者可以判定指定的键在不在字典中,后者可以通过键获取对应的值或者向字典中添加新的键值对。值得注意的是,字典的索引不同于列表的索引,列表中的元素因为有属于自己有序号,所以列表的索引是一个整数;字典中因为保存的是键值对,所以字典需要用键去索引对应的值。需要**特别提醒**大家注意的是,**字典中的键必须是不可变类型**,例如整数(`int`)、浮点数(`float`)、字符串(`str`)、元组(`tuple`)等类型,这一点跟集合类型对元素的要求是一样的;很显然,之前我们讲的列表(`list`)和集合(`set`)不能作为字典中的键,字典类型本身也不能再作为字典中的键,因为字典也是可变类型,但是字典可以作为字典中的值。大家可以先看看下面的代码,了解一下字典的成员运算和索引运算。 +对于字典类型来说,成员运算和索引运算肯定是很重要的,前者可以判定指定的键在不在字典中,后者可以通过键访问对应的值或者向字典中添加新的键值对。值得注意的是,字典的索引不同于列表的索引,列表中的元素因为有属于自己的序号,所以列表的索引是一个整数;字典中因为保存的是键值对,所以字典需要用键去索引对应的值。需要**特别提醒**大家注意的是,**字典中的键必须是不可变类型**,例如整数(`int`)、浮点数(`float`)、字符串(`str`)、元组(`tuple`)等类型,这一点跟集合类型对元素的要求是一样的;很显然,之前我们讲的列表(`list`)和集合(`set`)不能作为字典中的键,字典类型本身也不能再作为字典中的键,因为字典也是可变类型,但是列表、集合、字典都可以作为字典中的值,例如: -```Python +```python +person = { + 'name': '王大锤', + 'age': 55, + 'height': 168, + 'weight': 60, + 'addr': ['成都市武侯区科华北路62号1栋101', '北京市西城区百万庄大街1号'], + 'car': { + 'brand': 'BMW X7', + 'maxSpeed': '250', + 'length': 5170, + 'width': 2000, + 'height': 1835, + 'displacement': 3.0 + } +} +print(person) +``` + +大家可以看看下面的代码,了解一下字典的成员运算和索引运算。 + +```python person = {'name': 
'王大锤', 'age': 55, 'height': 168, 'weight': 
60, 'addr': '成都市武侯区科华北路62号1栋101'} # 成员运算 @@ -118,7 +145,7 @@ for key, value in person.items(): print(f'{key}:\t{value}') ``` -字典的`update`方法会用一个字典更新另一个字典中的键值对。例如,有两个字典`x`和`y`,当执行`x.update(y)`操作时,`x`跟`y`相同的键对应的值会`y`中的值被更新,而`y`中有但`x`中没有的键值对会直接添加到`x`中,代码如下所示。 +字典的`update`方法实现两个字典的合并操作。例如,有两个字典`x`和`y`,当执行`x.update(y)`操作时,`x`跟`y`相同的键对应的值会被`y`中的值更新,而`y`中有但`x`中没有的键值对会直接添加到`x`中,代码如下所示。 ```python person1 = {'name': '王大锤', 'age': 55, 'height': 178} @@ -127,7 +154,16 @@ person1.update(person2) print(person1) # {'name': '王大锤', 'age': 25, 'height': 178, 'addr': '成都市武侯区科华北路62号1栋101'} ``` -可以通过`pop`或`popitem`方法从字典中删除元素,前者会返回键对应的值,但是如果字典中不存在指定的键,会引发`KeyError`错误;后者在删除元素时,会返回键和值组成的二元组。字典的`clear`方法会清空字典中所有的键值对,代码如下所示。 +如果使用 Python 3.9 及以上的版本,也可以使用`|=`运算符来完成同样的操作,代码如下所示。 + +```python +person1 = {'name': '王大锤', 'age': 55, 'height': 178} +person2 = {'age': 25, 'addr': '成都市武侯区科华北路62号1栋101'} +person1 |= person2 +print(person1) # {'name': '王大锤', 'age': 25, 'height': 178, 'addr': '成都市武侯区科华北路62号1栋101'} +``` + +可以通过`pop`或`popitem`方法从字典中删除元素,前者会返回(获得)键对应的值,但是如果字典中不存在指定的键,会引发`KeyError`错误;后者在删除元素时,会返回(获得)键和值组成的二元组。字典的`clear`方法会清空字典中所有的键值对,代码如下所示。 ```python person = {'name': '王大锤', 'age': 25, 'height': 178, 'addr': '成都市武侯区科华北路62号1栋101'} @@ -141,7 +177,7 @@ print(person) # {} 跟列表一样,从字典中删除元素也可以使用`del`关键字,在删除元素的时候如果指定的键索引不到对应的值,一样会引发`KeyError`错误,具体的做法如下所示。 -```Python +```python person = {'name': '王大锤', 'age': 25, 'height': 178, 'addr': '成都市武侯区科华北路62号1栋101'} del person['age'] del person['addr'] @@ -154,7 +190,7 @@ print(person) # {'name': '王大锤', 'height': 178} **例子1**:输入一段话,统计每个英文字母出现的次数,按出现次数从高到低输出。 -```Python +```python sentence = input('请输入一段话: ') counter = {} for ch in sentence: @@ -204,7 +240,7 @@ x 出现了 1 次. 
> **说明**:可以用字典的生成式语法来创建这个新字典。 -```Python +```python stocks = { 'AAPL': 191.88, 'GOOG': 1186.96, @@ -226,4 +262,4 @@ print(stocks2) ### 总结 -Python 程序中的字典跟现实生活中字典非常像,允许我们**以键值对的形式保存数据**,再**通过键索引对应的值**。这是一种非常**有利于数据检索**的数据类型。再次提醒大家注意,**字典中的键必须是不可变类型**,字典中的值可以是任意类型。 +Python 程序中的字典跟现实生活中字典非常像,允许我们**以键值对的形式保存数据**,再**通过键访问对应的值**。字典是一种非常**有利于数据检索**的数据类型,但是需要再次提醒大家,**字典中的键必须是不可变类型**,列表、集合、字典等类型的数据都不能作为字典的键。 diff --git "a/Day01-20/16.\345\207\275\346\225\260\344\275\277\347\224\250\350\277\233\351\230\266.md" "b/Day01-20/16.\345\207\275\346\225\260\344\275\277\347\224\250\350\277\233\351\230\266.md" index 66fb21438..692144d66 100755 --- "a/Day01-20/16.\345\207\275\346\225\260\344\275\277\347\224\250\350\277\233\351\230\266.md" +++ "b/Day01-20/16.\345\207\275\346\225\260\344\275\277\347\224\250\350\277\233\351\230\266.md" @@ -6,7 +6,7 @@ 我们回到之前讲过的一个例子,设计一个函数,传入任意多个参数,对其中`int`类型或`float`类型的元素实现求和操作。我们对之前的代码稍作调整,让整个代码更加紧凑一些,如下所示。 -```Python +```python def calc(*args, **kwargs): items = list(args) + list(kwargs.values()) result = 0 @@ -18,7 +18,7 @@ def calc(*args, **kwargs): 如果我们希望上面的`calc`函数不仅仅可以做多个参数的求和,还可以实现更多的甚至是自定义的二元运算,我们该怎么做呢?上面的代码只能求和是因为函数中使用了`+=`运算符,这使得函数跟加法运算形成了耦合关系,如果能解除这种耦合关系,函数的通用性和灵活性就会更好。解除耦合的办法就是将`+`运算符变成函数调用,并将其设计为函数的参数,代码如下所示。 -```Python +```python def calc(init_value, op_func, *args, **kwargs): items = list(args) + list(kwargs.values()) result = init_value @@ -55,7 +55,7 @@ print(calc(1, mul, 1, 2, 3, 4, 5)) # 120 如果我们没有提前定义好`add`和`mul`函数,也可以使用 Python 标准库中的`operator`模块提供的`add`和`mul`函数,它们分别代表了做加法和做乘法的二元运算,我们拿过来直接使用即可,代码如下所示。 -```Python +```python import operator print(calc(0, operator.add, 1, 2, 3, 4, 5)) # 15 @@ -64,7 +64,7 @@ print(calc(1, operator.mul, 1, 2, 3, 4, 5)) # 120 Python 内置函数中有不少高阶函数,我们前面提到过的`filter`和`map`函数就是高阶函数,前者可以实现对序列中元素的过滤,后者可以实现对序列中元素的映射,例如我们要去掉一个整数列表中的奇数,并对所有的偶数求平方得到一个新的列表,就可以直接使用这两个函数来做到,具体的做法是如下所示。 -```Python +```python def is_even(num): """判断num是不是偶数""" return num % 2 == 0 @@ -82,7 +82,7 @@ print(new_nums) # [144, 64, 3600, 2704] 
当然,要完成上面代码的功能,也可以使用列表生成式,列表生成式的做法更为简单优雅。 -```Python +```python old_nums = [35, 12, 8, 99, 60, 52] new_nums = [num ** 2 for num in old_nums if num % 2 == 0] print(new_nums) # [144, 64, 3600, 2704] @@ -110,7 +110,7 @@ print(new_strings) # ['in', 'zoo', 'pear', 'apple', 'waxberry'] 在使用高阶函数的时候,如果作为参数或者返回值的函数本身非常简单,一行代码就能够完成,也不需要考虑对函数的复用,那么我们可以使用 lambda 函数。Python 中的 lambda 函数是没有的名字函数,所以很多人也把它叫做**匿名函数**,lambda 函数只能有一行代码,代码中的表达式产生的运算结果就是这个匿名函数的返回值。之前的代码中,我们写的`is_even`和`square`函数都只有一行代码,我们可以考虑用 lambda 函数来替换掉它们,代码如下所示。 -```Python +```python old_nums = [35, 12, 8, 99, 60, 52] new_nums = list(map(lambda x: x ** 2, filter(lambda x: x % 2 == 0, old_nums))) print(new_nums) # [144, 64, 3600, 2704] @@ -120,7 +120,7 @@ print(new_nums) # [144, 64, 3600, 2704] 前面我们说过,Python 中的函数是“一等函数”,函数是可以直接赋值给变量的。在学习了 lambda 函数之后,前面我们写过的一些函数就可以用一行代码来实现它们了,大家可以看看能否理解下面的求阶乘和判断素数的函数。 -```Python +```python import functools import operator diff --git "a/Day01-20/17.\345\207\275\346\225\260\351\253\230\347\272\247\345\272\224\347\224\250.md" "b/Day01-20/17.\345\207\275\346\225\260\351\253\230\347\272\247\345\272\224\347\224\250.md" index 6abd0f50b..0a5fbcbb7 100755 --- "a/Day01-20/17.\345\207\275\346\225\260\351\253\230\347\272\247\345\272\224\347\224\250.md" +++ "b/Day01-20/17.\345\207\275\346\225\260\351\253\230\347\272\247\345\272\224\347\224\250.md" @@ -6,7 +6,7 @@ Python 语言中,装饰器是“**用一个函数装饰另外一个函数并为其提供额外的能力**”的语法现象。装饰器本身是一个函数,它的参数是被装饰的函数,它的返回值是一个带有装饰功能的函数。通过前面的描述,相信大家已经听出来了,装饰器是一个高阶函数,它的参数和返回值都是函数。但是,装饰器的概念对编程语言的初学者来说,还是让人头疼的,下面我们先通过一个简单的例子来说明装饰器的作用。假设有名为`downlaod`和`upload`的两个函数,分别用于文件的上传和下载,如下所示。 -```Python +```python import random import time @@ -33,7 +33,7 @@ upload('Python从入门到住院.pdf') 现在有一个新的需求,我们希望知道调用`download`和`upload`函数上传下载文件到底用了多少时间,这应该如何实现呢?相信很多小伙伴已经想到了,我们可以在函数开始执行的时候记录一个时间,在函数调用结束后记录一个时间,两个时间相减就可以计算出下载或上传的时间,代码如下所示。 -```Python +```python start = time.time() download('MySQL从删库到跑路.avi') end = time.time() @@ -62,7 +62,7 @@ def record_time(func): 看懂这个结构后,我们就可以把记录时间的功能写到这个装饰器中,代码如下所示。 
-```Python +```python import time @@ -85,7 +85,7 @@ def record_time(func): 写装饰器虽然颇费周折,但是这是个一劳永逸的骚操作,将来再有记录函数执行时间的需求时,我们只需要添加上面的装饰器即可。使用上面的装饰器函数有两种方式,第一种方式就是直接调用装饰器函数,传入被装饰的函数并获得返回值,我们可以用这个返回值直接替代原来的函数,那么在调用时就已经获得了装饰器提供的额外的能力(记录执行时间),大家试试下面的代码就明白了。 -```Python +```python download = record_time(download) upload = record_time(upload) download('MySQL从删库到跑路.avi') @@ -94,7 +94,7 @@ upload('Python从入门到住院.pdf') 在 Python 中,使用装饰器很有更为便捷的**语法糖**(编程语言中添加的某种语法,这种语法对语言的功能没有影响,但是使用更加方法,代码的可读性也更强,我们将其称之为“语法糖”或“糖衣语法”),可以用`@装饰器函数`将装饰器函数直接放在被装饰的函数上,效果跟上面的代码相同。我们把完整的代码为大家罗列出来,大家可以再看看我们是如何定义和使用装饰器的。 -```Python +```python import random import time @@ -133,7 +133,7 @@ upload('Python从入门到住院.pdf') 如果在代码的某些地方,我们想去掉装饰器的作用执行原函数,那么在定义装饰器函数的时候,需要做一点点额外的工作。Python 标准库`functools`模块的`wraps`函数也是一个装饰器,我们将它放在`wrapper`函数上,这个装饰器可以帮我们保留被装饰之前的函数,这样在需要取消装饰器时,可以通过被装饰函数的`__wrapped__`属性获得被装饰之前的函数。 -```Python +```python import random import time @@ -181,7 +181,7 @@ upload.__wrapped__('Python从新手到大师.pdf') Python 中允许函数嵌套定义,也允许函数之间相互调用,而且一个函数还可以直接或间接的调用自身。函数自己调用自己称为递归调用,那么递归调用有什么用处呢?现实中,有很多问题的定义本身就是一个递归定义,例如我们之前讲到的阶乘,非负整数`N`的阶乘是`N`乘以`N-1`的阶乘,即 $\small{N! 
= N \times (N-1)!}$ ,定义的左边和右边都出现了阶乘的概念,所以这是一个递归定义。既然如此,我们可以使用递归调用的方式来写一个求阶乘的函数,代码如下所示。 -```Python +```python def fac(num): if num in (0, 1): return 1 @@ -190,7 +190,7 @@ def fac(num): 上面的代码中,`fac`函数中又调用了`fac`函数,这就是所谓的递归调用。代码第2行的`if`条件叫做递归的收敛条件,简单的说就是什么时候要结束函数的递归调用,在计算阶乘时,如果计算到`0`或`1`的阶乘,就停止递归调用,直接返回`1`;代码第4行的`num * fac(num - 1)`是递归公式,也就是阶乘的递归定义。下面,我们简单的分析下,如果用`fac(5)`计算`5`的阶乘,整个过程会是怎样的。 -```Python +```python # 递归调用函数入栈 # 5 * fac(4) # 5 * (4 * fac(3)) @@ -211,7 +211,7 @@ print(fac(5)) # 120 再举一个之前讲过的生成斐波那契数列的例子,因为斐波那契数列前两个数都是`1`,从第三个数开始,每个数是前两个数相加的和,可以记为`f(n) = f(n - 1) + f(n - 2)`,很显然这又是一个递归的定义,所以我们可以用下面的递归调用函数来计算第​`n`个斐波那契数。 -```Python +```python def fib1(n): if n in (1, 2): return 1 @@ -224,7 +224,7 @@ for i in range(1, 21): 需要提醒大家,上面计算斐波那契数的代码虽然看起来非常简单明了,但执行性能是比较糟糕的。大家可以试一试,把上面代码`for`循环中`range`函数的第二个参数修改为`51`,即输出前50个斐波那契数,看看需要多长时间,也欢迎大家在评论区留下你的代码执行时间。至于为什么这么慢,大家可以自己思考一下原因。很显然,直接使用循环递推的方式获得斐波那契数列是更好的选择,代码如下所示。 -```Python +```python def fib2(n): a, b = 0, 1 for _ in range(n): diff --git "a/Day01-20/18.\351\235\242\345\220\221\345\257\271\350\261\241\347\274\226\347\250\213\345\205\245\351\227\250.md" "b/Day01-20/18.\351\235\242\345\220\221\345\257\271\350\261\241\347\274\226\347\250\213\345\205\245\351\227\250.md" index 59e9e7b28..365fa94a8 100755 --- "a/Day01-20/18.\351\235\242\345\220\221\345\257\271\350\261\241\347\274\226\347\250\213\345\205\245\351\227\250.md" +++ "b/Day01-20/18.\351\235\242\345\220\221\345\257\271\350\261\241\347\274\226\347\250\213\345\205\245\351\227\250.md" @@ -28,7 +28,7 @@ 在 Python 语言中,我们可以使用`class`关键字加上类名来定义类,通过缩进我们可以确定类的代码块,就如同定义函数那样。在类的代码块中,我们需要写一些函数,我们说过类是一个抽象概念,那么这些函数就是我们对一类对象共同的动态特征的提取。写在类里面的函数我们通常称之为**方法**,方法就是对象的行为,也就是对象可以接收的消息。方法的第一个参数通常都是`self`,它代表了接收这个消息的对象本身。 -```Python +```python class Student: def study(self, course_name): @@ -42,7 +42,7 @@ class Student: 在我们定义好一个类之后,可以使用构造器语法来创建对象,代码如下所示。 -```Python +```python stu1 = Student() stu2 = Student() print(stu1) # <__main__.Student object at 0x10ad5ac50> @@ -54,7 +54,7 @@ 
print(hex(id(stu1)), hex(id(stu2))) # 0x10ad5ac50 0x10ad5acd0 接下来,我们尝试给对象发消息,即调用对象的方法。刚才的`Student`类中我们定义了`study`和`play`两个方法,两个方法的第一个参数`self`代表了接收消息的学生对象,`study`方法的第二个参数是学习的课程名称。Python中,给对象发消息有两种方式,请看下面的代码。 -```Python +```python # 通过“类.方法”调用方法 # 第一个参数是接收消息的对象 # 第二个参数是学习的课程名称 @@ -74,7 +74,7 @@ stu2.play() # 学生正在玩游戏. 我们对上面的`Student`类稍作修改,给学生对象添加`name`(姓名)和`age`(年龄)两个属性。 -```Python +```python class Student: """学生""" @@ -94,7 +94,7 @@ class Student: 修改刚才创建对象和给对象发消息的代码,重新执行一次,看看程序的执行结果有什么变化。 -```Python +```python # 调用Student类的构造器创建对象并传入初始化参数 stu1 = Student('骆昊', 44) stu2 = Student('王大锤', 25) @@ -117,7 +117,7 @@ stu2.play() # 王大锤正在玩游戏. > **要求**:定义一个类描述数字时钟,提供走字和显示时间的功能。 -```Python +```python import time @@ -167,7 +167,7 @@ while True: > **要求**:定义一个类描述平面上的点,提供计算到另一个点距离的方法。 -```Python +```python class Point: """平面上的点""" diff --git "a/Day01-20/19.\351\235\242\345\220\221\345\257\271\350\261\241\347\274\226\347\250\213\350\277\233\351\230\266.md" "b/Day01-20/19.\351\235\242\345\220\221\345\257\271\350\261\241\347\274\226\347\250\213\350\277\233\351\230\266.md" index 7ba887df0..c6b540f03 100755 --- "a/Day01-20/19.\351\235\242\345\220\221\345\257\271\350\261\241\347\274\226\347\250\213\350\277\233\351\230\266.md" +++ "b/Day01-20/19.\351\235\242\345\220\221\345\257\271\350\261\241\347\274\226\347\250\213\350\277\233\351\230\266.md" @@ -6,7 +6,7 @@ 在很多面向对象编程语言中,对象的属性通常会被设置为私有(private)或受保护(protected)的成员,简单的说就是不允许直接访问这些属性;对象的方法通常都是公开的(public),因为公开的方法是对象能够接受的消息,也是对象暴露给外界的调用接口,这就是所谓的访问可见性。在 Python 中,可以通过给对象属性名添加前缀下划线的方式来说明属性的访问可见性,例如,可以用`__name`表示一个私有属性,`_name`表示一个受保护属性,代码如下所示。 -```Python +```python class Student: def __init__(self, name, age): @@ -30,7 +30,7 @@ Python 语言属于动态语言,维基百科对动态语言的解释是:“ 在 Python 中,我们可以动态为对象添加属性,这是 Python 作为动态类型语言的一项特权,代码如下所示。需要提醒大家的是,对象的方法其实本质上也是对象的属性,如果给对象发送一个无法接收的消息,引发的异常仍然是`AttributeError`。 -```Python +```python class Student: def __init__(self, name, age): @@ -44,7 +44,7 @@ stu.sex = '男' # 给学生对象动态添加sex属性 如果不希望在使用对象时动态的为对象添加属性,可以使用 Python 
语言中的`__slots__`魔法。对于`Student`类来说,可以在类中指定`__slots__ = ('name', 'age')`,这样`Student`类的对象只能有`name`和`age`属性,如果想动态添加其他属性将会引发异常,代码如下所示。 -```Python +```python class Student: __slots__ = ('name', 'age') @@ -64,7 +64,7 @@ stu.sex = '男' 举一个例子,定义一个三角形类,通过传入三条边的长度来构造三角形,并提供计算周长和面积的方法。计算周长和面积肯定是三角形对象的方法,这一点毫无疑问。但是在创建三角形对象时,传入的三条边长未必能构造出三角形,为此我们可以先写一个方法来验证给定的三条边长是否可以构成三角形,这种方法很显然就不是对象方法,因为在调用这个方法时三角形对象还没有创建出来。我们可以把这类方法设计为静态方法或类方法,也就是说这类方法不是发送给三角形对象的消息,而是发送给三角形类的消息,代码如下所示。 -```Python +```python class Triangle(object): """三角形""" @@ -134,7 +134,7 @@ print(f'面积: {t.area}') 面向对象的编程语言支持在已有类的基础上创建新类,从而减少重复代码的编写。提供继承信息的类叫做父类(超类、基类),得到继承信息的类叫做子类(派生类、衍生类)。例如,我们定义一个学生类和一个老师类,我们会发现他们有大量的重复代码,而这些重复代码都是老师和学生作为人的公共属性和行为,所以在这种情况下,我们应该先定义人类,再通过继承,从人类派生出老师类和学生类,代码如下所示。 -```Python +```python class Person: """人""" diff --git "a/Day01-20/20.\351\235\242\345\220\221\345\257\271\350\261\241\347\274\226\347\250\213\345\272\224\347\224\250.md" "b/Day01-20/20.\351\235\242\345\220\221\345\257\271\350\261\241\347\274\226\347\250\213\345\272\224\347\224\250.md" index 9fa08a51a..00fa3faae 100755 --- "a/Day01-20/20.\351\235\242\345\220\221\345\257\271\350\261\241\347\274\226\347\250\213\345\272\224\347\224\250.md" +++ "b/Day01-20/20.\351\235\242\345\220\221\345\257\271\350\261\241\347\274\226\347\250\213\345\272\224\347\224\250.md" @@ -10,7 +10,7 @@ 牌的属性显而易见,有花色和点数。我们可以用 0 到 3 的四个数字来代表四种不同的花色,但是这样的代码可读性会非常糟糕,因为我们并不知道黑桃、红心、草花、方块跟 0 到 3 的数字的对应关系。如果一个变量的取值只有有限多个选项,我们可以使用枚举。与 C、Java 等语言不同的是,Python 中没有声明枚举类型的关键字,但是可以通过继承`enum`模块的`Enum`类来创建枚举类型,代码如下所示。 -```Python +```python from enum import Enum @@ -21,14 +21,14 @@ class Suite(Enum): 通过上面的代码可以看出,定义枚举类型其实就是定义符号常量,如`SPADE`、`HEART`等。每个符号常量都有与之对应的值,这样表示黑桃就可以不用数字 0,而是用`Suite.SPADE`;同理,表示方块可以不用数字 3, 而是用`Suite.DIAMOND`。注意,使用符号常量肯定是优于使用字面常量的,因为能够读懂英文就能理解符号常量的含义,代码的可读性会提升很多。Python 中的枚举类型是可迭代类型,简单的说就是可以将枚举类型放到`for-in`循环中,依次取出每一个符号常量及其对应的值,如下所示。 -```Python +```python for suite in Suite: print(f'{suite}: {suite.value}') ``` 接下来我们可以定义牌类。 -```Python +```python class Card: """牌""" 
@@ -44,7 +44,7 @@ class Card: 可以通过下面的代码来测试下`Card`类。 -```Python +```python card1 = Card(Suite.SPADE, 5) card2 = Card(Suite.HEART, 13) print(card1) # ♠5 @@ -53,7 +53,7 @@ print(card2) # ♥K 接下来我们定义扑克类。 -```Python +```python import random @@ -85,7 +85,7 @@ class Poker: 可以通过下面的代码来测试下`Poker`类。 -```Python +```python poker = Poker() print(poker.cards) # 洗牌前的牌 poker.shuffle() @@ -94,7 +94,7 @@ print(poker.cards) # 洗牌后的牌 定义玩家类。 -```Python +```python class Player: """玩家""" @@ -113,7 +113,7 @@ class Player: 创建四个玩家并将牌发到玩家的手上。 -```Python +```python poker = Poker() poker.shuffle() players = [Player('东邪'), Player('西毒'), Player('南帝'), Player('北丐')] @@ -134,7 +134,7 @@ for player in players: 修改后的`Card`类代码如下所示。 -```Python +```python class Card: """牌""" @@ -161,7 +161,7 @@ class Card: 通过对上述需求的分析,可以看出部门经理、程序员、销售员都是员工,有相同的属性和行为,那么我们可以先设计一个名为`Employee`的父类,再通过继承的方式从这个父类派生出部门经理、程序员和销售员三个子类。很显然,后续的代码不会创建`Employee` 类的对象,因为我们需要的是具体的员工对象,所以这个类可以设计成专门用于继承的抽象类。Python 语言中没有定义抽象类的关键字,但是可以通过`abc`模块中名为`ABCMeta` 的元类来定义抽象类。关于元类的概念此处不展开讲解,当然大家不用纠结,照做即可。 -```Python +```python from abc import ABCMeta, abstractmethod @@ -179,7 +179,7 @@ class Employee(metaclass=ABCMeta): 在上面的员工类中,有一个名为`get_salary`的方法用于结算月薪,但是由于还没有确定是哪一类员工,所以结算月薪虽然是员工的公共行为但这里却没有办法实现。对于暂时无法实现的方法,我们可以使用`abstractmethod`装饰器将其声明为抽象方法,所谓**抽象方法就是只有声明没有实现的方法**,**声明这个方法是为了让子类去重写这个方法**。接下来的代码展示了如何从员工类派生出部门经理、程序员、销售员这三个子类以及子类如何重写父类的抽象方法。 -```Python +```python class Manager(Employee): """部门经理""" @@ -213,7 +213,7 @@ class Salesman(Employee): 我们通过下面的代码来完成这个工资结算系统,由于程序员和销售员需要分别录入本月的工作时间和销售额,所以在下面的代码中我们使用了 Python 内置的`isinstance`函数来判断员工对象的类型。我们之前讲过的`type`函数也能识别对象的类型,但是`isinstance`函数更加强大,因为它可以判断出一个对象是不是某个继承结构下的子类型,你可以简单的理解为`type`函数是对对象类型的精准匹配,而`isinstance`函数是对对象类型的模糊匹配。 -```Python +```python emps = [Manager('刘备'), Programmer('诸葛亮'), Manager('曹操'), Programmer('荀彧'), Salesman('张辽')] for emp in emps: if isinstance(emp, Programmer): diff --git 
"a/Day21-30/21.\346\226\207\344\273\266\350\257\273\345\206\231\345\222\214\345\274\202\345\270\270\345\244\204\347\220\206.md" "b/Day21-30/21.\346\226\207\344\273\266\350\257\273\345\206\231\345\222\214\345\274\202\345\270\270\345\244\204\347\220\206.md" index fb51ef340..4d46f941f 100755 --- "a/Day21-30/21.\346\226\207\344\273\266\350\257\273\345\206\231\345\222\214\345\274\202\345\270\270\345\244\204\347\220\206.md" +++ "b/Day21-30/21.\346\226\207\344\273\266\350\257\273\345\206\231\345\222\214\345\274\202\345\270\270\345\244\204\347\220\206.md" @@ -32,7 +32,7 @@ 下面的例子演示了如何读取一个纯文本文件(一般指只有字符原生编码构成的文件,与富文本相比,纯文本不包含字符样式的控制元素,能够被最简单的文本编辑器直接读取)。 -```Python +```python file = open('致橡树.txt', 'r', encoding='utf-8') print(file.read()) file.close() @@ -42,7 +42,7 @@ file.close() 除了使用文件对象的`read`方法读取文件之外,还可以使用`for-in`循环逐行读取或者用`readlines`方法将文件按行读取到一个列表容器中,代码如下所示。 -```Python +```python file = open('致橡树.txt', 'r', encoding='utf-8') for line in file: print(line, end='') @@ -57,7 +57,7 @@ file.close() 如果要向文件中写入内容,可以在打开文件时使用`w`或者`a`作为操作模式,前者会截断之前的文本内容写入新的内容,后者是在原来内容的尾部追加新的内容。 -```Python +```python file = open('致橡树.txt', 'a', encoding='utf-8') file.write('\n标题:《致橡树》') file.write('\n作者:舒婷') @@ -69,7 +69,7 @@ file.close() 请注意上面的代码,如果`open`函数指定的文件并不存在或者无法打开,那么将引发异常状况导致程序崩溃。为了让代码具有健壮性和容错性,我们可以**使用Python的异常机制对可能在运行时发生状况的代码进行适当的处理**。Python中和异常相关的关键字有五个,分别是`try`、`except`、`else`、`finally`和`raise`,我们先看看下面的代码,再来为大家介绍这些关键字的用法。 -```Python +```python file = None try: file = open('致橡树.txt', 'r', encoding='utf-8') @@ -160,7 +160,7 @@ BaseException 在Python中,可以使用`raise`关键字来引发异常(抛出异常对象),而调用者可以通过`try...except...`结构来捕获并处理异常。例如在函数中,当函数的执行条件不满足时,可以使用抛出异常的方式来告知调用者问题的所在,而调用者可以通过捕获处理异常来使得代码从异常中恢复,定义异常和抛出异常的代码如下所示。 -```Python +```python class InputError(ValueError): """自定义异常类型""" pass @@ -177,7 +177,7 @@ def fac(num): 调用求阶乘的函数`fac`,通过`try...except...`结构捕获输入错误的异常并打印异常对象(显示异常信息),如果输入正确就计算阶乘并结束程序。 -```Python +```python flag = True while flag: num = int(input('n = ')) @@ -194,7 +194,7 @@ while flag: 
用`with`上下文语法改写后的代码如下所示。 -```Python +```python try: with open('致橡树.txt', 'r', encoding='utf-8') as file: print(file.read()) @@ -210,7 +210,7 @@ except UnicodeDecodeError: 读写二进制文件跟读写文本文件的操作类似,但是需要注意,在使用`open`函数打开文件时,如果要进行读操作,操作模式是`'rb'`,如果要进行写操作,操作模式是`'wb'`。还有一点,读写文本文件时,`read`方法的返回值以及`write`方法的参数是`str`对象(字符串),而读写二进制文件时,`read`方法的返回值以及`write`方法的参数是`bytes-like`对象(字节串)。下面的代码实现了将当前路径下名为`guido.jpg`的图片文件复制到`吉多.jpg`文件中的操作。 -```Python +```python try: with open('guido.jpg', 'rb') as file1: data = file1.read() @@ -225,7 +225,7 @@ print('程序执行结束.') 如果要复制的图片文件很大,一次将文件内容直接读入内存中可能会造成非常大的内存开销,为了减少对内存的占用,可以为`read`方法传入`size`参数来指定每次读取的字节数,通过循环读取和写入的方式来完成上面的操作,代码如下所示。 -```Python +```python try: with open('guido.jpg', 'rb') as file1, open('吉多.jpg', 'wb') as file2: data = file1.read(512) diff --git "a/Day21-30/22.\345\257\271\350\261\241\347\232\204\345\272\217\345\210\227\345\214\226\345\222\214\345\217\215\345\272\217\345\210\227\345\214\226.md" "b/Day21-30/22.\345\257\271\350\261\241\347\232\204\345\272\217\345\210\227\345\214\226\345\222\214\345\217\215\345\272\217\345\210\227\345\214\226.md" index 336547545..39658ec7b 100755 --- "a/Day21-30/22.\345\257\271\350\261\241\347\232\204\345\272\217\345\210\227\345\214\226\345\222\214\345\217\215\345\272\217\345\210\227\345\214\226.md" +++ "b/Day21-30/22.\345\257\271\350\261\241\347\232\204\345\272\217\345\210\227\345\214\226\345\222\214\345\217\215\345\272\217\345\210\227\345\214\226.md" @@ -67,7 +67,7 @@ let obj = { 在Python中,如果要将字典处理成JSON格式(以字符串形式存在),可以使用`json`模块的`dumps`函数,代码如下所示。 -```Python +```python import json my_dict = { @@ -91,7 +91,7 @@ print(json.dumps(my_dict)) 如果要将字典处理成JSON格式并写入文本文件,只需要将`dumps`函数换成`dump`函数并传入文件对象即可,代码如下所示。 -```Python +```python import json my_dict = { @@ -121,7 +121,7 @@ with open('data.json', 'w') as file: 我们可以通过下面的代码,读取上面创建的`data.json`文件,将JSON格式的数据还原成Python中的字典。 -```Python +```python import json with open('data.json', 'r') as file: @@ -215,7 +215,7 @@ pip install requests 获取国内新闻并显示新闻标题和链接。 -```Python +```python 
import requests resp = requests.get('http://api.tianapi.com/guonei/?key=APIKey&num=10') diff --git "a/Day21-30/23.Python\350\257\273\345\206\231CSV\346\226\207\344\273\266.md" "b/Day21-30/23.Python\350\257\273\345\206\231CSV\346\226\207\344\273\266.md" index d012482d3..ce639830c 100755 --- "a/Day21-30/23.Python\350\257\273\345\206\231CSV\346\226\207\344\273\266.md" +++ "b/Day21-30/23.Python\350\257\273\345\206\231CSV\346\226\207\344\273\266.md" @@ -17,7 +17,7 @@ CSV文件可以使用文本编辑器或类似于Excel电子表格这类工具打 现有五个学生三门课程的考试成绩需要保存到一个CSV文件中,要达成这个目标,可以使用Python标准库中的`csv`模块,该模块的`writer`函数会返回一个`csvwriter`对象,通过该对象的`writerow`或`writerows`方法就可以将数据写入到CSV文件中,具体的代码如下所示。 -```Python +```python import csv import random @@ -44,7 +44,7 @@ with open('scores.csv', 'w') as file: 需要说明的是上面的`writer`函数,除了传入要写入数据的文件对象外,还可以`dialect`参数,它表示CSV文件的方言,默认值是`excel`。除此之外,还可以通过`delimiter`、`quotechar`、`quoting`参数来指定分隔符(默认是逗号)、包围值的字符(默认是双引号)以及包围的方式。其中,包围值的字符主要用于当字段中有特殊符号时,通过添加包围值的字符可以避免二义性。大家可以尝试将上面第5行代码修改为下面的代码,然后查看生成的CSV文件。 -```Python +```python writer = csv.writer(file, delimiter='|', quoting=csv.QUOTE_ALL) ``` @@ -63,7 +63,7 @@ writer = csv.writer(file, delimiter='|', quoting=csv.QUOTE_ALL) 如果要读取刚才创建的CSV文件,可以使用下面的代码,通过`csv`模块的`reader`函数可以创建出`csvreader`对象,该对象是一个迭代器,可以通过`next`函数或`for-in`循环读取到文件中的数据。 -```Python +```python import csv with open('scores.csv', 'r') as file: diff --git "a/Day21-30/24.Python\350\257\273\345\206\231Excel\346\226\207\344\273\266-1.md" "b/Day21-30/24.Python\350\257\273\345\206\231Excel\346\226\207\344\273\266-1.md" index 5030aae44..6ddbc4199 100755 --- "a/Day21-30/24.Python\350\257\273\345\206\231Excel\346\226\207\344\273\266-1.md" +++ "b/Day21-30/24.Python\350\257\273\345\206\231Excel\346\226\207\344\273\266-1.md" @@ -16,7 +16,7 @@ pip install xlwt xlrd xlutils 例如在当前文件夹下有一个名为“阿里巴巴2020年股票数据.xls”的 Excel 文件,如果想读取并显示该文件的内容,可以通过如下所示的代码来完成。 -```Python +```python import xlrd # 使用xlrd模块的open_workbook函数打开指定Excel文件并获得Book对象(工作簿) @@ -65,7 +65,7 @@ print(sheet.row_slice(3, 0, 5)) 写入 Excel 文件可以通过`xlwt` 
模块的`Workbook`类创建工作簿对象,通过工作簿对象的`add_sheet`方法可以添加工作表,通过工作表对象的`write`方法可以向指定单元格中写入数据,最后通过工作簿对象的`save`方法将工作簿写入到指定的文件或内存中。下面的代码实现了将5 个学生 3 门课程的考试成绩写入 Excel 文件的操作。 -```Python +```python import random import xlwt @@ -93,7 +93,7 @@ wb.save('考试成绩表.xls') 在写Excel文件时,我们还可以为单元格设置样式,主要包括字体(Font)、对齐方式(Alignment)、边框(Border)和背景(Background)的设置,`xlwt`对这几项设置都封装了对应的类来支持。要设置单元格样式需要首先创建一个`XFStyle`对象,再通过该对象的属性对字体、对齐方式、边框等进行设定,例如在上面的例子中,如果希望将表头单元格的背景色修改为黄色,可以按照如下的方式进行操作。 -```Python +```python header_style = xlwt.XFStyle() pattern = xlwt.Pattern() pattern.pattern = xlwt.Pattern.SOLID_PATTERN @@ -107,7 +107,7 @@ for index, title in enumerate(titles): 如果希望为表头设置指定的字体,可以使用`Font`类并添加如下所示的代码。 -```Python +```python font = xlwt.Font() # 字体名称 font.name = '华文楷体' @@ -126,7 +126,7 @@ header_style.font = font 如果希望表头垂直居中对齐,可以使用下面的代码进行设置。 -```Python +```python align = xlwt.Alignment() # 垂直方向的对齐方式 align.vert = xlwt.Alignment.VERT_CENTER @@ -137,7 +137,7 @@ header_style.alignment = align 如果希望给表头加上黄色的虚线边框,可以使用下面的代码来设置。 -```Python +```python borders = xlwt.Borders() props = ( ('top', 'top_colour'), ('right', 'right_colour'), @@ -153,7 +153,7 @@ header_style.borders = borders 如果要调整单元格的宽度(列宽)和表头的高度(行高),可以按照下面的代码进行操作。 -```Python +```python # 设置行高为40px sheet.row(0).set_style(xlwt.easyxf(f'font:height {20 * 40}')) titles = ('姓名', '语文', '数学', '英语') @@ -170,7 +170,7 @@ for index, title in enumerate(titles): 实现公式计算的代码如下所示。 -```Python +```python import xlrd import xlwt from xlutils.copy import copy diff --git "a/Day21-30/25.Python\350\257\273\345\206\231Excel\346\226\207\344\273\266-2.md" "b/Day21-30/25.Python\350\257\273\345\206\231Excel\346\226\207\344\273\266-2.md" index e9e1d506a..02580f116 100755 --- "a/Day21-30/25.Python\350\257\273\345\206\231Excel\346\226\207\344\273\266-2.md" +++ "b/Day21-30/25.Python\350\257\273\345\206\231Excel\346\226\207\344\273\266-2.md" @@ -16,7 +16,7 @@ pip install openpyxl 例如在当前文件夹下有一个名为“阿里巴巴2020年股票数据.xlsx”的 Excel 文件,如果想读取并显示该文件的内容,可以通过如下所示的代码来完成。 -```Python +```python import datetime 
import openpyxl @@ -63,7 +63,7 @@ for row_ch in range(2, sheet.max_row + 1): 下面我们使用`openpyxl`来进行写 Excel 操作。 -```Python +```python import random import openpyxl @@ -93,7 +93,7 @@ wb.save('考试成绩表.xlsx') 在使用`openpyxl`操作 Excel 时,如果要调整单元格的样式,可以直接通过单元格对象(`Cell`对象)的属性进行操作。单元格对象的属性包括字体(`font`)、对齐(`alignment`)、边框(`border`)等,具体的可以参考`openpyxl`的[官方文档](https://openpyxl.readthedocs.io/en/stable/index.html)。在使用`openpyxl`时,如果需要做公式计算,可以完全按照 Excel 中的操作方式来进行,具体的代码如下所示。 -```Python +```python import openpyxl from openpyxl.styles import Font, Alignment, Border, Side @@ -129,7 +129,7 @@ wb.save('考试成绩表.xlsx') 通过`openpyxl`库,可以直接向 Excel 中插入统计图表,具体的做法跟在 Excel 中插入图表大体一致。我们可以创建指定类型的图表对象,然后通过该对象的属性对图表进行设置。当然,最为重要的是为图表绑定数据,即横轴代表什么,纵轴代表什么,具体的数值是多少。最后,可以将图表对象添加到表单中,具体的代码如下所示。 -```Python +```python from openpyxl import Workbook from openpyxl.chart import BarChart, Reference @@ -175,7 +175,7 @@ wb.save('demo.xlsx') 运行上面的代码,打开生成的 Excel 文件,效果如下图所示。 -image-20210819235009026 + ### 总结 diff --git "a/Day21-30/26.Python\346\223\215\344\275\234Word\345\222\214PowerPoint\346\226\207\344\273\266.md" "b/Day21-30/26.Python\346\223\215\344\275\234Word\345\222\214PowerPoint\346\226\207\344\273\266.md" index 3bbfaa7b6..3281ca8f4 100755 --- "a/Day21-30/26.Python\346\223\215\344\275\234Word\345\222\214PowerPoint\346\226\207\344\273\266.md" +++ "b/Day21-30/26.Python\346\223\215\344\275\234Word\345\222\214PowerPoint\346\226\207\344\273\266.md" @@ -12,7 +12,7 @@ pip install python-docx 按照[官方文档](https://python-docx.readthedocs.io/en/latest/)的介绍,我们可以使用如下所示的代码来生成一个简单的 Word 文档。 -```Python +```python from docx import Document from docx.shared import Cm, Pt @@ -91,7 +91,7 @@ document.save('demo.docx') 对于一个已经存在的 Word 文件,我们可以通过下面的代码去遍历它所有的段落并获取对应的内容。 -```Python +```python from docx import Document from docx.document import Document as Doc @@ -125,7 +125,7 @@ for no, p in enumerate(doc.paragraphs): 接下来我们读取该文件,将占位符替换为真实信息,就可以生成一个新的 Word 文档,如下所示。 -```Python +```python from docx import Document from docx.document import Document as 
Doc @@ -194,7 +194,7 @@ pip install python-pptx 用 Python 操作 PowerPoint 的内容,因为实际应用场景不算很多,我不打算在这里进行赘述,有兴趣的读者可以自行阅读`python-pptx`的[官方文档](https://python-pptx.readthedocs.io/en/latest/),下面仅展示一段来自于官方文档的代码。 -```Python +```python from pptx import Presentation # 创建幻灯片对象 diff --git "a/Day21-30/27.Python\346\223\215\344\275\234PDF\346\226\207\344\273\266.md" "b/Day21-30/27.Python\346\223\215\344\275\234PDF\346\226\207\344\273\266.md" index e67661e98..00ed13520 100755 --- "a/Day21-30/27.Python\346\223\215\344\275\234PDF\346\226\207\344\273\266.md" +++ "b/Day21-30/27.Python\346\223\215\344\275\234PDF\346\226\207\344\273\266.md" @@ -12,7 +12,7 @@ pip install PyPDF2 `PyPDF2`没有办法从 PDF 文档中提取图像、图表或其他媒体,但它可以提取文本,并将其返回为 Python 字符串。 -```Python +```python import PyPDF2 reader = PyPDF2.PdfReader('test.pdf') @@ -35,7 +35,7 @@ pdf2text.py test.pdf 上面的代码中通过创建`PdfFileReader`对象的方式来读取 PDF 文档,该对象的`getPage`方法可以获得PDF文档的指定页并得到一个`PageObject`对象,通过`PageObject`对象的`rotateClockwise`和`rotateCounterClockwise`方法可以实现页面的顺时针和逆时针方向旋转,通过`PageObject`对象的`addBlankPage`方法可以添加一个新的空白页,代码如下所示。 -```Python +```python reader = PyPDF2.PdfReader('XGBoost.pdf') writer = PyPDF2.PdfWriter() @@ -54,7 +54,7 @@ with open('temp.pdf', 'wb') as file_obj: 使用`PyPDF2`中的`PdfFileWrite`对象可以为PDF文档加密,如果需要给一系列的PDF文档设置统一的访问口令,使用Python程序来处理就会非常的方便。 -```Python +```python import PyPDF2 reader = PyPDF2.PdfReader('XGBoost.pdf') @@ -73,7 +73,7 @@ with open('temp.pdf', 'wb') as file_obj: 上面提到的`PageObject`对象还有一个名为`mergePage`的方法,可以两个 PDF 页面进行叠加,通过这个操作,我们很容易实现给PDF文件添加水印的功能。例如要给上面的“XGBoost.pdf”文件添加一个水印,我们可以先准备好一个提供水印页面的 PDF 文件,然后将包含水印的`PageObject`读取出来,然后再循环遍历“XGBoost.pdf”文件的每个页,获取到`PageObject`对象,然后通过`mergePage`方法实现水印页和原始页的合并,代码如下所示。 -```Python +```python reader1 = PyPDF2.PdfReader('XGBoost.pdf') reader2 = PyPDF2.PdfReader('watermark.pdf') writer = PyPDF2.PdfWriter() @@ -99,7 +99,7 @@ pip install reportlab 下面通过一个例子为大家展示`reportlab`的用法。 -```Python +```python from reportlab.lib.pagesizes import A4 from reportlab.pdfbase import pdfmetrics from 
reportlab.pdfbase.ttfonts import TTFont diff --git "a/Day21-30/28.Python\345\244\204\347\220\206\345\233\276\345\203\217.md" "b/Day21-30/28.Python\345\244\204\347\220\206\345\233\276\345\203\217.md" index 88fa12b33..bfb171ec7 100755 --- "a/Day21-30/28.Python\345\244\204\347\220\206\345\233\276\345\203\217.md" +++ "b/Day21-30/28.Python\345\244\204\347\220\206\345\233\276\345\203\217.md" @@ -25,7 +25,7 @@ Pillow 中最为重要的是`Image`类,可以通过`Image`模块的`open`函 1. 读取和显示图像 - ```Python + ```python from PIL import Image # 读取图像获得Image对象 @@ -44,7 +44,7 @@ Pillow 中最为重要的是`Image`类,可以通过`Image`模块的`open`函 2. 剪裁图像 - ```Python + ```python # 通过Image对象的crop方法指定剪裁区域剪裁图像 image.crop((80, 20, 310, 360)).show() ``` @@ -53,7 +53,7 @@ Pillow 中最为重要的是`Image`类,可以通过`Image`模块的`open`函 3. 生成缩略图 - ```Python + ```python # 通过Image对象的thumbnail方法生成指定尺寸的缩略图 image.thumbnail((128, 128)) image.show() @@ -63,7 +63,7 @@ Pillow 中最为重要的是`Image`类,可以通过`Image`模块的`open`函 4. 缩放和黏贴图像 - ```Python + ```python # 读取骆昊的照片获得Image对象 luohao_image = Image.open('luohao.png') # 读取吉多的照片获得Image对象 @@ -81,7 +81,7 @@ Pillow 中最为重要的是`Image`类,可以通过`Image`模块的`open`函 5. 旋转和翻转 - ```Python + ```python image = Image.open('guido.jpg') # 使用Image对象的rotate方法实现图像的旋转 image.rotate(45).show() @@ -95,7 +95,7 @@ Pillow 中最为重要的是`Image`类,可以通过`Image`模块的`open`函 6. 操作像素 - ```Python + ```python for x in range(80, 310): for y in range(20, 360): # 通过Image对象的putpixel方法修改图像指定像素点 @@ -107,7 +107,7 @@ Pillow 中最为重要的是`Image`类,可以通过`Image`模块的`open`函 7. 
滤镜效果 - ```Python + ```python from PIL import ImageFilter # 使用Image对象的filter方法对图像进行滤镜处理 @@ -125,7 +125,7 @@ Pillow 中有一个名为`ImageDraw`的模块,该模块的`Draw`函数会返 要绘制如上图所示的图像,完整的代码如下所示。 -```Python +```python import random from PIL import Image, ImageDraw, ImageFont diff --git "a/Day21-30/29.Python\345\217\221\351\200\201\351\202\256\344\273\266\345\222\214\347\237\255\344\277\241.md" "b/Day21-30/29.Python\345\217\221\351\200\201\351\202\256\344\273\266\345\222\214\347\237\255\344\277\241.md" index 7183ebdf0..f60598a69 100755 --- "a/Day21-30/29.Python\345\217\221\351\200\201\351\202\256\344\273\266\345\222\214\347\237\255\344\277\241.md" +++ "b/Day21-30/29.Python\345\217\221\351\200\201\351\202\256\344\273\266\345\222\214\347\237\255\344\277\241.md" @@ -16,7 +16,7 @@ 用手机扫码上面的二维码可以通过发送短信的方式来获取授权码,短信发送成功后,点击“我已发送”就可以获得授权码。授权码需要妥善保管,因为一旦泄露就会被其他人冒用你的身份来发送邮件。接下来,我们就可以编写发送邮件的代码了,如下所示。 -```Python +```python import smtplib from email.header import Header from email.mime.multipart import MIMEMultipart @@ -50,7 +50,7 @@ smtp_obj.sendmail( 下面的代码演示了如何发送带附件的邮件。 -```Python +```python import smtplib from email.header import Header from email.mime.multipart import MIMEMultipart @@ -95,7 +95,7 @@ smtp_obj.sendmail( 为了方便大家用 Python 实现邮件发送,我将上面的代码封装成了函数,使用的时候大家只需要调整邮件服务器域名、端口、用户名和授权码就可以了。 -```Python +```python import smtplib from email.header import Header from email.mime.multipart import MIMEMultipart @@ -151,7 +151,7 @@ def send_email(*, from_user, to_users, subject='', content='', filenames=[]): 接下来,我们可以通过`requests`库向平台提供的短信网关发起一个 HTTP 请求,通过将接收短信的手机号和短信内容作为参数,就可以发送短信,代码如下所示。 -```Python +```python import random import requests diff --git "a/Day21-30/30.\346\255\243\345\210\231\350\241\250\350\276\276\345\274\217\347\232\204\345\272\224\347\224\250.md" "b/Day21-30/30.\346\255\243\345\210\231\350\241\250\350\276\276\345\274\217\347\232\204\345\272\224\347\224\250.md" index b0d781b50..90a4d3ef1 100755 --- 
"a/Day21-30/30.\346\255\243\345\210\231\350\241\250\350\276\276\345\274\217\347\232\204\345\272\224\347\224\250.md" +++ "b/Day21-30/30.\346\255\243\345\210\231\350\241\250\350\276\276\345\274\217\347\232\204\345\272\224\347\224\250.md" @@ -72,7 +72,7 @@ Python 提供了`re`模块来支持正则表达式相关操作,下面是`re` #### 例子1:验证输入用户名和QQ号是否有效并给出对应的提示信息。 -```Python +```python """ 要求:用户名必须由字母、数字或下划线构成且长度在6~20个字符之间,QQ号是5~12的数字且首位不能为0 """ @@ -102,7 +102,7 @@ if m1 and m2: -```Python +```python import re # 创建正则表达式对象,使用了前瞻和回顾来保证手机号前后不应该再出现数字 @@ -131,7 +131,7 @@ while m: #### 例子3:替换字符串中的不良内容 -```Python +```python import re sentence = 'Oh, shit! 你是傻逼吗? Fuck you.' @@ -144,7 +144,7 @@ print(purified) # Oh, *! 你是*吗? * you. #### 例子4:拆分长字符串 -```Python +```python import re poem = '窗前明月光,疑是地上霜。举头望明月,低头思故乡。' diff --git "a/Day66-80/66.\346\225\260\346\215\256\345\210\206\346\236\220\346\246\202\350\277\260.md" "b/Day66-80/66.\346\225\260\346\215\256\345\210\206\346\236\220\346\246\202\350\277\260.md" index 7ea14f11f..4898beef0 100755 --- "a/Day66-80/66.\346\225\260\346\215\256\345\210\206\346\236\220\346\246\202\350\277\260.md" +++ "b/Day66-80/66.\346\225\260\346\215\256\345\210\206\346\236\220\346\246\202\350\277\260.md" @@ -1,14 +1,24 @@ ## 数据分析概述 -当今世界对信息技术的依赖程度在不断加深,每天都会有大量的数据产生,我们经常会感到数据越来越多,但是要从中发现有价值的信息却越来越难。这里所说的信息,可以理解为对数据集处理之后的结果,是从数据集中提炼出的可用于其他场合的结论性的东西,而**从原始数据中抽取出有价值的信息**的这个过程我们就称之为**数据分析**,它是数据科学工作的一部分。 +当今世界,各行各业对信息技术的依赖程度在不断加深,每天都会有大量的数据产生,我们常常会感到数据越来越多,但是要从中发现有价值的信息却越来越难。这里所说的信息,可以理解为对数据集处理之后的结果,是从数据集中提炼出的可用于支撑和指导决策的东西,而**从原始数据中抽取出有价值的信息**的这个过程我们就称之为**数据分析**,它是数据科学的重要组成部分。 -> 定义:**数据分析是有针对性的收集、加工、整理数据并采用统计、挖掘等技术对数据进行探索、分析、呈现和解释的科学**。 +> **定义1**:数据分析是有针对性的收集、加工、整理数据并采用统计、挖掘等技术对数据进行探索、分析、呈现和解释的科学。 +> +> **定义2**:数据分析是通过收集、整理和分析数据,从中提取有价值的信息和洞察,以支持决策和优化过程的活动。(GPT-4o) +> +> **定义3**:数据分析是通过系统性的收集、整理、处理、检验和解释数据,从中提取有价值的信息、形成结论并支持决策的过程,其核心是利用统计、算法和逻辑方法揭示数据背后的规律、趋势或关联。(DeepSeek) -### 数据分析师的职责和技能栈 +对于想从事数据分析工作的人来说,需要掌握两个部分的技能,一是“数据思维”,二是“分析工具”,如下图所示。 
-HR在发布招聘需求时,通常将数据工程、数据分析、数据挖掘等岗位都统称为数据分析岗位,但是根据工作性质的不同,又可以分为偏工程的**数据治理方向**、偏业务的**数据分析方向**、偏算法的**数据挖掘方向**、偏开发的**数据开发方向**、偏产品的**数据产品经理**。我们通常所说的数据分析师主要是指**业务数据分析师**,很多数据分析师的职业生涯都是从这个岗位开始的,而且这个岗位也是招聘数量最多的岗位。业务数据分析师在公司通常不属于研发部门而属于运营部门,所以这个岗位也称为**数据运营**或**商业分析**,这类人员通常也被称为“BI工程师”。通常招聘信息对这个岗位的描述(JD)是: +![](res/contents_of_data_analysis.png) + +上图中,分析工具部分其实是比较容易掌握的,像 SQL 或 Python 这样的编程语言,只要经过系统的学习和适量的练习,大部分人都是可以驾驭的;像 Power BI、Tableau 这样的商业智能工具,更是让我们通过“拖拉拽”操作就能完成数据的可视化并在此基础上产生商业洞察,上手难度会更低。相反,数据思维部分的内容对大多数新手来说是不太容易驾驭的,例如“统计思维”,很多人在读书的时候都学习过“概率论和统计学”这样的课程,但是当面对实际的业务场景时,却很难将这些知识映射到业务场景来解决现实的问题。此外,如果没有掌握基本的分析方法、没有理解常用的分析模型,没有相关业务知识的积累,即便我们拿到再多有用的数据,也会感觉无从下手,更不用说产生业务洞察、发现商业价值了。所以,数据思维这个部分,除了系统地学习相关知识和技能,还需要不断地在实际业务场景中积累和沉淀。 + +### 数据分析师的职责 + +HR在发布招聘需求时,通常将数据工程、数据分析、数据挖掘等岗位都统称为数据分析岗位,但是根据工作性质的不同,又可以分为偏工程的**数据治理方向**、偏业务的**商业分析方向**、偏算法的**数据挖掘方向**、偏应用的**数据开发方向**、偏产品的**数据产品经理**。我们通常所说的数据分析师主要是指**业务数据分析师**,很多数据分析师的职业生涯都是从这个岗位开始的,而且这个岗位也是招聘数量最多的岗位。有些公司会将业务数据分析师归属到具体的业务部门(市场、运营、产品等),有些公司有专门的数据部门(数据分析团队或数据科学团队),还有些公司的数据分析师会直接服务于高层决策,属于企业战略部门。正因如此,当你在招聘网站上看到数据分析师被称为**数据运营**、**商业分析师**或**BI工程师**时,就不会感到奇怪了。通常,我们在招聘网站看到的对业务数据分析师岗位职责(JD)的描述如下所示: 1. 负责相关报表的输出。 2. 建立和优化指标体系。 @@ -16,25 +26,31 @@ HR在发布招聘需求时,通常将数据工程、数据分析、数据挖掘 4. 优化和驱动业务,推动数字化运营。 5.
找出潜在的市场和产品的上升空间。 -根据上面的描述,作为业务数据分析师,我们的工作不是给领导一个简单浅显的结论,而是结合公司的业务,完成**监控数据**、**揪出异常**、**找到原因**、**探索趋势**等工作。作为数据分析师,不管是用 Python 语言、Excel、SPSS或其他的商业智能工具,工具只是达成目标的手段,**数据思维是核心技能**,从实际业务问题出发到最终**发现数据中的商业价值**是终极目标。数据分析师在很多公司只是一个基础岗位,精于业务的数据分析师可以向**数据分析经理**或**数据运营总监**等管理岗位发展;对于熟悉机器学习算法的数据分析师来说,可以向**数据挖掘工程师**或**算法专家**方向发展,而这些岗位除了需要相应的数学和统计学知识,在编程能力方面也比数据分析师有更高的要求,可能还需要有大数据存储和处理的相关经验。数据治理岗位主要是帮助公司建设数据仓库或数据湖,实现数据从业务系统、埋点系统、日志系统到分析库的转移,为后续的数据分析和挖掘提供基础设施。数据治理岗位对 SQL 和 HiveSQL 有着较高的要求,需要熟练的使用 ETL 工具,此外还需要对 Hadoop 生态圈有一个较好的认知。作为数据产品经理,除了传统产品经理的技能栈之外,也需要较强的技术能力,例如要了解常用的推荐算法、机器学习模型,能够为算法的改进提供依据,能够制定相关埋点的规范和口径,虽然不需要精通各种算法,但是要站在产品的角度去考虑数据模型、指标、算法等的落地。 +根据上面的描述,作为业务数据分析师,我们的工作不是给出一个简单浅显的结论,而是结合公司的业务,完成**监控数据**、**揪出异常**、**找到原因**、**探索趋势**等工作。不管你是用 Python 语言、Excel、Tableau、SPSS或其他的商业智能工具,工具只是达成目标的手段,**数据思维是核心技能**,从实际业务问题出发到最终**发现数据中的商业价值**是终极目标。数据分析师在很多公司只是一个基础岗位,精于业务的数据分析师可以向**数据分析经理**或**数据运营总监**等管理岗位发展;对于熟悉机器学习算法的数据分析师来说,可以向**数据挖掘工程师**或**算法专家**方向发展,这些岗位除了需要相应的数学和统计学知识,在编程能力方面也比数据分析师有更高的要求,可能还需要有大数据存储和处理的相关经验。 -以下是我总结的数据分析师的技能栈,仅供参考。 +这里顺便说一下其他几个方向,数据治理岗位主要是帮助公司建设数据仓库或数据湖,实现数据从业务系统、埋点系统、日志系统到数据仓库或数据湖的转移,为后续的数据分析和挖掘提供基础设施。数据治理岗位对 SQL 和 HiveSQL 有着较高的要求,需要熟练地使用 ETL 工具,此外还需要对 Hadoop 生态圈有较好的认知。作为数据产品经理,除了传统产品经理的技能栈之外,也需要较强的技术能力,例如要了解常用的推荐算法、机器学习模型,能够为算法的改进提供依据,能够制定相关埋点的规范和口径,虽然不需要精通各种算法,但是要站在产品的角度去考虑数据模型、指标、算法等的落地。 + +### 数据分析师的技能栈 + +数据分析师的技能栈包括硬技能和软技能,以下是我对这个职位的理解,仅供参考。 1. 计算机科学(数据分析工具、编程语言、数据库) 2. 数学和统计学(数据思维、统计思维) -3. 人工智能(机器学习中的数据挖掘算法) +3. 人工智能(机器学习和深度学习算法) 4. 业务理解能力(沟通、表达、经验) -5. 总结和表述能力(商业PPT、文字总结) +5. 总结和表述能力(总结、汇报、商业 PPT) + +当然,对于一个新手来说,不可能在短时间内掌握整个技能栈的内容,但是随着工作的深入,上面提到的东西多多少少都会有所涉猎,大家可以根据实际工作的需求去深耕其中的某个或某些技能。 -### 数据分析的流程 +### 数据分析通用流程 我们提到数据分析这个词很多时候可能指的都是**狭义的数据分析**,这类数据分析主要目标就是生成可视化报表并通过这些报表来洞察业务中的问题,这类工作一般都是具有滞后性的。**广义的数据分析**还包含了数据挖掘的部分,不仅要通过数据实现对业务的监控和分析,还要利用机器学习算法,找出隐藏在数据背后的知识,并利用这些知识为将来的决策提供支撑,具备一定的前瞻性。 基本的数据分析工作一般包含以下几个方面的内容,当然因为行业和工作内容的不同会略有差异。 1. 确定目标(输入):理解业务,确定指标口径 -2. 获取数据:数据仓库(SQL提数)、电子表格、三方接口、网络爬虫、开放数据集等 -3. 清洗数据:包括对缺失值、重复值、异常值的处理以及相关的预处理(格式化、离散化、二值化等) +2.
获取数据:数据仓库、电子表格、三方接口、网络爬虫、开放数据集等 +3. 清洗数据:缺失值、重复值、异常值的处理以及其他预处理(格式化、离散化、二值化等) 4. 数据透视:排序、统计、分组聚合、交叉表、透视表等 5. 数据呈现(输出):数据可视化,发布工作成果(数据分析报告) 6. 分析洞察(后续):解释数据的变化,提出对应的方案 @@ -50,17 +66,23 @@ HR在发布招聘需求时,通常将数据工程、数据分析、数据挖掘 ### 数据分析相关库 -使用 Python 从事数据科学相关的工作是一个非常棒的选择,因为 Python 整个生态圈中,有大量的成熟的用于数据科学的软件包(工具库)。而且不同于其他的用于数据科学的编程语言(如:Julia、R),Python 除了可以用于数据科学,还能做很多其他的事情,可以说 Python 语言几乎是无所不能的。 +使用 Python 从事数据分析相关的工作是一个非常棒的选择,首先 Python 语言非常容易上手,而且整个 Python 生态圈中,有非常多成熟的用于数据科学的软件包和工具库。不同于其他的数据科学编程语言(如:Julia、R等),Python 除了可以用于数据科学,还能做很多其他的事情。 -#### 三大神器 +#### 经典的三大神器 -1. [NumPy](https://numpy.org/):支持常见的数组和矩阵操作,通过`ndarray`类实现了对多维数组的封装,提供了操作这些数组的方法和函数集。由于 NumPy 内置了并行运算功能,当使用多核 CPU 时,Numpy会自动做并行计算。 -2. [Pandas](https://pandas.pydata.org/):pandas 的核心是其特有的数据结构`DataFrame`和`Series`,这使得 pandas 可以处理包含不同类型数据的表格和时间序列,这一点是NumPy的`ndarray`做不到的。使用 pandas,可以轻松顺利的加载各种形式的数据,然后对数据进行切片、切块、处理缺失值、聚合、重塑和可视化等操作。 +1. [NumPy](https://numpy.org/):支持常见的数组和矩阵操作,通过`ndarray`类实现了对多维数组的封装,提供了操作这些数组的方法和函数。NumPy 的底层运算由编译后的 C 代码实现,其中矩阵乘法等线性代数运算会调用 BLAS/LAPACK 等库,在多核 CPU 上通常可以自动使用多线程并行计算。 +2. [Pandas](https://pandas.pydata.org/):pandas 的核心是其特有的数据结构`DataFrame`和`Series`,这使得 pandas 可以处理包含不同类型数据的表格和时间序列,这一点是 NumPy 的`ndarray`做不到的。使用 pandas,可以轻松地加载各种形式的数据,然后对数据进行切片、切块、重塑、清洗、聚合、呈现等操作。 3. [Matplotlib](https://matplotlib.org/):matplotlib 是一个包含各种绘图模块的库,能够根据我们提供的数据创建高质量的图表。此外,matplotlib 还提供了 pylab 模块,这个模块包含了很多像 [MATLAB](https://www.mathworks.com/products/matlab.html) 一样的绘图组件。 #### 其他相关库 1. [SciPy](https://scipy.org/):完善了 NumPy 的功能,封装了大量科学计算的算法,包括线性代数、统计检验、稀疏矩阵、信号和图像处理、最优化问题、快速傅里叶变换等。 -2. [Seaborn](https://seaborn.pydata.org/):seaborn 是基于 matplotlib 的图形可视化工具,直接使用 matplotlib 虽然可以定制出漂亮的统计图表,但是总体来说还不够简单方便,seaborn 相当于是对 matplotlib 做了封装,让用户能够以更简洁有效的方式做出各种有吸引力的统计图表。 -3. [Scikit-learn](https://scikit-learn.org/):scikit-learn 最初是 SciPy 的一部分,提供了大量机器学习可能用到的工具,包括数据预处理、监督学习(分类、回归)、无监督学习(聚类)、模式选择、交叉检验等。 -4. [Statsmodels](https://www.statsmodels.org/stable/index.html):包含了经典统计学和经济计量学算法的库。 +2.
[Polars](https://pola.rs/):一个高性能的数据分析库,旨在提供比 pandas 更快的数据操作。它支持大规模数据处理,并能够利用多核 CPU 来加速计算,在处理大规模数据集时可以用来替代 pandas。 +3. [Seaborn](https://seaborn.pydata.org/):seaborn 是基于 matplotlib 的图形可视化工具,直接使用 matplotlib 虽然可以定制出漂亮的统计图表,但是总体来说还不够简单方便,seaborn 相当于是对 matplotlib 做了封装,让用户能够以更简洁有效的方式做出各种有吸引力的统计图表。 +4. [Scikit-learn](https://scikit-learn.org/):scikit-learn 最初是作为 SciPy 的扩展工具包(SciKit)开发的,提供了大量机器学习可能用到的工具,包括数据预处理、监督学习(分类、回归)、无监督学习(聚类)、模型选择、交叉验证等。 +5. [Statsmodels](https://www.statsmodels.org/stable/index.html):包含了经典统计学和计量经济学算法的库,帮助用户完成数据探索、回归分析、假设检验等任务。 +6. [PySpark](https://spark.apache.org/):Apache Spark(大数据处理引擎)的 Python API,用于大规模数据处理和分布式计算,能够在分布式环境中高效地进行数据清洗、转化和分析。 +7. [TensorFlow](https://www.tensorflow.org/):TensorFlow 是一个开源的深度学习框架,由 Google 开发,主要面向深度学习任务,常用于构建和训练机器学习模型(尤其是复杂的神经网络模型)。 +8. [Keras](https://keras.io/):Keras 是一个高层次的神经网络 API,主要用于构建和训练深度学习模型。Keras 适合深度学习初学者和研究人员,因为它让构建和训练神经网络变得更加简单。 +9. [PyTorch](https://pytorch.org/):PyTorch 是一个开源的深度学习框架,由 Facebook 开发,广泛用于研究和生产环境。PyTorch 是深度学习研究中的热门框架,在计算机视觉、自然语言处理等领域得到了广泛应用。 +10.
[NLTK](https://www.nltk.org/) / [SpaCy](https://spacy.io/):自然语言处理(NLP)库。 diff --git "a/Day66-80/data/2018\345\271\264\345\214\227\344\272\254\347\247\257\345\210\206\350\220\275\346\210\267\346\225\260\346\215\256.csv" "b/Day66-80/code/data/2018\345\271\264\345\214\227\344\272\254\347\247\257\345\210\206\350\220\275\346\210\267\346\225\260\346\215\256.csv" similarity index 100% rename from "Day66-80/data/2018\345\271\264\345\214\227\344\272\254\347\247\257\345\210\206\350\220\275\346\210\267\346\225\260\346\215\256.csv" rename to "Day66-80/code/data/2018\345\271\264\345\214\227\344\272\254\347\247\257\345\210\206\350\220\275\346\210\267\346\225\260\346\215\256.csv" diff --git "a/Day66-80/data/2020\345\271\264\351\224\200\345\224\256\346\225\260\346\215\256.xlsx" "b/Day66-80/code/data/2020\345\271\264\351\224\200\345\224\256\346\225\260\346\215\256.xlsx" similarity index 100% rename from "Day66-80/data/2020\345\271\264\351\224\200\345\224\256\346\225\260\346\215\256.xlsx" rename to "Day66-80/code/data/2020\345\271\264\351\224\200\345\224\256\346\225\260\346\215\256.xlsx" diff --git "a/Day66-80/data/2022\345\271\264\350\202\241\347\245\250\346\225\260\346\215\256.xlsx" "b/Day66-80/code/data/2022\345\271\264\350\202\241\347\245\250\346\225\260\346\215\256.xlsx" similarity index 100% rename from "Day66-80/data/2022\345\271\264\350\202\241\347\245\250\346\225\260\346\215\256.xlsx" rename to "Day66-80/code/data/2022\345\271\264\350\202\241\347\245\250\346\225\260\346\215\256.xlsx" diff --git "a/Day66-80/data/2023\345\271\264\345\214\227\344\272\254\347\247\257\345\210\206\350\220\275\346\210\267\346\225\260\346\215\256.csv" "b/Day66-80/code/data/2023\345\271\264\345\214\227\344\272\254\347\247\257\345\210\206\350\220\275\346\210\267\346\225\260\346\215\256.csv" similarity index 100% rename from "Day66-80/data/2023\345\271\264\345\214\227\344\272\254\347\247\257\345\210\206\350\220\275\346\210\267\346\225\260\346\215\256.csv" rename to 
"Day66-80/code/data/2023\345\271\264\345\214\227\344\272\254\347\247\257\345\210\206\350\220\275\346\210\267\346\225\260\346\215\256.csv" diff --git a/Day66-80/data/boston_house_price.csv b/Day66-80/code/data/boston_house_price.csv similarity index 100% rename from Day66-80/data/boston_house_price.csv rename to Day66-80/code/data/boston_house_price.csv diff --git "a/Day66-80/data/\346\237\220\346\213\233\350\201\230\347\275\221\347\253\231\346\213\233\350\201\230\346\225\260\346\215\256.csv" "b/Day66-80/code/data/\346\237\220\346\213\233\350\201\230\347\275\221\347\253\231\346\213\233\350\201\230\346\225\260\346\215\256.csv" similarity index 100% rename from "Day66-80/data/\346\237\220\346\213\233\350\201\230\347\275\221\347\253\231\346\213\233\350\201\230\346\225\260\346\215\256.csv" rename to "Day66-80/code/data/\346\237\220\346\213\233\350\201\230\347\275\221\347\253\231\346\213\233\350\201\230\346\225\260\346\215\256.csv" diff --git a/Day66-80/code/day01.ipynb b/Day66-80/code/day01.ipynb new file mode 100644 index 000000000..0a8d1c614 --- /dev/null +++ b/Day66-80/code/day01.ipynb @@ -0,0 +1,602 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "c664c108-059f-402a-b216-5ba4caa2d98b", + "metadata": {}, + "source": [ + "## Python数据分析第1天\n", + "\n", + "### 热身练习\n", + "\n", + "如下列表保存着本公司从2022年1月到12月五个销售区域(南京、无锡、苏州、徐州、南通)的销售额(以百万元为单位),请利用这些数据完成以下操作:\n", + "\n", + "```python\n", + "sales_month = [f'{i:>2d}月' for i in range(1, 13)]\n", + "sales_area = ['南京', '无锡', '苏州', '徐州', '南通']\n", + "sales_data = [\n", + " [32, 17, 12, 20, 28],\n", + " [41, 30, 17, 15, 35],\n", + " [35, 18, 13, 11, 24],\n", + " [12, 42, 44, 21, 34],\n", + " [29, 11, 42, 32, 50],\n", + " [10, 15, 11, 12, 26],\n", + " [16, 28, 48, 22, 28],\n", + " [31, 40, 45, 30, 39],\n", + " [25, 41, 47, 42, 47],\n", + " [47, 21, 13, 49, 48],\n", + " [41, 36, 17, 36, 22],\n", + " [22, 25, 15, 20, 37]\n", + "]\n", + "```\n", + "\n", + "1. 统计本公司每个月的销售额。\n", + "2. 统计本公司销售额的月环比。\n", + "3. 
统计每个销售区域全年的销售额。\n", + "4. 按销售额从高到低排序销售区域及其销售额。\n", + "5. 统计全年最高的销售额出现在哪个月哪个区域。\n", + "6. 找出哪个销售区域的业绩最不稳定。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f9d87cfc-deb0-46eb-b98c-2799a4908bc8", + "metadata": {}, + "outputs": [], + "source": [ + "sales_month = [f'{i:>2d}月' for i in range(1, 13)]\n", + "sales_area = ['南京', '无锡', '苏州', '徐州', '南通']\n", + "sales_data = [\n", + " [32, 17, 12, 20, 28],\n", + " [41, 30, 17, 15, 35],\n", + " [35, 18, 13, 11, 24],\n", + " [12, 42, 44, 21, 34],\n", + " [29, 11, 42, 32, 50],\n", + " [10, 15, 11, 12, 26],\n", + " [16, 28, 48, 22, 28],\n", + " [31, 40, 45, 30, 39],\n", + " [25, 41, 47, 42, 47],\n", + " [47, 21, 13, 49, 48],\n", + " [41, 36, 17, 36, 22],\n", + " [22, 25, 15, 20, 37]\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dc581dfc-9108-46fa-ace2-60ace650434e", + "metadata": {}, + "outputs": [], + "source": [ + "# 魔法指令 - %whos - 查看变量\n", + "%whos" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a50e4c3e-6dc1-426f-977b-aef9a5c9a02f", + "metadata": {}, + "outputs": [], + "source": [ + "print = 100" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4c0b54ca-1556-4a14-9a6a-b6bd6af5d822", + "metadata": {}, + "outputs": [], + "source": [ + "# 魔法指令 - %xdel - 删除变量\n", + "%xdel print" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fe8eb05f-f45b-491a-b98e-6f6c924997ff", + "metadata": {}, + "outputs": [], + "source": [ + "# 1. 统计本公司每个月的销售额。\n", + "monthly_sales = []\n", + "for i, month in enumerate(sales_month):\n", + " monthly_sales.append(sum(sales_data[i]))\n", + " print(f'{month}销售额: {monthly_sales[i]}百万')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "53e6bf88-e6a9-4ac9-a7fe-bd1d18ff88f5", + "metadata": {}, + "outputs": [], + "source": [ + "# 2. 
统计本公司销售额的月环比。\n", + "for i in range(1, len(monthly_sales)):\n", + " temp = (monthly_sales[i] - monthly_sales[i - 1]) / monthly_sales[i - 1]\n", + " print(f'{sales_month[i]}: {temp:.2%}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f5a130d6-b781-4ee3-a96b-d1fe5e3b4b90", + "metadata": {}, + "outputs": [], + "source": [ + "# 3. 统计每个销售区域全年的销售额。\n", + "arealy_sales = {}\n", + "for j, area in enumerate(sales_area):\n", + " temp = [sales_data[i][j] for i in range(len(sales_month))]\n", + " arealy_sales[area] = sum(temp)\n", + " print(f'{area}: {arealy_sales[area]}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a7bd0510-5e68-4e58-ac3b-6c531f7abccb", + "metadata": {}, + "outputs": [], + "source": [ + "# 4. 按销售额从高到低排序销售区域及其销售额。\n", + "sorted_keys = sorted(arealy_sales, key=lambda x: arealy_sales[x], reverse=True)\n", + "for key in sorted_keys:\n", + " print(f'{key}: {arealy_sales[key]}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b4b2f3e8-c5c2-481e-b277-9623d30892ac", + "metadata": {}, + "outputs": [], + "source": [ + "# 5. 统计全年最高的销售额出现在哪个月哪个区域。\n", + "max_value = sales_data[0][0]\n", + "max_i, max_j = 0, 0\n", + "for i in range(len(sales_month)):\n", + " for j in range(len(sales_area)):\n", + " temp = sales_data[i][j]\n", + " if temp > max_value:\n", + " max_value = temp\n", + " max_i, max_j = i, j\n", + "print(sales_month[max_i], sales_area[max_j])" + ] + }, + { + "cell_type": "markdown", + "id": "647d0a87-b672-4e0c-81cc-a3bbb76dca11", + "metadata": {}, + "source": [ + "总体方差:\n", + "$$\n", + "\\sigma^{2} = \\frac{1}{N} \\sum_{i=1}^{N}(x_{i} - \\mu)^{2}\n", + "$$\n", + "\n", + "样本方差:\n", + "$$\n", + "s^{2} = \\frac{1}{n - 1} \\sum_{i=1}^{n}(x_{i} - \\bar{x})^{2}\n", + "$$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b43fb247-32fc-4e10-a9ee-488fd1f56a9a", + "metadata": {}, + "outputs": [], + "source": [ + "# 6. 
找出哪个销售区域的业绩最不稳定。\n", + "import statistics as stats\n", + "\n", + "arealy_vars = []\n", + "for j, area in enumerate(sales_area):\n", + " temp = [sales_data[i][j] for i in range(len(sales_month))]\n", + " arealy_vars.append(stats.pvariance(temp))\n", + "sales_area[arealy_vars.index(max(arealy_vars))]" + ] + }, + { + "cell_type": "markdown", + "id": "3ea677d0-7a33-43e5-b10b-ddfcb82f7f6a", + "metadata": {}, + "source": [ + "### 三大神器\n", + "\n", + "1. numpy - Numerical Python - 核心是`ndarray`类型,可以用来表示N维数组,提供了一系列处理数据的运算、函数和方法。\n", + "2. pandas - Panel Data Set - 封装了和数据分析(加载、重塑、清洗、预处理、透视、呈现)相关的类型、函数和诸多的方法,为数据分析提供了一站式解决方案。它的核心有三个数据类型,分别是:`Series`、`DataFrame`、`Index`。\n", + "3. matplotlib - 封装了各种常用的统计图表,帮助我们实现数据呈现。\n", + "4. scipy - Scientific Python - 针对NumPy进行了很好的补充,提供了高级的数据运算的函数和方法。\n", + "5. scikit-learn - 封装了常用的机器学习(分类、聚类、回归等)算法,除此之外,还提供了数据预处理、特征工程、模型验证相关的函数和方法。\n", + "6. sympy - Symbolic Python - 封装了符号运算相关操作。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0db758cc-d83c-47c4-9a0b-c7ef5abd6c18", + "metadata": {}, + "outputs": [], + "source": [ + "# 魔法指令 - %pip - 调用包管理工具pip\n", + "# %pip install numpy pandas matplotlib openpyxl" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8eb6970b-3907-4b84-af60-67cbf67f2e74", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "\n", + "plt.rcParams['font.sans-serif'].insert(0, 'SimHei')\n", + "plt.rcParams['axes.unicode_minus'] = False" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5fb76dec-cd51-4e79-9bd2-3b210ae20522", + "metadata": {}, + "outputs": [], + "source": [ + "np.__version__" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e6369df9-7577-496c-bfc1-2fce096c0162", + "metadata": {}, + "outputs": [], + "source": [ + "pd.__version__" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eb5733cd-38f7-4afd-b45b-70c1439ab36b", 
+ "metadata": {}, + "outputs": [], + "source": [ + "# 将嵌套列表处理成二维数组\n", + "data = np.array(sales_data)\n", + "data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "da304104-8cf0-4425-b3b4-dcb148ac4b3a", + "metadata": {}, + "outputs": [], + "source": [ + "# 沿着1轴求和(每个月的销售额)\n", + "data.sum(axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1507ac63-f53b-4e36-a7fb-b9c636fd81ea", + "metadata": {}, + "outputs": [], + "source": [ + "# 沿着0轴求和(每个区域的销售)\n", + "data.sum(axis=0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "26be450d-44ba-4d83-9351-c52a13c2c338", + "metadata": {}, + "outputs": [], + "source": [ + "# 总体方差\n", + "data.var(axis=0).round(1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "81e5b2a0-c86e-4720-909f-ce8b1b6fdd58", + "metadata": {}, + "outputs": [], + "source": [ + "# 样本方差\n", + "data.var(axis=0, ddof=1).round(1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ba4e0f0a-e711-4041-8834-1e3be86ce8a4", + "metadata": {}, + "outputs": [], + "source": [ + "# 构造DataFrame对象(处理二维数据)\n", + "df = pd.DataFrame(data, columns=sales_area, index=sales_month)\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9d1a6a43-6dfc-41e3-98c8-be2681e0d547", + "metadata": {}, + "outputs": [], + "source": [ + "# 求和(默认沿着0轴)\n", + "df.sum()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a478ec0e-499f-4e31-b8c2-ba45e691b834", + "metadata": {}, + "outputs": [], + "source": [ + "# 排序\n", + "df.sum().sort_values(ascending=False)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6f221833-855c-45ad-91b2-e3f4da627704", + "metadata": {}, + "outputs": [], + "source": [ + "# 求和(指定沿着1轴)\n", + "df.sum(axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "80df8865-4ea0-4c72-a581-215cd953cfbe", + "metadata": {}, + "outputs": [], + "source": [ + "# 计算月环比\n", + 
"df.sum(axis=1).pct_change()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ea4579c3-11cd-4179-9c96-8dbe9a033da2", + "metadata": {}, + "outputs": [], + "source": [ + "df['合计'] = df.sum(axis=1)\n", + "df['月环比'] = df['合计'].pct_change()\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3c660052-dded-4a0a-8b72-7747d3cae816", + "metadata": {}, + "outputs": [], + "source": [ + "# 渲染DataFrame\n", + "df.style.format(\n", + " formatter={'月环比': '{:.2%}'},\n", + " na_rep='------'\n", + ").bar(\n", + " subset='合计'\n", + ").background_gradient(\n", + " 'RdYlBu', subset='月环比'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a092c12c-dab6-4272-b1cd-5218998fcd90", + "metadata": {}, + "outputs": [], + "source": [ + "# 将DataFrame输出到Excel文件\n", + "df.to_excel('sales.xlsx', sheet_name='data')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "54c3f505-e866-4c4e-a3f8-f55a71a95c3f", + "metadata": {}, + "outputs": [], + "source": [ + "# 魔法指令 - %config - 修改配置\n", + "# %config InlineBackend.figure_format = 'svg'\n", + "get_ipython().run_line_magic('config', 'InlineBackend.figure_format = \"svg\"')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3951055d-d5d2-4e4e-bbe7-a1b40a6731e0", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制柱状图\n", + "plt.figure(figsize=(8, 4), dpi=200)\n", + "df.plot(ax=plt.gca(), kind='bar', y='合计', legend=False)\n", + "plt.xticks(rotation=0)\n", + "plt.savefig('aa.png')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "8a5236f7-072b-466c-9be3-afbab394f5cb", + "metadata": {}, + "source": [ + "### 魔法指令" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d5c6a18b-2863-4855-8ef7-2c0aa99b7d5c", + "metadata": {}, + "outputs": [], + "source": [ + "# 查看当前工作路径 - print working directory\n", + "%pwd" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"80a9f9e0-1528-40cf-910c-f3c8e5e7e3b9", + "metadata": {}, + "outputs": [], + "source": [ + "# 查看指定路径文件列表 - list directory contents\n", + "%ls" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "620a54ed-9c29-4058-9d20-c4df72ba4c62", + "metadata": {}, + "outputs": [], + "source": [ + "# 执行系统命令\n", + "%system date" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "659215ed-113a-4d8f-9036-0fcf47c96021", + "metadata": {}, + "outputs": [], + "source": [ + "# 保存运行过的代码\n", + "%save temp.py" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8fc9c4e4-1423-40f3-b4ee-db2ba2e5d125", + "metadata": {}, + "outputs": [], + "source": [ + "# 加载指定文件内容\n", + "%load temp.py" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "58a08283-561c-43d4-8db6-74cde401b8a9", + "metadata": {}, + "outputs": [], + "source": [ + "# 统计代码执行时间\n", + "%timeit (1, 2, 3, 4, 5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "22a271ab-3f5c-4167-b89e-66a31e891cbd", + "metadata": {}, + "outputs": [], + "source": [ + "# 查看历史输入\n", + "%hist" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d4ffa792-f1a0-4be9-b2aa-642ee0b9a1ae", + "metadata": {}, + "outputs": [], + "source": [ + "# 查看魔法指令\n", + "%lsmagic" + ] + }, + { + "cell_type": "markdown", + "id": "a15db907-c068-41d7-a24c-8f1c5c20d4ec", + "metadata": {}, + "source": [ + "### 获取帮助" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5e037694-9357-46b9-864a-c5f93e1aa8c8", + "metadata": {}, + "outputs": [], + "source": [ + "np.random?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "11a97abd-d73d-493e-b727-9c4ded3e5060", + "metadata": {}, + "outputs": [], + "source": [ + "np.random.normal?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "66503921-cd69-4394-80ea-7fecf6ecdc33", + "metadata": {}, + "outputs": [], + "source": [ + "np.random.r*?" 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/Day66-80/code/day02.ipynb b/Day66-80/code/day02.ipynb new file mode 100644 index 000000000..9113ae0c9 --- /dev/null +++ b/Day66-80/code/day02.ipynb @@ -0,0 +1,10086 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "f11bbe2b-7325-4bf0-abfb-4b4e64292145", + "metadata": {}, + "source": [ + "## Getting Started with NumPy\n", + "\n", + "NumPy is the most important foundation among Python's third-party data science libraries. It provides data storage and computation, and many other data-science libraries are built on top of it. The core of NumPy is the `ndarray` data type, which represents an array with any number of dimensions. Compared with Python's `list`, it has the following advantages:\n", + "\n", + "1. Better performance: it can take advantage of the hardware's parallel computing capabilities and cache optimizations, and can be orders of magnitude faster than a `list` when processing data.\n", + "2. More powerful functionality: `ndarray` provides a rich set of operations and methods for processing data, and NumPy also provides a large number of functions for array manipulation.\n", + "3. Vectorized operations: NumPy functions and `ndarray` methods act on the whole array without explicit loops, which makes the code simpler and more elegant." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "15630f70-be3c-4690-96a6-0b134a685efb", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "\n", + "plt.rcParams['font.sans-serif'].insert(0, 'SimHei')\n", + "plt.rcParams['axes.unicode_minus'] = False" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "8a115f09-0477-4e62-a910-d9284f32fbd1", + "metadata": {}, + "outputs": [], + "source": [ + "# %save hello.py" + ] + }, + { + "cell_type": "markdown", + "id": "a8c3b5a1-9ad1-4511-b7f8-beacce89cf69", + "metadata": {}, + "source": [ + "### Creating Array Objects\n", + "\n", + "1. Convert a list into an array object with the `array`/`asarray` function\n", + "2. Create an array object with the `arange` function by specifying the start, stop and step\n", + "3. Create an arithmetic sequence with the `linspace` function by specifying the start, stop and number of elements\n", + "4. Create a geometric sequence with the `logspace` function by specifying the start (exponent), stop (exponent), number of elements and base (default 10)\n", + "5. Create an array object by reading data from a string or file with the `fromstring`/`fromfile` function\n", + "6. Create an array object from the data produced by an iterator with the `fromiter` function\n", + "7. Create an array object by generating random elements\n", + "8. Create an array object whose elements are all 0 with the `zeros`/`zeros_like` function\n", + "9. Create an array object whose elements are all 1 with the `ones`/`ones_like` function\n", + "10. Create an array object filled with a specified value with the `full` function\n", + "11. Create an identity matrix with the `eye` function\n", + "12. 
Create an array object by repeating elements with the `tile`/`repeat` function" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "a035b671-2f91-473c-ac2b-e25291cf664b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([1, 2, 3, 4, 5], dtype=int32)" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Method 1: convert a list into an array object with the array function\n", + "array1 = np.array([1, 2, 3, 4, 5], dtype='i4')\n", + "array1" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "28a8c2f7-c197-4d5e-9006-d99242edefee", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "numpy.ndarray" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(array1)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "95eae152-6bfc-4707-bd04-50c7363e9315", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1, 2, 3],\n", + " [4, 5, 6]])" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array2 = np.array([[1, 2, 3], [4, 5, 6]])\n", + "array2" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "3c662a6d-d017-4a85-b93c-8538f334db22", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([1, 2, 3, 4, 5, 6, 7, 8, 9])" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Method 2: create an array object over a range with the arange function\n", + "array3 = np.arange(1, 10)\n", + "array3" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "954605a1-1004-4595-ac90-523feac7f4e9", + "metadata": {}, + "outputs": [ + { + "data": { +
"text/plain": [ + "array([ 1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 49,\n", + " 52, 55, 58, 61, 64, 67, 70, 73, 76, 79, 82, 85, 88, 91, 94, 97])" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array4 = np.arange(1, 100, 3)\n", + "array4" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "e479f343-7033-4b73-a0ce-e7f04444e915", + "metadata": {}, + "outputs": [], + "source": [ + "# Method 3: create an arithmetic sequence with the linspace function\n", + "array5 = np.linspace(-2 * np.pi, 2 * np.pi, 120)\n", + "array6 = np.sin(array5)\n", + "array7 = np.cos(array5)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "57b0e126-3276-4f10-b1d1-288597842d35", + "metadata": {}, + "outputs": [], + "source": [ + "%config InlineBackend.figure_format = 'svg'\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "62a6b0ed-acaf-45a1-90b0-7caf73184d09", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "[SVG output: line plot of sin and cos over [-2π, 2π], created with Matplotlib v3.9.2]\n" + ], + "text/plain": [ + "
" ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.figure(figsize=(8, 4))\n", + "# Draw line charts\n", + "plt.plot(array5, array6, marker='.', color='darkgreen')\n", + "plt.plot(array5, array7, marker='.', color='coral')\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "fc6ed6f5-6844-47df-8dde-fd9e598af48d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024])" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Method 4: create a geometric sequence with the logspace function\n", + "array8 = np.logspace(0, 10, num=11, base=2, dtype='i8')\n", + "array8" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "f3d2ba81-cf13-4d96-afef-f9221b4b4a68", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 1, 11, 111, 2, 22, 222])" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Method 5: create an array by reading data with fromstring/fromfile/fromregex (here, from a string)\n", + "array9 = np.fromstring('1, 11, 111, 2, 22, 222', sep=',', dtype='i8')\n", + "array9" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "1c691969-f93c-4e32-9ba5-ab5e074a6409", + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.core.interactiveshell import InteractiveShell\n", + "\n", + "InteractiveShell.ast_node_interactivity = 'last_expr'" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "d8eb5b58-d129-459f-9969-543191fb1966", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47])" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array10 = np.fromfile('res/prime.txt', dtype='i8', sep='\\n', count=15)\n", + "array10" + ] + }, + { + "cell_type": "code", + "execution_count": 
15, + "id": 
"ff01ce19-fc81-41db-88a2-647299ec940c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Interviewer: What is an iterator in Python, and how is it related to a generator?\n", + "# An iterator is an object that implements the iterator protocol, which in Python consists of two magic methods: __iter__ and __next__\n", + "# Values can be obtained from an iterator with the next function or a for-in loop\n", + "# Writing an iterator by hand is relatively tedious, so Python lets you create a generator instead, which simplifies the iterator syntax\n", + "\n", + "\n", + "def fib(count):\n", + "    a, b = 0, 1\n", + "    for _ in range(count):\n", + "        a, b = b, a + b\n", + "        yield a\n", + "\n", + "\n", + "gen = fib(50)\n", + "gen" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "5a2c15a6-5744-4eb6-8f94-3132f7e0b1b6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 1, 1, 2, 3, 5,\n", + " 8, 13, 21, 34, 55,\n", + " 89, 144, 233, 377, 610,\n", + " 987, 1597, 2584, 4181, 6765,\n", + " 10946, 17711, 28657, 46368, 75025,\n", + " 121393, 196418, 317811, 514229, 832040,\n", + " 1346269, 2178309, 3524578, 5702887, 9227465,\n", + " 14930352, 24157817, 39088169, 63245986, 102334155,\n", + " 165580141, 267914296, 433494437, 701408733, 1134903170,\n", + " 1836311903, 2971215073, 4807526976, 7778742049, 12586269025])" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Method 6: create an array object from the data produced by an iterator with the fromiter function\n", + "array11 = np.fromiter(fib(50), dtype='i8')\n", + "array11" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "ca62e6fb-f0c9-4f12-acdb-978507e37f94", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[90, 45, 91, 71],\n", + " [85, 2, 98, 76],\n", + " [58, 50, 72, 13],\n", + " [66, 90, 26, 69],\n", + " [23, 44, 68, 98]])" + ] + }, + 
"execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Method 7: create an array object from randomly generated elements\n", + "array12 = np.random.randint(0, 101, (5, 4))\n", + "array12" + ] + }, + { + "cell_type": "code", +
"execution_count": 18, + "id": "2ec313a2-bdd6-492f-9176-172b1ec54534", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([0.35742922, 0.49173669, 0.14993948, 0.15556126, 0.48435648,\n", + " 0.57329703, 0.7256331 , 0.96709102, 0.79687864, 0.95782978])" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array13 = np.random.random(10)\n", + "array13" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "c8ca9327-c30e-4068-b0bc-c18c49a48c89", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([155., 173., 172., ..., 171., 176., 157.])" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array14 = np.random.normal(169, 8.5, 5000).round(0)\n", + "array14" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "e6f0fe78-41bf-45c9-a433-cf3cc40cdc11", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "[SVG output: histogram of array14 with 15 bins, created with Matplotlib v3.9.2]\n" + ], +
"text/plain": [ + "
" ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Draw a histogram\n", + "plt.hist(array14, bins=15, color='#6B8A7A')\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "88de114b-c346-4150-b33c-f2d14dd84193", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[0., 0., 0., 0.],\n", + " [0., 0., 0., 0.],\n", + " [0., 0., 0., 0.],\n", + " [0., 0., 0., 0.],\n", + " [0., 0., 0., 0.]])" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Method 8: create an array object whose elements are all 0 with the zeros/zeros_like function\n", + "array15 = np.zeros((5, 4))\n", + "array15" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "b03bc9d0-1274-46a4-a9b6-28f634a9d034", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[0, 0, 0],\n", + " [0, 0, 0]])" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array16 = np.zeros_like(array2)\n", + "array16" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "db9f61e1-d50a-4f7f-a1b4-66798c0976ef", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1., 1., 1., 1.],\n", + " [1., 1., 1., 1.],\n", + " [1., 1., 1., 1.],\n", + " [1., 1., 1., 1.],\n", + " [1., 1., 1., 1.]])" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Method 9: create an array object whose elements are all 1 with the ones/ones_like function\n", + "array17 = np.ones((5, 4))\n", + "array17" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "07ddd6d3-2e91-4a1b-857d-5f8b6867904d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1, 1, 1],\n", + " [1, 1, 1]])" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array18 = np.ones_like(array2)\n", + "array18" + ] + }, + { + 
"cell_type": "code", + "execution_count": 25, +
"id": "6b45e1db-8e1c-4af9-9495-c3a7f06b9311", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[100, 100, 100, 100],\n", + " [100, 100, 100, 100],\n", + " [100, 100, 100, 100],\n", + " [100, 100, 100, 100],\n", + " [100, 100, 100, 100]])" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Method 10: create an array object with a given shape and fill value using the full function\n", + "array19 = np.full((5, 4), 100)\n", + "array19" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "1ae4620a-11bd-464b-abc2-83f7f8b7e8ba", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n", + " [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],\n", + " [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],\n", + " [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],\n", + " [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],\n", + " [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],\n", + " [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],\n", + " [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Method 11: create an identity matrix with the eye function\n", + "# identity matrix --> I --> eye\n", + "array20 = np.eye(10)\n", + "array20" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "0ae939c8-0977-42b0-8d8a-351f55b0471e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3,\n", + " 3, 3, 3, 3, 3, 3, 3, 3])" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Method 12: create an array object by repeating elements with the repeat/tile function\n", + "array21 = np.repeat([1, 2, 3], 10)\n", + "array21" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": 
"c1a8d343-a30d-4144-baa8-b000a65c070d", + "metadata": {}, + "outputs": [ + {
"data": { + "text/plain": [ + "array([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1,\n", + " 2, 3, 1, 2, 3, 1, 2, 3])" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array22 = np.tile([1, 2, 3], 10)\n", + "array22" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "96e8c639-1a14-453c-89a5-6fd74ca89a88", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[[ 36, 33, 28],\n", + " [ 36, 33, 28],\n", + " [ 36, 33, 28],\n", + " ...,\n", + " [ 32, 31, 29],\n", + " [ 32, 31, 27],\n", + " [ 31, 32, 26]],\n", + "\n", + " [[ 37, 34, 29],\n", + " [ 38, 35, 30],\n", + " [ 38, 35, 30],\n", + " ...,\n", + " [ 31, 30, 28],\n", + " [ 31, 30, 26],\n", + " [ 30, 31, 25]],\n", + "\n", + " [[ 38, 35, 30],\n", + " [ 38, 35, 30],\n", + " [ 38, 35, 30],\n", + " ...,\n", + " [ 30, 29, 27],\n", + " [ 30, 29, 25],\n", + " [ 29, 30, 25]],\n", + "\n", + " ...,\n", + "\n", + " [[239, 178, 123],\n", + " [237, 176, 121],\n", + " [235, 174, 119],\n", + " ...,\n", + " [ 78, 68, 56],\n", + " [ 76, 66, 54],\n", + " [ 73, 65, 52]],\n", + "\n", + " [[238, 177, 120],\n", + " [236, 175, 118],\n", + " [234, 173, 116],\n", + " ...,\n", + " [ 80, 70, 58],\n", + " [ 78, 68, 56],\n", + " [ 74, 67, 51]],\n", + "\n", + " [[237, 176, 119],\n", + " [236, 175, 118],\n", + " [234, 173, 116],\n", + " ...,\n", + " [ 83, 71, 59],\n", + " [ 81, 69, 57],\n", + " [ 77, 68, 53]]], dtype=uint8)" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Bonus: reading an image yields a three-dimensional array object\n", + "guido_image = plt.imread('res/guido.jpg')\n", + "guido_image" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "dff598ad-ce04-4115-bffc-b70decd6a54e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(750, 500, 3)" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ +
"guido_image.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "b8e6b35a-e9bb-489e-ab54-9fc2874b5708", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/svg+xml": [ + "[SVG output: guido_image displayed with imshow, created with Matplotlib v3.9.2]\n" + ], + "text/plain": [ + "
" ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.imshow(guido_image)" + ] + }, + { + "cell_type": "markdown", + "id": "0e6bc238-b4f7-45ed-8d3e-194576f67fa9", + "metadata": {}, + "source": [ + "### Attributes of Array Objects\n", + "\n", + "1. `size` - number of elements\n", + "2. `dtype` - data type of the elements\n", + "3. `ndim` - number of dimensions\n", + "4. `shape` - shape of the array\n", + "5. `itemsize` - memory occupied by each element (bytes)\n", + "6. `nbytes` - memory occupied by all elements (bytes)\n", + "7. `T` - transpose\n", + "8. `flags` - information about the memory layout\n", + "9. `base` - the base object whose memory the array shares (`None` if the array owns its data)" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "699ac4e0-a11a-4f23-9469-052371e6a140", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([1, 2, 3, 4, 5], dtype=int32)" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array1" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "292a05b0-351f-4e6b-8968-aebe3b859b0e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Size - number of elements\n", + "array1.size" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "0b382c41-8a42-4946-8d82-27c959d08cf8", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dtype('int32')" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Data type\n", + "array1.dtype" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "17c532d9-f1bf-4da9-9a98-4b450997d32b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Number of dimensions\n", + "array1.ndim" + ] + }, + { + "cell_type": 
"code", + "execution_count": 36, + "id": "c1d57809-19cd-4540-a23a-05d5d639b98b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ +
"(5,)" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Shape - a tuple\n", + "array1.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "3376c6e2-ec8e-4743-81dd-2409fd869a52", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "4" + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Memory occupied by each element (bytes)\n", + "array1.itemsize" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "29f66406-f940-434f-8d03-c0706cfa412b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "20" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Memory occupied by all elements (bytes)\n", + "array1.nbytes" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "68066920-3062-41cf-9cf7-012850461d70", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1, 2, 3],\n", + " [4, 5, 6]])" + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array2" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "43e80ef1-edcf-45b7-8143-d4a159a71c0b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1, 4],\n", + " [2, 5],\n", + " [3, 6]])" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array2.T" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "ea0cc4f7-b504-458f-acdb-ff79d9730a19", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "6" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array2.size" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "id": "b268f3dc-4652-4736-8831-b21e1f3e76d8", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ +
"dtype('int64')" + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array2.dtype" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "id": "b93e1712-3e62-4464-90e7-d90847e1763e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array2.ndim" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "id": "0f17fd54-6969-4104-9d0e-cfc70678e663", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(2, 3)" + ] + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array2.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "id": "f573369f-07ef-4dea-a46a-a4f5d837ec5f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "8" + ] + }, + "execution_count": 45, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array2.itemsize" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "id": "cc8887de-4600-47f4-beb3-37979ec079f4", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "48" + ] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array2.nbytes" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "id": "9457f0c9-b433-4ec5-961d-6e8ebedad2db", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + " C_CONTIGUOUS : True\n", + " F_CONTIGUOUS : False\n", + " OWNDATA : True\n", + " WRITEABLE : True\n", + " ALIGNED : True\n", + " WRITEBACKIFCOPY : False" + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array2.flags" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "id": "22adf5d7-c491-4558-af40-fa1a7fef7d6b", + "metadata": {}, + "outputs": [ + { + "data": 
{ + "text/plain": [ + "1125000" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "guido_image.size" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "id": "adf08994-775b-4276-8392-95a9da821fb8", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dtype('uint8')" + ] + }, + "execution_count": 49, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "guido_image.dtype" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "id": "81f03343-beac-4fe2-9b5e-9e72ca4387ae", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3" + ] + }, + "execution_count": 50, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "guido_image.ndim" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "id": "2d01a9e3-62f1-4145-83c3-3f16ac647975", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(750, 500, 3)" + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "guido_image.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "id": "47576187-1b9b-4c08-b63a-9abe6a1352a9", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 52, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "guido_image.itemsize" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "id": "7304e015-1f67-4920-a045-baf66c60e9df", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1125000" + ] + }, + "execution_count": 53, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "guido_image.nbytes" + ] + }, + { + "cell_type": "markdown", + "id": "bca272d8-c859-4c9f-ad8f-76b8ba2926bf", + "metadata": {}, + "source": [ + "### Operations on array objects\n", + "\n", + "#### Arithmetic operations\n", + "\n", + "1. With a scalar\n", + "2. 
With another array - both arrays have the same shape" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "id": "5bec613e-df64-4a14-82a8-5ffa05d8ec48", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([11, 12, 13, 14, 15], dtype=int32)" + ] + }, + "execution_count": 54, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array1 + 10" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "id": "c2c2001a-4743-44ba-93e6-31b089196e31", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 5, 10, 15],\n", + " [20, 25, 30]])" + ] + }, + "execution_count": 55, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array2 * 5" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "id": "575bb5dc-bb20-471e-9237-e500d2f3796a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 1, 4, 9],\n", + " [16, 25, 36]])" + ] + }, + "execution_count": 56, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array2 ** 2" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "id": "82bd25ad-a422-46cb-aa1c-41dba47fd54b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1, 1, 3],\n", + " [4, 7, 2]])" + ] + }, + "execution_count": 57, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp1 = np.random.randint(1, 10, (2, 3))\n", + "temp1" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "id": "2a2a9ce0-41ef-417b-a082-b1a8b15213c6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 2, 3, 6],\n", + " [ 8, 12, 8]])" + ] + }, + "execution_count": 58, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp1 + array2" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "id": "af6ac896-2992-4e78-8fcc-9d2a55ccd188", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 1, 2, 9],\n", + "
[16, 35, 12]])" + ] + }, + "execution_count": 59, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp1 * array2" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "id": "1cf8cb80-3ba2-4b6a-8985-a4b63bdceda2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 1, 1, 27],\n", + " [ 256, 16807, 64]])" + ] + }, + "execution_count": 60, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp1 ** array2" + ] + }, + { + "cell_type": "markdown", + "id": "f6f13e4c-754a-4017-8898-52b00c97a910", + "metadata": {}, + "source": [ + "#### Comparison operations\n", + "\n", + "1. With a scalar\n", + "2. With another array" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "id": "bd4f7a63-341a-4a8a-9042-a2c286e606c6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([False, False, False, True, True])" + ] + }, + "execution_count": 61, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array1 > 3" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "id": "68ebdc1a-ca87-439b-9aaf-bc59742c04f0", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[False, False, False],\n", + " [ True, True, True]])" + ] + }, + "execution_count": 62, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array2 > 3" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "id": "9d292f4f-3067-4fce-85aa-b4787bb90d24", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[False, False, False],\n", + " [False, True, False]])" + ] + }, + "execution_count": 63, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp1 > array2" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "id": "ea44e117-566f-4fad-a9ee-6ab380461a75", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ True, False, True],\n", + " [ True, False, False]])" + ]
}, + "execution_count": 64, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp1 == array2" + ] + }, + { + "cell_type": "markdown", + "id": "5e2ed3d9-72ba-42ac-a1c3-00f469d7a3bc", + "metadata": {}, + "source": [ + "#### Logical operations\n", + "\n", + "1. With a scalar\n", + "2. With another array" + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "id": "621c9302-b475-4151-8627-36001424a38d", + "metadata": {}, + "outputs": [], + "source": [ + "temp2 = np.array([True, False, True, False, True])\n", + "temp3 = np.array([True, False, False, False, True])" + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "id": "355c4862-058d-47a0-9d36-331881f26c6e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ True, False, True, False, True])" + ] + }, + "execution_count": 66, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp2 & True" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "id": "711b9eda-9fe2-49b6-903c-3921507abafa", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ True, True, True, True, True])" + ] + }, + "execution_count": 67, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp2 | True" + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "id": "0ec28456-84a7-4161-8bc6-765c4410ca7a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ True, False, False, False, True])" + ] + }, + "execution_count": 68, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp2 & temp3" + ] + }, + { + "cell_type": "code", + "execution_count": 69, + "id": "11c56fd5-ae20-4e55-96c5-aacf2b4e3df1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ True, False, True, False, True])" + ] + }, + "execution_count": 69, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp2 | temp3" + ] + }, + { + "cell_type": "code", + 
"execution_count": 70, + "id": "afcc3303-0705-42b0-a503-b3efbd68590a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([False, True, False, True, False])" + ] + }, + "execution_count": 70, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "~temp2" + ] + }, + { + "cell_type": "markdown", + "id": "df7b1f3f-4d89-45ea-9982-69ae9069dc8c", + "metadata": {}, + "source": [ + "#### Indexing\n", + "\n", + "1. Basic indexing - similar to indexing a list\n", + "2. Fancy indexing - use a list or array of integers as the index\n", + "3. Boolean indexing - use an array of booleans as the index\n", + "4. Slicing - similar to slicing a list" + ] + }, + { + "cell_type": "code", + "execution_count": 71, + "id": "e845e1b3-d137-4832-b016-ed8d64c18a8f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([42, 49, 40, 75, 55, 99, 44, 80, 74])" + ] + }, + "execution_count": 71, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp4 = np.random.randint(1, 100, 9)\n", + "temp4" + ] + }, + { + "cell_type": "code", + "execution_count": 72, + "id": "a6b49f68-f43b-42be-aa8b-0bbe4ca52f79", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.int64(99)" + ] + }, + "execution_count": 72, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp4[5]" + ] + }, + { + "cell_type": "code", + "execution_count": 73, + "id": "35bc13a1-1ce8-4bf8-ba6a-310d145788da", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.int64(99)" + ] + }, + "execution_count": 73, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp4[-4]" + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "id": "4f53346f-a086-400f-84a7-a7ce264826bd", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([42, 49, 40, 75, 55, 99, 44, 80, 74])" + ] + }, + "execution_count": 74, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp4[5] = 99\n", + "temp4" + ] + }, + { + "cell_type": "code", + 
"execution_count": 75, + "id": "0f6f4528-191c-4ca8-809d-6503d4076a53", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[95, 91, 74, 23, 37],\n", + " [90, 74, 38, 87, 24],\n", + " [ 9, 85, 23, 33, 36],\n", + " [86, 76, 57, 12, 22]])" + ] + }, + "execution_count": 75, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp5 = np.random.randint(1, 100, (4, 5))\n", + "temp5" + ] + }, + { + "cell_type": "code", + "execution_count": 76, + "id": "7bcbf6e0-7e2e-4e41-bab8-cd9b99e67ad3", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.int64(38)" + ] + }, + "execution_count": 76, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp5[1][2]" + ] + }, + { + "cell_type": "code", + "execution_count": 77, + "id": "a7a9a1c8-ecbc-4e7e-9f4e-3a7e8dadb249", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.int64(38)" + ] + }, + "execution_count": 77, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp5[1, 2]" + ] + }, + { + "cell_type": "code", + "execution_count": 78, + "id": "979c64b0-a600-4229-859c-252bb597185d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[95, 91, 74, 23, 37],\n", + " [90, 74, 38, 87, 24],\n", + " [ 9, 85, 23, 33, 36],\n", + " [86, 76, 57, 12, 99]])" + ] + }, + "execution_count": 78, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp5[-1, -1] = 99\n", + "temp5" + ] + }, + { + "cell_type": "code", + "execution_count": 79, + "id": "4a5823e6-d3ff-411e-8f06-858dbdac006b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[95, 91, 74, 23, 37],\n", + " [90, 74, 38, 87, 24],\n", + " [ 9, 85, 23, 33, 36],\n", + " [86, 55, 57, 12, 99]])" + ] + }, + "execution_count": 79, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp5[-1, 1] = 55\n", + "temp5" + ] + }, + { + "cell_type": "code", + 
"execution_count": 80, + "id": "6282b8c7-d23a-4079-ad98-88bc606ff93f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[36, 33, 28],\n", + " [36, 33, 28],\n", + " [36, 33, 28],\n", + " ...,\n", + " [32, 31, 29],\n", + " [32, 31, 27],\n", + " [31, 32, 26]], dtype=uint8)" + ] + }, + "execution_count": 80, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "guido_image[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "id": "0f7e83f3-4ab2-44a6-b537-c32645fe2abc", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([36, 33, 28], dtype=uint8)" + ] + }, + "execution_count": 81, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "guido_image[0, 0]" + ] + }, + { + "cell_type": "code", + "execution_count": 82, + "id": "03477e7b-04f2-4de3-bd4d-f070b1983304", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.uint8(33)" + ] + }, + "execution_count": 82, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "guido_image[0, 0, 1]" + ] + }, + { + "cell_type": "code", + "execution_count": 83, + "id": "0e4f6f3f-0cef-4e6a-89d9-d3281f59c5d8", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([49, 49, 49, 40, 40, 80, 99, 99])" + ] + }, + "execution_count": 83, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Fancy indexing - use a list or array of integers as the array's index\n", + "temp4[[1, 1, 1, 2, 2, -2, -4, -4]]" + ] + }, + { + "cell_type": "code", + "execution_count": 84, + "id": "1ead8a5d-f4c0-4f5a-8709-b072ba676118", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([23, 74, 74, 33, 23, 23, 23])" + ] + }, + "execution_count": 84, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp5[[0, 1, 1, 2, 0, 0, 0], [3, 1, 1, -2, -2, -2, -2]]" + ] + }, + { + "cell_type": "code", + "execution_count": 85, + "id": 
"5e31efc0-8f1a-4a8d-ab85-8e1cc592d5d8", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([42, 75, 99, 80])" + ] + }, + "execution_count": 85, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Boolean indexing - use an array or list of booleans as the index to filter data\n", + "temp4[[True, False, False, True, False, True, False, True, False]]" + ] + }, + { + "cell_type": "code", + "execution_count": 86, + "id": "6145ef49-423f-40e0-acc7-8a8f897f4fb6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([False, False, False, True, False, True, False, True, True])" + ] + }, + "execution_count": 86, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp4 > 70" + ] + }, + { + "cell_type": "code", + "execution_count": 87, + "id": "d48715c6-c743-4457-9896-211d1ad74f97", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([75, 99, 80, 74])" + ] + }, + "execution_count": 87, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp4[temp4 > 70]" + ] + }, + { + "cell_type": "code", + "execution_count": 88, + "id": "efe7652f-1a75-4eed-9b33-2d52dfd25626", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ True, False, True, False, False, False, True, True, True])" + ] + }, + "execution_count": 88, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp4 % 2 == 0" + ] + }, + { + "cell_type": "code", + "execution_count": 89, + "id": "4f2d6b1b-259d-4721-8dc6-326be4a73d57", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([42, 40, 44, 80, 74])" + ] + }, + "execution_count": 89, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp4[temp4 % 2 == 0]" + ] + }, + { + "cell_type": "code", + "execution_count": 90, + "id": "261f57fd-e06a-4aca-8c84-e70b43b42795", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([False, False, False, False, 
False, False, False, True, True])" + ] + }, + "execution_count": 90, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "(temp4 > 70) & (temp4 % 2 == 0)" + ] + }, + { + "cell_type": "code", + "execution_count": 91, + "id": "5a52756e-e275-4750-b559-708bbe8fc045", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([80, 74])" + ] + }, + "execution_count": 91, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp4[(temp4 > 70) & (temp4 % 2 == 0)]" + ] + }, + { + "cell_type": "code", + "execution_count": 92, + "id": "51be708e-afc5-47b2-9392-6c53738eb7d1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([42, 40, 75, 99, 44, 80, 74])" + ] + }, + "execution_count": 92, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp4[(temp4 > 70) | (temp4 % 2 == 0)]" + ] + }, + { + "cell_type": "code", + "execution_count": 93, + "id": "93bd6975-bf01-4801-81e7-fc0bf7b21285", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ True, True, True, False, False],\n", + " [ True, True, False, True, False],\n", + " [False, True, False, False, False],\n", + " [ True, False, False, False, True]])" + ] + }, + "execution_count": 93, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp5 > 70" + ] + }, + { + "cell_type": "code", + "execution_count": 94, + "id": "df324e6a-85d1-40de-acce-bfbb35e8a4cf", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([95, 91, 74, 90, 74, 87, 85, 86, 99])" + ] + }, + "execution_count": 94, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp5[temp5 > 70]" + ] + }, + { + "cell_type": "code", + "execution_count": 95, + "id": "6388f55b-40af-4cd1-abbc-29da10641580", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([74, 90, 74, 86])" + ] + }, + "execution_count": 95, + "metadata": {}, + 
"output_type": "execute_result" + } + ], + "source": [ + "temp5[(temp5 > 70) & (temp5 % 2 == 0)]" + ] + }, + { + "cell_type": "code", + "execution_count": 96, + "id": "e30c750e-5517-4314-b6d1-4fc28c5a454b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([42, 49, 40, 75, 55, 99, 44, 80, 74])" + ] + }, + "execution_count": 96, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp4" + ] + }, + { + "cell_type": "code", + "execution_count": 97, + "id": "6e294ace-b236-4554-b34c-6f2b00bd295f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([40, 75, 55, 99, 44])" + ] + }, + "execution_count": 97, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Slice indexing\n", + "temp4[2:7]" + ] + }, + { + "cell_type": "code", + "execution_count": 98, + "id": "1519ca3a-9844-4b20-a00f-96b61780d998", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([40, 55, 44])" + ] + }, + "execution_count": 98, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Slice indexing with a step\n", + "temp4[2:7:2]" + ] + }, + { + "cell_type": "code", + "execution_count": 99, + "id": "9b96d847-5aec-4e7f-b6a7-48f4deecf454", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([44, 99, 55, 75, 40])" + ] + }, + "execution_count": 99, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp4[6:1:-1]" + ] + }, + { + "cell_type": "code", + "execution_count": 100, + "id": "5dbcfa29-9930-471e-81d3-100f22e6293d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[95, 91, 74, 23, 37],\n", + " [90, 74, 38, 87, 24],\n", + " [ 9, 85, 23, 33, 36],\n", + " [86, 55, 57, 12, 99]])" + ] + }, + "execution_count": 100, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp5" + ] + }, + { + "cell_type": "code", + "execution_count": 101, + "id": 
"a1307d26-d1a3-4201-9ffa-314a81300712", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[74, 38, 87],\n", + " [85, 23, 33]])" + ] + }, + "execution_count": 101, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp5[1:3, 1:4]" + ] + }, + { + "cell_type": "code", + "execution_count": 102, + "id": "e95ddb0d-ee31-4690-a91b-a0f0137ce07a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[33, 36],\n", + " [12, 99]])" + ] + }, + "execution_count": 102, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp5[2:, 3:]" + ] + }, + { + "cell_type": "code", + "execution_count": 103, + "id": "d69e9936-049a-4a66-af22-bb7feff1b9e7", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[23, 33],\n", + " [57, 12]])" + ] + }, + "execution_count": 103, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp5[2:, 2:4]" + ] + }, + { + "cell_type": "code", + "execution_count": 104, + "id": "befe1ce3-4742-4fd9-97de-3d9608ccf4c1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[95, 91, 74],\n", + " [90, 74, 38],\n", + " [ 9, 85, 23]])" + ] + }, + "execution_count": 104, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp5[:3, :3]" + ] + }, + { + "cell_type": "code", + "execution_count": 105, + "id": "ba3fcdcd-f999-41d4-9e96-858dc0f3d70d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[95, 91, 74],\n", + " [90, 74, 38],\n", + " [ 9, 85, 23],\n", + " [86, 55, 57]])" + ] + }, + "execution_count": 105, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp5[:, :3]" + ] + }, + { + "cell_type": "code", + "execution_count": 106, + "id": "753b5012-78fd-4a21-997e-132c5a15f636", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAgAAAABACAYAAABsv8+/AAAAE3RFWHRUaXRsZQBncmF5IGNvbG9ybWFw9iBr6wAAABl0RVh0RGVzY3JpcHRpb24AZ3JheSBjb2xvcm1hcH2S+3MAAAAwdEVYdEF1dGhvcgBNYXRwbG90bGliIHYzLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZ2GZxVMAAAAydEVYdFNvZnR3YXJlAE1hdHBsb3RsaWIgdjMuOS4yLCBodHRwczovL21hdHBsb3RsaWIub3JnTz9adAAAAUBJREFUeJzt1rENQjEQBcFn998zRBRA8CXEziQO7POle7a9tu2cs/L5ce9N3n/7zvx/zj/9n/322/87++8AgBwBAABBAgAAggQAAAQJAAAIEgAAECQAACBIAABAkAAAgCABAABBAgAAggQAAAQJAAAIEgAAECQAACBIAABAkAAAgCABAABBAgAAggQAAAQJAAAIEgAAECQAACBIAABAkAAAgCABAABBAgAAggQAAAQJAAAIEgAAECQAACBIAABAkAAAgCABAABBAgAAggQAAAQJAAAIEgAAECQAACBIAABAkAAAgCABAABBAgAAggQAAAQJAAAIEgAAECQAACBIAABAkAAAgCABAABBAgAAggQAAAQJAAAIEgAAECQAACBIAABAkAAAgCABAABBAgAAggQAAAQJAAAIEgAAECQAACDoDY2LBHzusuGnAAAAAElFTkSuQmCC", + "text/html": [ + "
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 106, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "plt.get_cmap('gray')" + ] + }, + { + "cell_type": "code", + "execution_count": 107, + "id": "2de0bf4c-61f4-48d9-a12b-9accbb883962", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[False, False, False, ..., False, False, False],\n", + " [False, False, False, ..., False, False, False],\n", + " [False, False, False, ..., False, False, False],\n", + " ...,\n", + " [ True, True, True, ..., False, False, False],\n", + " [ True, True, True, ..., False, False, False],\n", + " [ True, True, True, ..., False, False, False]])" + ] + }, + "execution_count": 107, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.mean(guido_image, axis=2) >= 128" + ] + }, + { + "cell_type": "code", + "execution_count": 138, + "id": "780b60d2-c634-4294-baea-00e696143942", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " 2024-09-19T22:49:50.432942\n", + " image/svg+xml\n", + " \n", + " \n", + " Matplotlib v3.9.2, https://matplotlib.org/\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n" + ], + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# 创建画布\n", + "plt.figure(figsize=(15, 9))\n", + "\n", + "# 原图\n", + "# 创建坐标系\n", + "plt.subplot(2, 4, 1)\n", + "plt.imshow(guido_image)\n", + "# 垂直翻转\n", + "plt.subplot(2, 4, 2)\n", + "plt.imshow(guido_image[::-1])\n", + "# 水平翻转\n", + "plt.subplot(2, 4, 3)\n", + "plt.imshow(guido_image[:, ::-1])\n", + "# 抠图\n", + "plt.subplot(2, 4, 4)\n", + "plt.imshow(guido_image[30:350, 80:310])\n", + "# 降采样\n", + "plt.subplot(2, 4, 5)\n", + "plt.imshow(guido_image[::10, ::10])\n", + "# 反色\n", + "plt.subplot(2, 4, 6)\n", + "plt.imshow(guido_image[:, :, ::-1])\n", + "# 灰度图\n", + "plt.subplot(2, 4, 7)\n", + "plt.imshow(guido_image[:, :, 0], cmap=plt.cm.gray)\n", + "# 二值化\n", + "plt.subplot(2, 4, 8)\n", + "plt.imshow(np.mean(guido_image, axis=2) >= 128, cmap='gray')\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 154, + "id": "abe85cca-abf4-4805-9e69-b2971021e741", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 154, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " 2024-09-19T22:54:56.111360\n", + " image/svg+xml\n", + " \n", + " \n", + " Matplotlib v3.9.2, https://matplotlib.org/\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", 
+ " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n" + ], + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# 局部马赛克效果\n", + "guido_image_copy = guido_image.copy()\n", + "\n", + "n = 12\n", + "\n", + "for i in range(120, 350, n):\n", + " for j in range(120, 310, n):\n", + " color = guido_image_copy[i, j]\n", + " guido_image_copy[i: i + n, j: j + n] = color\n", + "\n", + "plt.imshow(guido_image_copy)" + ] + }, + { + "cell_type": "code", + "execution_count": 110, + "id": "0e1f177e-db9d-4585-8c15-2cb5a928ccd4", + "metadata": {}, + "outputs": [], + "source": [ + "# %pip install pillow" + ] + }, + { + "cell_type": "code", + "execution_count": 111, + "id": "0143bdbe-afde-4741-8d11-2bcbe34de477", + "metadata": {}, + "outputs": [], + "source": [ + "# from PIL import Image\n", + "\n", + "# 灰度图\n", + "# Image.fromarray(guido_image[:, :, 0]).show()" + ] + }, + { + "cell_type": "code", + "execution_count": 112, + "id": "487ec2f5-97bc-413f-9a06-c8f6875ebab8", + "metadata": {}, + "outputs": [], + "source": [ + "# from PIL import ImageFilter\n", + "\n", + "# 滤镜效果\n", + "# Image.fromarray(guido_image).filter(ImageFilter.CONTOUR).show()" + ] + }, + { + "cell_type": "code", + "execution_count": 113, + "id": "15aa7ec2-8bff-45fb-89b8-19790838aff3", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(750, 500, 3)" + ] + }, + "execution_count": 113, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "obama_image = plt.imread('res/obama.jpg')\n", + "obama_image.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 114, + "id": "e1df160a-3e97-4594-a0ba-43bbd3c384ae", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 114, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " 2024-09-19T22:45:18.229325\n", + " image/svg+xml\n", + " \n", + " \n", + " Matplotlib v3.9.2, https://matplotlib.org/\n", + " \n", 
+ " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + 
" \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n" + ], + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.imshow(obama_image)" + ] + }, + { + "cell_type": "code", + "execution_count": 115, + "id": "6c57d787-8a35-4a56-8b10-ceaa08b49612", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(750, 500, 3)" + ] + }, + "execution_count": 115, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp6 = (guido_image * 0.6 + obama_image * 0.4).astype('u1')\n", + "temp6.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 116, + "id": "54ece4ed-4346-458a-8b57-58a9930c6dce", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 116, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " 2024-09-19T22:45:18.326714\n", + " image/svg+xml\n", + " \n", + " \n", + " Matplotlib v3.9.2, https://matplotlib.org/\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + 
" \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n" + ], + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.imshow(temp6)" + ] + }, + { + "cell_type": "code", + "execution_count": 117, + "id": "a56d7ce5-ec59-4d35-bbfd-52b33069c6f5", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 117, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " 2024-09-19T22:45:18.419184\n", + " image/svg+xml\n", + " \n", + " \n", + " Matplotlib v3.9.2, https://matplotlib.org/\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + 
" \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n" + ], + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "temp7 = np.random.randint(0, 256, (16, 16, 3))\n", + "plt.imshow(temp7)" + ] + }, + { + "cell_type": "markdown", + "id": "6649825a-9c1e-4190-adc4-a98c8c97a553", + "metadata": {}, + "source": [ + "### 数组对象的方法\n", + "\n", + "1. 获取描述性统计信息\n", + " - `sum`\n", + " - `cumsum` / `cumprod`\n", + " - `mean`\n", + " - `np.median`\n", + " - `stats.mode`\n", + " - `max`\n", + " - `min`\n", + " - `ptp`\n", + " - `np.quantile` / `stats.iqr`\n", + " - `var`\n", + " - `std`\n", + " - `stats.variation`\n", + " - `stats.skew`\n", + " - `stats.kurtosis`\n", + "2. 其他相关方法\n", + " - `round`\n", + " - `argmax` / `argmin`\n", + " - `nonzero`\n", + " - `copy` / `view`\n", + " - `astype`\n", + " - `clip`\n", + " - `reshape` / `resize`\n", + " - `dump` / `np.load`\n", + " - `tofile`\n", + " - `fill`\n", + " - `flatten` / `ravel`\n", + " - `sort` / `argsort`\n", + " - `swapaxes` / `transpose`\n", + " - `tolist`" + ] + }, + { + "cell_type": "code", + "execution_count": 118, + "id": "22e444ec-9b9e-4807-986a-1b3b229d87af", + "metadata": {}, + "outputs": [], + "source": [ + "# %pip install -U scipy" + ] + }, + { + "cell_type": "code", + "execution_count": 119, + "id": "4d7745c2-30fa-4a15-bae5-39b9597c1462", + "metadata": {}, + "outputs": [], + "source": [ + "from scipy import stats" + ] + }, + { + "cell_type": "code", + "execution_count": 120, + "id": "d8b8fbbc-43d5-467b-a32a-116639baedac", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([76, 81, 85, 79, 83, 82, 91, 80, 87, 86, 70, 82, 84, 77, 83, 85, 76,\n", + " 74, 80, 80, 82, 76, 68, 77, 80, 78, 77, 73, 81, 76, 85, 81, 84, 85,\n", + " 74, 84, 70, 76, 78, 80, 86, 75, 94, 79, 84, 78, 72, 86, 74, 68])" + ] + }, + "execution_count": 120, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "scores1 = np.fromstring(\n", + " '76, 81, 85, 79, 83, 82, 91, 80, 87, 86, '\n", + " '70, 82, 84, 77, 83, 85, 76, 74, 80, 
80, '\n", + " '82, 76, 68, 77, 80, 78, 77, 73, 81, 76, '\n", + " '85, 81, 84, 85, 74, 84, 70, 76, 78, 80, '\n", + " '86, 75, 94, 79, 84, 78, 72, 86, 74, 68', \n", + " sep=',',\n", + " dtype='i8'\n", + ")\n", + "scores1" + ] + }, + { + "cell_type": "code", + "execution_count": 121, + "id": "7bb13bb7-ba31-459c-85ce-ef0dc85abd96", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.int64(3982)" + ] + }, + "execution_count": 121, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 求和\n", + "scores1.sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 122, + "id": "f7d4eb31-33f0-436e-a53b-98276d22ddef", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.int64(3982)" + ] + }, + "execution_count": 122, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.sum(scores1)" + ] + }, + { + "cell_type": "code", + "execution_count": 123, + "id": "b450c23a-26b8-4766-b15a-5470efb8e37a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 76, 157, 242, 321, 404, 486, 577, 657, 744, 830, 900,\n", + " 982, 1066, 1143, 1226, 1311, 1387, 1461, 1541, 1621, 1703, 1779,\n", + " 1847, 1924, 2004, 2082, 2159, 2232, 2313, 2389, 2474, 2555, 2639,\n", + " 2724, 2798, 2882, 2952, 3028, 3106, 3186, 3272, 3347, 3441, 3520,\n", + " 3604, 3682, 3754, 3840, 3914, 3982])" + ] + }, + "execution_count": 123, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 累积和 - cumulative sum\n", + "scores1.cumsum()" + ] + }, + { + "cell_type": "code", + "execution_count": 124, + "id": "882ff41a-4d0f-4a15-adf3-272acc398bda", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 76, 157, 242, 321, 404, 486, 577, 657, 744, 830, 900,\n", + " 982, 1066, 1143, 1226, 1311, 1387, 1461, 1541, 1621, 1703, 1779,\n", + " 1847, 1924, 2004, 2082, 2159, 2232, 2313, 2389, 2474, 2555, 2639,\n", + " 2724, 2798, 2882, 2952, 3028, 3106, 3186, 3272, 
3347, 3441, 3520,\n", + " 3604, 3682, 3754, 3840, 3914, 3982])" + ] + }, + "execution_count": 124, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.cumsum(scores1)" + ] + }, + { + "cell_type": "code", + "execution_count": 125, + "id": "5e43fa2e-c30e-40c3-8159-89819e5e368f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.float64(79.64)" + ] + }, + "execution_count": 125, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 算术平均\n", + "scores1.mean()" + ] + }, + { + "cell_type": "code", + "execution_count": 126, + "id": "2f59ffef-619d-4a03-a496-3972d13ee33e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.float64(79.64)" + ] + }, + "execution_count": 126, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.mean(scores1)" + ] + }, + { + "cell_type": "code", + "execution_count": 127, + "id": "3bfdf16c-894e-47e8-925c-604418cc0eb2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.float64(79.44812732667022)" + ] + }, + "execution_count": 127, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 几何平均\n", + "stats.gmean(scores1)" + ] + }, + { + "cell_type": "code", + "execution_count": 128, + "id": "d73ce1b5-3956-4c12-9237-396ffdec44fd", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.float64(79.25499854665681)" + ] + }, + "execution_count": 128, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 调和平均\n", + "stats.hmean(scores1)" + ] + }, + { + "cell_type": "code", + "execution_count": 129, + "id": "6d165492-6d0c-4f74-a2f1-75c95ca13b2d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.float64(79.58695652173913)" + ] + }, + "execution_count": 129, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 去尾平均\n", + "stats.tmean(scores1, [70, 90])" + ] + }, + { + "cell_type": "code", + 
"execution_count": 130, + "id": "99fbc4f2-220c-4ea0-9d24-8640c66c0115", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.float64(79.58695652173913)" + ] + }, + "execution_count": 130, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.mean(scores1[(scores1 >= 70) & (scores1 <= 90)])" + ] + }, + { + "cell_type": "code", + "execution_count": 131, + "id": "8b536052-4a67-43e5-9e0a-6005bd73a66c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.float64(80.0)" + ] + }, + "execution_count": 131, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 中位数\n", + "np.median(scores1)" + ] + }, + { + "cell_type": "code", + "execution_count": 132, + "id": "390f8e61-a108-497d-b3cf-54bd25ae3a0e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(np.int64(76), np.int64(5))" + ] + }, + "execution_count": 132, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 众数\n", + "result = stats.mode(scores1)\n", + "result.mode, result.count" + ] + }, + { + "cell_type": "code", + "execution_count": 133, + "id": "bc729130-fdfa-47be-8a5c-ce7a474ab6ea", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.int64(94)" + ] + }, + "execution_count": 133, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 最大值\n", + "scores1.max()" + ] + }, + { + "cell_type": "code", + "execution_count": 134, + "id": "f02f74e8-e27f-4169-8fda-2ebe4504f0d4", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.int64(94)" + ] + }, + "execution_count": 134, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.amax(scores1)" + ] + }, + { + "cell_type": "code", + "execution_count": 135, + "id": "0a32b324-df80-43c8-b8b0-12b36218ef2e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.int64(68)" + ] + }, + "execution_count": 135, + "metadata": {}, + 
"output_type": "execute_result" + } + ], + "source": [ + "# 最小值\n", + "scores1.min()" + ] + }, + { + "cell_type": "code", + "execution_count": 136, + "id": "95680bd0-075a-4832-b7b9-76cea6653e2b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.int64(68)" + ] + }, + "execution_count": 136, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.amin(scores1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "00d3a61a-d22b-4ee3-9327-8197abee91fc", + "metadata": {}, + "outputs": [], + "source": [ + "# 全距(极差)\n", + "np.ptp(scores1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c6fd57ed-7bb3-4593-90ce-2a79a9ea8d2e", + "metadata": {}, + "outputs": [], + "source": [ + "# 四分位距离\n", + "q1, q3 = np.quantile(scores1, [0.25, 0.75])\n", + "q3 - q1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3668ed4b-4205-4f6a-9a9d-6135a238515a", + "metadata": {}, + "outputs": [], + "source": [ + "# inter-quartile range\n", + "stats.iqr(scores1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3a3c5ed8-0331-4c94-8cac-6b947c7765fd", + "metadata": {}, + "outputs": [], + "source": [ + "# 总体方差\n", + "scores1.var()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a2cdb1d1-179a-425f-930f-2ef7dda46a79", + "metadata": {}, + "outputs": [], + "source": [ + "np.var(scores1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "493d242d-fb94-4ed9-b316-7f722f908d99", + "metadata": {}, + "outputs": [], + "source": [ + "# 样本方差\n", + "scores1.var(ddof=1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b70dac6d-24f1-473e-bb27-200384202db0", + "metadata": {}, + "outputs": [], + "source": [ + "np.var(scores1, ddof=1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0670037e-0ae2-461f-b01a-0f47ed036813", + "metadata": {}, + "outputs": [], + "source": [ + "# 总体标准差\n", + 
"np.std(scores1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "22e3361d-3bbf-4175-94e1-719ca022e5dd", + "metadata": {}, + "outputs": [], + "source": [ + "# 样本标准差\n", + "np.std(scores1, ddof=1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e9b8db0c-6e78-4efb-a494-62977ce1e4e7", + "metadata": {}, + "outputs": [], + "source": [ + "# 变异系数\n", + "stats.variation(scores1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "029b2ccc-0dbb-460c-b84e-ea2b34cca3fb", + "metadata": {}, + "outputs": [], + "source": [ + "# 偏态系数\n", + "stats.skew(scores1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c35b17a1-5abc-4444-acbf-d757de430dcd", + "metadata": {}, + "outputs": [], + "source": [ + "# 峰度系数\n", + "stats.kurtosis(scores1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9fba00d6-866e-4ab1-9caf-18b17fe08440", + "metadata": {}, + "outputs": [], + "source": [ + "# 箱线图\n", + "plt.boxplot(scores1, showmeans=True, whis=1.5)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1cd46156-6a31-47c1-b3c6-4db92d91ac8c", + "metadata": {}, + "outputs": [], + "source": [ + "# 直方图\n", + "plt.hist(scores1, bins=6)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3c5340e8-ed88-432f-8656-f987c0972b80", + "metadata": {}, + "outputs": [], + "source": [ + "# 设置随机数的种子\n", + "np.random.seed(12)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f2102845-986b-4e22-9a73-0964af537386", + "metadata": {}, + "outputs": [], + "source": [ + "scores2 = np.random.randint(60, 101, (10, 3))\n", + "scores2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5e9f4921-6001-45e6-bfe0-aca2fe2aa03e", + "metadata": {}, + "outputs": [], + "source": [ + "scores2.mean()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"e116ac94-7cc2-48bc-b563-2e3ccd262378", + "metadata": {}, + "outputs": [], + "source": [ + "scores2.mean(axis=0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5b7fc404-06cf-4c3e-9ef2-b67bd1d3e517", + "metadata": {}, + "outputs": [], + "source": [ + "scores2.mean(axis=1).round(1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c3cf52cc-48ca-49d5-bbe7-6d977e8e8343", + "metadata": {}, + "outputs": [], + "source": [ + "# axis=0 - 默认值 - 沿着0轴计算\n", + "stats.describe(scores2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c8b60634-56cc-489e-b898-792925aa79a9", + "metadata": {}, + "outputs": [], + "source": [ + "# axis=None - 不沿着任何一个轴计算\n", + "stats.describe(scores2, axis=None)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c0ba4905-ae46-40d6-a6a3-b97f830fbee5", + "metadata": {}, + "outputs": [], + "source": [ + "# axis=1 - 沿着1轴计算\n", + "result = stats.describe(scores2, axis=1)\n", + "result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9adade58-2bf1-4c46-be0d-0ef1682382f4", + "metadata": {}, + "outputs": [], + "source": [ + "result.mean.round(1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "24bca7b9-af35-4943-94f9-90ce5851d136", + "metadata": {}, + "outputs": [], + "source": [ + "result.variance.round(2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e90cd926-9f9d-4701-8f70-dc0f13ad24d9", + "metadata": {}, + "outputs": [], + "source": [ + "plt.boxplot(scores2, showmeans=True)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8b58cd75-8df1-4593-a4cd-19c32f821f63", + "metadata": {}, + "outputs": [], + "source": [ + "np.random.seed(14)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6d792920-4ec1-4817-9e5a-b1d975dcb9a7", + "metadata": {}, + "outputs": [], + "source": [ + "temp8 = np.random.random(10)\n", + "temp8" + ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "id": "0385af53-d682-42a7-84da-a16b4353a6ad", + "metadata": {}, + "outputs": [], + "source": [ + "# 四舍五入\n", + "temp9 = temp8.round(1)\n", + "temp9" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7864d43f-b4ce-41d6-b8e3-cd646a12a0c6", + "metadata": {}, + "outputs": [], + "source": [ + "# 最大值的索引\n", + "temp8.argmax()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "336bc616-896f-4186-ad4d-5f6baa048564", + "metadata": {}, + "outputs": [], + "source": [ + "# 最小值的索引\n", + "temp8.argmin()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29cab4c1-1e56-4b19-a3b2-aeb30be03a88", + "metadata": {}, + "outputs": [], + "source": [ + "# 调整数组的形状\n", + "temp10 = temp8.reshape((5, 2))\n", + "# temp10 = temp8.reshape((5, 2)).copy()\n", + "temp10" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3b8e2fe8-b917-4d40-ad48-d3935a1ab262", + "metadata": {}, + "outputs": [], + "source": [ + "temp10.base" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e266d969-8bd5-4572-9544-2569fd718156", + "metadata": {}, + "outputs": [], + "source": [ + "temp10.flags" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5b7e5389-cdc3-4995-99f4-dc1386a5e291", + "metadata": {}, + "outputs": [], + "source": [ + "temp10.base is temp8" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2f6f2561-5392-4887-b0d2-3f639fb8be43", + "metadata": {}, + "outputs": [], + "source": [ + "temp10[2, 1] = 0.999999\n", + "temp10" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b739b914-57fd-4c3f-a106-a359e79391f9", + "metadata": {}, + "outputs": [], + "source": [ + "temp8" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eb79055b-d7d6-4602-b960-890d33928e04", + "metadata": {}, + "outputs": [], + "source": [ + "temp8[3] = 0.0001\n", + "temp8" + ] + }, + { + "cell_type": 
"code", + "execution_count": null, + "id": "1f2932d9-45d3-427c-acef-a8ecc468747d", + "metadata": {}, + "outputs": [], + "source": [ + "temp10" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "089a7d87-ccbc-47c8-a934-15a2eb298242", + "metadata": {}, + "outputs": [], + "source": [ + "# 调整数组大小\n", + "temp8.resize((3, 5), refcheck=False)\n", + "temp8.round(1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6af25425-046a-4f19-a5a5-235d6fd17753", + "metadata": {}, + "outputs": [], + "source": [ + "temp11 = np.resize(temp8, (4, 5)).round(1)\n", + "temp11" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "260a3d51-8041-4515-a611-971267a0be8c", + "metadata": {}, + "outputs": [], + "source": [ + "# 非零元素的索引\n", + "temp9.nonzero()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7ab48494-d866-49ec-a214-586f1fa8beb2", + "metadata": {}, + "outputs": [], + "source": [ + "# 类型转换\n", + "temp12 = np.random.randint(-100, 101, 10)\n", + "temp12" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "69b3c1e2-cca1-45d5-9d0c-2ab51ab66014", + "metadata": {}, + "outputs": [], + "source": [ + "temp12.astype(np.float64)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3a7c4ce2-2838-4537-b70a-10f922f2806a", + "metadata": {}, + "outputs": [], + "source": [ + "temp12.astype('f8')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f15abc61-401e-4198-84ab-4aa086cfdf75", + "metadata": {}, + "outputs": [], + "source": [ + "temp12.astype('i1')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce590c02-4eab-4d4c-830c-07e0e887a771", + "metadata": {}, + "outputs": [], + "source": [ + "temp13 = temp12.astype('u1')\n", + "temp13" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6c6b04d0-21e1-4be3-a927-095381341f57", + "metadata": {}, + "outputs": [], + "source": [ + "temp13.flags" + ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "id": "711a115b-7a7d-48c1-b9de-b51673f89ab9", + "metadata": {}, + "outputs": [], + "source": [ + "temp12.astype('U')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "831f8c9c-f949-442c-ac51-66b3bfc59e1b", + "metadata": {}, + "outputs": [], + "source": [ + "# 修剪\n", + "temp9.clip(min=0.3, max=0.7)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d40724a4-a7fa-42c1-9ecc-3207579e7d7a", + "metadata": {}, + "outputs": [], + "source": [ + "# 将数组持久化到(文本)文件\n", + "temp11.tofile('temp11.txt', sep=',')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4246bae6-ab59-4ee0-ae8d-7d2cc3f60d97", + "metadata": {}, + "outputs": [], + "source": [ + "temp13 = np.fromfile('temp11.txt', sep=',').reshape(4, 5)\n", + "temp13" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a9d38dba-acb8-4745-be3f-76e7b2605c09", + "metadata": {}, + "outputs": [], + "source": [ + "# 将数组持久化到(二进制)文件\n", + "temp11.dump('temp11')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "db38d63c-e279-4002-bf02-fed4f615319a", + "metadata": {}, + "outputs": [], + "source": [ + "# 从二进制文件(pickle序列化)中加载数组\n", + "temp14 = np.load('temp11', allow_pickle=True)\n", + "temp14" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3fc26c0e-b518-4b23-a019-9e8e11e363e2", + "metadata": {}, + "outputs": [], + "source": [ + "temp15 = np.random.randint(1, 100, (2, 3, 4))\n", + "temp15" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1884ac0d-65f9-410d-987f-7235ccd54d0f", + "metadata": {}, + "outputs": [], + "source": [ + "# 扁平化\n", + "temp16 = temp15.flatten()\n", + "temp16" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "369c19c8-ef7c-4740-b8ba-cf47f63f9fd7", + "metadata": {}, + "outputs": [], + "source": [ + "# 扁平化\n", + "temp17 = temp15.ravel()\n", + "temp17" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "9d7411df-a939-41f6-bee0-213b555ed666", + "metadata": {}, + "outputs": [], + "source": [ + "temp16.base is temp15" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "980f80da-d3d6-45c9-86d0-733e0098e8f0", + "metadata": {}, + "outputs": [], + "source": [ + "temp16.flags" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c4db6df5-e0c4-4f7b-bc0a-50655c036e77", + "metadata": {}, + "outputs": [], + "source": [ + "temp17.base is temp15" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7339bcf2-a29f-4cf6-ac5f-8a81eb655698", + "metadata": {}, + "outputs": [], + "source": [ + "temp17.flags" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "06d8dee5-528e-4345-b5a7-ddc0413cda09", + "metadata": {}, + "outputs": [], + "source": [ + "temp16[0] = 999\n", + "temp16" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2cfa3191-5441-4de7-a300-6bd8aa11cb30", + "metadata": {}, + "outputs": [], + "source": [ + "temp15" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1a4ce0ee-2971-48b3-8834-9113464cf9af", + "metadata": {}, + "outputs": [], + "source": [ + "temp17[0] = 88\n", + "temp17" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7dd7fda5-d25e-4a9a-9202-9a5140fc3439", + "metadata": {}, + "outputs": [], + "source": [ + "temp15" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fe00bb4c-1e19-4988-8020-f2f4162840f2", + "metadata": {}, + "outputs": [], + "source": [ + "# 排序 - 返回排序后的新数组\n", + "np.sort(temp16)[::-1]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8fe37ba0-9bce-4bb1-bddc-3100a755439d", + "metadata": {}, + "outputs": [], + "source": [ + "# 排序 - 就地排序\n", + "temp16.sort()\n", + "temp16" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "afef80f1-5f26-46d3-9589-7eea82b224cc", + "metadata": {}, + "outputs": [], + "source": [ + 
"temp18 = np.random.randint(1, 100, 10)\n", + "temp18" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "43f1d0f6-2a74-4512-a6ee-d285de2ee2aa", + "metadata": {}, + "outputs": [], + "source": [ + "# 给出索引的顺序 - 花式索引\n", + "temp18[temp18.argsort()]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "23fc033c-a649-4da8-a0f8-1c7739e3996c", + "metadata": {}, + "outputs": [], + "source": [ + "# 转置\n", + "temp11.transpose()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3ac95b32-c9ce-404b-a26e-1e8a2351a44b", + "metadata": {}, + "outputs": [], + "source": [ + "temp11.T" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "796f513d-211f-4915-969c-1850d4f569a2", + "metadata": {}, + "outputs": [], + "source": [ + "# 交换轴\n", + "temp11.swapaxes(0, 1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1d9d0207-45d9-4ec0-bcef-9e7072501241", + "metadata": {}, + "outputs": [], + "source": [ + "temp15" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0fd160f1-710f-4591-853a-d2e960493bc4", + "metadata": {}, + "outputs": [], + "source": [ + "temp15.swapaxes(0, 1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "77569990-b8f2-4ff1-b1f6-d17a942f1a32", + "metadata": {}, + "outputs": [], + "source": [ + "temp15.swapaxes(1, 2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5970d5d4-f955-4af8-8c54-46a7fff9d0f9", + "metadata": {}, + "outputs": [], + "source": [ + "# 将数组处理成列表\n", + "list1 = temp16.tolist()\n", + "print(list1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2c3e62e2-aaef-4250-a561-0f28dd51bf79", + "metadata": {}, + "outputs": [], + "source": [ + "list2 = temp11.tolist()\n", + "print(list2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37a0a054-34e3-4b2a-a07d-72a9599f012f", + "metadata": {}, + "outputs": [], + "source": [ + "list3 = 
temp15.tolist()\n", + "print(list3)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/Day66-80/code/day03.ipynb b/Day66-80/code/day03.ipynb new file mode 100644 index 000000000..8f0cb77b4 --- /dev/null +++ b/Day66-80/code/day03.ipynb @@ -0,0 +1,3385 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "6fc07f67-318b-4d79-8d4e-4eb8a2c61be2", + "metadata": {}, + "source": [ + "## NumPy进阶" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a9d74703-47d5-44f4-8566-eb7d5476c792", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "\n", + "plt.rcParams['font.sans-serif'].insert(0, 'SimHei')\n", + "plt.rcParams['axes.unicode_minus'] = False" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "d139c565-6bf2-4bf6-9d66-d2755b29d1db", + "metadata": {}, + "outputs": [], + "source": [ + "%config InlineBackend.figure_format = 'svg'\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "id": "d41a57e5-6009-455b-aff9-4f96682423fc", + "metadata": {}, + "source": [ + "### NumPy中的函数\n", + "\n", + "#### 通用一元函数" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "5f881886-8aca-40cb-a9f3-4514e28b8fe3", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 1., 2., 3., inf, nan, -inf, nan, 5.])" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# inf - infinity\n", + "# nan - not a number\n", + "array1 = np.array([1, 2, 3, np.inf, np.nan, -np.inf, 
np.nan, 5])\n", + "array1" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "b6e891cc-035c-4e98-9406-1e78e3623e76", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dtype('float64')" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array1.dtype" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "674995a2-e50a-45a6-b1ad-7a9f88331dd0", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([False, False, False, False, True, False, True, False])" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.isnan(array1)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "358641bc-510c-4f1b-9df7-5dd54de47978", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 1., 2., 3., inf, -inf, 5.])" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array1[~np.isnan(array1)]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "20030d7d-822e-45c7-b962-3aaf706e133c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ True, True, True, False, False, False, False, True])" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.isfinite(array1)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "357cc22a-7acf-46ee-9523-631134dc8eae", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([1., 2., 3., 5.])" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array1[np.isfinite(array1)]" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "c38f23a4-7d72-4ce4-9bc4-9f8cd9433fa8", + "metadata": {}, + "outputs": [], + "source": [ + "x = np.linspace(0.5, 10, 72)\n", + "y1 = 
np.sin(x)\n", + "y2 = np.log2(x)\n", + "y3 = np.sqrt(x)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "08a272bc-8765-455b-b0d0-700872071cf4", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " 2024-09-22T23:37:54.080072\n", + " image/svg+xml\n", + " \n", + " \n", + " Matplotlib v3.9.2, https://matplotlib.org/\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + 
" \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n" + ], + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# 定制画布\n", + "plt.figure(figsize=(8, 4))\n", + "# 绘制折线图\n", + "plt.plot(x, y1, marker='.', label='$y=sin(x)$')\n", + "plt.plot(x, y2, label='$y=log_{2}x$', linewidth=3, color='#9c9c9c')\n", + "plt.plot(x, y3, label='$y=\\sqrt{x}$', linestyle='-.', linewidth=0.5)\n", + "# 显示图例\n", + "plt.legend(loc='center right')\n", + "# 显示图表\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "0b292f56-dab7-469e-89ed-0fb2114902aa", + "metadata": {}, + "source": [ + "#### 通用二元函数" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "8b67932a-481e-4e2d-9d83-00994a01d959", + "metadata": {}, + "outputs": [], + "source": [ + "array2 = np.array([0.1 + 0.2, 0.1 + 0.2 + 0.3])\n", + "array3 = np.array([0.3, 0.6])" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "23581a64-7b02-4f3f-8a5f-ea49ec20e48a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([False, False])" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "array2 == array3" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "ddcb612c-c7aa-44c6-b8d3-1ee123fba534", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "np.False_" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.all(array2 == array3)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "40d454ad-8c60-4132-8dde-10726180e552", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 比较两个数组元素是否(几乎)完全相等 - 有误差容忍度\n", + "np.allclose(array2, array3)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "f1aab287-9d50-4b32-94fa-26d7f183fde3", + "metadata": {}, + "outputs": [], + "source": [ 
+ "array4 = np.array([1, 2, 3, 4, 5, 6])\n", + "array5 = np.array([2, 4, 6, 8, 10])" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "ea25f2ad-e007-486b-a402-eb3436c9346c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([2, 4, 6])" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 交集\n", + "np.intersect1d(array4, array5)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "e4d2116c-c895-4597-95dc-fda67a1c99a8", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 1, 2, 3, 4, 5, 6, 8, 10])" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 并集\n", + "np.union1d(array4, array5)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "5348c4f2-4222-4904-bd07-a3c85e62e4c7", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([1, 3, 5])" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 差集\n", + "np.setdiff1d(array4, array5)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "9cd3f3e5-a986-469f-97ba-c739aa4b8577", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 1, 3, 5, 8, 10])" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 对称差\n", + "np.setxor1d(array4, array5)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "78eb3a98-5992-452e-ab13-eaf578cab7a0", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([False, True, False, True, False, True])" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 成员运算\n", + "# np.in1d(array4, array5)\n", + "np.isin(array4, array5)" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": 
"c7bbc624-dfdc-41d9-bfa2-7a4f3a387bce", + "metadata": {}, + "outputs": [], + "source": [ + "# 杰卡德相似度\n", + "user_a = np.array(['平板电脑', '尿不湿', '手机', '键盘', '手机支架', '奶瓶', '婴儿辅食', '基围虾', '巴沙鱼', '生抽', '沙拉酱'])\n", + "user_b = np.array(['平板电脑', '键盘', '充电宝', '补光灯', '生抽', '散热器', '笔记本电脑', '双肩包', '登山杖', '露营帐篷', '睡袋'])\n", + "user_c = np.array(['沐浴露', '维C泡腾片', '牛奶', '尿不湿', '平板电脑', '奶瓶', '婴儿辅食', '手机', '磨牙棒', '生抽', '基围虾'])" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "c4132979-1ef5-4e2b-93a4-2d6bfc66f38b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.15789473684210525" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.intersect1d(user_a, user_b).size / np.union1d(user_a, user_b).size" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "46dda506-908b-405a-8e9b-c14090da05b7", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.4666666666666667" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.intersect1d(user_a, user_c).size / np.union1d(user_a, user_c).size" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fb5435f8-b8d4-4b37-88fb-13149a62660e", + "metadata": {}, + "outputs": [], + "source": [ + "np.setdiff1d(user_a, user_c)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f8366b08-779d-4369-9f27-a9b4b9125782", + "metadata": {}, + "outputs": [], + "source": [ + "np.setdiff1d(user_c, user_a)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b1ce07b5-ec70-4512-814a-e210148ed205", + "metadata": {}, + "outputs": [], + "source": [ + "# 余弦相似度\n", + "user = np.array([5, 1, 3])\n", + "mov1 = np.array([4, 5, 1])\n", + "mov2 = np.array([5, 1, 5])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5ab70c0f-cfe7-4e10-b162-0feefb36f884", + "metadata": {}, + "outputs": [], + "source": [ 
+ "# linear algebra\n", + "# np.dot - 点积\n", + "# np.linalg.norm - 模长\n", + "np.dot(user, mov1) / (np.linalg.norm(user) * np.linalg.norm(mov1))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "21a1caaf-2cf4-4cf8-836c-d244f4133098", + "metadata": {}, + "outputs": [], + "source": [ + "# np.arccos - 反余弦函数 - 弧度\n", + "# np.degrees - 弧度换算角度\n", + "np.degrees(np.arccos(np.dot(user, mov1) / (np.linalg.norm(user) * np.linalg.norm(mov1))))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9b90ecb1-67d0-48be-84c5-54bf212f1292", + "metadata": {}, + "outputs": [], + "source": [ + "np.degrees(np.arccos(np.dot(user, mov2) / (np.linalg.norm(user) * np.linalg.norm(mov2))))" + ] + }, + { + "cell_type": "markdown", + "id": "c81c6238-f28c-44e8-ac54-94a69f5a6c4a", + "metadata": {}, + "source": [ + "#### 其他常用函数" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1df2d086-58cd-4324-b2cf-98d8d971a4d7", + "metadata": {}, + "outputs": [], + "source": [ + "array6 = np.array([1, 2, 3, 1, 1, 2, 2, 4, 5, 7, 3, 6, 6])\n", + "array6" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "70df0735-6680-4996-8cc3-10ac5d9102ab", + "metadata": {}, + "outputs": [], + "source": [ + "# 去重\n", + "array7 = np.unique(array6)\n", + "array7" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "79ad92e0-da87-401d-aa70-0514a5b61d0e", + "metadata": {}, + "outputs": [], + "source": [ + "array8 = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])\n", + "array9 = np.array([[4, 4, 4], [5, 5, 5], [6, 6, 6]])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b099b5ee-d7df-4864-89f3-bb1c7a748a59", + "metadata": {}, + "outputs": [], + "source": [ + "# 在0轴方向(垂直)堆叠 - vertical\n", + "array10 = np.vstack((array8, array9))\n", + "array10" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "204b8e34-43e3-4a4f-9e8c-8734e72a041f", + "metadata": {}, + "outputs": [], + "source": [ + "# 
在1轴的方向堆叠 - horizontal\n", + "np.hstack((array8, array9))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3301cc02-3ecf-4516-b2ff-e0fa6a9041c7", + "metadata": {}, + "outputs": [], + "source": [ + "# 数组的拼接\n", + "np.concatenate((array8, array9), axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c956190c-1382-4416-9ab2-09a19f5567f6", + "metadata": {}, + "outputs": [], + "source": [ + "# 堆叠出更高维的数组\n", + "np.stack((array8, array9), axis=0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a956be87-65e9-4391-b2be-bfbaaab7e7dc", + "metadata": {}, + "outputs": [], + "source": [ + "np.stack((array8, array9), axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "df88bd16-64c3-40af-adba-f7faed516159", + "metadata": {}, + "outputs": [], + "source": [ + "# 将一个数组拆分成多个数组\n", + "np.vsplit(array10, 3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "592ffc62-2a47-48bc-8789-3cad653d2893", + "metadata": {}, + "outputs": [], + "source": [ + "# 追加元素\n", + "np.append(array6, [10, 11, 12])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7a3355ca-1ee7-4d8f-8567-c10fed5055f6", + "metadata": {}, + "outputs": [], + "source": [ + "# 插入元素\n", + "np.insert(array6, 1, [10, 20])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4b824b66-7471-4669-aa99-8f62a6b8cb2b", + "metadata": {}, + "outputs": [], + "source": [ + "array11 = np.random.randint(1, 100, 10)\n", + "array11" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2ff82b6f-9a96-40c2-a4d9-4aaec7d42e9a", + "metadata": {}, + "outputs": [], + "source": [ + "# 抽取元素 - 相当于布尔索引的作用\n", + "np.extract(array11 < 50, array11)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cc00dac0-0b97-435c-960d-06c141de0b78", + "metadata": {}, + "outputs": [], + "source": [ + "# 给出一组条件和对应的处理数据的表达式,满足条件就执行对应的表达式,不满足条件取默认值\n", + "np.select([array11 
< 30, array11 > 50], [array11 * 10, array11 // 10], default=100)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4d7b8700-a431-4da4-99c6-54eda9f065dd", + "metadata": {}, + "outputs": [], + "source": [ + "# 给出一个条件和两个表达式,满足条件执行表达式1,不满足条件执行表达式2\n", + "np.where(array11 < 50, array11 * 10, array11 // 10)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bc672588-f36b-4951-902c-6b70ef83d1af", + "metadata": {}, + "outputs": [], + "source": [ + "array11" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aa96b613-cef4-475e-b88c-f6ee8194ef4c", + "metadata": {}, + "outputs": [], + "source": [ + "# 滚动数组元素\n", + "np.roll(array11, 2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bfeed7b4-d835-4a1a-9858-d27d4bc363c3", + "metadata": {}, + "outputs": [], + "source": [ + "np.roll(array11, -2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "52b9da9d-dd2b-4fea-984c-6d4b94efedab", + "metadata": {}, + "outputs": [], + "source": [ + "np.roll(array10, 2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "53aa8382-51e2-4634-bb9f-4ff84ddaef60", + "metadata": {}, + "outputs": [], + "source": [ + "np.roll(array10, 2, axis=0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5f2de13b-2773-4a87-a854-ee9486dfcc0d", + "metadata": {}, + "outputs": [], + "source": [ + "array12 = np.arange(1, 10).reshape((3, 3))\n", + "array12" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3808483f-2811-45fd-adbb-a10bcc9d7dc6", + "metadata": {}, + "outputs": [], + "source": [ + "np.roll(array12, 2, axis=0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7f6d8d16-2e35-4926-ba13-d5e9cf77660d", + "metadata": {}, + "outputs": [], + "source": [ + "np.roll(array12, 1, axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29b76e3a-8414-4658-b0c6-7795f2186fbe", + "metadata": {}, + 
"outputs": [], + "source": [ + "# 替换数组元素\n", + "np.put(array11, [1, 3, 5, 7], [33, 88])\n", + "array11" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eb4653ef-5c5a-4d3c-adb2-0f10d79ca6c6", + "metadata": {}, + "outputs": [], + "source": [ + "np.place(array11, array11 > 50, [44, 99])\n", + "array11" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "03bcc628-1471-44ef-ad56-88468c08548d", + "metadata": {}, + "outputs": [], + "source": [ + "guido_image = plt.imread('res/guido.jpg')\n", + "guido_image.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "16fa2543-195b-47d6-8e29-2682800f91aa", + "metadata": {}, + "outputs": [], + "source": [ + "plt.imshow(np.flip(guido_image, axis=0))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "259830fa-6cd3-43be-bffe-560050f795b9", + "metadata": {}, + "outputs": [], + "source": [ + "plt.imshow(np.flip(guido_image, axis=1))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6059087d-85ba-4151-9af0-8870876a6b1a", + "metadata": {}, + "outputs": [], + "source": [ + "plt.imshow(np.flip(guido_image, axis=2))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "70ebbf72-87c9-41d8-8c46-f1eccb531206", + "metadata": {}, + "outputs": [], + "source": [ + "plt.imshow(guido_image)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "015548b9-819c-49ba-a13d-7777662a7414", + "metadata": {}, + "outputs": [], + "source": [ + "plt.imshow(guido_image.swapaxes(0, 1))" + ] + }, + { + "cell_type": "markdown", + "id": "7c4c00de-f16f-4aac-ae37-e0b9df0eb6c2", + "metadata": {}, + "source": [ + "#### 普通函数矢量化" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fa07c8bc-9558-4d2c-b12c-b0e3814cbb48", + "metadata": {}, + "outputs": [], + "source": [ + "# 面试官:讲一讲Python语言中的装饰器\n", + "# 用一个函数去装饰另一个函数或者一个类并为其提供额外的能力(横切关注功能)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + 
"id": "02e4d3e6-9bc7-462e-8be4-800a1dcdc632", + "metadata": {}, + "outputs": [], + "source": [ + "# 面试题:写一个装饰器,如果原函数返回字符串,那么将字符串每个单词首字母大写\n", + "from functools import wraps\n", + "\n", + "\n", + "def titlize_str(func):\n", + "\n", + " @wraps(func)\n", + " def wrapper(*args, **kwargs):\n", + " result = func(*args, **kwargs)\n", + " if isinstance(result, str):\n", + " result = result.title()\n", + " return result\n", + "\n", + " return wrapper" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e798956c-d543-4a50-a12b-f7314b98bf40", + "metadata": {}, + "outputs": [], + "source": [ + "@titlize_str\n", + "def say_hello(name):\n", + " return 'hello, ' + name" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "82145cd0-bcb2-44b8-90ef-ac461e2120bd", + "metadata": {}, + "outputs": [], + "source": [ + "# 如果不使用@语法糖(便捷语法),也可以通过下面的方式应用装饰器\n", + "# say_hello = titlize_str(say_hello)\n", + "# say_hello('tom')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "108ca8a5-ec63-458b-b95b-df759b68cb51", + "metadata": {}, + "outputs": [], + "source": [ + "say_hello('tom')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3d846bc7-718f-4344-b90e-c7803a104887", + "metadata": {}, + "outputs": [], + "source": [ + "# 获取原函数\n", + "say_hello = say_hello.__wrapped__\n", + "say_hello('tom')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ea620684-3375-4876-b097-49e81bf225c9", + "metadata": {}, + "outputs": [], + "source": [ + "# 优化代码的执行性能:空间换时间\n", + "from functools import lru_cache\n", + "\n", + "\n", + "@lru_cache(maxsize=128)\n", + "def fib(n):\n", + " \"\"\"获取第n个斐波那契数\"\"\"\n", + " if n in (1, 2):\n", + " return 1\n", + " return fib(n - 1) + fib(n - 2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9f72372e-9cc9-4751-8ee7-e8d9f40727bd", + "metadata": {}, + "outputs": [], + "source": [ + "for i in range(1, 121):\n", + " print(i, fib(i))" + ] + }, + { + 
"cell_type": "code", + "execution_count": 26, + "id": "29d231a2-57cd-4786-85cd-1366f5378185", + "metadata": {}, + "outputs": [], + "source": [ + "# 通过vectorize装饰器将普通函数做矢量化处理\n", + "@np.vectorize\n", + "def fac(n):\n", + " if n == 0:\n", + " return 1\n", + " return n * fac(n - 1)" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "7c04cc06-ba86-4b1b-a4e1-75e4527f6dde", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([1, 2, 3, 4, 5, 6, 7, 8])" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp = np.arange(1, 9)\n", + "temp" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "38e8026a-6e75-4e1b-a646-98c86736797f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 1, 2, 6, 24, 120, 720, 5040, 40320])" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fac(temp)" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "4ac8bb35-87f0-44d0-930e-3fc0c5fb63c3", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(array([26, 68, 73, 33, 64, 54, 26, 40, 60, 36]),\n", + " array([37, 56, 65, 30, 57, 36, 61, 54, 34, 52]))" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x1 = np.random.randint(20, 80, 10)\n", + "x2 = np.random.randint(30, 70, 10)\n", + "x1, x2" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "4baced9b-fee2-4c2b-b7ca-dc5914b120b9", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 1, 4, 1, 3, 1, 18, 1, 2, 2, 4])" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from math import gcd, lcm\n", + "\n", + "gcd = np.vectorize(gcd)\n", + "gcd(x1, x2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"08c6a075-8b61-4716-92c5-126cd78108c1", + "metadata": {}, + "outputs": [], + "source": [ + "lcm = np.vectorize(lcm)\n", + "lcm(x1, x2)" + ] + }, + { + "cell_type": "markdown", + "id": "859bba4b-a0cf-4140-a8de-b2f3fffcd355", + "metadata": {}, + "source": [ + "### 广播机制\n", + "\n", + "两个形状(shape属性)不一样的数组如果要做运算,要先通过广播机制使其形状一样才能运算。
\n", + "如果要执行广播机制使得两个数组形状一样,需要满足以下两个条件其中一个:\n", + "\n", + "1. 两个数组后缘维度(shape属性从后往前看对应的部分)相同。\n", + "2. 两个数组后缘维度不同,但是其中一方为1。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3339c56a-b68c-401e-a27c-134be60ccf14", + "metadata": {}, + "outputs": [], + "source": [ + "temp1 = np.array([[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]])\n", + "temp2 = np.array([1, 2, 3])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4ecc9498-792c-4de1-a120-585003b19087", + "metadata": {}, + "outputs": [], + "source": [ + "temp1 + temp2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2cb20c46-3f87-4346-80ab-7e2786ca1475", + "metadata": {}, + "outputs": [], + "source": [ + "temp3 = np.array([[1], [2], [3], [4]])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "58932222-bb83-43b7-8cef-b9574898dbb5", + "metadata": {}, + "outputs": [], + "source": [ + "temp1 + temp3" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "74b6c376-05ac-4049-a4b9-5f0614de780e", + "metadata": {}, + "outputs": [], + "source": [ + "temp4 = np.array([1 ,2, 3])\n", + "temp5 = np.array([[3], [2], [1]])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eefe6354-4614-4dfb-aa94-3d146b770b3b", + "metadata": {}, + "outputs": [], + "source": [ + "temp4.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "04dd44a5-6bea-41eb-aad0-b0a18d64a512", + "metadata": {}, + "outputs": [], + "source": [ + "temp5.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "764eb0c3-991d-4a5f-a4eb-7da2c0b445bd", + "metadata": {}, + "outputs": [], + "source": [ + "temp4 + temp5" + ] + }, + { + "cell_type": "markdown", + "id": "8f1022cb-c07a-4149-aafd-9f53d235da4f", + "metadata": {}, + "source": [ + "### 矩阵" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8a16c29a-088a-473b-b856-3dbd6660f9cd", + "metadata": {}, + "outputs": [], + "source": [ 
+ "m1 = np.array([[1, 0, 2], [-1, 3, 1]])\n", + "m2 = np.array([[3, 1], [2, 1], [1, 0]])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "74a54196-ca95-4425-9212-62869795aed7", + "metadata": {}, + "outputs": [], + "source": [ + "m1.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1afbceea-782d-49c9-b6d4-d0848f3fdd99", + "metadata": {}, + "outputs": [], + "source": [ + "m2.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bc65a37a-84eb-4aac-b63f-5625a0c15a3a", + "metadata": {}, + "outputs": [], + "source": [ + "m1 @ m2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "67be1c96-a20c-4862-87cb-8579d3a303f5", + "metadata": {}, + "outputs": [], + "source": [ + "np.matmul(m1, m2)" + ] + }, + { + "cell_type": "markdown", + "id": "5fd602a8-61fc-4c5c-94a6-a930bcf6fb2f", + "metadata": {}, + "source": [ + "$$\n", + "\\begin{cases}\n", + "x_1 + 2x_2 + x_3 = 8 \\\\\n", + "3x_1 + 7x_2 + 2x_3 = 23 \\\\\n", + "2x_1 + 2x_2 + x_3 = 9\n", + "\\end{cases}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d41b4856-79c3-4f48-9d8e-58c8d6045884", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{A} = \\begin{bmatrix}\n", + "1 & 2 & 1\\\\\n", + "3 & 7 & 2\\\\\n", + "2 & 2 & 1\n", + "\\end{bmatrix}, \\quad\n", + "\\boldsymbol{x} = \\begin{bmatrix}\n", + "x_1 \\\\\n", + "x_2\\\\\n", + "x_3\n", + "\\end{bmatrix}, \\quad\n", + "\\boldsymbol{b} = \\begin{bmatrix}\n", + "8 \\\\\n", + "23\\\\\n", + "9\n", + "\\end{bmatrix}\n", + "$$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "edb85b44-50b3-4115-8539-5023a19bb2a1", + "metadata": {}, + "outputs": [], + "source": [ + "m3 = np.arange(1, 10, dtype='f8').reshape(3, 3)\n", + "m3[-1, -1] = 8\n", + "m3" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0b90dbf4-05f7-43ad-a37d-20d48d03dd3a", + "metadata": {}, + "outputs": [], + "source": [ + "# 计算矩阵的秩\n", + "np.linalg.matrix_rank(m3)" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a442ab40-97a9-4cf5-8d61-bfcaa3561655", + "metadata": {}, + "outputs": [], + "source": [ + "# 逆矩阵 - 奇异矩阵不能求逆矩阵\n", + "# 本例m3已将m3[-1, -1]改为8,行列式为3(非奇异),可以求逆;若保留为9,则会引发LinAlgError: Singular matrix\n", + "np.linalg.inv(m3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "19be7ff8-6a71-40ad-8e37-20008c56be7a", + "metadata": {}, + "outputs": [], + "source": [ + "# 有唯一解的条件:系数矩阵的秩等于增广矩阵的秩,同时等于未知数的个数。\n", + "# 秩(rank):矩阵中线性无关的行(或列)的最大数量。\n", + "# 线性相关:如果一个向量可以通过其他向量的线性组合(数乘和加法)得到,那么它们就是线性相关的。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "87043db3-e2bd-4a70-950a-d74163afc4d1", + "metadata": {}, + "outputs": [], + "source": [ + "A = np.array([[1, 2, 1], [3, 7, 2], [2, 2, 1]])\n", + "b = np.array([8, 23, 9]).reshape(-1, 1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "479d272b-2b46-4374-93f3-54e13af52d59", + "metadata": {}, + "outputs": [], + "source": [ + "# 系数矩阵的秩\n", + "np.linalg.matrix_rank(A)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "faa35591-09da-4232-9b38-72ee2fb824cc", + "metadata": {}, + "outputs": [], + "source": [ + "# 增广矩阵的秩\n", + "np.linalg.matrix_rank(np.hstack((A, b)))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e8d02dee-7d86-4f9d-8e33-77b5a76784a3", + "metadata": {}, + "outputs": [], + "source": [ + "# 解线性方程组\n", + "np.linalg.solve(A, b)" + ] + }, + { + "cell_type": "markdown", + "id": "336ee288-5be1-41e5-89cb-e22f465efdd2", + "metadata": {}, + "source": [ + "$$\n", + "A \\cdot x = b\n", + "$$\n", + "$$\n", + "A^{-1} \\cdot A \\cdot x = A^{-1} \\cdot b\n", + "$$\n", + "$$\n", + "I \\cdot x = A^{-1} \\cdot b\n", + "$$\n", + "$$\n", + "x = A^{-1} \\cdot b\n", + "$$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7c2bd2fd-8867-4dad-8bbc-b5b175f690b7", + "metadata": {}, + "outputs": [], + "source": [ + "# 通过逆矩阵解线性方程组\n", + "np.linalg.inv(A) @ b" + ] + }, + { + "cell_type": "markdown",
+ "id": "b876a47b-a2ab-497a-b564-69750ddb8666", + "metadata": {}, + "source": [ + "#### 补充 - 用矩阵运算实现图像处理" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "95e138b5-ed1d-40c3-acbc-821b6ae8cf41", + "metadata": {}, + "outputs": [], + "source": [ + "# 安装opencv库\n", + "# %pip install opencv-python" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "669c59e5-a477-4bf3-b87b-7486e9d2e9ef", + "metadata": {}, + "outputs": [], + "source": [ + "def basic_matrix(translation):\n", + " \"\"\"基础变换矩阵\"\"\"\n", + " return np.array([[1, 0, translation[0]], [0, 1, translation[1]], [0, 0, 1]])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0338ad44-3142-428f-b1b6-45e691962900", + "metadata": {}, + "outputs": [], + "source": [ + "import copy\n", + "\n", + "def adjust_transform_for_image(img, trans_matrix):\n", + " \"\"\"根据图像调整变换矩阵\"\"\"\n", + " height, width, *_ = img.shape\n", + " center = np.array([0.5 * width, 0.5 * height])\n", + " return basic_matrix(center) @ trans_matrix @ basic_matrix(-center)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "368578be-3f94-4a37-be51-c6a5dbce1512", + "metadata": {}, + "outputs": [], + "source": [ + "import cv2\n", + "\n", + "def apply_transform(img, transform, border_value=(204, 204, 204)):\n", + " \"\"\"仿射变换\"\"\"\n", + " return cv2.warpAffine(\n", + " img,\n", + " transform[:2, :],\n", + " dsize=(img.shape[1], img.shape[0]),\n", + " flags=cv2.INTER_LINEAR,\n", + " borderMode=cv2.BORDER_CONSTANT,\n", + " borderValue=border_value\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "28b22755-4aed-4cd2-a3a2-57bd73f8eb00", + "metadata": {}, + "outputs": [], + "source": [ + "def apply(img, trans_matrix):\n", + " \"\"\"应用变换\"\"\"\n", + " temp_matrix = adjust_transform_for_image(img, trans_matrix)\n", + " out_img = apply_transform(img, temp_matrix)\n", + " return out_img" + ] + }, + { + "cell_type": "code", + "execution_count": 
null, + "id": "1526fbcd-ad60-4d7c-892c-5c0ae2b7ef3f", + "metadata": {}, + "outputs": [], + "source": [ + "def scale(img, x_ratio, y_ratio):\n", + " \"\"\"缩放\"\"\"\n", + " scale_matrix = np.array([\n", + " [x_ratio, 0, 0], \n", + " [0, y_ratio, 0], \n", + " [0, 0, 1]\n", + " ])\n", + " return apply(img, scale_matrix)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5e443448-9b6c-488b-83a5-0de7ac4fca18", + "metadata": {}, + "outputs": [], + "source": [ + "def rotate(img, degree):\n", + " \"\"\"旋转\"\"\"\n", + " rad = np.deg2rad(degree)\n", + " rotate_matrix = np.array([\n", + " [np.cos(rad), -np.sin(rad), 0], \n", + " [np.sin(rad), np.cos(rad), 0], \n", + " [0, 0, 1]\n", + " ])\n", + " return apply(img, rotate_matrix)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2a09187e-2707-49c2-8b8d-84ffa28b6f7e", + "metadata": {}, + "outputs": [], + "source": [ + "def transvect(img, ratio):\n", + " \"\"\"剪切映射\"\"\"\n", + " transvect_matrix = np.array([\n", + " [1, ratio, 0],\n", + " [0, 1, 0],\n", + " [0, 0, 1]\n", + " ])\n", + " return apply(img, transvect_matrix)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9561e506-3b64-49e6-9e9d-8731a2951646", + "metadata": {}, + "outputs": [], + "source": [ + "scaled_img = scale(guido_image, 1.25, 0.75)\n", + "plt.imshow(scaled_img)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "56cb960b-b3f9-4109-95ba-4388a2b1762e", + "metadata": {}, + "outputs": [], + "source": [ + "rotated_img = rotate(guido_image, -45)\n", + "plt.imshow(rotated_img)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cc2ac604-0fd7-43d5-be18-407b78e46f58", + "metadata": {}, + "outputs": [], + "source": [ + "transvected_img = transvect(guido_image, -0.3)\n", + "plt.imshow(transvected_img)" + ] + }, + { + "cell_type": "markdown", + "id": "cd91252a-31d4-40d5-82ef-71f22f0bd39c", + "metadata": {}, + "source": [ + "#### 补充 - 用scipy处理图像" + ] + }, + {
"cell_type": "code", + "execution_count": null, + "id": "9052864d-a32d-4cd3-9e2a-8df1654b8a67", + "metadata": {}, + "outputs": [], + "source": [ + "from scipy.ndimage import gaussian_filter, sobel\n", + "\n", + "# 获取灰度图\n", + "guido_image = plt.imread('res/guido.jpg')\n", + "gray_image = np.mean(guido_image, axis=2)\n", + "\n", + "plt.figure(figsize=(12, 4))\n", + "\n", + "# 灰度图\n", + "plt.subplot(1, 4, 1)\n", + "plt.imshow(gray_image, cmap=plt.cm.gray)\n", + "\n", + "# 模糊和锐化\n", + "plt.subplot(1, 4, 2)\n", + "blurred_image = gaussian_filter(gray_image, 3)\n", + "plt.imshow(blurred_image, cmap=plt.cm.gray)\n", + "\n", + "plt.subplot(1, 4, 3)\n", + "filtered_image = gaussian_filter(blurred_image, 1)\n", + "sharpen_image = blurred_image + 32 * (blurred_image - filtered_image)\n", + "plt.imshow(sharpen_image, cmap=plt.cm.gray)\n", + "\n", + "# 边缘图\n", + "plt.subplot(1, 4, 4)\n", + "# 使用索贝尔算子(邻点灰度加权差)进行边缘检测\n", + "edge_image = sobel(gray_image)\n", + "plt.imshow(edge_image, cmap=plt.cm.gray)\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dbbc6edf-c643-4cb4-a98b-05e806413478", + "metadata": {}, + "outputs": [], + "source": [ + "from scipy.ndimage import rotate, zoom\n", + "\n", + "plt.figure(figsize=(12, 4))\n", + "\n", + "# 旋转\n", + "plt.subplot(1, 3, 1)\n", + "rotated_image = rotate(guido_image, -16, reshape=True)\n", + "plt.imshow(rotated_image)\n", + "\n", + "# 旋转\n", + "plt.subplot(1, 3, 2)\n", + "rotated_image = rotate(guido_image, -16, reshape=False)\n", + "plt.imshow(rotated_image)\n", + "\n", + "# 缩放\n", + "plt.subplot(1, 3, 3)\n", + "scaled_image = zoom(guido_image, zoom=(0.8, 1.25, 1))\n", + "plt.imshow(scaled_image)\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "331862d6-60cd-442f-97a4-bbbfbabfc3fa", + "metadata": {}, + "source": [ + "#### 补充 - 视频流人脸识别" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "383fa323-db82-4a8b-a13b-9cbe9b8c48d3", + "metadata": 
{}, + "outputs": [], + "source": [ + "# 安装face_recognition库\n", + "# %pip install face_recognition" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9c1cf7b8-9d89-43d4-8954-616bce780183", + "metadata": {}, + "outputs": [], + "source": [ + "import cv2\n", + "import face_recognition\n", + "# from PIL import Image\n", + "\n", + "plt.figure(figsize=(12, 8))\n", + "\n", + "image = face_recognition.load_image_file('res/Solvay.jpg')\n", + "locations = face_recognition.face_locations(image)\n", + "for location in locations:\n", + " top, right, bottom, left = location\n", + " # Image.fromarray(image[top:bottom, left:right]).show()\n", + " cv2.rectangle(image, (left, top), (right, bottom), (255, 0, 0), 2)\n", + "plt.imshow(image)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fa76d9c2-c8c3-4cec-8c91-b6b54fe6e32e", + "metadata": {}, + "outputs": [], + "source": [ + "# import cv2\n", + "# import face_recognition\n", + "# import numpy as np\n", + "\n", + "# # 获取摄像头\n", + "# video_capture = cv2.VideoCapture(0)\n", + "\n", + "# # 加载图片获取脸部特征\n", + "# obama_image = face_recognition.load_image_file(\"res/obama.jpg\")\n", + "# obama_face_encoding = face_recognition.face_encodings(obama_image)[0]\n", + "# luohao_image = face_recognition.load_image_file(\"res/luohao.png\")\n", + "# luohao_face_encoding = face_recognition.face_encodings(luohao_image)[0]\n", + "# guido_image = face_recognition.load_image_file(\"res/guido.jpg\")\n", + "# guido_face_encoding = face_recognition.face_encodings(guido_image)[0]\n", + "\n", + "# # 保存脸部特征和对应的名字\n", + "# known_face_encodings = [\n", + "# obama_face_encoding,\n", + "# luohao_face_encoding,\n", + "# guido_face_encoding\n", + "# ]\n", + "# known_face_names = [\n", + "# \"Barack\",\n", + "# \"Hao\",\n", + "# \"Guido\"\n", + "# ]\n", + "\n", + "# face_locations = []\n", + "# face_encodings = []\n", + "# face_names = []\n", + "# process_this_frame = True\n", + "\n", + "# while True:\n", + "# # 
从视频中读取一帧数据\n", + "# ret, frame = video_capture.read()\n", + "\n", + "# # 调整为原始尺寸的四分之一(加速处理)\n", + "# small_frame = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)\n", + "\n", + "# # BGR转成RGB\n", + "# rgb_small_frame = small_frame[:, :, ::-1]\n", + "\n", + "# if process_this_frame:\n", + "# # 找到所有的人脸位置和脸部特征保存在列表中\n", + "# face_locations = face_recognition.face_locations(rgb_small_frame)\n", + "# face_encodings = face_recognition.face_encodings(rgb_small_frame, face_locations)\n", + "\n", + "# face_names = []\n", + "# for face_encoding in face_encodings:\n", + "# # 比较脸部特征\n", + "# matches = face_recognition.compare_faces(known_face_encodings, face_encoding)\n", + "# name = \"Unknown\"\n", + "\n", + "# # 通过距离判定最佳匹配并获取对应的名字\n", + "# face_distances = face_recognition.face_distance(known_face_encodings, face_encoding)\n", + "# best_match_index = np.argmin(face_distances)\n", + "# if matches[best_match_index]:\n", + "# name = known_face_names[best_match_index]\n", + "\n", + "# face_names.append(name)\n", + "\n", + "# process_this_frame = not process_this_frame\n", + "\n", + "# # 显示结果\n", + "# for (top, right, bottom, left), name in zip(face_locations, face_names):\n", + "# # 恢复正常的尺寸\n", + "# top, right, bottom, left = top * 4, right * 4, bottom * 4, left * 4\n", + "# # 绘制一个标识人脸的矩形框\n", + "# cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), 2)\n", + "# # 绘制一个填写名字的矩形框\n", + "# cv2.rectangle(frame, (left, bottom - 35), (right, bottom), (0, 0, 255), cv2.FILLED)\n", + "# # 绘制识别出的人脸对应的名字\n", + "# cv2.putText(frame, name, (left + 6, bottom - 6), cv2.FONT_HERSHEY_DUPLEX, 1.0, (255, 255, 255), 1)\n", + " \n", + "# cv2.imshow('Video', frame)\n", + " \n", + "# # 按键盘上的q键退出窗口 \n", + "# if cv2.waitKey(1) & 0xFF == ord('q'):\n", + "# break\n", + "\n", + "# video_capture.release()\n", + "# cv2.destroyAllWindows()" + ] + }, + { + "cell_type": "markdown", + "id": "b6989fb7-a3ab-4889-80be-f4ecce033e5c", + "metadata": {}, + "source": [ + "### 多项式" + ] + }, + { + "cell_type": 
"code", + "execution_count": null, + "id": "0c719ca9-de74-4c2e-aaf2-d0a9c833f512", + "metadata": {}, + "outputs": [], + "source": [ + "# NumPy老版本用poly1d表示多项式\n", + "p1 = np.poly1d([3, 0, 2, 1])\n", + "p2 = np.poly1d([1, 2, 3])\n", + "print(p1)\n", + "print(p2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e6aab279-fa9f-4d4e-b597-b3c51b478777", + "metadata": {}, + "outputs": [], + "source": [ + "# 多项式加法\n", + "print(p1 + p2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "375a1989-7231-432c-bf33-06fce490cacf", + "metadata": {}, + "outputs": [], + "source": [ + "# 多项式乘法\n", + "print(p1 * p2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4df5662a-c0bc-434d-b53f-623e09b51a76", + "metadata": {}, + "outputs": [], + "source": [ + "# 令x=2,计算多项式的值\n", + "p2(2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "df150df6-9904-45b7-8704-dbb033552dcb", + "metadata": {}, + "outputs": [], + "source": [ + "# 求导\n", + "print(p1.deriv())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d2d3002c-59d6-4e00-b00f-69f3f6ac6609", + "metadata": {}, + "outputs": [], + "source": [ + "# 求不定积分\n", + "print(p1.integ())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0965b9e9-42fc-425c-90cb-d265e3f6e42f", + "metadata": {}, + "outputs": [], + "source": [ + "p3 = np.poly1d([1, 3, 2])\n", + "print(p3)\n", + "# 令多项式等于0,求解x\n", + "print(p3.roots)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c5bda2d2-0d61-46db-ac48-0f7ec3438660", + "metadata": {}, + "outputs": [], + "source": [ + "type(p3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a71c4799-183e-4096-9903-7edf9464feb7", + "metadata": {}, + "outputs": [], + "source": [ + "from numpy.polynomial import Polynomial\n", + "\n", + "# NumPy新版本用Polynomial表示多项式\n", + "p1 = Polynomial([1, 2, 0, 3])\n", + "print(p1)" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "0002c6e3-697b-425c-a17c-802e10f07981", + "metadata": {}, + "outputs": [], + "source": [ + "print(p1.deriv())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1f9dde21-0854-4ae6-9703-9599a4204003", + "metadata": {}, + "outputs": [], + "source": [ + "print(p1.integ())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eb191b28-fb2c-460b-9c87-0041b128ba16", + "metadata": {}, + "outputs": [], + "source": [ + "# 最高次项\n", + "p1.degree()" + ] + }, + { + "cell_type": "markdown", + "id": "2d6c3103-c5b0-413f-b2de-324b0394e3ae", + "metadata": {}, + "source": [ + "### 最小二乘解" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7163cb46-44bb-487c-8dd2-7c530574ecc0", + "metadata": {}, + "outputs": [], + "source": [ + "# 每月收入\n", + "x = np.array([3200, 4811, 5386, 5564, 6120, 6691, 6906, 7483, 7587, 7890,\n", + " 8090, 8300, 8650, 8835, 8975, 9070, 9100, 9184, 9247, 9313, \n", + " 9465, 9558, 9853, 9938, 10020, 10242, 10343, 10731, 10885, 10990, \n", + " 11100, 11227, 11313, 11414, 11630, 11806, 11999, 12038, 12400, 12547, \n", + " 12890, 13050, 13360, 13850, 14890, 14990, 15500, 16899, 17010, 19880])\n", + "# 每月网购支出\n", + "y = np.array([1761, 882, 1106, 182, 1532, 1978, 2174, 2117, 2134, 1924, \n", + " 2207, 2876, 2617, 2683, 3054, 3277, 3345, 3462, 3401, 3591,\n", + " 3596, 3671, 3829, 3907, 3852, 4288, 4359, 4099, 4300, 4367,\n", + " 5019, 4873, 4674, 5174, 4666, 5797, 5782, 5451, 5487, 5448,\n", + " 6002, 6439, 6309, 6045, 5935, 6928, 7356, 6682, 6672, 6582])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a68a1d0d-0dc9-4437-bed9-6e9aff589df1", + "metadata": {}, + "outputs": [], + "source": [ + "# 定性分析 - 散点图\n", + "plt.scatter(x, y)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "89030441-eefd-4a02-ad41-ba154dbf87b2", + "metadata": {}, + "outputs": [], + "source": [ + "from scipy import stats\n", + "\n", + "# 
夏皮洛检验(正态性判定)\n", + "stats.shapiro(x), stats.shapiro(y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "93573426-e1c0-4025-8b8f-2d83d69e103c", + "metadata": {}, + "outputs": [], + "source": [ + "# 定量分析 - 相关系数 - correlation coefficient\n", + "# 皮尔逊相关系数(标准化的协方差 - [-1, 1])\n", + "# 1. 连续值且成对出现\n", + "# 2. 没有异常值\n", + "# 3. 来自于正态总体\n", + "np.corrcoef(x, y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4b6e47de-8e3b-4be4-a5e4-f1d95649e6f9", + "metadata": {}, + "outputs": [], + "source": [ + "# 计算皮尔逊相关系数\n", + "stats.pearsonr(x, y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f8e53b1e-b2b1-4222-a69a-dcbca0041d0e", + "metadata": {}, + "outputs": [], + "source": [ + "history_data = {key: value for key, value in zip(x, y)}\n", + "len(history_data)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1530478e-ec70-4f87-828f-1d6798c53c5e", + "metadata": {}, + "outputs": [], + "source": [ + "data = np.random.randint(1, 100, 15).tolist()\n", + "data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7d246563-86fd-418f-b058-4aae08d9b9c1", + "metadata": {}, + "outputs": [], + "source": [ + "import heapq\n", + "\n", + "# 通过堆(heap)结构快速的找到TopN元素\n", + "print(heapq.nsmallest(3, data))\n", + "print(heapq.nlargest(5, data))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "866f9fd9-58aa-4c03-9415-99aa236958d4", + "metadata": {}, + "outputs": [], + "source": [ + "# 目标:因为月收入和网购支出之间有强相关关系,所以我们可以通过月收入预测网购支出\n", + "# 方法1:输入一个月收入,找到跟这个收入最接近的N条数据,用它们的平均值预测对应的网购支出\n", + "# KNN - k最近邻算法(找到k个最近的邻居,用这k个邻居的数据来做出预测)\n", + "import heapq\n", + "\n", + "\n", + "def predicate_by_knn(income, k=5):\n", + " \"\"\"KNN算法\"\"\"\n", + " keys = heapq.nsmallest(k, history_data, key=lambda x: (x - income) ** 2)\n", + " return np.mean([history_data[key] for key in keys]).round(2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"399b9b6a-6625-4dc2-a09c-c0699c538a46", + "metadata": {}, + "outputs": [], + "source": [ + "predicate_by_knn(12800)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "db0ee4bf-83f8-489f-a543-80854571da60", + "metadata": {}, + "outputs": [], + "source": [ + "predicate_by_knn(6800)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6e514003-c143-4823-91eb-57f7608f6137", + "metadata": {}, + "outputs": [], + "source": [ + "predicate_by_knn(20000, k=3)" + ] + }, + { + "cell_type": "markdown", + "id": "aa44d3ad-aa3f-44db-889a-8a69b9a9be65", + "metadata": {}, + "source": [ + "回归模型:\n", + "$$ Y = aX + b $$\n", + "\n", + "损失函数:\n", + "$$ MSE = \\frac{1} {N} \\sum (\\hat{y_i} - y_i)^2 $$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "032eb740-ac9a-4526-9eb4-b6770907b07a", + "metadata": {}, + "outputs": [], + "source": [ + "# MSE - Mean Squared Error\n", + "def get_loss(a, b):\n", + " \"\"\"损失函数\"\"\"\n", + " y_hat = a * x + b\n", + " return np.mean((y_hat - y) ** 2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0d56ccc4-5a0d-4b3a-89ab-01cf690c4fba", + "metadata": {}, + "outputs": [], + "source": [ + "# 蒙特卡洛模拟(随机瞎蒙法)\n", + "import random\n", + "\n", + "min_loss = np.inf\n", + "ba, bb = None, None\n", + "\n", + "for _ in range(10000):\n", + " a = random.random() * 0.5 + 0.5\n", + " b = random.random() * 1000 - 2000\n", + " curr_loss = get_loss(a, b)\n", + " if curr_loss < min_loss:\n", + " min_loss = curr_loss\n", + " ba, bb = a, b\n", + " print(min_loss)\n", + "print(ba, bb)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "af5f7d02-f921-4712-82e3-1ce43c86c708", + "metadata": {}, + "outputs": [], + "source": [ + "plt.scatter(x, y)\n", + "plt.plot(x, ba * x + bb, color='r', linewidth=4)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "207c2da0-371f-40ea-b19f-f56cca8a7d30", + "metadata": {}, + "outputs": [], + 
"source": [ + "def predicate_by_regression(income):\n", + " return round(ba * income + bb, 2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a9e3af32-654c-4704-9f32-cfe41f067d06", + "metadata": {}, + "outputs": [], + "source": [ + "predicate_by_regression(6800)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5133079a-59f8-4435-af1e-fb5bc13c457d", + "metadata": {}, + "outputs": [], + "source": [ + "predicate_by_regression(12800)" + ] + }, + { + "cell_type": "markdown", + "id": "200b9821-b8c8-41df-ae8e-d96bf5049415", + "metadata": {}, + "source": [ + "将回归模型代入损失函数:\n", + "$$ f(a, b) = \\frac {1} {N} \\sum_{i=1}^{N}(y_i - (ax_i + b))^2 $$\n", + "\n", + "如何让$f(a, b)$取到最小值?\n", + "\n", + "分别对$a$和$b$求偏导数,并令其等于0。\n", + "$$ \\frac {\\partial {f(a, b)}} {\\partial {a}} = \\frac {2} {N} \\sum_{i=1}^{N}(-x_iy_i + x_i^2a + x_ib) = 0 $$ \n", + "$$ \\frac {\\partial {f(a, b)}} {\\partial {b}} = \\frac {2} {N} \\sum_{i=1}^{N}(-y_i + x_ia + b) = 0 $$\n", + "\n", + "求解得到:\n", + "$$a = \\frac{\\sum(x_{i} - \\bar{x})(y_{i} - \\bar{y})}{\\sum(x_{i} - \\bar{x})^{2}}$$\n", + "$$b = \\bar{y} - a\\bar{x}$$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9ab79db5-85c7-44ad-8ffa-2684fb499f7a", + "metadata": {}, + "outputs": [], + "source": [ + "x_bar, y_bar = np.mean(x), np.mean(y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "332777b5-bc3f-44d2-bd54-f8e9a19497aa", + "metadata": {}, + "outputs": [], + "source": [ + "ba = np.dot((x - x_bar), (y - y_bar)) / np.sum((x - x_bar) ** 2)\n", + "bb = y_bar - ba * x_bar\n", + "ba, bb" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "795f6a40-136a-4772-9e2b-5798022e408d", + "metadata": {}, + "outputs": [], + "source": [ + "plt.scatter(x, y)\n", + "plt.plot(x, ba * x + bb, color='r', linewidth=4)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2bfac239-7bf0-4784-a124-bbfdb4cdafe8", + "metadata":
{}, + "outputs": [], + "source": [ + "# 拟合出一个线性回归模型\n", + "np.polyfit(x, y, deg=1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9d5adc92-73e2-40df-843b-02efb28c6ae3", + "metadata": {}, + "outputs": [], + "source": [ + "# 拟合出一个多项式回归模型\n", + "a, b, c = np.polyfit(x, y, deg=2)\n", + "a, b, c" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c42ee939-e4b9-479e-ad73-66e806e29c6c", + "metadata": {}, + "outputs": [], + "source": [ + "plt.scatter(x, y)\n", + "plt.plot(x, a * x ** 2 + b * x + c, color='r', linewidth=4)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3363f359-3615-4ffa-a040-41e39c83b38f", + "metadata": {}, + "outputs": [], + "source": [ + "Polynomial.fit(x, y, deg=1).convert().coef" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/Day66-80/code/day04.ipynb b/Day66-80/code/day04.ipynb new file mode 100644 index 000000000..604f0a3e5 --- /dev/null +++ b/Day66-80/code/day04.ipynb @@ -0,0 +1,1572 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "196a647a-6faa-4aee-a0bf-a345852251dd", + "metadata": {}, + "source": [ + "## 深入浅出pandas\n", + "\n", + "pandas是一个支持数据分析全流程的Python开源库,它的作者Wes McKinney于2008年开始开发这个库,其主要目标是提供一个大数据分析和处理的工具。pandas封装了从数据加载、数据重塑、数据清洗到数据透视、数据呈现等一系列操作,提供了三种核心的数据类型:\n", + "1. `Series`:数据系列,表示一维的数据。跟一维数组的区别在于每条数据都有对应的索引,处理数据的方法比`ndarray`更为丰富。\n", + "2. `DataFrame`:数据框、数据窗、数据表,表示二维的数据。跟二维数组相比,`DataFrame`有行索引和列索引,而且提供了100+方法来处理数据。\n", + "3. 
`Index`:为`Series`和`DataFrame`提供索引服务。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eb84f909-921a-47da-87b1-61578c871422", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "\n", + "plt.rcParams['font.sans-serif'].insert(0, 'SimHei')\n", + "plt.rcParams['axes.unicode_minus'] = False\n", + "get_ipython().run_line_magic('config', \"InlineBackend.figure_format = 'svg'\")" + ] + }, + { + "cell_type": "markdown", + "id": "2102e83e-2a6d-47aa-b449-c058bea1a601", + "metadata": {}, + "source": [ + "### 创建DataFrame对象" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "87dbde08-dcab-4ede-a791-b56e11dd9115", + "metadata": {}, + "outputs": [], + "source": [ + "np.random.seed(20)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4c5b2767-2074-4cdf-b1ba-beff6f425942", + "metadata": {}, + "outputs": [], + "source": [ + "stu_names = ['狄仁杰', '白起', '李元芳', '苏妲己', '孙尚香']\n", + "cou_names = ['语文', '数学', '英语']\n", + "scores_arr = np.random.randint(60, 101, (5, 3))\n", + "scores_arr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f8c2a6bf-ca5e-479d-ab63-f5c3620186e3", + "metadata": {}, + "outputs": [], + "source": [ + "# 方法一:通过二维数组构造DataFrame对象\n", + "df1 = pd.DataFrame(data=scores_arr, columns=cou_names, index=stu_names)\n", + "df1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "baad5381-fb7d-4cc9-9288-a05d750144af", + "metadata": {}, + "outputs": [], + "source": [ + "# 行索引\n", + "df1.index" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d7f06b76-b60b-49cb-be72-adafb0978fca", + "metadata": {}, + "outputs": [], + "source": [ + "# 列索引\n", + "df1.columns" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "13b1275d-77e5-4d5d-b227-19db3f4196fd", + "metadata": {}, + "outputs": [], + "source": [ + "# 值 - 二维数组\n", + "df1.values" + ] + }, + { 
+ "cell_type": "code", + "execution_count": null, + "id": "dbf5bb11-1600-4ae4-bc95-369bc8189c20", + "metadata": {}, + "outputs": [], + "source": [ + "scores_dict = {\n", + " '语文': [95, 91, 69, 82, 92],\n", + " '数学': [86, 88, 80, 67, 100],\n", + " '英语': [75, 86, 71, 94, 81]\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c300bbbd-329a-4852-bf76-78ce1de02b8f", + "metadata": {}, + "outputs": [], + "source": [ + "# 方法二:通过数据字典构造DataFrame对象\n", + "df2 = pd.DataFrame(data=scores_dict, index=stu_names)\n", + "df2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "705c0de6-43ff-46c6-85d5-301743d18d43", + "metadata": {}, + "outputs": [], + "source": [ + "# 查看DataFrame信息\n", + "df2.info(memory_usage='deep')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "71417ac2-8f4b-4950-9336-de6fbc1f5da4", + "metadata": {}, + "outputs": [], + "source": [ + "# 方法三:从CSV文件加载数据创建DataFrame对象\n", + "df3 = pd.read_csv(\n", + " 'res/2023年北京积分落户数据.csv',\n", + " # encoding='utf-8', # 指定字符编码\n", + " # sep='', # 指定字段的分隔符(默认逗号)\n", + " # delimiter='#',\n", + " # header=0, # 表头所在的行\n", + " # quotechar='\"', # 包裹字符串的字符(默认双引号)\n", + " # index_col='公示编号', # 索引列\n", + " # usecols=['公示编号', '姓名', '积分分值'], # 指定加载的列\n", + " # nrows=10, # 加载的行数\n", + " # skiprows=np.arange(1, 101), # 跳过哪些行\n", + " # true_values=['是', 'Yes', 'YES'], # 哪些值会被视为布尔值True\n", + " # false_values=['否', 'No', 'NO'], # 哪些值会被视为布尔值False\n", + " # na_values=['---', 'N/A'], # 哪些值会被视为空值\n", + " # iterator=True, # 开启迭代器模式\n", + " # chunksize=1000, # 每次加载的数据体量\n", + ")\n", + "df3" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "67b86b13-566b-4f97-86fd-3723ef21a87f", + "metadata": {}, + "outputs": [], + "source": [ + "df4 = pd.read_csv('res/big_data_file.csv.gz', low_memory=False)\n", + "df4" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e52ff38d-8e40-4532-8df9-4d2807a3e2ec", + "metadata": {}, + "outputs": [], + 
"source": [ + "df4.info(memory_usage='deep')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "65e871ff-e87c-4e6b-86cc-624af7ccbdc1", + "metadata": {}, + "outputs": [], + "source": [ + "# %pip install pyarrow" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "48fab84a-8b86-4405-966a-6bfb99582de5", + "metadata": {}, + "outputs": [], + "source": [ + "df5 = pd.read_csv('res/big_data_file.csv.gz', engine='pyarrow')\n", + "df5" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ea575de9-4398-46fe-b2f8-8fb37b93179b", + "metadata": {}, + "outputs": [], + "source": [ + "df5.info(memory_usage='deep')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5723cf03-b78f-4fc9-943c-f9b10036affa", + "metadata": {}, + "outputs": [], + "source": [ + "# %pip install openpyxl" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cb3387b9-3402-4b25-a5d5-ff9690a1ac06", + "metadata": {}, + "outputs": [], + "source": [ + "# 方法四:从Excel文件加载数据创建DataFrame对象\n", + "df6 = pd.read_excel(\n", + " 'res/2020年销售数据.xlsx',\n", + " sheet_name='data',\n", + ")\n", + "df6" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d06abbd8-9a34-4ab3-a75c-76e3ed8eb36c", + "metadata": {}, + "outputs": [], + "source": [ + "# %pip install -U pymysql cryptography sqlalchemy" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5aa0e35f-2a13-4c8e-a9fd-87b0bf72307e", + "metadata": {}, + "outputs": [], + "source": [ + "# 方法五:从数据库服务器加载数据创建DataFrame对象\n", + "from sqlalchemy import create_engine\n", + "\n", + "# URL格式:方言+驱动://用户名:密码@主机:端口/数据库名\n", + "engine = create_engine('mysql+pymysql://guest:Guest.618@47.109.26.237:3306/hrs')\n", + "engine" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4b344f17-f5a1-4d7d-ad3c-ede4b122609c", + "metadata": {}, + "outputs": [], + "source": [ + "dept_df = pd.read_sql('tb_dept', engine, index_col='dno')\n", + "dept_df" + ] + }, + { + "cell_type": 
"code", + "execution_count": null, + "id": "c5d1ffa3-6962-4c26-ae92-a8d7bc7da0cb", + "metadata": {}, + "outputs": [], + "source": [ + "emp_df1 = pd.read_sql('tb_emp', engine, index_col='eno')\n", + "emp_df1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f84b6886-09d8-4f13-89cc-487574991dba", + "metadata": {}, + "outputs": [], + "source": [ + "emp_df2 = pd.read_sql('tb_emp2', engine, index_col='eno')\n", + "emp_df2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c60e96d2-9a0d-4901-b39c-c31760de47a0", + "metadata": {}, + "outputs": [], + "source": [ + "# 释放引擎连接池中的数据库连接\n", + "engine.dispose()" + ] + }, + { + "cell_type": "markdown", + "id": "12086a7a-c161-4753-9a8e-180f9e8b2edf", + "metadata": {}, + "source": [ + "### 查看信息" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "785e58f9-b3f7-49a6-affc-8caaa66cebf1", + "metadata": {}, + "outputs": [], + "source": [ + "df6.info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fd8a9156-3939-430d-9738-60b3d8a95563", + "metadata": {}, + "outputs": [], + "source": [ + "# 获取前N行\n", + "df6.head(3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b75ace23-9b92-4425-b58f-bcd81e8d72e7", + "metadata": {}, + "outputs": [], + "source": [ + "# 获取后N行\n", + "df6.tail(5)" + ] + }, + { + "cell_type": "markdown", + "id": "c2b2a909-0b40-473c-bb3f-85aca1925a19", + "metadata": {}, + "source": [ + "### 操作行、列、单元格" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fe964b3b-7f51-4202-b528-f5102d9be9f0", + "metadata": {}, + "outputs": [], + "source": [ + "# 访问列\n", + "df6['销售日期']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b2e5ccb3-4b97-4a02-8316-b1321390f286", + "metadata": {}, + "outputs": [], + "source": [ + "df6.销售渠道" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "80ad78dc-4f47-4421-8478-ba7797350db4", + "metadata": {}, + "outputs": [], + "source": 
[ + "df6['销售渠道']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7b970671-6f16-4e07-8666-715495de2832", + "metadata": {}, + "outputs": [], + "source": [ + "type(df6['销售日期'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2c9cb56b-6a2b-479e-8c57-c61683858387", + "metadata": {}, + "outputs": [], + "source": [ + "df6[['销售渠道']]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "75730cd3-0459-4a62-97ee-e037256cc98a", + "metadata": {}, + "outputs": [], + "source": [ + "type(df6[['销售渠道']])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9e097e49-b762-4c9f-9d93-98abb1701d97", + "metadata": {}, + "outputs": [], + "source": [ + "# 访问多个列 - 花式索引\n", + "df6[['销售日期', '销售区域', '直接成本']]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cf31a169-549e-4182-8206-789f97316115", + "metadata": {}, + "outputs": [], + "source": [ + "df6.columns[3:7]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "792713c0-13bc-4810-86cc-5f6f6ce78719", + "metadata": {}, + "outputs": [], + "source": [ + "df6[df6.columns[3:7]]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "02d43b17-15e3-44d5-844b-a50d365bf863", + "metadata": {}, + "outputs": [], + "source": [ + "# 访问行 - loc属性\n", + "df6.loc[1944]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "79da6932-f985-44dc-9f4b-e051e4749c65", + "metadata": {}, + "outputs": [], + "source": [ + "df6.iloc[-1]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6246b39b-7229-4e0f-af7b-0915e707492a", + "metadata": {}, + "outputs": [], + "source": [ + "# 访问多行 - 花式索引\n", + "df6.loc[[0, 100, 58, 1000, 1000, 1000, 1099]]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "77321324-0ca9-4c2e-a792-3c717189cb27", + "metadata": {}, + "outputs": [], + "source": [ + "# 访问多行 - 切片索引\n", + "df6.loc[101:200]" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "5eb250eb-18e0-4181-a37a-dec55c633116", + "metadata": {}, + "outputs": [], + "source": [ + "# df6[101:200]\n", + "df6.iloc[101:200]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f2daddd7-3635-40b1-9416-c1137315948c", + "metadata": {}, + "outputs": [], + "source": [ + "df6.iloc[-1:-101:-1]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9321811f-e62b-4db5-a478-cdc0934f097b", + "metadata": {}, + "outputs": [], + "source": [ + "# 访问单元格\n", + "df6.at[2, '售价']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bd1670bc-0a13-457f-95f1-352a4d61b3a7", + "metadata": {}, + "outputs": [], + "source": [ + "df6.at[2, '售价'] = 999\n", + "df6" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7460ef03-3f45-4cc0-99a3-85039c2606b0", + "metadata": {}, + "outputs": [], + "source": [ + "df6.iat[2, -3] = 888\n", + "df6" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "34c81da6-f58f-4c36-8596-004266e9374b", + "metadata": {}, + "outputs": [], + "source": [ + "# 添加列\n", + "df6['销售额'] = df6['售价'] * df6['销售数量']\n", + "df6['季度'] = df6['销售日期'].dt.quarter\n", + "df6['月份'] = df6['销售日期'].dt.month\n", + "df6" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c3c60210-202d-4bd8-8804-1d657746b29c", + "metadata": {}, + "outputs": [], + "source": [ + "# 添加行 - 实际工作中基本没有意义" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6bf78f3d-05a2-4c7a-a0f0-fb6659f1bd6f", + "metadata": {}, + "outputs": [], + "source": [ + "# 删除列\n", + "# inplace=False - 默认设定 - 不修改原对象返回修改后的新对象\n", + "# inplace=True - 直接修改DataFrame对象不返回新对象 - 方法没有返回值\n", + "df6.drop(columns=['季度'], inplace=True)\n", + "df6" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cdf8cf10-5193-4c38-8fef-bc3d38a8a0a8", + "metadata": {}, + "outputs": [], + "source": [ + "# 删除行\n", + "# df6.drop(index=[0, 1, 2, 100, 1944, 1943])\n", + 
"df6.drop(index=[0, 1, 2, 100, 1944, 1943], inplace=True)\n", + "df6" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1ddfe77d-aa92-4d6a-b2db-8469b1222ed3", + "metadata": {}, + "outputs": [], + "source": [ + "df6.drop(index=df6.index[100:200], inplace=True)\n", + "df6" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8020bbb0-740e-496a-9224-fe3495a19c92", + "metadata": {}, + "outputs": [], + "source": [ + "# 重命名\n", + "df6.rename(columns={'销售区域': '区域', '销售渠道': '渠道', '销售订单': '订单号'}, inplace=True)\n", + "df6" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d028d2be-0944-4b70-a3ea-f7d06cdd458f", + "metadata": {}, + "outputs": [], + "source": [ + "# 重置索引\n", + "# drop=False - 默认值 - 原来的索引变成一个普通列\n", + "# drop=True - 原来的索引直接丢弃\n", + "df6.reset_index(drop=True, inplace=True)\n", + "df6" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cb55a518-f4bd-4fac-8554-4353c0798bc6", + "metadata": {}, + "outputs": [], + "source": [ + "# 设置索引\n", + "df6.set_index('订单号', inplace=True)\n", + "df6" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "101bd804-5a90-4cd3-a545-613df6d9b8e5", + "metadata": {}, + "outputs": [], + "source": [ + "# 筛选数据 - 布尔索引\n", + "df6[df6['销售额'] > 100000]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "64c83a43-fcb0-4ba1-9400-ae4a5b21715c", + "metadata": {}, + "outputs": [], + "source": [ + "df6[(df6['销售额'] > 100000) & (df6['月份'] == 6)]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "22c01e56-b188-40f7-9e53-3a3d2f0bcb29", + "metadata": {}, + "outputs": [], + "source": [ + "df6[(df6['销售额'] > 100000) | (df6['月份'] == 6)]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5adb86b9-8b31-49cb-9292-94189f3714c5", + "metadata": {}, + "outputs": [], + "source": [ + "df6.query('销售额 > 100000')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"b768afa0-7066-4a1d-8f10-b88386587388", + "metadata": {}, + "outputs": [], + "source": [ + "df6.query('月份 == 6 and 渠道 == \"实体\"')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2e57b21c-0565-4352-8924-de169497bce0", + "metadata": {}, + "outputs": [], + "source": [ + "df6.query('销售额 > 100000 and 月份 == 6')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7ef8ba56-5293-41b0-8208-85a0eed735e8", + "metadata": {}, + "outputs": [], + "source": [ + "# 随机抽样\n", + "df6.sample(n=100)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bfcd52d7-eac4-4776-b0e3-a37e67e349f3", + "metadata": {}, + "outputs": [], + "source": [ + "df6.sample(frac=0.05)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1c654ca8-3179-4fa2-9213-7d7029357342", + "metadata": {}, + "outputs": [], + "source": [ + "# replace=False - 无放回抽样\n", + "ignore_rows = np.random.choice(np.arange(1, 1946), size=int(1945 * 0.9), replace=False)\n", + "pd.read_excel(\n", + " 'res/2020年销售数据.xlsx',\n", + " sheet_name='data',\n", + " skiprows=ignore_rows\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "2037ed6a-d616-4c67-9f5d-ea517d6e1c6b", + "metadata": {}, + "source": [ + "### 数据重塑\n", + "\n", + "1. 拼接(合并结构一致的数据)\n", + "2. 
合并(事实表连接维度表)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d2184fd4-bd44-459f-bda4-6dc11c09c219", + "metadata": {}, + "outputs": [], + "source": [ + "# 拼接两个DataFrame - union\n", + "all_emp_df = pd.concat([emp_df1, emp_df2])\n", + "all_emp_df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "05bc65a1-42ac-463c-a089-08fb8dc60855", + "metadata": {}, + "outputs": [], + "source": [ + "# 连表 - 连接事实表和维度表 - 用维度把数据分组然后再做聚合\n", + "# 连接两个DataFrame(内连接、左外连接、右外连接、全外连接)- join\n", + "# how - 连表方式 - inner、left、right、outer\n", + "# on - 基于哪个字段连表 - left_on、right_on\n", + "all_emp_df = pd.merge(all_emp_df, dept_df, how='inner', on='dno')\n", + "all_emp_df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c6a3d52d-a04c-494d-9ee9-2dad9805b1c1", + "metadata": {}, + "outputs": [], + "source": [ + "# 作业:在jobs目录下有若干个CSV文件,它们的数据结构是一样的,现在需要把所有CSV文件的数据拼接到一个DataFrame中\n", + "import os\n", + "\n", + "dfs = [pd.read_csv(os.path.join('res/jobs', filename))\n", + " for filename in os.listdir('res/jobs') \n", + " if filename.endswith('.csv')]\n", + "pd.concat(dfs, ignore_index=True).to_csv('res/all_jobs.csv', index=False)" + ] + }, + { + "cell_type": "markdown", + "id": "6b9ad1e1-fe5d-45a0-8755-ac6720a32ba0", + "metadata": {}, + "source": [ + "### 数据清洗\n", + "\n", + "1. 缺失值\n", + "2. 重复值\n", + "3. 异常值\n", + "4. 
预处理" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "45c835c4-559f-45f1-a501-70a8c12bbbb1", + "metadata": {}, + "outputs": [], + "source": [ + "# 甄别缺失值\n", + "all_emp_df.isna()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fd7fbdf8-ebf2-463b-ac3b-cdb24560873a", + "metadata": {}, + "outputs": [], + "source": [ + "# all_emp_df['comm'].isna()\n", + "all_emp_df['comm'].isnull()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a4f16d30-83e9-4761-92a1-780e85e721e1", + "metadata": {}, + "outputs": [], + "source": [ + "# all_emp_df['comm'].notna()\n", + "all_emp_df['comm'].notnull()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9f2a153d-ab4a-475e-9ee3-0d623a289f7f", + "metadata": {}, + "outputs": [], + "source": [ + "all_emp_df['comm'].notna().value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5d388d57-fa1a-405b-880e-9316354a6f05", + "metadata": {}, + "outputs": [], + "source": [ + "# 删除空值 - 删除带有空值的行\n", + "all_emp_df.dropna()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b40fa037-3fab-454e-a300-2e9dcf4b2b60", + "metadata": {}, + "outputs": [], + "source": [ + "all_emp_df.dropna(axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "67ae21a1-7dc1-496b-85b5-013d79d25a63", + "metadata": {}, + "outputs": [], + "source": [ + "all_emp_df.mgr.dropna()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "66745379-9db7-42b0-ab6b-a55e870a515b", + "metadata": {}, + "outputs": [], + "source": [ + "# 填充空值\n", + "all_emp_df.fillna(0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e1664115-30e0-4946-ae4b-c919bb319ddc", + "metadata": {}, + "outputs": [], + "source": [ + "all_emp_df.comm.fillna(0).astype('i8')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e1743531-66a2-42a4-8c28-ad268efc848c", + "metadata": {}, + "outputs": [], 
+ "source": [ + "# 将空值下方的非空值向上填充 - backward fill\n", + "all_emp_df.comm.bfill()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5fcef0a0-ff29-42bd-9955-5a97595390fd", + "metadata": {}, + "outputs": [], + "source": [ + "# 将空值上方的非空值向下填充 - forward fill\n", + "all_emp_df.comm.ffill()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eeeb9be3-802c-44e3-80a0-465aba1a485a", + "metadata": {}, + "outputs": [], + "source": [ + "# 通过插值算法填充空值 - interpolate\n", + "all_emp_df['comm'] = all_emp_df.comm.interpolate(method='linear')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f1f094c3-1cc2-4826-a04a-24150ea9cef8", + "metadata": {}, + "outputs": [], + "source": [ + "all_emp_df['comm'] = all_emp_df.comm.astype('i8')\n", + "all_emp_df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a739242d-ebd2-42d2-9ec7-9a5939cbf74a", + "metadata": {}, + "outputs": [], + "source": [ + "all_emp_df['mgr'] = all_emp_df.mgr.fillna(-1).astype('i8')\n", + "all_emp_df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cd376d13-2245-48b8-ba14-3315d4c48f9c", + "metadata": {}, + "outputs": [], + "source": [ + "# 甄别重复值\n", + "all_emp_df.ename.duplicated()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6e107a38-c5e8-4e5e-9e42-71481c54e0d1", + "metadata": {}, + "outputs": [], + "source": [ + "all_emp_df.duplicated(['ename', 'job'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "097eaaf2-1112-4e0f-b361-786bf91d6c1f", + "metadata": {}, + "outputs": [], + "source": [ + "# 统计每个元素出现的频次\n", + "all_emp_df.ename.value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6494bb56-7ac7-47df-a9f1-960b02586e31", + "metadata": {}, + "outputs": [], + "source": [ + "all_emp_df.job.value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "172e4d9a-63bd-44ca-98ea-e4614c8823ab", + "metadata": {}, + 
"outputs": [], + "source": [ + "# 统计不重复的元素的个数\n", + "all_emp_df.ename.nunique()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d6fa062c-d338-407f-8647-e84878a5642e", + "metadata": {}, + "outputs": [], + "source": [ + "# 删除重复值\n", + "# keep='first' - 默认值,重复元素保留第一项 - 'last' / False\n", + "all_emp_df.drop_duplicates(['ename', 'job'], keep='last', inplace=True)\n", + "all_emp_df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "832a2ea2-6941-4364-b143-af7db9ff9701", + "metadata": {}, + "outputs": [], + "source": [ + "# 异常值的甄别\n", + "# 数值判定法(data < Q1 - 1.5 * IQR 或者 data > Q3 + 1.5 * IQR)\n", + "\n", + "\n", + "def find_outliers_by_iqr(data, whis=1.5):\n", + " q1, q3 = np.quantile(data, [0.25, 0.75])\n", + " iqr = q3 - q1\n", + " return data[(data < q1 - whis * iqr) | (data > q3 + whis * iqr)]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1cd5d6aa-c60e-483e-995c-a627a0dfec15", + "metadata": {}, + "outputs": [], + "source": [ + "temp = np.random.normal(80, 8, 50).round(0)\n", + "temp = np.append(temp, [120, 160, 200, 40, 20, -50])\n", + "temp" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2121dab4-0efc-4fcd-a5fe-67585552cb53", + "metadata": {}, + "outputs": [], + "source": [ + "find_outliers_by_iqr(temp)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "da048825-3f88-4009-9db5-159e8e883b10", + "metadata": {}, + "outputs": [], + "source": [ + "find_outliers_by_iqr(temp, whis=3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0da7034b-2350-43ff-a6eb-9e7f4361bdee", + "metadata": {}, + "outputs": [], + "source": [ + "# zscore判定法(三西格玛法则 ---> 68-95-99.7法则)\n", + "\n", + "\n", + "def find_outliers_by_zscore(data, mul=3):\n", + " mu, sigma = np.mean(data), np.std(data)\n", + " zscore = (data - mu) / sigma\n", + " return data[np.abs(zscore) > mul]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"e88616c0-a4d8-4fd8-9ec2-e761cb5ba056", + "metadata": {}, + "outputs": [], + "source": [ + "find_outliers_by_zscore(temp)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c902031c-2f78-4721-9734-5c5b0ca81650", + "metadata": {}, + "outputs": [], + "source": [ + "find_outliers_by_zscore(temp, mul=2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1e295014-d582-4e78-b5b9-6d9f0463ff8d", + "metadata": {}, + "outputs": [], + "source": [ + "find_outliers_by_zscore(df6.直接成本)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "97b98c82-fd09-42a9-8a75-a3e71ae10fbc", + "metadata": {}, + "outputs": [], + "source": [ + "# 根据离群点的行索引删除行\n", + "df6.drop(index=find_outliers_by_zscore(df6.直接成本).index)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0053ed12-c09f-4331-a6dd-487ff990c680", + "metadata": {}, + "outputs": [], + "source": [ + "med_value = np.median(temp)\n", + "med_value" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f02c2985-1b07-4b1c-b248-aa1de9e98451", + "metadata": {}, + "outputs": [], + "source": [ + "find_outliers_by_zscore(temp, mul=2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "485adc15-f39d-419b-9869-2b366f5d88ec", + "metadata": {}, + "outputs": [], + "source": [ + "np.isin(temp, find_outliers_by_zscore(temp, mul=2))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce92f242-1f0f-476e-ae85-91e1615783ef", + "metadata": {}, + "outputs": [], + "source": [ + "# 用中位数替换离群点\n", + "np.place(temp, np.isin(temp, find_outliers_by_zscore(temp, mul=2)), med_value)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "10b0b0bc-f98c-40fe-890f-976df9d9c52b", + "metadata": {}, + "outputs": [], + "source": [ + "temp" + ] + }, + { + "cell_type": "markdown", + "id": "d970e838-42f2-44d0-8f2d-07ebbf6de2b0", + "metadata": {}, + "source": [ + "#### 案例1:招聘数据清洗和预处理\n", + "\n", + "1. 数据加载\n", + "2. 
去重\n", + "3. 数据抽取\n", + "4. 拆分列\n", + "5. 替换值\n", + "6. 数据筛选" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1ec417a9-457f-434e-96a6-f4fd35d75987", + "metadata": {}, + "outputs": [], + "source": [ + "jobs_df = pd.read_csv('res/all_jobs.csv')\n", + "jobs_df.head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "74e0e4a5-3c03-4617-9661-8cfa03b88fd7", + "metadata": {}, + "outputs": [], + "source": [ + "# 根据URI列去重\n", + "jobs_df.drop_duplicates('uri', inplace=True)\n", + "jobs_df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6cca7b8b-25f1-46b8-9946-34ba90f42116", + "metadata": {}, + "outputs": [], + "source": [ + "# 通过正则表达式从列中提取信息\n", + "jobs_df[['salary_lower', 'salary_upper']] = jobs_df.salary.str.extract(r'(\\d+)-(\\d+)').astype('i8')\n", + "jobs_df['salary'] = (jobs_df.salary_lower + jobs_df.salary_upper) / 2\n", + "jobs_df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ffaea2af-09f6-4577-9c0d-024966d6854f", + "metadata": {}, + "outputs": [], + "source": [ + "jobs_df.drop(columns=['uri', 'city'], inplace=True)\n", + "jobs_df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d9ba5998-ca1d-44c8-87ca-363356074dd5", + "metadata": {}, + "outputs": [], + "source": [ + "# 拆分列\n", + "jobs_df['city'] = jobs_df.site.str.split(expand=True)[0]\n", + "jobs_df.drop(columns='site', inplace=True)\n", + "jobs_df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "933e9006-4f5e-4238-b6d9-940dfeb6caf1", + "metadata": {}, + "outputs": [], + "source": [ + "# 字符串正则表达式替换\n", + "jobs_df['year'] = jobs_df.year.replace(r'5-10年|10年以上', '5年以上', regex=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d10a9c1c-a9d5-49e1-8fdf-a68b5bb3d59a", + "metadata": {}, + "outputs": [], + "source": [ + "jobs_df.year.unique()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"d248e233-bac5-48d5-8a69-a1f04350867a", + "metadata": {}, + "outputs": [], + "source": [ + "jobs_df['edu'] = jobs_df.edu.replace(r'中专|高中', '学历不限', regex=True)\n", + "jobs_df['edu'] = jobs_df.edu.replace(r'硕士|博士', '研究生', regex=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eec6fbd5-2355-4674-9e5d-7f47a5a808a2", + "metadata": {}, + "outputs": [], + "source": [ + "jobs_df.edu.unique()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "352b1921-aa2b-4016-af3e-02032b2a3935", + "metadata": {}, + "outputs": [], + "source": [ + "jobs_df['job_name'] = jobs_df.job_name.str.lower()\n", + "jobs_df = jobs_df[jobs_df.job_name.str.contains('python|数据|产品|运营|data', regex=True)]\n", + "jobs_df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "df370013-1278-48d2-9891-8647df3c5e15", + "metadata": {}, + "outputs": [], + "source": [ + "jobs_df.to_csv('res/cleand_jobs.csv', index=False)" + ] + }, + { + "cell_type": "markdown", + "id": "8ee07676-737c-420e-b11a-235ff7f2c4c8", + "metadata": {}, + "source": [ + "#### 案例2:北京积分落户数据预处理\n", + "\n", + "1. 加载数据\n", + "2. 日期时间处理\n", + "3. 年龄段分箱\n", + "4. 
落户积分归一化" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1232d023-7591-47b3-b67b-4920642dd28d", + "metadata": {}, + "outputs": [], + "source": [ + "settle_df = pd.read_csv('res/2023年北京积分落户数据.csv', index_col='公示编号')\n", + "settle_df.head(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "734eb268-3ad7-4e67-9661-08328075992b", + "metadata": {}, + "outputs": [], + "source": [ + "settle_df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "63698465-ddcd-430c-bd96-e78abaaebda3", + "metadata": {}, + "outputs": [], + "source": [ + "# 将字符串处理成日期\n", + "settle_df['出生年月'] = pd.to_datetime(settle_df['出生年月'])\n", + "settle_df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "989c56c7-85fa-4180-9b86-5247a41cdbab", + "metadata": {}, + "outputs": [], + "source": [ + "# 将生日换算成年龄\n", + "settle_df['年龄'] = (pd.to_datetime('2023-01-01') - settle_df.出生年月).dt.days // 365\n", + "settle_df.head(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4191c7a2-19fd-4347-ac79-2371c8e59c10", + "metadata": {}, + "outputs": [], + "source": [ + "# 将年龄划分到年龄段 - 分箱 - 数据桶\n", + "settle_df['年龄段'] = pd.cut(\n", + " settle_df.年龄,\n", + " bins=np.arange(35, 61, 5),\n", + " labels=['35~39岁', '40~44岁', '45~49岁', '50~54岁', '55~59岁'],\n", + " right=False\n", + ")\n", + "settle_df.head(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ea2e0c9b-0aa0-41d3-a52a-6926b797465c", + "metadata": {}, + "outputs": [], + "source": [ + "# 统计每个元素出现的频次\n", + "temp = settle_df.年龄段.value_counts()\n", + "temp" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "30843274-b940-4527-92ed-97db86bb4ec7", + "metadata": {}, + "outputs": [], + "source": [ + "plt.cm.Greens(np.linspace(0.9, 0.1, 5))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "375dd407-9d0a-4788-a38e-3a37efbb6d3b", + "metadata": {}, + "outputs": [], + "source": [ + "# 
绘制柱状图\n", + "temp.plot(\n", + " kind='bar', # 图表类型\n", + " figsize=(8, 4), # 图表尺寸\n", + " xlabel='', # 横轴标签\n", + " ylabel='Count', # 纵轴标签\n", + " width=0.5, # 柱子宽度\n", + " hatch='//', # 柱子条纹\n", + " color=plt.cm.Greens(np.linspace(0.9, 0.3, temp.size)) # 颜色值\n", + ")\n", + "\n", + "for i in range(temp.size):\n", + " # plt.text(横坐标, 纵坐标, 标签内容)\n", + " plt.text(i, temp.iloc[i] + 30, temp.iloc[i], ha='center')\n", + "\n", + "# 定制横轴的刻度\n", + "plt.xticks(rotation=0)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e020ba6c-d16d-482f-ad3b-a9e855257b91", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制饼图\n", + "temp.plot(\n", + " kind='pie',\n", + " ylabel='',\n", + " autopct='%.1f%%', # 自动计算并显示百分比\n", + " wedgeprops={'width': 0.3}, # 环状结构部分的宽度\n", + " pctdistance=0.85, # 百分比到圆心的距离\n", + " labeldistance=1.1, # 标签到圆心的距离\n", + " # shadow=True, # 阴影效果\n", + " # startangle=0, # 起始角度\n", + " counterclock=True, # 是否反时针方向绘制\n", + ")\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e846eec2-6c95-409c-8b15-2b14cab3f57c", + "metadata": {}, + "outputs": [], + "source": [ + "# agg - aggregate - 聚合\n", + "settle_df.积分分值.agg(['mean', 'max', 'min', 'std', 'skew', 'kurt'])" + ] + }, + { + "cell_type": "markdown", + "id": "b1669102-1c03-4751-813c-b241a05718e3", + "metadata": {}, + "source": [ + "线性归一化:\n", + "$$\n", + "x^{\\prime} = \\frac{x - x_{min}}{x_{max} - x_{min}}\n", + "$$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e8d9dca7-b976-43ab-96b8-abefca66cc53", + "metadata": {}, + "outputs": [], + "source": [ + "# 将积分分值处理成0~1范围的值\n", + "max_score, min_score = settle_df.积分分值.agg(['max', 'min'])\n", + "max_score, min_score" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "10acd550-8422-4934-b38f-03554f86d305", + "metadata": {}, + "outputs": [], + "source": [ + "# map - 映射 - 将指定的函数作用到数据系列的每个元素上\n", + "# apply - 应用 - 将指定的函数应用到数据系列的每个元素上\n", + 
"settle_df['线性归一化积分'] = settle_df.积分分值.map(lambda x: (x - min_score) / (max_score - min_score)).round(2)\n", + "settle_df" + ] + }, + { + "cell_type": "markdown", + "id": "55e57b00-cb9e-4c9e-bc59-e99b738e2f5d", + "metadata": {}, + "source": [ + "zscore标准化:\n", + "$$\n", + "x^{\\prime} = \\frac{x - \\mu}{\\sigma}\n", + "$$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b5fc6260-5337-4161-99f0-d7be43d59361", + "metadata": {}, + "outputs": [], + "source": [ + "mu, sigma = settle_df.积分分值.agg(['mean', 'std'])\n", + "settle_df['zscore评分'] = settle_df.积分分值.apply(lambda x: (x - mu) / sigma)\n", + "settle_df" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/Day66-80/code/day05.ipynb b/Day66-80/code/day05.ipynb new file mode 100644 index 000000000..0834e84f3 --- /dev/null +++ b/Day66-80/code/day05.ipynb @@ -0,0 +1,673 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "97982f4b-b3d0-47fe-b6e8-a1a327d03d93", + "metadata": {}, + "source": [ + "## 深入浅出pandas" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "834d7d19-4015-4048-a1a6-2b8c756f0115", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "\n", + "plt.rcParams['font.sans-serif'].insert(0, 'SimHei')\n", + "plt.rcParams['axes.unicode_minus'] = False" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "56becc00-f1c7-4f81-86d1-7f7b23fd65d4", + "metadata": {}, + "outputs": [], + "source": [ + "%config InlineBackend.figure_format = 'svg'" + ] + }, + { + "cell_type": 
"markdown", + "id": "c81bfe94-906c-4e13-b321-0e6c397aea46", + "metadata": {}, + "source": [ + "### 数据透视\n", + "\n", + "1. 数据聚合(指标统计)\n", + "2. 排序和头部值\n", + "3. 透视表和交叉表" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7836ce8b-1e1d-40e5-b01c-c228ebf60928", + "metadata": {}, + "outputs": [], + "source": [ + "sales_df = pd.read_excel('res/2020年销售数据.xlsx', sheet_name='data')\n", + "sales_df.head(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ca3b2020-9470-4e49-a3ca-066dc9a437b3", + "metadata": {}, + "outputs": [], + "source": [ + "sales_df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "224ae45b-172a-48bc-8df0-7d6c7099529a", + "metadata": {}, + "outputs": [], + "source": [ + "# 添加销售额、毛利润、月份列\n", + "sales_df['销售额'] = sales_df.售价 * sales_df.销售数量\n", + "sales_df['毛利润'] = sales_df.销售额 - sales_df.直接成本\n", + "sales_df['月份'] = sales_df.销售日期.dt.month\n", + "sales_df.head(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c8878bfd-64de-4a6a-9ae3-36e9af40d45e", + "metadata": {}, + "outputs": [], + "source": [ + "def make_tag(price):\n", + " if price < 300:\n", + " return '低端'\n", + " elif price < 800:\n", + " return '中端'\n", + " return '高端'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4cba4262-b952-4fce-9b07-fed9c5f2a2e3", + "metadata": {}, + "outputs": [], + "source": [ + "# 根据商品的价格添加价位标签\n", + "sales_df['价位'] = sales_df.售价.apply(make_tag)\n", + "sales_df.head(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b2a2b472-55fc-4d35-8530-445e8ffbf800", + "metadata": {}, + "outputs": [], + "source": [ + "# 统计北极星指标\n", + "GMV, profit, quantity = sales_df[['销售额', '毛利润', '销售数量']].sum()\n", + "print(f'销售额: {GMV}元')\n", + "print(f'毛利润: {profit}元')\n", + "print(f'销售数量: {quantity}件')\n", + "print(f'毛利率: {profit / GMV:.2%}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"5b6ac225-6834-462f-a5b7-8be8297b66d0", + "metadata": {}, + "outputs": [], + "source": [ + "# 统计每个月的销售额和毛利润\n", + "temp1 = sales_df.groupby('月份')[['销售额', '毛利润']].agg('sum')\n", + "temp1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a8f179a1-c270-458d-be84-b862ca2c143d", + "metadata": {}, + "outputs": [], + "source": [ + "# 使用透视表统计每个月的销售额和毛利润\n", + "pd.pivot_table(\n", + " sales_df,\n", + " index='月份',\n", + " values=['销售额', '毛利润'],\n", + " aggfunc='sum'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3a3cb912-c47f-4d47-8bd4-ffc9eb171cba", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制折线图\n", + "temp1.plot(\n", + " kind='line',\n", + " figsize=(10, 5),\n", + " y=['销售额', '毛利润'], # 放到纵轴上的数据\n", + " xlabel='', # 横轴的标签\n", + " ylabel='销售额和毛利润', # 纵轴的标签\n", + " marker='^', # 标记点符号\n", + ")\n", + "# plt.fill_between(np.arange(1, 13), temp1.销售额, where=temp1.销售额 >= 3e6, facecolor='red', alpha=0.25)\n", + "# plt.fill_between(np.arange(1, 13), temp1.销售额, where=temp1.销售额 < 3e6, facecolor='green', alpha=0.25)\n", + "# 定制纵轴的取值范围\n", + "plt.ylim(0, 6e6)\n", + "# 定制横轴的刻度\n", + "plt.xticks(np.arange(1, 13), labels=[f'{x}月' for x in range(1, 13)])\n", + "# 定制标题\n", + "plt.title('2020年月度销售额和毛利润', fontdict={'fontsize': 22, 'color': 'navy'})\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "389db27d-2b11-47ea-a8c6-660fcc39b863", + "metadata": {}, + "outputs": [], + "source": [ + "plt.cm.RdYlBu_r" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "621a2487-2d15-450d-9f69-df8089616bd9", + "metadata": {}, + "outputs": [], + "source": [ + "# 计算月环比\n", + "temp1['销售额月环比'] = temp1.销售额.pct_change()\n", + "temp1['毛利润月环比'] = temp1.毛利润.pct_change()\n", + "# 索引重排序\n", + "temp1 = temp1.reindex(columns=['销售额', '销售额月环比', '毛利润', '毛利润月环比'])\n", + "# 渲染输出\n", + "temp1.style.format(\n", + " formatter={\n", + " '销售额月环比': '{:.2%}',\n", + " '毛利润月环比': '{:.2%}'\n", + " },\n", + 
" na_rep='-------'\n", + ").background_gradient(\n", + " 'RdYlBu_r',\n", + " subset=['销售额月环比', '毛利润月环比']\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6b158aea-fe6e-4938-a4f8-4b880c6f23d0", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制横线图\n", + "mu = temp1.销售额.mean()\n", + "temp1['diff'] = temp1.销售额 - mu\n", + "temp1['colors'] = temp1.销售额.map(lambda x: 'green' if x > mu else 'red')\n", + "\n", + "plt.figure(figsize=(8, 6), dpi=200)\n", + "plt.hlines(y=temp1.index, xmin=0, xmax=temp1['diff'], color=temp1.colors, alpha=0.6, linewidth=6)\n", + "plt.yticks(np.arange(1, 13), labels=[f'{x}月' for x in np.arange(1, 13)])\n", + "# 定制网格线\n", + "plt.grid(linestyle='--', linewidth=0.4, alpha=0.5)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "810936a1-f6c5-43e9-9eda-b9fe8014ba9f", + "metadata": {}, + "outputs": [], + "source": [ + "# 各品牌对销售额贡献占比\n", + "temp2 = sales_df.groupby('品牌')['销售额'].sum()\n", + "temp2.plot(\n", + " kind='pie',\n", + " ylabel='',\n", + " autopct='%.2f%%', # 自动计算并显示百分比\n", + " pctdistance=0.82, # 百分比标签到圆心的距离\n", + " wedgeprops=dict(width=0.35, edgecolor='w'), # 定制环状饼图\n", + " explode=[0.1, 0, 0, 0, 0], # 分离饼图\n", + ")\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1465d545-14e8-435d-9b12-651df3bab9ea", + "metadata": {}, + "outputs": [], + "source": [ + "# 各销售区域每个月的销售额\n", + "temp3 = sales_df.groupby(['销售区域', '月份'], as_index=False)[['销售额']].sum()\n", + "# pivot - 将行旋转到列上(窄表 ----> 宽表)\n", + "# melt - 将列旋转到行上(宽表 ----> 窄表)\n", + "temp3.pivot(index='销售区域', columns='月份', values='销售额').fillna(0).astype('i8')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1adf23bb-7120-416c-846e-1098aac2e1f4", + "metadata": {}, + "outputs": [], + "source": [ + "# 创建透视表\n", + "pd.pivot_table(\n", + " sales_df,\n", + " index='销售区域',\n", + " columns='月份',\n", + " values='销售额',\n", + " aggfunc='sum',\n", + " fill_value=0,\n", 
+ " margins=True,\n", + " margins_name='总计'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fdf54c9a-64c3-453d-bc5b-533fce7b25ed", + "metadata": {}, + "outputs": [], + "source": [ + "# 将价位字段处理成category类型并指定排序的顺序\n", + "sales_df['价位'] = sales_df.价位.astype('category').cat.reorder_categories(['高端', '中端', '低端'])\n", + "sales_df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6e706575-3260-4b37-99fe-696db4f6fe7b", + "metadata": {}, + "outputs": [], + "source": [ + "# 统计每个月各种价位产品的销量\n", + "temp4 = sales_df.pivot_table(\n", + " index='价位',\n", + " columns='月份',\n", + " values='销售数量',\n", + " observed=False,\n", + " fill_value=0,\n", + " aggfunc='sum'\n", + ")\n", + "temp4" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12527025-64f5-4a0a-9bf5-6bdbc4b6f93b", + "metadata": {}, + "outputs": [], + "source": [ + "# 交叉表\n", + "pd.crosstab(\n", + " index=sales_df.价位,\n", + " columns=sales_df.月份,\n", + " values=sales_df.销售数量,\n", + " aggfunc='sum'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d99ecd49-2669-493c-ae04-974afa71dc4d", + "metadata": {}, + "outputs": [], + "source": [ + "blood_types = np.array(['B', 'A', 'O', 'O', 'AB', 'B', 'O', 'B', 'AB', 'A', 'A', 'O', 'B', 'O', 'O', 'O', 'O', 'A', 'B', 'B'])\n", + "personality_types = np.array(['𝛃', '𝛂', '𝛂', '𝛂', '𝛃', '𝛂', '𝛄', '𝛄', '𝛂', '𝛄', '𝛃', '𝛂', '𝛂', '𝛂', '𝛄', '𝛄', '𝛂', '𝛂', '𝛂', '𝛂'])\n", + "\n", + "# 创建交叉表\n", + "pd.crosstab(\n", + " index=blood_types,\n", + " columns=personality_types,\n", + " rownames=['血型'],\n", + " colnames=['人格'],\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3cba2e9e-8335-4081-a253-5ab6d08d8559", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制堆叠柱状图\n", + "temp4.T.plot(\n", + " figsize=(10, 4),\n", + " kind='bar',\n", + " width=0.6,\n", + " xlabel='',\n", + " ylabel='销售数量',\n", + " stacked=True\n", + ")\n", + 
"plt.xticks(rotation=0)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9eba1698-d8a5-4b66-b617-fc68072a670e", + "metadata": {}, + "outputs": [], + "source": [ + "# 让每一项数据除以对应月份的销售数量之和\n", + "temp5 = temp4.T.divide(temp4.sum(), axis=0)\n", + "temp5" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1ce2e80e-0af9-4162-a46d-b5d357a9362d", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制百分比堆叠柱状图\n", + "temp5.plot(\n", + " figsize=(10, 4),\n", + " kind='bar',\n", + " width=0.6,\n", + " xlabel='',\n", + " ylabel='销量占比',\n", + " stacked=True\n", + ")\n", + "plt.xticks(rotation=0)\n", + "plt.yticks(np.linspace(0, 1, 6), labels=[f'{x:.0%}' for x in np.linspace(0, 1, 6)])\n", + "plt.legend(loc='lower center')\n", + "\n", + "for i in temp5.index:\n", + " y1, y2, y3 = temp5.loc[i]\n", + " plt.text(i - 1, y2 / 2 + y1, f'{y2:.2%}', ha='center', va='center', fontdict={'size': 8})\n", + " plt.text(i - 1, y3 / 2 + y2 + y1, f'{y3:.2%}', ha='center', va='center', fontdict={'size': 8})\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "82ac6b37-302e-4913-a973-4a7f4e6daf1d", + "metadata": {}, + "source": [ + "### 作业:招聘岗位数据分析\n", + "\n", + "1. 统计出城市、招聘信息、招聘岗位的数量和平均月薪。\n", + "2. 统计每个城市的岗位数量从高到低排序。\n", + "3. 统计每个城市的平均薪资从高到低排序。\n", + "4. 统计招聘岗位对学历要求的占比。\n", + "5. 统计招聘岗位对工作年限的要求占比。\n", + "6. 
分析薪资跟学历和工作年限的关系。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "91e8b171-7568-4da1-8170-5d13fc8945bc", + "metadata": {}, + "outputs": [], + "source": [ + "jobs_df = pd.read_csv('res/cleaned_jobs.csv')\n", + "jobs_df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7cadf547-f7aa-494b-beeb-678f5a09e967", + "metadata": {}, + "outputs": [], + "source": [ + "# 统计北极星指标\n", + "city_count = jobs_df['city'].nunique()\n", + "info_count = jobs_df['company_name'].count()\n", + "post_count = jobs_df['pos_count'].sum()\n", + "salary_avg = jobs_df['salary'].mean().round(1)\n", + "print(f'城市数量: {city_count}')\n", + "print(f'信息数量: {info_count}')\n", + "print(f'岗位数量: {post_count}')\n", + "print(f'平均薪资: {salary_avg}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cb9fc057-df56-4f01-a08e-ab3427dca467", + "metadata": {}, + "outputs": [], + "source": [ + "# 统计每个城市的岗位数量从高到低排序\n", + "jobs_df.groupby('city')[['pos_count']].sum().sort_values(by='pos_count', ascending=False)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "430af08e-2ce2-4f4f-9d64-df2d09c7c9f3", + "metadata": {}, + "outputs": [], + "source": [ + "pd.pivot_table(\n", + " jobs_df,\n", + " index='city',\n", + " values='pos_count',\n", + " aggfunc='sum'\n", + ").sort_values(by='pos_count', ascending=False)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bcd4aeca-bb90-42e6-b45c-6cdea2c76271", + "metadata": {}, + "outputs": [], + "source": [ + "jobs_df.groupby('city')[['salary']].mean().round(1).sort_values(by='salary', ascending=False)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6e93f5c8-da3e-4064-ac77-748eb3167504", + "metadata": {}, + "outputs": [], + "source": [ + "# 统计每个城市的平均薪资从高到低排序\n", + "pd.pivot_table(\n", + " jobs_df,\n", + " index='city',\n", + " values='salary',\n", + " aggfunc='mean'\n", + ").round(1).sort_values(by='salary', ascending=False)" + ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "id": "7ae067bf-2981-49f2-a224-28a5223dc21f", + "metadata": {}, + "outputs": [], + "source": [ + "jobs_df['edu'] = jobs_df.edu.astype('category').cat.reorder_categories(['学历不限', '大专', '本科', '研究生'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a87fab8f-18af-4c0b-9546-63d79778b4d8", + "metadata": {}, + "outputs": [], + "source": [ + "# 统计招聘岗位对学历要求占比\n", + "pd.pivot_table(\n", + " jobs_df,\n", + " index='edu',\n", + " values='pos_count',\n", + " aggfunc='sum',\n", + " observed=True\n", + ").plot(\n", + " kind='pie',\n", + " ylabel='',\n", + " subplots=True,\n", + " legend=False,\n", + " autopct='%.2f%%',\n", + " pctdistance=0.85,\n", + " wedgeprops={'width': 0.35}\n", + ")\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "15202818-48a8-40bb-8701-47bc1b481f80", + "metadata": {}, + "outputs": [], + "source": [ + "jobs_df['year'] = jobs_df.year.astype('category').cat.reorder_categories(['应届生', '1年以内', '经验不限', '1-3年', '3-5年', '5年以上'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6d965229-36d5-4c38-850c-3300726f208a", + "metadata": {}, + "outputs": [], + "source": [ + "# 统计招聘岗位对工作年限要求绘制饼图\n", + "pd.pivot_table(\n", + " jobs_df,\n", + " index='year',\n", + " values='pos_count',\n", + " aggfunc='sum',\n", + " observed=True\n", + ").plot(\n", + " kind='pie',\n", + " y='pos_count',\n", + " ylabel='',\n", + " legend=False,\n", + " autopct='%.2f%%',\n", + " pctdistance=0.85,\n", + " wedgeprops={'width': 0.35}\n", + ")\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9c7ab419-0bc4-428f-be9b-6d4d617b6663", + "metadata": {}, + "outputs": [], + "source": [ + "# 统计不同学历和工作年限平均薪资\n", + "temp6 = pd.pivot_table(\n", + " jobs_df,\n", + " index='edu',\n", + " columns='year',\n", + " values='salary',\n", + " observed=False,\n", + " fill_value=0\n", + ").round(1)\n", + "temp6" + ] + }, + { + "cell_type": 
"code", + "execution_count": null, + "id": "d8492b17-5ae8-47f0-a058-ab303a6087a9", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制热力图\n", + "plt.imshow(temp6, cmap='Reds')\n", + "plt.xticks(np.arange(6), labels=temp6.columns)\n", + "plt.yticks(np.arange(4), labels=temp6.index)\n", + "\n", + "for i in range(temp6.index.size):\n", + " for j in range(temp6.columns.size):\n", + " value = temp6.iat[i, j]\n", + " color = 'w' if value > salary_avg else 'k'\n", + " plt.text(j, i, value, ha='center', va='center', color=color)\n", + "\n", + "# 定制颜色条\n", + "plt.colorbar()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a3eb8cd0-fe07-43a4-9514-8a38c08ca081", + "metadata": {}, + "outputs": [], + "source": [ + "# %pip install seaborn" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f6287994-0865-432a-9932-92b05b5bf7e8", + "metadata": {}, + "outputs": [], + "source": [ + "import seaborn as sns\n", + "\n", + "sns.heatmap(temp6, cmap='Reds', annot=True)\n", + "plt.xlabel('')\n", + "plt.ylabel('')\n", + "plt.yticks(rotation=0)\n", + "plt.show()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/Day66-80/code/day06.ipynb b/Day66-80/code/day06.ipynb new file mode 100644 index 000000000..0cdd91fa4 --- /dev/null +++ b/Day66-80/code/day06.ipynb @@ -0,0 +1,809 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "e4ad3216-d45c-4328-a509-3c01e0fec4d5", + "metadata": {}, + "source": [ + "## 深入浅出pandas" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5faacbe5-d44a-4e0e-a287-19270bc1a693", + 
"metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "\n", + "plt.rcParams['font.sans-serif'].insert(0, 'SimHei')\n", + "plt.rcParams['axes.unicode_minus'] = False\n", + "get_ipython().run_line_magic('config', \"InlineBackend.figure_format = 'svg'\")" + ] + }, + { + "cell_type": "markdown", + "id": "9b6101ba-4d7b-408c-8f94-76cf789ab71a", + "metadata": {}, + "source": [ + "### 科比投篮数据分析\n", + "\n", + "1. 科比使用得最多的投篮动作\n", + "2. 科比交手次数最多的球队\n", + "3. 科比有出手的比赛有多少场\n", + "4. 科比职业生涯(常规赛+季后赛)总得分(不含罚篮)\n", + "5. 科比得分最高的五场比赛(对手、投篮次数、得分、命中率)\n", + "6. 科比得分最多的三个赛季(赛季、投篮次数、得分、命中率)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5008a34f-e108-4c35-a71e-0b28e2f4cbfc", + "metadata": {}, + "outputs": [], + "source": [ + "# 不限制最大显示的列数\n", + "pd.set_option('display.max_columns', None)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "526e6f01-7b18-487a-bd89-fdc5a5a494b0", + "metadata": {}, + "outputs": [], + "source": [ + "# 加载科比投篮数据\n", + "kobe_df = pd.read_csv('res/科比投篮数据.csv', index_col='shot_id')\n", + "kobe_df.tail(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "127b7ab9-568c-4982-8238-d5fcafe6e9b2", + "metadata": {}, + "outputs": [], + "source": [ + "kobe_df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4cbe2e96-0bbe-43a7-bafc-f5f956090ea4", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# 科比使用得最多的投篮动作是什么\n", + "kobe_df.action_type.value_counts().index[0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "46bd394d-05f3-4adc-bc69-d8ef70e1f877", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "kobe_df.groupby('action_type')['action_type'].count().nlargest(1).index[0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3ebb1eab-a675-4f2f-833b-ff7c4207ec77", + "metadata": { + "tags": [] + }, + "outputs": 
[], + "source": [ + "# 科比交手次数最多的球队是哪支队伍\n", + "kobe_df.drop_duplicates('game_id').opponent.value_counts().index[0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6b53491a-1e04-4684-94de-ddcf4eba0a9d", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "kobe_df.drop_duplicates('game_id').groupby('opponent').opponent.count().idxmax()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bf1b9f70-23df-4103-87db-7727329eb679", + "metadata": {}, + "outputs": [], + "source": [ + "# 科比有出手的比赛有多少场\n", + "kobe_df.game_id.nunique()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "91dfb7c3-c2d8-4eff-9df9-8be558e054d1", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# 统计科比常规赛和季后赛的投篮命中率\n", + "temp = kobe_df.dropna().pivot_table(index=['playoffs', 'shot_type'], columns=['shot_made_flag'], values='game_id', aggfunc='count')\n", + "temp = temp.divide(temp.sum(axis=1), axis=0)\n", + "temp" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e998d2d7-1259-479d-b74e-6e9dccf8de36", + "metadata": {}, + "outputs": [], + "source": [ + "# 填充shot_made_flag字段的缺失值\n", + "def handle(x):\n", + " playoffs, shot_type, shot_made_flag = x\n", + " if np.isnan(shot_made_flag):\n", + " shot_made_flag = 1 if np.random.random() < temp.at[(playoffs, shot_type), 1.0] else 0\n", + " return shot_made_flag\n", + "\n", + "\n", + "kobe_df['shot_made_flag'] = kobe_df[['playoffs', 'shot_type', 'shot_made_flag']].apply(handle, axis=1).astype('?')\n", + "kobe_df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1fd5dbf5-e89f-4f6c-9867-cca583d08527", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# 处理得分字段\n", + "kobe_df['point'] = kobe_df.shot_type.str[0].astype('i8')\n", + "kobe_df['point'] = kobe_df[['shot_made_flag', 'point']].apply(lambda x: x.loc['point'] if x.loc['shot_made_flag'] else 0, axis=1)\n", + "kobe_df" + ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "id": "b431b8a2-600e-4ed2-8092-0ed8598043f4", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# 参考数据:投篮命中数11719\n", + "kobe_df.shot_made_flag.sum()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a2fdf20e-54d2-43b4-a5b2-52b6f7e815c4", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# 参考数据:不含罚篮的投篮得分25265\n", + "kobe_df.point.sum()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8929cfa5-2152-45ad-9ac4-ce9154ef8830", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# 科比得分最多的赛季是哪个赛季和分数\n", + "kobe_df.groupby('season').point.sum().nlargest(3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d5bd3ccf-6654-4c84-a295-4142fd24cebe", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# 获得得分最高的5场比赛的game_id\n", + "index = kobe_df.groupby('game_id').point.sum().nlargest(5).index.values\n", + "index" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e48b6dfb-c6f8-40c5-beda-a1e50ae742b4", + "metadata": {}, + "outputs": [], + "source": [ + "# 用布尔索引筛选数据\n", + "kobe_df[np.in1d(kobe_df.game_id, index)]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e6ba3500-a50f-400c-a2bc-7f04cf0e3393", + "metadata": {}, + "outputs": [], + "source": [ + "# 用query方法筛选数据\n", + "kobe_df.query('game_id in @index')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "171a9b52-e917-4b41-9472-80532cc1a72e", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# 科比得分最高的五场比赛(对手、投篮次数、得分、命中率)\n", + "# 参考数据(含罚篮):TOR - 81分 / POR - 65分 / DAL - 62分 / NYK - 61分 / MEM - 60分 / UTA - 60分\n", + "df1 = kobe_df[np.in1d(kobe_df.game_id, index)].groupby(\n", + " 'game_id'\n", + ")[['game_date', 'opponent', 'game_id', 'shot_made_flag', 'point']].agg({\n", + " 'game_date': 'max',\n", + " 'opponent': 'max',\n", + " 'game_id': 'count',\n", + " 
'shot_made_flag': 'sum',\n", + " 'point': 'sum'\n", + "})\n", + "df1['rate'] = df1.shot_made_flag / df1.game_id\n", + "df1.drop(columns=['shot_made_flag'], inplace=True)\n", + "df1.reset_index(drop=True, inplace=True)\n", + "df1.set_index('game_date', inplace=True)\n", + "df1.rename(columns={'opponent': '对手', 'game_id': '出手次数', 'point': '得分', 'rate': '命中率'}, inplace=True)\n", + "df1.sort_values(by='得分', ascending=False).style.format(formatter={'命中率': '{:.2%}'})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a129218e-4ceb-4004-9b91-5668b8d3940f", + "metadata": {}, + "outputs": [], + "source": [ + "df2 = kobe_df.query('game_id in @index').groupby(\n", + " 'game_id'\n", + ")[['game_date', 'opponent', 'game_id', 'shot_made_flag', 'point']].agg({\n", + " 'game_date': 'max',\n", + " 'opponent': 'max',\n", + " 'game_id': 'count',\n", + " 'shot_made_flag': 'sum',\n", + " 'point': 'sum'\n", + "})\n", + "df2['rate'] = df2.shot_made_flag / df2.game_id\n", + "df2.drop(columns=['shot_made_flag'], inplace=True)\n", + "df2.reset_index(drop=True, inplace=True)\n", + "df2.set_index('game_date', inplace=True)\n", + "df2.rename(columns={'opponent': '对手', 'game_id': '出手次数', 'point': '得分', 'rate': '命中率'}, inplace=True)\n", + "df2.sort_values(by='得分', ascending=False).style.format(formatter={'命中率': '{:.2%}'})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "473883f8-acfa-4d32-806d-4201c50106e1", + "metadata": {}, + "outputs": [], + "source": [ + "# 科比得分最多的三个赛季(赛季、投篮次数、得分、命中率)\n" + ] + }, + { + "cell_type": "markdown", + "id": "c1ae87f9-1698-48b8-be11-0edd47cae298", + "metadata": {}, + "source": [ + "### 深圳二手房数据分析\n", + "\n", + "1. 统计深圳二手房单价分布规律\n", + "2. 统计深圳二手房总价分布规律\n", + "3. 统计每个区总价和均价的均值\n", + "4. 深圳每个区单价Top3的商圈\n", + "5. 哪种户型的二手房数量最多\n", + "6. 
总价Top10的二手房分布在哪些区" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "041ac635-9598-4a10-9e04-664678f23448", + "metadata": {}, + "outputs": [], + "source": [ + "sz_df = pd.read_csv('res/深圳二手房数据.csv')\n", + "sz_df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7298d0f3-d9ef-4cee-975d-8d720711d749", + "metadata": {}, + "outputs": [], + "source": [ + "sz_df.drop(columns='Unnamed: 0', inplace=True)\n", + "sz_df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f140da58-6846-4595-b365-359cdc5b7dd1", + "metadata": {}, + "outputs": [], + "source": [ + "# 修正列名\n", + "sz_df.rename(columns={'hourseType': 'house_type', 'hourseSize': 'house_size'}, inplace=True)\n", + "sz_df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "731f00a4-1db8-4aac-ba7c-7bf65ebc3b40", + "metadata": {}, + "outputs": [], + "source": [ + "# 将tax字段处理为bool类型\n", + "sz_df['tax'] = sz_df.tax.fillna('').astype('?')\n", + "sz_df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5b2bad9d-ae47-478d-8be1-139121742012", + "metadata": {}, + "outputs": [], + "source": [ + "sz_df.head(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4a780b30-c8d8-4b40-b416-99b1d6572e26", + "metadata": {}, + "outputs": [], + "source": [ + "# 获取描述性统计信息\n", + "sz_df.total_price.agg(['mean', 'max', 'min', 'skew', 'kurt'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a2b4a48c-9775-424f-bc10-4bc1d71b3813", + "metadata": {}, + "outputs": [], + "source": [ + "sz_df.unit_price.agg(['mean', 'max', 'min', 'skew', 'kurt'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c742fc0c-9b3c-4b8d-861a-879012677458", + "metadata": {}, + "outputs": [], + "source": [ + "sz_df.house_size.agg(['mean', 'max', 'min', 'skew', 'kurt'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"7c6a64d4-7bd8-4d04-b9a9-3d787ba9c732", + "metadata": {}, + "outputs": [], + "source": [ + "sz_df[sz_df.unit_price < 10000]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "22989374-3017-48a5-9802-61459df9f8d6", + "metadata": {}, + "outputs": [], + "source": [ + "# 删除异常数据(单价小于10000)\n", + "sz_df.drop(index=sz_df[sz_df.unit_price < 10000].index, inplace=True)\n", + "sz_df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "02a1b493-6d66-4e31-8298-bbe50eae9676", + "metadata": {}, + "outputs": [], + "source": [ + "# 删除面积在10平米以下200平米以上的房屋信息\n", + "sz_df.drop(index=sz_df.query('house_size < 10 or house_size > 200').index, inplace=True)\n", + "sz_df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bf37b2b5-1493-4a2b-a16c-3df39a99fcf2", + "metadata": {}, + "outputs": [], + "source": [ + "# 添加一个总房间数字段\n", + "sz_df['rooms_num'] = sz_df.house_type.str.extract(r'(\\d+)室(\\d+)厅').astype('i8').sum(axis=1)\n", + "sz_df.rooms_num.describe()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f605b5bd-9ccf-4946-b084-c05db4b1f78b", + "metadata": {}, + "outputs": [], + "source": [ + "sz_df.tail(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6f197094-8843-4575-bf87-8e6599d4dce9", + "metadata": {}, + "outputs": [], + "source": [ + "# 删除房间总数大于8个的房屋信息\n", + "sz_df.drop(index=sz_df[sz_df.rooms_num > 8].index, inplace=True)\n", + "sz_df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "920d0e09-fa58-456a-9d3d-6329ef49ae67", + "metadata": {}, + "outputs": [], + "source": [ + "# 单价分布\n", + "sz_df.unit_price.plot(kind='hist', figsize=(9, 5), bins=15, ylabel='')\n", + "plt.xticks(np.arange(10000, 210001, 20000))\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2fd023d2-b7a9-456e-a58d-731512711484", + "metadata": {}, + "outputs": [], + "source": [ + "# 总价分布\n", + 
"sz_df.total_price.plot(kind='hist', figsize=(9, 5), bins=15, ylabel='')\n", + "plt.xticks(np.arange(100, 2901, 400))\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d1d68635-d7fa-44dc-97dc-8938cba1de80", + "metadata": {}, + "outputs": [], + "source": [ + "# 统计每个区总价和均价的均值\n", + "sz_df.pivot_table(\n", + " index='area',\n", + " values=['title', 'unit_price', 'total_price'],\n", + " aggfunc={'title': 'count', 'unit_price': 'mean', 'total_price': 'mean'}\n", + ").round(1).sort_values(\n", + " 'unit_price', ascending=False\n", + ").style.format(\n", + " formatter={\n", + " 'total_price': '¥{:.0f}万元',\n", + " 'unit_price': '¥{:,.0f}元'\n", + " }\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "538ebe18-432e-4951-8667-ce90cb73ae5e", + "metadata": {}, + "outputs": [], + "source": [ + "# 深圳每个区房屋平均单价Top3商圈\n", + "temp_df = sz_df.groupby(['area', 'position'])[['unit_price']].mean().round(1)\n", + "temp_df['rank'] = temp_df.unit_price.groupby('area').rank(method='dense', ascending=False).astype('i8')\n", + "temp_df.query('rank <= 3')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "04aaf617-240d-4fc2-b702-fa3e03978347", + "metadata": {}, + "outputs": [], + "source": [ + "# 深圳每个区房屋平均单价Top3商圈\n", + "temp_df = sz_df.groupby(['area', 'position'], as_index=False)[['unit_price']].mean().round(1)\n", + "temp_df = temp_df.groupby('area')[['position', 'unit_price']].apply(lambda x: x.nlargest(3, 'unit_price'))\n", + "temp_df.style.hide(level=1).format(formatter={'unit_price': '¥{:,.0f}元'})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "38f91365-6b40-40a1-b8bd-f0fd4cac9d50", + "metadata": {}, + "outputs": [], + "source": [ + "# 哪种户型的二手房数量最多\n", + "sz_df.groupby('house_type').house_type.count().nlargest(1).index[0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4db07743-51cd-4007-8faf-7aa5714167d9", + "metadata": {}, + "outputs": [], + 
"source": [ + "# 总价Top10的二手房分布在哪些区\n", + "top10 = sz_df.total_price.nlargest(10).index.values\n", + "# 通过花式索引获取对应的行\n", + "sz_df.loc[top10].groupby('area').area.count()" + ] + }, + { + "cell_type": "markdown", + "id": "870a64c4-3164-4b96-b910-fc80b5e44853", + "metadata": {}, + "source": [ + "### 销售利润下滑诊断分析" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a1e63fa6-018e-4cea-9899-05bc4a4aaae2", + "metadata": {}, + "outputs": [], + "source": [ + "detail_df = pd.read_excel('res/商品销售明细表.xlsx', sheet_name='Sheet1')\n", + "outlet_df = pd.read_excel('res/门店信息维度表.xlsx', sheet_name='Sheet1')\n", + "commod_df = pd.read_excel('res/商品信息维度表.xlsx', sheet_name='Sheet1')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cb430f84-5628-4dc5-a63a-36520cfcf66a", + "metadata": {}, + "outputs": [], + "source": [ + "detail_df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "41a2f93a-3799-4253-8acd-6f3498283010", + "metadata": {}, + "outputs": [], + "source": [ + "detail_df.rename(columns={'日期(年月日)': '销售日期'}, inplace=True)\n", + "detail_df['销售日期'] = pd.to_datetime(detail_df.销售日期)\n", + "detail_df['月份'] = detail_df.销售日期.dt.month\n", + "detail_df['利润额'] = detail_df.销售额 - detail_df.成本额\n", + "detail_df.head(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5f94c5f3-c799-470f-a693-7d01afa4f88f", + "metadata": {}, + "outputs": [], + "source": [ + "temp1 = detail_df.groupby('月份')[['销售额', '利润额']].sum()\n", + "temp1['销售月环比'] = temp1.销售额.pct_change()\n", + "temp1['利润月环比'] = temp1.利润额.pct_change()\n", + "temp1[['销售额', '销售月环比', '利润额', '利润月环比']].style.format(\n", + " formatter={\n", + " '销售月环比': '{:.2%}',\n", + " '利润月环比': '{:.2%}',\n", + " },\n", + " na_rep='--------'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "56b4ab4a-aa35-49c5-9387-80b545799cfe", + "metadata": {}, + "outputs": [], + "source": [ + "temp1.plot(kind='line', figsize=(9, 5), xlabel='', y=['销售额', 
'利润额'], color=['navy', 'coral'], marker='o')\n", + "plt.ylim(0, 1.4e7)\n", + "plt.grid(axis='y', linestyle=':', alpha=0.5)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "00ca4494-b03d-4f42-890f-9d8088bb61e7", + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.ticker as tkr\n", + "\n", + "ax = temp1.销售额.plot(kind='line', figsize=(9, 5), marker='o', color='navy', linestyle='--')\n", + "temp1.利润额.plot(ax=ax, kind='line', marker='*', color='darkgreen', linestyle='--', xlabel='')\n", + "plt.ylim(0, 14000000)\n", + "plt.legend(loc='lower right')\n", + "\n", + "# 基于ax构建双胞胎坐标系(共享横轴,自己定制纵轴)\n", + "ax2 = ax.twinx()\n", + "ax2.yaxis.set_major_formatter(tkr.PercentFormatter(xmax=1, decimals=0))\n", + "profs_rates = temp1.利润额 / temp1.销售额\n", + "profs_rates.plot(ax=ax2, kind='line', marker='^', color='r', linestyle=':', label='毛利率')\n", + "plt.ylim(0.45, 0.65)\n", + "plt.legend()\n", + "plt.grid(axis='y', linestyle=':', alpha=0.5)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "637679a6-f7e1-4de8-a17e-4541790aa99c", + "metadata": {}, + "outputs": [], + "source": [ + "# 事实表连接维度表\n", + "merged_df = pd.merge(detail_df, outlet_df, how='left', on='门店编码')\n", + "merged_df = pd.merge(merged_df, commod_df, how='left', on='商品编码')\n", + "august_df = merged_df.query('月份 == 8')\n", + "august_df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e49793fb-6149-4366-94f6-2546fd14065e", + "metadata": {}, + "outputs": [], + "source": [ + "temp_df2 = august_df.groupby('省份')[['销售额', '成本额']].sum()\n", + "temp_df2.nlargest(10, '成本额')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6959bb8b-a2e5-4948-bfc1-c78be4a3d1d9", + "metadata": {}, + "outputs": [], + "source": [ + "temp_df2.nlargest(10, '成本额').plot(kind='bar', figsize=(9, 5), xlabel='')\n", + "plt.xticks(rotation=0)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 
null, + "id": "e62162ff-af16-4a8f-af56-32b64712f0a5", + "metadata": {}, + "outputs": [], + "source": [ + "temp_df3 = august_df.query('省份 == \"湖南省\"').groupby('城市')[['销售额', '成本额']].sum()\n", + "temp_df3['利润率'] = (temp_df3.销售额 - temp_df3.成本额) / temp_df3.销售额\n", + "temp_df3.nsmallest(3, '利润率').style.format(formatter={'利润率': '{:.2%}'})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5626ff61-f8dd-4871-9ad3-28c25910d36b", + "metadata": {}, + "outputs": [], + "source": [ + "temp_df4 = august_df.query('省份 == \"湖南省\" and 城市 == \"长沙市\"').groupby('门店名称')[['销售额', '成本额']].sum()\n", + "temp_df4['利润率'] = (temp_df4.销售额 - temp_df4.成本额) / temp_df4.销售额\n", + "temp_df4.sort_values(by='利润率').style.format(formatter={'利润率': '{:.2%}'})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4993f987-37ad-450c-971a-d54ff4f6bc29", + "metadata": {}, + "outputs": [], + "source": [ + "august_df = august_df.query('省份 == \"湖南省\" and 城市 == \"长沙市\" and 门店名称 == \"长沙梅溪湖店\"')\n", + "august_df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "04dfedf1-e7f1-4493-be54-a32b5e997e0d", + "metadata": {}, + "outputs": [], + "source": [ + "temp_df5 = august_df.groupby('商品类别')[['销售额', '成本额']].sum()\n", + "temp_df5['利润率'] = (temp_df5.销售额 - temp_df5.成本额) / temp_df5.销售额\n", + "temp_df5.sort_values(by='利润率').style.format(formatter={'利润率': '{:.2%}'})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3fd3e504-3292-4bab-b3dd-f2f3629f6285", + "metadata": {}, + "outputs": [], + "source": [ + "temp_df6 = august_df.query('商品类别 == \"零食\"').groupby('商品名称')[['销售额', '成本额']].sum()\n", + "temp_df6['利润率'] = (temp_df6.销售额 - temp_df6.成本额) / temp_df6.销售额\n", + "temp_df6.sort_values(by='利润率').style.format(formatter={'利润率': '{:.2%}'})" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + 
"version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/Day66-80/code/day07.ipynb b/Day66-80/code/day07.ipynb new file mode 100644 index 000000000..465b64d2b --- /dev/null +++ b/Day66-80/code/day07.ipynb @@ -0,0 +1,1246 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "89fbe897-855c-4208-848b-b411727c8eb9", + "metadata": {}, + "source": [ + "## 数据可视化详解" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b5b31dba-6894-4c28-83d6-25bade7f28ed", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "\n", + "plt.rcParams['font.sans-serif'].insert(0, 'SimHei')\n", + "plt.rcParams['axes.unicode_minus'] = False\n", + "get_ipython().run_line_magic('config', \"InlineBackend.figure_format = 'svg'\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ef91deaa-1007-465c-aa39-4b5198a57d84", + "metadata": {}, + "outputs": [], + "source": [ + "import warnings\n", + "\n", + "warnings.filterwarnings('ignore')" + ] + }, + { + "cell_type": "markdown", + "id": "cd7bef85-c7a9-4dad-822b-23bc36922deb", + "metadata": {}, + "source": [ + "### matplotlib\n", + "\n", + "整体架构:\n", + "1. 渲染层 - 底层的画布,图像的渲染,事件交互\n", + "2. 组件层 - 各种各样的统计图表\n", + "3. 脚本层 - 提供编程接口,通过调函数实现图表绘制\n", + "\n", + "绘图过程:\n", + "1. 创建画布 - plt.figure(figsize, dpi) --> Figure\n", + "2. 创建坐标系 - plt.subplot(nrows, ncols, index)\n", + "3. 绘制图表\n", + " - 折线图:plt.plot()\n", + " - 散点图:plt.scatter()\n", + " - 柱状图:plt.bar() / plt.barh()\n", + " - 饼图:plt.pie()\n", + " - 直方图:plt.hist()\n", + " - 箱线图:plt.boxplot()\n", + "4. 保存图表 - plt.savefig()\n", + "5. 
显示图表 - plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "73435be1-71e3-49e7-8703-7479be7f2b29", + "metadata": {}, + "outputs": [], + "source": [ + "x = np.linspace(-2 * np.pi, 2 * np.pi, 120)\n", + "y1 = np.sin(x)\n", + "y2 = np.cos(x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1c57ddb9-5338-4008-8878-901f92cf94ba", + "metadata": {}, + "outputs": [], + "source": [ + "# %pip install PyQt5\n", + "# %pip install PyQt6" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6ba631ea-11ed-48d3-b06b-5efd93222f07", + "metadata": {}, + "outputs": [], + "source": [ + "# 魔法指令 - 将统计图表渲染到Qt窗口\n", + "# %matplotlib qt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4c8fdb5f-18be-4ed5-826a-6b2466630154", + "metadata": {}, + "outputs": [], + "source": [ + "plt.figure(figsize=(8, 4), dpi=200)\n", + "plt.subplot(1, 1, 1)\n", + "plt.plot(x, y1, label='正弦', linewidth=0.5, linestyle='--', color='#D75281')\n", + "plt.plot(x, y2, label='余弦', marker='.', color='#0096FF')\n", + "plt.legend(loc='lower center')\n", + "# plt.savefig('aa.jpg')\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "80c166e9-0a24-4593-bfde-ceb592c5949c", + "metadata": {}, + "outputs": [], + "source": [ + "# 魔法指令 - 将统计图表渲染到浏览器\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "39fbcb43-325c-4878-a23e-ae5d27ae08e2", + "metadata": {}, + "outputs": [], + "source": [ + "# 创建画布(Figure)\n", + "plt.figure(figsize=(8, 4), dpi=200)\n", + "# 创建坐标系(Axes)\n", + "ax = plt.subplot(1, 1, 1)\n", + "ax.spines['left'].set_position('center')\n", + "ax.spines['bottom'].set_position('center')\n", + "ax.spines['top'].set_visible(False)\n", + "ax.spines['right'].set_visible(False)\n", + "# 绘制折线图\n", + "plt.plot(x, y1, label='正弦', color='#D75281', linewidth=2, linestyle='-.')\n", + "plt.plot(x, y2, label='余弦', color='#0096FF', marker='.')\n", + "# 
定制图表的标题\n", + "plt.title(r'$sin(\\alpha)$和$cos(\\alpha)$曲线图', fontdict=dict(fontsize=18, color='#FFFFFF', backgroundcolor='#0F3D3E'))\n", + "# 定制横轴的刻度\n", + "plt.xticks(\n", + " np.arange(-2 * np.pi, 2 * np.pi + 0.1, np.pi / 2),\n", + " labels=[r'$ -2\\pi $', r'$ -\\frac{3\\pi}{2} $', r'$ -\\pi $', r'$ -\\frac{\\pi}{2} $', \n", + " '0', r'$ \\frac{\\pi}{2} $', r'$ \\pi $', r'$ \\frac{3\\pi}{2} $', r'$ 2\\pi $']\n", + ")\n", + "# 定制纵轴的刻度\n", + "plt.yticks(np.arange(-1, 1.5, 0.5))\n", + "# 添加标注(文字和箭头)\n", + "plt.annotate(r'$ sin(\\alpha) $', xytext=(0.5, -0.5), xy=(0, 0), color='#EF5B0C', \n", + " arrowprops=dict(arrowstyle='fancy', connectionstyle='arc3, rad=0.25', color='darkgreen'))\n", + "# 定制图例\n", + "plt.legend(loc='lower right')\n", + "# 保存图表\n", + "# plt.savefig('aa.jpg')\n", + "# 显示图表\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e72a3b87-6cf9-4709-abe1-4c0a67d9fe00", + "metadata": {}, + "outputs": [], + "source": [ + "x2 = np.linspace(0.1, 10.1, 60)\n", + "y3 = np.log2(x2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cf5cecf5-3316-4f60-ae1c-42ed31467827", + "metadata": {}, + "outputs": [], + "source": [ + "# 创建画布(Figure)\n", + "plt.figure(figsize=(8, 6), dpi=200)\n", + "# 创建坐标系(Axes)\n", + "plt.subplot(2, 2, 1)\n", + "# 绘制折线图\n", + "plt.plot(x, y1, label='正弦', color='#D75281', linewidth=2, linestyle='-.')\n", + "# 创建坐标系\n", + "plt.subplot(2, 2, 2)\n", + "plt.plot(x, y2, label='余弦', color='#0096FF', marker='.')\n", + "# 创建坐标系\n", + "# plt.subplot(2, 2, (3, 4))\n", + "plt.subplot(2, 1, 2)\n", + "# 绘制折线图(带数据点标记)\n", + "plt.plot(x2, y3, color='darkgreen', marker='*')\n", + "# 显示图表\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eaf4e484-6821-4e8c-839f-fae932b68224", + "metadata": {}, + "outputs": [], + "source": [ + "# 创建画布(Figure)\n", + "plt.figure(figsize=(8, 6), dpi=200)\n", + "# 创建网格对象(GridSpec)\n", + "grid = plt.GridSpec(2, 3)\n", +
"plt.subplot(grid[:, 0])\n", + "plt.subplot(grid[0, 1:])\n", + "plt.subplot(grid[1, 1])\n", + "plt.subplot(grid[1, 2])\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b1d3b736-da33-479c-9d0c-a84145dd59e1", + "metadata": {}, + "outputs": [], + "source": [ + "# 月收入\n", + "income = np.fromstring('5550, 7500, 10500, 15000, 20000, 25000, 30000, 40000', sep=',')\n", + "income" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cedebe7e-58eb-456a-bb64-4e75d1729c92", + "metadata": {}, + "outputs": [], + "source": [ + "# 月网购支出\n", + "outcome = np.fromstring('800, 1800, 1250, 2000, 1800, 2100, 2500, 3500', sep=',')\n", + "outcome" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eab6ea54-8c49-46a8-918c-8ce7aebd521e", + "metadata": {}, + "outputs": [], + "source": [ + "plt.scatter(income, outcome)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ef2ee591-6fa9-4b02-9f10-6f2dc776b8fe", + "metadata": {}, + "outputs": [], + "source": [ + "# 网购次数\n", + "nums = np.array([5, 3, 10, 5, 12, 20, 8, 10])\n", + "nums" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "239a9331-c6a4-450c-bc13-f4e3c2969f2e", + "metadata": {}, + "outputs": [], + "source": [ + "plt.get_cmap('rainbow_r')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9b9abf01-b4a8-4b3e-8de4-8284f6fc3efe", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制散点图 ---> 气泡图(引入第三个变量)\n", + "plt.scatter(income, outcome, s=nums * 30, c=nums, cmap='rainbow_r')\n", + "plt.colorbar()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3a36fd15-7779-4dd5-a205-478f1ae1506c", + "metadata": {}, + "outputs": [], + "source": [ + "data1 = np.random.randint(100, 500, 4)\n", + "data2 = np.random.randint(200, 600, 4)\n", + "data3 = np.random.randint(300, 500, 4)\n", + "quarter = np.arange(4)" + ] + }, + { + "cell_type": "code", + "execution_count": 
null, + "id": "fc760daf-de0c-478d-a84b-d5ed0fd5a5e9", + "metadata": {}, + "outputs": [], + "source": [ + "errs = np.random.randint(10, 30, 4)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "143a0b7b-a3ff-4a76-9032-fd236826966c", + "metadata": {}, + "outputs": [], + "source": [ + "# 柱状图\n", + "plt.bar(quarter-0.2, data1, label='A组', hatch='//', width=0.2)\n", + "plt.bar(quarter, data2, label='B组', hatch='xxx', width=0.2)\n", + "plt.bar(quarter+0.2, data3, label='C组', width=0.2, yerr=errs)\n", + "plt.xticks(quarter, labels=[f'Q{i}' for i in range(1, 5)])\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a802edea-f2f7-4265-809f-78318d1c1e57", + "metadata": {}, + "outputs": [], + "source": [ + "# 堆叠柱状图\n", + "plt.bar(quarter, data1, label='A组', width=0.4)\n", + "plt.bar(quarter, data2, label='B组', width=0.4, bottom=data1)\n", + "plt.bar(quarter, data3, label='C组', width=0.4, bottom=data1 + data2)\n", + "plt.xticks(quarter, labels=[f'Q{i}' for i in range(1, 5)])\n", + "for i in range(quarter.size):\n", + " plt.text(i, data1[i] // 2, data1[i], ha='center', va='center', color='w')\n", + " plt.text(i, data1[i] + data2[i] // 2, data2[i], ha='center', color='w')\n", + " plt.text(i, data1[i] + data2[i] + data3[i] // 2, data3[i], ha='center', color='w')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5f01c63f-f1da-42af-a04a-e45d95c2262b", + "metadata": {}, + "outputs": [], + "source": [ + "# 计算每组数据的占比\n", + "temp_df = pd.DataFrame(data={\n", + " 'A组': data1,\n", + " 'B组': data2,\n", + " 'C组': data3,\n", + "})\n", + "temp_df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ccb6a3f1-ae06-467d-9cc0-00b3eb1b9b84", + "metadata": {}, + "outputs": [], + "source": [ + "pct_data = temp_df.apply(lambda x: x / temp_df.sum(axis=1))\n", + "pct_data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"0ec0bf97-62ed-4ec7-8998-b1277d752098", + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.ticker as tkr\n", + "\n", + "# 绘制百分比堆叠柱状图\n", + "data1, data2, data3 = pct_data.A组, pct_data.B组, pct_data.C组\n", + "plt.bar(quarter, data1, label='A组', width=0.4)\n", + "plt.bar(quarter, data2, label='B组', width=0.4, bottom=data1)\n", + "plt.bar(quarter, data3, label='C组', width=0.4, bottom=data1 + data2)\n", + "plt.xticks(quarter, labels=[f'Q{i}' for i in range(1, 5)])\n", + "# plt.yticks(np.arange(0, 1.1, 0.2), labels=[f'{i}%' for i in range(0, 101, 20)])\n", + "plt.gca().yaxis.set_major_formatter(tkr.PercentFormatter(xmax=1, decimals=0))\n", + "for i in range(quarter.size):\n", + " plt.text(i, data1[i] / 2, f'{data1[i] * 100:.1f}%', ha='center', color='w')\n", + " plt.text(i, data1[i] + data2[i] / 2, f'{data2[i] * 100:.1f}%', ha='center', color='w')\n", + " plt.text(i, data1[i] + data2[i] + data3[i] / 2, f'{data3[i] * 100:.1f}%', ha='center', color='w')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "98d80c5c-2f1a-41e2-9222-d396bb4a580e", + "metadata": {}, + "outputs": [], + "source": [ + "labels = ['苹果', '香蕉', '桃子', '荔枝', '石榴', '山竹', '榴莲']\n", + "data = np.random.randint(100, 500, 7)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "33086285-51e7-40af-b7fe-c775b63beff0", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制饼图\n", + "plt.figure(figsize=(5, 5), dpi=120)\n", + "plt.pie(\n", + " data, \n", + " labels=labels, # 每块饼对应的标签\n", + " labeldistance=1.1, # 标签到圆心的距离\n", + " autopct='%.1f%%', # 自动计算和显示百分比\n", + " pctdistance=0.88, # 百分比到圆心的距离\n", + " explode=[0, 0, 0.05, 0, 0, 0, 0.1], # 分离距离(分离饼图)\n", + " shadow=True, # 阴影效果\n", + " wedgeprops={'width': 0.25, 'edgecolor': 'w'}, # 楔子属性(环状饼图)\n", + " textprops={'fontsize': 9, 'color': 'k'} # 文本属性\n", + ")\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"14813133-5f35-4324-b001-17708872d80c", + "metadata": {}, + "outputs": [], + "source": [ + "x1 = np.random.normal(0, 1, 5000)\n", + "x2 = np.random.random(100000)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2607c7d9-77e6-4b01-8d15-1e8b61dc6dba", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制直方图(概率密度)\n", + "plt.hist(x1, bins=20, density=True, color='darkcyan')\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3c5b6ed0-0c80-4040-8cdd-51c00083d252", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制直方图(累积分布)\n", + "plt.hist(x2, bins=10, density=True, cumulative=True)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "80db5ad6-30e8-47e3-a092-5b2a934f35c8", + "metadata": {}, + "outputs": [], + "source": [ + "data = x1[::5]\n", + "data.size" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a14acac1-6515-4b23-befc-0f28f3193b88", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制箱线图\n", + "plt.boxplot(data, showmeans=True, notch=True)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c8c5c940-7f83-4fc2-803f-b302d78cd30d", + "metadata": {}, + "outputs": [], + "source": [ + "# 堆叠折线图(面积图)\n", + "plt.figure(figsize=(8, 4))\n", + "days = np.arange(7)\n", + "sleeping = [7, 8, 6, 6, 7, 8, 10]\n", + "eating = [2, 3, 2, 1, 2, 3, 2]\n", + "working = [7, 8, 7, 8, 6, 2, 3]\n", + "playing = [8, 5, 9, 9, 9, 11, 9]\n", + "plt.stackplot(days, sleeping, eating, working, playing)\n", + "plt.legend(['睡觉', '吃饭', '工作', '玩耍'], fontsize=10)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "66c79bd0-8253-4fa2-b303-696598e952d1", + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "\n", + "category_names = ['Strongly disagree', 'Disagree',\n", + " 'Neither agree nor disagree', 'Agree', 'Strongly 
agree']\n", + "results = {\n", + " 'Question 1': [10, 15, 17, 32, 26],\n", + " 'Question 2': [26, 22, 29, 10, 13],\n", + " 'Question 3': [35, 37, 7, 2, 19],\n", + " 'Question 4': [32, 11, 9, 15, 33],\n", + " 'Question 5': [21, 29, 5, 5, 40],\n", + " 'Question 6': [8, 19, 5, 30, 38]\n", + "}\n", + "\n", + "\n", + "def survey(results, category_names):\n", + " \"\"\"\n", + " Parameters\n", + " ----------\n", + " results : dict\n", + " A mapping from question labels to a list of answers per category.\n", + " It is assumed all lists contain the same number of entries and that\n", + " it matches the length of *category_names*.\n", + " category_names : list of str\n", + " The category labels.\n", + " \"\"\"\n", + " labels = list(results.keys())\n", + " data = np.array(list(results.values()))\n", + " data_cum = data.cumsum(axis=1)\n", + " category_colors = plt.colormaps['RdYlGn'](\n", + " np.linspace(0.15, 0.85, data.shape[1]))\n", + "\n", + " fig, ax = plt.subplots(figsize=(9.2, 5))\n", + " ax.invert_yaxis()\n", + " ax.xaxis.set_visible(False)\n", + " ax.set_xlim(0, np.sum(data, axis=1).max())\n", + "\n", + " for i, (colname, color) in enumerate(zip(category_names, category_colors)):\n", + " widths = data[:, i]\n", + " starts = data_cum[:, i] - widths\n", + " rects = ax.barh(labels, widths, left=starts, height=0.5,\n", + " label=colname, color=color)\n", + "\n", + " r, g, b, _ = color\n", + " text_color = 'white' if r * g * b < 0.5 else 'darkgrey'\n", + " ax.bar_label(rects, label_type='center', color=text_color)\n", + " ax.legend(ncols=len(category_names), bbox_to_anchor=(0, 1),\n", + " loc='lower left', fontsize='small')\n", + "\n", + " return fig, ax\n", + "\n", + "\n", + "survey(results, category_names)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "36499ccc-a243-4f96-9759-454d3267bb9a", + "metadata": {}, + "outputs": [], + "source": [ + "# 雷达图(极坐标折线图)\n", + "labels = np.array(['速度', '力量', '经验', '防守', '发球', '技术'])\n", + 
"malong_values = np.array([93, 95, 98, 92, 96, 97])\n", + "shuigu_values = np.array([30, 40, 65, 80, 45, 60])\n", + "angles = np.linspace(0, 2 * np.pi, labels.size, endpoint=False)\n", + "# 加一条数据让图形闭合\n", + "malong_values = np.append(malong_values, malong_values[0])\n", + "shuigu_values = np.append(shuigu_values, shuigu_values[0])\n", + "angles = np.append(angles, angles[0])\n", + "\n", + "# 创建画布\n", + "plt.figure(figsize=(4, 4), dpi=120)\n", + "# 创建坐标系\n", + "ax = plt.subplot(projection='polar')\n", + "# 绘图和填充\n", + "plt.plot(angles, malong_values, color='r', linewidth=2, label='马龙')\n", + "plt.fill(angles, malong_values, color='r', alpha=0.25)\n", + "plt.plot(angles, shuigu_values, color='g', linewidth=2, label='水谷隼')\n", + "plt.fill(angles, shuigu_values, color='g', alpha=0.25)\n", + "# 设置文字和网格线\n", + "# ax.set_thetagrids(angles[:-1] * 180 / np.pi, labels, fontsize=10)\n", + "# ax.set_rgrids([0, 20, 40, 60, 80, 100], fontsize=10)\n", + "ax.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e3939fab-09a5-4e21-9bee-0081acddb36e", + "metadata": {}, + "outputs": [], + "source": [ + "# 玫瑰图(圆形柱状图)\n", + "group1 = np.random.randint(20, 50, 4)\n", + "group2 = np.random.randint(10, 60, 4)\n", + "x = np.array([f'A组-Q{i}' for i in range(1, 5)] + [f'B组-Q{i}' for i in range(1, 5)])\n", + "y = np.array(group1.tolist() + group2.tolist())\n", + "theta = np.linspace(0, 2 * np.pi, x.size, endpoint=False)\n", + "width = 2 * np.pi / x.size\n", + "\n", + "# 产生随机颜色\n", + "colors = np.random.rand(8, 3)\n", + "# 将柱状图投影到极坐标\n", + "ax = plt.subplot(projection='polar')\n", + "plt.bar(theta, y, width=width, color=colors, bottom=0)\n", + "ax.set_thetagrids(theta * 180 / np.pi, x, fontsize=10)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e0390d2a-a0a9-483a-a4bc-ce1c5059f30b", + "metadata": {}, + "outputs": [], + "source": [ + "# %matplotlib qt" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "d338e6e3-bf98-428b-bac2-693bb1a2a760", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制3D曲面\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "\n", + "fig = plt.figure(figsize=(8, 4), dpi=120)\n", + "ax = Axes3D(fig, auto_add_to_figure=False)\n", + "fig.add_axes(ax)\n", + "x = np.arange(-2, 2, 0.1)\n", + "y = np.arange(-2, 2, 0.1)\n", + "x, y = np.meshgrid(x, y)\n", + "z = (1 - y ** 5 + x ** 5) * np.exp(-x ** 2 - y ** 2)\n", + "ax.plot_surface(x, y, z)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6dc5a3e1-d5a0-4f83-b8a6-7f69015a1ba9", + "metadata": {}, + "outputs": [], + "source": [ + "# %matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "id": "6f1235e1-f300-4ae2-ad7a-b97523bedfa1", + "metadata": {}, + "source": [ + "### seaborn\n", + "\n", + "对matplotlib进行了封装,定制了默认的样式,简化了调用matplotlib函数时需要传入的参数。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "25a64a4e-bacc-43da-ad92-ec7ae2191b38", + "metadata": {}, + "outputs": [], + "source": [ + "# %pip install seaborn" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fa94138e-a642-45d5-957e-b094bdabed91", + "metadata": {}, + "outputs": [], + "source": [ + "import seaborn as sns\n", + "\n", + "# 设置使用默认主题(样式、配色方案、字体方案、……)\n", + "sns.set_theme()\n", + "# sns.set_theme(font_scale=1.2, style='darkgrid', palette='Dark2')\n", + "\n", + "plt.rcParams['font.sans-serif'].insert(0, 'SimHei')\n", + "plt.rcParams['axes.unicode_minus'] = False" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dabdf3ae-f8dc-42ad-a929-909b0325ca3e", + "metadata": {}, + "outputs": [], + "source": [ + "# import ssl\n", + "\n", + "# ssl._create_default_https_context = ssl._create_unverified_context" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ddf6e866-1115-404d-b548-a6070e4e065e", + "metadata": {}, + "outputs": [], + "source": [ + "# tips_df = 
sns.load_dataset('tips')\n", + "# tips_df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "500326a6-ff54-41e0-8c93-8307107ce52d", + "metadata": {}, + "outputs": [], + "source": [ + "tips_df = pd.read_excel('res/tips.xlsx')\n", + "tips_df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "de1ae073-3fe9-45a1-9db8-21ed75de45ff", + "metadata": {}, + "outputs": [], + "source": [ + "tips_df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5028d1d5-7d7e-4e34-a17b-80bae21f6324", + "metadata": {}, + "outputs": [], + "source": [ + "sns.histplot(data=tips_df, x='total_bill')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "11b422dd-68ff-4881-9164-2343b7be0375", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制直方图\n", + "# kde - kernel density estimation - 拟合概率密度曲线\n", + "sns.histplot(data=tips_df, x='total_bill', kde=True, stat='density')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b05e0b86-5296-4f05-a64c-34395e9f565a", + "metadata": {}, + "outputs": [], + "source": [ + "# tips_df[['total_bill', 'tip']].corr(method='pearson')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8eb7d2be-9981-4750-999f-2335824ff619", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制点对图\n", + "# sns.pairplot(data=tips_df)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c9f97b00-ecab-4602-8a9a-16cb30db687e", + "metadata": {}, + "outputs": [], + "source": [ + "# tips_df.query('sex == \"Female\"')[['total_bill', 'tip']].corr()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8d8e63c7-97b6-41b4-bf97-10fc1f14617a", + "metadata": {}, + "outputs": [], + "source": [ + "# tips_df.query('sex == \"Male\"')[['total_bill', 'tip']].corr()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "17ac77e3-adac-4ac9-8e7a-f911b014b598", + "metadata": {}, + "outputs": [], + 
"source": [ + "sns.color_palette('tab10')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d31d89fd-f0c3-4698-8a2f-85dfb7e852ea", + "metadata": {}, + "outputs": [], + "source": [ + "sns.pairplot(data=tips_df, hue='sex', palette='tab10')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e1d78fd7-64d7-4c37-a738-379ca63ab7ac", + "metadata": {}, + "outputs": [], + "source": [ + "sns.color_palette('winter')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5097260c-30e6-48b7-997c-a2b815b3eb0a", + "metadata": {}, + "outputs": [], + "source": [ + "sns.pairplot(data=tips_df, hue='sex', palette='winter')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "145a9efe-30b9-447d-bcf6-c74c2233b572", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制联合分布图\n", + "sns.jointplot(data=tips_df, x='total_bill', y='tip', hue='sex')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bdcb5a1a-d27f-41d6-9d91-97fad47c31d6", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制联合分布图\n", + "sns.jointplot(data=tips_df, x='total_bill', y='tip', hue='smoker')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f83e9d23-93b3-4d1b-81cd-d7b89613aefa", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制联合分布图\n", + "sns.jointplot(data=tips_df, x='total_bill', y='tip', hue='time')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bfc55394-74e6-45d6-9a7a-68d26221a8f2", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制线性回归模型图\n", + "# lm - linear regression model\n", + "sns.lmplot(data=tips_df, x='total_bill', y='tip', hue='smoker')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ef565a16-3847-466b-a22f-9ecc42265c31", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制箱线图\n", + "sns.boxplot(data=tips_df, x='day', y='total_bill')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"8a36af19-38af-4f53-9c9a-f0a6ef8f02d3", + "metadata": {}, + "outputs": [], + "source": [ + "# 绘制小提琴图\n", + "sns.violinplot(data=tips_df, x='day', y='total_bill')" + ] + }, + { + "cell_type": "markdown", + "id": "27067b07-bf54-4905-a7a3-0fc9f91885f8", + "metadata": {}, + "source": [ + "### pyecharts\n", + "\n", + "对Apache的echarts库用Python语言进行了封装,可以绘制美观且交互性极佳的统计图表。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "afecad24-897c-4729-9034-640f4d38ef7e", + "metadata": {}, + "outputs": [], + "source": [ + "# %pip install -U pyecharts" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7fab295e-0e60-4fd4-9852-77f23bea19fc", + "metadata": {}, + "outputs": [], + "source": [ + "# 配置Notebook的类型是JupyterLab\n", + "from pyecharts.globals import CurrentConfig, NotebookType\n", + "\n", + "CurrentConfig.NOTEBOOK_TYPE = NotebookType.JUPYTER_LAB" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1e12a132-50d5-40a9-9b3f-e67b224ef7cf", + "metadata": {}, + "outputs": [], + "source": [ + "from pyecharts.charts import Bar\n", + "\n", + "a = np.random.randint(10, 50, 6)\n", + "b = np.random.randint(20, 40, 6)\n", + "\n", + "bar = Bar()\n", + "bar.add_xaxis([\"衬衫\", \"羊毛衫\", \"雪纺衫\", \"裤子\", \"高跟鞋\", \"袜子\"])\n", + "bar.add_yaxis(\"商家A\", a.tolist())\n", + "bar.add_yaxis(\"商家B\", b.tolist())\n", + "bar.load_javascript()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "baaaab27-4cc2-4584-904f-9e7bc6181d95", + "metadata": {}, + "outputs": [], + "source": [ + "bar.render_notebook()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "617bd760-447e-4b6f-b6a4-41dc126be5d8", + "metadata": {}, + "outputs": [], + "source": [ + "from pyecharts.charts import Bar\n", + "from pyecharts import options as opts\n", + "from pyecharts.globals import ThemeType\n", + "\n", + "# 创建柱状图对象\n", + "bar = Bar(init_opts=opts.InitOpts(width='640px', height='480px', theme=ThemeType.LIGHT))\n", + "# 修改全局配置项\n", 
+ "bar.set_global_opts(\n", + " # 定制标题\n", + " title_opts=opts.TitleOpts(\n", + " title='2022年各品类销售额',\n", + " title_link='http://www.qfedu.com',\n", + " pos_left='center'\n", + " ),\n", + " # 定制图例\n", + " legend_opts=opts.LegendOpts(\n", + " is_show=True,\n", + " pos_top='bottom'\n", + " ),\n", + " # 定制工具箱\n", + " toolbox_opts=opts.ToolboxOpts(\n", + " is_show=True,\n", + " pos_left='right',\n", + " pos_top='center',\n", + " orient='vertical'\n", + " )\n", + ")\n", + "# 添加横轴的数据\n", + "bar.add_xaxis([\"衬衫\", \"羊毛衫\", \"雪纺衫\", \"裤子\", \"高跟鞋\", \"袜子\"])\n", + "# 添加纵轴的数据\n", + "bar.add_yaxis(\"商家A\", [5, 20, 36, 10, 45, 20])\n", + "bar.add_yaxis(\"商家B\", [15, 22, 23, 18, 37, 40])\n", + "# 让浏览器加载JS文件\n", + "bar.load_javascript()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5734c4da-1a6d-418a-a2d4-80dfb196da8b", + "metadata": {}, + "outputs": [], + "source": [ + "# 渲染图表\n", + "bar.render_notebook()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6ce15768-7810-4699-9795-51188525ca83", + "metadata": {}, + "outputs": [], + "source": [ + "from pyecharts import options as opts\n", + "from pyecharts.charts import Funnel\n", + "\n", + "steps = ['曝光', '点击', '加购', '下单', '支付']\n", + "vdata = [10000, 5000, 2000, 1200, 880]\n", + "\n", + "f = Funnel()\n", + "f.add('转化', [(step, data) for step, data in zip(steps, vdata)])\n", + "f.set_global_opts(\n", + " title_opts=opts.TitleOpts(\n", + " title='转化漏斗',\n", + " pos_left='10%',\n", + " title_textstyle_opts=opts.TextStyleOpts(\n", + " font_family='微软雅黑',\n", + " font_size=28\n", + " )\n", + " )\n", + ")\n", + "f.load_javascript()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "78f6113c-ad73-4a25-a20b-b88b0b8fbd58", + "metadata": {}, + "outputs": [], + "source": [ + "f.render_notebook()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b2add1c4-9ab9-493e-b312-a972795c6f3c", + "metadata": {}, + "outputs": [], + "source": [ + "# 
注意:pyecharts库只接受原生Python数据类型(不支持numpy的数组,pandas的Series或DataFrame)\n", + "import pyecharts.options as opts\n", + "from pyecharts.charts import Pie\n", + "\n", + "x_data = [\"直接访问\", \"邮件营销\", \"联盟广告\", \"视频广告\", \"搜索引擎\"]\n", + "y_data = [335, 310, 234, 135, 1548]\n", + "\n", + "pie_chart = Pie(init_opts=opts.InitOpts(\n", + " width='640px',\n", + " height='480px'\n", + "))\n", + "pie_chart.add(\n", + " series_name=\"访问来源\",\n", + " data_pair=[list(z) for z in zip(x_data, y_data)],\n", + " radius=[\"50%\", \"70%\"],\n", + " label_opts=opts.LabelOpts(is_show=False, position=\"center\"),\n", + ")\n", + "pie_chart.set_global_opts(legend_opts=opts.LegendOpts(pos_left=\"left\", orient=\"vertical\"))\n", + "pie_chart.set_series_opts(\n", + " label_opts=opts.LabelOpts(formatter=\"{b}: {d}%\")\n", + ")\n", + "pie_chart.load_javascript()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1b478655-7774-43ad-a0ca-1fd7082d5a4d", + "metadata": {}, + "outputs": [], + "source": [ + "pie_chart.render_notebook()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "be870fdd-71c9-4b98-82bf-dc5f7c81503a", + "metadata": {}, + "outputs": [], + "source": [ + "baidu = pd.read_excel('res/2022年股票数据.xlsx', sheet_name='BIDU', index_col='Date')\n", + "baidu = baidu[::-1][['Open', 'Close', 'Low', 'High']].round(2)\n", + "baidu" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e5eec52d-7ba1-4c30-8fcf-582971528ac1", + "metadata": {}, + "outputs": [], + "source": [ + "from pyecharts import options as opts\n", + "from pyecharts.charts import Kline\n", + "\n", + "x_data = baidu.index.strftime('%m月%d日').values.tolist()\n", + "y_data = baidu.values.tolist()\n", + "\n", + "kline_chart = Kline()\n", + "kline_chart.add_xaxis(x_data)\n", + "kline_chart.add_yaxis('', y_data)\n", + "kline_chart.set_global_opts(\n", + " xaxis_opts=opts.AxisOpts(is_scale=True),\n", + " yaxis_opts=opts.AxisOpts(\n", + " is_scale=True,\n", + "
splitarea_opts=opts.SplitAreaOpts(\n", + " is_show=True,\n", + " areastyle_opts=opts.AreaStyleOpts(opacity=1)\n", + " ),\n", + " ),\n", + " datazoom_opts=[\n", + " opts.DataZoomOpts(\n", + " pos_bottom='2%',\n", + " range_start=40,\n", + " range_end=60,\n", + " )\n", + " ],\n", + ")\n", + "kline_chart.load_javascript()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "26dfb364-9edd-4a0a-8ac1-4b7146971e44", + "metadata": {}, + "outputs": [], + "source": [ + "kline_chart.render_notebook()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0ce7fde5-2f11-48f3-a4fc-2d9f3303a13b", + "metadata": {}, + "outputs": [], + "source": [ + "# Install map data\n", + "# %pip install echarts-countries-pypkg\n", + "# %pip install echarts-china-provinces-pypkg\n", + "# %pip install echarts-china-cities-pypkg\n", + "# %pip install echarts-china-counties-pypkg" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "97d8b5ca-6c66-4500-a74a-8fa521a51549", + "metadata": {}, + "outputs": [], + "source": [ + "from pyecharts import options as opts\n", + "from pyecharts.charts import Map\n", + "\n", + "data = [\n", + " ('广东省', 594), ('浙江省', 438), ('四川省', 316), ('北京市', 269), ('山东省', 248),\n", + " ('江苏省', 234), ('湖南省', 196), ('福建省', 166), ('河南省', 153), ('辽宁省', 152),\n", + " ('上海市', 138), ('河北省', 86), ('安徽省', 79), ('湖北省', 75), ('黑龙江省', 70), \n", + " ('陕西省', 63), ('吉林省', 59), ('江西省', 56), ('重庆市', 46), ('贵州省', 39),\n", + " ('山西省', 37), ('云南省', 33), ('广西壮族自治区', 28), ('天津市', 22), ('新疆维吾尔自治区', 24),\n", + " ('海南省', 18), ('台湾省', 11), ('甘肃省', 7), ('青海省', 3), ('内蒙古自治区', 17), \n", + " ('宁夏回族自治区', 1), ('西藏自治区', 1), ('香港特别行政区', 12), ('澳门特别行政区', 2)\n", + "]\n", + "\n", + "map_chart = Map(init_opts=opts.InitOpts(width='1000px', height='1000px'))\n", + "map_chart.add('', data, 'china', is_roam=False)\n", + "map_chart.load_javascript()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cb71a3f2-f08b-435a-ad75-a4b6a6be6f91", + "metadata": {}, + "outputs": [], + "source": [ + "map_chart.render_notebook()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/Day66-80/res/contents_of_data_analysis.png b/Day66-80/res/contents_of_data_analysis.png new file mode 100644 index 000000000..3599354d5 Binary files /dev/null and b/Day66-80/res/contents_of_data_analysis.png differ diff --git "a/Day81-90/82.k\346\234\200\350\277\221\351\202\273\347\256\227\346\263\225.md" "b/Day81-90/82.k\346\234\200\350\277\221\351\202\273\347\256\227\346\263\225.md" index a8c6134be..4e865b669 100644 --- "a/Day81-90/82.k\346\234\200\350\277\221\351\202\273\347\256\227\346\263\225.md" +++ "b/Day81-90/82.k\346\234\200\350\277\221\351\202\273\347\256\227\346\263\225.md" @@ -36,7 +36,7 @@ $$ ### Introduction to the Dataset -Next, let me properly introduce an important dataset that we will use later on: the iris dataset. The iris dataset is one of the most famous and classic datasets in machine learning. It was first introduced by the British statistician *Ronald A. Fisher* in his 1936 paper *The Use of Multiple Measurements in Taxonomic Problems* and is widely used for getting started with and experimenting on machine learning algorithms. +Next, let me properly introduce an important dataset that we will use later on: the iris dataset. The iris dataset is one of the most famous and classic datasets in machine learning. Its measurements were collected by the botanist *Edgar S. Anderson* on the Gaspé Peninsula in Quebec, Canada, and it was first introduced by the British statistician *Ronald A. Fisher* in his 1936 paper *The Use of Multiple Measurements in Taxonomic Problems*. It is widely used for getting started with and experimenting on machine learning algorithms. diff --git "a/Day81-90/88.\347\245\236\347\273\217\347\275\221\347\273\234\346\250\241\345\236\213.md" "b/Day81-90/88.\347\245\236\347\273\217\347\275\221\347\273\234\346\250\241\345\236\213.md" index 545b3ca2c..7a199eea3 100755 --- "a/Day81-90/88.\347\245\236\347\273\217\347\275\221\347\273\234\346\250\241\345\236\213.md" +++ "b/Day81-90/88.\347\245\236\347\273\217\347\275\221\347\273\234\346\250\241\345\236\213.md" @@ -148,7 +148,7 @@ weighted avg 0.45 0.37 0.23 30 > **Note**: Because the `random_state` parameter was not specified when creating the `MLPClassifier`, the results may differ each time the code is run. -The model's prediction accuracy is only `0.37`. Are you disappointed by this result? The model we worked so hard to build predicts this poorly. Don't worry: when we created the neural network model in the code above, we set the `hidden_layer_sizes` parameter to `(1, )`, which means our network has only 1 hidden layer, and that hidden layer has only 1 neuron. As long as we increase the number of hidden layers and neurons so that the model can better learn the mapping between the features and the target, the predictions will improve. Below, we change the `hidden_layer_sizes` parameter to `(32, 32, 32)`, that is, the model has three hidden layers with 32 neurons each, and look at the results of running the code again. +The model's prediction accuracy is only `0.37`. Are you disappointed by this result? The model we worked so hard to build predicts this poorly. Don't worry: when we created the neural network model in the code above, we set the `hidden_layer_sizes` parameter to `(1, )`, which means our network has only 1 hidden layer, and that hidden layer has only 1 neuron, which is carrying far too much on its own (poor thing, I'm honestly in tears). Next, we need to increase the number of hidden layers and neurons so that the model can better learn the mapping between the features and the target; the predictions will then improve. Below, we change the `hidden_layer_sizes` parameter to `(32, 32, 32)`, that is, the model has three hidden layers with 32 neurons each, and look at the results of running the code again. ```python from sklearn.datasets import load_iris @@ -176,7 +176,7 @@ y_pred = model.predict(X_test) print(classification_report(y_test, y_pred)) ``` -> **Explanation**: Try running the code above and see whether you get better results. Of course, you can also reset the `hidden_layer_sizes` parameter and see what results you get. +> **Explanation**: Try running the code above and see whether you get better results. That said, an accuracy of 1 is not necessarily cause for celebration, because you may have trained an overfitted model. In any case, try resetting the `hidden_layer_sizes` parameter and see what results you get. Next, as before, let's go over several of the more important hyperparameters of `MLPClassifier`. @@ -396,7 +396,7 @@ Test MSE: 8.7226 Test R2: 0.8569 ``` -From the output above, we can see that as the neural network model repeatedly performs forward and backward propagation, the loss keeps shrinking and the model fits better and better. At prediction time, we use the trained model parameters to perform a single forward pass, which completes the mapping from features to target values, and the two model evaluation metrics look quite good. Today, neural networks are widely used in pattern recognition, image processing, speech recognition, and other fields, and they are the core technology of deep learning. Readers interested in deep learning can follow my other project [“深度学习就是大力出奇迹”](https://github.com/jackfrued/Deep-Learning-Is-Nothing), which is still being created and updated.
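+From the output above, we can see that as the neural network model repeatedly performs forward and backward propagation, the loss keeps shrinking and the model fits better and better. At prediction time, we use the trained model parameters to perform a single forward pass, which completes the mapping from features to target values, and the two metrics for evaluating the regression model, MSE and $\small{R^{2}}$, look quite good. Today, neural networks are widely used in pattern recognition, image processing, speech recognition, and other fields, and they are the core technology of deep learning. Readers interested in deep learning can follow my other project [“深度学习就是大力出奇迹”](https://github.com/jackfrued/Deep-Learning-Is-Nothing), which I am still writing and updating. ### Strengths and Weaknesses of the Model diff --git "a/Day81-90/90.\346\234\272\345\231\250\345\255\246\344\271\240\345\256\236\346\210\230.md" "b/Day81-90/90.\346\234\272\345\231\250\345\255\246\344\271\240\345\256\236\346\210\230.md" index 0746d1394..36c0d143c 100755 --- "a/Day81-90/90.\346\234\272\345\231\250\345\255\246\344\271\240\345\256\236\346\210\230.md" +++ "b/Day81-90/90.\346\234\272\345\231\250\345\255\246\344\271\240\345\256\236\346\210\230.md" @@ -379,9 +379,9 @@ test['FamilySize'] = test['SibSp'] + test['Parch'] + 1 # Drop redundant features test.drop(columns=['Name', 'Ticket', 'SibSp', 'Parch'], inplace=True) -# Logistic regression model +# Use the logistic regression model passenger_id, X_test = test.index, test -# XGBoost model +# Use the XGBoost model # passenger_id, X_test = test.index, xgb.DMatrix(test) y_test_pred = model.predict(X_test) diff --git "a/Python\345\255\246\344\271\240\350\265\204\346\272\220\346\261\207\346\200\273.md" "b/Python\345\255\246\344\271\240\350\265\204\346\272\220\346\261\207\346\200\273.md" new file mode 100755 index 000000000..f913353a8 --- /dev/null +++ "b/Python\345\255\246\344\271\240\350\265\204\346\272\220\346\261\207\346\200\273.md" @@ -0,0 +1,108 @@ +## A Roundup of Python Learning Resources + +Recently, quite a few of you have been looking for Python learning resources, so here is a roundup and you no longer need to ask around. The quality of resources online varies widely, so I have collected some high-quality ones to help you avoid detours. A friendly reminder: just pick the resources below that suit you; not every one of them is worth studying in depth. + +### Python Tutorials + +#### Text Tutorials + +1. [《从零开始学Python》](https://www.zhihu.com/column/c_1216656665569013760) (Learn Python from Scratch) - A column I write on Zhihu; you are welcome to learn along and check in +2. [《基于Python的数据分析》](https://www.zhihu.com/column/c_1217746527315496960) (Data Analysis with Python) - A column I write on Zhihu; you are welcome to learn and exchange ideas +3. [《说走就走的AI之旅》](https://www.zhihu.com/column/c_1628900668109946880) (An Impromptu Journey into AI) - A column I write on Zhihu; you are welcome to learn and exchange ideas +4. [《Python - 100天从新手到大师》](https://github.com/jackfrued/Python-100-Days) (Python - 100 Days from Novice to Master) - The Python learning project I share on GitHub +5.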
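[《Python 3教程》](https://www.runoob.com/python3/python3-tutorial.html) (Python 3 Tutorial) - The Python course on Runoob (菜鸟教程), which also offers many other learning resources +6. [《Python教程》](https://liaoxuefeng.com/books/python/introduction/index.html) (Python Tutorial) - The Python course on Liao Xuefeng's personal website, which also offers some other learning resources + +#### Video Courses + +1. [《从零开始学Python》](https://space.bilibili.com/1177252794/lists/1222205) (Learn Python from Scratch) - My introductory Python videos on Bilibili +2. [《快速上手Python语言》](https://www.zhihu.com/education/video-course/1491848366791700480) (Get Started with Python Quickly) - A set of recorded classroom videos from my earlier lectures, uploaded to Zhihu Zhixuetang +3. [《Python进阶课程》](https://space.bilibili.com/1177252794/lists/4128173) (Advanced Python Course) - My advanced Python classroom videos on Bilibili +4. [《Python数据分析三剑客》](https://space.bilibili.com/1177252794/lists/502289) (The Three Musketeers of Python Data Analysis) - My Python data analysis classroom videos on Bilibili +5. [《AI Python for Beginners》](https://www.deeplearning.ai/short-courses/ai-python-for-beginners/) - Andrew Ng's introductory Python course +6. [《AI for Everyone》](https://www.deeplearning.ai/courses/ai-for-everyone/) - Andrew Ng's general-audience AI course +7. [《Deep Learning Specialization》](https://www.deeplearning.ai/courses/deep-learning-specialization/) - Andrew Ng's deep learning specialization +8. [《100 Days of Code: The Complete Python Pro Bootcamp》](https://www.udemy.com/course/100-days-of-code/) - A very popular, comprehensive Python course on Udemy (paid) +9. [《Python for Data Science and Machine Learning Bootcamp》](https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/) - A highly rated data science course on Udemy (paid) +10. [《PyTorch: Deep Learning and Artificial Intelligence》](https://www.udemy.com/course/pytorch-deep-learning/) - A well-reviewed course on Udemy (paid) + +> **Explanation**: Many people have also shared Andrew Ng's courses on YouTube and Bilibili, and YouTube hosts plenty of free Python courses that I have personally benefited a great deal from. Many of these courses are concise and get straight to the point, unlike many domestic training institutions that routinely hand out 700-episode courses or 800 GB of study materials, which leads many people to mistake liking and bookmarking for actually learning. There are also many learning platforms in China and abroad; some people prefer Udemy, others prefer Coursera. I am only sharing courses I have watched myself and found worthwhile, so look them up on the corresponding platforms as needed. Of course, what matters more is actually starting to learn according to a plan. + +#### Resource Websites + +1. [Official Python website](https://python.org) - Download the Python interpreter, read the official documentation, follow community news, and more +2. [Online Python](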