[Translation] [205] Our first programming problem #35

Open
cssmagic opened this issue Mar 1, 2024 · 0 comments

cssmagic commented Mar 1, 2024

2.5 Our first programming problem

The goal of this next section is twofold: (1) for you to see the workflow of interacting with Copilot and (2) for you to gain an appreciation of how powerful Copilot can be by seeing it solve a complicated task fairly easily.

In our next chapter, we’ll talk through the workflow with Copilot in more detail, but you’ll generally use the following steps when authoring code with Copilot:

  1. Write a prompt to Copilot using comments (#) or docstrings (""").

  2. Let Copilot generate code for you.

  3. Check to see whether the code is correct by reading through it and by testing.

    1. If it works, move to step 1 for the next thing you’d like it to do.
    2. If it doesn’t work, delete the code from Copilot and go back to step 1 and modify the prompt (and see suggestions in table 2.1).

Because you’ve just started working with Copilot, we’re wary of showing you such a large example, but we feel you’ll value seeing how powerful Copilot can be now that you have it installed. As such, we want you to follow along as best as you can to get a feel for working with Copilot on your own, but if you get stuck, just read along and save working along with Copilot in VS Code for the next chapter. Later chapters will explain the process of working with Copilot in more detail. Also, Copilot will generate a lot of code in this section, and we don’t expect you to understand the code until much later in the book. We provide the code solely so you can see what Copilot gave us, but do not feel as though you need to try to understand the code in this chapter.

To get started, let’s create a new file. If you aren’t already in VS Code, go ahead and start it. Then create a new Python file and save it as nfl_stats.py.

2.5.1 Showcasing Copilot’s value in a data processing task

We want to start with some basic data processing as this is something that many of you have likely done in your personal or professional lives. To find a dataset, we went to a great website called Kaggle [4], which has tons of datasets freely available for use. Many of them include important data like health statistics for different countries, information to help track the spread of disease, and so on. We’re not going to use those because we’d like to have something lighter for our first program. Since both of us are American football fans, we felt we should play with the National Football League (NFL) offensive stats database.

Let’s get started by downloading the dataset from www.kaggle.com/datasets/dtrade84/nfl-offensive-stats-2019-2022.

To download the dataset, you will have to sign up for a Kaggle account. If you don’t want to create the account, it’s okay to just read through this section without using VS Code and Copilot to generate the code yourself. Once downloaded, you may need to extract the zip file using the default zip extractor on your computer. Copy the dataset file from that zip file into your current folder in VS Code where you have your code (the folder you have open in Explorer). (If you are on a Mac and the file is saved as a .numbers file, you will need to use File > Export To and save the file as a CSV in your current working directory.) That dataset has NFL information from 2019 to 2022 (figure 2.4).

Figure 2.4 The first few columns and rows of the nfl_offensive_stats.csv dataset

The nfl_offensive_stats.csv file is a comma-separated values (CSV) text file (see figure 2.4 for a portion of the file). This is a standard format for storing data. It has a header row at the top that explains what’s in every column. We (or a computer) can tell where one column ends and the next begins because the cells are separated by commas. Also notice that each row is placed on its own line. Good news: Python has a bunch of tools for reading in CSV files.
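The following minimal sketch shows what Python’s built-in csv module does with text in this format. It parses a few rows held in memory through io.StringIO instead of reading the real file. The rows are made up for illustration and are not copied from the dataset.

```python
import csv
import io

# A tiny, made-up stand-in for nfl_offensive_stats.csv (values are illustrative only).
sample_text = (
    "game_id,player_id,position,player,team,pass_cmp,pass_att,pass_yds\n"
    "g1,p1,QB,Player One,AAA,20,30,250\n"
    "g1,p2,RB,Player Two,AAA,0,0,0\n"
)

# csv.reader splits each line at the commas and yields one list of strings per row.
rows = list(csv.reader(io.StringIO(sample_text)))

header = rows[0]      # the first row names the columns
first_game = rows[1]  # each later row holds one player's stats for one game

print(header)
print(first_game)
```

Every cell comes back as a string (for example "250"). This is why the code later in this section converts yards with int() before adding them.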

Step 1: How many passing yards did Aaron Rodgers throw in 2019–2022?

Let’s start by exploring what is stored in this file. To preview what is in the file, you can look at the Kaggle webpage for these stats under “Detail”, open it in VS Code, or open it in spreadsheet software like Microsoft Excel. (If you open it with Excel, be sure not to save the file. We need to leave the file in a .csv format.) Whichever way you choose to open it, here’s the start of the header (top) row (also shown in figure 2.4):

game_id,player_id,position ,player,team,pass_cmp,pass_att,pass_yds,...

There are more columns, but these have all we need to perform our first task. We know now that there’s a column for players and a column for passing yards. Aaron Rodgers is a player who gets passing yards in each game that he plays. But how many passing yards does he have in total, over all the games that he played from 2019 to 2022? This isn’t so easy to answer by looking directly at the file. So, we want the computer to make this easier for us!

We want it to sum up all the passing yards (pass_yds) for rows (games) where Aaron Rodgers is the player. For now, we’re going to just ask for all the yards in the database even though it covers multiple seasons. We can change this later if we’d like. This might be a good problem to give students in the fourth week of a standard college-level introductory programming course, but we have Copilot!

So instead of learning how to write this code from scratch, we’re just going to ask Copilot to generate it for us. To make that happen, we need to be quite specific in our request to make sure Copilot knows what we are asking for. We’re only going to ask it to perform small amounts of work and then re-prompt it to perform the next step. Later we’ll discuss how to write good prompts, but for now, just go ahead and use what we’ve written by placing this text at the top of your new file:

"""
open the csv file called "nfl_offensive_stats.csv" and read in
the csv data from the file
"""

The """ at the top and the bottom are surrounding something called a docstring. Docstrings are an alternative way of commenting (similar to text starting with #). They are commonly used for describing functions (see chapter 3 for details on functions), but we use them in this example to avoid Copilot continually generating comments (see the Comments Only problem in table 2.1). Given this prompt, Copilot should start generating code. For us, it produced this block of code:
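Here is a quick generic illustration of the difference between the two. It is our own example, not something Copilot produced. Python discards a # comment entirely. A docstring is an actual string, and when it is the first statement inside a function, Python keeps it as that function’s __doc__ attribute. A triple-quoted string that stands alone elsewhere in a file, like our prompts, is evaluated and then thrown away, so it doesn’t change what the program does.

```python
def total(values):
    """Return the sum of the numbers in values."""
    # This line is an ordinary comment; Python ignores it when running the code.
    return sum(values)

print(total([3, 4, 5]))
print(total.__doc__)
```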

import csv
with open('nfl_offensive_stats.csv', 'r') as f:    # Notice the file name.
    reader = csv.reader(f)
    nfl_data = list(reader)

First, for the purpose of reading this book, we want to remind you that the prompt is displayed differently than what Copilot produces. This is intentional so you can tell what we wrote (and you should write) and what Copilot wrote.

Second, the code produced by Copilot is quite reasonable. We don’t expect you to understand the code at this point in the book, but you can likely see the name of the file we wanted opened and some code about opening and reading in the file. Later in the book, we’ll learn how to read through the code. For now, just keep following along.

Now that we have the data from the file, we’re going to give it a new prompt to ask it to sum all the passing yards for Aaron Rodgers in this dataset. Because the computer doesn’t know what football is or specifics like that Aaron Rodgers is a quarterback, our prompt is going to be quite specific. We’ll teach you how to write prompts like this over the course of the book. Here is the new prompt:

"""
In the data we just read in, the fourth column is the player
and the 8th column is the passing yards. Get the sum of
yards from column 8 where the 4th column value is
"Aaron Rodgers"
"""

Notice that we tell the computer which columns are for players and which are for passing yards. That’s to tell the computer how to interpret the data. Also, notice that we say specifically that we only want to sum the yards in the case that the player’s name is Aaron Rodgers. Again, we’ll teach you how to write prompts like this as we move forward in the book. Given this prompt, Copilot then produced the following code:

passing_yards = 0
for row in nfl_data:
    if row[3] == 'Aaron Rodgers':
        passing_yards += int(row[7])
print(passing_yards)

Reminder: Copilot is nondeterministic

Remember from the Introduction that Copilot is nondeterministic, so what Copilot gives you may not match what it gives us. This is going to be a challenge for the rest of the book: What do you do if you get a wrong result when we get a right result? We’re fairly confident that Copilot will give you a correct answer here, but if you get a wrong answer from Copilot, go ahead and read the remainder of this section rather than working along with Copilot in VS Code. We will absolutely give you all the tools you need to fix the code when Copilot gives you a wrong answer, but that skill will be taught over the remainder of the book, so we don’t want you to get stuck on this now.

When we run this code (recall how to Run Code from figure 2.1), we get the result 13852, which is the correct answer. (We double-checked the answer, but if you are familiar with football, you can likely use estimates to see if the figure seems reasonable. Quarterbacks throw for 3,000–5,500 yards per season, and this is three seasons’ worth of data, so 13,852 yards over three seasons seems like the right ballpark for a high-performing quarterback.) What’s particularly interesting is that we planned on giving Copilot a third prompt to ask it to print the result, but Copilot guessed that’s what we’d want to do and did it on its own.
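One detail is worth pointing out. Our prompt talks about the 4th and 8th columns, yet Copilot’s code uses row[3] and row[7]. This is because Python numbers list positions starting from 0. The sketch below shows the same thing on a single made-up row, whose values are placeholders and not real data.

```python
# One made-up row laid out like the dataset's first eight columns:
# game_id, player_id, position, player, team, pass_cmp, pass_att, pass_yds
row = ["g1", "p1", "QB", "Aaron Rodgers", "TM", "22", "31", "280"]

# Python indexes from 0, so the 4th column is row[3] and the 8th is row[7].
player = row[3]
passing_yards = int(row[7])  # CSV values are strings, so convert before adding

print(player, passing_yards)
```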

What we want you to take from this example (and the rest of the chapter):

  1. Copilot is a powerful tool. We didn’t write any code, but we were able to get Copilot to generate the code needed to perform a basic analysis of the data. For readers who have used spreadsheets, you can probably think of a way to do this using spreadsheet applications like Excel, but it likely wouldn’t be as easy as writing code like this. Even if you haven’t used spreadsheets before, you’ve got to admit that it’s amazing that writing basic, human-readable prompts can produce correct code and output like this.

  2. Breaking problems into small tasks is important. For this example, we tried writing this code with just a single large prompt (not shown) or by breaking it into two smaller tasks. The larger prompt used almost the same text as our two smaller prompts, just combined into one. We found that Copilot would usually give us the right answer with the larger prompt but would sometimes make mistakes. This was especially true in the next example we’ll show you. However, breaking the problem into smaller tasks significantly increased the likelihood of Copilot generating the right code. We’ll see how to break down larger problems into smaller tasks throughout the remainder of this book because this is one of the most important skills you’ll need. In fact, the next chapter helps you start understanding which tasks are reasonable to give to Copilot.

  3. We still need to understand code to some degree. This is true for several reasons. One is that writing good prompts requires a basic understanding of what computers know and what they don’t. We can’t just give a prompt to Copilot that says, “Give me the number of passing yards for Aaron Rodgers.” Copilot likely wouldn’t be able to figure out where the data is stored, the format of the data, which columns correspond to players and passing yards, or that Aaron Rodgers is a player. We had to spell that out to Copilot for it to be successful. Another reason has to do with determining whether code from Copilot is reasonable. When the two of us read Copilot’s response, we can judge whether its code is reasonable because we know how to read code. You’ll need to be able to do this to some degree, which is why chapters 4 and 5 are dedicated to reading code.

  4. Testing is important. When programmers talk about testing, they’re referring to the practice of making sure that their code works correctly, even in possibly unexpected circumstances. We didn’t spend much time on this piece, other than checking whether Copilot’s answer is plausible using estimates on just one data set, but in general, we’ll need to spend more time on testing because this is a critical part of the code-writing process. It likely goes without saying, but errors in code range from embarrassing (if you tell your hard-core NFL fan friend the wrong number of passing yards for a player) to dangerous (if software in a car behaves incorrectly) to costly (if businesses make decisions on wrong analyses). Even after you’ve learned how to read code, keep in mind what we’ve learned from first-hand experience: code that looks correct might not be! To address this, we must test every piece of code created by Copilot to ensure it does what it should. We’ll learn how to rigorously test Copilot’s code in later chapters.
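Here is a minimal sketch of what such a test might look like. It assumes we wrap the summing logic from Step 1 in a function. The name total_passing_yards is our own hypothetical choice and not something Copilot generated. We then check the function on a tiny hand-made dataset where we already know the right answers.

```python
def total_passing_yards(rows, player_name):
    """Sum column 8 (index 7) over rows whose column 4 (index 3) is player_name."""
    total = 0
    for row in rows:
        if row[3] == player_name:
            total += int(row[7])
    return total

# Hand-made test data (not from the real file): a header row plus three games.
test_rows = [
    ["game_id", "player_id", "position", "player", "team", "pass_cmp", "pass_att", "pass_yds"],
    ["g1", "p1", "QB", "Player A", "T1", "20", "30", "250"],
    ["g2", "p1", "QB", "Player A", "T1", "18", "25", "100"],
    ["g2", "p2", "QB", "Player B", "T2", "15", "28", "300"],
]

# Expected answers worked out by hand: 250 + 100 = 350 for Player A.
assert total_passing_yards(test_rows, "Player A") == 350
assert total_passing_yards(test_rows, "Player B") == 300
# A name that never appears should give 0, not an error.
assert total_passing_yards(test_rows, "Nobody") == 0
print("all checks passed")
```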

To showcase the power of Copilot, we’re going to continue this example. Please feel free to follow along writing the prompts and running the code in Copilot or by just reading along.

为了充分展现 Copilot 的能力,我们将继续拓展这个示例。你可以选择跟随我们一起编写提示词并在 Copilot 中运行代码,或者只是进行阅读。

Step 2: How well did all the quarterbacks do over that time period?

第 2 步:在这段时间里,所有四分卫的整体表现如何?

Knowing how well Aaron Rodgers did is interesting. But a more meaningful step would be to compare his stats to other quarterbacks over that time period. We only want to compare against other quarterbacks because they are the players whose job is to throw the ball. Sure, a running back might throw the ball once a season, and it’s often fun when they do, but it’s not really their job. To include only the quarterbacks, we need to go back to the data for a moment. The third column is Position and QB stands for quarterback. As such, let’s delete all the code Copilot gave us (everything after the end of our initial comments) and start over. We won’t show each step like we did before. Instead, the following listing is the result of the entire interaction with Copilot with our prompts highlighted as separate from the code it gave us.

Listing 2.1 Copilot’s code to analyze the top quarterbacks
"""                                                              #1
open the csv file called "nfl_offensive_stats.csv" and           #1
read in the csv data from the file                               #1
"""                                                              #1
# import the csv module                                          #2
import csv
# open the csv file
with open('nfl_offensive_stats.csv', 'r') as f:
    # read the csv data
    data = list(csv.reader(f))
"""                                                              #3
the 3rd column in data is player position, the fourth column     #3
is the player, and the 8th column is the passing yards.          #3
For each player whose position in column 3 is "QB",              #3
determine the sum of yards from column 8                         #3
"""                                                              #3
# create a dictionary to hold the player name and passing yards
passing_yards = {}
# loop through the data
for row in data:
    # check if the player is a quarterback
    if row[2] == 'QB':
        # check if the player is already in the dictionary
        if row[3] in passing_yards:
            # add the passing yards to the existing value
            passing_yards[row[3]] += int(row[7])
        else:
            # add the player to the dictionary
            passing_yards[row[3]] = int(row[7])
"""                                                              #4
print the sum of the passing yards sorted by sum                 #4
of passing yards in descending order                             #4
"""                                                              #4
for player in sorted(passing_yards, key=passing_yards.get, reverse=True):
    print(player, passing_yards[player])
#1 Our first prompt to read in the data
#2 Copilot’s responses sometimes include comments. Comments that we get from Copilot will not be in bold, unlike our prompts which appear in bold.
#3 The second prompt to analyze the data
#4 Our third prompt to print the results

Notice that we gave Copilot three prompts. The first was to handle the input data, the second was to process the data, and the third was to output the response. This cycle of input data, process data, provide output is exceptionally common in programming tasks.

Looking at the results from Copilot, we have to point out that we’ve taught programming for years and this is pretty impressive. We might ask students to solve something like this on a final exam in our college-level classes, and we suspect fewer than half the class would do it correctly. Without diving into too many details, Copilot stored the data in a dictionary (not a normal dictionary like an English dictionary but a way of storing data in Python), which is a good choice here, and used a clever way of sorting the data to help in displaying the results.
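If you’re curious, the two ideas Copilot combined can be seen on their own in this small sketch, which uses made-up totals. A dictionary maps each player’s name to a running total. Then sorted(..., key=passing_yards.get, reverse=True) orders the names by their totals, from largest to smallest.

```python
# Made-up per-game results: (player name, passing yards in that game).
games = [("QB A", 300), ("QB B", 250), ("QB A", 120), ("QB C", 400)]

passing_yards = {}  # player name -> total passing yards
for name, yards in games:
    if name in passing_yards:
        passing_yards[name] += yards  # seen before: add to the running total
    else:
        passing_yards[name] = yards   # first time: start a new total

# key=passing_yards.get sorts each name by its total; reverse=True puts the largest first.
ranking = sorted(passing_yards, key=passing_yards.get, reverse=True)
for name in ranking:
    print(name, passing_yards[name])
```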

As for the results, if you run the code, the first five lines of output are

Patrick Mahomes 16132
Tom Brady 15876
Aaron Rodgers 13852
Josh Allen 13758
Derek Carr 13271

If you follow football, these results should not be a surprise to you. Just to see how well Copilot can adapt to our wishes, let’s try to make a minor change. Suppose that because Tom Brady is already recognized as one of the best QBs of all time, you would rather omit him from this comparison.

To make this change, we’re just going to modify the prompt at the bottom. Go to the point in the code where it says

"""
print the sum of the passing yards sorted by sum
of passing yards in descending order
"""
for player in sorted(passing_yards, key=passing_yards.get, reverse=True):
    print(player, passing_yards[player])

Delete the code, leaving just the comment, and add another line to the docstring:

"""
print the sum of the passing yards sorted by sum
of passing yards in descending order
Do not include Tom Brady because he wins too much
"""

Copilot then suggested to us:

for player in sorted(passing_yards, key=passing_yards.get, reverse=True):
    if player != "Tom Brady":                 #1
        print(player, passing_yards[player])
#1 Code that excludes Tom Brady from the data

That’s exactly what we’d like to see changed in the code. (Thanks, Tom Brady, for being a good sport in this example.) The code excluded all data for Tom Brady at the point of printing the results. When we save the file and run it again, the first five lines are now

Patrick Mahomes 16132
Aaron Rodgers 13852
Josh Allen 13758
Derek Carr 13271
Matt Ryan 13015

Step 3: Let’s plot these stats so we can compare them better

Let’s drive home our key point that Copilot is a powerful tool by asking it to go even one step farther. The printout of all the quarterback stats is likely a useful analysis for some purposes. But a visual plot might be a better way of presenting this information. Can we ask Copilot to plot it? Suppose we only care about the top-performing quarterbacks, so we decide to set an arbitrary limit of more than 4,000 yards as the minimum number of yards a quarterback needs to throw during this time window to be included in our plot. Granted, asking Copilot to plot the data is likely to introduce some new challenges because it will involve going beyond straight text, but let’s give it a try.

At the end of the code we just wrote, add the following comment:

"""
plot the players by their number of passing yards only for
players with more than 4000 passing yards
"""

The following listing shows the code Copilot gave us in response to our prompt.

Listing 2.2 Code given by Copilot to plot our data about quarterbacks
import matplotlib.pyplot as plt
import numpy as np
# create a list of players
players = []
# create a list of passing yards
yards = []
# for each player in the dictionary
for player in passing_yards:
    # if the player has more than 4000 passing yards
    if passing_yards[player] > 4000:
        # add the player to the list of players
        players.append(player)
        # add the passing yards to the list of passing yards
        yards.append(passing_yards[player])
# create a numpy array of the passing yards
yards = np.array(yards)
# create a numpy array of the players
players = np.array(players)
# sort the players by the passing yards
print(np.argsort(yards))
players = players[np.argsort(yards)]
# sort the passing yards
yards = np.sort(yards)
# create a figure
plt.figure()
# create a bar chart of the players and their passing yards
plt.bar(players, yards)
# rotate the x axis labels
plt.xticks(rotation=90)
# show the plot
plt.show()

We gave Copilot this prompt a number of times and it consistently produced a reasonable graph, but the code and graph varied each time. This is just one representative result, and in this result, the code is notably more advanced than the code you saw earlier this chapter. In fact, we need a more in-depth understanding of the code to be able to walk through it in any real detail, so we’re just going to skip any attempt to read through or interpret the code at this point in the book. We can tell you that, at a high-level, it properly imported a Python module designed to make plots (called matplotlib), did some fairly clever data manipulation in the middle using a Python module called numpy, and even had the sense to rotate player names so that they could print well as an x-axis label.
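For readers who want a peek anyway, the sketch below performs the same filter-and-sort step as the numpy lines in listing 2.2, but in plain Python and on made-up totals. It keeps players above 4,000 yards and orders them from fewest to most yards, which matches the left-to-right order of the bars. This is our own equivalent for illustration, not code that Copilot produced.

```python
# Made-up totals standing in for the passing_yards dictionary built earlier.
passing_yards = {"QB A": 16000, "QB B": 3500, "QB C": 9000, "QB D": 4500}

# Keep only players with more than 4000 passing yards.
qualified = {name: yds for name, yds in passing_yards.items() if yds > 4000}

# Sort names by yards, smallest first (numpy's argsort and sort also order ascending).
players = sorted(qualified, key=qualified.get)
yards = [qualified[name] for name in players]

print(players)
print(yards)
```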

If you run this code, you might hit a snag, however. Because Copilot learned from code in GitHub, it doesn’t know what Python modules are installed on your personal machine. The programmers who wrote the code that Copilot learned from likely had matplotlib installed, and matplotlib is the right module to use here, but matplotlib is not a module installed by default in Python. If you don’t have it installed, you’ll get an error about not finding the matplotlib module when you try to run the code.
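If you’d like a script to report a missing module more gently, one option is to check whether Python can find the module before importing it. The sketch below does this with the standard library’s importlib.util.find_spec. It is our own suggestion, not something Copilot generated, and the helper name module_available is hypothetical.

```python
import importlib.util

def module_available(name):
    """Return True if Python can locate a top-level module called `name`."""
    return importlib.util.find_spec(name) is not None

if module_available("matplotlib"):
    print("matplotlib is installed")
else:
    print("matplotlib is missing; install it with: pip install matplotlib")
```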

Python modules

Python modules expand the capability of the programming language. There are many modules in Python, and they can help you do anything from data analysis to creating websites to writing video games. You can recognize when code wants to use a Python module by the import statement in the code. Python doesn’t automatically install all the modules for you because you likely won’t use most of them. So when you want to use a module that isn’t already installed, you’ll need to install the package containing the module yourself.

To fix this error, you’ll need to install matplotlib. The good news is that Python has made it easy to install new packages. Go to the Terminal at the bottom right of VS Code and type:

pip install matplotlib

Note

For some operating systems, you may need to use pip3 rather than pip. On Windows machines, we recommend using pip if you followed our installation instructions. On Mac or Linux machines, we recommend using pip3.

When you run this command, you’ll see that a bunch of modules are installed, including numpy (the next module this code wants to use). (matplotlib requires Python modules of its own, so it installs all the modules you need to use matplotlib in addition to matplotlib itself.) When you try to run the code again, you’ll get a plot like figure 2.5.

Figure 2.5 The plot produced by the code in listing 2.2

In this bar graph, we see the y-axis is the number of passing yards, and the x-axis is the player’s name. The players are sorted from fewest yards (each above 4,000) to most yards. Admittedly, it’s not perfect because it is missing a y-axis label and the names on the x-axis are cut off at the bottom, but this is pretty impressive given all we gave Copilot was a short prompt. We could keep adding prompts to see if we can format the graph better, but we’ve already achieved the primary goals for this section, which were to show you how powerful Copilot is at helping us write code and to give you a feel for how to interact with Copilot.
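As one example of where further prompts might lead, here is our own sketch, not Copilot output. It assumes matplotlib is installed, and the function name plot_passing_yards is hypothetical. The sketch adds axis labels and calls tight_layout() so that the rotated names are less likely to be cut off.

```python
def plot_passing_yards(players, yards):
    """Draw a labeled bar chart of passing yards per player and return the figure."""
    import matplotlib.pyplot as plt  # imported here, so matplotlib is only needed when plotting

    fig, ax = plt.subplots()
    ax.bar(players, yards)
    ax.set_xlabel("Player")
    ax.set_ylabel("Passing yards")
    ax.tick_params(axis="x", labelrotation=90)  # vertical names, as in listing 2.2
    fig.tight_layout()  # adjust the layout so the rotated labels fit inside the figure
    return fig
```

After listing 2.2 has built players and yards, calling plot_passing_yards(players, yards) and then plt.show() should display the labeled chart.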

Indeed, in this chapter, we’ve accomplished a great deal! If you’ve finished setting up your programming environment and followed along the example with us, you should be proud. You’ve taken a huge step toward writing software! Beyond the details of setting up your environment, we’ve written software to solve our first problem. Moreover, you’ve observed the process of writing software with Copilot that starts with writing good prompts to help Copilot give us the code we want. In the examples in this chapter, Copilot gave us the code we wanted without us needing to change the prompt or debug the code to figure out why it’s not working properly. That was a nice way to showcase the power of using an AI assistant to program, but you will often find yourself having to test the code, change the prompts, and sometimes try to understand why the code is wrong. This is the AI assistant programming process that we’ll learn more about in upcoming chapters.
