NumPy 博客总结:
《Python数据分析基础教程:NumPy学习指南(第2版)》所有章节阅读笔记+代码
pandas博客总结:
本篇博客是力扣的“pandas入门15题”。
2877. 从表中创建 DataFrame
编写一个解决方案,基于名为 student_data
的二维列表 创建 一个 DataFrame 。这个二维列表包含一些学生的 ID 和年龄信息。
DataFrame 应该有两列, student_id
和 age
,并且与原始二维列表的顺序相同。
返回结果格式如下示例所示。
示例 1:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
输入:
student_data:
[
[1, 15],
[2, 11],
[3, 11],
[4, 20]
]
输出:
+------------+-----+
| student_id | age |
+------------+-----+
| 1 | 15 |
| 2 | 11 |
| 3 | 11 |
| 4 | 20 |
+------------+-----+
解释:
基于 student_data 创建了一个 DataFrame,包含 student_id 和 age 两列。
1
2
3
4
import pandas as pd
def createDataframe(student_data: List[List[int]]) -> pd.DataFrame:
return pd.DataFrame(student_data, columns = ["student_id", "age"])
2878. 获取 DataFrame 的大小
1
2
3
4
5
6
7
8
9
10
DataFrame players:
+-------------+--------+
| Column Name | Type |
+-------------+--------+
| player_id | int |
| name | object |
| age | int |
| position | object |
| ... | ... |
+-------------+--------+
编写一个解决方案,计算并显示 players
的 行数和列数。
将结果返回为一个数组:
1
[number of rows, number of columns]
返回结果格式如下示例所示。
示例 1:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
输入:
+-----------+----------+-----+-------------+--------------------+
| player_id | name | age | position | team |
+-----------+----------+-----+-------------+--------------------+
| 846 | Mason | 21 | Forward | RealMadrid |
| 749 | Riley | 30 | Winger | Barcelona |
| 155 | Bob | 28 | Striker | ManchesterUnited |
| 583 | Isabella | 32 | Goalkeeper | Liverpool |
| 388 | Zachary | 24 | Midfielder | BayernMunich |
| 883 | Ava | 23 | Defender | Chelsea |
| 355 | Violet | 18 | Striker | Juventus |
| 247 | Thomas | 27 | Striker | ParisSaint-Germain |
| 761 | Jack | 33 | Midfielder | ManchesterCity |
| 642 | Charlie | 36 | Center-back | Arsenal |
+-----------+----------+-----+-------------+--------------------+
输出:
[10, 5]
解释:
这个 DataFrame 包含 10 行和 5 列。
1
2
3
4
import pandas as pd
def getDataframeSize(players: pd.DataFrame) -> List[int]:
return [players.shape[0], players.shape[1]]
或者:
1
2
3
4
import pandas as pd
def getDataframeSize(players: pd.DataFrame) -> List[int]:
return list(players.shape)
2879. 显示前三行
1
2
3
4
5
6
7
8
9
DataFrame: employees
+-------------+--------+
| Column Name | Type |
+-------------+--------+
| employee_id | int |
| name | object |
| department | object |
| salary | int |
+-------------+--------+
编写一个解决方案,显示这个 DataFrame 的 前 3
行。
示例 1:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
输入:
DataFrame employees
+-------------+-----------+-----------------------+--------+
| employee_id | name | department | salary |
+-------------+-----------+-----------------------+--------+
| 3 | Bob | Operations | 48675 |
| 90 | Alice | Sales | 11096 |
| 9 | Tatiana | Engineering | 33805 |
| 60 | Annabelle | InformationTechnology | 37678 |
| 49 | Jonathan | HumanResources | 23793 |
| 43 | Khaled | Administration | 40454 |
+-------------+-----------+-----------------------+--------+
输出:
+-------------+---------+-------------+--------+
| employee_id | name | department | salary |
+-------------+---------+-------------+--------+
| 3 | Bob | Operations | 48675 |
| 90 | Alice | Sales | 11096 |
| 9 | Tatiana | Engineering | 33805 |
+-------------+---------+-------------+--------+
解释:
只有前 3 行被显示。
1
2
3
4
import pandas as pd
def selectFirstRows(employees: pd.DataFrame) -> pd.DataFrame:
return employees.head(3)
返回切片 df = employees[:3]
也可
2880. 数据选取
1
2
3
4
5
6
7
8
DataFrame students
+-------------+--------+
| Column Name | Type |
+-------------+--------+
| student_id | int |
| name | object |
| age | int |
+-------------+--------+
编写一个解决方案,选择 student_id = 101
的学生的 name 和 age 并输出。
返回结果格式如下示例所示。
示例 1:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
输入:
+------------+---------+-----+
| student_id | name | age |
+------------+---------+-----+
| 101 | Ulysses | 13 |
| 53 | William | 10 |
| 128 | Henry | 6 |
| 3 | Henry | 11 |
+------------+---------+-----+
输出:
+---------+-----+
| name | age |
+---------+-----+
| Ulysses | 13 |
+---------+-----+
解释:
学生 Ulysses 的 student_id = 101,所以我们输出了他的 name 和 age。
1
2
3
4
import pandas as pd
def selectData(students: pd.DataFrame) -> pd.DataFrame:
return students[students['student_id']==101][['name','age']]
2881. 创建新列
1
2
3
4
5
6
7
DataFrame employees
+-------------+--------+
| Column Name | Type. |
+-------------+--------+
| name | object |
| salary | int. |
+-------------+--------+
一家公司计划为员工提供奖金。
编写一个解决方案,创建一个名为 bonus
的新列,其中包含 salary
值的 两倍。
返回结果格式如下示例所示。
示例 1:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
输入:
DataFrame employees
+---------+--------+
| name | salary |
+---------+--------+
| Piper | 4548 |
| Grace | 28150 |
| Georgia | 1103 |
| Willow | 6593 |
| Finn | 74576 |
| Thomas | 24433 |
+---------+--------+
输出:
+---------+--------+--------+
| name | salary | bonus |
+---------+--------+--------+
| Piper | 4548 | 9096 |
| Grace | 28150 | 56300 |
| Georgia | 1103 | 2206 |
| Willow | 593 | 13186 |
| Finn | 74576 | 149152 |
| Thomas | 24433 | 48866 |
+---------+--------+--------+
解释:
通过将 salary 列中的值加倍创建了一个新的 bonus 列。
1
2
3
4
5
import pandas as pd
def createBonusColumn(employees: pd.DataFrame) -> pd.DataFrame:
employees["bonus"] = employees["salary"] * 2
return employees
或者:(更慢)
1
2
3
4
import pandas as pd
def createBonusColumn(employees: pd.DataFrame) -> pd.DataFrame:
return employees.assign(bonus = employees['salary'] * 2)
2882. 删去重复的行
1
2
3
4
5
6
7
8
DataFrame customers
+-------------+--------+
| Column Name | Type |
+-------------+--------+
| customer_id | int |
| name | object |
| email | object |
+-------------+--------+
在 DataFrame 中基于 email
列存在一些重复行。
编写一个解决方案,删除这些重复行,仅保留第一次出现的行。
返回结果格式如下例所示。
示例 1:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
输入:
+-------------+---------+---------------------+
| customer_id | name | email |
+-------------+---------+---------------------+
| 1 | Ella | emily@example.com |
| 2 | David | michael@example.com |
| 3 | Zachary | sarah@example.com |
| 4 | Alice | john@example.com |
| 5 | Finn | john@example.com |
| 6 | Violet | alice@example.com |
+-------------+---------+---------------------+
输出:
+-------------+---------+---------------------+
| customer_id | name | email |
+-------------+---------+---------------------+
| 1 | Ella | emily@example.com |
| 2 | David | michael@example.com |
| 3 | Zachary | sarah@example.com |
| 4 | Alice | john@example.com |
| 6 | Violet | alice@example.com |
+-------------+---------+---------------------+
解释:
Alice (customer_id = 4) 和 Finn (customer_id = 5) 都使用 john@example.com,因此只保留该邮箱地址的第一次出现。
1
2
3
4
import pandas as pd
def dropDuplicateEmails(customers: pd.DataFrame) -> pd.DataFrame:
return customers.drop_duplicates(subset="email")
参考答案给的是:
1
2
3
4
5
6
import pandas as pd
def dropDuplicateEmails(customers: pd.DataFrame) -> pd.DataFrame:
customers.drop_duplicates(subset='email', keep='first', inplace=True)
return customers
注:使用 inplace=True 修改原始 DataFrame。为了保留原来的 DataFrame 并获得一个去掉重复项的新 DataFrame,我们应该设置 inplace=False,并将结果赋给一个新的变量。
2883. 删去丢失的数据
1
2
3
4
5
6
7
8
DataFrame students
+-------------+--------+
| Column Name | Type |
+-------------+--------+
| student_id | int |
| name | object |
| age | int |
+-------------+--------+
在 name
列里有一些具有缺失值的行。
编写一个解决方案,删除具有缺失值的行。
返回结果格式如下示例所示。
示例 1:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
输入:
+------------+---------+-----+
| student_id | name | age |
+------------+---------+-----+
| 32 | Piper | 5 |
| 217 | None | 19 |
| 779 | Georgia | 20 |
| 849 | Willow | 14 |
+------------+---------+-----+
输出:
+------------+---------+-----+
| student_id | name | age |
+------------+---------+-----+
| 32 | Piper | 5 |
| 779 | Georgia | 20 |
| 849 | Willow | 14 |
+------------+---------+-----+
解释:
学号为 217 的学生所在行在 name 列中有空值,因此这一行将被删除。
1
2
3
4
import pandas as pd
def dropMissingData(students: pd.DataFrame) -> pd.DataFrame:
return students[~students.isnull().any(axis = 1)]
或者:
1
2
3
4
5
import pandas as pd
def dropMissingData(students: pd.DataFrame) -> pd.DataFrame:
students.dropna(subset='name', inplace=True)
return students
2884. 修改列
1
2
3
4
5
6
7
DataFrame employees
+-------------+--------+
| Column Name | Type |
+-------------+--------+
| name | object |
| salary | int |
+-------------+--------+
一家公司决定增加员工的薪水。
编写一个解决方案,将每个员工的薪水乘以2来 修改 salary
列。
返回结果格式如下示例所示。
示例 1:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
输入:
DataFrame employees
+---------+--------+
| name | salary |
+---------+--------+
| Jack | 19666 |
| Piper | 74754 |
| Mia | 62509 |
| Ulysses | 54866 |
+---------+--------+
输出:
+---------+--------+
| name | salary |
+---------+--------+
| Jack | 39332 |
| Piper | 149508 |
| Mia | 125018 |
| Ulysses | 109732 |
+---------+--------+
解释:
每个人的薪水都被加倍。
1
2
3
4
5
import pandas as pd
def modifySalaryColumn(employees: pd.DataFrame) -> pd.DataFrame:
employees["salary"] = employees["salary"] * 2
return employees
2885. 重命名列
1
2
3
4
5
6
7
8
9
DataFrame students
+-------------+--------+
| Column Name | Type |
+-------------+--------+
| id | int |
| first | object |
| last | object |
| age | int |
+-------------+--------+
编写一个解决方案,按以下方式重命名列:
id
重命名为student_id
first
重命名为first_name
last
重命名为last_name
age
重命名为age_in_years
返回结果格式如下示例所示。
示例 1:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
输入:
+----+---------+----------+-----+
| id | first | last | age |
+----+---------+----------+-----+
| 1 | Mason | King | 6 |
| 2 | Ava | Wright | 7 |
| 3 | Taylor | Hall | 16 |
| 4 | Georgia | Thompson | 18 |
| 5 | Thomas | Moore | 10 |
+----+---------+----------+-----+
输出:
+------------+------------+-----------+--------------+
| student_id | first_name | last_name | age_in_years |
+------------+------------+-----------+--------------+
| 1 | Mason | King | 6 |
| 2 | Ava | Wright | 7 |
| 3 | Taylor | Hall | 16 |
| 4 | Georgia | Thompson | 18 |
| 5 | Thomas | Moore | 10 |
+------------+------------+-----------+--------------+
解释:
列名已相应更换。
1
2
3
4
5
import pandas as pd
def renameColumns(students: pd.DataFrame) -> pd.DataFrame:
students.columns = ["student_id", "first_name", "last_name", "age_in_years"]
return students
352ms
其他方法:
1
2
3
4
5
6
7
8
9
10
11
import pandas as pd
def renameColumns(students: pd.DataFrame) -> pd.DataFrame:
students = students.rename(
columns = {
'id': "student_id",
'first': "first_name",
'last': 'last_name',
'age': "age_in_years"
})
return students
或者用rename函数:
1
2
3
4
5
6
7
8
9
10
import pandas as pd
def renameColumns(students: pd.DataFrame) -> pd.DataFrame:
new_columns = {
'id': "student_id",
'first': "first_name",
'last': 'last_name',
'age': "age_in_years"
}
return students.rename(columns = new_columns, inplace=False)
2886. 改变数据类型
1
2
3
4
5
6
7
8
9
DataFrame students
+-------------+--------+
| Column Name | Type |
+-------------+--------+
| student_id | int |
| name | object |
| age | int |
| grade | float |
+-------------+--------+
编写一个解决方案来纠正以下错误:
grade
列被存储为浮点数,将它转换为整数。
返回结果格式如下示例所示。
示例 1:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
输入:
DataFrame students:
+------------+------+-----+-------+
| student_id | name | age | grade |
+------------+------+-----+-------+
| 1 | Ava | 6 | 73.0 |
| 2 | Kate | 15 | 87.0 |
+------------+------+-----+-------+
输出:
+------------+------+-----+-------+
| student_id | name | age | grade |
+------------+------+-----+-------+
| 1 | Ava | 6 | 73 |
| 2 | Kate | 15 | 87 |
+------------+------+-----+-------+
解释:
grade 列的数据类型已转换为整数。
1
2
3
4
5
import pandas as pd
def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
students['grade'] = students['grade'].astype('int')
return students
特别注意,改完students['grade']
不要忘记赋值回去。
其他方法:
1
2
3
4
5
import pandas as pd
def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
students = students.astype({'grade':int})
return students
或者:(时间更久)
1
2
3
4
5
import pandas as pd
def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
students['grade'] = students['grade'].apply(lambda x: int(x))
return students
🌟2887. 填充缺失值
1
2
3
4
5
6
7
8
DataFrame products
+-------------+--------+
| Column Name | Type |
+-------------+--------+
| name | object |
| quantity | int |
| price | int |
+-------------+--------+
编写一个解决方案,在 quantity
列中将缺失的值填充为 **0**
。
返回结果如下示例所示。
示例 1:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
输入:
+-----------------+----------+-------+
| name | quantity | price |
+-----------------+----------+-------+
| Wristwatch | 32 | 135 |
| WirelessEarbuds | None | 821 |
| GolfClubs | None | 9319 |
| Printer | 849 | 3051 |
+-----------------+----------+-------+
输出:
+-----------------+----------+-------+
| name | quantity | price |
+-----------------+----------+-------+
| Wristwatch | 32 | 135 |
| WirelessEarbuds | 0 | 821 |
| GolfClubs | 0 | 9319 |
| Printer | 849 | 3051 |
+-----------------+----------+-------+
解释:
Toaster 和 Headphones 的数量被填充为 0。
参考答案:
1
2
3
4
5
import pandas as pd
def fillMissingValues(products: pd.DataFrame) -> pd.DataFrame:
return products.fillna(value={'quantity':0})
或者:
1
2
3
4
5
6
import pandas as pd
def fillMissingValues(products: pd.DataFrame) -> pd.DataFrame:
products['quantity'][products['quantity'].isnull()] = 0
return products
注:products['quantity'][products['quantity'].isnull()]
: 这部分代码使用布尔索引来选择 quantity 列中所有值为缺失值的行。 它本质上是选择了 DataFrame 中 quantity 列中所有 isnull() 返回 True 的那些行。
2888. 重塑数据:连结
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
DataFrame df1
+-------------+--------+
| Column Name | Type |
+-------------+--------+
| student_id | int |
| name | object |
| age | int |
+-------------+--------+
DataFrame df2
+-------------+--------+
| Column Name | Type |
+-------------+--------+
| student_id | int |
| name | object |
| age | int |
+-------------+--------+
编写一个解决方案,将两个 DataFrames 垂直 连接成一个 DataFrame。
结果格式如下示例所示。
示例 1:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
输入:
df1
+------------+---------+-----+
| student_id | name | age |
+------------+---------+-----+
| 1 | Mason | 8 |
| 2 | Ava | 6 |
| 3 | Taylor | 15 |
| 4 | Georgia | 17 |
+------------+---------+-----+
df2
+------------+------+-----+
| student_id | name | age |
+------------+------+-----+
| 5 | Leo | 7 |
| 6 | Alex | 7 |
+------------+------+-----+
输出:
+------------+---------+-----+
| student_id | name | age |
+------------+---------+-----+
| 1 | Mason | 8 |
| 2 | Ava | 6 |
| 3 | Taylor | 15 |
| 4 | Georgia | 17 |
| 5 | Leo | 7 |
| 6 | Alex | 7 |
+------------+---------+-----+
解释:
两个 DataFrame 被垂直堆叠,它们的行被合并。
1
2
3
4
import pandas as pd
def concatenateTables(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
return pd.merge(df1, df2, how='outer')
其他方法:
1
2
3
4
import pandas as pd
def concatenateTables(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
return pd.concat([df1, df2], axis = 0)
2889. 数据重塑:透视
1
2
3
4
5
6
7
8
DataFrame weather
+-------------+--------+
| Column Name | Type |
+-------------+--------+
| city | object |
| month | object |
| temperature | int |
+-------------+--------+
编写一个解决方案,以便将数据 旋转,使得每一行代表特定月份的温度,而每个城市都是一个单独的列。
输出结果格式如下示例所示。
示例 1:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
输入:
+--------------+----------+-------------+
| city | month | temperature |
+--------------+----------+-------------+
| Jacksonville | January | 13 |
| Jacksonville | February | 23 |
| Jacksonville | March | 38 |
| Jacksonville | April | 5 |
| Jacksonville | May | 34 |
| ElPaso | January | 20 |
| ElPaso | February | 6 |
| ElPaso | March | 26 |
| ElPaso | April | 2 |
| ElPaso | May | 43 |
+--------------+----------+-------------+
输出:
+----------+--------+--------------+
| month | ElPaso | Jacksonville |
+----------+--------+--------------+
| April | 2 | 5 |
| February | 6 | 23 |
| January | 20 | 13 |
| March | 26 | 38 |
| May | 43 | 34 |
+----------+--------+--------------+
解释:
表格被旋转,每一列代表一个城市,每一行代表特定的月份。
1
2
3
4
import pandas as pd
def pivotTable(weather: pd.DataFrame) -> pd.DataFrame:
return weather.pivot(index="month", columns="city", values="temperature")
🌟2890. 重塑数据:融合
1
2
3
4
5
6
7
8
9
10
DataFrame report
+-------------+--------+
| Column Name | Type |
+-------------+--------+
| product | object |
| quarter_1 | int |
| quarter_2 | int |
| quarter_3 | int |
| quarter_4 | int |
+-------------+--------+
编写一个解决方案,将数据 重塑 成每一行表示特定季度产品销售数据的形式。
结果格式如下例所示:
示例 1:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
输入:
+-------------+-----------+-----------+-----------+-----------+
| product | quarter_1 | quarter_2 | quarter_3 | quarter_4 |
+-------------+-----------+-----------+-----------+-----------+
| Umbrella | 417 | 224 | 379 | 611 |
| SleepingBag | 800 | 936 | 93 | 875 |
+-------------+-----------+-----------+-----------+-----------+
输出:
+-------------+-----------+-------+
| product | quarter | sales |
+-------------+-----------+-------+
| Umbrella | quarter_1 | 417 |
| SleepingBag | quarter_1 | 800 |
| Umbrella | quarter_2 | 224 |
| SleepingBag | quarter_2 | 936 |
| Umbrella | quarter_3 | 379 |
| SleepingBag | quarter_3 | 93 |
| Umbrella | quarter_4 | 611 |
| SleepingBag | quarter_4 | 875 |
+-------------+-----------+-------+
解释:
DataFrame 已从宽格式重塑为长格式。每一行表示一个季度内产品的销售情况。
1
2
3
4
5
6
7
8
9
10
import pandas as pd
def meltTable(report: pd.DataFrame) -> pd.DataFrame:
report = report.melt(
id_vars=['product'],
value_vars=['quarter_1', 'quarter_2', 'quarter_3', 'quarter_4'],
var_name='quarter',
value_name='sales'
)
return report
注:value_vars
可以不填,不填就是除了id_vars
列外,其它所有列都做转换
知识点:
- DataFrame形态改变之变成长格式
利用”df.melt()”方法
.melt
方法的参数
1
df.melt(id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)
id_vars
= 不想改变的列。这些列在输出的DataFrame中保持不变,通常用来标识每条记录。
value_vars
= 想要“融化”的列。这些列会被转换成长格式中的值,如果没有指定,默认所有非 id_vars
的列都会被熔化。
var_name
= “融化”列的名字,该列包含了原来列的名称。默认情况下,这个列的名字是 variable。
value_name
= 数值列的名字,该列包含了原来列的值。默认情况下,这个列的名字是 value。
col_level
= 多级索引(MultiIndex)列的指定“融化”的该级索引。默认为 None,表示熔化所有级别的索引。
🌟2891. 方法链
1
2
3
4
5
6
7
8
9
DataFrame animals
+-------------+--------+
| Column Name | Type |
+-------------+--------+
| name | object |
| species | object |
| age | int |
| weight | int |
+-------------+--------+
编写一个解决方案来列出体重 严格超过 100
千克的动物的名称。
按体重 降序 返回动物。
返回结果格式如下示例所示。
示例 1:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
输入:
DataFrame animals:
+----------+---------+-----+--------+
| name | species | age | weight |
+----------+---------+-----+--------+
| Tatiana | Snake | 98 | 464 |
| Khaled | Giraffe | 50 | 41 |
| Alex | Leopard | 6 | 328 |
| Jonathan | Monkey | 45 | 463 |
| Stefan | Bear | 100 | 50 |
| Tommy | Panda | 26 | 349 |
+----------+---------+-----+--------+
输出:
+----------+
| name |
+----------+
| Tatiana |
| Jonathan |
| Tommy |
| Alex |
+----------+
解释:
所有体重超过 100 的动物都应包含在结果表中。
Tatiana 的体重为 464,Jonathan 的体重为 463,Tommy 的体重为 349,Alex 的体重为 328。
结果应按体重降序排序。
在 Pandas 中,方法链 允许我们在 DataFrame 上执行操作,而无需将每个操作拆分成单独的行或创建多个临时变量。
你能用 一行 代码的方法链完成这个任务吗?
参考答案:
1
2
3
4
import pandas as pd
def findHeavyAnimals(animals: pd.DataFrame) -> pd.DataFrame:
return animals[animals['weight']>100].sort_values(by="weight", ascending=False)[['name']]