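Euro 2012 ranking functions: Python data analysis, a first look at the data (part 2)

Goal of this article: learn how to filter and sort data, get familiar with the Pandas data analysis library, and apply it to data processing to work more efficiently.

The road ahead is bright.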

Import the required libraries

import pandas as pd
import numpy as np

Load the dataset

euro = pd.read_csv(r'C:\Users\Administrator\Desktop\exercise_data\Euro2012_stats.csv')
euro
Result:
              Team  Goals  Shots on target  Shots off target  Shooting Accuracy  % Goals-to-shots  Total shots (inc. Blocked)  Hit Woodwork  Penalty goals  Penalties not scored  ...  Saves made  Saves-to-shots ratio  Fouls Won  Fouls Conceded  Offsides  Yellow Cards  Red Cards  Subs on  Subs off  Players Used
 0         Croatia      4               13                12              51.9%             16.0%                          32             0              0                     0  ...          13                 81.3%         41              62         2             9          0        9         9            16
 1  Czech Republic      4               13                18              41.9%             12.9%                          39             0              0                     0  ...           9                 60.1%         53              73         8             7          0       11        11            19
 2         Denmark      4               10                10              50.0%             20.0%                          27             1              0                     0  ...          10                 66.7%         25              38         8             4          0        7         7            15
 3         England      5               11                18              50.0%             17.2%                          40             0              0                     0  ...          22                 88.1%         43              45         6             5          0       11        11            16

Select a single column (Goals)

euro['Goals']
# or
euro.Goals
Result:
0      4
1      4
2      4
3      5
4      3
5     10
6      5
7      6
8      2
9      2
10     6
11     1
12     5
13    12
14     5
15     2
Name: Goals, dtype: int64

How many teams took part in Euro 2012?

euro['Team'].nunique()
# or, because this dataset has one row per team:
# euro.shape[0]
Result: 16
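The two expressions above agree only because the dataset holds exactly one row per team. A minimal sketch of where they would differ, using a small hypothetical frame (the repeated Croatia row is invented for illustration and is not in the dataset):

```python
import pandas as pd

# Hypothetical mini-frame: Croatia is listed twice on purpose.
teams = pd.DataFrame({
    "Team": ["Croatia", "Denmark", "England", "Croatia"],
    "Goals": [4, 4, 5, 4],
})

n_rows = teams.shape[0]             # counts rows, duplicates included
n_unique = teams["Team"].nunique()  # counts distinct team names

print(n_rows, n_unique)
```

When a question is about distinct entities rather than records, nunique() is usually the safer choice.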

How many columns does the dataset have?

euro.shape[1]
Result: 35

Save the columns Team, Yellow Cards and Red Cards as a separate data frame named ddyl (an arbitrary name)

ddyl = euro[['Team', 'Yellow Cards', 'Red Cards']]
ddyl
Result:
                   Team  Yellow Cards  Red Cards
 0              Croatia             9          0
 1       Czech Republic             7          0
 2              Denmark             4          0
 3              England             5          0
 4               France             6          0
 5              Germany             4          0
 6               Greece             9          1
 7                Italy            16          0
 8          Netherlands             5          0
 9               Poland             7          1
10             Portugal            12          0
11  Republic of Ireland             6          1
12               Russia             6          0
13                Spain            11          0
14               Sweden             7          0
15              Ukraine             5          0

Sort the ddyl data frame by Red Cards first, then by Yellow Cards (both descending)

ddyl.sort_values(['Red Cards', 'Yellow Cards'], ascending=False)
Result:
                   Team  Yellow Cards  Red Cards
 6               Greece             9          1
 9               Poland             7          1
11  Republic of Ireland             6          1
 7                Italy            16          0
10             Portugal            12          0
13                Spain            11          0
 0              Croatia             9          0
 1       Czech Republic             7          0
14               Sweden             7          0
 4               France             6          0
12               Russia             6          0
 3              England             5          0
 8          Netherlands             5          0
15              Ukraine             5          0
 2              Denmark             4          0
 5              Germany             4          0
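The call above applies one sort direction to both keys. As a sketch of a variation, sort_values also accepts one ascending flag per column. The frame below reuses four rows from the ddyl output; the names cards and ranked are only illustrative:

```python
import pandas as pd

# Four rows copied from the ddyl output above.
cards = pd.DataFrame({
    "Team": ["Greece", "Italy", "Poland", "Denmark"],
    "Yellow Cards": [9, 16, 7, 4],
    "Red Cards": [1, 0, 1, 0],
})

# Red Cards descending, then Yellow Cards ascending within each tie.
ranked = cards.sort_values(
    ["Red Cards", "Yellow Cards"],
    ascending=[False, True],
)
print(ranked)
```

sort_values returns a new data frame and leaves the original untouched, so assign the result if you want to keep the sorted order.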

Compute the average number of yellow cards per team

round(ddyl['Yellow Cards'].mean(), 0)
Result: 7.0

Find the teams that scored more than 6 goals

euro.Goals > 6  # yields a boolean Series: True where Goals is greater than 6
Result:
0     False
1     False
2     False
3     False
4     False
5      True
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13     True
14    False
15    False
Name: Goals, dtype: bool

euro[euro.Goals > 6]  # use the boolean Series as a mask to keep only the matching rows
Result:
       Team  Goals  Shots on target  Shots off target  Shooting Accuracy  % Goals-to-shots  Total shots (inc. Blocked)  Hit Woodwork  Penalty goals  Penalties not scored  ...  Saves made  Saves-to-shots ratio  Fouls Won  Fouls Conceded  Offsides  Yellow Cards  Red Cards  Subs on  Subs off  Players Used
 5  Germany     10               32                32              47.8%             15.6%                          80             2              1                     0  ...          10                 62.6%         63              49        12             4          0       15        15            17
13    Spain     12               42                33              55.9%             16.0%                         100             0              1                     0  ...          15                 93.8%        102              83        19            11          0       17        17            18
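Boolean masks like the one above can be combined. A minimal sketch, using four rows copied from the tables in this article (the thresholds and the names sample, scorers_few_cards and either are chosen only for illustration):

```python
import pandas as pd

# Goals and Yellow Cards for four teams, copied from the tables above.
sample = pd.DataFrame({
    "Team": ["England", "Germany", "Italy", "Spain"],
    "Goals": [5, 10, 6, 12],
    "Yellow Cards": [5, 4, 16, 11],
})

# Wrap each condition in parentheses; use & for "and", | for "or".
scorers_few_cards = sample[(sample["Goals"] > 6) & (sample["Yellow Cards"] < 10)]
either = sample[(sample["Goals"] > 6) | (sample["Yellow Cards"] > 10)]

print(scorers_few_cards["Team"].tolist())
print(either["Team"].tolist())
```

The parentheses matter because & and | bind more tightly than comparison operators in Python.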

Select the teams whose name starts with the letter G

euro[euro['Team'].str.startswith('G')]
# or
# euro[euro.Team.str.startswith('G')]
Result:
      Team  Goals  Shots on target  Shots off target  Shooting Accuracy  % Goals-to-shots  Total shots (inc. Blocked)  Hit Woodwork  Penalty goals  Penalties not scored  ...  Saves made  Saves-to-shots ratio  Fouls Won  Fouls Conceded  Offsides  Yellow Cards  Red Cards  Subs on  Subs off  Players Used
5  Germany     10               32                32              47.8%             15.6%                          80             2              1                     0  ...          10                 62.6%         63              49        12             4          0       15        15            17
6   Greece      5                8                18              30.7%             19.2%                          32             1              1                     1  ...          13                 65.1%         67              48        12             9          1       12        12            20

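Note on the function startswith()

Purpose: checks whether a string starts with a given character or substring. In the code above, the .str accessor applies this check to every value in the Team column.

Syntax: string.startswith(str, beg=0, end=len(string)) or string[beg:end].startswith(str). The values shown for beg and end are their defaults; in Python itself both must be passed positionally, not as keywords.

Parameters:

string: the string being checked

str: the character or substring to look for. (A tuple may be passed; each element is tried in turn and a match on any of them returns True.)

beg: position at which the check starts (optional)

end: position at which the check ends (optional). If beg and end are given, only that range is checked; otherwise the whole string is checked.

Return value: True if the string starts with the given prefix, otherwise False. An empty prefix always returns True.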
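The parameters described above can be tried on plain Python strings. A short sketch, with team names taken from the dataset and variable names chosen only for illustration:

```python
teams = ["Germany", "Greece", "Spain", "Sweden"]

# A single prefix.
g_teams = [t for t in teams if t.startswith("G")]

# A tuple of prefixes: True if any one of them matches.
g_or_sp = [t for t in teams if t.startswith(("G", "Sp"))]

# start and end restrict the check to a slice of the string.
offset_match = "Germany".startswith("er", 1)      # checks "ermany"
window_miss = "Germany".startswith("many", 3, 5)  # the window "ma" is too short

# An empty prefix always matches.
empty_match = "Germany".startswith("")

print(g_teams, g_or_sp, offset_match, window_miss, empty_match)
```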

Select several columns (slicing)

euro.iloc[:, :6]
# or select every column except the last three
# euro.iloc[:, :-3]
Result:
                   Team  Goals  Shots on target  Shots off target  Shooting Accuracy  % Goals-to-shots
 0              Croatia      4               13                12              51.9%             16.0%
 1       Czech Republic      4               13                18              41.9%             12.9%
 2              Denmark      4               10                10              50.0%             20.0%
 3              England      5               11                18              50.0%             17.2%
 4               France      3               22                24              37.9%              6.5%
 5              Germany     10               32                32              47.8%             15.6%
 6               Greece      5                8                18              30.7%             19.2%
 7                Italy      6               34                45              43.0%              7.5%
 8          Netherlands      2               12                36              25.0%              4.1%
 9               Poland      2               15                23              39.4%              5.2%
10             Portugal      6               22                42              34.3%              9.3%
11  Republic of Ireland      1                7                12              36.8%              5.2%
12               Russia      5                9                31              22.5%             12.5%
13                Spain     12               42                33              55.9%             16.0%
14               Sweden      5               17                19              47.2%             13.8%
15              Ukraine      2                7                26              21.2%              6.0%

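Note:

loc stands for location; the i in iloc stands for integer. The difference between the two:

loc works on labels in the index. iloc works on the positions in the index (so it only takes integers). In other words, loc selects by index labels and column names.

The table above has an index (the numbers in the left-hand column), so loc looks up rows and columns by those labels.

iloc selects by row and column position.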
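The difference only becomes visible when index labels and positions disagree, for example after sorting. A minimal sketch, rebuilding the first four rows of the sorted card table with their original index labels (the name top is illustrative):

```python
import pandas as pd

# First rows of the sorted output above, keeping their original index labels.
top = pd.DataFrame(
    {
        "Team": ["Greece", "Poland", "Republic of Ireland", "Italy"],
        "Yellow Cards": [9, 7, 6, 16],
    },
    index=[6, 9, 11, 7],
)

by_label = top.loc[7, "Team"]        # the row whose index label is 7
by_position = top.iloc[0, 0]         # first row, first column
label_slice = top.loc[6:11, "Team"]  # label slices include both endpoints
pos_slice = top.iloc[0:2, 0]         # position slices exclude the end

print(by_label, by_position)
print(label_slice.tolist(), pos_slice.tolist())
```

Note that a label slice with loc includes its end label, while a position slice with iloc follows the usual Python rule and stops before the end.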

To filter rows by the values in a column, you can use the isin() function:

# Find the shooting accuracy (Shooting Accuracy) of England, Italy and Russia
euro.loc[euro['Team'].isin(['England', 'Italy', 'Russia']), ['Team', 'Shooting Accuracy']]
Result:
       Team  Shooting Accuracy
 3  England              50.0%
 7    Italy              43.0%
12   Russia              22.5%
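The mask returned by isin() can also be inverted with ~ to select everything not in the list. A small sketch, using four rows copied from the sliced table earlier in this article (the names acc, wanted, selected and others are illustrative):

```python
import pandas as pd

# Team and Shooting Accuracy for four teams, copied from the table above.
acc = pd.DataFrame({
    "Team": ["England", "Italy", "Russia", "Spain"],
    "Shooting Accuracy": ["50.0%", "43.0%", "22.5%", "55.9%"],
})

wanted = ["England", "Italy", "Russia"]
mask = acc["Team"].isin(wanted)   # True where Team is one of the wanted names

selected = acc.loc[mask, ["Team", "Shooting Accuracy"]]
others = acc.loc[~mask, "Team"]   # ~ inverts the boolean mask

print(selected)
print(others.tolist())
```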

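Summary:

Slicing with loc / iloc, together with the sort_values, str.startswith and isin functions, makes it easy to select exactly the data you need and to work efficiently.

This article is based on the GitHub repository: https://github.com/guipsamora/pandas_exercises

Dataset: leave a comment if you need it.

A closing thought: build up deep reserves, then draw on them steadily.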
