
Euro 2012 Ranking Functions: Python Data Analysis, A First Look at the Data (Part 2)


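Goal of this article: learn how to filter and sort data with the pandas library, and apply it to everyday data processing to work more efficiently.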



The future is bright.



import pandas as pd
import numpy as np


euro = pd.read_csv(r'C:\Users\Administrator\Desktop\exercise_data\Euro2012_stats.csv')  # adjust the path to your own copy
euro
------------------------------ Result:
             Team  Goals  Shots on target  Shots off target  Shooting Accuracy  % Goals-to-shots  Total shots (inc. Blocked)  Hit Woodwork  Penalty goals  Penalties not scored  ...  Saves made  Saves-to-shots ratio  Fouls Won  Fouls Conceded  Offsides  Yellow Cards  Red Cards  Subs on  Subs off  Players Used
0         Croatia      4               13                12              51.9%             16.0%                          32             0              0                     0  ...          13                 81.3%         41              62         2             9          0        9         9            16
1  Czech Republic      4               13                18              41.9%             12.9%                          39             0              0                     0  ...           9                 60.1%         53              73         8             7          0       11        11            19
2         Denmark      4               10                10              50.0%             20.0%                          27             1              0                     0  ...          10                 66.7%         25              38         8             4          0        7         7            15
3         England      5               11                18              50.0%             17.2%                          40             0              0                     0  ...          22                 88.1%         43              45         6             5          0       11        11            16


euro['Goals']  # or: euro.Goals
------------------------------ Result:
0      4
1      4
2      4
3      5
4      3
5     10
6      5
7      6
8      2
9      2
10     6
11     1
12     5
13    12
14     5
15     2
Name: Goals, dtype: int64
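Both spellings return the same Series, but attribute access only works when the column name is a valid Python identifier. A column such as 'Yellow Cards' contains a space, so it can only be reached with brackets. A minimal sketch, assuming a small hypothetical excerpt built from values in the tables in this article:

```python
import pandas as pd

# Hypothetical three-row excerpt; values copied from the Euro 2012 table
df = pd.DataFrame({
    "Team": ["Croatia", "Germany", "Spain"],
    "Goals": [4, 10, 12],
    "Yellow Cards": [9, 4, 11],
})

# Attribute and bracket access return the same column here
same = df.Goals.equals(df["Goals"])

# 'Yellow Cards' contains a space, so "df.Yellow Cards" is a syntax error;
# bracket notation is required
yellow = df["Yellow Cards"].tolist()

print(same, yellow)  # True [9, 4, 11]
```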


euro['Team'].nunique()  # number of distinct teams; euro.shape[0] (the row count) also gives 16 here
------------------------------ Result:
16
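nunique() and shape[0] agree here only because each team occupies exactly one row. A small sketch with a hypothetical duplicated row shows how the two can diverge:

```python
import pandas as pd

# Hypothetical frame in which 'Spain' appears twice (e.g. an accidental duplicate row)
df = pd.DataFrame({"Team": ["Spain", "Italy", "Spain"], "Goals": [12, 6, 12]})

rows = df.shape[0]             # number of rows, duplicates included
teams = df["Team"].nunique()   # number of distinct team names

print(rows, teams)  # 3 2
```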


euro.shape[1]  # number of columns
------------------------------ Result:
35


ddyl = euro[['Team', 'Yellow Cards', 'Red Cards']]  # keep only the discipline columns
ddyl
------------------------------ Result:
                   Team  Yellow Cards  Red Cards
0               Croatia             9          0
1        Czech Republic             7          0
2               Denmark             4          0
3               England             5          0
4                France             6          0
5               Germany             4          0
6                Greece             9          1
7                 Italy            16          0
8           Netherlands             5          0
9                Poland             7          1
10             Portugal            12          0
11  Republic of Ireland             6          1
12               Russia             6          0
13                Spain            11          0
14               Sweden             7          0
15              Ukraine             5          0
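Passing a list of column names (double brackets) returns a DataFrame, while a single name returns a Series. A quick check on a hypothetical three-row excerpt:

```python
import pandas as pd

# Hypothetical excerpt; values copied from the ddyl output
df = pd.DataFrame({
    "Team": ["Croatia", "Greece", "Italy"],
    "Yellow Cards": [9, 9, 16],
    "Red Cards": [0, 1, 0],
})

one_col = df["Team"]                              # single name -> Series
sub = df[["Team", "Yellow Cards", "Red Cards"]]   # list of names -> DataFrame

print(type(one_col).__name__, type(sub).__name__, sub.shape)  # Series DataFrame (3, 3)
```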


ddyl.sort_values(['Red Cards', 'Yellow Cards'], ascending=False)  # most red cards first, ties broken by yellow cards
------------------------------ Result:
                   Team  Yellow Cards  Red Cards
6                Greece             9          1
9                Poland             7          1
11  Republic of Ireland             6          1
7                 Italy            16          0
10             Portugal            12          0
13                Spain            11          0
0               Croatia             9          0
1        Czech Republic             7          0
14               Sweden             7          0
4                France             6          0
12               Russia             6          0
3               England             5          0
8           Netherlands             5          0
15              Ukraine             5          0
2               Denmark             4          0
5               Germany             4          0
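The order of the list passed to sort_values sets the priority: rows are ordered by Red Cards first, and Yellow Cards only breaks ties. ascending also accepts a list with one flag per key. A sketch on five teams taken from the ddyl table:

```python
import pandas as pd

# Five teams copied from the ddyl output
ddyl = pd.DataFrame({
    "Team": ["Greece", "Poland", "Italy", "Spain", "Germany"],
    "Yellow Cards": [9, 7, 16, 11, 4],
    "Red Cards": [1, 1, 0, 0, 0],
})

# Both keys descending, as in the article
both_desc = ddyl.sort_values(["Red Cards", "Yellow Cards"], ascending=False)

# Red cards descending, but ties broken by the FEWEST yellow cards
mixed = ddyl.sort_values(["Red Cards", "Yellow Cards"], ascending=[False, True])

print(both_desc["Team"].tolist())  # ['Greece', 'Poland', 'Italy', 'Spain', 'Germany']
print(mixed["Team"].tolist())      # ['Poland', 'Greece', 'Germany', 'Spain', 'Italy']
```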


round(ddyl['Yellow Cards'].mean(), 0)  # average yellow cards per team, rounded to a whole number
------------------------------ Result:
7.0
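Worked out by hand: the 16 yellow-card counts in the ddyl table sum to 119, so the mean is 119 / 16 = 7.4375, which rounds to 7.0. round(x, 0) keeps a float, which is why the result prints as 7.0 rather than 7. A sketch reproducing the arithmetic:

```python
import pandas as pd

# Yellow Cards column copied from the ddyl output (16 teams, in index order)
yellow = pd.Series([9, 7, 4, 5, 6, 4, 9, 16, 5, 7, 12, 6, 6, 11, 7, 5])

total = int(yellow.sum())        # 119
mean = float(yellow.mean())      # 119 / 16 = 7.4375
rounded = round(mean, 0)         # 7.0 (still a float)
one_decimal = round(mean, 1)     # 7.4

print(total, mean, rounded, one_decimal)
```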


euro.Goals > 6  # element-wise comparison: a boolean Series saying whether each team scored more than 6 goals
------------------------------ Result:
0     False
1     False
2     False
3     False
4     False
5      True
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13     True
14    False
15    False
Name: Goals, dtype: bool
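Because True counts as 1 and False as 0, summing the mask counts the teams that meet the condition: 2 here (rows 5 and 13). A sketch using the Goals column copied from the euro['Goals'] output:

```python
import pandas as pd

# Goals column copied from the euro['Goals'] output
goals = pd.Series([4, 4, 4, 5, 3, 10, 5, 6, 2, 2, 6, 1, 5, 12, 5, 2], name="Goals")

mask = goals > 6                         # boolean Series with the same index as goals
n_teams = int(mask.sum())                # each True counts as 1
positions = mask[mask].index.tolist()    # index labels where the mask is True

print(n_teams, positions)  # 2 [5, 13]
```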

euro[euro.Goals > 6]  # keep only the rows where the mask is True (teams with more than 6 goals)
------------------------------ Result:
       Team  Goals  Shots on target  Shots off target  Shooting Accuracy  % Goals-to-shots  Total shots (inc. Blocked)  Hit Woodwork  Penalty goals  Penalties not scored  ...  Saves made  Saves-to-shots ratio  Fouls Won  Fouls Conceded  Offsides  Yellow Cards  Red Cards  Subs on  Subs off  Players Used
5   Germany     10               32                32              47.8%             15.6%                          80             2              1                     0  ...          10                 62.6%         63              49        12             4          0       15        15            17
13    Spain     12               42                33              55.9%             16.0%                         100             0              1                     0  ...          15                 93.8%        102              83        19            11          0       17        17            18
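Several conditions can be combined with & (and), | (or) and ~ (not). Each comparison needs its own parentheses, because & and | bind more tightly than > and ==. A sketch on a hypothetical excerpt of the Team, Goals and Red Cards columns:

```python
import pandas as pd

# Hypothetical excerpt; values copied from the tables in this article
sample = pd.DataFrame({
    "Team": ["Croatia", "Germany", "Greece", "Poland", "Spain"],
    "Goals": [4, 10, 5, 2, 12],
    "Red Cards": [0, 0, 1, 1, 0],
})

# More than 4 goals AND no red cards
clean_scorers = sample[(sample.Goals > 4) & (sample["Red Cards"] == 0)]

# At least one red card OR fewer than 3 goals
either = sample[(sample["Red Cards"] > 0) | (sample.Goals < 3)]

print(clean_scorers["Team"].tolist())  # ['Germany', 'Spain']
print(either["Team"].tolist())         # ['Greece', 'Poland']
```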


euro[euro['Team'].str.startswith('G')]  # teams whose name starts with 'G'
# or: euro[euro.Team.str.startswith('G')]
------------------------------ Result:
       Team  Goals  Shots on target  Shots off target  Shooting Accuracy  % Goals-to-shots  Total shots (inc. Blocked)  Hit Woodwork  Penalty goals  Penalties not scored  ...  Saves made  Saves-to-shots ratio  Fouls Won  Fouls Conceded  Offsides  Yellow Cards  Red Cards  Subs on  Subs off  Players Used
5   Germany     10               32                32              47.8%             15.6%                          80             2              1                     0  ...          10                 62.6%         63              49        12             4          0       15        15            17
6    Greece      5                8                18              30.7%             19.2%                          32             1              1                     1  ...          13                 65.1%         67              48        12             9          1       12        12            20

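Note: the startswith() function


Purpose: checks whether a string begins with a given character or substring.


Syntax: string.startswith(str[, beg[, end]]), which is equivalent to string[beg:end].startswith(str). beg and end are positional only; Python's str.startswith does not accept them as keyword arguments.


Parameters:


string: the string being checked


str: the prefix (a character or substring) to look for. A tuple of prefixes is also accepted; the result is True if any one of them matches.


beg: the position where checking starts (optional)


end: the position where checking stops (optional). If beg and end are given, only that range is checked; otherwise the whole string is checked.


Return value: True if the string starts with the prefix, otherwise False. An empty prefix returns True.

In pandas, euro['Team'].str.startswith('G') applies the same test to every element of the column and returns a boolean Series, which is then used as a row filter. The pandas version takes the prefix but not beg/end.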
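The parameters described above can be checked directly in plain Python; the last lines show the element-wise pandas form. A minimal sketch:

```python
import pandas as pd

name = "Czech Republic"

print(name.startswith("Cz"))                # True
print(name.startswith(("Ge", "Gr", "Cz")))  # True: a tuple matches if any prefix matches
print(name.startswith("Rep", 6))            # True: checking starts at index 6
print(name.startswith("Rep", 6, 8))         # False: only name[6:8] == 'Re' is checked
print(name.startswith(""))                  # True: empty prefix

# pandas applies the test element-wise and returns a boolean Series
teams = pd.Series(["Germany", "Greece", "Spain"])
print(teams.str.startswith("G").tolist())   # [True, True, False]
```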



euro.iloc[:, :6]  # first 6 columns
# to select every column except the last three instead: euro.iloc[:, :-3]
------------------------------ Result:
                   Team  Goals  Shots on target  Shots off target  Shooting Accuracy  % Goals-to-shots
0               Croatia      4               13                12              51.9%             16.0%
1        Czech Republic      4               13                18              41.9%             12.9%
2               Denmark      4               10                10              50.0%             20.0%
3               England      5               11                18              50.0%             17.2%
4                France      3               22                24              37.9%              6.5%
5               Germany     10               32                32              47.8%             15.6%
6                Greece      5                8                18              30.7%             19.2%
7                 Italy      6               34                45              43.0%              7.5%
8           Netherlands      2               12                36              25.0%              4.1%
9                Poland      2               15                23              39.4%              5.2%
10             Portugal      6               22                42              34.3%              9.3%
11  Republic of Ireland      1                7                12              36.8%              5.2%
12               Russia      5                9                31              22.5%             12.5%
13                Spain     12               42                33              55.9%             16.0%
14               Sweden      5               17                19              47.2%             13.8%
15              Ukraine      2                7                26              21.2%              6.0%
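The two slices in that code are different selections: :6 keeps the first six columns, while :-3 drops the last three, so on the 35-column euro table the second one returns 32 columns. A sketch on a hypothetical 8-column frame:

```python
import pandas as pd

# Hypothetical frame with 8 columns named c0..c7 and 2 rows
df = pd.DataFrame([list(range(8)), list(range(8, 16))],
                  columns=[f"c{i}" for i in range(8)])

first_six = df.iloc[:, :6]        # column positions 0-5
all_but_last3 = df.iloc[:, :-3]   # positions 0 up to (not including) len - 3

print(first_six.columns.tolist())      # ['c0', 'c1', 'c2', 'c3', 'c4', 'c5']
print(all_but_last3.columns.tolist())  # ['c0', 'c1', 'c2', 'c3', 'c4']
```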

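Note:


loc stands for location, and the i in iloc stands for integer. The difference between them:


loc works on labels in the index. iloc works on the positions in the index (so it only takes integers). In other words, loc selects by index labels and column names:


if the table defines an index, as the one above does, loc uses those index labels to find the corresponding rows/columns.


iloc selects by row/column position, counting from 0.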
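The difference is easiest to see with slicing, and once the row labels are no longer 0, 1, 2, and so on. A sketch on a hypothetical four-row excerpt, first with the default integer index and then with Team as the index:

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["Croatia", "Czech Republic", "Denmark", "England"],
    "Goals": [4, 4, 4, 5],
})

# Default integer index: loc slices by label and INCLUDES the end label,
# iloc slices by position and EXCLUDES the end position
print(len(df.loc[0:2]), len(df.iloc[0:2]))  # 3 2

# With Team as the index, loc takes team names while iloc still takes positions
by_team = df.set_index("Team")
print(by_team.loc["England", "Goals"])      # 5
print(by_team.iloc[3, 0])                   # 5 (row position 3, column position 0)
```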


To filter rows by whether a column's value appears in a given list, use the isin() function:


# Shooting Accuracy of England, Italy and Russia
euro.loc[euro['Team'].isin(['England', 'Italy', 'Russia']), ['Team', 'Shooting Accuracy']]
------------------------------ Result:
       Team  Shooting Accuracy
3   England              50.0%
7     Italy              43.0%
12   Russia              22.5%
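isin returns a boolean Series, so it can be negated with ~ to exclude the listed teams instead of keeping them. A sketch on a hypothetical four-row excerpt:

```python
import pandas as pd

# Hypothetical excerpt; values copied from the tables in this article
sample = pd.DataFrame({
    "Team": ["England", "Italy", "Russia", "Spain"],
    "Shooting Accuracy": ["50.0%", "43.0%", "22.5%", "55.9%"],
})

wanted = ["England", "Italy", "Russia"]
picked = sample.loc[sample["Team"].isin(wanted), ["Team", "Shooting Accuracy"]]
others = sample.loc[~sample["Team"].isin(wanted), "Team"]  # ~ inverts the mask

print(picked["Team"].tolist())  # ['England', 'Italy', 'Russia']
print(others.tolist())          # ['Spain']
```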


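Summary:


By slicing with the loc / iloc indexers, sorting with sort_values, and filtering with str.startswith and isin,


you can quickly select exactly the data you need and process it efficiently.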



Adapted from the GitHub repository https://github.com/guipsamora/pandas_exercises


Dataset: leave a comment if you need it.


Parting words: build a deep store of knowledge before putting it to use.

