Reading Notes 6: Basic pandas Usage - 白红宇

Published: 2019-06-20

I. Series. A Series is much like a NumPy array and can be initialized from a list, a tuple, a dict, or a NumPy array.

>>> from pandas import Series
>>> s = Series([0.1, 1.2, 2.3, 3.4, 4.5])
>>> s
0    0.1
1    1.2
2    2.3
3    3.4
4    4.5
dtype: float64

2. The index of a Series can also consist of labels; by default it is a run of integers.

>>> s = Series([0.1, 1.2, 2.3, 3.4, 4.5], index=['a','b','c','d','e'])
>>> s
a    0.1
b    1.2
c    2.3
d    3.4
e    4.5
dtype: float64

Elements can be selected by integer position, by label, by a boolean mask, or by a slice.

from pandas import Series
s = Series([0.1, 1.2, 2.3, 3.4, 4.5], index=['a','b','c','d','e'])
s.iloc[1]   # positional access; a bare s[1] on a labeled index is deprecated in recent pandas
Out[36]:
1.2

from pandas import Series
s = Series([0.1, 1.2, 2.3, 3.4, 4.5], index=['a','b','c','d','e'])
print(s.iloc[1], '\n')          # single position
print(s.iloc[1:4], '\n')        # positional slice
print(s[s > 3], '\n')           # boolean mask
print(s.iloc[[1, 2, 3]])        # list of positions

1.2 

b    1.2
c    2.3
d    3.4
dtype: float64 

d    3.4
e    4.5
dtype: float64 

b    1.2
c    2.3
d    3.4
dtype: float64

II. Common Series functions

1. head and tail show the first 5 or the last 5 rows; pass an argument to change the number of rows shown.

from pandas import Series
s = Series([0.1, 1.2, 2.3, 3.4, 4.5], index=['a','b','c','d','e'])
print(s.head(), '\n')   # first 5 rows (here, the whole Series)
print(s.head(2))        # first 2 rows

a    0.1
b    1.2
c    2.3
d    3.4
e    4.5
dtype: float64 

a    0.1
b    1.2
dtype: float64
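The notes only show head; tail works the same way from the other end. A minimal sketch reusing the same sample Series (the variable name last_two is just for illustration):

```python
from pandas import Series

s = Series([0.1, 1.2, 2.3, 3.4, 4.5], index=['a', 'b', 'c', 'd', 'e'])

# tail(n) returns the last n rows; with no argument it returns the last 5
last_two = s.tail(2)
print(last_two)
```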

2. isnull and notnull return a boolean Series of the same length, marking which values are (or are not) missing.
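No example is given for these two in the notes. A minimal sketch, assuming a small made-up Series with one missing value:

```python
import numpy as np
from pandas import Series

# Hypothetical sample data: the value at label 'b' is missing
s = Series([0.1, np.nan, 2.3], index=['a', 'b', 'c'])

missing = s.isnull()    # True where the value is NaN
present = s.notnull()   # True where the value is not NaN

print(missing)
print(s[present])       # the mask keeps only the non-missing entries
```

Because the result is a boolean Series aligned with the original, it can be used directly as a mask, as in the last line.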

3. describe returns summary statistics of the Series

from pandas import Series
import numpy as np
s = Series(np.arange(1.0, 10))
s.describe()
Out[43]:
count    9.000000
mean     5.000000
std      2.738613
min      1.000000
25%      3.000000
50%      5.000000
75%      7.000000
max      9.000000
dtype: float64

4. unique returns the distinct values of the Series; nunique returns the number of distinct values.
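A minimal sketch of the two calls, using made-up data with repeated values:

```python
from pandas import Series

# Hypothetical sample data with duplicates
s = Series([1, 2, 2, 3, 3, 3])

distinct = s.unique()   # NumPy array of the distinct values, in order of first appearance
count = s.nunique()     # how many distinct values there are

print(distinct)
print(count)
```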

5. drop(labels) removes the entries with the specified labels; dropna() removes the NaN entries.
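A minimal sketch of both methods, again on a small made-up Series. Both return a new Series and leave the original untouched:

```python
import numpy as np
from pandas import Series

# Hypothetical sample data: the value at label 'b' is missing
s = Series([0.1, np.nan, 2.3, 3.4], index=['a', 'b', 'c', 'd'])

without_a = s.drop('a')            # drop a single label
without_ad = s.drop(['a', 'd'])    # drop a list of labels
no_nan = s.dropna()                # drop every NaN entry

print(without_ad)
print(no_nan)
```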

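6. append(series) appends another Series. Note that Series.append was removed in pandas 2.0; pandas.concat does the same job there.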

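from pandas import Series, concat
import numpy as np
s = Series(np.arange(1.0, 10))
s2 = Series([22, 33, 44, 55])
# s.append(s2) in older pandas; concat is the replacement since pandas 2.0
print(concat([s, s2]))

0     1.0
1     2.0
2     3.0
3     4.0
4     5.0
5     6.0
6     7.0
7     8.0
8     9.0
0    22.0
1    33.0
2    44.0
3    55.0
dtype: float64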

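7. replace(to_replace, value) replaces each value listed in to_replace with the corresponding entry of value.

Note: replace returns a new Series with the replacements applied; the original Series is not modified.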

from pandas import Series, concat
import numpy as np
s = Series(np.arange(1.0, 10))
s2 = Series([22, 33, 44, 55])
s3 = concat([s, s2])   # s.append(s2) in older pandas
print(s3.replace([2, 5, 8], [22, 55, 99]))
s3                     # unchanged: replace returned a new Series

0     1.0
1    22.0
2     3.0
3     4.0
4    55.0
5     6.0
6     7.0
7    99.0
8     9.0
0    22.0
1    33.0
2    44.0
3    55.0
dtype: float64
Out[51]:
0     1.0
1     2.0
2     3.0
3     4.0
4     5.0
5     6.0
6     7.0
7     8.0
8     9.0
0    22.0
1    33.0
2    44.0
3    55.0
dtype: float64

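8. update(series) updates the values from another Series; only entries whose labels match are updated.

Note: the update is applied in place, on the original Series.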

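>>> from numpy import arange
>>> from pandas import Series
>>> s1 = Series(arange(1.0,4.0), index=['a','b','c'])
>>> s1
a    1.0
b    2.0
c    3.0
dtype: float64
>>> s2 = Series(-1.0 * arange(1.0,4.0), index=['c','d','e'])
>>> s1.update(s2)   # only 'c' exists in both, so only 'c' changes
>>> s1
a    1.0
b    2.0
c   -1.0
dtype: float64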

9. A DataFrame is comparable to a two-dimensional NumPy array. Unlike an array, it can combine columns of different data types.

import numpy as np
from pandas import DataFrame
a = np.array([[1, 2], [3, 4]])
df = DataFrame(a)
df
Out[52]:
   0  1
0  1  2
1  3  4

>>> from numpy import array
>>> from pandas import DataFrame
>>> df = DataFrame(array([[1,2],[3,4]]), columns=['a','b'])
>>> df
   a  b
0  1  2
1  3  4

Row labels and column labels can both be specified.

>>> df = DataFrame(array([[1,2],[3,4]]), columns=['dogs','cats'], index=['Alice','Bob'])
>>> df
       dogs  cats
Alice     1     2
Bob       3     4

10. A DataFrame can also be initialized from a dict.
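The notes give no example of dict initialization. A minimal sketch that rebuilds the dogs/cats table from a dict, where each key becomes a column name and each value holds that column's data:

```python
from pandas import DataFrame

# Keys become column labels; the lists are the column values
data = {'dogs': [1, 3], 'cats': [2, 4]}

# index is optional; without it the rows are labeled 0, 1, ...
df = DataFrame(data, index=['Alice', 'Bob'])
print(df)
```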

11. Column labels can also be specified.

>>> df = DataFrame(array([[1,2],[3,4]]), columns=['dogs','cats'], index=['Alice','Bob'])
>>> df
       dogs  cats
Alice     1     2
Bob       3     4

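III. Working with DataFrames. The examples below need an Excel file in the working directory; mine is score.xlsx.

1. Reading data

2. Columns are selected with a column name or a list of column names.

3. Rows are selected with a row label or a list of row labels, or with an integer slice or a boolean mask.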

from pandas import read_excel
score = read_excel('score.xlsx', sheet_name='Sheet1')
score[:1]   # integer slice: the first row

Out[20]:

	序号	english	math	chinese	physics	chemistry	biology
0	1501	56	65	89	45	87	98

from pandas import read_excel
score = read_excel('score.xlsx', sheet_name='Sheet1')
# boolean mask: rows with an english score strictly between 60 and 70
t = score[(score.english > 60) & (score.english < 70)]
t

Out[22]:

	序号	english	math	chinese	physics	chemistry	biology
2	1503	65	78	68	86	78	87
5	1506	64	67	82	76	78	73
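Column selection (item 2 above) has no example in these notes. A minimal sketch, using a small in-memory DataFrame as a stand-in for score.xlsx; the three rows copy values printed elsewhere in this post, and only three of the columns are included:

```python
from pandas import DataFrame

# Stand-in for the contents of score.xlsx (an assumption: the real file is not available here)
score = DataFrame({
    'english': [56, 45, 65],
    'math': [65, 65, 78],
    'chinese': [89, 89, 68],
})

english = score['english']             # a single column name gives a Series
subset = score[['english', 'math']]    # a list of names gives a DataFrame

print(english)
print(subset)
```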

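4. To select rows and columns together, use loc[rowselector, colselector] for labels or iloc[rowselector, colselector] for integer positions. The ix[rowselector, colselector] indexer used in older pandas was removed in pandas 1.0.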
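Since ix is no longer available in current pandas, here is a minimal sketch of the same row-and-column selection with loc (by label) and iloc (by position), on the small dogs/cats table from earlier:

```python
from pandas import DataFrame

df = DataFrame([[1, 2], [3, 4]], columns=['dogs', 'cats'], index=['Alice', 'Bob'])

by_label = df.loc['Bob', 'cats']               # row label, column label
by_position = df.iloc[1, 1]                    # row position, column position
block = df.loc[['Alice', 'Bob'], ['cats']]     # lists of labels give a DataFrame

print(by_label, by_position)
print(block)
```

Both selectors accept single values, lists, slices, and boolean masks, so most old ix calls translate to one of the two directly.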

5. Adding a column works much like adding a key to a dict.

>>> # state_gdp is a DataFrame that is not defined in these notes
>>> state_gdp_2012 = state_gdp[['state','gdp_2012']].copy()   # copy, so the new column is not written to a view
>>> state_gdp_2012.head()
        state  gdp_2012
0     Alabama    157272
1      Alaska     44732
2     Arizona    230641
3    Arkansas     93892
4  California   1751002
>>> state_gdp_2012['gdp_growth_2012'] = state_gdp['gdp_growth_2012']
>>> state_gdp_2012.head()
      state  gdp_2012  gdp_growth_2012
0   Alabama    157272              1.2
1    Alaska     44732              1.1
2   Arizona    230641              2.6
3  Arkansas     93892              1.3

Alternatively, use insert(location, column_name, series), which places the new column at a given position.

>>> state_gdp_2012 = state_gdp[['state','gdp_2012']].copy()
>>> state_gdp_2012.insert(1, 'gdp_growth_2012', state_gdp['gdp_growth_2012'])
>>> state_gdp_2012.head()
        state  gdp_growth_2012  gdp_2012
0     Alabama              1.2    157272
1      Alaska              1.1     44732
2     Arizona              2.6    230641
3    Arkansas              1.3     93892
4  California              3.5   1751002

6. Modifying data

from pandas import read_excel
score = read_excel('score.xlsx', sheet_name='Sheet1')
print(score[:3])
score.loc[0, 'english'] = 90   # score.ix[0,'english'] in older pandas; ix was removed in pandas 1.0
print(score[:3])

     序号  english  math  chinese  physics  chemistry  biology
0  1501       56    65       89       45         87       98
1  1502       45    65       89       78         98       89
2  1503       65    78       68       86         78       87
     序号  english  math  chinese  physics  chemistry  biology
0  1501       90    65       89       45         87       98
1  1502       45    65       89       78         98       89
2  1503       65    78       68       86         78       87

7. To delete columns, use the del keyword, the pop(column) method, or drop(list of columns, axis=1).

from pandas import read_excel
score = read_excel('score.xlsx', sheet_name='Sheet1')
scorecopy = score.copy()   # keep an untouched copy, since pop modifies score in place
print(score[:2])
score.pop('biology')       # removes the column and returns it as a Series
print(score[:2])

     序号  english  math  chinese  physics  chemistry  biology
0  1501       56    65       89       45         87       98
1  1502       45    65       89       78         98       89
     序号  english  math  chinese  physics  chemistry
0  1501       56    65       89       45         87
1  1502       45    65       89       78         98
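The example above covers pop only. A minimal sketch of the other two options, del and drop, on a small in-memory stand-in for score.xlsx (an assumption: the values copy three rows printed in this post, with only three columns kept):

```python
from pandas import DataFrame

# Stand-in for the contents of score.xlsx
score = DataFrame({
    'english': [56, 45, 65],
    'math': [65, 65, 78],
    'biology': [98, 89, 87],
})

# drop returns a new DataFrame and leaves score unchanged
dropped = score.drop(['math', 'biology'], axis=1)

# del removes the column from score itself
del score['biology']

print(dropped.columns.tolist())
print(score.columns.tolist())
```

The practical difference is that del and pop change the DataFrame in place, while drop hands back a modified copy.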

8. dropna removes the rows or columns that contain NaN; drop_duplicates removes duplicate rows.
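A minimal sketch of both methods on a small made-up DataFrame, where row 1 has a NaN and row 2 repeats row 0:

```python
import numpy as np
from pandas import DataFrame

# Hypothetical sample data
df = DataFrame({'one': [1.0, np.nan, 1.0], 'two': [2.0, 5.0, 2.0]})

rows_kept = df.dropna()           # drop rows containing NaN (the default, axis=0)
cols_kept = df.dropna(axis=1)     # drop columns containing NaN
deduped = df.drop_duplicates()    # keep the first of each set of identical rows

print(rows_kept)
print(cols_kept)
print(deduped)
```

All three calls return new DataFrames; df itself is left as it was.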

9. fillna(value=value) replaces every NaN with the given value; passing a dict sets a separate fill value per column.

>>> from numpy import array, nan
>>> from pandas import DataFrame
>>> df = DataFrame(array([[1, nan],[nan, 2]]))
>>> df.columns = ['one','two']
>>> replacements = {'one':-1, 'two':-2}
>>> df.fillna(value=replacements)
   one  two
0  1.0 -2.0
1 -1.0  2.0

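10. sort. In current pandas the method is sort_values(by=...); the older sort(columns=...) has been removed.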

>>> df = DataFrame(array([[1, 3],[1, 2],[3, 2],[2,1]]), columns=['one','two'])
>>> df.sort_values(by='one')   # df.sort(columns='one') in older pandas
   one  two
0    1    3
1    1    2
3    2    1
2    3    2
>>> # 'one' descending, then 'two' ascending
>>> df.sort_values(by=['one','two'], ascending=[False, True])
   one  two
2    3    2
3    2    1
1    1    2
0    1    3

Reposted from: http://qbnlx.baihongyu.com/
