Jaewon's Blog

pandas basic syntax and functions

KimJ.W 2023. 1. 21. 19:10

First written: 2021-11-06
categories: Pandas


-Importing pandas

import pandas as pd
print(pd.__version__)
1.1.5

-Test

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
print(type(df))
<class 'pandas.core.frame.DataFrame'>

-Exploring the data
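
The post does not show how `lemonade` is loaded. As a minimal stand-in for trying the calls in this section, the sketch below rebuilds only the first five rows printed by `lemonade.head(5)`. This is an assumption for illustration, not the original loading step, so counts and statistics will differ from the 32-row outputs shown in the post.

```python
import pandas as pd

# Stand-in data (assumption): only the five rows printed by lemonade.head(5).
# The real dataset in the post has 32 rows.
lemonade = pd.DataFrame({
    'Date': ['7/1/2016', '7/2/2016', '7/3/2016', '7/4/2016', '7/5/2016'],
    'Location': ['Park', 'Park', 'Park', 'Beach', 'Beach'],
    'Lemon': [97, 98, 110, 134, 159],
    'Orange': [67, 67, 77, 99, 118],
    'Temperature': [70, 72, 71, 76, 78],
    'Leaflets': [90.0, 90.0, 104.0, 98.0, 135.0],
    'Price': [0.25, 0.25, 0.25, 0.25, 0.25],
})

print(lemonade.shape)  # (5, 7)
```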

lemonade.head(5)

       Date Location  Lemon  Orange  Temperature  Leaflets  Price
0  7/1/2016     Park     97      67           70      90.0   0.25
1  7/2/2016     Park     98      67           72      90.0   0.25
2  7/3/2016     Park    110      77           71     104.0   0.25
3  7/4/2016    Beach    134      99           76      98.0   0.25
4  7/5/2016    Beach    159     118           78     135.0   0.25
lemonade.tail(3)

         Date Location  Lemon  Orange  Temperature  Leaflets  Price
29  7/29/2016     Park    100      66           81      95.0   0.35
30  7/30/2016    Beach     88      57           82      81.0   0.35
31  7/31/2016    Beach     76      47           82      68.0   0.35
print(lemonade.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Date         31 non-null     object 
 1   Location     32 non-null     object 
 2   Lemon        32 non-null     int64  
 3   Orange       32 non-null     int64  
 4   Temperature  32 non-null     int64  
 5   Leaflets     31 non-null     float64
 6   Price        32 non-null     float64
dtypes: float64(2), int64(3), object(2)
memory usage: 1.9+ KB
None
lemonade.describe()

            Lemon      Orange  Temperature    Leaflets      Price
count   32.000000   32.000000    32.000000   31.000000  32.000000
mean   116.156250   80.000000    78.968750  108.548387   0.354687
std     25.823357   21.863211     4.067847   20.117718   0.113137
min     71.000000   42.000000    70.000000   68.000000   0.250000
25%     98.000000   66.750000    77.000000   90.000000   0.250000
50%    113.500000   76.500000    80.500000  108.000000   0.350000
75%    131.750000   95.000000    82.000000  124.000000   0.500000
max    176.000000  129.000000    84.000000  158.000000   0.500000
lemonade['Location'].value_counts()
Beach    17
Park     15
Name: Location, dtype: int64
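
The `info()` output above reports 31 non-null values for `Date` and `Leaflets` out of 32 rows, so each of those columns has one missing value. One way to count and locate missing values directly is sketched below. The `sample` frame and its values are made up for illustration and are not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with one missing Date and one missing Leaflets.
sample = pd.DataFrame({
    'Date': ['7/1/2016', None, '7/3/2016'],
    'Leaflets': [90.0, 90.0, np.nan],
    'Price': [0.25, 0.25, 0.25],
})

# Number of missing values in each column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Rows that contain at least one missing value.
rows_with_missing = sample[sample.isna().any(axis=1)]
print(rows_with_missing)
```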

-Working with the data

lemonade['Sold'] = 0
print(lemonade.head(3))
       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Sold
0  7/1/2016     Park     97      67           70      90.0   0.25     0
1  7/2/2016     Park     98      67           72      90.0   0.25     0
2  7/3/2016     Park    110      77           71     104.0   0.25     0
lemonade['Sold'] = lemonade['Lemon'] + lemonade['Orange']
print(lemonade.head(3))
       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Sold
0  7/1/2016     Park     97      67           70      90.0   0.25   164
1  7/2/2016     Park     98      67           72      90.0   0.25   165
2  7/3/2016     Park    110      77           71     104.0   0.25   187

-Data indexing

print(lemonade[0:5])
       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Sold
0  7/1/2016     Park     97      67           70      90.0   0.25   164
1  7/2/2016     Park     98      67           72      90.0   0.25   165
2  7/3/2016     Park    110      77           71     104.0   0.25   187
3  7/4/2016    Beach    134      99           76      98.0   0.25   233
4  7/5/2016    Beach    159     118           78     135.0   0.25   277
lemonade['Location'] == 'Beach'
0     False
1     False
2     False
3      True
4      True
5      True
6      True
7      True
8      True
9      True
10     True
11     True
12     True
13     True
14     True
15     True
16     True
17     True
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28    False
29    False
30     True
31     True
Name: Location, dtype: bool
print(lemonade[lemonade['Location'] == 'Beach'].head(3))
       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Sold
3  7/4/2016    Beach    134      99           76      98.0   0.25   233
4  7/5/2016    Beach    159     118           78     135.0   0.25   277
5  7/6/2016    Beach    103      69           82      90.0   0.25   172
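
The boolean mask used above can be combined with other conditions, or passed to `.loc` together with a list of columns. The sketch below assumes a small hypothetical `sample` frame built from values in the first five rows shown earlier. Each condition needs its own parentheses because `&` binds more tightly than `==` and `>`.

```python
import pandas as pd

# Hypothetical subset of columns, using the first five rows from the post.
sample = pd.DataFrame({
    'Location': ['Park', 'Park', 'Park', 'Beach', 'Beach'],
    'Lemon': [97, 98, 110, 134, 159],
    'Temperature': [70, 72, 71, 76, 78],
})

# Combine two conditions with & (element-wise AND).
beach_hot = sample[(sample['Location'] == 'Beach') & (sample['Temperature'] > 76)]
print(beach_hot)

# Select rows by mask and columns by name in one step with .loc.
beach_lemon = sample.loc[sample['Location'] == 'Beach', ['Location', 'Lemon']]
print(beach_lemon)
```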

-Basic data preprocessing

print(lemonade.sort_values(by=['Temperature']).head(5))
         Date Location  Lemon  Orange  Temperature  Leaflets  Price  Sold
0    7/1/2016     Park     97      67           70      90.0   0.25   164
20  7/20/2016     Park     71      42           70       NaN   0.50   113
2    7/3/2016     Park    110      77           71     104.0   0.25   187
1    7/2/2016     Park     98      67           72      90.0   0.25   165
16  7/16/2016    Beach     81      50           74      90.0   0.50   131
lemonade.sort_values(by=['Temperature', 'Orange'], ascending=False, inplace=True)
print(lemonade.loc[:,['Date','Temperature', 'Orange']].head(5))
         Date  Temperature  Orange
25  7/25/2016           84     113
12  7/12/2016           84      95
26  7/26/2016           83     129
11  7/11/2016           83     120
10  7/10/2016           82      98
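
The call above sorts both columns in descending order and modifies `lemonade` in place. As a variation, `ascending` also accepts one flag per column, and leaving out `inplace=True` returns a new frame while the original stays untouched. The sketch uses a hypothetical `sample` frame built from the five rows printed above.

```python
import pandas as pd

# Hypothetical frame holding the five rows printed above (original index kept).
sample = pd.DataFrame(
    {'Temperature': [84, 84, 83, 83, 82], 'Orange': [113, 95, 129, 120, 98]},
    index=[25, 12, 26, 11, 10],
)

# Temperature descending, then Orange ascending within equal temperatures.
sorted_sample = sample.sort_values(by=['Temperature', 'Orange'],
                                   ascending=[False, True])
print(sorted_sample)

# Without inplace=True, the original frame keeps its order.
print(sample.index.tolist())  # [25, 12, 26, 11, 10]
```
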
print(lemonade.groupby(by='Location').count())
          Date  Lemon  Orange  Temperature  Leaflets  Price  Sold
Location                                                         
Beach       16     17      17           17        17     17    17
Park        15     15      15           15        14     15    15
print(lemonade.groupby('Location')['Price'].agg(['max', 'min']))
          max   min
Location           
Beach     0.5  0.25
Park      0.5  0.25
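
In the `count()` output above, Beach shows 16 for `Date` and Park shows 14 for `Leaflets`, although `value_counts()` reported 17 Beach rows and 15 Park rows. The reason is that `count()` skips missing values, while `size()` counts every row in each group. The sketch below shows the difference on a hypothetical `sample` frame with made-up values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: Park has one missing Leaflets value.
sample = pd.DataFrame({
    'Location': ['Park', 'Park', 'Beach', 'Beach', 'Beach'],
    'Leaflets': [90.0, np.nan, 98.0, 135.0, 90.0],
})

# count() counts only non-missing values per group.
non_null = sample.groupby('Location')['Leaflets'].count()
print(non_null)

# size() counts all rows per group, including rows with NaN.
rows = sample.groupby('Location').size()
print(rows)
```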
