재원's Blog
pandas basic syntax and functions
First written: 2021-11-06
categories: Pandas
-Importing pandas
import pandas as pd
print(pd.__version__)
1.1.5
-Test
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
print(type(df))
<class 'pandas.core.frame.DataFrame'>
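Beyond the type check, a few attributes confirm that the frame was built as intended. A small sketch on the same `df`, using only the standard DataFrame attributes `shape`, `columns`, and `dtypes`:

```python
import pandas as pd

# Same two-column frame as in the test above.
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

print(df.shape)          # (rows, columns)
print(list(df.columns))  # column labels in insertion order
print(df.dtypes)         # one dtype per column
```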
-Exploring the data
lemonade.head(5)
 | Date | Location | Lemon | Orange | Temperature | Leaflets | Price |
---|---|---|---|---|---|---|---|
0 | 7/1/2016 | Park | 97 | 67 | 70 | 90.0 | 0.25 |
1 | 7/2/2016 | Park | 98 | 67 | 72 | 90.0 | 0.25 |
2 | 7/3/2016 | Park | 110 | 77 | 71 | 104.0 | 0.25 |
3 | 7/4/2016 | Beach | 134 | 99 | 76 | 98.0 | 0.25 |
4 | 7/5/2016 | Beach | 159 | 118 | 78 | 135.0 | 0.25 |
lemonade.tail(3)
 | Date | Location | Lemon | Orange | Temperature | Leaflets | Price |
---|---|---|---|---|---|---|---|
29 | 7/29/2016 | Park | 100 | 66 | 81 | 95.0 | 0.35 |
30 | 7/30/2016 | Beach | 88 | 57 | 82 | 81.0 | 0.35 |
31 | 7/31/2016 | Beach | 76 | 47 | 82 | 68.0 | 0.35 |
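The post does not show how `lemonade` was loaded. A minimal sketch, assuming the data comes from a CSV file: the inline text below stands in for that file and holds only the five rows printed by `head(5)` above, so the real 32-row frame will print more. `csv_text` is a name made up for this example.

```python
import io
import pandas as pd

# Stand-in for the real file: the five rows shown by head(5) above.
csv_text = """Date,Location,Lemon,Orange,Temperature,Leaflets,Price
7/1/2016,Park,97,67,70,90.0,0.25
7/2/2016,Park,98,67,72,90.0,0.25
7/3/2016,Park,110,77,71,104.0,0.25
7/4/2016,Beach,134,99,76,98.0,0.25
7/5/2016,Beach,159,118,78,135.0,0.25
"""

# read_csv accepts a path or any file-like object.
lemonade = pd.read_csv(io.StringIO(csv_text))

print(lemonade.shape)    # (number of rows, number of columns)
print(lemonade.head(2))  # first two rows
print(lemonade.tail(2))  # last two rows
```

With a real file, the `io.StringIO(...)` argument would be replaced by the file path.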
lemonade.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 31 non-null object
1 Location 32 non-null object
2 Lemon 32 non-null int64
3 Orange 32 non-null int64
4 Temperature 32 non-null int64
5 Leaflets 31 non-null float64
6 Price 32 non-null float64
dtypes: float64(2), int64(3), object(2)
memory usage: 1.9+ KB
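`info()` reports 31 non-null values for `Date` and for `Leaflets` out of 32 rows, so each of those columns has one missing value. A sketch of counting the gaps directly with `isna()`, using three rows from the post (row 20 is the one whose `Leaflets` prints as NaN in the sort output further down). `sample` is a name invented for the example.

```python
import pandas as pd

sample = pd.DataFrame(
    {
        'Date': ['7/1/2016', '7/2/2016', '7/20/2016'],
        'Location': ['Park', 'Park', 'Park'],
        'Leaflets': [90.0, 90.0, float('nan')],
    },
    index=[0, 1, 20],
)

# Missing values per column.
missing = sample.isna().sum()
print(missing)

# Rows that contain at least one missing value.
rows_with_gaps = sample[sample.isna().any(axis=1)]
print(rows_with_gaps)
```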
lemonade.describe()
 | Lemon | Orange | Temperature | Leaflets | Price |
---|---|---|---|---|---|
count | 32.000000 | 32.000000 | 32.000000 | 31.000000 | 32.000000 |
mean | 116.156250 | 80.000000 | 78.968750 | 108.548387 | 0.354687 |
std | 25.823357 | 21.863211 | 4.067847 | 20.117718 | 0.113137 |
min | 71.000000 | 42.000000 | 70.000000 | 68.000000 | 0.250000 |
25% | 98.000000 | 66.750000 | 77.000000 | 90.000000 | 0.250000 |
50% | 113.500000 | 76.500000 | 80.500000 | 108.000000 | 0.350000 |
75% | 131.750000 | 95.000000 | 82.000000 | 124.000000 | 0.500000 |
max | 176.000000 | 129.000000 | 84.000000 | 158.000000 | 0.500000 |
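By default `describe()` summarizes only the numeric columns, which is why `Date` and `Location` are absent from the table above. A sketch on the first five rows of the post, showing single statistics and the summary of a text column (`sample` is an illustrative name):

```python
import pandas as pd

sample = pd.DataFrame({
    'Location': ['Park', 'Park', 'Park', 'Beach', 'Beach'],
    'Lemon': [97, 98, 110, 134, 159],
})

# Individual statistics are also available as methods.
lemon_mean = sample['Lemon'].mean()
lemon_median = sample['Lemon'].median()
print(lemon_mean, lemon_median)

# For a text column describe() returns count, unique, top, and freq.
location_summary = sample['Location'].describe()
print(location_summary)
```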
lemonade['Location'].value_counts()
Beach 17
Park 15
Name: Location, dtype: int64
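`value_counts()` can also return shares instead of raw counts. A sketch that rebuilds a column with the counts reported above (17 Beach, 15 Park) and passes `normalize=True`; the variable names are illustrative.

```python
import pandas as pd

# Rebuild a column with the counts reported above: 17 Beach, 15 Park.
location = pd.Series(['Beach'] * 17 + ['Park'] * 15, name='Location')

counts = location.value_counts()                # raw counts, largest first
shares = location.value_counts(normalize=True)  # fractions that sum to 1
print(shares)
```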
-Working with the data
lemonade['Sold'] = 0
print(lemonade.head(3))
Date Location Lemon Orange Temperature Leaflets Price Sold
0 7/1/2016 Park 97 67 70 90.0 0.25 0
1 7/2/2016 Park 98 67 72 90.0 0.25 0
2 7/3/2016 Park 110 77 71 104.0 0.25 0
lemonade['Sold'] = lemonade['Lemon'] + lemonade['Orange']
print(lemonade.head(3))
Date Location Lemon Orange Temperature Leaflets Price Sold
0 7/1/2016 Park 97 67 70 90.0 0.25 164
1 7/2/2016 Park 98 67 72 90.0 0.25 165
2 7/3/2016 Park 110 77 71 104.0 0.25 187
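Assigning to `lemonade['Sold']` changes the frame in place. If the original should stay untouched, `assign()` returns a new frame with the extra column. A sketch on the first three rows, as an alternative to the approach in the post (`sample` and `with_sold` are illustrative names):

```python
import pandas as pd

sample = pd.DataFrame({
    'Lemon': [97, 98, 110],
    'Orange': [67, 67, 77],
})

# assign() leaves `sample` unchanged and returns a copy with the new column.
with_sold = sample.assign(Sold=lambda d: d['Lemon'] + d['Orange'])
print(with_sold)
```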
-Indexing the data
print(lemonade[0:5])
Date Location Lemon Orange Temperature Leaflets Price Sold
0 7/1/2016 Park 97 67 70 90.0 0.25 164
1 7/2/2016 Park 98 67 72 90.0 0.25 165
2 7/3/2016 Park 110 77 71 104.0 0.25 187
3 7/4/2016 Beach 134 99 76 98.0 0.25 233
4 7/5/2016 Beach 159 118 78 135.0 0.25 277
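`lemonade[0:5]` slices rows by position, not by index label. The difference only shows once the rows have been reordered, as happens after the in-place sort later in the post. A sketch using the five temperatures above; `sample` and `shuffled` are illustrative names.

```python
import pandas as pd

sample = pd.DataFrame(
    {'Temperature': [70, 72, 71, 76, 78]},
    index=[0, 1, 2, 3, 4],
)

# Reorder the rows so that labels and positions no longer match.
shuffled = sample.sort_values(by='Temperature', ascending=False)
print(shuffled.index.tolist())  # labels now run 4, 3, 1, 2, 0

first_two_plain = shuffled[0:2]       # positional, same rows as iloc
first_two_iloc = shuffled.iloc[0:2]   # positional, end excluded
by_label = shuffled.loc[4:1]          # label-based, end label included
print(by_label)
```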
lemonade['Location'] == 'Beach'
0 False
1 False
2 False
3 True
4 True
5 True
6 True
7 True
8 True
9 True
10 True
11 True
12 True
13 True
14 True
15 True
16 True
17 True
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
30 True
31 True
Name: Location, dtype: bool
print(lemonade[lemonade['Location'] == 'Beach'].head(3))
Date Location Lemon Orange Temperature Leaflets Price Sold
3 7/4/2016 Beach 134 99 76 98.0 0.25 233
4 7/5/2016 Beach 159 118 78 135.0 0.25 277
5 7/6/2016 Beach 103 69 82 90.0 0.25 172
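Boolean masks can be combined. A sketch on the first six rows of the post that keeps only Beach days at 78 degrees or more; the threshold of 78 is chosen just for illustration.

```python
import pandas as pd

sample = pd.DataFrame({
    'Date': ['7/1/2016', '7/2/2016', '7/3/2016',
             '7/4/2016', '7/5/2016', '7/6/2016'],
    'Location': ['Park', 'Park', 'Park', 'Beach', 'Beach', 'Beach'],
    'Temperature': [70, 72, 71, 76, 78, 82],
})

# Each condition needs its own parentheses; use & for AND, | for OR.
hot_beach = sample[(sample['Location'] == 'Beach') & (sample['Temperature'] >= 78)]
print(hot_beach)

# loc takes the same kind of mask and can pick columns at the same time.
hot_dates = sample.loc[sample['Temperature'] >= 78, ['Date', 'Temperature']]
print(hot_dates)
```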
-Basic data preprocessing
print(lemonade.sort_values(by=['Temperature']).head(5))
Date Location Lemon Orange Temperature Leaflets Price Sold
0 7/1/2016 Park 97 67 70 90.0 0.25 164
20 7/20/2016 Park 71 42 70 NaN 0.50 113
2 7/3/2016 Park 110 77 71 104.0 0.25 187
1 7/2/2016 Park 98 67 72 90.0 0.25 165
16 7/16/2016 Beach 81 50 74 90.0 0.50 131
lemonade.sort_values(by=['Temperature', 'Orange'], ascending=False, inplace=True)
print(lemonade.loc[:,['Date','Temperature', 'Orange']].head(5))
Date Temperature Orange
25 7/25/2016 84 113
12 7/12/2016 84 95
26 7/26/2016 83 129
11 7/11/2016 83 120
10 7/10/2016 82 98
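`ascending` also accepts one flag per sort key, and leaving out `inplace=True` returns a sorted copy instead of reordering the original. A sketch on the five rows printed above (`sample`, `both_desc`, and `mixed` are illustrative names):

```python
import pandas as pd

sample = pd.DataFrame(
    {
        'Temperature': [82, 83, 84, 84, 83],
        'Orange': [98, 120, 95, 113, 129],
    },
    index=[10, 11, 12, 25, 26],
)

# Both keys descending, as in the post; returns a new frame.
both_desc = sample.sort_values(by=['Temperature', 'Orange'], ascending=False)

# Temperature descending, Orange ascending within equal temperatures.
mixed = sample.sort_values(by=['Temperature', 'Orange'], ascending=[False, True])

print(both_desc)
print(mixed)
```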
print(lemonade.groupby(by='Location').count())
Date Lemon Orange Temperature Leaflets Price Sold
Location
Beach 16 17 17 17 17 17 17
Park 15 15 15 15 14 15 15
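`count()` counts non-null values per column, which is why `Date` shows 16 for Beach and `Leaflets` shows 14 for Park above. `size()` counts rows regardless of missing values. A sketch on four rows from the post, one of which has no `Leaflets` value:

```python
import pandas as pd

sample = pd.DataFrame(
    {
        'Location': ['Park', 'Park', 'Beach', 'Beach'],
        'Leaflets': [90.0, float('nan'), 98.0, 135.0],
    },
    index=[0, 20, 3, 4],
)

grouped = sample.groupby('Location')

non_null = grouped['Leaflets'].count()  # skips the NaN in the Park group
rows = grouped.size()                   # counts every row in each group
print(non_null)
print(rows)
```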
print(lemonade.groupby('Location')['Price'].agg(['max', 'min']))
max min
Location
Beach 0.5 0.25
Park 0.5 0.25
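Named aggregation gives the result columns explicit names in the same call. A sketch on four rows from the post; the column names `max_price`, `min_price`, and `mean_price` are made up for the example.

```python
import pandas as pd

sample = pd.DataFrame({
    'Location': ['Park', 'Park', 'Beach', 'Beach'],
    'Price': [0.25, 0.50, 0.25, 0.50],
})

# keyword = output column name, value = aggregation to apply
summary = sample.groupby('Location')['Price'].agg(
    max_price='max',
    min_price='min',
    mean_price='mean',
)
print(summary)
```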