[Python pandas] 데이터프레임(dataframe)에서 그룹 기준으로 누적계산하기

2019.10.09 17:28데이터/머신러닝


소개
# 데이터 프레임에서 특정 칼럼이 유지될 동안 대상 칼럼의 값을 누적합니다.
# index 번호 기준으로 내려가다가 값이 바뀌면 해당 group에서 다시 시작합니다.
# 이전 group 값이 다시 시작될 경우 해당 값에서 다시 누적 계산이 시작됩니다.
자료 출처
# https://stackoverflow.com/questions/32847800/how-can-i-use-cumsum-within-a-group-in-pandas

라이브러리 및 데이터 불러오기

# 라이브러리
import pandas as pd

# 데이터
mock_data = pd.read_csv('MOCK_DATA.csv')
mock_data.head(3)
  id ssn first_name last_name email gender ip_address city country car_make car_model
0 1 352-83-7834 Josselyn Commuzzo jcommuzzo0@shop-pro.jp Female NaN Aībak Afghanistan GMC Yukon XL 2500
1 2 129-55-2802 Aksel Cubbin acubbin1@rakuten.co.jp Male NaN Gaya Nigeria Cadillac CTS
2 3 421-78-0056 Marwin Vagg NaN Male NaN Huangjiakou China Chevrolet Tahoe
누적할 대상 값을 임의로 생성하기(여기선 1을 값으로 삽입)
mock_data['count'] = 1
mock_data.head()
  id ssn first_name last_name email gender ip_address city country car_make car_model count
0 1 352-83-7834 Josselyn Commuzzo jcommuzzo0@shop-pro.jp Female NaN Aībak Afghanistan GMC Yukon XL 2500 1
1 2 129-55-2802 Aksel Cubbin acubbin1@rakuten.co.jp Male NaN Gaya Nigeria Cadillac CTS 1
2 3 421-78-0056 Marwin Vagg NaN Male NaN Huangjiakou China Chevrolet Tahoe 1
3 4 NaN Michele Freed NaN Male 95.30.236.76 Ferrol Spain Volvo NaN 1
4 5 821-40-7635 Skip Sallan ssallan4@nymag.com Male 200.83.222.220 Sheffield United Kingdom GMC Canyon 1
gender 칼럼을 기준으로 count 칼럼을 값을 계속 누적함(gender 칼럼 값이 바뀌면 새롭게 누적 시작)
mock_data['cumsum'] = mock_data.groupby('gender')['count'].cumsum()
mock_data.head(20)
  id ssn first_name last_name email gender ip_address city country car_make car_model count cumsum
0 1 352-83-7834 Josselyn Commuzzo jcommuzzo0@shop-pro.jp Female NaN Aībak Afghanistan GMC Yukon XL 2500 1 1
1 2 129-55-2802 Aksel Cubbin acubbin1@rakuten.co.jp Male NaN Gaya Nigeria Cadillac CTS 1 1
2 3 421-78-0056 Marwin Vagg NaN Male NaN Huangjiakou China Chevrolet Tahoe 1 2
3 4 NaN Michele Freed NaN Male 95.30.236.76 Ferrol Spain Volvo NaN 1 3
4 5 821-40-7635 Skip Sallan ssallan4@nymag.com Male 200.83.222.220 Sheffield United Kingdom GMC Canyon 1 4
5 6 854-88-6296 Belia Cushworth bcushworth5@blinklist.com Female 255.52.106.120 Nueve de Julio Argentina Dodge Ram 1 2
6 7 650-58-6287 Thane Baltzar tbaltzar6@infoseek.co.jp Male 89.251.85.176 Kikuchi Japan Pontiac Firefly 1 5
7 8 738-62-4156 Amery Sizeland asizeland7@yale.edu Male NaN Gunungmalang Satu Indonesia Lexus LX 1 6
8 9 NaN Forest Gambrell fgambrell8@a8.net Male 215.69.63.55 Goundi Chad Nissan NaN 1 7
9 10 344-02-8451 Fernande Mordey NaN Female NaN Nurlat Russia Dodge Caravan 1 3
10 11 586-17-7170 Damon Tine dtinea@toplist.cz Male 231.66.76.104 Gangou China NaN Tiguan 1 8
11 12 306-11-6909 Otis Harriott oharriottb@fema.gov Male NaN Francisco Morato Brazil Chevrolet Corvette 1 9
12 13 176-22-8123 Xever MacLachlan xmaclachlanc@msu.edu Male NaN Zalewo Poland Land Rover Discovery 1 10
13 14 150-25-6348 Roze Bestwick rbestwickd@creativecommons.org Female NaN Annopol Poland Ford F250 1 4
14 15 518-93-3198 Rafael Commander rcommandere@google.pl Male 67.252.156.166 Mandepa Barat Indonesia Buick Park Avenue 1 11
15 16 114-34-3979 Anett Crocket NaN Female NaN Hepo China NaN 9-3 1 5
16 17 200-90-4726 Stacee Glass sglassg@cargocollective.com Female 78.236.229.147 Passa Quatro Brazil Mitsubishi Galant 1 6
17 18 754-80-7014 Luca Ledes NaN Male 141.214.157.140 Sigay Philippines Maserati GranSport 1 12
18 19 NaN Che Marchelli cmarchellii@360.cn Male 27.46.24.17 Palaiochóri Greece Mitsubishi NaN 1 13
19 20 258-76-4616 Pascal Andreutti pandreuttij@usda.gov Male 152.27.178.79 Além do Rio Portugal Chevrolet Impala 1 14