Pandas Example – Write a Pandas program to split a given dataset, group by one column and apply an aggregate function to few columns and another aggregate function to the rest of the columns of the dataframe

Hits: 10

(Python Example for Beginners)

 

Write a Pandas program to split a given dataset, group by one column and apply an aggregate function to few columns and another aggregate function to the rest of the columns of the dataframe.

 

Test Data:

    salesman_id  sale_jan  sale_feb  sale_mar  sale_apr  sale_may  sale_jun  
0          5002    150.50    250.50    150.50    150.50    130.50    150.50   
1          5005    270.65    170.65    270.65    270.65    270.65    270.65   
2          5001     65.26     15.26     65.26     95.26     65.26     45.26   
3          5003    110.50    110.50    110.50    210.50    310.50    110.50   
4          5002    948.50    598.50    948.50    948.50    948.50    948.50   
5          5001   2400.60   1400.60   2400.60   2400.60   2400.60   3400.60   
6          5001   1760.00   2760.00   5760.00    760.00    760.00   5760.00   
7          5006   2983.43   1983.43   1983.43   1983.43   1983.43    983.43   
8          5003    480.40   2480.40   2480.40   2480.40   2480.40   2480.40   
9          5002   1250.45    250.45    250.45    250.45    250.45    250.45   
10         5007     75.29     75.29     75.29     75.29     75.29     75.29   
11         5001   1045.60   3045.60   3045.60   3045.60   3045.60   3045.60   
    sale_jul  sale_aug  sale_sep  sale_oct  sale_nov  sale_dec  
0     950.50    150.50    150.50    150.50    150.50    150.50  
1     270.65     70.65    270.65    270.65    270.65     70.65  
2      65.26     65.26     65.26     65.26     95.26     65.26  
3     210.50    110.50    110.50    110.50    110.50    110.50  
4     948.50    948.50    948.50    948.50    948.50    948.50  
5    2400.60    400.60    200.60   2400.60   2400.60   2400.60  
6    5760.00   5760.00   5760.00   5760.00   5760.00   5760.00  
7     983.43   1983.43   1983.43   1983.43   1983.43   1983.43  
8    2480.40   2480.40   2480.40   2480.40   2480.40   2480.40  
9     250.45    250.45    250.45    250.45    250.45    250.45  
10     75.29     75.29     75.29     75.29     75.29     75.29  
11   3045.60   3045.60   3045.60   3045.60   3045.60   3045.60  

 

Sample Solution:

Python Code :


import pandas as pd

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

df = pd.DataFrame({
'salesman_id': [5002,5005,5001,5003,5002,5001,5001,5006,5003,5002,5007,5001],
'sale_jan':[150.5, 270.65, 65.26, 110.5, 948.5, 2400.6, 1760, 2983.43, 480.4,  1250.45, 75.29,1045.6],
'sale_feb':[250.5, 170.65, 15.26, 110.5, 598.5, 1400.6, 2760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_mar':[150.5, 270.65, 65.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_apr':[150.5, 270.65, 95.26, 210.5, 948.5, 2400.6, 760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_may':[130.5, 270.65, 65.26, 310.5, 948.5, 2400.6, 760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_jun':[150.5, 270.65, 45.26, 110.5, 948.5, 3400.6, 5760, 983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_jul':[950.5, 270.65, 65.26, 210.5, 948.5, 2400.6, 5760, 983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_aug':[150.5, 70.65,  65.26, 110.5, 948.5, 400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_sep':[150.5, 270.65, 65.26, 110.5, 948.5, 200.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_oct':[150.5, 270.65, 65.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_nov':[150.5, 270.65, 95.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6], 
'sale_dec':[150.5, 70.65, 65.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6]
})

print("Original Orders DataFrame:")
print(df)

print("Result after group on salesman_id and apply different aggregate functions:")
df = df.groupby('salesman_id').agg(lambda x : x.sum() if x.name in ['sale_jan','sale_feb','sale_mar'] else x.mean())

print(df)

Sample Output:

Original Orders DataFrame:
    salesman_id  sale_jan  sale_feb  sale_mar  sale_apr  sale_may  sale_jun  
0          5002    150.50    250.50    150.50    150.50    130.50    150.50   
1          5005    270.65    170.65    270.65    270.65    270.65    270.65   
2          5001     65.26     15.26     65.26     95.26     65.26     45.26   
3          5003    110.50    110.50    110.50    210.50    310.50    110.50   
4          5002    948.50    598.50    948.50    948.50    948.50    948.50   
5          5001   2400.60   1400.60   2400.60   2400.60   2400.60   3400.60   
6          5001   1760.00   2760.00   5760.00    760.00    760.00   5760.00   
7          5006   2983.43   1983.43   1983.43   1983.43   1983.43    983.43   
8          5003    480.40   2480.40   2480.40   2480.40   2480.40   2480.40   
9          5002   1250.45    250.45    250.45    250.45    250.45    250.45   
10         5007     75.29     75.29     75.29     75.29     75.29     75.29   
11         5001   1045.60   3045.60   3045.60   3045.60   3045.60   3045.60   

    sale_jul  sale_aug  sale_sep  sale_oct  sale_nov  sale_dec  
0     950.50    150.50    150.50    150.50    150.50    150.50  
1     270.65     70.65    270.65    270.65    270.65     70.65  
2      65.26     65.26     65.26     65.26     95.26     65.26  
3     210.50    110.50    110.50    110.50    110.50    110.50  
4     948.50    948.50    948.50    948.50    948.50    948.50  
5    2400.60    400.60    200.60   2400.60   2400.60   2400.60  
6    5760.00   5760.00   5760.00   5760.00   5760.00   5760.00  
7     983.43   1983.43   1983.43   1983.43   1983.43   1983.43  
8    2480.40   2480.40   2480.40   2480.40   2480.40   2480.40  
9     250.45    250.45    250.45    250.45    250.45    250.45  
10     75.29     75.29     75.29     75.29     75.29     75.29  
11   3045.60   3045.60   3045.60   3045.60   3045.60   3045.60  
Result after group on salesman_id and apply different aggregate functions:
             sale_jan  sale_feb  sale_mar     sale_apr  sale_may     sale_jun  
salesman_id                                                                     
5001          5271.46   7221.46  11271.46  1575.365000  1567.865  3062.865000   
5002          2349.45   1099.45   1349.45   449.816667   443.150   449.816667   
5003           590.90   2590.90   2590.90  1345.450000  1395.450  1295.450000   
5005           270.65    170.65    270.65   270.650000   270.650   270.650000   
5006          2983.43   1983.43   1983.43  1983.430000  1983.430   983.430000   
5007            75.29     75.29     75.29    75.290000    75.290    75.290000   

                sale_jul     sale_aug     sale_sep     sale_oct     sale_nov  
salesman_id                                                                    
5001         2817.865000  2317.865000  2267.865000  2817.865000  2825.365000   
5002          716.483333   449.816667   449.816667   449.816667   449.816667   
5003         1345.450000  1295.450000  1295.450000  1295.450000  1295.450000   
5005          270.650000    70.650000   270.650000   270.650000   270.650000   
5006          983.430000  1983.430000  1983.430000  1983.430000  1983.430000   
5007           75.290000    75.290000    75.290000    75.290000    75.290000   

                sale_dec  
salesman_id               
5001         2817.865000  
5002          449.816667  
5003         1295.450000  
5005           70.650000  
5006         1983.430000  
5007           75.290000

 

Python Example for Beginners

Sign up to get end-to-end “Learn By Coding” example.


Two Machine Learning Fields

There are two sides to machine learning:

  • Practical Machine Learning:This is about querying databases, cleaning data, writing scripts to transform data and gluing algorithm and libraries together and writing custom code to squeeze reliable answers from data to satisfy difficult and ill defined questions. It’s the mess of reality.
  • Theoretical Machine Learning: This is about math and abstraction and idealized scenarios and limits and beauty and informing what is possible. It is a whole lot neater and cleaner and removed from the mess of reality.
Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure the accuracy of the information was correct at time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause. The information presented here could also be found in public knowledge domains.