Micro tutorial: Calculating Weighted Averages with Weighting applied to Multiple Columns in a Pandas Dataframe

I am recording this solution here because when searching for how to do this, I found many answers about how to weight one Pandas column by another column and I had already successfully done that by myself. For me, the trick was to apply one weighting column across several columns. For example, if I mix multiple batches into one tank, I might want to find the weighted average of all the attributes I am tracking for that lot.

# make a list of all the column names to which the weighting will be applied
var_cols = ['chiral_purity', 'water_content', '[impurity]']

# each Input Batch property is weighted by how much was added to the tank by weight where the master dataframe holding all batch data is called dfA
Output_Lot = dfA.groupby('InputBatch').apply(lambda x: pd.Series([sum(x[v] * x.InputWeight) / sum(x.InputWeight) for v in var_cols]))

# Use the input column property names with the suffix "_mix" in the Output Lot dataframe to indicate the weighting has been performed
Output_Lot.columns = [e+'_mix' for e in var_cols]

This solution was inspired by this post.

Leave a Reply

Your email address will not be published. Required fields are marked *