I am trying to measure class imbalance in each dataset. My approach is to compute each class's proportion (number of samples in the class / total number of samples in the dataset) and then check the mean and standard deviation of those proportions:

import statistics

#Count the number of instances in each class
class_counts = df[class_column].value_counts()
total_samples = len(df)

#Calculate imbalance ratios
imbalance_ratios = {class_name: count / total_samples for class_name, count in class_counts.items()}

#Calculate average and standard deviation of imbalance ratios
ratios = list(imbalance_ratios.values())
avg = 1 / len(ratios)  # mean of the ratios: they sum to 1, so the mean is 1 / number of classes
stddev = statistics.stdev(ratios)

#Determine lower and upper limits for identifying imbalance
lower_limit = avg - stddev
upper_limit = avg + stddev

#Identify classes with imbalance
imbalance_classes = [class_name for class_name, ratio in imbalance_ratios.items() 
                     if ratio < lower_limit or ratio > upper_limit]

print(imbalance_classes)

However, the KEEL website calculates class imbalance relative to the majority class:

#Count the number of instances in each class
class_counts = df[class_column].value_counts()

#Calculate the imbalance ratio relative to the majority class
majority_class_count = class_counts.max()
imbalance_ratios = {}

for cls, count in class_counts.items():
    imbalance_ratio = majority_class_count / count
    imbalance_ratios[cls] = imbalance_ratio

print("Imbalance ratios:")
for cls, ratio in imbalance_ratios.items():
    print(f"Class {cls}: {ratio:.2f}")

I am confused about which one to use. With my approach, some datasets come out as balanced, but the KEEL website lists those same datasets as imbalanced. My aim is to identify how many classes are imbalanced and list them.
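To make the discrepancy concrete, here is a minimal sketch that runs both calculations on a hypothetical two-class label list with a 90/10 split. The data and the names `labels`, `flagged`, and `ratios` are made up for illustration, and `collections.Counter` stands in for `df[class_column].value_counts()` so the snippet runs without a DataFrame.

```python
import statistics
from collections import Counter

# Hypothetical toy labels: 90 samples of class "a", 10 of class "b".
labels = ["a"] * 90 + ["b"] * 10
class_counts = Counter(labels)  # stands in for df[class_column].value_counts()
total = sum(class_counts.values())

# Approach 1: flag classes whose proportion lies outside mean +/- one stdev.
proportions = {cls: n / total for cls, n in class_counts.items()}
mean = 1 / len(proportions)  # proportions sum to 1
stddev = statistics.stdev(list(proportions.values()))
flagged = [cls for cls, p in proportions.items()
           if p < mean - stddev or p > mean + stddev]

# Approach 2: ratio of the majority class count to each class count.
majority = max(class_counts.values())
ratios = {cls: majority / n for cls, n in class_counts.items()}

print(flagged)
print(ratios)
```

On this toy data the first approach flags nothing (`flagged` is `[]`), while the second reports a ratio of 9.0 for class `b`. As far as I can tell, this always happens with exactly two classes: the sample standard deviation of the two proportions is sqrt(2) times their distance from the mean, so neither proportion can fall outside mean ± stddev, however skewed the split is.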
