Correcting Skewed Data with SciPy and NumPy

Published 2023-03-04
Skewed data can adversely affect your analysis and machine learning models. In this video, I demonstrate five methods for cleaning skewed data using the NumPy and SciPy modules. The methods include taking the square root, cube root, fourth root, log, and Yeo-Johnson transform. I also showcase the effectiveness of each method by summarizing the skewness of the data after each transformation with a bar plot.
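
The five transformations named above can be sketched roughly as follows. This is a minimal illustration, not the code from the video: the synthetic log-normal sample, the variable names, and the choice to print the results instead of drawing the bar plot are all assumptions made for the example. The plain log and root transforms here assume strictly positive data.

```python
import numpy as np
from scipy import stats

# Assumption: a synthetic, strictly positive, right-skewed sample
# stands in for the dataset used in the video.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)

transforms = {
    "original": x,
    "square root": np.sqrt(x),
    "cube root": np.cbrt(x),
    "fourth root": x ** 0.25,
    "log": np.log(x),  # requires x > 0
    # yeojohnson returns (transformed_data, fitted_lambda) when lmbda is not given
    "Yeo-Johnson": stats.yeojohnson(x)[0],
}

# Sample skewness after each transformation (0 means symmetric)
skewness = {name: float(stats.skew(values)) for name, values in transforms.items()}

for name, value in skewness.items():
    print(f"{name:>12}: {value:+.3f}")
```

On a sample like this one, the stronger the root, the closer the skewness gets to zero, and the log brings it close to symmetric. The `skewness` dictionary could be passed to a plotting call such as matplotlib's `plt.bar` to get a summary chart like the one the description mentions.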

All comments (14)
  • @metinunlu_
    Thank you for the video, subscribed! Youtube needs more quality content like this.
  • @mushinart
    Outstanding explanation, professor
  • Amazing video! I like its structure: motivation, overview with examples, practical advice. Thanks!
  • @nicolaslpf
    Amazing video! I was creating a function for measuring the same thing. You forgot to mention log1p, which is the log of (x + 1). It's really useful for right-skewed data with values less than 1.
  • @dannybee9068
    Thank you! That was helpful! So we can basically take a root of any order? Is there a drawback to exploiting it, like continuing to increase n when raising the feature to the power of 1/n?
  • @pabloagogo1
    This is interesting. If one corrects the original skewed data with these kinds of transformations, in the context of simple or multiple linear regression, will that not change the interpretation of the original data? Curious to know.
  • @thoniasenna2330
    SUBSCRIBED! What should one do first? Or, what's the correct order: treating outliers, imputing missing values, correcting skewness? Thanks Dr. P!
  • Skewness doesn’t necessarily matter if you’re using XGBoost, correct? For classification or regression, that is.
  • @pewkaboo
    What if my data contains a lot of useful '0' values?
  • Bro, you explain a concept well, but do you need the music?! It’s distracting.