# Original Research: Is Using Fast-and-Frugal Trees Better Than Machine-Learning Trees?

In this series on Original Research, I will be sharing my findings from some of the mini-projects that I have carried out on my own. Fast-and-frugal trees (FFTs) are a specific type of classification decision tree with sequentially ordered cues, where every cue has two branches and one branch is an exit point (Martignon …
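To make the FFT structure concrete, here is a minimal sketch in Python. It assumes each cue can be written as a yes/no test on a case, and that the final cue's two exits can be modelled as "its own exit decision if the test fires, a default decision otherwise". The names `Cue`, `classify_fft` and the example cues and thresholds are hypothetical and purely illustrative; this is not the implementation used in the post.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Case = Dict[str, float]


@dataclass
class Cue:
    """One level of a fast-and-frugal tree: a binary question about the case,
    plus the classification returned when its exit branch is taken."""
    name: str
    exits_if: Callable[[Case], bool]  # True -> leave the tree at this cue
    exit_decision: bool               # classification given on exit


def classify_fft(cues: List[Cue], final_decision: bool, case: Case) -> bool:
    """Inspect cues in their fixed order and stop at the first exit.

    The last cue has two exits: its own exit_decision when it fires,
    and final_decision when it does not.
    """
    for cue in cues:
        if cue.exits_if(case):
            return cue.exit_decision
    return final_decision


# Hypothetical tree with made-up cue names and thresholds, for illustration only.
example_tree = [
    Cue("a_high", lambda c: c["a"] > 5, True),
    Cue("b_low", lambda c: c["b"] < 2, False),
    Cue("c_high", lambda c: c["c"] > 10, True),
]

if __name__ == "__main__":
    # Neither a_high nor b_low fires, so the case exits at c_high.
    print(classify_fft(example_tree, False, {"a": 1, "b": 3, "c": 20}))  # True
```

The contrast with a conventional machine-learning tree is visible in the loop: only one branch of each cue continues, so the tree is a single ordered chain of checks rather than a branching hierarchy.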

# Category: Statistics

# IMDb vs Rotten Tomatoes: The Wisdom of Crowd Goes to The Movies

It is once again the end of the year and movies are starting to flood the theatres. Spoilt for choice, my friends and I had to rely on movie review sites to decide which movie to catch. After checking that "Venom" scored a decent 7.0/10 on IMDb, I suggested that we could go see that …

# Fuzzy Buzzy: Sussing Out the “Fuzzy Logic” of Buzzwords in Data Science

Disclaimer: This post does not involve actual Fuzzy Logic. The term was originally intended only as a pun, but I later realised that it also demonstrates how the improper use of buzzwords can be quite misleading. Apologies for any confusion caused. With the growing popularity of Data Science, many buzzwords have been loosely thrown …

# Confused by The Confusion Matrix Part 2: ‘Accuracy’ is But One of Many Measures of Accuracy…

In the previous post, I explained how to interpret a confusion matrix by clarifying all the different terms that actually mean the same thing. Unfortunately, the confusion does not end there. If you finished reading the previous post, you have probably realised that a simple Hit Rate or False Alarm …

# Confused by The Confusion Matrix: What’s the difference between Hit Rate, True Positive Rate, Sensitivity, Recall and Statistical Power?

If you tried to answer the question in the title, you'll be disappointed to find out that it is actually a trick question: there is essentially no difference between the listed terms. As with the issue mentioned in the ANCOVA and Moderation posts, different terms are often used for the same thing, especially when they belong …
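A small Python sketch shows why the terms collapse into one: Hit Rate, True Positive Rate, Sensitivity and Recall are all the same ratio, TP / (TP + FN), so they can literally be aliases of one function. Statistical power plays the same role when a "positive" is a significant result and an "actual positive" is a real effect, although power is a long-run probability rather than a count from one table. The helper names and the label data below are made up for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN) for binary labels where 1 = positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fn, fp, tn


def hit_rate(tp: int, fn: int) -> float:
    """Share of actual positives flagged as positive: TP / (TP + FN)."""
    return tp / (tp + fn)


# One formula, several names: these are the same function object.
true_positive_rate = sensitivity = recall = hit_rate


def false_alarm_rate(fp: int, tn: int) -> float:
    """Share of actual negatives wrongly flagged as positive: FP / (FP + TN)."""
    return fp / (fp + tn)


if __name__ == "__main__":
    # Hypothetical labels for ten cases.
    actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    tp, fn, fp, tn = confusion_counts(actual, predicted)
    print(tp, fn, fp, tn)   # 3 1 1 5
    print(recall(tp, fn))   # 0.75
```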

# Bayesian Analysis & The Replication Crisis: A Layperson’s Perspective

Disclaimer: I do not have any experience in using Bayesian analyses, but I have been trying to understand the fundamental concepts. If anyone more knowledgeable in this area spots any mistakes, please feel free to point them out in the comments. The Bayesian approach to statistical analysis has been gaining popularity in recent years, in …

# Demystifying Statistical Analysis 7: Data Transformations and Non-Parametric Tests

Upon reading the post title, some might be wondering why "Data Transformations" and "Non-Parametric Tests" are being introduced together in the same post. Non-parametric tests are usually introduced together with parametric tests, but I seem to have left them out when I shared a cheat sheet on statistical analyses at the start of this series. …

# Demystifying Statistical Analysis 6: Moderation & Mediation (Why Are They Even Paired Together?!)

Differentiating the terms "Moderation" and "Mediation" is the bane of many students' lives in learning statistics. Because the two analyses deal with extraneous variables and both sound very similar, people often confuse one for the other. But if we uncover the essence of what each analysis does, their differences become so distinct to the point you …

# Demystifying Statistical Analysis 5: The ANCOVA Expressed in Linear Regression

The previous part of this series showed how the factorial ANOVA can be expressed in a linear regression when more than one categorical predictor is required in an analysis. The factorial ANOVA introduced the concept of interaction effects, and explained why contrast coding is better than dummy coding when the analysis is conducted as a …
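As a rough illustration of an ANCOVA written as a regression, the sketch below fits y = b0 + b1·group + b2·covariate by ordinary least squares with NumPy. It assumes a two-level group contrast-coded as -1/+1 and a centred continuous covariate; the function name and the noise-free data are hypothetical, and the significance tests a full ANCOVA would report are omitted.

```python
import numpy as np


def ancova_via_regression(y, group, covariate):
    """Fit y = b0 + b1*group_code + b2*centred_covariate by OLS.

    group: 0/1 labels, converted to contrast codes -1/+1.
    Returns (b0, b1, b2). With +/-1 codes, the covariate-adjusted
    difference between the two groups is 2 * b1.
    """
    y = np.asarray(y, dtype=float)
    code = np.where(np.asarray(group) == 1, 1.0, -1.0)
    cov = np.asarray(covariate, dtype=float)
    cov_centred = cov - cov.mean()  # centring makes b0 the prediction at the mean covariate
    X = np.column_stack([np.ones_like(y), code, cov_centred])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return tuple(coef)


if __name__ == "__main__":
    # Hypothetical, noise-free data built from b0 = 10, b1 = 2, b2 = 0.5.
    x = np.arange(1, 9)
    g = [0, 0, 0, 0, 1, 1, 1, 1]
    codes = np.array([-1.0] * 4 + [1.0] * 4)
    y = 10 + 2 * codes + 0.5 * (x - x.mean())
    print(ancova_via_regression(y, g, x))  # approximately (10, 2, 0.5)
```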

# Demystifying Statistical Analysis 4: The Factorial ANOVA Expressed in Linear Regression

The one-way ANOVA was introduced in the previous part of this series, and I explained how the analysis can be done using linear regression and illustrated how polynomial relationships can be tested with the help of contrast codes. But datasets often contain more than one categorical predictor, and a factorial ANOVA is required …
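A 2×2 factorial ANOVA can be sketched as a regression in the same way: contrast-code each factor as -1/+1 and add their product as the interaction term. The sketch below assumes a balanced design; the function name and data are hypothetical. With these codes, b0 is the grand mean of the cell means and each main-effect coefficient is half the difference between that factor's marginal means.

```python
import numpy as np


def factorial_2x2_regression(y, a, b):
    """Fit y = b0 + b1*A + b2*B + b3*(A*B) by OLS.

    a, b: 0/1 labels for the two factors, contrast-coded as -1/+1.
    The product column carries the interaction effect.
    """
    y = np.asarray(y, dtype=float)
    code_a = np.where(np.asarray(a) == 1, 1.0, -1.0)
    code_b = np.where(np.asarray(b) == 1, 1.0, -1.0)
    X = np.column_stack([np.ones_like(y), code_a, code_b, code_a * code_b])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


if __name__ == "__main__":
    # Hypothetical data, two observations per cell; cell means are 2, 4, 6, 12.
    y = [1, 3, 3, 5, 5, 7, 11, 13]
    a = [0, 0, 0, 0, 1, 1, 1, 1]
    b = [0, 0, 1, 1, 0, 0, 1, 1]
    print(factorial_2x2_regression(y, a, b))  # coefficients close to 6, 3, 2, 1
```

Here the A coefficient (3) is half of the gap between A's marginal means (9 vs 3), averaged over both levels of B. Under 0/1 dummy coding with the interaction in the model, the A coefficient would instead be the simple effect of A at B = 0, which is one way to see why contrast codes tend to be easier to interpret in factorial designs.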