How would you find correlation between a categorical variable and a continuous variable?
Distance Metrics: Although "distance" is not synonymous with "correlation," distance metrics can still be used to compute the similarity between vectors, which is conceptually close to what measures of correlation capture. There are many distance metrics, and my intent here is less to cover every way of calculating the distance between two points and more to introduce distance metrics as a general approach to measuring similarity or correlation. I have noted ten commonly used distance metrics below for this purpose.
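To make the idea concrete, here is a minimal sketch of three widely used distance metrics (Euclidean, Manhattan, and cosine distance) in plain Python. The function names and the sample vectors `u` and `v` are illustrative assumptions, not taken from the original answer, and the sketch assumes equal-length, non-zero numeric vectors.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical example vectors: v points in the same direction as u.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]

print("euclidean:", euclidean_distance(u, v))
print("manhattan:", manhattan_distance(u, v))
print("cosine:   ", cosine_distance(u, v))
```

Because `v` is a scaled copy of `u`, the cosine distance is approximately zero even though the Euclidean and Manhattan distances are not. This is why the choice of metric matters when distance is used as a stand-in for similarity.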
Contingency Table Analysis: When comparing two categorical variables, we can convert the original vectors into a contingency table by counting how often each combination of categories occurs. For example, imagine you wanted to see whether there is a correlation between being a man and getting a science grant (unfortunately, there is a correlation, but that's a matter for another day). Your data might have two columns in this case: one for gender, which would be Male or Female (assume a binary world for this case), and another for grant (Yes or No). We could take the data from these columns and represent it as a cross tabulation by calculating the pairwise frequencies.
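The cross tabulation described above can be sketched in a few lines of plain Python. The `gender` and `grant` values are made-up toy data, and the helper names are hypothetical. The chi-squared statistic at the end is one common follow-up step for contingency tables; it is added here as an assumption about what the analysis might look like, not something the original answer spells out.

```python
from collections import Counter

# Hypothetical toy data: one entry per applicant.
gender = ["Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female"]
grant = ["Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No"]


def contingency_table(rows, cols):
    """Count pairwise frequencies and return labels plus a 2-D table."""
    counts = Counter(zip(rows, cols))
    row_labels = sorted(set(rows))
    col_labels = sorted(set(cols))
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


def chi_squared(table):
    """Pearson chi-squared statistic; assumes no row or column total is zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


row_labels, col_labels, table = contingency_table(gender, grant)

# Print the cross tabulation with column headers.
print("".ljust(8) + "".join(c.ljust(6) for c in col_labels))
for label, row in zip(row_labels, table):
    print(label.ljust(8) + "".join(str(v).ljust(6) for v in row))

chi2 = chi_squared(table)
print("chi-squared:", chi2)
```

A larger chi-squared value suggests the two variables are less likely to be independent. Turning the statistic into a p-value requires the chi-squared distribution with the appropriate degrees of freedom, which this sketch leaves out.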
Learn More:
- What features would you use to predict the Uber ETA for ride requests?
- How would you evaluate the predictions of an Uber ETA model?
- Describe how you would build a model to predict Uber ETAs after a rider requests a ride.
- Suppose you're working as a data scientist at Facebook. How would you measure the success of private stories on Instagram, where only certain chosen friends can see the story?
- Precision vs accuracy vs recall?
- Error vs variance vs bias?
- False negatives vs false positives? When is either one worse than the other?
- Describe your data science process from start to finish.
- Data science vs machine learning vs AI?
- How do you treat null/missing values? Name 3 methodologies.
- How can outlier values be treated?
- What is data normalization? Name 2 normalization methodologies.
- What is the role/importance of data cleaning?
- What are success metrics vs tracking metrics?
- What kind of metric would you make to measure success of a program (marketing) and how do you define them?
- Let's say an app was getting a redesign. How do you know if the redesign was successful?
- We noticed a steep decline in users in a certain area of the world. How would you assess and address it?
- What are the two methods used for calibration in supervised learning?
- Which method is frequently used to prevent overfitting?
- What is the difference between heuristics for rule learning and heuristics for decision trees?
- What is a perceptron in machine learning?
- Explain the two components of a Bayesian logic program.
- What are Bayesian networks (BN)?
- Why are instance-based learning algorithms sometimes referred to as lazy learning algorithms?