Tuesday, April 25, 2023
Sequences, series, Zeno's Paradox and Donald Duck
Friday, April 14, 2023
Cross Validation lecture for joint BYU-City Tech undergraduate data science seminar
Today, I delivered a lecture on cross-validation and the bootstrap, two resampling methods used to estimate the test error of machine learning models. I based my lecture on the resampling chapter of Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani's book, "An Introduction to Statistical Learning: with Applications in R" (2013).
During the lecture, we first discussed the bias-variance trade-off and how the training error becomes an increasingly poor predictor of the test error as the model's complexity grows. We then compared the performance of various ways of splitting the data, such as the validation set approach (splitting the data into two equal parts) and leave-one-out cross-validation (LOOCV); a quick sketch of both follows.
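To make the two splitting schemes concrete, here is a minimal Python sketch using scikit-learn and synthetic data of my own; the lecture itself followed the book's R presentation, so treat this as an illustration rather than the code we used:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=100)

# Validation set approach: fit on one half, estimate test MSE on the other half
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)
fit = LinearRegression().fit(X_tr, y_tr)
val_mse = np.mean((fit.predict(X_val) - y_val) ** 2)

# Leave-one-out CV: n fits, each holding out a single observation
loo_mse = -cross_val_score(LinearRegression(), X, y, cv=LeaveOneOut(),
                           scoring="neg_mean_squared_error").mean()

print(f"validation-set MSE: {val_mse:.3f}   LOOCV MSE: {loo_mse:.3f}")
```

The validation-set estimate is highly variable (it depends on one particular split and trains on only half the data), while LOOCV is nearly unbiased but costs n model fits.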
Next, we discussed k-fold cross-validation, which splits the data into K folds of roughly equal size and, over K iterations, holds out each fold in turn as the validation set while training on the remaining K-1 folds. The cross-validation estimate is the (weighted) average of the mean squared error (MSE) across the folds. We walked through a 10-fold experiment and noted that K = 5 or 10 is generally a good compromise in the bias-variance trade-off.
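For readers who prefer code to prose, here is a sketch of the k-fold procedure itself, again in Python with scikit-learn; the helper name `kfold_mse` and the synthetic data are my own inventions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def kfold_mse(X, y, k=10, seed=0):
    """Average the validation MSE over k folds; with equal-sized folds
    this matches the weighted average described above."""
    fold_mse = []
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True,
                                    random_state=seed).split(X):
        fit = LinearRegression().fit(X[train_idx], y[train_idx])
        fold_mse.append(np.mean((fit.predict(X[val_idx]) - y[val_idx]) ** 2))
    return float(np.mean(fold_mse))

# Synthetic data, as in the sketch above
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=100)
print(kfold_mse(X, y, k=10))  # a single 10-fold CV estimate of the test MSE
```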
Finally, we discussed an example of two-class classification with 5,000 predictors and 50 samples, where the model is built in two steps: first screening the predictors (step 1) and then training the classifier on the selected ones (step 2). The wrong way to perform K-fold CV is to screen the predictors once using all the data and cross-validate only step 2; the right way is to repeat both steps inside each fold, so the held-out data never influences the screening. We also stressed how the correlation between the K overlapping training sets feeds into the bias-variance trade-off that determines the optimal choice of K.
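The pitfall is easiest to see in code. Below is a hedged sketch of this experiment in Python with scikit-learn, where `SelectKBest` stands in for whatever screening filter one uses and the sample sizes mirror the example; the labels are pure noise, so the true accuracy of any classifier is 50%:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# 50 samples, 5000 pure-noise predictors, random class labels
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5000))
y = rng.integers(0, 2, size=50)

# Wrong way: screen predictors on ALL the data (step 1 leaks label
# information), then cross-validate only the classifier (step 2).
X_screened = SelectKBest(f_classif, k=25).fit_transform(X, y)
wrong = cross_val_score(LogisticRegression(max_iter=1000),
                        X_screened, y, cv=5).mean()

# Right way: put the screening step inside the pipeline, so each fold
# re-screens using only its own training data.
pipe = make_pipeline(SelectKBest(f_classif, k=25),
                     LogisticRegression(max_iter=1000))
right = cross_val_score(pipe, X, y, cv=5).mean()

print(f"wrong way: {wrong:.2f} (wildly optimistic)   "
      f"right way: {right:.2f} (near the true 0.5)")
```

The design point is that anything fit to the data, including feature screening, must live inside the cross-validation loop, or the CV estimate is biased downward.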
Overall, the lecture was well received, with many interesting questions from colleagues and students. This topic comes up frequently in data science forums, and it is worth emphasizing the bias-variance trade-off and the impact of how the data are split on the results of cross-validation.
Thursday, April 6, 2023
Functions before Spring break!
So here we are, a few days since our last class, in the middle of Spring break. I announced that I was pushing the exam to the Wednesday after Spring break, and I could sense the joy in the room when I shared the news.
This Monday, 4/3, we continued talking about functions, stressing key definitions such as the following (a quick worked example comes after the list):
- domain
- codomain
- range
- image
- preimage
- one-to-one functions
- onto functions
- bijection
- inverse functions
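To tie several of these definitions together, here is a quick worked example (my own, not necessarily the one we used in class): take f: Z → Z with f(n) = 2n. The domain and codomain are both Z, but the range, i.e. the image of Z under f, is only the even integers. The function is one-to-one, since 2m = 2n forces m = n, but it is not onto: 3 has no preimage. So f is not a bijection and has no inverse function on Z; restricting the codomain to the even integers makes it a bijection, with inverse f⁻¹(m) = m/2.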