Dataset Splitting
A dataset has to be split into one for training a ML model, and one to test/evaluate it.
Random splitting … use specific ratio for splitting
- i.e 30/70 ratio for testing/training data
- 10% of user data will be used for testing
- Given k: k items from each user will be used for training
- All but k: k items from each user will be used for testing
Time based splitting: choose Cut-Off point in time to separate testing/training data
- Global: fixed point in time for all users
- Given-n: train with the first n items, test with the rest
K-Fold Cross Validation: split dataset in (1/k) different fragments
- Evaluation takes place k times, where each fragment is used once for testing
- k often is 5 or 10