I would like to introduce a technique for estimating the (mean) error of a learning algorithm. This error estimate is then applied to parameter selection via the grid search method, which I will describe in detail later. I will also make clear what advantages the K-fold cross validation approach offers.
K-fold cross validation is used in machine learning to estimate how accurately a learning algorithm will predict data it was not trained on. In the k-fold method, the training dataset is randomly partitioned into k groups; the algorithm is then trained k times, each time on all of the training data except the points in one held-out group. The form of the algorithm is as follows:
* Divide the training set into k partitions.
* For each partition i = 1, …, k:
  * Let T be the dataset containing all training data points except those in the ith partition.
  * Train the algorithm using T as the training set.
  * Test the trained algorithm using the ith partition as the test set, and record the number of errors.
* Report the mean error over all k test sets.
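The steps above can be sketched in plain Python. Here `train_fn` and `error_fn` are hypothetical callables standing in for whatever learner and error count you actually use; this is a minimal sketch, not a library API:

```python
import random

def k_fold_cv_error(data, labels, train_fn, error_fn, k=10, seed=0):
    """Mean error over k folds.

    train_fn(X, y) -> model and error_fn(model, X, y) -> number of errors
    are hypothetical callables standing in for any learning algorithm.
    """
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)       # random partition of the training set
    folds = [idx[i::k] for i in range(k)]  # k roughly equal groups
    fold_errors = []
    for test_idx in folds:                 # hold out one group per round
        held_out = set(test_idx)
        train_X = [data[i] for i in idx if i not in held_out]
        train_y = [labels[i] for i in idx if i not in held_out]
        model = train_fn(train_X, train_y)
        test_X = [data[i] for i in test_idx]
        test_y = [labels[i] for i in test_idx]
        fold_errors.append(error_fn(model, test_X, test_y))
    return sum(fold_errors) / k            # report the mean over all k test sets

# Toy usage: a "learner" that always predicts the majority training label.
def train_fn(X, y):
    return max(set(y), key=y.count)

def error_fn(model, X, y):
    return sum(1 for t in y if t != model)

data = list(range(10))
labels = [0] * 8 + [1] * 2
mean_err = k_fold_cv_error(data, labels, train_fn, error_fn, k=5)
```

With 8 zero labels and 2 one labels, the majority learner always predicts 0, so exactly the two 1-labelled points are misclassified and the mean error over the 5 folds is 2/5.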
K-fold cross validation is extremely useful if the correct value of k is chosen. It is less ‘wasteful’ of data than hold-out (test set) validation, and less ‘expensive’ than leave-one-out cross validation. In general, with a well-chosen k, k-fold cross validation provides the best cross-validation estimate of the true error.
Unfortunately, there is no theoretically ‘perfect’ way of determining the appropriate k value. Using the value k = 10 seems to be a good rule of thumb, although the true best value differs for each algorithm and each dataset. It is interesting to note that when k is allowed to increase until it is the size of the total dataset, k-fold cross validation behaves identically to leave-one-out cross validation.
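This error estimate plugs directly into grid search for parameter selection: compute the k-fold error for each candidate parameter value and keep the value with the lowest mean error. A minimal self-contained sketch, using a toy 1-D n-nearest-neighbour classifier as the learner; all names here are illustrative assumptions, not a real library API:

```python
import random

def cv_error(data, labels, predict_fn, k=4, seed=0):
    """Mean fraction of misclassified held-out points over k folds."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errs = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [(data[i], labels[i]) for i in idx if i not in held_out]
        wrong = sum(1 for i in test_idx
                    if predict_fn(train, data[i]) != labels[i])
        errs.append(wrong / len(test_idx))
    return sum(errs) / k

def make_knn(n):
    """Toy n-nearest-neighbour classifier on scalars (illustrative only)."""
    def predict(train, x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n]
        votes = [label for _, label in nearest]
        return max(set(votes), key=votes.count)
    return predict

# Grid search: evaluate each candidate n by its k-fold error, keep the best.
data = [0.10, 0.15, 0.20, 0.30, 0.90, 0.95, 1.00, 1.10]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
grid = [1, 3, 5]
best_n = min(grid, key=lambda n: cv_error(data, labels, make_knn(n)))
```

Note that the same random partition (same seed) is reused for every candidate value, so the candidates are compared on identical folds rather than on different random splits.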
Related paper: link