Could you please provide additional sample data for the following datasets, and explain how they were generated?
DATASET_PATH = [
    ("/mnt/e/ai/dataset/pt/zh", 20 * 10000),
    ("/mnt/e/ai/dataset/pt/zh_r18_pixiv", 20 * 10000),
    ("/mnt/e/ai/dataset/pt/en", 30 * 10000),
    ("/mnt/e/ai/dataset/pt/en_r18_visual_novels", 10 * 10000),
    ("/mnt/e/ai/dataset/pt/ja", 40 * 10000),
    ("/mnt/e/ai/dataset/pt/ja_r18", 32.5 * 10000),
    ("/mnt/e/ai/dataset/pt/ja_r18_rpg", 7.5 * 10000),
    ("/mnt/e/ai/dataset/pt/ko", 20 * 10000),
    ("/mnt/e/ai/dataset/pt/ko_web", 20 * 10000),
    # ("/mnt/e/ai/dataset/pt/zh_cc100", 800 * 10000),
    # ("/mnt/e/ai/dataset/pt/zh_cc100_tw", 400 * 10000),
    # ("/mnt/e/ai/dataset/pt/en_cc100", 800 * 10000),
    # ("/mnt/e/ai/dataset/pt/ja_cc100_izumi_lab", 1200 * 10000),
    # ("/mnt/e/ai/dataset/pt/ko_cc100", 800 * 10000),
]
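For reference, if the second value in each tuple is the number of samples drawn from that directory (the `* 10000` factor suggests the counts are written in units of 10,000), the active entries add up to the per-language mix computed below. This is only a sketch under that assumption: `language_of` and `mixture` are hypothetical helpers, and the language is inferred from the directory-name prefix (e.g. `ja_r18_rpg` counts as `ja`).

```python
from collections import defaultdict

# Active (uncommented) entries from the list above. ASSUMPTION: the second
# field is a per-directory sample budget; 32.5 * 10000 and 7.5 * 10000 are floats.
DATASET_PATH = [
    ("/mnt/e/ai/dataset/pt/zh", 20 * 10000),
    ("/mnt/e/ai/dataset/pt/zh_r18_pixiv", 20 * 10000),
    ("/mnt/e/ai/dataset/pt/en", 30 * 10000),
    ("/mnt/e/ai/dataset/pt/en_r18_visual_novels", 10 * 10000),
    ("/mnt/e/ai/dataset/pt/ja", 40 * 10000),
    ("/mnt/e/ai/dataset/pt/ja_r18", 32.5 * 10000),
    ("/mnt/e/ai/dataset/pt/ja_r18_rpg", 7.5 * 10000),
    ("/mnt/e/ai/dataset/pt/ko", 20 * 10000),
    ("/mnt/e/ai/dataset/pt/ko_web", 20 * 10000),
]


def language_of(path: str) -> str:
    # Hypothetical helper: take the language code from the directory-name
    # prefix, e.g. "/mnt/e/ai/dataset/pt/ja_r18_rpg" -> "ja".
    return path.rstrip("/").split("/")[-1].split("_")[0]


def mixture(entries):
    # Sum the sample budgets per language, then normalize to shares of the total.
    totals = defaultdict(float)
    for path, budget in entries:
        totals[language_of(path)] += budget
    grand_total = sum(totals.values())
    shares = {lang: n / grand_total for lang, n in totals.items()}
    return grand_total, shares


total, shares = mixture(DATASET_PATH)
print(f"total samples: {int(total)}")
for lang, share in sorted(shares.items()):
    print(f"{lang}: {share:.1%}")
```

Under that reading, the active entries total 2,000,000 samples, split 40% Japanese and 20% each for Chinese, English, and Korean.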