From 080c8969434a6b68c604bf9c5076cb503972d820 Mon Sep 17 00:00:00 2001
From: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>
Date: Thu, 2 May 2024 14:12:40 +0900
Subject: [PATCH 1/3] add: update ko_toctree

---
 chapters/ko/_toctree.yml              | 73 +++++++++++++++++++++++++++
 chapters/ko/chapter6/introduction.mdx | 29 +++++++++++
 2 files changed, 102 insertions(+)
 create mode 100644 chapters/ko/chapter6/introduction.mdx

diff --git a/chapters/ko/_toctree.yml b/chapters/ko/_toctree.yml
index ab3a6105..3cecc0b3 100644
--- a/chapters/ko/_toctree.yml
+++ b/chapters/ko/_toctree.yml
@@ -33,6 +33,8 @@
     title: 파이프라인을 이용한 오디오 분류
   - local: chapter2/asr_pipeline
     title: 파이프라인을 이용한 자동 음성 인식
+  - local: chapter2/tts_pipeline
+    title: (번역중) Audio generation with a pipeline
   - local: chapter2/hands_on
     title: 실습 과제
 
@@ -52,6 +54,77 @@
   - local: chapter3/supplemental_reading
     title: 보충자료 및 리소스
 
+- title: (번역중) Unit 4. Build a music genre classifier
+  sections:
+  - local: chapter4/introduction
+    title: (번역중) What you'll learn and what you'll build
+  - local: chapter4/classification_models
+    title: (번역중) Pre-trained models for audio classification
+  - local: chapter4/fine-tuning
+    title: (번역중) Fine-tuning a model for music classification
+  - local: chapter4/demo
+    title: (번역중) Build a demo with Gradio
+  - local: chapter4/hands_on
+    title: (번역중) Hands-on exercise
+
+- title: (번역중) Unit 5. Automatic Speech Recognition
+  sections:
+  - local: chapter5/introduction
+    title: (번역중) What you'll learn and what you'll build
+  - local: chapter5/asr_models
+    title: (번역중) Pre-trained models for speech recognition
+  - local: chapter5/choosing_dataset
+    title: (번역중) Choosing a dataset
+  - local: chapter5/evaluation
+    title: (번역중) Evaluation and metrics for speech recognition
+  - local: chapter5/fine-tuning
+    title: (번역중) How to fine-tune an ASR system with the Trainer API
+  - local: chapter5/demo
+    title: (번역중) Building a demo
+  - local: chapter5/hands_on
+    title: (번역중) Hands-on exercise
+  - local: chapter5/supplemental_reading
+    title: (번역중) Supplemental reading and resources
+
+- title: (번역중) Unit 6. From text to speech
+  sections:
+  - local: chapter6/introduction
+    title: (번역중) What you'll learn and what you'll build
+  - local: chapter6/tts_datasets
+    title: (번역중) Text-to-speech datasets
+  - local: chapter6/pre-trained_models
+    title: (번역중) Pre-trained models for text-to-speech
+  - local: chapter6/fine-tuning
+    title: (번역중) Fine-tuning SpeechT5
+  - local: chapter6/evaluation
+    title: (번역중) Evaluating text-to-speech models
+  - local: chapter6/hands_on
+    title: (번역중) Hands-on exercise
+  - local: chapter6/supplemental_reading
+    title: (번역중) Supplemental reading and resources
+
+- title: (번역중) Unit 7. Putting it all together
+  sections:
+  - local: chapter7/introduction
+    title: (번역중) What you'll learn and what you'll build
+  - local: chapter7/speech-to-speech
+    title: (번역중) Speech-to-speech translation
+  - local: chapter7/voice-assistant
+    title: (번역중) Creating a voice assistant
+  - local: chapter7/transcribe-meeting
+    title: (번역중) Transcribe a meeting
+  - local: chapter7/hands_on
+    title: (번역중) Hands-on exercise
+  - local: chapter7/supplemental_reading
+    title: (번역중) Supplemental reading and resources
+
+- title: (번역중) Unit 8. Finish line
+  sections:
+  - local: chapter8/introduction
+    title: (번역중) Congratulations!
+  - local: chapter8/certification
+    title: (번역중) Get your certificate of completion
+
 - title: 코스 이벤트
   sections:
   - local: events/introduction
diff --git a/chapters/ko/chapter6/introduction.mdx b/chapters/ko/chapter6/introduction.mdx
new file mode 100644
index 00000000..a7bb0c66
--- /dev/null
+++ b/chapters/ko/chapter6/introduction.mdx
@@ -0,0 +1,29 @@
+# Unit 6. From text to speech
+
+In the previous unit, you learned how to use Transformers to convert spoken speech into text. Now let's flip the
+script and see how you can transform a given input text into an audio output that sounds like human speech.
+
+The task we will study in this unit is called "Text-to-speech" (TTS). Models capable of converting text into audible
+human speech have a wide range of potential applications:
+
+* Assistive apps: think about tools that can leverage these models to enable visually-impaired people to access digital content through the medium of sound.
+* Audiobook narration: converting written books into audio form makes literature more accessible to individuals who prefer to listen or have difficulty with reading.
+* Virtual assistants: TTS models are a fundamental component of virtual assistants like Siri, Google Assistant, or Amazon Alexa. Once they have used a classification model to catch the wake word, and used ASR model to process your request, they can use a TTS model to respond to your inquiry.
+* Entertainment, gaming and language learning: give voice to your NPC characters, narrate game events, or help language learners with examples of correct pronunciation and intonation of words and phrases.
+
+These are just a few examples, and I am sure you can imagine many more! However, with so much power comes the responsibility,
+and it is important to highlight that TTS models have the potential to be used for malicious purposes.
+For example, with sufficient voice samples, malicious actors could potentially create convincing fake audio recordings, leading to
+the unauthorized use of someone's voice for fraudulent purposes or manipulation. If you plan to collect data for fine-tuning
+your own systems, carefully consider privacy and informed consent. Voice data should be obtained with explicit consent
+from individuals, ensuring they understand the purpose, scope, and potential risks associated with their voice being used
+in a TTS system. Please use text-to-speech responsibly.
+
+## What you'll learn and what you'll build
+
+In this unit we will talk about:
+
+* [Datasets suitable for text-to-speech training](tts_datasets)
+* [Pre-trained models for text-to-speech](pre-trained_models)
+* [Fine-tuning SpeechT5 on a new language](fine-tuning)
+* [Evaluating TTS models](evaluation)

From 96752aa8d8fdc948daa7fde9e2c726b08c59484d Mon Sep 17 00:00:00 2001
From: heuristicwave <31366038+heuristicwave@users.noreply.github.com>
Date: Thu, 2 May 2024 14:17:15 +0900
Subject: [PATCH 2/3] Revert "add: update ko_toctree"

This reverts commit 080c8969434a6b68c604bf9c5076cb503972d820.

---
 chapters/ko/_toctree.yml              | 73 ---------------------------
 chapters/ko/chapter6/introduction.mdx | 29 -----------
 2 files changed, 102 deletions(-)
 delete mode 100644 chapters/ko/chapter6/introduction.mdx

diff --git a/chapters/ko/_toctree.yml b/chapters/ko/_toctree.yml
index 3cecc0b3..ab3a6105 100644
--- a/chapters/ko/_toctree.yml
+++ b/chapters/ko/_toctree.yml
@@ -33,8 +33,6 @@
     title: 파이프라인을 이용한 오디오 분류
   - local: chapter2/asr_pipeline
     title: 파이프라인을 이용한 자동 음성 인식
-  - local: chapter2/tts_pipeline
-    title: (번역중) Audio generation with a pipeline
   - local: chapter2/hands_on
     title: 실습 과제
 
@@ -54,77 +52,6 @@
   - local: chapter3/supplemental_reading
     title: 보충자료 및 리소스
 
-- title: (번역중) Unit 4. Build a music genre classifier
-  sections:
-  - local: chapter4/introduction
-    title: (번역중) What you'll learn and what you'll build
-  - local: chapter4/classification_models
-    title: (번역중) Pre-trained models for audio classification
-  - local: chapter4/fine-tuning
-    title: (번역중) Fine-tuning a model for music classification
-  - local: chapter4/demo
-    title: (번역중) Build a demo with Gradio
-  - local: chapter4/hands_on
-    title: (번역중) Hands-on exercise
-
-- title: (번역중) Unit 5. Automatic Speech Recognition
-  sections:
-  - local: chapter5/introduction
-    title: (번역중) What you'll learn and what you'll build
-  - local: chapter5/asr_models
-    title: (번역중) Pre-trained models for speech recognition
-  - local: chapter5/choosing_dataset
-    title: (번역중) Choosing a dataset
-  - local: chapter5/evaluation
-    title: (번역중) Evaluation and metrics for speech recognition
-  - local: chapter5/fine-tuning
-    title: (번역중) How to fine-tune an ASR system with the Trainer API
-  - local: chapter5/demo
-    title: (번역중) Building a demo
-  - local: chapter5/hands_on
-    title: (번역중) Hands-on exercise
-  - local: chapter5/supplemental_reading
-    title: (번역중) Supplemental reading and resources
-
-- title: (번역중) Unit 6. From text to speech
-  sections:
-  - local: chapter6/introduction
-    title: (번역중) What you'll learn and what you'll build
-  - local: chapter6/tts_datasets
-    title: (번역중) Text-to-speech datasets
-  - local: chapter6/pre-trained_models
-    title: (번역중) Pre-trained models for text-to-speech
-  - local: chapter6/fine-tuning
-    title: (번역중) Fine-tuning SpeechT5
-  - local: chapter6/evaluation
-    title: (번역중) Evaluating text-to-speech models
-  - local: chapter6/hands_on
-    title: (번역중) Hands-on exercise
-  - local: chapter6/supplemental_reading
-    title: (번역중) Supplemental reading and resources
-
-- title: (번역중) Unit 7. Putting it all together
-  sections:
-  - local: chapter7/introduction
-    title: (번역중) What you'll learn and what you'll build
-  - local: chapter7/speech-to-speech
-    title: (번역중) Speech-to-speech translation
-  - local: chapter7/voice-assistant
-    title: (번역중) Creating a voice assistant
-  - local: chapter7/transcribe-meeting
-    title: (번역중) Transcribe a meeting
-  - local: chapter7/hands_on
-    title: (번역중) Hands-on exercise
-  - local: chapter7/supplemental_reading
-    title: (번역중) Supplemental reading and resources
-
-- title: (번역중) Unit 8. Finish line
-  sections:
-  - local: chapter8/introduction
-    title: (번역중) Congratulations!
-  - local: chapter8/certification
-    title: (번역중) Get your certificate of completion
-
 - title: 코스 이벤트
   sections:
   - local: events/introduction
diff --git a/chapters/ko/chapter6/introduction.mdx b/chapters/ko/chapter6/introduction.mdx
deleted file mode 100644
index a7bb0c66..00000000
--- a/chapters/ko/chapter6/introduction.mdx
+++ /dev/null
@@ -1,29 +0,0 @@
-# Unit 6. From text to speech
-
-In the previous unit, you learned how to use Transformers to convert spoken speech into text. Now let's flip the
-script and see how you can transform a given input text into an audio output that sounds like human speech.
-
-The task we will study in this unit is called "Text-to-speech" (TTS). Models capable of converting text into audible
-human speech have a wide range of potential applications:
-
-* Assistive apps: think about tools that can leverage these models to enable visually-impaired people to access digital content through the medium of sound.
-* Audiobook narration: converting written books into audio form makes literature more accessible to individuals who prefer to listen or have difficulty with reading.
-* Virtual assistants: TTS models are a fundamental component of virtual assistants like Siri, Google Assistant, or Amazon Alexa. Once they have used a classification model to catch the wake word, and used ASR model to process your request, they can use a TTS model to respond to your inquiry.
-* Entertainment, gaming and language learning: give voice to your NPC characters, narrate game events, or help language learners with examples of correct pronunciation and intonation of words and phrases.
-
-These are just a few examples, and I am sure you can imagine many more! However, with so much power comes the responsibility,
-and it is important to highlight that TTS models have the potential to be used for malicious purposes.
-For example, with sufficient voice samples, malicious actors could potentially create convincing fake audio recordings, leading to
-the unauthorized use of someone's voice for fraudulent purposes or manipulation. If you plan to collect data for fine-tuning
-your own systems, carefully consider privacy and informed consent. Voice data should be obtained with explicit consent
-from individuals, ensuring they understand the purpose, scope, and potential risks associated with their voice being used
-in a TTS system. Please use text-to-speech responsibly.
-
-## What you'll learn and what you'll build
-
-In this unit we will talk about:
-
-* [Datasets suitable for text-to-speech training](tts_datasets)
-* [Pre-trained models for text-to-speech](pre-trained_models)
-* [Fine-tuning SpeechT5 on a new language](fine-tuning)
-* [Evaluating TTS models](evaluation)

From 6e56443bf13ff940455a7b413425e99baa0934b3 Mon Sep 17 00:00:00 2001
From: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>
Date: Thu, 2 May 2024 14:24:42 +0900
Subject: [PATCH 3/3] docs: ko: _toctree.yml

---
 chapters/ko/_toctree.yml | 73 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/chapters/ko/_toctree.yml b/chapters/ko/_toctree.yml
index ab3a6105..145f4245 100644
--- a/chapters/ko/_toctree.yml
+++ b/chapters/ko/_toctree.yml
@@ -33,6 +33,8 @@
     title: 파이프라인을 이용한 오디오 분류
   - local: chapter2/asr_pipeline
     title: 파이프라인을 이용한 자동 음성 인식
+  - local: chapter2/tts_pipeline
+    title: (번역 중) Audio generation with a pipeline
   - local: chapter2/hands_on
     title: 실습 과제
 
@@ -52,6 +54,77 @@
   - local: chapter3/supplemental_reading
     title: 보충자료 및 리소스
 
+- title: (번역 중) Unit 4. Build a music genre classifier
+  sections:
+  - local: chapter4/introduction
+    title: (번역 중) What you'll learn and what you'll build
+  - local: chapter4/classification_models
+    title: (번역 중) Pre-trained models for audio classification
+  - local: chapter4/fine-tuning
+    title: (번역 중) Fine-tuning a model for music classification
+  - local: chapter4/demo
+    title: (번역 중) Build a demo with Gradio
+  - local: chapter4/hands_on
+    title: (번역 중) Hands-on exercise
+
+- title: (번역 중) Unit 5. Automatic Speech Recognition
+  sections:
+  - local: chapter5/introduction
+    title: (번역 중) What you'll learn and what you'll build
+  - local: chapter5/asr_models
+    title: (번역 중) Pre-trained models for speech recognition
+  - local: chapter5/choosing_dataset
+    title: (번역 중) Choosing a dataset
+  - local: chapter5/evaluation
+    title: (번역 중) Evaluation and metrics for speech recognition
+  - local: chapter5/fine-tuning
+    title: (번역 중) How to fine-tune an ASR system with the Trainer API
+  - local: chapter5/demo
+    title: (번역 중) Building a demo
+  - local: chapter5/hands_on
+    title: (번역 중) Hands-on exercise
+  - local: chapter5/supplemental_reading
+    title: (번역 중) Supplemental reading and resources
+#
+- title: (번역 중) Unit 6. From text to speech
+  sections:
+  - local: chapter6/introduction
+    title: (번역 중) What you'll learn and what you'll build
+  - local: chapter6/tts_datasets
+    title: (번역 중) Text-to-speech datasets
+  - local: chapter6/pre-trained_models
+    title: (번역 중) Pre-trained models for text-to-speech
+  - local: chapter6/fine-tuning
+    title: (번역 중) Fine-tuning SpeechT5
+  - local: chapter6/evaluation
+    title: (번역 중) Evaluating text-to-speech models
+  - local: chapter6/hands_on
+    title: (번역 중) Hands-on exercise
+  - local: chapter6/supplemental_reading
+    title: (번역 중) Supplemental reading and resources
+
+- title: (번역 중) Unit 7. Putting it all together
+  sections:
+  - local: chapter7/introduction
+    title: (번역 중) What you'll learn and what you'll build
+  - local: chapter7/speech-to-speech
+    title: (번역 중) Speech-to-speech translation
+  - local: chapter7/voice-assistant
+    title: (번역 중) Creating a voice assistant
+  - local: chapter7/transcribe-meeting
+    title: (번역 중) Transcribe a meeting
+  - local: chapter7/hands_on
+    title: (번역 중) Hands-on exercise
+  - local: chapter7/supplemental_reading
+    title: (번역 중) Supplemental reading and resources
+
+- title: (번역 중) Unit 8. Finish line
+  sections:
+  - local: chapter8/introduction
+    title: (번역 중) Congratulations!
+  - local: chapter8/certification
+    title: (번역 중) Get your certificate of completion
+
 - title: 코스 이벤트
   sections:
   - local: events/introduction