SRC_LANG = "de"
DST_LANG = "en"
DATA_DIR = "data"
SRC_FILE = "die Stadt.txt"
AUDIO_DIR = "data/audio"
Introduction
This notebook demonstrates how to create Anki flashcards from a list of words. The example uses a list of German words related to “Die Stadt” (The City) and translates them into English. It also generates audio files for the words using Google Text-to-Speech.
Install Required Libraries
!pip install pandas googletrans gtts genanki
Set Parameters
Load Words List
Here we load the list of words from a text file. The file should contain one word per line, and we will remove any empty lines.
import pandas as pd

# Read the word list; utf-8 is specified so umlauts and ß are decoded consistently
with open(f"{DATA_DIR}/{SRC_FILE}", "r", encoding="utf-8") as f:
    lines = f.readlines()
lines = [line.strip() for line in lines if line.strip()]  # Remove empty lines

df = pd.DataFrame(lines, columns=["Word"])
df.tail(5)
| | Word |
|---|---|
| 113 | das Kreuzfahrtschiff / die Kreuzfahrtschiffe |
| 114 | zu Fuß |
| 115 | die Fahrkarte / die Fahrkarten |
| 116 | der Fahrplan / die Fahrpläne |
| 117 | die Endstation / die Endstationen |
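The loading step can also be wrapped in a small helper that is easy to test on its own. The following is a minimal sketch using only the standard library; the name load_words and the sample file contents are illustrative and not part of the original notebook, and UTF-8 is an assumption about how the word list was saved.

```python
import os
import tempfile


def load_words(path: str) -> list[str]:
    """Read one entry per line, dropping blank lines and surrounding whitespace."""
    # utf-8 is an assumption about how the word list was saved
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Demonstration with a throwaway file (contents are made up for illustration)
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("die Stadt / die Städte\n\n  zu Fuß  \n")
    words = load_words(sample_path)

print(words)
```

The returned list can be passed to pd.DataFrame(words, columns=["Word"]) exactly as in the cell above.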
In this particular example, the entries are in the format “Word / Plural”. For example, “die Stadt / die Städte” means “the city / the cities (plural)”. Let’s split the singular and plural forms into separate rows.
df["Word"] = df["Word"].str.split(" / ")
df = df.explode("Word").reset_index(drop=True)
df.tail(5)
| | Word |
|---|---|
| 228 | die Fahrkarten |
| 229 | der Fahrplan |
| 230 | die Fahrpläne |
| 231 | die Endstation |
| 232 | die Endstationen |
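To make the split-and-explode step easier to follow, here is a plain-Python sketch of the same idea. The helper name split_forms is hypothetical; it only illustrates the transformation and is not how pandas implements it.

```python
def split_forms(entries: list[str], sep: str = " / ") -> list[str]:
    """Flatten 'singular / plural' entries into one form per item."""
    forms = []
    for entry in entries:
        # Entries without the separator (e.g. "zu Fuß") pass through unchanged
        forms.extend(part.strip() for part in entry.split(sep))
    return forms


forms = split_forms(["die Stadt / die Städte", "zu Fuß"])
print(forms)
```

Entries such as “zu Fuß” that have no plural form stay as a single row, which is why the row count after the split is not exactly double the original.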
Translate Words
We will use the googletrans library to translate the words from German to English. The library provides an asynchronous interface for translation, which is useful for bulk processing.
from googletrans import Translator

async def translate_bulk(texts: list):
    async with Translator() as translator:
        translations = await translator.translate(texts, src=SRC_LANG, dest=DST_LANG)
        return [translation.text for translation in translations]

await translate_bulk(
    ["die Polizei", "die Motorräder", "die Krankenhäuser", "die Bahnhöfe", "die Flugzeuge"]
)
['the police',
'The motorcycles',
'The hospitals',
'The train stations',
'The aircraft']
In the next code chunk, we will create the list of words to translate and then apply the translate_bulk function to it.
texts = df["Word"].to_list()
translations = await translate_bulk(texts)
translations[:5]
['the city', 'the cities', 'the village', 'The villages', 'the street']
The googletrans library relies on an undocumented Google Translate API, which may lead to rate limiting or blocking. For a short word list like this one it will hopefully work; if you encounter issues, consider using a paid translation service or API.
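One way to reduce the chance of hitting rate limits is to send the words in smaller batches and retry a failed batch after a pause. The sketch below is an assumption about what might help, not a documented googletrans feature: translate_in_batches and its parameters are hypothetical, and translate_fn stands for any coroutine function that takes a list of strings and returns a list of strings, such as translate_bulk. A stand-in translator is used so the example runs offline.

```python
import asyncio


async def translate_in_batches(texts, translate_fn, batch_size=50, retries=3, delay=1.0):
    """Call an async bulk-translation function on small batches, retrying failures."""
    results = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(retries):
            try:
                results.extend(await translate_fn(batch))
                break
            except Exception:
                if attempt == retries - 1:
                    raise  # give up after the last attempt
                # Exponential backoff: delay, 2 * delay, 4 * delay, ...
                await asyncio.sleep(delay * 2 ** attempt)
    return results


async def fake_translate(batch):
    # Stand-in for a real translator so the sketch runs without network access
    return [text.upper() for text in batch]


demo = asyncio.run(translate_in_batches(["a", "b", "c"], fake_translate, batch_size=2))
print(demo)
```

Inside a notebook, where an event loop is already running, the call would be written as await translate_in_batches(texts, translate_bulk) instead of using asyncio.run.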
Next, we will add the translations to the DataFrame.
df["Translation"] = translations
Generate Audio Files
We will use the gtts library to generate audio files for the words. This library uses the Google Text-to-Speech API and is free to use. The generated audio files will be saved in the AUDIO_DIR directory, with filenames derived from a hash of the word.
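from gtts import gTTS
import os
import hashlib

def gen_audio(sentence):
    # Short, stable filename: 6 bytes of SHAKE-128 output = 12 hex characters
    h = hashlib.shake_128(sentence.encode()).hexdigest(6)
    filename = f"{h}.mp3"
    path = os.path.join(AUDIO_DIR, filename)
    os.makedirs(AUDIO_DIR, exist_ok=True)  # Make sure the output directory exists
    if os.path.exists(path):
        return filename  # Reuse audio generated in an earlier run
    try:
        gTTS(text=sentence, lang=SRC_LANG, slow=True).save(path)
    except Exception:
        return None  # Signal that no audio could be generated
    return filename

sample_audio = gen_audio("die Krankenhäuser")
print(f"Audio file saved as: {sample_audio}")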
Audio file saved as: 308d817b87e2.mp3
The gen_audio function generates a hash of the word and checks whether the audio file already exists in the AUDIO_DIR directory. If it does, the function returns the filename. If not, it generates the audio file, saves it, and returns the filename; if generation fails, it returns None.
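The filename derivation can be looked at in isolation. The sketch below repeats the hashing step from gen_audio in a separate helper (the name audio_filename is illustrative) to show that the same text always maps to the same filename, which is what makes the cache check possible.

```python
import hashlib


def audio_filename(sentence: str, digest_bytes: int = 6) -> str:
    """Derive a stable .mp3 filename from the text, as gen_audio does."""
    # shake_128 is an extendable-output hash; 6 bytes give 12 hex characters
    digest = hashlib.shake_128(sentence.encode()).hexdigest(digest_bytes)
    return f"{digest}.mp3"


first = audio_filename("die Stadt")
second = audio_filename("die Stadt")
other = audio_filename("die Städte")
print(first == second, first != other, len(first))
```

Six bytes give 48 bits of hash output, so accidental collisions are very unlikely for a vocabulary list of a few hundred words; a larger digest_bytes value could be used for bigger collections.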
Add Audio File Paths to DataFrame
df["Audio"] = df["Word"].apply(gen_audio)
Shuffle DataFrame
We will shuffle the DataFrame to randomize the order of the flashcards.
df = df.sample(frac=1).reset_index(drop=True)
Set Up Anki Deck and Model
We will use the genanki library to create an Anki deck and model. The model defines the structure of the flashcards, while the deck contains the flashcards themselves. To get unique IDs for the model and deck, we will use random numbers, setting a seed for reproducibility.
import random

random.seed(42)

MODEL_NAME = "Vocabulary"
DECK_NAME = "Die Stadt"
MODEL_ID = random.randrange(1 << 30, 1 << 31)
DECK_ID = random.randrange(1 << 30, 1 << 31)
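Because the seed is fixed, the two random draws produce the same IDs for every deck built this way, whatever its name. If several different decks are generated from this notebook, one possible alternative is to derive each ID from the name itself. This is a sketch of that idea, not something the notebook or genanki prescribes, and stable_id is a hypothetical helper.

```python
import hashlib


def stable_id(name: str) -> int:
    """Map a name to an integer in [1 << 30, 1 << 31) that never changes between runs."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Take the first 4 bytes and fold them into the range the notebook draws from
    return (1 << 30) + int.from_bytes(digest[:4], "big") % (1 << 30)


deck_id = stable_id("Die Stadt")
model_id = stable_id("Vocabulary")
print(deck_id, model_id)
```

With this approach, a deck named “Die Stadt” keeps the same ID across runs, while a deck with a different name gets a different one.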
Add Cards to Anki Deck
Next, we will create a model for the flashcards and add the cards to the Anki deck.
import genanki

my_model = genanki.Model(
    MODEL_ID,
    MODEL_NAME,
    fields=[
        {"name": "Question"},
        {"name": "Answer"},
        {"name": "Audio"},
    ],
    templates=[
        {
            "name": "{{Question}}",
            "qfmt": '<div class="head">{{Question}}</div>',
            "afmt": '<div class="head">{{Question}}</div><hr id="answer"> \
                     <div class="head">{{Answer}}</div> {{Audio}}',
        },
    ],
    css="""
        .head {font-size: x-large;}
        .spot {text-decoration: underline;}
        .sentence {font-style: italic; font-size: normal!important;}
    """,
)

my_deck = genanki.Deck(
    DECK_ID,
    DECK_NAME,
)

# The English translation is the question; the German word and its audio are the answer
for i, row in df.iterrows():
    my_note = genanki.Note(
        model=my_model,
        fields=[
            row["Translation"],
            row["Word"],
            f"[sound:{row['Audio']}]",
        ],
    )
    my_deck.add_note(my_note)
Export Anki Deck
Finally, we will export the Anki deck to a file. The file will be saved in the DATA_DIR directory with the name Die Stadt.apkg. The audio files will be included in the package.
my_package = genanki.Package(my_deck)
my_package.media_files = [f"{AUDIO_DIR}/{filename}" for filename in df["Audio"].values]
my_package.write_to_file(f"{DATA_DIR}/{DECK_NAME}.apkg")
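Since gen_audio returns None when audio generation fails, the Audio column may contain entries that do not correspond to a file. A defensive way to build the media list is to keep only the files that actually exist. The helper below is a hypothetical sketch using the standard library; existing_media is not part of genanki.

```python
import os
import tempfile


def existing_media(audio_dir: str, filenames) -> list[str]:
    """Return paths for the audio files that were actually generated."""
    paths = []
    for name in filenames:
        # gen_audio returns None on failure; skip anything that is not a filename
        if not isinstance(name, str):
            continue
        path = os.path.join(audio_dir, name)
        if os.path.isfile(path):
            paths.append(path)
    return paths


# Demonstration with a throwaway directory containing one placeholder file
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.mp3"), "wb").close()
    found = existing_media(tmp, ["a.mp3", None, "missing.mp3"])
    found_names = [os.path.basename(p) for p in found]

print(found_names)
```

In the notebook this would be used as my_package.media_files = existing_media(AUDIO_DIR, df["Audio"].values). Cards whose audio is missing would still carry a sound tag pointing at a nonexistent file, so those rows may need to be regenerated or dropped first.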
Conclusion
In this notebook, we learned how to create Anki flashcards from a list of words. We used the googletrans library to translate the words from German to English and the gtts library to generate audio files for the words. Finally, we used the genanki library to create an Anki deck and export it to a file. The generated deck can be imported into the Anki app and used for studying the vocabulary.
Next Steps
In the next notebook we will create Anki flashcards from an arbitrary document like an article or a book.