SRC_LANG = "de"
DST_LANG = "en"
DATA_DIR = "data"
SRC_FILE = "die Stadt.txt"
AUDIO_DIR = "data/audio"
Introduction
This notebook demonstrates how to create Anki flashcards from a list of words. The example uses a list of German words related to “Die Stadt” (The City) and translates them into English. It also generates audio files for the words using Google Text-to-Speech.
Install Required Libraries
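!pip install pandas googletrans gtts genanki
Set Parameters
The parameters used throughout the notebook (SRC_LANG, DST_LANG, DATA_DIR, SRC_FILE, AUDIO_DIR) are defined in the cell at the top.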
Load Words List
Here we load the list of words from a text file. The file should contain one word per line, and we will remove any empty lines.
import pandas as pd
# Read the word list as UTF-8 so that umlauts and ß are decoded correctly
with open(f"{DATA_DIR}/{SRC_FILE}", "r", encoding="utf-8") as f:
    lines = f.readlines()
lines = [line.strip() for line in lines if line.strip()]  # Remove empty lines
df = pd.DataFrame(lines, columns=["Word"])
df.tail(5)
| | Word |
|---|---|
| 113 | das Kreuzfahrtschiff / die Kreuzfahrtschiffe |
| 114 | zu Fuß |
| 115 | die Fahrkarte / die Fahrkarten |
| 116 | der Fahrplan / die Fahrpläne |
| 117 | die Endstation / die Endstationen |
In this example, each entry has the format “Word / Plural”. For example, “die Stadt / die Städte” means “the city / the cities”. Let’s split the singular and plural forms into separate rows.
df["Word"] = df["Word"].str.split(" / ")
df = df.explode("Word").reset_index(drop=True)
df.tail(5)
| | Word |
|---|---|
| 228 | die Fahrkarten |
| 229 | der Fahrplan |
| 230 | die Fahrpläne |
| 231 | die Endstation |
| 232 | die Endstationen |
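To make the split-and-explode step easier to follow, here is a minimal sketch on a two-row frame. The name demo is hypothetical and used only for illustration; the entries come from the word list above.

```python
import pandas as pd

# Tiny stand-in for the real word list: one entry with a plural, one without
demo = pd.DataFrame({"Word": ["die Stadt / die Städte", "zu Fuß"]})

# str.split turns each cell into a list of forms
demo["Word"] = demo["Word"].str.split(" / ")

# explode creates one row per list element; reset_index renumbers the rows
demo = demo.explode("Word").reset_index(drop=True)

print(demo["Word"].to_list())
# ['die Stadt', 'die Städte', 'zu Fuß']
```

Entries without a plural form, such as “zu Fuß”, pass through unchanged as a single row.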
Translate Words
We will use the googletrans library to translate the words from German to English. The library provides an asynchronous interface for translation, which is useful for bulk processing.
from googletrans import Translator
async def translate_bulk(texts: list):
    async with Translator() as translator:
        translations = await translator.translate(texts, src=SRC_LANG, dest=DST_LANG)
    return [translation.text for translation in translations]
await translate_bulk(
    ["die Polizei", "die Motorräder", "die Krankenhäuser", "die Bahnhöfe", "die Flugzeuge"]
)
['the police',
 'The motorcycles',
 'The hospitals',
 'The train stations',
 'The aircraft']
In the next code cell, we create the list of words to translate and apply the translate_bulk function to it.
texts = df["Word"].to_list()
translations = await translate_bulk(texts)
translations[:5]
['the city', 'the cities', 'the village', 'The villages', 'the street']
The googletrans library relies on an undocumented Google Translate API, so requests may be rate-limited or blocked. It worked for the word list in this example, but this is not guaranteed. If you encounter issues, consider using a paid translation service or API.
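One way to soften the impact of rate limiting is to send the words in smaller chunks and retry a failed chunk after a pause. The sketch below is an assumption, not part of googletrans: translate_in_chunks, chunk_size, max_retries, and base_delay are hypothetical names, and the helper accepts any async function that maps a list of strings to a list of strings.

```python
import asyncio
from typing import Awaitable, Callable, List


async def translate_in_chunks(
    texts: List[str],
    translate_fn: Callable[[List[str]], Awaitable[List[str]]],
    chunk_size: int = 50,
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> List[str]:
    """Hypothetical helper: translate texts chunk by chunk with retries."""
    results: List[str] = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        for attempt in range(max_retries):
            try:
                results.extend(await translate_fn(chunk))
                break  # chunk succeeded, move on to the next one
            except Exception:
                if attempt == max_retries - 1:
                    raise  # give up after the last attempt
                # Exponential backoff: base_delay, 2x, 4x, ...
                await asyncio.sleep(base_delay * 2 ** attempt)
    return results
```

In this notebook it could be called as await translate_in_chunks(texts, translate_bulk). Retrying does not guarantee success if the service blocks the client entirely.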
Next, we will add the translations to the DataFrame.
df["Translation"] = translations
Generate Audio Files
We will use the gtts library to generate audio files for the words. The library interfaces with Google Translate’s text-to-speech API and is free to use. The generated audio files are saved in the AUDIO_DIR directory, and each filename is derived from a hash of the word.
from gtts import gTTS
import os
import hashlib
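def gen_audio(sentence):
    # Name the file after a short hash of the text so repeated runs reuse it
    h = hashlib.shake_128(sentence.encode()).hexdigest(6)
    filename = f"{h}.mp3"
    os.makedirs(AUDIO_DIR, exist_ok=True)  # Make sure the output directory exists
    path = os.path.join(AUDIO_DIR, filename)
    if os.path.exists(path):
        return filename  # Already generated earlier
    try:
        gTTS(text=sentence, lang=SRC_LANG, slow=True).save(path)
    except Exception:
        return None  # Generation failed; the caller must handle a missing file
    return filename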
sample_audio = gen_audio("die Krankenhäuser")
print(f"Audio file saved as: {sample_audio}")
Audio file saved as: 308d817b87e2.mp3
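The gen_audio function hashes the word to derive a filename and checks whether that file already exists in AUDIO_DIR. If it does, the function returns the filename without regenerating the audio. Otherwise, it generates the audio file, saves it, and returns the filename; if generation fails, it returns None.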
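The filename scheme can be tried in isolation, without calling the text-to-speech service. The sketch below repeats only the hashing step; audio_filename and n_bytes are hypothetical names introduced for illustration.

```python
import hashlib


def audio_filename(text: str, n_bytes: int = 6) -> str:
    """Hypothetical helper: derive a short, stable .mp3 filename from text."""
    # shake_128 is a variable-length hash; hexdigest(6) yields 6 bytes,
    # which is 12 hexadecimal characters
    digest = hashlib.shake_128(text.encode("utf-8")).hexdigest(n_bytes)
    return f"{digest}.mp3"


name = audio_filename("die Krankenhäuser")
print(len(name))                                     # 12 hex characters + ".mp3"
print(name == audio_filename("die Krankenhäuser"))   # same text, same name
```

Because the name depends only on the text, rerunning the notebook finds the existing files instead of requesting the audio again. With 6 bytes, a collision between two different words is very unlikely for a list of a few hundred entries, although it is not impossible.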
Add Audio File Paths to DataFrame
df["Audio"] = df["Word"].apply(gen_audio)
Shuffle DataFrame
We will shuffle the DataFrame to randomize the order of the flashcards.
df = df.sample(frac=1).reset_index(drop=True)
Set Up Anki Deck and Model
We will use the genanki library to create an Anki deck and model. The model defines the structure of the flashcards, while the deck contains the flashcards themselves. To get unique IDs for the model and deck, we will use random numbers. We set a seed so that the same IDs are generated on every run.
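As a small aside on why the seed matters, the sketch below shows that a seeded generator yields the same pair of IDs every time. make_ids is a hypothetical helper, and it uses a local random.Random instance rather than the global seed used in the notebook cell that follows.

```python
import random
from typing import Tuple


def make_ids(seed: int) -> Tuple[int, int]:
    """Hypothetical helper: derive a (model_id, deck_id) pair from a seed."""
    rng = random.Random(seed)  # local generator; global random state is untouched
    model_id = rng.randrange(1 << 30, 1 << 31)
    deck_id = rng.randrange(1 << 30, 1 << 31)
    return model_id, deck_id


first = make_ids(42)
second = make_ids(42)
print(first == second)  # True: the same seed gives the same IDs
```

Stable IDs are useful if the deck is regenerated and imported again, since the IDs stay the same across runs. An alternative, which the genanki documentation suggests, is to generate the numbers once and hardcode them.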
import random
random.seed(42)
MODEL_NAME = "Vocabulary"
DECK_NAME = "Die Stadt"
MODEL_ID = random.randrange(1 << 30, 1 << 31)
DECK_ID = random.randrange(1 << 30, 1 << 31)
Add Cards to Anki Deck
Next, we will create a model for the flashcards and add the cards to the Anki deck.
import genanki
my_model = genanki.Model(
    MODEL_ID,
    MODEL_NAME,
    fields=[
        {"name": "Question"},
        {"name": "Answer"},
        {"name": "Audio"},
    ],
    templates=[
        {
            "name": "{{Question}}",
            "qfmt": '<div class="head">{{Question}}</div>',
            "afmt": (
                '<div class="head">{{Question}}</div><hr id="answer"> '
                '<div class="head">{{Answer}}</div> {{Audio}}'
            ),
        },
    ],
    css="""
.head {font-size: x-large;}
.spot {text-decoration: underline;}
.sentence {font-style: italic; font-size: normal!important;}
""",
)
my_deck = genanki.Deck(
    DECK_ID,
    DECK_NAME,
)
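for _, row in df.iterrows():
    # Leave the Audio field empty when no audio file was generated
    audio_field = f"[sound:{row['Audio']}]" if pd.notna(row["Audio"]) else ""
    my_note = genanki.Note(
        model=my_model,
        fields=[
            row["Translation"],
            row["Word"],
            audio_field,
        ],
    )
    my_deck.add_note(my_note)
Export Anki Deck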
Finally, we will export the Anki deck to a file. The file will be saved in the DATA_DIR directory with the name Die Stadt.apkg. The audio files will be included in the package.
my_package = genanki.Package(my_deck)
# Skip rows for which audio generation failed (Audio is None)
my_package.media_files = [f"{AUDIO_DIR}/{filename}" for filename in df["Audio"].dropna()]
my_package.write_to_file(f"{DATA_DIR}/{DECK_NAME}.apkg")
Conclusion
In this notebook, we learned how to create Anki flashcards from a list of words. We used the googletrans library to translate the words from German to English and the gtts library to generate audio files for the words. Finally, we used the genanki library to create an Anki deck and export it to a file. The generated deck can be imported into the Anki app and used for studying the vocabulary.
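Before importing, the exported package can be sanity-checked. An .apkg file is a zip archive, so its entries can be listed with the standard library. The sketch below is a minimal check under that assumption; list_apkg_contents is a hypothetical helper, not part of genanki.

```python
import zipfile
from typing import List


def list_apkg_contents(path: str) -> List[str]:
    """Hypothetical helper: return the entry names stored in an .apkg archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

For the deck built here, it could be called as list_apkg_contents(f"{DATA_DIR}/{DECK_NAME}.apkg"). If the audio was packaged, the archive should contain entries beyond the collection database itself.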
Next Steps
In the next notebook, we will create Anki flashcards from an arbitrary document such as an article or a book.