Creating Anki Flashcards From a List of Words

Introduction

This notebook demonstrates how to create Anki flashcards from a list of words. The example uses a list of German words related to “Die Stadt” (The City) and translates them into English. It also generates audio files for the words using Google Text-to-Speech.

Install Required Libraries

!pip install pandas googletrans gtts genanki

Set Parameters

SRC_LANG = "de"
DST_LANG = "en"
DATA_DIR = "data"
SRC_FILE = "die Stadt.txt"
AUDIO_DIR = "data/audio"

Load Words List

Here we load the list of words from a text file. The file should contain one word per line, and we will remove any empty lines.

import pandas as pd

# Read the file as UTF-8 so that umlauts and "ß" are decoded correctly
with open(f"{DATA_DIR}/{SRC_FILE}", "r", encoding="utf-8") as f:
    lines = f.readlines()
lines = [line.strip() for line in lines if line.strip()]  # Remove empty lines

df = pd.DataFrame(lines, columns=["Word"])
df.tail(5)

	Word
113	das Kreuzfahrtschiff / die Kreuzfahrtschiffe
114	zu Fuß
115	die Fahrkarte / die Fahrkarten
116	der Fahrplan / die Fahrpläne
117	die Endstation / die Endstationen

In this particular example, the words are in the format “Word / Plural”. For example, “die Stadt / die Städte” means “the city / the cities (plural)”. Let’s split the singular and plural forms into separate rows.

df["Word"] = df["Word"].str.split(" / ")
df = df.explode("Word").reset_index(drop=True)
df.tail(5)

	Word
228	die Fahrkarten
229	der Fahrplan
230	die Fahrpläne
231	die Endstation
232	die Endstationen
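
To make the effect of the split followed by explode easier to see, here is a minimal plain-Python sketch of the same transformation. The helper name split_forms is hypothetical and not part of the notebook. It splits on "/" and strips whitespace, which is slightly more tolerant of irregular spacing than splitting on " / ".

```python
def split_forms(entries, sep="/"):
    """Split each 'singular / plural' entry into separate items."""
    words = []
    for entry in entries:
        # Entries without a separator (e.g. "zu Fuß") yield a single item
        for form in entry.split(sep):
            form = form.strip()
            if form:
                words.append(form)
    return words


sample = ["die Stadt / die Städte", "zu Fuß", "die Fahrkarte / die Fahrkarten"]
print(split_forms(sample))
# ['die Stadt', 'die Städte', 'zu Fuß', 'die Fahrkarte', 'die Fahrkarten']
```

Entries with a single form pass through unchanged, which is why the row count after the split is less than twice the original.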

Translate Words

We will use the googletrans library to translate the words from German to English. The library provides an asynchronous interface for translation, which is useful for bulk processing.

from googletrans import Translator

async def translate_bulk(texts: list):
    async with Translator() as translator:
        translations = await translator.translate(texts, src=SRC_LANG, dest=DST_LANG)
        return [translation.text for translation in translations]


await translate_bulk(
    ["die Polizei", "die Motorräder", "die Krankenhäuser", "die Bahnhöfe", "die Flugzeuge"]
)

['the police',
 'The motorcycles',
 'The hospitals',
 'The train stations',
 'The aircraft']

In the next chunk of code, we will create a list of the words to translate and then apply the translate_bulk function to it.

texts = df["Word"].to_list()

translations = await translate_bulk(texts)

translations[:5]

['the city', 'the cities', 'the village', 'The villages', 'the street']
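
The sample output mixes “the” and “The” at the start of the translations. If a consistent style is preferred on the cards, one option is a small post-processing step. The sketch below uses a hypothetical helper, normalize_article, and assumes that only a leading “The ” needs to be lowercased.

```python
def normalize_article(text):
    """Lowercase a leading 'The ' so all translations start the same way."""
    if text.startswith("The "):
        return "the " + text[4:]
    return text


sample = ["the police", "The motorcycles", "The hospitals"]
print([normalize_article(t) for t in sample])
# ['the police', 'the motorcycles', 'the hospitals']
```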

The googletrans library relies on an undocumented Google Translate API, so requests may be rate-limited or blocked. For a word list of this size it will hopefully work. If you encounter issues, consider using a paid translation service or API.
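
One way to reduce the risk of hitting rate limits is to send the words in smaller chunks and retry a failed chunk after a pause. The following is a minimal sketch of that idea, not a feature of googletrans. The function name translate_in_chunks and the parameters chunk_size, retries, and base_delay are assumptions chosen for illustration, and translate_fn stands for any async function that maps a list of strings to a list of strings.

```python
import asyncio


async def translate_in_chunks(texts, translate_fn, chunk_size=20, retries=3, base_delay=1.0):
    """Translate texts chunk by chunk, retrying a failed chunk with backoff."""
    results = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        for attempt in range(retries):
            try:
                results.extend(await translate_fn(chunk))
                break
            except Exception:
                if attempt == retries - 1:
                    raise  # Give up after the last attempt
                # Wait 1x, 2x, 4x ... the base delay before retrying
                await asyncio.sleep(base_delay * 2 ** attempt)
    return results
```

In the notebook, this could be called as `await translate_in_chunks(texts, translate_bulk)`. Whether chunking actually avoids blocking depends on the service, so treat the default values as starting points.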

Next, we will add the translations to the DataFrame.

df["Translation"] = translations

Generate Audio Files

We will use the gtts library to generate audio files for the words. The library is free to use and relies on Google Translate’s text-to-speech API. The generated audio files will be saved in the AUDIO_DIR directory, and each filename will be derived from a hash of the word.

from gtts import gTTS
import os
import hashlib

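def gen_audio(sentence):
    """Generate an MP3 for the sentence and return its filename (None on failure)."""
    os.makedirs(AUDIO_DIR, exist_ok=True)  # Make sure the output directory exists
    h = hashlib.shake_128(sentence.encode()).hexdigest(6)
    filename = f"{h}.mp3"
    path = os.path.join(AUDIO_DIR, filename)
    if os.path.exists(path):  # Reuse audio generated on a previous run
        return filename
    try:
        gTTS(text=sentence, lang=SRC_LANG, slow=True).save(path)
    except Exception as e:
        print(f"Audio generation failed for {sentence!r}: {e}")
        return None
    return filename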


sample_audio = gen_audio("die Krankenhäuser")
print(f"Audio file saved as: {sample_audio}")

Audio file saved as: 308d817b87e2.mp3

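The gen_audio function generates a hash of the word and checks whether the corresponding audio file already exists in AUDIO_DIR. If it does, the function returns the filename without calling the API again. If not, it generates the audio file and saves it. If generation fails, the function returns None.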
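
The filename scheme can be looked at in isolation. The sketch below reproduces only the hashing step. The helper name audio_filename is hypothetical. It shows that the same text always maps to the same filename, which is what makes the cache check possible.

```python
import hashlib


def audio_filename(sentence, digest_bytes=6):
    """Return a deterministic .mp3 filename derived from the text."""
    # shake_128 is an extendable-output hash: hexdigest(n) returns n bytes
    # encoded as 2 * n hexadecimal characters
    digest = hashlib.shake_128(sentence.encode("utf-8")).hexdigest(digest_bytes)
    return f"{digest}.mp3"


name = audio_filename("die Krankenhäuser")
print(len(name))  # 16: 12 hex characters plus ".mp3"
print(name == audio_filename("die Krankenhäuser"))  # True: same text, same filename
```

Six bytes give 48 bits of hash output, so collisions are very unlikely for a list of a few hundred words.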

Add Audio File Paths to DataFrame

df["Audio"] = df["Word"].apply(gen_audio)

Shuffle DataFrame

We will shuffle the DataFrame to randomize the order of the flashcards.

df = df.sample(frac=1).reset_index(drop=True)

Set Up Anki Deck and Model

We will use the genanki library to create an Anki deck and model. The model defines the structure of the flashcards, while the deck contains the flashcards themselves. To get unique IDs for the model and deck, we will use random numbers, and we will set a seed so that the same IDs are generated on every run.

import random

random.seed(42)

MODEL_NAME = "Vocabulary"
DECK_NAME = "Die Stadt"
MODEL_ID = random.randrange(1 << 30, 1 << 31)
DECK_ID = random.randrange(1 << 30, 1 << 31)
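
With a seeded generator, the IDs depend on the order of the randrange calls: swapping the two lines or adding another call before them changes the result. An alternative, shown here only as a sketch and not as genanki's recommended method, is to derive each ID from its name with a hash. The helper stable_id is hypothetical.

```python
import hashlib


def stable_id(name):
    """Map a name to a fixed integer in the range [2**30, 2**31)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Keep 30 bits of the hash and set bit 30 so the value stays in range
    return (1 << 30) | (int.from_bytes(digest[:4], "big") & ((1 << 30) - 1))


model_id = stable_id("Vocabulary")
deck_id = stable_id("Die Stadt")
print((1 << 30) <= deck_id < (1 << 31))  # True
```

The trade-off is that renaming the deck or model also changes its ID, so Anki would treat it as a new deck or note type.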

Add Cards to Anki Deck

Next, we will create a model for the flashcards and add the cards to the Anki deck.

import genanki

my_model = genanki.Model(
    MODEL_ID,
    MODEL_NAME,
    fields=[
        {"name": "Question"},
        {"name": "Answer"},
        {"name": "Audio"},
    ],
    templates=[
        {
            "name": "{{Question}}",
            "qfmt": '<div class="head">{{Question}}</div>',
            "afmt": '<div class="head">{{Question}}</div><hr id="answer"> \
                <div class="head">{{Answer}}</div> {{Audio}}',
        },
    ],
    css="""
        .head {font-size: x-large;} 
        .spot {text-decoration: underline;} 
        .sentence {font-style: italic; font-size: normal!important;}
    """,
)

my_deck = genanki.Deck(
    DECK_ID,
    DECK_NAME,
)

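for _, row in df.iterrows():
    # Leave the Audio field empty if audio generation failed for this word
    audio = f"[sound:{row['Audio']}]" if pd.notna(row["Audio"]) else ""
    my_note = genanki.Note(
        model=my_model,
        fields=[
            row["Translation"],
            row["Word"],
            audio,
        ],
    )
    my_deck.add_note(my_note)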

Export Anki Deck

Finally, we will export the Anki deck to a file. The file will be saved in the DATA_DIR directory with the name Die Stadt.apkg. The audio files will be included in the package.

my_package = genanki.Package(my_deck)
# Skip words whose audio generation failed (gen_audio returned None)
my_package.media_files = [f"{AUDIO_DIR}/{filename}" for filename in df["Audio"].dropna().unique()]
my_package.write_to_file(f"{DATA_DIR}/{DECK_NAME}.apkg")
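
A path in media_files that does not exist on disk may cause the export to fail, so it can help to check the files before packaging. The sketch below uses a hypothetical helper, collect_media, that keeps only unique filenames whose files are present in the audio directory.

```python
import os


def collect_media(audio_dir, filenames):
    """Return unique paths of audio files that actually exist on disk."""
    paths = []
    seen = set()
    for name in filenames:
        # Skip missing entries (None or NaN) and filenames already handled
        if not isinstance(name, str) or name in seen:
            continue
        seen.add(name)
        path = os.path.join(audio_dir, name)
        if os.path.exists(path):
            paths.append(path)
    return paths
```

In the notebook, the result could be assigned with `my_package.media_files = collect_media(AUDIO_DIR, df["Audio"])` before calling write_to_file.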

Conclusion

In this notebook, we learned how to create Anki flashcards from a list of words. We used the googletrans library to translate the words from German to English and the gtts library to generate audio files for the words. Finally, we used the genanki library to create an Anki deck and export it to a file. The generated deck can be imported into the Anki app and used for studying the vocabulary.

Next Steps

In the next notebook, we will create Anki flashcards from an arbitrary document, such as an article or a book.