Latin Toolkit Documentation

What ?

This piece of software is intended to be used with the 11K Latin Texts produced by David Bamman ( ). It supports only the plain text formats and the metadata github repo CSV file. This has been tested with Python3 only. I welcome any new functions or backward compatibility support.

How to install ?

  • With development version:
    • Clone the repository : git clone
    • Go to the directory : cd archives_org_latin_toolkit
    • Install the source with develop option : python install
  • With pip:
    • Install from pip : pip install archives_org_latin_toolkit


The following example should run with the data in tests/test_data. The example can be run with python

# We import the main classes from the module
from archives_org_latin_toolkit import Repo, Metadata
from pprint import pprint

# We initiate a Metadata object and a Repo object
metadata = Metadata("./test/test_data/latin_metadata.csv")
# We want the text to be set in lowercase
repo = Repo("./test/test_data/archive_org_latin/", metadata=metadata, lowercase=True)

# We define a list of token we want to search for
tokens = ["ecclesiastico", "ecclesia", "ecclesiis", """]

# We instantiate a result storage
results = []

# We iter over text having those tokens :
# Note that we need to "unzip" the list
for text_matching in repo.find(*tokens):

    # For each text, we iter over embeddings found in the text
    # We want 3 words left, 3 words right,
    # and we want to keep the original token (Default behaviour)
    for embedding in text_matching.find_embedding(*tokens, window=3, ignore_center=False):
        # We add it to the results

# We print the result (list of list of strings)