Archives.org Latin Toolkit Documentation
Classes
class archives_org_latin_toolkit.Metadata(csv_file)
Bases: object
Metadata object for a file
Parameters: csv_file (str) – Path to the CSV file to parse
class archives_org_latin_toolkit.Text(file, metadata=None, lowercase=False)
Bases: object
Text reading object for archive_org
Parameters: Variables: -
clean
Clean version of the text: spaces normalized, new lines removed, words dehyphenated, punctuation and numbers removed.
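The cleaning steps listed for clean can be illustrated with a short sketch. This is not the toolkit's implementation: clean_text is a hypothetical stand-in, and the regular expressions and the order of the steps are assumptions chosen only to match the description.

```python
import re


def clean_text(raw):
    """Hypothetical stand-in for the cleaning described for ``Text.clean``."""
    # Dehyphenize: rejoin words split by a hyphen at the end of a line
    text = re.sub(r"-\s*\n\s*", "", raw)
    # Remove the remaining new lines
    text = text.replace("\n", " ")
    # Remove punctuation and numbers (anything that is not a letter or a space)
    text = re.sub(r"[^\w\s]|[\d_]", " ", text)
    # Normalize spaces: collapse runs of whitespace and trim both ends
    return re.sub(r"\s+", " ", text).strip()


sample = "Gallia est om-\nnis divisa in partes tres, 3 quarum\nunam incolunt Belgae."
print(clean_text(sample))
# Gallia est omnis divisa in partes tres quarum unam incolunt Belgae
```

Dehyphenation runs first in this sketch so that the line-break hyphen is consumed before punctuation is stripped; the toolkit may order the steps differently.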
composed
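find_embedding(*strings, window=50, ignore_center=False, memory_efficient=True)
Find the given strings in the file and retrieve the embedding (surrounding context) of each match
Parameters:
- strings – Strings as multiple arguments
- window – Number of lines to retrieve
- ignore_center – Remove the word found from the embedding
- memory_efficient (bool) – Drop the content of files to avoid filling the RAM with unused content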
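The effect of ignore_center is easiest to see on a small token list. The sketch below is a self-contained illustration, not the toolkit's code: find_contexts is a hypothetical name, and it assumes that the window is counted in tokens on each side of the match, as the __window__ helper describes.

```python
def find_contexts(tokens, *strings, window=2, ignore_center=False):
    """Hypothetical sketch: return the words around each occurrence of ``strings``."""
    targets = set(strings)
    contexts = []
    for i, token in enumerate(tokens):
        if token not in targets:
            continue
        # Up to `window` tokens on each side of the match
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        # With ignore_center, the matched word itself is left out
        center = [] if ignore_center else [token]
        contexts.append(left + center + right)
    return contexts


tokens = "gallia est omnis divisa in partes tres".split()
print(find_contexts(tokens, "divisa", window=2))
# [['est', 'omnis', 'divisa', 'in', 'partes']]
print(find_contexts(tokens, "divisa", window=2, ignore_center=True))
# [['est', 'omnis', 'in', 'partes']]
```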
has_strings(*strings)
Check whether the given strings are in the file
Parameters: strings – Strings as multiple arguments
Returns: True if found
Return type: bool
name
raw
class archives_org_latin_toolkit.Repo(directory, metadata=None, lowercase=False)
Bases: object
Repo reading object for archive_org
Parameters:
Helpers
archives_org_latin_toolkit.period(x)
Parse a period in metadata. If there are multiple dates, return their mean.
Parameters: x (str) – Value to parse
Returns: Parsed numeral
Return type: int
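The "mean of multiple dates" rule can be sketched as follows. The input format is not specified here, so this example assumes that a period is a string containing one or more runs of digits (for instance "1100-1200"), and it rounds the mean down to an integer. parse_period is a hypothetical name, not the toolkit's function.

```python
import re


def parse_period(value):
    """Hypothetical parser: mean of every year found in ``value``, as an int."""
    # Assumption: each run of digits in the string is one date
    years = [int(year) for year in re.findall(r"\d+", value)]
    if not years:
        raise ValueError(f"no date found in {value!r}")
    # Integer mean, rounded down
    return sum(years) // len(years)


print(parse_period("1100-1200"))  # 1150
print(parse_period("450"))        # 450
```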
archives_org_latin_toolkit.bce(x)
Format a BCE string
Parameters: x (str) – Value to parse
Returns: Parsed numeral
Return type: str
archives_org_latin_toolkit.__window__(array, window, i)
Compute the embedding around index i
Parameters:
- array – List of words to take the window from
- window – Number of words to take left, then right [ len(result) = (2*window)+1 ]
- i – Index of the word
- memory_efficient (bool) – Drop the content of files to avoid filling the RAM with unused content
Returns: List of words
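The length formula above, len(result) = (2*window)+1, can be checked with a minimal sketch. window_slice is a hypothetical re-implementation written for illustration only. Clamping at the start of the list is an assumption of this sketch, so results near either edge are shorter than the formula suggests.

```python
def window_slice(array, window, i):
    """Hypothetical sketch: `window` words left of index `i`, the word at `i`, `window` words right."""
    # Clamp the start so that a negative index does not wrap around to the end
    start = max(0, i - window)
    return array[start:i + window + 1]


words = ["a", "b", "c", "d", "e", "f", "g"]
print(window_slice(words, 2, 3))  # ['b', 'c', 'd', 'e', 'f']
print(window_slice(words, 2, 0))  # ['a', 'b', 'c']
```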