Everything in one commit
QubitPi committed · Commit 48ed62f · 0 Parent(s)

Files changed:
- .gitignore +1 -0
- README.md +49 -0
- ancient-greek-phonemes.txt +274 -0
- convert.py +13 -0
.gitignore
ADDED
@@ -0,0 +1 @@
.idea/
README.md
ADDED
@@ -0,0 +1,49 @@
Ancient Greek Reader NLP
========================

__A lack of audio content is a major hurdle to learning Ancient Greek__. So I decided to tackle this problem with NLP.

How Does It Work
----------------

OpenAI's text-to-speech reads Ancient Greek text with a Modern Greek pronunciation. Unlike other text-to-speech tools,
it is not thrown off by the different accents and breathing marks in Ancient Greek: it simply reads the text in Modern
pronunciation with the stress in the right places. Studying Ancient Greek with Modern pronunciation is not satisfactory
for me, though, so I started experimenting with the text to see if I could get the pronunciation closer to
Erasmian/Attic/whatever we want to call it. The trick is to replace letters in the Greek words with Latin letters until
the reading sounds the way we want.

Here is an example sentence.

This is the original text, which OpenAI will read in Modern Greek with no problem:

> Σόλων ἦν συνετώτατος πάντων τῶν Ἀθηναίων, τὴν γὰρ σοφίαν αὐτοῦ οὐ μόνον οἱ πολῖται ἐθαύμαζον, ἀλλὰ καὶ οἱ ἂλλοι
> Ἓλληνες πάντες, πολλοὶ δὲ καὶ τῶν βαρβάρων.

And here is the same but with letters replaced to try and get OpenAI to read in an "Attic" pronunciation:

> sόλωn en sunetώtαtος πάntωn tón aθenáiωn, tén γáρ sοφίan autu u μόnon hoi πολítαi eθáuμαζon, aλλá kái hoi áλλοi
> Héλλeneς πάnteς, πολλói δé kái tón βαρβάρωn.

A large table of [letter replacements](./ancient-greek-phonemes.txt) has been compiled to imitate Attic pronunciation
as closely as possible. The result is solid and close enough to be useful for creating audio files for texts
that have no audio recordings.

[Check out this "Attic" pronunciation example](https://qubitpi.github.io/ancient-greek-reader/)

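The letter-replacement idea can be sketched in a few lines of Python. This is only an illustration, not the project's script: `SAMPLE_RULES` is a hand-picked excerpt of four rows from the replacement table, and `apply_rules` is a hypothetical helper name.

```python
# Hypothetical excerpt of the replacement table; the full list lives in
# ancient-greek-phonemes.txt.
SAMPLE_RULES = {
    "οἱ": "hoi",  # rough breathing on the diphthong becomes a leading "h"
    "τ": "t",
    "ῶ": "ó",
    "ν": "n",
}


def apply_rules(text, rules):
    """Replace every occurrence of each Greek key with its Latin stand-in."""
    for greek, latin in rules.items():
        text = text.replace(greek, latin)
    return text
```

With these four rows, `apply_rules("τῶν", SAMPLE_RULES)` gives `tón` and `apply_rules("οἱ", SAMPLE_RULES)` gives `hoi`, matching the converted sentence shown earlier.
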
How to Use It (WIP)
-------------------

These instructions are a __work in progress__:

1. Get an accurate Ancient Greek text and save it to a file named __original.txt__.
   [Perseus Digital Library](https://www.perseus.tufts.edu/hopper/) is a great source.
2. Run the replacement, which writes its result to __converted.txt__:

   - `python3 convert.py`

3. Convert the txt file to epub. [Calibre](https://calibre-ebook.com/) works pretty well for this.
4. Use https://github.com/p0n1/epub_to_audiobook to generate the audio.

> [!CAUTION]
>
> OpenAI [billing](https://platform.openai.com/settings/organization/billing/overview) will apply to the last step.
ancient-greek-phonemes.txt
ADDED
@@ -0,0 +1,274 @@
κ->k
ε->e
ι->i
οῖ->ói
τ->t
σ->s
Τ->t
Σ->s
οὕ->hú
οῦ->u
αύ->áu
οἴ->ói
αν->an
ον->on
ν->n
εἱ->hei
ού->ú
οὖ->u
οὐ->u
αῖ->ai
αι->ai
αί->ái
αί->ái
αὶ->ái
αἱ->ai
αυ->au
αύ->áu
αὺ->áu
αῦ->áu
ει->e
εί->éi
εὶ->éi
εί->ey
ευ->eu
εύ->éu
εύ->éu
εὺ->éu
οι->oi
οί->ói
οί->ói
οὶ->ói
οἱ->hoi
ου->u
ού->ú
οὺ->ú
ηυ->eu
ηύ->óu
ηὺ->óu
υι->ui
υί->úi
υὶ->úi
υ->u
ἀ->a
ἁ->ha
ἂ->á
ἃ->há
ἄ->á
ἅ->há
ἆ->a
ἇ->ha
Ἀ->a
Ἁ->Ha
Ἂ->á
Ἃ->Ha
Ἄ->á
Ἅ->Ha
Ἆ->á
Ἇ->Ha
ἐ->e
ἑ->hey
ἒ->é
ἓ->hé
ἔ->é
ἕ->hé
Ἐ->é
Ἑ->he
Ἒ->é
Ἓ->Hé
Ἔ->é
Ἕ->Hé
ἠ->E
ἡ->he
ἢ->é
ἣ->hé
ἤ->é
ἥ->hé
ἦ->e
ἧ->hé
Ἠ->E
Ἡ->he
Ἢ->é
Ἣ->hé
Ἤ->é
Ἥ->hé
Ἦ->E
Ἧ->hé
ἰ->i
ἱ->hi
ἲ->í
ἳ->hí
ἴ->í
ἵ->hí
ἶ->i
ἷ->hí
Ἰ->i
Ἱ->Hi
Ἲ->í
Ἳ->Hí
Ἴ->í
Ἵ->Hí
Ἶ->i
Ἷ->Hí
ὀ->o
ὁ->ho
ὂ->ó
ὃ->ho
ὄ->ó
ὅ->ho
Ὀ->o
Ὁ->ho
Ὂ->ó
Ὃ->ho
Ὄ->ó
Ὅ->ho
ὐ->u
ὑ->hu
ὒ->ú
ὓ->hu
ὔ->ú
ὕ->hu
ὖ->u
ὗ->hu
Ὑ->Hu
Ὓ->ú
Ὕ->ú
Ὗ->Hu
ὠ->o
ὡ->ho
ὢ->ó
ὣ->hó
ὤ->ó
ὥ->hó
ὦ->o
ὧ->hó
Ὠ->o
Ὡ->ho
Ὢ->ó
Ὣ->ho
Ὤ->ó
Ὥ->hó
Ὦ->o
Ὧ->hó
ὰ->á
ά->á
ὲ->é
έ->é
ὴ->é
ή->é
ὶ->i
ί->í
ὸ->ó
ό->ó
ὺ->ú
ύ->ú
ὼ->ó
ώ->ó
ᾀ->ai
ᾁ->hai
ᾂ->ái
ᾃ->hái
ᾄ->ái
ᾅ->hái
ᾆ->ai
ᾇ->hái
ᾈ->Ai
ᾉ->hai
ᾊ->ái
ᾋ->hái
ᾌ->ái
ᾍ->hái
ᾎ->ái
ᾏ->hái
ᾐ->ei
ᾑ->hai
ᾒ->éi
ᾓ->hái
ᾔ->éi
ᾕ->hái
ᾖ->ei
ᾗ->hái
ᾘ->ei
ᾙ->he
ᾚ->éi
ᾛ->hé
ᾜ->éi
ᾝ->hé
ᾞ->ei
ᾟ->hé
ᾠ->oi
ᾡ->hoi
ᾢ->ói
ᾣ->hói
ᾤ->ói
ᾥ->hói
ᾦ->oi
ᾧ->hói
ᾨ->oi
ᾩ->hoi
ᾪ->oi
ᾫ->hói
ᾬ->ói
ᾭ->hói
ᾮ->ói
ᾯ->hói
ᾰ->a
ᾱ->a
ᾲ->ái
ᾳ->ai
ᾴ->ái
ᾶ->a
ᾷ->ái
Ᾰ->á
Ᾱ->A
Ὰ->A
Ά->Ha
ᾼ->Ai
ῂ->éy
ῃ->ey
ῄ->éy
ῆ->éy
ῇ->éy
Ὲ->He
Έ->E
Ὴ->Hay
Ή->Ei
ῌ->Ei
ῐ->í
ῑ->i
ῒ->í
ΐ->i
ῖ->í
ῗ->í
Ῐ->í
Ῑ->I
Ὶ->Hee
Ί->I
ῠ->U
ῡ->u
ῢ->ú
ΰ->ú
ῤ->r
ῥ->r
ῦ->ú
ῧ->ú
Ῠ->ú
Ῡ->U
Ὺ->U
Ύ->Hu
Ῥ->R
ῲ->ó
ῳ->oi
ῴ->ói
ῶ->ó
ῷ->ói
Ὸ->O
Ό->Ho
Ὼ->Ho
Ώ->O
ῼ->Oi
ή->é
η->e
αu->au
ύ->u
ηὐ->eu
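Several rows in this table look identical in print, for example the two `αί->ái` rows. They may differ only in their Unicode code points (tonos versus oxia), or they may be genuine repeats. Because convert.py loads the rows into a dict, a Greek key that occurs on two rows keeps only the later replacement. The sketch below is one possible way to list such repeated keys. `parse_rules` and `repeated_keys` are hypothetical helper names, and the sketch assumes every non-blank row has the `greek->latin` shape.

```python
from collections import Counter


def parse_rules(lines):
    """Yield (greek, latin) pairs from rows shaped like 'greek->latin'."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank rows
            greek, latin = line.split("->", 1)
            yield greek, latin


def repeated_keys(lines):
    """Return the Greek keys that occur on more than one row, sorted."""
    counts = Counter(greek for greek, _ in parse_rules(lines))
    return sorted(key for key, n in counts.items() if n > 1)
```

To check the real table, pass the open file object, as in `repeated_keys(open("ancient-greek-phonemes.txt", encoding="utf-8"))`. An empty list means every key is a distinct string.
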
convert.py
ADDED
@@ -0,0 +1,13 @@
if __name__ == "__main__":
    # Each non-blank row of the table has the form "<greek>-><latin>".
    # Read as UTF-8 explicitly so the Greek text survives on every platform.
    with open("ancient-greek-phonemes.txt", "r", encoding="utf-8") as mapping_file:
        mapping = dict(
            line.rstrip().split("->", 1) for line in mapping_file if line.strip()
        )
    with open("original.txt", "r", encoding="utf-8") as book_txt:
        book = book_txt.read()

    # Rules run in table order, so an earlier rule can consume letters
    # that a later, longer rule would otherwise have matched.
    for key, value in mapping.items():
        book = book.replace(key, value)

    with open("converted.txt", "w", encoding="utf-8") as book_txt:
        book_txt.write(book)
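The replace loop applies the rows in table order, so a short key can fire before a longer key that contains it. In the table, `ι->i` (row 3) comes before `οι->oi` (row 38), which leaves the omicron of `οι` untouched. The sketch below contrasts that behaviour with a longest-key-first variant. Both function names are hypothetical, and the variant is an alternative to experiment with, not the script's method. Some rows, such as `αu->au`, appear to rely on earlier replacements having already run, so reordering is unlikely to be a drop-in change, and whether it sounds better is untested here.

```python
def apply_in_file_order(text, rules):
    """Baseline: apply rules in insertion order, like a plain replace loop."""
    for greek, latin in rules.items():
        text = text.replace(greek, latin)
    return text


def apply_longest_first(text, rules):
    """Variant: try longer Greek keys before shorter ones.

    sorted() is stable, so keys of equal length keep their table order.
    """
    for greek in sorted(rules, key=len, reverse=True):
        text = text.replace(greek, rules[greek])
    return text


# Two rows from the table, kept in their table order (row 3, then row 38).
RULES = {"ι": "i", "οι": "oi"}
```

On the input `οι`, `apply_in_file_order` returns a Greek omicron followed by a Latin `i`, while `apply_longest_first` returns the fully Latin `oi`.
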