Jeronymous committed on
Commit
c78a91c
•
1 Parent(s): 0dcddb0

version 2 with a real Chat service

Files changed (5)
  1. LICENSE.txt +438 -0
  2. README.md +2 -2
  3. app.py +340 -328
  4. requirements.txt +5 -5
  5. style.css +9 -0
LICENSE.txt ADDED
@@ -0,0 +1,438 @@
1
+ Attribution-NonCommercial-ShareAlike 4.0 International
2
+
3
+ =======================================================================
4
+
5
+ Creative Commons Corporation ("Creative Commons") is not a law firm and
6
+ does not provide legal services or legal advice. Distribution of
7
+ Creative Commons public licenses does not create a lawyer-client or
8
+ other relationship. Creative Commons makes its licenses and related
9
+ information available on an "as-is" basis. Creative Commons gives no
10
+ warranties regarding its licenses, any material licensed under their
11
+ terms and conditions, or any related information. Creative Commons
12
+ disclaims all liability for damages resulting from their use to the
13
+ fullest extent possible.
14
+
15
+ Using Creative Commons Public Licenses
16
+
17
+ Creative Commons public licenses provide a standard set of terms and
18
+ conditions that creators and other rights holders may use to share
19
+ original works of authorship and other material subject to copyright
20
+ and certain other rights specified in the public license below. The
21
+ following considerations are for informational purposes only, are not
22
+ exhaustive, and do not form part of our licenses.
23
+
24
+ Considerations for licensors: Our public licenses are
25
+ intended for use by those authorized to give the public
26
+ permission to use material in ways otherwise restricted by
27
+ copyright and certain other rights. Our licenses are
28
+ irrevocable. Licensors should read and understand the terms
29
+ and conditions of the license they choose before applying it.
30
+ Licensors should also secure all rights necessary before
31
+ applying our licenses so that the public can reuse the
32
+ material as expected. Licensors should clearly mark any
33
+ material not subject to the license. This includes other CC-
34
+ licensed material, or material used under an exception or
35
+ limitation to copyright. More considerations for licensors:
36
+ wiki.creativecommons.org/Considerations_for_licensors
37
+
38
+ Considerations for the public: By using one of our public
39
+ licenses, a licensor grants the public permission to use the
40
+ licensed material under specified terms and conditions. If
41
+ the licensor's permission is not necessary for any reason--for
42
+ example, because of any applicable exception or limitation to
43
+ copyright--then that use is not regulated by the license. Our
44
+ licenses grant only permissions under copyright and certain
45
+ other rights that a licensor has authority to grant. Use of
46
+ the licensed material may still be restricted for other
47
+ reasons, including because others have copyright or other
48
+ rights in the material. A licensor may make special requests,
49
+ such as asking that all changes be marked or described.
50
+ Although not required by our licenses, you are encouraged to
51
+ respect those requests where reasonable. More considerations
52
+ for the public:
53
+ wiki.creativecommons.org/Considerations_for_licensees
54
+
55
+ =======================================================================
56
+
57
+ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
58
+ Public License
59
+
60
+ By exercising the Licensed Rights (defined below), You accept and agree
61
+ to be bound by the terms and conditions of this Creative Commons
62
+ Attribution-NonCommercial-ShareAlike 4.0 International Public License
63
+ ("Public License"). To the extent this Public License may be
64
+ interpreted as a contract, You are granted the Licensed Rights in
65
+ consideration of Your acceptance of these terms and conditions, and the
66
+ Licensor grants You such rights in consideration of benefits the
67
+ Licensor receives from making the Licensed Material available under
68
+ these terms and conditions.
69
+
70
+
71
+ Section 1 -- Definitions.
72
+
73
+ a. Adapted Material means material subject to Copyright and Similar
74
+ Rights that is derived from or based upon the Licensed Material
75
+ and in which the Licensed Material is translated, altered,
76
+ arranged, transformed, or otherwise modified in a manner requiring
77
+ permission under the Copyright and Similar Rights held by the
78
+ Licensor. For purposes of this Public License, where the Licensed
79
+ Material is a musical work, performance, or sound recording,
80
+ Adapted Material is always produced where the Licensed Material is
81
+ synched in timed relation with a moving image.
82
+
83
+ b. Adapter's License means the license You apply to Your Copyright
84
+ and Similar Rights in Your contributions to Adapted Material in
85
+ accordance with the terms and conditions of this Public License.
86
+
87
+ c. BY-NC-SA Compatible License means a license listed at
88
+ creativecommons.org/compatiblelicenses, approved by Creative
89
+ Commons as essentially the equivalent of this Public License.
90
+
91
+ d. Copyright and Similar Rights means copyright and/or similar rights
92
+ closely related to copyright including, without limitation,
93
+ performance, broadcast, sound recording, and Sui Generis Database
94
+ Rights, without regard to how the rights are labeled or
95
+ categorized. For purposes of this Public License, the rights
96
+ specified in Section 2(b)(1)-(2) are not Copyright and Similar
97
+ Rights.
98
+
99
+ e. Effective Technological Measures means those measures that, in the
100
+ absence of proper authority, may not be circumvented under laws
101
+ fulfilling obligations under Article 11 of the WIPO Copyright
102
+ Treaty adopted on December 20, 1996, and/or similar international
103
+ agreements.
104
+
105
+ f. Exceptions and Limitations means fair use, fair dealing, and/or
106
+ any other exception or limitation to Copyright and Similar Rights
107
+ that applies to Your use of the Licensed Material.
108
+
109
+ g. License Elements means the license attributes listed in the name
110
+ of a Creative Commons Public License. The License Elements of this
111
+ Public License are Attribution, NonCommercial, and ShareAlike.
112
+
113
+ h. Licensed Material means the artistic or literary work, database,
114
+ or other material to which the Licensor applied this Public
115
+ License.
116
+
117
+ i. Licensed Rights means the rights granted to You subject to the
118
+ terms and conditions of this Public License, which are limited to
119
+ all Copyright and Similar Rights that apply to Your use of the
120
+ Licensed Material and that the Licensor has authority to license.
121
+
122
+ j. Licensor means the individual(s) or entity(ies) granting rights
123
+ under this Public License.
124
+
125
+ k. NonCommercial means not primarily intended for or directed towards
126
+ commercial advantage or monetary compensation. For purposes of
127
+ this Public License, the exchange of the Licensed Material for
128
+ other material subject to Copyright and Similar Rights by digital
129
+ file-sharing or similar means is NonCommercial provided there is
130
+ no payment of monetary compensation in connection with the
131
+ exchange.
132
+
133
+ l. Share means to provide material to the public by any means or
134
+ process that requires permission under the Licensed Rights, such
135
+ as reproduction, public display, public performance, distribution,
136
+ dissemination, communication, or importation, and to make material
137
+ available to the public including in ways that members of the
138
+ public may access the material from a place and at a time
139
+ individually chosen by them.
140
+
141
+ m. Sui Generis Database Rights means rights other than copyright
142
+ resulting from Directive 96/9/EC of the European Parliament and of
143
+ the Council of 11 March 1996 on the legal protection of databases,
144
+ as amended and/or succeeded, as well as other essentially
145
+ equivalent rights anywhere in the world.
146
+
147
+ n. You means the individual or entity exercising the Licensed Rights
148
+ under this Public License. Your has a corresponding meaning.
149
+
150
+
151
+ Section 2 -- Scope.
152
+
153
+ a. License grant.
154
+
155
+ 1. Subject to the terms and conditions of this Public License,
156
+ the Licensor hereby grants You a worldwide, royalty-free,
157
+ non-sublicensable, non-exclusive, irrevocable license to
158
+ exercise the Licensed Rights in the Licensed Material to:
159
+
160
+ a. reproduce and Share the Licensed Material, in whole or
161
+ in part, for NonCommercial purposes only; and
162
+
163
+ b. produce, reproduce, and Share Adapted Material for
164
+ NonCommercial purposes only.
165
+
166
+ 2. Exceptions and Limitations. For the avoidance of doubt, where
167
+ Exceptions and Limitations apply to Your use, this Public
168
+ License does not apply, and You do not need to comply with
169
+ its terms and conditions.
170
+
171
+ 3. Term. The term of this Public License is specified in Section
172
+ 6(a).
173
+
174
+ 4. Media and formats; technical modifications allowed. The
175
+ Licensor authorizes You to exercise the Licensed Rights in
176
+ all media and formats whether now known or hereafter created,
177
+ and to make technical modifications necessary to do so. The
178
+ Licensor waives and/or agrees not to assert any right or
179
+ authority to forbid You from making technical modifications
180
+ necessary to exercise the Licensed Rights, including
181
+ technical modifications necessary to circumvent Effective
182
+ Technological Measures. For purposes of this Public License,
183
+ simply making modifications authorized by this Section 2(a)
184
+ (4) never produces Adapted Material.
185
+
186
+ 5. Downstream recipients.
187
+
188
+ a. Offer from the Licensor -- Licensed Material. Every
189
+ recipient of the Licensed Material automatically
190
+ receives an offer from the Licensor to exercise the
191
+ Licensed Rights under the terms and conditions of this
192
+ Public License.
193
+
194
+ b. Additional offer from the Licensor -- Adapted Material.
195
+ Every recipient of Adapted Material from You
196
+ automatically receives an offer from the Licensor to
197
+ exercise the Licensed Rights in the Adapted Material
198
+ under the conditions of the Adapter's License You apply.
199
+
200
+ c. No downstream restrictions. You may not offer or impose
201
+ any additional or different terms or conditions on, or
202
+ apply any Effective Technological Measures to, the
203
+ Licensed Material if doing so restricts exercise of the
204
+ Licensed Rights by any recipient of the Licensed
205
+ Material.
206
+
207
+ 6. No endorsement. Nothing in this Public License constitutes or
208
+ may be construed as permission to assert or imply that You
209
+ are, or that Your use of the Licensed Material is, connected
210
+ with, or sponsored, endorsed, or granted official status by,
211
+ the Licensor or others designated to receive attribution as
212
+ provided in Section 3(a)(1)(A)(i).
213
+
214
+ b. Other rights.
215
+
216
+ 1. Moral rights, such as the right of integrity, are not
217
+ licensed under this Public License, nor are publicity,
218
+ privacy, and/or other similar personality rights; however, to
219
+ the extent possible, the Licensor waives and/or agrees not to
220
+ assert any such rights held by the Licensor to the limited
221
+ extent necessary to allow You to exercise the Licensed
222
+ Rights, but not otherwise.
223
+
224
+ 2. Patent and trademark rights are not licensed under this
225
+ Public License.
226
+
227
+ 3. To the extent possible, the Licensor waives any right to
228
+ collect royalties from You for the exercise of the Licensed
229
+ Rights, whether directly or through a collecting society
230
+ under any voluntary or waivable statutory or compulsory
231
+ licensing scheme. In all other cases the Licensor expressly
232
+ reserves any right to collect such royalties, including when
233
+ the Licensed Material is used other than for NonCommercial
234
+ purposes.
235
+
236
+
237
+ Section 3 -- License Conditions.
238
+
239
+ Your exercise of the Licensed Rights is expressly made subject to the
240
+ following conditions.
241
+
242
+ a. Attribution.
243
+
244
+ 1. If You Share the Licensed Material (including in modified
245
+ form), You must:
246
+
247
+ a. retain the following if it is supplied by the Licensor
248
+ with the Licensed Material:
249
+
250
+ i. identification of the creator(s) of the Licensed
251
+ Material and any others designated to receive
252
+ attribution, in any reasonable manner requested by
253
+ the Licensor (including by pseudonym if
254
+ designated);
255
+
256
+ ii. a copyright notice;
257
+
258
+ iii. a notice that refers to this Public License;
259
+
260
+ iv. a notice that refers to the disclaimer of
261
+ warranties;
262
+
263
+ v. a URI or hyperlink to the Licensed Material to the
264
+ extent reasonably practicable;
265
+
266
+ b. indicate if You modified the Licensed Material and
267
+ retain an indication of any previous modifications; and
268
+
269
+ c. indicate the Licensed Material is licensed under this
270
+ Public License, and include the text of, or the URI or
271
+ hyperlink to, this Public License.
272
+
273
+ 2. You may satisfy the conditions in Section 3(a)(1) in any
274
+ reasonable manner based on the medium, means, and context in
275
+ which You Share the Licensed Material. For example, it may be
276
+ reasonable to satisfy the conditions by providing a URI or
277
+ hyperlink to a resource that includes the required
278
+ information.
279
+ 3. If requested by the Licensor, You must remove any of the
280
+ information required by Section 3(a)(1)(A) to the extent
281
+ reasonably practicable.
282
+
283
+ b. ShareAlike.
284
+
285
+ In addition to the conditions in Section 3(a), if You Share
286
+ Adapted Material You produce, the following conditions also apply.
287
+
288
+ 1. The Adapter's License You apply must be a Creative Commons
289
+ license with the same License Elements, this version or
290
+ later, or a BY-NC-SA Compatible License.
291
+
292
+ 2. You must include the text of, or the URI or hyperlink to, the
293
+ Adapter's License You apply. You may satisfy this condition
294
+ in any reasonable manner based on the medium, means, and
295
+ context in which You Share Adapted Material.
296
+
297
+ 3. You may not offer or impose any additional or different terms
298
+ or conditions on, or apply any Effective Technological
299
+ Measures to, Adapted Material that restrict exercise of the
300
+ rights granted under the Adapter's License You apply.
301
+
302
+
303
+ Section 4 -- Sui Generis Database Rights.
304
+
305
+ Where the Licensed Rights include Sui Generis Database Rights that
306
+ apply to Your use of the Licensed Material:
307
+
308
+ a. for the avoidance of doubt, Section 2(a)(1) grants You the right
309
+ to extract, reuse, reproduce, and Share all or a substantial
310
+ portion of the contents of the database for NonCommercial purposes
311
+ only;
312
+
313
+ b. if You include all or a substantial portion of the database
314
+ contents in a database in which You have Sui Generis Database
315
+ Rights, then the database in which You have Sui Generis Database
316
+ Rights (but not its individual contents) is Adapted Material,
317
+ including for purposes of Section 3(b); and
318
+
319
+ c. You must comply with the conditions in Section 3(a) if You Share
320
+ all or a substantial portion of the contents of the database.
321
+
322
+ For the avoidance of doubt, this Section 4 supplements and does not
323
+ replace Your obligations under this Public License where the Licensed
324
+ Rights include other Copyright and Similar Rights.
325
+
326
+
327
+ Section 5 -- Disclaimer of Warranties and Limitation of Liability.
328
+
329
+ a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
330
+ EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
331
+ AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
332
+ ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
333
+ IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
334
+ WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
335
+ PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
336
+ ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
337
+ KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
338
+ ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
339
+
340
+ b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
341
+ TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
342
+ NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
343
+ INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
344
+ COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
345
+ USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
346
+ ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
347
+ DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
348
+ IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
349
+
350
+ c. The disclaimer of warranties and limitation of liability provided
351
+ above shall be interpreted in a manner that, to the extent
352
+ possible, most closely approximates an absolute disclaimer and
353
+ waiver of all liability.
354
+
355
+
356
+ Section 6 -- Term and Termination.
357
+
358
+ a. This Public License applies for the term of the Copyright and
359
+ Similar Rights licensed here. However, if You fail to comply with
360
+ this Public License, then Your rights under this Public License
361
+ terminate automatically.
362
+
363
+ b. Where Your right to use the Licensed Material has terminated under
364
+ Section 6(a), it reinstates:
365
+
366
+ 1. automatically as of the date the violation is cured, provided
367
+ it is cured within 30 days of Your discovery of the
368
+ violation; or
369
+
370
+ 2. upon express reinstatement by the Licensor.
371
+
372
+ For the avoidance of doubt, this Section 6(b) does not affect any
373
+ right the Licensor may have to seek remedies for Your violations
374
+ of this Public License.
375
+
376
+ c. For the avoidance of doubt, the Licensor may also offer the
377
+ Licensed Material under separate terms or conditions or stop
378
+ distributing the Licensed Material at any time; however, doing so
379
+ will not terminate this Public License.
380
+
381
+ d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
382
+ License.
383
+
384
+
385
+ Section 7 -- Other Terms and Conditions.
386
+
387
+ a. The Licensor shall not be bound by any additional or different
388
+ terms or conditions communicated by You unless expressly agreed.
389
+
390
+ b. Any arrangements, understandings, or agreements regarding the
391
+ Licensed Material not stated herein are separate from and
392
+ independent of the terms and conditions of this Public License.
393
+
394
+
395
+ Section 8 -- Interpretation.
396
+
397
+ a. For the avoidance of doubt, this Public License does not, and
398
+ shall not be interpreted to, reduce, limit, restrict, or impose
399
+ conditions on any use of the Licensed Material that could lawfully
400
+ be made without permission under this Public License.
401
+
402
+ b. To the extent possible, if any provision of this Public License is
403
+ deemed unenforceable, it shall be automatically reformed to the
404
+ minimum extent necessary to make it enforceable. If the provision
405
+ cannot be reformed, it shall be severed from this Public License
406
+ without affecting the enforceability of the remaining terms and
407
+ conditions.
408
+
409
+ c. No term or condition of this Public License will be waived and no
410
+ failure to comply consented to unless expressly agreed to by the
411
+ Licensor.
412
+
413
+ d. Nothing in this Public License constitutes or may be interpreted
414
+ as a limitation upon, or waiver of, any privileges and immunities
415
+ that apply to the Licensor or You, including from the legal
416
+ processes of any jurisdiction or authority.
417
+
418
+ =======================================================================
419
+
420
+ Creative Commons is not a party to its public
421
+ licenses. Notwithstanding, Creative Commons may elect to apply one of
422
+ its public licenses to material it publishes and in those instances
423
+ will be considered the “Licensor.” The text of the Creative Commons
424
+ public licenses is dedicated to the public domain under the CC0 Public
425
+ Domain Dedication. Except for the limited purpose of indicating that
426
+ material is shared under a Creative Commons public license or as
427
+ otherwise permitted by the Creative Commons policies published at
428
+ creativecommons.org/policies, Creative Commons does not authorize the
429
+ use of the trademark "Creative Commons" or any other trademark or logo
430
+ of Creative Commons without its prior written consent including,
431
+ without limitation, in connection with any unauthorized modifications
432
+ to any of its public licenses or any other arrangements,
433
+ understandings, or agreements concerning use of licensed material. For
434
+ the avoidance of doubt, this paragraph does not form part of the
435
+ public licenses.
436
+
437
+ Creative Commons may be contacted at creativecommons.org.
438
+
README.md CHANGED
@@ -1,10 +1,10 @@
1
  ---
2
  title: Claire Chat
3
- emoji: 🎙💬
4
  colorFrom: blue
5
  colorTo: red
6
  sdk: gradio
7
- sdk_version: 4.4.0
8
  app_file: app.py
9
  pinned: true
10
  license: cc-by-nc-sa-4.0
 
1
  ---
2
  title: Claire Chat
3
+ emoji: 🗣️🇫🇷💬
4
  colorFrom: blue
5
  colorTo: red
6
  sdk: gradio
7
+ sdk_version: 4.4.1
8
  app_file: app.py
9
  pinned: true
10
  license: cc-by-nc-sa-4.0
app.py CHANGED
@@ -1,18 +1,149 @@
1
  import gradio as gr
 
2
  import transformers
3
- from transformers import AutoConfig, AutoTokenizer, AutoModel, AutoModelForCausalLM
4
  import torch
5
  import unicodedata
6
- import re
7
 
8
  # Default variables
9
- default_max_new_tokens = 100
10
  default_temperature = 1.0
 
11
  default_top_k = 10
12
  default_top_p = 0.99
13
- default_repetition_penalty = 1.0
14
 
15
- model_name = "OpenLLM-France/Claire-7B-0.1"
16
 
17
  print("Loading model...")
18
 
@@ -38,171 +169,134 @@ newspk_token_id = tokenizer.encode("[")
38
  assert len(newspk_token_id) == 1
39
  newspk_token_id = newspk_token_id[0]
40
 
41
-
42
- # Class to encapsulate the Claire chatbot
43
- class ClaireChatBot:
44
- def __init__(
45
- self,
46
- # Chat will display...
47
- user_name="VOUS:",
48
- bot_name="CHATBOT:",
49
- other_name_regex_in=r"AUTRE (\d+):",
50
- other_name_regex_out=r"AUTRE \1:",
51
- # but Claire was trained on...
52
- user_internal_tag="[Intervenant 1:]",
53
- bot_internal_tag="[Intervenant 2:]",
54
- other_internal_tag_regex_in=r"\[Intervenant (\d+):\]",
55
- other_internal_tag_regex_out=r"\[Intervenant \1:\]",
56
- ):
57
- self.user_name = user_name
58
- self.bot_name = bot_name
59
- self.other_name_regex_in = other_name_regex_in
60
- self.other_name_regex_out = other_name_regex_out
61
-
62
- self.user_internal_tag = user_internal_tag
63
- self.bot_internal_tag = bot_internal_tag
64
- self.other_internal_tag_regex_in = other_internal_tag_regex_in
65
- self.other_internal_tag_regex_out = other_internal_tag_regex_out
66
-
67
- self.device = "cuda" if torch.cuda.is_available() else "cpu"
68
-
69
- self.has_started_bracket = False
70
- self.history = ""
71
- self.history_raw = ""
72
- self.reinject_history = False
73
- self.reshow_history = False
74
-
75
- def predict(
76
- self,
77
- user_message,
78
- bot_message_start="",
79
- conversation_history="",
80
- generate_several_turns=False,
81
- max_new_tokens=default_max_new_tokens,
82
- temperature=default_temperature,
83
- top_k=default_top_k,
84
- top_p=default_top_p,
85
- repetition_penalty=default_repetition_penalty,
86
- ):
87
- user_message = claire_text_preproc_message(user_message)
88
- bot_message_start = claire_text_preproc_message(bot_message_start)
89
-
90
- if conversation_history:
91
- # Format conversation history
92
- for spk_in, spk_out in [
93
- (self.user_name, self.user_internal_tag),
94
- (self.bot_name, self.bot_internal_tag),
95
- ]:
96
- conversation_history = conversation_history.replace(spk_in, spk_out)
97
- conversation_history = re.sub(self.other_name_regex_in, self.other_internal_tag_regex_out, conversation_history)
98
- conversation_history = claire_text_preproc_conversation(conversation_history)
99
- conversation_history = conversation_history.rstrip() + "\n"
100
- else:
101
- conversation_history = self.history_raw
102
-
103
- # (Only relevant if self.reinject_history is True)
104
- user_internal_tag = self.user_internal_tag
105
- if self.has_started_bracket:
106
- user_internal_tag = user_internal_tag[1:]
107
-
108
- # Combine the user and bot messages into a conversation
109
- conversation = f"{conversation_history}{user_internal_tag} {user_message}\n{self.bot_internal_tag} {bot_message_start if bot_message_start else ''}".strip()
110
-
111
- # Encode the conversation using the tokenizer
112
- input_ids = tokenizer.encode(
113
- conversation, return_tensors="pt", add_special_tokens=False
114
  )
115
- input_ids = input_ids.to(self.device)
116
-
117
- # Generate a response using Claire
118
- response = model.generate(
119
- input_ids=input_ids,
120
- use_cache=False,
121
- early_stopping=False,
122
- temperature=temperature,
123
- do_sample=True,
124
- max_new_tokens=max_new_tokens,
125
- top_k=top_k,
126
- top_p=top_p,
127
- repetition_penalty=repetition_penalty,
128
- pad_token_id=eos_token_id,
129
- eos_token_id=eos_token_id if generate_several_turns else newspk_token_id,
130
  )
131
-
132
- # Decode the generated response to text
133
- response_text = tokenizer.decode(response[0], skip_special_tokens=True)
134
-
135
- # Remove last unfinished speech turn/sentence/phrase
136
- line_breaks = [u.span(0)[0] for u in re.finditer("\n", response_text)]
137
- remove_last_sentence = True
138
- if generate_several_turns:
139
- if len(line_breaks) >= 2:
140
- response_text = response_text[: line_breaks[-1]]
141
- line_breaks.pop(-1)
142
- remove_last_sentence = False
143
- if remove_last_sentence and len(line_breaks) == 1:
144
- sentence_ends = [
145
- u.span(0)[0] for u in re.finditer(r"[\.!?]", response_text)
146
- ]
147
- sentence_ends = [p for p in sentence_ends if p > line_breaks[-1]]
148
- if sentence_ends:
149
- response_text = response_text[: sentence_ends[-1] + 1]
150
- else:
151
- phrase_ends = [
152
- u.span(0)[0] for u in re.finditer(r"[,;]", response_text)
153
- ]
154
- phrase_ends = [p for p in phrase_ends if p > line_breaks[-1]]
155
- if phrase_ends:
156
- response_text = response_text[: phrase_ends[-1] + 1]
157
-
158
- ended_with_bracket = response_text.endswith("[")
159
-
160
- if self.reinject_history:
161
- self.history_raw = response_text
162
- self.has_started_bracket = ended_with_bracket
163
-
164
- if ended_with_bracket:
165
- response_text = response_text[:-1]
166
-
167
- for spk_in, spk_out in [
168
- (self.user_internal_tag, self.user_name),
169
- (self.user_internal_tag[1:], self.user_name), # Starting bracket may be missing
170
- (self.bot_internal_tag, self.bot_name),
171
- ]:
172
- response_text = response_text.replace(spk_in, spk_out)
173
- response_text = re.sub(self.other_internal_tag_regex_in, self.other_name_regex_out, response_text)
174
-
175
- if self.reshow_history:
176
- previous_history = self.history
177
- self.history = previous_history + response_text + "\n"
178
  else:
179
- previous_history = ""
180
-
181
- return previous_history + response_text
182
-
183
-
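The removed `predict` method trims generated text so that it does not end on an unfinished turn, sentence, or phrase. The following is a minimal standalone sketch of that trimming step, adapted from the removed lines; the function name `trim_unfinished_tail` and its `several_turns` parameter are hypothetical and do not appear in the commit.

```python
import re


def trim_unfinished_tail(text: str, several_turns: bool = False) -> str:
    """Drop a trailing unfinished turn, sentence, or phrase from generated text.

    Hypothetical helper: mirrors the trimming logic of the removed `predict`
    method, where `text` holds the prompt turn followed by the generated turn(s).
    """
    line_breaks = [m.start() for m in re.finditer("\n", text)]

    if several_turns and len(line_breaks) >= 2:
        # Multi-turn output: drop the last, possibly incomplete, turn.
        return text[: line_breaks[-1]]

    if len(line_breaks) == 1:
        last_break = line_breaks[-1]
        # Prefer cutting after the last sentence end in the reply;
        # fall back to the last phrase end (comma or semicolon).
        for pattern in (r"[.!?]", r"[,;]"):
            ends = [
                m.start() for m in re.finditer(pattern, text) if m.start() > last_break
            ]
            if ends:
                return text[: ends[-1] + 1]

    # Nothing recognizable to cut on: return the text unchanged.
    return text
```

With a single reply, `trim_unfinished_tail("[Intervenant 1:] Bonjour.\n[Intervenant 2:] Salut! Je vais bien, et")` keeps everything up to `Salut!`. Only punctuation after the line break is considered, so the prompt turn is never shortened.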
184
- def claire_text_preproc_conversation(text):
185
- text = format_special_characters(text)
186
- text = collapse_whitespaces_conversations(text)
187
- return text
188
 
189
 
190
  def claire_text_preproc_message(text):
191
  text = format_special_characters(text)
192
- text = collapse_whitespaces_message(text)
193
  text = replace_brackets(text)
194
  return text
195
 
196
 
197
- def collapse_whitespaces_conversations(text):
198
- text = re.sub(r"\n+", "\n", text)
199
- text = re.sub(r"[ \t]+", " ", text)
200
- text = re.sub(r"\n ", "\n", text)
201
- text = re.sub(r" ([\.,])", r"\1", text)
202
- return text.lstrip().rstrip(" ")
203
-
204
-
205
- def collapse_whitespaces_message(text):
206
  text = re.sub(r"\s+", " ", text)
207
  text = re.sub(r" ([\.,])", r"\1", text)
208
  return text.lstrip().rstrip(" ")
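The two whitespace helpers in the previous version differ mainly in how they treat line breaks: the conversation variant keeps one line break per turn, while the message variant flattens everything onto one line. The sketch below reproduces both from the removed and context lines above so the difference can be checked in isolation; the sample strings are made up for illustration.

```python
import re


def collapse_whitespaces_conversations(text: str) -> str:
    """Normalize whitespace but keep one line break between speech turns."""
    text = re.sub(r"\n+", "\n", text)        # merge repeated line breaks
    text = re.sub(r"[ \t]+", " ", text)      # merge runs of spaces and tabs
    text = re.sub(r"\n ", "\n", text)        # no leading space at line start
    text = re.sub(r" ([\.,])", r"\1", text)  # no space before '.' or ','
    return text.lstrip().rstrip(" ")


def collapse_whitespaces_message(text: str) -> str:
    """Normalize whitespace in a single message, flattening line breaks."""
    text = re.sub(r"\s+", " ", text)         # any whitespace run becomes one space
    text = re.sub(r" ([\.,])", r"\1", text)  # no space before '.' or ','
    return text.lstrip().rstrip(" ")


sample = "Bonjour  ,  merci .\n\n  Oui ."
print(repr(collapse_whitespaces_conversations(sample)))
print(repr(collapse_whitespaces_message(sample)))
```

Note that both functions end with `rstrip(" ")` rather than `rstrip()`, so a trailing line break in a conversation history survives normalization.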
@@ -235,184 +329,102 @@ def format_special_characters(text):
235
  return text
236
 
237
 
238
- # Create the Claire chatbot instance
239
- chatbot = ClaireChatBot()
240
 
241
- # Define the Gradio interface
242
- title = "Démo de conversation avec Claire"
243
- description = "Simulation de conversations en Français avec [Claire](https://huggingface.co/OpenLLM-France/Claire-7B-0.1), sans recherche de vérité, et avec potentiellement un peu d'humour."
244
 
245
- default_parameters = [
246
- default_temperature,
247
- default_top_k,
248
- default_top_p,
249
- default_repetition_penalty,
250
- ]
 
251
 
252
- examples = [
253
- [
254
- "Nous allons commencer cette interview avec une question un peu classique. Quel est votre sport préféré?", # user_message
255
- "", # bot_message_start
256
- "", # conversation_history
257
- True, # generate_several_turns
258
- 200, # max_new_tokens
259
- *default_parameters,
260
- ],
261
- [
262
- "Que vas-tu nous cuisiner aujourd'hui?", # user_message
263
- "Alors, nous allons voir la recette de", # bot_message_start
264
- "VOUS: Bonjour Claire.\nCHATBOT: Bonjour Dominique.", # conversation_history
265
- False, # generate_several_turns
266
- default_max_new_tokens, # max_new_tokens
267
- *default_parameters,
268
- ],
269
- ]
270
 
271
- # # Test
272
- # chatbot.predict(*examples[0])
273
 
274
- inputs = [
275
- gr.Textbox(
276
- "",
277
- label="Prompt",
278
- info="Tapez ce que vous voulez dire au ChatBot",
279
- type="text",
280
- lines=2,
281
- ),
282
- gr.Textbox(
283
- "",
284
- label="Début de réponse",
285
- info="Vous pouvez taper ici ce que commence à vous répondre le ChatBot",
286
- type="text",
287
- ),
288
- gr.Textbox(
289
- "",
290
- label="Historique",
291
- info="Vous pouvez copier-coller (et modifier?) ici votre historique de conversation, pour continuer cette conversation",
292
- type="text",
293
- lines=3,
294
- ),
295
- gr.Checkbox(
296
- False,
297
- label="Plus qu'un tour de parole",
298
- info="Générer aussi comment pourrait continuer la conversation (plusieurs tours de parole incluant le vôtre)",
299
- ),
300
- gr.Slider(
301
- label="Longueur max",
302
- info="Longueur maximale du texte généré (en nombre de 'tokens' ~ mots et ponctuations)",
303
- value=default_max_new_tokens,
304
- minimum=25,
305
- maximum=1000,
306
- step=25,
307
- interactive=True,
308
- ),
309
- gr.Slider(
310
- label="Température",
311
- info="Une valeur élevée augmente la diversité du texte généré, mais peut aussi produire des résultats incohérents",
312
- value=default_temperature,
313
- minimum=0.1,
314
- maximum=1.9,
315
- step=0.1,
316
- interactive=True,
317
- ),
318
- gr.Slider(
319
- label="Top-k",
320
- info="Une valeur élevée permet d'explorer plus d'alternatives, mais augmente les temps de calcul",
321
- value=default_top_k,
322
- minimum=1,
323
- maximum=50,
324
- step=1,
325
- interactive=True,
326
- ),
327
- gr.Slider(
328
- label="Top-p",
329
- info="Une valeur élevée permet d'explorer des alternatives moins probables",
330
- value=default_top_p,
331
- minimum=0.9,
332
- maximum=1.0,
333
- step=0.01,
334
- interactive=True,
335
- ),
336
- gr.Slider(
337
- label="Pénalité de répétition",
338
- info="Pénalisation des répétitions",
339
- value=default_repetition_penalty,
340
- minimum=1.0,
341
- maximum=2.0,
342
- step=0.05,
343
- interactive=True,
344
- ),
345
- ]
346
 
347
- theme = gr.themes.Monochrome(
348
- secondary_hue="emerald",
349
- neutral_hue="teal",
350
- ).set(
351
- body_background_fill="*primary_950",
352
- body_background_fill_dark="*secondary_950",
353
- body_text_color="*primary_50",
354
- body_text_color_dark="*secondary_100",
355
- body_text_color_subdued="*primary_300",
356
- body_text_color_subdued_dark="*primary_300",
357
- background_fill_primary="*primary_600",
358
- background_fill_primary_dark="*primary_400",
359
- background_fill_secondary="*primary_950",
360
- background_fill_secondary_dark="*primary_950",
361
- border_color_accent="*secondary_600",
362
- border_color_primary="*secondary_50",
363
- border_color_primary_dark="*secondary_50",
364
- color_accent="*secondary_50",
365
- color_accent_soft="*primary_500",
366
- color_accent_soft_dark="*primary_500",
367
- link_text_color="*secondary_950",
368
- link_text_color_dark="*primary_50",
369
- link_text_color_active="*primary_50",
370
- link_text_color_active_dark="*primary_50",
371
- link_text_color_hover="*primary_50",
372
- link_text_color_hover_dark="*primary_50",
373
- link_text_color_visited="*primary_50",
374
- block_background_fill="*primary_950",
375
- block_background_fill_dark="*primary_950",
376
- block_border_color="*secondary_500",
377
- block_border_color_dark="*secondary_500",
378
- block_info_text_color="*primary_50",
379
- block_info_text_color_dark="*primary_50",
380
- block_label_background_fill="*primary_950",
381
- block_label_background_fill_dark="*secondary_950",
382
- block_label_border_color="*secondary_500",
383
- block_label_border_color_dark="*secondary_500",
384
- block_label_text_color="*secondary_500",
385
- block_label_text_color_dark="*secondary_500",
386
- block_title_background_fill="*primary_950",
387
- panel_background_fill="*primary_950",
388
- panel_border_color="*primary_950",
389
- checkbox_background_color="*primary_950",
390
- checkbox_background_color_dark="*primary_950",
391
- checkbox_background_color_focus="*primary_950",
392
- checkbox_border_color="*secondary_500",
393
- input_background_fill="*primary_800",
394
- input_background_fill_focus="*primary_950",
395
- input_background_fill_hover="*secondary_950",
396
- input_placeholder_color="*secondary_950",
397
- slider_color="*primary_950",
398
- slider_color_dark="*primary_950",
399
- table_even_background_fill="*primary_800",
400
- table_odd_background_fill="*primary_600",
401
- button_primary_background_fill="*primary_800",
402
- button_primary_background_fill_dark="*primary_800",
403
- )
404
 
405
- iface = gr.Interface(
406
- fn=chatbot.predict,
407
  title=title,
408
  description=description,
 
 
409
  examples=examples,
410
- inputs=inputs,
411
- outputs="text",
412
- theme=theme,
 
413
  )
414
 
415
- print("Launching chat...")
416
-
417
- # Launch the Gradio interface for the model
418
- iface.launch(share=True)
 
 
1
  import gradio as gr
2
+ from threading import Thread
3
  import transformers
4
+ import spaces
5
  import torch
6
  import unicodedata
7
+ import regex as re
8
+
9
+ # Model
10
+ model_name = "OpenLLM-France/Claire-7B-0.1"
11
+
12
+ # Title and description
13
+ title = "Conversation avec Claire"
14
+ description = """\
15
+ Simulation de conversation en Français avec [OpenLLM-France/Claire-7B](https://huggingface.co/OpenLLM-France/Claire-7B-0.1).
16
+ <strong>Claire n'est <u>pas</u> un assistant personnel</strong>, elle a tendance à comprendre et répondre un <b>langage parlé</b>, \
17
+ peut faire preuve d'humour, et <strong>ne vous dira <u>pas</u> (forcément) des vérités</strong>.
18
+ """
19
 
20
  # Default variables
21
+ default_max_new_tokens = 200
22
  default_temperature = 1.0
23
+ default_repetition_penalty = 1.5
24
  default_top_k = 10
25
  default_top_p = 0.99
 
26
 
27
+ default_parameters = [
28
+ default_max_new_tokens,
29
+ default_temperature,
30
+ default_repetition_penalty,
31
+ default_top_k,
32
+ default_top_p,
33
+ ]
34
+
35
+ # Examples
36
+ examples = [
37
+ [
38
+ "Bonjour Claire. Quel est votre sport préféré?", # user_message
39
+ False,
40
+ "", # bot_message_start
41
+ # "", # First name
42
+ *default_parameters,
43
+ ],
44
+ [
45
+ "Bonjour. Je vous propose de faire un tour de table.", # user_message
46
+ True, # more than one turn
47
+ "", # bot_message_start
48
+ # "", # First name
49
+ *default_parameters,
50
+ ],
51
+ [
52
+ "Que vas-tu nous cuisiner aujourd'hui?", # user_message
53
+ False,
54
+ "Alors, nous allons voir la recette", # bot_message_start
55
+ # "", # First name
56
+ *default_parameters,
57
+ ],
58
+ ]
59
+
60
+ # Override default gradio buttons
61
+ gradio_buttons = dict(
62
+ submit_btn=gr.Button("Envoyer"), # Submit
63
+ retry_btn=gr.Button("🔄 Générer une autre réponse"), # "🔄 Retry"
64
+ undo_btn=gr.Button("↩️ Annuler"), # "↩️ Undo"
65
+ clear_btn=gr.Button("🗑️ Effacer la conversation"), # "🗑️ Clear"
66
+ # stop_btn= None,
67
+ stop_btn=gr.Button("Arrêter"), # Stop
68
+ )
69
+ additional_inputs_name="Paramètres" # "Additional inputs"
70
+ textbox=gr.Textbox(
71
+ container=False,
72
+ show_label=False,
73
+ label="Message",
74
+ placeholder="Votre message (laissez vide pour que le Bot continue seul)...",
75
+ scale=7,
76
+ lines=2,
77
+ autofocus=False,
78
+ )
79
+ chatbot_label="Conversation" # Chatbot
80
+
81
+
82
+ additional_inputs = [
83
+ gr.Checkbox(
84
+ False,
85
+ label="Plus qu'un tour de parole",
86
+ info="Générer plusieurs tours de parole (et donc comment vous pourriez continuer la conversation)",
87
+ ),
88
+ gr.Textbox(
89
+ "",
90
+ label="Début de réponse",
91
+ info="Vous pouvez taper ici ce que commence à vous répondre le Bot (pensez à actualiser entre chaque génération)",
92
+ type="text",
93
+ ),
94
+ # gr.Textbox(
95
+ # "",
96
+ # label="Votre prénom",
97
+ # info="Prénom de vous en tant qu'interlocuteur (si vous vous nommez, le bot s'appellera Claire)",
98
+ # ),
99
+ gr.Slider(
100
+ label="Longueur max",
101
+ info="Longueur maximale du texte généré (en nombre de 'tokens' ~ mots et ponctuations)",
102
+ value=default_max_new_tokens,
103
+ minimum=25,
104
+ maximum=1000,
105
+ step=25,
106
+ interactive=True,
107
+ ),
108
+ gr.Slider(
109
+ label="Température",
110
+ info="Une valeur élevée augmente la diversité du texte généré, mais peut aussi produire des résultats incohérents",
111
+ value=default_temperature,
112
+ minimum=0.1,
113
+ maximum=1.9,
114
+ step=0.1,
115
+ interactive=True,
116
+ ),
117
+ gr.Slider(
118
+ label="Pénalité de répétition",
119
+ info="Pénalisation des répétitions",
120
+ value=default_repetition_penalty,
121
+ minimum=1.0,
122
+ maximum=1.95,
123
+ step=0.05,
124
+ interactive=True,
125
+ ),
126
+ gr.Slider(
127
+ label="Top-k",
128
+ info="Une valeur élevée permet d'explorer plus d'alternatives",
129
+ value=default_top_k,
130
+ minimum=1,
131
+ maximum=50,
132
+ step=1,
133
+ interactive=True,
134
+ ),
135
+ gr.Slider(
136
+ label="Top-p",
137
+ info="Une valeur élevée permet d'explorer plus d'alternatives",
138
+ value=default_top_p,
139
+ minimum=0.9,
140
+ maximum=1.0,
141
+ step=0.01,
142
+ interactive=True,
143
+ ),
144
+ ]
145
+
146
+ STREAMING = True
147
 
148
  print("Loading model...")
149
 
 
169
  assert len(newspk_token_id) == 1
170
  newspk_token_id = newspk_token_id[0]
171
 
172
+ tokenizer.add_special_tokens({"eos_token": "["})
173
+
174
+ user_internal_tag = "[Intervenant 1:]"
175
+ bot_internal_tag = "[Intervenant 2:]"
176
+ device = "cuda" if torch.cuda.is_available() else "cpu"
177
+
178
+
179
+ @spaces.GPU
180
+ def generate(
181
+ user_message,
182
+ conversation_history=[],
183
+ generate_several_turns=False,
184
+ bot_message_start="",
185
+ # user_surname="",
186
+ max_new_tokens=default_max_new_tokens,
187
+ temperature=default_temperature,
188
+ repetition_penalty=default_repetition_penalty,
189
+ top_k=default_top_k,
190
+ top_p=default_top_p,
191
+ user_surname="", # Experimental (TODO)
192
+ remove_unfinished_sentence=True,
193
+ ):
194
+ user_message = claire_text_preproc_message(user_message)
195
+ bot_message_start = claire_text_preproc_message(bot_message_start)
196
+
197
+ if user_surname:
198
+ user_surname = capitalize(collapse_whitespaces(re.sub(r"[^\p{L}\-\.']", " ", user_surname))).strip()
199
+ if user_surname:
200
+ user_tag = f"[{user_surname}:]"
201
+ bot_tag = f"[Claire:]"
202
+ else:
203
+ user_tag = user_internal_tag
204
+ bot_tag = bot_internal_tag
205
+
206
+ if conversation_history:
207
+ conversation_history = "\n".join(
208
+ [
209
+ f"{user_tag} {claire_text_preproc_message(user)}\n{bot_tag} {claire_text_preproc_message(bot) if bot else ''}"
210
+ for user, bot in conversation_history
211
+ ]
212
  )
213
+ conversation_history = from_display_to_internal(conversation_history)
214
+ conversation_history = conversation_history.rstrip()
215
+ if conversation_history:
216
+ conversation_history += "\n"
217
+ else:
218
+ conversation_history = ""
219
+ if not bot_message_start:
220
+ bot_message_start = ""
221
+
222
+ # Combine the user and bot messages into a conversation
223
+ conversation = f"{conversation_history}{user_tag} {user_message}\n{bot_tag} {bot_message_start}".strip()
224
+ conversation = remove_empty_turns(conversation)
225
+
226
+ # Encode the conversation using the tokenizer
227
+ input_ids = tokenizer.encode(
228
+ conversation, return_tensors="pt", add_special_tokens=True
229
+ )
230
+ input_ids = input_ids.to(device)
231
+
232
+ skip_special_tokens = not generate_several_turns
233
+
234
+ if STREAMING:
235
+ streamer = transformers.TextIteratorStreamer(
236
+ tokenizer,
237
+ timeout=10.0,
238
+ skip_prompt=True,
239
+ skip_special_tokens=skip_special_tokens,
240
+ clean_up_tokenization_spaces=False,
241
  )
242
+ else:
243
+ streamer = None
244
+
245
+ # Generation parameters
246
+ generate_kwargs = dict(
247
+ input_ids=input_ids,
248
+ streamer=streamer,
249
+ eos_token_id=eos_token_id if generate_several_turns else newspk_token_id,
250
+ pad_token_id=eos_token_id,
251
+ do_sample=True,
252
+ max_new_tokens=max_new_tokens,
253
+ temperature=temperature,
254
+ repetition_penalty=repetition_penalty,
255
+ top_k=top_k,
256
+ top_p=top_p,
257
+ num_beams=1,
258
+ # use_cache=False,
259
+ # early_stopping=False,
260
+ )
261
+ if STREAMING:
262
+ t = Thread(target=model.generate, kwargs=generate_kwargs)
263
+ t.start()
264
+
265
+ outputs = []
266
+ if bot_message_start.strip():
267
+ yield bot_message_start
268
+ for token in streamer:
269
+ # Ignore line breaks
270
+ if not generate_several_turns and re.match(r"\s*\n$", token):
271
+ continue
272
+ outputs.append(token)
273
+ text = bot_message_start + from_internal_to_display("".join(outputs))
274
+ yield text
275
+ else:
276
+ output_ids = model.generate(**generate_kwargs)
277
+ output_ids = output_ids[0][len(input_ids[0]) :]
278
+ text = tokenizer.decode(output_ids, skip_special_tokens=skip_special_tokens)
279
+ if bot_message_start.strip():
280
+ bot_message_start = bot_message_start.strip() + " "
281
+
282
+ text = bot_message_start + from_internal_to_display(text.rstrip("[").strip())
283
+ yield text
284
+
285
+ if generate_several_turns:
286
+ if remove_unfinished_sentence:
287
+ yield remove_last_unfinished_sentence(text)
 
288
  else:
289
+ yield remove_last_unfinished_turn(text)[0]
290
 
291
 
292
  def claire_text_preproc_message(text):
293
  text = format_special_characters(text)
294
+ text = collapse_whitespaces(text)
295
  text = replace_brackets(text)
296
  return text
297
 
298
 
299
+ def collapse_whitespaces(text):
300
  text = re.sub(r"\s+", " ", text)
301
  text = re.sub(r" ([\.,])", r"\1", text)
302
  return text.lstrip().rstrip(" ")
 
329
  return text
330
 
331
 
332
+ user_name = "[Vous:]"
333
+ bot_name = "[Bot:]"
334
 
335
 
336
+ def from_internal_to_display(text):
337
+ for before, after in [
338
+ (user_internal_tag, user_name),
339
+ (bot_internal_tag, bot_name),
340
+ ]:
341
+ text = text.replace(before, after)
342
+ return text
343
 
344
 
345
+ def from_display_to_internal(text):
346
+ for before, after in [
347
+ (user_name, user_internal_tag),
348
+ (bot_name, bot_internal_tag),
349
+ ]:
350
+ text = text.replace(before, after)
351
+ return text
352
 
353
 
354
+ def remove_last_unfinished_sentence(text):
355
+ text, removed_turn = remove_last_unfinished_turn(text)
356
+ if removed_turn:
357
+ return text
358
+ line_breaks = [u.span(0)[0] for u in re.finditer("\n", text)]
359
+ remove_last_sentence = True
360
+ if len(line_breaks) >= 1 and len(text[line_breaks[-1]:].split("]")[-1]) < 15:
361
+ text = text[: line_breaks[-1]]
362
+ line_breaks.pop(-1)
363
+ remove_last_sentence = False
364
+ if remove_last_sentence and len(line_breaks) >= 1:
365
+ sentence_ends = [u.span(0)[0] for u in re.finditer(r"[\.!?]", text)]
366
+ sentence_ends = [p for p in sentence_ends if p > line_breaks[-1]]
367
+ if sentence_ends:
368
+ text = text[: sentence_ends[-1] + 1]
369
+ else:
370
+ phrase_ends = [u.span(0)[0] for u in re.finditer(r"[,;]", text)]
371
+ phrase_ends = [p for p in phrase_ends if p > line_breaks[-1]]
372
+ if phrase_ends:
373
+ text = text[: phrase_ends[-1] + 1]
374
+ return text
375
 
376
+
377
+ def remove_last_unfinished_turn(text):
378
+ starts = [u.span(0)[0] for u in re.finditer(r"\[", text)]
379
+ did_it = False
380
+ if starts and "]" not in text[starts[-1] :]:
381
+ text = text[: starts[-1]]
382
+ did_it = True
383
+ return text.rstrip(), did_it
384
+
385
+
386
+ def remove_empty_turns(text):
387
+ while re.search(_empty_turn, text):
388
+ # Remove empty turns
389
+ text = re.sub(_empty_turn, r"\1", text)
390
+ # Remove same speaker speaking twice
391
+ text = re.sub(_repeated_turn, r"\1 \2", text)
392
+ return text
393
+
394
+ _speaker_regex = r"\[[^\]]+:\]"
395
+ _empty_turn = re.compile(_speaker_regex + r"[^\p{L}]*" + "(" + _speaker_regex + ")")
396
+ _repeated_turn = re.compile(r"(" + _speaker_regex + r") ([^\[]*)\s\1")
397
+
398
+
399
+ def capitalize(text):
400
+ # michel JR claude-marie -> Michel JR Claude-Marie
401
+ words = text.split(" ")
402
+ words = [w.capitalize() if (not w.isupper() or len(w)>2) else w for w in words]
403
+ for i, w in enumerate(words):
404
+ for sep in "-", "'":
405
+ if sep in w:
406
+ words[i] = sep.join([x.capitalize() if not x.isupper() else x for x in w.split(sep)])
407
+ return " ".join(words)
408
+
409
+ # # Test
410
+ # list(generate(*(examples[0][:1] + [[]] + examples[0][1:])))
411
+
412
+
413
+ chat_interface = gr.ChatInterface(
414
+ fn=generate,
415
  title=title,
416
  description=description,
417
+ chatbot=gr.Chatbot(label=chatbot_label),
418
+ textbox=textbox,
419
  examples=examples,
420
+ additional_inputs=additional_inputs,
421
+ additional_inputs_accordion_name=additional_inputs_name,
422
+ autofocus=False,
423
+ **gradio_buttons,
424
  )
425
 
426
+ if __name__ == "__main__":
427
+ print("Launching chat...")
428
+ with gr.Blocks(css="style.css") as demo:
429
+ chat_interface.render()
430
+ demo.queue(max_size=20).launch()
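As an illustration of the post-processing helpers added to app.py in this commit, here is a minimal, self-contained sketch of how a partially generated speaker tag is trimmed and how internal tags are mapped to display names. The function bodies follow the diff above, but the upper-case constant names and the sample string are assumptions made for this example, and the standard `re` module is used instead of `regex`.

```python
import re

# Tag values copied from the diff; the upper-case names are illustrative only.
USER_INTERNAL_TAG = "[Intervenant 1:]"
BOT_INTERNAL_TAG = "[Intervenant 2:]"
USER_NAME = "[Vous:]"
BOT_NAME = "[Bot:]"


def from_internal_to_display(text):
    # Replace the model's internal speaker tags with the names shown in the UI.
    for before, after in [
        (USER_INTERNAL_TAG, USER_NAME),
        (BOT_INTERNAL_TAG, BOT_NAME),
    ]:
        text = text.replace(before, after)
    return text


def remove_last_unfinished_turn(text):
    # Drop a trailing "[..." that was cut off before its closing "]".
    starts = [m.start() for m in re.finditer(r"\[", text)]
    removed = False
    if starts and "]" not in text[starts[-1]:]:
        text = text[: starts[-1]]
        removed = True
    return text.rstrip(), removed


# Hypothetical model output that stops in the middle of a speaker tag.
raw = "Oui, bien sûr.\n[Intervenant 1:] D'accord.\n[Interv"
trimmed, removed = remove_last_unfinished_turn(raw)
shown = from_internal_to_display(trimmed)
print(shown)
```

Trimming before mapping the tags keeps a half-written tag such as `[Interv` from reaching the chat window when generation stops at the token limit.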
requirements.txt CHANGED
@@ -1,11 +1,11 @@
1
- optimum
2
  accelerate==0.24.1
3
  bitsandbytes==0.41.1
4
- gradio==4.1.1
 
5
  protobuf==3.20.3
6
  scipy==1.11.2
7
  sentencepiece==0.1.99
8
- spaces==0.18.0
9
- torch==2.0.0
 
10
  transformers==4.35.0
11
- transformers_stream_generator==0.0.4
 
 
1
  accelerate==0.24.1
2
  bitsandbytes==0.41.1
3
+ gradio==4.4.1
4
+ optimum==1.14.1
5
  protobuf==3.20.3
6
  scipy==1.11.2
7
  sentencepiece==0.1.99
8
+ #spaces==0.18.0
9
+ spaces
10
+ torch==2.0.1
11
  transformers==4.35.0
 
style.css ADDED
@@ -0,0 +1,9 @@
1
+ /* h1 {
2
+ text-align: center;
3
+ }
4
+
5
+ .contain {
6
+ max-width: 900px;
7
+ margin: auto;
8
+ padding-top: 1.5rem;
9
+ } */