FINGU-AI committed
Commit d7d9471
Parent: 2681971

Update README.md

Files changed (1)
  1. README.md +2 -157
README.md CHANGED
@@ -251,139 +251,7 @@ You can finetune this model on your own dataset.
  *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
  -->

- ## Training Details
-
- ### Training Hyperparameters
- #### Non-Default Hyperparameters
-
- - `eval_strategy`: steps
- - `per_device_eval_batch_size`: 4
- - `gradient_accumulation_steps`: 4
- - `learning_rate`: 2e-05
- - `max_steps`: 1500
- - `lr_scheduler_type`: cosine
- - `warmup_ratio`: 0.1
- - `warmup_steps`: 5
- - `bf16`: True
- - `tf32`: True
- - `optim`: adamw_torch_fused
- - `gradient_checkpointing`: True
- - `gradient_checkpointing_kwargs`: {'use_reentrant': False}
- - `batch_sampler`: no_duplicates
-
- #### All Hyperparameters
- <details><summary>Click to expand</summary>
-
- - `overwrite_output_dir`: False
- - `do_predict`: False
- - `eval_strategy`: steps
- - `prediction_loss_only`: True
- - `per_device_train_batch_size`: 8
- - `per_device_eval_batch_size`: 4
- - `per_gpu_train_batch_size`: None
- - `per_gpu_eval_batch_size`: None
- - `gradient_accumulation_steps`: 4
- - `eval_accumulation_steps`: None
- - `learning_rate`: 2e-05
- - `weight_decay`: 0.0
- - `adam_beta1`: 0.9
- - `adam_beta2`: 0.999
- - `adam_epsilon`: 1e-08
- - `max_grad_norm`: 1.0
- - `num_train_epochs`: 3.0
- - `max_steps`: 1500
- - `lr_scheduler_type`: cosine
- - `lr_scheduler_kwargs`: {}
- - `warmup_ratio`: 0.1
- - `warmup_steps`: 5
- - `log_level`: passive
- - `log_level_replica`: warning
- - `log_on_each_node`: True
- - `logging_nan_inf_filter`: True
- - `save_safetensors`: True
- - `save_on_each_node`: False
- - `save_only_model`: False
- - `restore_callback_states_from_checkpoint`: False
- - `no_cuda`: False
- - `use_cpu`: False
- - `use_mps_device`: False
- - `seed`: 42
- - `data_seed`: None
- - `jit_mode_eval`: False
- - `use_ipex`: False
- - `bf16`: True
- - `fp16`: False
- - `fp16_opt_level`: O1
- - `half_precision_backend`: auto
- - `bf16_full_eval`: False
- - `fp16_full_eval`: False
- - `tf32`: True
- - `local_rank`: 0
- - `ddp_backend`: None
- - `tpu_num_cores`: None
- - `tpu_metrics_debug`: False
- - `debug`: []
- - `dataloader_drop_last`: True
- - `dataloader_num_workers`: 0
- - `dataloader_prefetch_factor`: None
- - `past_index`: -1
- - `disable_tqdm`: False
- - `remove_unused_columns`: True
- - `label_names`: None
- - `load_best_model_at_end`: False
- - `ignore_data_skip`: False
- - `fsdp`: []
- - `fsdp_min_num_params`: 0
- - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- - `fsdp_transformer_layer_cls_to_wrap`: None
- - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- - `deepspeed`: None
- - `label_smoothing_factor`: 0.0
- - `optim`: adamw_torch_fused
- - `optim_args`: None
- - `adafactor`: False
- - `group_by_length`: False
- - `length_column_name`: length
- - `ddp_find_unused_parameters`: None
- - `ddp_bucket_cap_mb`: None
- - `ddp_broadcast_buffers`: False
- - `dataloader_pin_memory`: True
- - `dataloader_persistent_workers`: False
- - `skip_memory_metrics`: True
- - `use_legacy_prediction_loop`: False
- - `push_to_hub`: False
- - `resume_from_checkpoint`: None
- - `hub_model_id`: None
- - `hub_strategy`: every_save
- - `hub_private_repo`: False
- - `hub_always_push`: False
- - `gradient_checkpointing`: True
- - `gradient_checkpointing_kwargs`: {'use_reentrant': False}
- - `include_inputs_for_metrics`: False
- - `eval_do_concat_batches`: True
- - `fp16_backend`: auto
- - `push_to_hub_model_id`: None
- - `push_to_hub_organization`: None
- - `mp_parameters`:
- - `auto_find_batch_size`: False
- - `full_determinism`: False
- - `torchdynamo`: None
- - `ray_scope`: last
- - `ddp_timeout`: 1800
- - `torch_compile`: False
- - `torch_compile_backend`: None
- - `torch_compile_mode`: None
- - `dispatch_batches`: None
- - `split_batches`: None
- - `include_tokens_per_second`: False
- - `include_num_input_tokens_seen`: False
- - `neftune_noise_alpha`: None
- - `optim_target_modules`: None
- - `batch_eval_metrics`: False
- - `batch_sampler`: no_duplicates
- - `multi_dataset_batch_sampler`: proportional

- </details>

  ### Training Logs
  | Epoch | Step | Training Loss | retrival loss |
@@ -391,31 +259,8 @@ You can finetune this model on your own dataset.
  | 0.6466 | 500 | 0.0424 | 0.0060 |


- ### Framework Versions
- - Python: 3.10.12
- - Sentence Transformers: 3.0.1
- - Transformers: 4.41.2
- - PyTorch: 2.2.0+cu121
- - Accelerate: 0.32.1
- - Datasets: 2.20.0
- - Tokenizers: 0.19.1
-
- ## Citation
-
- ### BibTeX
-
- #### Sentence Transformers
- ```bibtex
- @inproceedings{reimers-2019-sentence-bert,
-     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
-     author = "Reimers, Nils and Gurevych, Iryna",
-     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
-     month = "11",
-     year = "2019",
-     publisher = "Association for Computational Linguistics",
-     url = "https://arxiv.org/abs/1908.10084",
- }
- ```
+
+

  <!--
  ## Glossary
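
For context, the non-default hyperparameters deleted above correspond to options of the Sentence Transformers v3 training API (the removed "Framework Versions" section lists Sentence Transformers 3.0.1 and Transformers 4.41.2). Below is a minimal sketch, not taken from this repository, of how those values might be passed to `SentenceTransformerTrainingArguments`; the output directory, model id, datasets, and loss are placeholders.

```python
# Minimal sketch: mapping the card's non-default hyperparameters onto the
# Sentence Transformers v3 trainer API. Names marked "placeholder" are
# assumptions, not values from this commit.
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.training_args import (
    SentenceTransformerTrainingArguments,
    BatchSamplers,
)

args = SentenceTransformerTrainingArguments(
    output_dir="output",                        # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    max_steps=1500,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    warmup_steps=5,
    bf16=True,
    tf32=True,
    optim="adamw_torch_fused",
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # Sentence Transformers-specific option
)

# Hypothetical usage; model id, datasets, and loss are not specified by the card:
# trainer = SentenceTransformerTrainer(
#     model=SentenceTransformer("FINGU-AI/<model-id>"),  # placeholder id
#     args=args,
#     train_dataset=train_dataset,
#     eval_dataset=eval_dataset,
#     loss=loss,  # e.g. a MultipleNegativesRankingLoss instance
# )
# trainer.train()
```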
 