haijunlv committed
Commit 0cb5b6e · verified · 1 Parent(s): 5e4eefa

Upload README.md

Files changed (1)
  1. README.md +26 -1
README.md CHANGED
@@ -108,6 +108,8 @@ generated_ids = model.generate(tokenized_chat, max_new_tokens=1024, temperature=
 generated_ids = [
     output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
 ]
+prompt = tokenizer.batch_decode(tokenized_chat)[0]
+print(prompt)
 response = tokenizer.batch_decode(generated_ids)[0]
 print(response)
 ```
@@ -153,6 +155,10 @@ Find more details in the [LMDeploy documentation](https://lmdeploy.readthedocs.i
 
 
 
+#### Ollama inference
+
+TODO
+
 #### vLLM inference
 
 We are still working on merging the PR(https://github.com/vllm-project/vllm/pull/12037) into vLLM. In the meantime, please use the following PR link to install it manually.
@@ -280,6 +286,8 @@ generated_ids = model.generate(tokenized_chat, max_new_tokens=8192)
 generated_ids = [
     output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
 ]
+prompt = tokenizer.batch_decode(tokenized_chat)[0]
+print(prompt)
 response = tokenizer.batch_decode(generated_ids)[0]
 print(response)
 ```
@@ -308,6 +316,10 @@ response = pipe(messages, gen_config=GenerationConfig(max_new_tokens=2048))
 print(response)
 ```
 
+#### Ollama inference
+
+TODO
+
 #### vLLM inference
 
 We are still working on merging the PR(https://github.com/vllm-project/vllm/pull/12037) into vLLM. In the meantime, please use the following PR link to install it manually.
@@ -369,7 +381,7 @@ The code is licensed under Apache-2.0, while model weights are fully open for ac
 InternLM3,即书生·浦语大模型第3代,开源了80亿参数,面向通用使用与高阶推理的指令模型(InternLM3-8B-Instruct)。模型具备以下特点:
 
 - **更低的代价取得更高的性能**:
-  在推理、知识类任务上取得同量级最优性能,超过Llama3.1-8B和Qwen2.5-7B. 值得关注的是InternLM3只用了4万亿词元进行训练,对比同级别模型训练成本节省75%以上。
+  在推理、知识类任务上取得同量级最优性能,超过Llama3.1-8B和Qwen2.5-7B。值得关注的是InternLM3只用了4万亿词元进行训练,对比同级别模型训练成本节省75%以上。
 - **深度思考能力**:
   InternLM3支持通过长思维链求解复杂推理任务的深度思考模式,同时还兼顾了用户体验更流畅的通用回复模式。
 
@@ -445,6 +457,8 @@ generated_ids = model.generate(tokenized_chat, max_new_tokens=1024, temperature=
 generated_ids = [
     output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
 ]
+prompt = tokenizer.batch_decode(tokenized_chat)[0]
+print(prompt)
 response = tokenizer.batch_decode(generated_ids)[0]
 print(response)
 ```
@@ -491,7 +505,12 @@ curl http://localhost:23333/v1/chat/completions \
 
 
 
+##### Ollama 推理
+
+TODO
+
 ##### vLLM 推理
+
 我们还在推动PR(https://github.com/vllm-project/vllm/pull/12037) 合入vllm,现在请使用以下PR链接手动安装
 
 ```python
@@ -616,6 +635,8 @@ generated_ids = model.generate(tokenized_chat, max_new_tokens=8192)
 generated_ids = [
     output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
 ]
+prompt = tokenizer.batch_decode(tokenized_chat)[0]
+print(prompt)
 response = tokenizer.batch_decode(generated_ids)[0]
 print(response)
 ```
@@ -644,6 +665,10 @@ response = pipe(messages, gen_config=GenerationConfig(max_new_tokens=2048))
 print(response)
 ```
 
+##### Ollama 推理
+
+TODO
+
 ##### vLLM 推理
 
 我们还在推动PR(https://github.com/vllm-project/vllm/pull/12037) 合入vllm,现在请使用以下PR链接手动安装
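
The recurring change across these hunks decodes the tokenized prompt back to text so the rendered chat template can be inspected next to the model's reply. A minimal end-to-end sketch of the pattern is shown below; the model ID, example messages, and loading options are illustrative assumptions, not part of this commit.

```python
# Minimal sketch of the prompt/response decoding pattern added in this commit.
# The model ID, example messages, and loading options are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internlm/internlm3-8b-instruct"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello! Please introduce yourself briefly."},
]
# Render the chat template and tokenize it in one step.
tokenized_chat = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

generated_ids = model.generate(tokenized_chat, max_new_tokens=1024)

# Keep only the newly generated tokens by slicing off the prompt tokens.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
]

# Added by this commit: decode and print the rendered prompt as well as the response.
prompt = tokenizer.batch_decode(tokenized_chat)[0]
print(prompt)
response = tokenizer.batch_decode(generated_ids)[0]
print(response)
```

Printing the decoded prompt is mainly useful for confirming that the chat template (system prompt, role tags, and generation prompt) was applied as expected before reading the response.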