Need Help to build a SmolLM2_360M_model.py

#2
by MartialTerran - opened

Help Needed: Building a Standalone PyTorch SmolLM2_360M_model.py

The HuggingFace Hub hosts the SmolLM2-360M model (HuggingFaceTB/SmolLM2-360M), but currently lacks a standalone PyTorch model.py file for loading, fine-tuning, and inference. This limits the model's usability outside the Hugging Face ecosystem.

I've started creating a SmolLM2_360M_model.py file (at https://huggingface.co/MartialTerran/SmolLM2_360M_model.py) to address this gap, aiming for compatibility with all SmolLM2 models. The initial goal is to enable inference using the published weights and config. A working PyTorch implementation would pave the way for exporting a TorchScript version, broadening accessibility to non-Python environments like microcontrollers, RISC-V machines, smartphones, and other edge devices.

The Challenge:

Although my SmolLM2_360M_model.py (https://huggingface.co/MartialTerran/SmolLM2_360M_model.py) runs, it fails to load the weights from the safetensors file. Here is the full output, including the traceback:


C:\Users\User\OneDrive\Desktop\SmolLM2>python SmolLM2_360M_model_debugging.py
Warning: SentencePiece not found, using rudimentary BPE tokenizer. Install SentencePiece for better performance.

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.1.3 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "C:\Users\User\OneDrive\Desktop\SmolLM2\SmolLM2_360M_model_debugging.py", line 470, in <module>
    model = SmolLM2_360M(config_path)
  File "C:\Users\User\OneDrive\Desktop\SmolLM2\SmolLM2_360M_model_debugging.py", line 243, in __init__
    self.embed_tokens = nn.Embedding(self.vocab_size, self.hidden_size)
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\sparse.py", line 142, in __init__
    self.weight = Parameter(torch.empty((num_embeddings, embedding_dim), **factory_kwargs),
C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\sparse.py:142: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:84.)
  self.weight = Parameter(torch.empty((num_embeddings, embedding_dim), **factory_kwargs),
An error occurred while loading weights: File does not contain tensor lm_head.weight
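
The last line of the log points at a missing lm_head.weight tensor. One possible cause, assuming the checkpoint follows the usual Hugging Face Llama-style layout and config.json sets tie_word_embeddings to true, is that the output head shares its weight with the input embedding and is therefore not stored separately in the file. A minimal sketch of handling that case follows. The helper name add_tied_lm_head and the key names model.embed_tokens.weight and lm_head.weight are assumptions for illustration and may not match the names used in SmolLM2_360M_model.py.

```python
def add_tied_lm_head(state_dict, tie_word_embeddings=True,
                     embed_key="model.embed_tokens.weight",
                     head_key="lm_head.weight"):
    """Return a copy of state_dict in which head_key aliases embed_key.

    Hypothetical helper: the key names are assumed, not taken from the
    author's script. Values can be any objects (tensors in practice).
    """
    patched = dict(state_dict)  # shallow copy; the tensors are not duplicated
    if head_key in patched:
        return patched  # the checkpoint already stores the output head
    if not tie_word_embeddings:
        raise KeyError(f"{head_key} is missing and weights are not tied")
    if embed_key not in patched:
        raise KeyError(f"{embed_key} is missing, so there is nothing to tie to")
    # Tied weights: the output projection reuses the embedding matrix.
    patched[head_key] = patched[embed_key]
    return patched


# Stand-in for the dict a safetensors loader would return.
checkpoint = {"model.embed_tokens.weight": [[0.1, 0.2], [0.3, 0.4]]}
patched = add_tied_lm_head(checkpoint)
print(sorted(patched))
```

In the real script the dictionary would come from safetensors.torch.load_file, and the same effect can be had inside the model by assigning the embedding's weight Parameter to the head (for example lm_head.weight = embed_tokens.weight) before loading.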

Call to Action:

I'm seeking assistance from experienced PyTorch developers to debug the loading issue and complete the SmolLM2_360M_model.py implementation. Your contributions will significantly expand the potential applications of SmolLM2.

Specific Areas Where Help is Needed:

  • Safetensors Loading: Resolving the error encountered when loading the model weights from the safetensors file.
  • Model Architecture Verification: Confirming the correctness of the PyTorch model architecture based on the config file.
  • Inference Implementation: Ensuring the SmolLM2_360M_model.py can perform inference correctly.
  • Fine-tuning Support (Optional): Adding functionality for fine-tuning the SmolLM2_360M_model.py on downstream tasks.
  • TorchScript Export (Optional): Enabling export to TorchScript for deployment on resource-constrained devices.
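
For the architecture verification item, one approach is to derive the expected tensor shapes from the config and compare them with the shapes found in the checkpoint. The sketch below assumes a Llama-style layout with grouped-query attention and no bias terms. The config values are what I believe the published config.json contains, and should be checked against the actual file. The function names are hypothetical.

```python
from math import prod

# Assumed values; verify against the published config.json.
SMOLLM2_360M_CFG = {
    "vocab_size": 49152, "hidden_size": 960, "intermediate_size": 2560,
    "num_hidden_layers": 32, "num_attention_heads": 15,
    "num_key_value_heads": 5, "tie_word_embeddings": True,
}


def expected_shapes(cfg):
    """Map assumed Llama-style parameter names to shapes implied by cfg."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv = head_dim * cfg["num_key_value_heads"]  # width of the K and V projections
    inter = cfg["intermediate_size"]
    shapes = {"model.embed_tokens.weight": (cfg["vocab_size"], h),
              "model.norm.weight": (h,)}
    for i in range(cfg["num_hidden_layers"]):
        p = f"model.layers.{i}."
        shapes.update({
            p + "self_attn.q_proj.weight": (h, h),
            p + "self_attn.k_proj.weight": (kv, h),
            p + "self_attn.v_proj.weight": (kv, h),
            p + "self_attn.o_proj.weight": (h, h),
            p + "mlp.gate_proj.weight": (inter, h),
            p + "mlp.up_proj.weight": (inter, h),
            p + "mlp.down_proj.weight": (h, inter),
            p + "input_layernorm.weight": (h,),
            p + "post_attention_layernorm.weight": (h,),
        })
    if not cfg.get("tie_word_embeddings", False):
        shapes["lm_head.weight"] = (cfg["vocab_size"], h)
    return shapes


def diff_shapes(expected, actual):
    """Return (missing, unexpected, mismatched) key lists relative to expected."""
    missing = sorted(k for k in expected if k not in actual)
    unexpected = sorted(k for k in actual if k not in expected)
    mismatched = sorted(k for k in expected
                        if k in actual and tuple(actual[k]) != expected[k])
    return missing, unexpected, mismatched


shapes = expected_shapes(SMOLLM2_360M_CFG)
total_params = sum(prod(s) for s in shapes.values())
print(len(shapes), total_params)
```

With the assumed values the total comes to 361,821,120 parameters, which is consistent with the "360M" name. The actual shapes to compare against can be read from the checkpoint, for example by collecting tuple(t.shape) for each tensor returned by the safetensors loader.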

How to Contribute:

  1. Fork the repository containing the SmolLM2_360M_model.py file: https://huggingface.co/MartialTerran/SmolLM2_360M_model.py
  2. Debug the code and implement the missing functionality.
  3. Submit a pull request with your changes.

By working together, we can make SmolLM2 more accessible and empower a wider range of users to leverage its capabilities. Thank you for your time and expertise!

P.S. Here's a technical breakdown of the process for creating a TorchScript version of the model and deploying it to various platforms:

1. TorchScript Creation:

  • Trace or Script: TorchScript offers two ways to convert your PyTorch model: tracing and scripting. Tracing records the operations performed on example inputs and creates a static graph, so any data-dependent branches or loops are fixed to the path taken for those inputs. Scripting compiles the model code directly and preserves control flow, so it is preferred if your model uses dynamic control flow.
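import torch

model.eval()  # switch off training-only behavior (e.g. dropout) before export

# Tracing Example
# A language model takes integer token IDs of shape (batch, seq_len), not an image tensor.
example_input = torch.randint(0, model.vocab_size, (1, 16))  # dummy token IDs
traced_model = torch.jit.trace(model, example_input)

# Scripting Example
scripted_model = torch.jit.script(model)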
  • Optimization (Optional): TorchScript provides optimization passes to improve the performance of the exported model.
# The module must be in eval mode, because optimize_for_inference freezes it first.
optimized_model = torch.jit.optimize_for_inference(scripted_model.eval())
  • Saving: Save the TorchScript model to a file.
torch.jit.save(optimized_model, "smolLM2_360m.pt")

2. Deployment to Target Environments:

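  • C++: LibTorch, the C++ API for PyTorch, can load and execute TorchScript models. Integrate LibTorch into your C++ application by compiling your code and linking against the LibTorch libraries. Note that LibTorch expects a full operating system and far more memory than a bare-metal microcontroller offers, so it suits Linux-class edge devices (including RISC-V boards, where a build is available) rather than microcontrollers.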

  • Android/iOS: Use the respective PyTorch Mobile libraries for these platforms. These libraries offer optimized runtime environments for executing TorchScript models within mobile applications.

  • Other Edge Devices: Depending on the device and its capabilities, explore options like using a custom runtime, or if available, a cross-compilation toolchain to target the device from your development environment.

Example C++ Deployment (Simplified):

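#include <torch/script.h>

#include <iostream>
#include <vector>

int main() {
  // Load the TorchScript model; torch::jit::load throws c10::Error on failure
  torch::jit::Module module;
  try {
    module = torch::jit::load("smolLM2_360m.pt");
  } catch (const c10::Error& e) {
    std::cerr << "Failed to load model: " << e.what() << "\n";
    return 1;
  }

  // Prepare input tensor: token IDs of shape [batch, seq_len].
  // Zeros are placeholders; real IDs come from the tokenizer.
  torch::Tensor input_tensor = torch::zeros({1, 4}, torch::kLong);

  // Run inference
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(input_tensor);  // Add input tensor(s)
  auto output = module.forward(inputs);

  // Process output
  // ... (Handle output tensor on the device) ...

  return 0;
}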

Key Considerations:

  • Hardware Limitations: Microcontrollers and other edge devices have limited resources. Model size and complexity may need adjustments (quantization, pruning) for optimal performance.

  • Platform-Specific Tooling: Each target platform has its own build system and toolchain. Familiarize yourself with these tools for successful deployment.

  • Cross-Compilation: If building directly on the target device isn't feasible, cross-compilation is necessary. This typically involves setting up a cross-compilation toolchain for the target architecture.

  • Debugging: Debugging on edge devices can be challenging. Thoroughly testing the TorchScript model within a more accessible environment (e.g., your development machine) before deploying is essential.
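
To put the hardware limitation in numbers, the storage needed for the weights alone can be estimated from the parameter count and the bits used per weight. The sketch below assumes a parameter count of 361,821,120 (my estimate from the model's config, not a figure stated in this post) and ignores activations, the KV cache, and runtime overhead. The function name is hypothetical.

```python
def weight_footprint_bytes(n_params, bits_per_weight):
    """Bytes needed to store n_params weights, rounded up to whole bytes."""
    return (n_params * bits_per_weight + 7) // 8


N_PARAMS = 361_821_120  # assumed parameter count for SmolLM2-360M

for name, bits in [("float32", 32), ("float16", 16), ("int8", 8), ("int4", 4)]:
    mib = weight_footprint_bytes(N_PARAMS, bits) / 2**20
    print(f"{name:>8}: {mib:8.1f} MiB")
```

Even at 4 bits per weight the weights occupy well over 100 MiB, which is far beyond the RAM of typical microcontrollers and is the reason quantization alone may not be enough for the smallest targets.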

Consult the official PyTorch and LibTorch documentation for platform-specific instructions and best practices.
