Nicolay Rusnachenko

nicolay-r

AI & ML interests

Information Retrieval・Medical Multimodal NLP (🖼+📝) Research Fellow @BU_Research・software developer http://arekit.io・PhD in NLP

Organizations

None yet

Posts 31

Post
📢 If you're aiming to process spreadsheet data with the LLM Chain-of-Thought technique, then this update might be valuable for you 💎

The updated 📦 bulk-chain-0.24.2, which is aimed at iterative processing of CSV/JSONL data with no strings attached to third-party LLM frameworks, is out 🎉

📦: https://pypi.org/project/bulk-chain/0.24.2/
🌟: https://github.com/nicolay-r/bulk-chain
📘: https://github.com/nicolay-r/bulk-chain/issues/26

The key feature of bulk-chain is SQLite caching, which saves your time ⏰ and money 💵 by guaranteeing no data loss when using remote LLM providers such as OpenAI, ReplicateIO, OpenRouter, etc.
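
To illustrate the caching idea in general terms, here is a minimal sketch using only Python's standard sqlite3 module. It is not bulk-chain's actual API: the function `cached_call`, the table layout, and the stand-in `fake_llm` are all hypothetical names chosen for this example.

```python
import sqlite3

def cached_call(conn, prompt, ask_llm):
    """Return the stored answer for `prompt`; call `ask_llm` only on a cache miss."""
    # Hypothetical schema: one row per prompt.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache (prompt TEXT PRIMARY KEY, answer TEXT)"
    )
    row = conn.execute(
        "SELECT answer FROM cache WHERE prompt = ?", (prompt,)
    ).fetchone()
    if row is not None:
        return row[0]  # cache hit: no remote request is made
    answer = ask_llm(prompt)
    conn.execute(
        "INSERT INTO cache (prompt, answer) VALUES (?, ?)", (prompt, answer)
    )
    conn.commit()  # persist immediately so an interrupted run keeps this answer
    return answer

# Stand-in for a remote provider (assumption: any callable str -> str works).
calls = []
def fake_llm(prompt):
    calls.append(prompt)
    return prompt.upper()

conn = sqlite3.connect(":memory:")  # use a file path to survive restarts
first = cached_call(conn, "hello", fake_llm)
second = cached_call(conn, "hello", fake_llm)
```

With a file-backed database instead of `:memory:`, rerunning the same script after a crash would skip every prompt that was already answered.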

🔧 This release has the following updates:
✅ Now I am using a separate tiny iterator package, source-iter.
✅ You can manually set the number of attempts to continue after a lost connection.
✅ Other minor improvements.
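
A configurable number of attempts after a lost connection can be sketched as a small retry wrapper. This is a generic illustration, not the package's own implementation: `call_with_retries`, its parameters, and the `flaky` test function are hypothetical.

```python
import time

def call_with_retries(fn, attempts=3, delay=0.0):
    """Call `fn` up to `attempts` times, retrying only on ConnectionError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError as error:
            last_error = error
            if attempt < attempts - 1:
                time.sleep(delay)  # pause before the next attempt
    raise last_error  # every attempt failed

# A function that fails twice before succeeding, to simulate a dropped connection.
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("connection lost")
    return "ok"

result = call_with_retries(flaky, attempts=5)
```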

Quick start on GoogleColab:
📙: https://colab.research.google.com/github/nicolay-r/bulk-chain/blob/master/bulk_chain_tutorial.ipynb

#reasoning #bulk #sqlite3 #chainofthought #cot #nlp #pipeline #nostrings #processing #data #dynamic #llm
Post
📢 If you were previously interested in a quick translator application for a bunch of texts with spans of fixed parts that must survive translation unchanged, then this post might be relevant! Delighted to share bulk_translate, a framework for automatic text translation with pre-annotated fixed spans.

📦 https://pypi.org/project/bulk-translate/
🌟 https://github.com/nicolay-r/bulk-translate

🔑 Spans allow you to control the objects in your texts, so that these objects are tolerant to the translator. By default, it provides an implementation for GoogleTranslate.

bulk_translate features:
✅ Native implementation of two translation modes:
- fast-mode: exploits extra chars for grouping text parts into a single batch
- accurate: performs individual translation of each text part.
✅ No strings: you're free to adopt any LM / LLM backend.
Supports googletrans by default.

The initial release of the project supports fixed spans as text parts wrapped in square brackets [] with no inner space characters.
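
The span convention above can be illustrated with a short sketch that translates each text part individually while leaving bracketed spans untouched. It does not use bulk_translate's API: the regular expression, the function `translate_keeping_spans`, and the use of `str.upper` as a stand-in translator are assumptions made for this example.

```python
import re

# A fixed span: square brackets around one or more non-space, non-bracket characters.
SPAN = re.compile(r"(\[[^\s\[\]]+\])")

def translate_keeping_spans(text, translate):
    """Translate every part of `text` except the fixed spans, one part at a time."""
    parts = SPAN.split(text)  # the capturing group keeps the spans in the result
    out = []
    for part in parts:
        if SPAN.fullmatch(part) or not part.strip():
            out.append(part)  # fixed span or blank part: keep as-is
        else:
            out.append(translate(part))
    return "".join(out)

# str.upper stands in for a real translator (assumption: any callable str -> str).
result = translate_keeping_spans("Send [ORDER_42] to [Alice] today", str.upper)
```

A real translator may trim or alter the whitespace around each part, so a production version would likely need to restore the original spacing when joining the parts.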

You can play with your data in CSV here on GoogleColab:
📒 https://colab.research.google.com/github/nicolay-r/bulk-translate/blob/master/bulk_translate_demo.ipynb

This project is based on AREkit 0.25.1 pipelines for deploying LM-based workflows:
https://github.com/nicolay-r/AREkit

datasets

None public yet