/g/ - Technology

Make /g/ Great Again

File: 537ac80383c874e⋯.jpg (63.91 KB,850x627,850:627,1732650311767260.jpg)

 No.14580

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: :(

►News

>(04/17) FB Vision Research Project https://github.com/facebookresearch/perception_models

>(04/17) Granite 3.3 STT https://www.ibm.com/new/announcements/ibm-granite-3-3-speech-recognition-refined-reasoning-rag-loras

>(04/16) IBM Releases Granite 3.3 Models https://huggingface.co/collections/ibm-granite/granite-33-language-models-67f65d0cca24bcbd1d3a08e3

>(04/16) Geo-Guesser Benchmark released: https://geobench.org/

>(04/14) GLM-4-0414 and GLM-Z1 released: https://hf.co/collections/THUDM/glm-4-0414-67f3cbcb34dd9d252707cb2e

>(04/14) Nemotron-H hybrid models released: https://hf.co/collections/nvidia/nemotron-h-67fd3d7ca332cdf1eb5a24bb

►News Archive: https://rentry.org/lmg-news-archive

►Glossary: https://rentry.org/lmg-glossary

►Links: https://rentry.org/LocalModelsLinks

►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started

https://rentry.org/lmg-lazy-getting-started-guide

https://rentry.org/lmg-build-guides

https://rentry.org/IsolatedLinuxWebService

https://rentry.org/tldrhowtoquant

►Further Learning

https://rentry.org/machine-learning-roadmap

https://rentry.org/llm-training

https://rentry.org/LocalModelsPapers

https://github.com/loganthorneloe/ml-roadmap

►Benchmarks

LiveBench: https://livebench.ai

Programming: https://livecodebench.github.io/leaderboard.html

Code Editing: https://aider.chat/docs/leaderboards

Context Length: https://github.com/hsiehjackson/RULER

Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard

Censorbench: https://codeberg.org/jts2323/censorbench

GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools

Alpha Calculator: https://desmos.com/calculator/ffngla98yc

GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator

Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines

https://github.com/lmg-anon/mikupad

https://github.com/oobabooga/text-generation-webui

https://github.com/LostRuins/koboldcpp

https://github.com/ggerganov/llama.cpp

https://github.com/theroyallab/tabbyAPI

https://github.com/vllm-project/vllm


 No.14581

File: 33be63ed64011c1⋯.png (1.65 MB,768x1344,4:7,ClipboardImage.png)

Very cute Miku.

Also, a vtuber TTS/STT inference engine:

https://github.com/fagenorn/handcrafted-persona-engine


 No.14583

File: 041634e4ecb6013⋯.png (2.99 MB,1024x1536,2:3,1743569780802864.png)

>>14581

I saw that but haven't used it yet. Have you tried it out? It seems like it could be pretty sweet


 No.14629

File: c18c16aa6fff49f⋯.jpg (180.35 KB,872x1272,109:159,1738629256853919.jpg)


 No.14631

>>14580

Oh hell yeah, finally off sharty.

Anyway, it looks like despite the disappointing release people kept working on Sesame - supposedly this implementation reaches realtime when using llama3-1b as the LLM for textgen.

Seems like there's no real reason you couldn't drop an actually decent model in there for textgen too, so long as it used the llama3 instruct format.

https://github.com/davidbrowne17/csm-streaming
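For anyone wanting to swap a different model in as the textgen backend, here is a minimal sketch of the llama3 instruct prompt shape it would need to speak - just the standard Llama-3 special tokens, not csm-streaming's actual API:

# Minimal sketch of the Llama-3 instruct prompt format (standard special tokens only;
# this is not csm-streaming's API, just the template a drop-in textgen model expects).
def build_llama3_prompt(system: str, user: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(build_llama3_prompt("You are a helpful voice companion.", "Say hi."))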


 No.14632

What's currently the best model for erp?

I tried the R1 distilled Qwen-32B a couple months ago and it was the most vanilla not-even-softcore bs imaginable.


 No.14633

>>14632

There isn't really wide agreement on what the best ERP model is anywhere in the 12B to 658B range.

In the ~32B weight class I actually dropped down to the Mistral Small 22/24B derivatives like Cydonia. Some people can bear the speed of QwQ, and the quality is fairly good, but I couldn't handle waiting for 500-1000 tokens of reasoning before the response - you might be able to.


 No.14635

>>14633

The best model I ever used for it was some uncensored llama-70B (1st gen) model two years ago. Every newer model has gotten worse and worse. Yes, it was slow, since it didn't fully fit on GPU.


 No.14640

>>14635

The llama 3.3 70B finetunes are pretty decent; they're currently in my regular rotation of models to keep things fresh, which is:

Anubis 70b

Nevoria 70b

EVA Llama

EVA Qwen 72b

Mistral Large 123b

I didn't have the hardware for it at the time; I might give Llama1-70b a spin just to see how it goes.


 No.14641

>>14640

It was an uncensored finetune, not the vanilla model. Thanks for the suggestions.


 No.14654

File: 58dbf550fa77877⋯.png (711.96 KB,1281x1395,427:465,58d_1019466204.png)

After trying all the new models and enduring the shit shower of slop and tasteless retarded prose, I decided to give Mixtral a try. I used to dismiss it as an "old" model, but the apparent Nemo supremacy pushed me to give it a chance. And what do you know, it's the best model I've ever tried. Better than all the llamas at high quant, better than largestral (although I only tested Behemoth, which was sloppy and slow). It writes like I would, takes cues and is generally pretty smart; it doesn't make the mistakes I got used to from modern models and it has literally zero slop. And we're talking about the base model here. It's everything I ever dared to hope Qwen3 would be, considering how bad QwQ was for such purposes. I can run Mixtral at Q8 with offloading and get ~7 t/s.

I used this template https://github.com/inflatebot/SillyTavern-Mistral-Templates/blob/main/Instruct/Mistral%20V1-Instruct.json

Is it because of some modern alignment, or synthetic data, or filtering? I think mistral small claimed to do none of those and yet it's highly slopped.

tl;dr mixtral will save local
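If anyone wants to try the same Q8-with-offloading setup, here is a rough sketch using llama-cpp-python; the filename and layer count are placeholders to tune for your own GGUF and VRAM, not anything from the post above:

# Rough sketch: Mixtral GGUF at Q8_0 with partial GPU offload via llama-cpp-python.
# model_path and n_gpu_layers are placeholders - adjust to your file and your VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct-v0.1.Q8_0.gguf",  # hypothetical filename
    n_gpu_layers=12,   # offload as many layers as fit on the GPU; the rest run on CPU
    n_ctx=8192,
)

out = llm("[INST] Write a short tavern scene. [/INST]", max_tokens=256)
print(out["choices"][0]["text"])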


 No.14657

>>14654

>mistral small claimed to do none of those and yet it's highly slopped.

Which mistral small? Because the newer 24B has severe problems (that are only apparent in multiturn use, like RP), but the older 22B was more or less fine.


 No.14658

>>14657

There appear to be at least three; could you link the good one?


 No.14659

>>14658

https://huggingface.co/mistralai/Mistral-Small-Instruct-2409

This one, granted I pretty much always used finetunes. And you're definitely going to prefer Mixtral, since each of its 8 experts is the same size as Small 2409.


 No.14660

>>14658

Also, meant to add earlier.

You're using a prompt template that should already be included in SillyTavern. And you're using the wrong one: Mixtral uses MistralV2, not V1.
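For reference, the bare Mixtral instruct wrapper looks like the sketch below; as far as I know, the V1/V2 presets in SillyTavern mainly differ in how whitespace and the BOS token are handled around it, so treat this as the general shape rather than a replacement for the correct preset:

# Sketch of the basic Mistral/Mixtral [INST] wrapper; whitespace and BOS handling
# are what the SillyTavern V1/V2 presets actually encode - this only shows the shape.
def mistral_prompt(user_message: str) -> str:
    return "<s>[INST] " + user_message + " [/INST]"

print(mistral_prompt("Describe the tavern scene."))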


 No.14661

>>14660

Holy shit, I just realized Mixtral 8x22 is a thing; all my experience came from running 8x7. I can only fit the larger one at Q4 - worth it?


 No.14664

>>14661

You're running on PC? That's 88 GB!
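Back-of-the-envelope arithmetic for that figure, assuming roughly 141B total parameters for 8x22 and about 5 bits per weight for a Q4_K-class quant (both rounded assumptions):

# Rough size estimate for Mixtral 8x22 at a ~Q4 quant (rounded assumptions, weights only).
total_params = 141e9        # approx. total parameters across all experts
bits_per_weight = 5.0       # Q4_K-class quants land around 4.5-5 bits per weight
bytes_total = total_params * bits_per_weight / 8
print(f"~{bytes_total / 1e9:.0f} GB for the weights alone")  # ~88 GB, before KV cache and overhead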


 No.14665

>>14664

I mean CPU


 No.14666

>>14664

>88GB

It's also a MoE. I get 3.3 t/s on the larger Mixtral at Q4, and so far it's worse than 8x7 at Q8. It even referred to me (anon) as "ant" once. Fucking ant. It turns out that quantization isn't just free memory savings, who knew. Or 8x22 is just much worse for RP.


 No.14670

>>14580

man where is 4chan


 No.14672

File: a9220da907bb943⋯.png (820.89 KB,848x1024,53:64,1729966418724293.png)


 No.14675

>Looking Beyond The Next Token

>https://arxiv.org/abs/2504.11336

> The structure of causal language model training assumes that each token can be accurately predicted from the previous context. This contrasts with humans' natural writing and reasoning process, where goals are typically known before the exact argument or phrasings. While this mismatch has been well studied in the literature, the working assumption has been that architectural changes are needed to address this mismatch. We argue that rearranging and processing the training data sequences can allow models to more accurately imitate the true data-generating process, and does not require any other changes to the architecture or training infrastructure. We demonstrate that this technique, Trelawney, and the inference algorithms derived from it allow us to improve performance on several key benchmarks that span planning, algorithmic reasoning, and story generation tasks. Finally, our method naturally enables the generation of long-term goals at no additional cost. We investigate how using the model's goal-generation capability can further improve planning and reasoning. Additionally, we believe Trelawney could potentially open doors to new capabilities beyond the current language modeling paradigm.


 No.14677

>>14661

Oh lol, it didn't even occur to me that you were talking about 8x7, my bad.

Sorry it didn't work out for you anon.


 No.14680

>>14675

>Concretely, we augment the training corpus by interleaving it with special lookahead tokens — <T> and </T> — that encapsulate future information

>By first predicting tokens in the future, the model is encouraged to learn the tokens pertaining to what it will generate in the future (i.e., F’G’), and the path leading to the future (i.e., CDE) as well as the actual future (i.e., FG) will be easier to predict

We reasoning in latent space now, fuck all yall fakeass printed reasoning token niggas.
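A toy sketch of the data rearrangement the excerpt describes: splice a copy of a future span back into the sequence, wrapped in the lookahead tokens, so the model sees the goal before the path to it. This only illustrates the idea and is not the paper's code; the <T>/</T> names come from the excerpt, everything else is invented:

# Toy illustration of Trelawney-style lookahead augmentation (not the paper's actual code).
# A copy of a future span is inserted, wrapped in <T>...</T>, ahead of the tokens leading to it.
def add_lookahead(tokens, insert_at, future_start, future_len):
    future = tokens[future_start:future_start + future_len]
    return tokens[:insert_at] + ["<T>"] + future + ["</T>"] + tokens[insert_at:]

seq = ["A", "B", "C", "D", "E", "F", "G"]
print(add_lookahead(seq, insert_at=2, future_start=5, future_len=2))
# ['A', 'B', '<T>', 'F', 'G', '</T>', 'C', 'D', 'E', 'F', 'G']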


 No.14688

So where do /lmg/, /sdg/ and /ldg/ refugees congregate? 8chan seems like a much slower pace.


 No.14697

>>14688

probably discords


 No.14699

>>14688

There are two other threads, on Sharty and 8chan.moe:

https://8chan.moe/ais/res/6258.html

https://www.soyjak.st/tech/thread/4858.html

There have been attempts on other sites too, but I think they're just one guy.

Apparently we're all such contrarian assholes that we can't agree on a bunker even after 5 days.


 No.14729

>>14688

i miss bullying d*b*


 No.14769



