No.14580
No.14581
Very cute Miku.
Also, a vtuber TTS/STT inference engine:
https://github.com/fagenorn/handcrafted-persona-engine
No.14583
>>14581
I saw that but haven't used it yet. Have you tried it out? It seems like it could be pretty sweet.
No.14631
>>14580
Oh hell yeah, finally off sharty.
Anyway, it looks like people kept working on Sesame despite the disappointing release - supposedly this implementation reaches realtime when using llama3-1b as the LLM for textgen.
Seems like there's no real reason you couldn't drop an actually decent model in there for textgen instead, so long as it uses the llama3 instruct format.
https://github.com/davidbrowne17/csm-streaming
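Not from the repo, just a sketch of the idea, assuming the project loads its textgen model through Hugging Face transformers (I haven't checked its actual loader, so the model name and wiring here are placeholders): any finetune that speaks the llama3 instruct format should drop in via apply_chat_template.
[code]
# Hypothetical sketch: swapping the textgen LLM behind a CSM-style streaming pipeline.
# Assumes a plain Hugging Face transformers loader; csm-streaming's real code may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # example; any llama3-instruct-format finetune

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", torch_dtype="auto")

messages = [
    {"role": "system", "content": "You are a voice assistant. Keep replies short."},
    {"role": "user", "content": "Hey, what's up?"},
]
# apply_chat_template emits the llama3 instruct wrapping (<|start_header_id|> etc.),
# so the speech side keeps seeing the prompt format it expects.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
[/code]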
No.14632
What's currently the best model for erp?
I tried the R1 distilled Qwen-32B a couple months ago and it was the most vanilla not-even-softcore bs imaginable.
No.14633
>>14632
There's not really wide agreement on what the best ERP model is anywhere in the 12B-658B range.
In the 32B-ish weight class I actually dropped down to the Mistral Small 22/24B derivatives like Cydonia. Some people can bear the speeds of QwQ, but despite the fairly good quality I couldn't handle waiting for 500-1000 tokens of reasoning before the response - you might be able to.
No.14635
>>14633
Best model I ever used for it was some uncensored llama-70B (1st gen) model 2 years ago. Every newer model got worse and worse. Yes it was slow since it didn't fully fit on GPU.
No.14640
>>14635
The llama 3.3 70B finetunes are pretty decent; they're currently in my regular rotation of models to keep things fresh, which is:
Anubis 70b
Nevoria 70b
EVA Llama
EVA Qwen 72b
Mistral Large 123b
I didn't have the hardware for it at the time; I might give Llama1-70b a spin just to see how it goes.
No.14641
>>14640
It was an uncensored finetune, not the vanilla. Thanks for the suggestions.
No.14654
After trying all the new models and enduring the shit shower of slop and tasteless retarded prose, I decided to give Mixtral a try. I used to dismiss it as an "old" model, but the apparent Nemo supremacy pushed me to give it a chance. And what do you know, it's the best model I've ever tried. Better than all the llamas at high quant, better than Largestral (although I only tested Behemoth, which was sloppy and slow). It writes like I would, picks up on cues and is generally pretty smart; it doesn't make the mistakes I got used to from modern models and it has literally zero slop. And we're talking about the base model here. It's everything I ever dared to hope Qwen3 would be, considering how bad QwQ was for such purposes. I can run Mixtral at Q8 with offloading and get ~7 tps.
I used this template https://github.com/inflatebot/SillyTavern-Mistral-Templates/blob/main/Instruct/Mistral%20V1-Instruct.json
Is it because of some modern alignment, or synthetic data, or filtering? I think Mistral Small claimed to do none of those and yet it's highly slopped.
tl;dr mixtral will save local
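For reference, the kind of setup described above (Mixtral 8x7B at Q8 with partial offload) looks roughly like this with llama-cpp-python; the filename, layer count, and context size are placeholders to tune for your hardware, and the [INST] wrapping follows the Mistral instruct template linked above.
[code]
# Sketch only: Mixtral 8x7B Q8_0 with partial GPU offload via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct-v0.1.Q8_0.gguf",  # example filename
    n_gpu_layers=16,  # offload whatever fits in VRAM; the rest stays in system RAM
    n_ctx=8192,
)

# Mistral-style instruct wrapping, matching the template linked above.
prompt = "[INST] Write a short scene set in a rainy harbor town. [/INST]"
out = llm(prompt, max_tokens=300, temperature=0.8)
print(out["choices"][0]["text"])
[/code]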
No.14657
>>14654
>Mistral Small claimed to do none of those and yet it's highly slopped.
Which Mistral Small? Because the newer 24B has severe problems (that only show up in multi-turn use, like RP), but the older 22B was more or less fine.
No.14658
>>14657
There appear to be at least 3; could you link the good one?
No.14659
>>14658
https://huggingface.co/mistralai/Mistral-Small-Instruct-2409
This one. Granted, I pretty much always used finetunes. And you're definitely going to prefer Mixtral, since each of its 8 experts is the same size as Small 2409.
No.14660
>>14658
Also, meant to add earlier.
You're using a prompt template that should already be included in SillyTavern. And you're using the wrong one: Mixtral uses MistralV2, not V1.
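If anyone wants to double-check which wrapping a given Mistral/Mixtral release actually expects, one way (assuming you have access to the gated mistralai repos) is to render the chat template the model itself ships with and compare it against the SillyTavern preset:
[code]
# Print the exact [INST]/[/INST] wrapping and whitespace the model was trained with.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
messages = [
    {"role": "user", "content": "first turn"},
    {"role": "assistant", "content": "reply"},
    {"role": "user", "content": "second turn"},
]
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
[/code]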
No.14661
>>14660
Holy shit, I just realized Mixtral 8x22 is a thing; all my experience came from running 8x7. I can only fit the larger one in Q4 - worth it?
No.14664
>>14661
You running on PC? That's 88 GB!
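Back-of-envelope check on that figure (Mixtral 8x22B is ~141B total parameters, since MoE counts every expert even though only ~39B are active per token; the bits-per-weight below is an assumed value for a 4-bit K-quant, not an exact GGUF number):
[code]
total_params = 141e9
bits_per_weight = 4.8  # rough effective rate for a Q4_K_M-style quant (assumption)
weights_gb = total_params * bits_per_weight / 8 / 1e9
print(f"~{weights_gb:.0f} GB for the weights alone")  # ~85 GB, before KV cache/context
[/code]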
No.14666
>>14664
>88GB
It's also a MoE. 3.3 tps on the Q4 larger Mixtral, and so far it's worse than 8x7 at Q8. It even referred to me (anon) as an ant once. Fucking ant. It turns out that quantization isn't just free memory savings, who knew. Or 8x22 is just much worse for RP.
No.14670
>>14580
man where is 4chan
No.14675
>Looking Beyond The Next Token
>https://arxiv.org/abs/2504.11336
> The structure of causal language model training assumes that each token can be accurately predicted from the previous context. This contrasts with humans' natural writing and reasoning process, where goals are typically known before the exact argument or phrasings. While this mismatch has been well studied in the literature, the working assumption has been that architectural changes are needed to address this mismatch. We argue that rearranging and processing the training data sequences can allow models to more accurately imitate the true data-generating process, and does not require any other changes to the architecture or training infrastructure. We demonstrate that this technique, Trelawney, and the inference algorithms derived from it allow us to improve performance on several key benchmarks that span planning, algorithmic reasoning, and story generation tasks. Finally, our method naturally enables the generation of long-term goals at no additional cost. We investigate how using the model's goal-generation capability can further improve planning and reasoning. Additionally, we believe Trelawney could potentially open doors to new capabilities beyond the current language modeling paradigm.
No.14677
>>14661
Oh lol, it didn't even occur to me that you were talking about 8x7, my bad.
Sorry it didn't work out for you anon.
No.14680
>>14675
>Concretely, we augment the training corpus by interleaving it with special lookahead tokens — <T> and </T> — that encapsulate future information
>By first predicting tokens in the future, the model is encouraged to learn the tokens pertaining to what it will generate in the future (i.e., F’G’), and the path leading to the future (i.e., CDE) as well as the actual future (i.e., FG) will be easier to predict
We reasoning in latent space now, fuck all yall fakeass printed reasoning token niggas.
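Rough illustration of the data rearrangement the paper describes (not their exact algorithm; the offsets, span lengths, and marker strings here are made up): splice a copy of a future span back into the sequence, wrapped in lookahead markers, so the model sees the goal before the path leading to it.
[code]
def add_lookahead(tokens, insert_at, future_start, future_len, open_tok="<T>", close_tok="</T>"):
    """Insert a copy of a future span early in the sequence, wrapped in lookahead markers."""
    future_span = tokens[future_start:future_start + future_len]
    return (
        tokens[:insert_at]
        + [open_tok] + future_span + [close_tok]  # announce the future early...
        + tokens[insert_at:]                      # ...then keep the original order
    )

seq = ["A", "B", "C", "D", "E", "F", "G"]
# Reveal F G right after B, mirroring the CDE -> FG example in the quoted excerpt.
print(add_lookahead(seq, insert_at=2, future_start=5, future_len=2))
# ['A', 'B', '<T>', 'F', 'G', '</T>', 'C', 'D', 'E', 'F', 'G']
[/code]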
No.14688
So where do the lmg/sdg/ldg refugees congregate now? 8chan seems much slower-paced.
No.14699
>>14688
There are two other threads, on Sharty and 8chan.moe:
https://8chan.moe/ais/res/6258.html
https://www.soyjak.st/tech/thread/4858.html
There have been attempts on other sites too, but I think they're just one guy.
Apparently we're all such contrarian assholes that we can't agree on a bunker even after 5 days.
No.14729
>>14688
i miss bullying d*b*