QwQ Hamanasu Train
Before I start, thank you so much to Ruka for funding all of this <3
After 5 days of training and millions of crashed runs, I'm pretty proud to announce some of the first(?) proper QwQ finetunes. It's 4 finetunes, actually. I've started to adopt a multi-step process for finetuning my models; here's the gist of it:
Pretrain
For 25 hours on 4xH100s, a text-completion/pretrain run was done on QwQ with ~1B tokens worth of stories and books. This version still retained its thinking and conversational skills as well as being less censored. This became Hamanasu-V1-QwQ.
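If you're curious what that stage looks like mechanically, here's a rough sketch of a text-completion run with the plain Hugging Face Trainer. This isn't necessarily my exact stack or config; the dataset path, packing length, and hyperparameters below are placeholders.

```python
# Rough sketch of a text-completion/continued-pretrain run with the plain
# Hugging Face Trainer. Paths, packing length and hyperparameters are
# placeholders, not the real config.
from itertools import chain

import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    default_data_collator,
)

MODEL = "Qwen/QwQ-32B"
DATA = "stories_and_books.jsonl"   # placeholder: one {"text": ...} object per line
BLOCK = 8192                       # placeholder packing length

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)

raw = load_dataset("json", data_files=DATA, split="train")

def tokenize(batch):
    return tokenizer(batch["text"])

def group(batch):
    # Concatenate everything and slice it into fixed-length blocks; for pure
    # completion training the labels are just the input ids.
    ids = list(chain.from_iterable(batch["input_ids"]))
    total = (len(ids) // BLOCK) * BLOCK
    chunks = [ids[i : i + BLOCK] for i in range(0, total, BLOCK)]
    return {
        "input_ids": chunks,
        "attention_mask": [[1] * BLOCK for _ in chunks],
        "labels": [list(c) for c in chunks],
    }

packed = (
    raw.map(tokenize, batched=True, remove_columns=raw.column_names)
       .map(group, batched=True)
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="hamanasu-v1-qwq",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=1e-5,        # placeholder LR
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
        report_to="wandb",
    ),
    train_dataset=packed,
    data_collator=default_data_collator,
)
trainer.train()
```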
Instruct
Switching to 8xH100s, I patched the damage done by the pretraining by training the model on instruct data. 5 hours later, the instruct tune was done. It was still uncensored and it now had more coherency in long context. The prose was drier and the responses were terse, and then something strange happened. Despite me doing a full-parameter finetune, even at a low LR, even with reasoning data in the darn train, it just... stopped thinking??? For no reason. I tried everything and nope, it just refused to think. Then I started to realize, "Wait, this is kinda neat... I like it when it doesn't waste 1000 tokens looping," and thus I soldiered on.
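For context on what instruct data looks like going into a trainer, here's a tiny sketch of the usual preprocessing: render each conversation with the tokenizer's chat template and mask the prompt tokens out of the loss so only the assistant's reply gets trained on. The sample conversation and the exact masking logic are illustrative, not pulled from my dataset.

```python
# Tiny sketch of instruct-data preprocessing with a chat template. The sample
# turns below are made up for illustration.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/QwQ-32B")

conversation = [
    {"role": "user", "content": "Summarize Dracula in two sentences."},
    {"role": "assistant", "content": "Jonathan Harker travels to Transylvania..."},
]

# Full rendered text (user turn + assistant reply) for one training example.
full_text = tokenizer.apply_chat_template(conversation, tokenize=False)

# Render only the non-assistant part to figure out how many leading tokens
# to mask. (This simple version still trains on the assistant header tokens.)
prompt_text = tokenizer.apply_chat_template(conversation[:-1], tokenize=False)

input_ids = tokenizer(full_text, add_special_tokens=False)["input_ids"]
prompt_len = len(tokenizer(prompt_text, add_special_tokens=False)["input_ids"])

# -100 tells the cross-entropy loss to ignore those positions, so the model
# only learns to produce the assistant's reply.
labels = [-100] * prompt_len + input_ids[prompt_len:]
```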
RP (Brain rot train)
Over the past few months I'd been collecting as much... conversational human data as I could find. I'd always dreamed of having an AI chatbot that, well... sounded human? One that texts like a regular person rather than talking all extremely fancy and being boring. I got about 10M tokens worth of it, and after another long, deadly 25 hours on 8xH100s, the beauty was born. The model talks like a guy who lost his meds, going from a humorous fellow to a raging lunatic. It's... perfect.
Magnum
Now that my personal dream was fulfilled, I thought, "Why not go back to my roots and do a Magnum train? It's been a while," and slapped my GPUs till they started working. 8 hours pass. I look at my Wandb logs... "Why in the name of god is there a massive drop in loss around the 2nd epoch?" I started to look into it andddd I found this: https://www.fast.ai/posts/2023-09-04-learning-jumps. Pretty much, I'd majorly overfitted the model, and the run was almost done. I couldn't cancel it, so I just sat back and tested it. Seemingly it was coherent, writing well even... Untillllllllllll - yep, it's looping the context back at me. Nice job. Back to the drawing board: I used an even lower LR of 6e-6, and finally the model seems almost perfect.
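If you want to catch that kind of "learning jump" earlier than 8 hours in, the check is simple: compare the smoothed training loss just before and just after each epoch boundary; a big sudden drop means the model has started memorizing the data. Here's a toy sketch with a made-up loss curve (the numbers are not from my run):

```python
# Toy check for "learning jumps": a sharp loss drop right at an epoch boundary
# usually means the model is memorizing the training set. The loss curve and
# STEPS_PER_EPOCH below are made up, not numbers from the actual run.
STEPS_PER_EPOCH = 100

# Fake smoothed training losses: a gradual descent in epoch 1, then a sudden
# jump down at the start of epoch 2.
losses = [2.10 - 0.002 * i for i in range(STEPS_PER_EPOCH)]
losses += [1.40 - 0.001 * i for i in range(STEPS_PER_EPOCH)]

def epoch_boundary_drops(losses, steps_per_epoch, window=5):
    """Average loss just before vs. just after each epoch boundary."""
    drops = []
    for b in range(steps_per_epoch, len(losses), steps_per_epoch):
        before = sum(losses[b - window:b]) / window
        after = sum(losses[b:b + window]) / window
        drops.append(before - after)
    return drops

print(epoch_boundary_drops(losses, STEPS_PER_EPOCH))
# A large positive drop (here roughly 0.5) is the red flag; a healthy run only
# changes gradually across the boundary.
```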
TL;DR: A lot of GPU time, I somehow stopped a model from thinking, and 2 models to really show for it: Magnum-QwQ and Hamanasu-RP.
So for everyone who asked when V4 Magnum dropped, "Where's a Qwen2.5 32B version?" - here you go: a 32B-sized Qwen-based finetune with Magnum, and its really really really funny and dumb chat counterpart.
I've open-sourced all of my work. You can find all of the model weights here: https://huggingface.co/collections/Delta-Vector/hamanasu-67aa9660d18ac8ba6c14fffa