BigPicaro Ideas-V1
Low Rank, High Alpha
I'll say this is rooted in a message from someone in Anthracite about something they said they "believed" in. Maybe it's foolish, but it boils down to this:
"Actually, i am still a firm believer that LORAs with a low rank and high alphas are still the best way"
Now how the hell do we apply that to BigPicaro? It's shrimple. Right now the issue is that the model isn't "fit" enough on the data itself; instead it falls back on its previous Instruct capabilities (which are sloppa), as seen in the BigPicaro 70B attempt: it was literally just Llama 3.3 Instruct.
Now I have to finetune Qwen2, a way dumber model IMO than 3.3. What I propose is flipping the usual formula I use for LoRAs, where the rank sits at or above the alpha: instead, go low rank, high alpha.
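For concreteness, here's a minimal sketch of what the flipped setup could look like as a PEFT-style LoRA config. The exact rank/alpha values and target modules are illustrative assumptions, not the actual BigPicaro settings.

```python
# Hypothetical low-rank / high-alpha LoRA config (illustrative values only).
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,             # low rank: the adapter only has room for the major patterns
    lora_alpha=128,   # high alpha: scales the adapter's contribution up hard
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    bias="none",
    task_type="CAUSAL_LM",
)
```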
How does this make sense? What is alpha doing at such a high intensity that helps the model fit the data better? Higher alpha means a stronger adaptation signal: the LoRA update gets scaled by alpha / rank, so the updates learned from our data have more influence on the final output. It's kinda like turning up the volume on what your model learns from your data. Secondly, a low rank should help here because narrowing the rank forces the adapter to focus on the major patterns in the data rather than memorizing noise, and that, combined with the high alpha, has a bigger impact on how the model behaves.
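To make the "volume" intuition concrete: in the standard LoRA formulation the weight update is delta_W = (alpha / rank) * B @ A, so the alpha-to-rank ratio is literally the gain applied to whatever the adapter learned. A quick comparison of that ratio under an assumed "usual" config versus the flipped one (the specific numbers are just examples):

```python
# Effective LoRA scaling factor (alpha / rank) for a few illustrative configs.
configs = {
    "usual    (r=64,  alpha=32)":  (64, 32),    # assumed typical setup
    "balanced (r=64,  alpha=64)":  (64, 64),
    "flipped  (r=16,  alpha=128)": (16, 128),   # low rank, high alpha
}

for name, (rank, alpha) in configs.items():
    scaling = alpha / rank  # multiplier applied to the B @ A update
    print(f"{name:30s} -> scaling = {scaling:g}x")
```

With the flipped config the adapter's output is multiplied by 8x instead of 0.5x or 1x, which is the "turning up the volume" effect described above.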
To be continued when I can get motivation again. Still need to test the 4-epoch Qwen2 run.