Summarization GRPO idea
Ight so Gosling wants to train a whole new model for summarization? I say nah.
Idea: take long-context RP logs (~16K ctx) from sources like https://huggingface.co/datasets/PocketDoc/Dans-Personamaxx-Logs or smth, have the model summarize each log,
and then use sentence transformers to compare the model's output (the summary) against the OG text.
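A rough sketch of that comparison, assuming the vectors come from a sentence-transformers model (e.g. `SentenceTransformer("all-MiniLM-L6-v2").encode(...)`; that model is just an example). Those models truncate anything past their `max_seq_length` (a few hundred tokens), so a ~16K-token log can't be embedded in one shot. Here the log is split into word chunks (the 150-word size is an arbitrary assumption), the chunk embeddings are averaged, and the reward is the cosine similarity between that average and the summary embedding. All helper names are hypothetical.

```python
import math
from typing import List, Sequence


def chunk_words(text: str, max_words: int = 150) -> List[str]:
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equal-length vectors (one per chunk)."""
    if len(vectors) == 0:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def embedding_reward(summary_vec: Sequence[float], chunk_vecs: Sequence[Sequence[float]]) -> float:
    """Hypothetical reward: similarity between the summary and the averaged log chunks."""
    return cosine_similarity(summary_vec, mean_vector(chunk_vecs))


# Assumed usage with sentence-transformers (not run here):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("all-MiniLM-L6-v2")
# chunk_vecs = model.encode(chunk_words(log_text))
# summary_vec = model.encode(summary_text)
# reward = embedding_reward(summary_vec, chunk_vecs)
```

Averaging chunks is the simplest option; scoring the summary against each chunk and taking the mean or min would punish summaries that skip parts of the log harder.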
Apparently there's a way to do that for extractive vs abstractive summarization models, but my friend hasn't found the link. So smth like embeddings to compare the two texts for now...
Anyway, the verifier in this case is the embedding similarity: if the summary is good and the embedding says so, reward dat hoe; if not, shoot dat hoe.
I'll probably do this later... but I really wanna do it...
For now I'm working on Sol-Revear, my new line of models... and it's going... shittily.