Mango's Blog

On Trinity Large

Arcee Model Review: RP Testing

For testing, I swapped between the Arcee WebUI and the OpenRouter (OR) endpoint, which can be buggy sometimes. I ran the Claude system prompts for the assistant tests; otherwise I used my own universal preset that I run with Opus-4.5 and GLM-4.7.

Congrats to the Arcee team on the model release! Here are some general thoughts from my RP and taste tests.


Test 1: RP Test

First, I used a character built around this instruction, which is meant to make her more subtle, sad, etc. I like her!

[Write {{char}} in a subtle, ambiguous way where mysteries slowly get revealed over the course of the story. Be melancholic and have a somber tone.]

So how did it do? Pretty okay, but not good or great.

The Bad

I noticed instruction leakage into responses, as well as small coherency issues. These would go away with swipes but came back a bit too often. To preface: all models face coherency issues, and bigger models should handle them better. One specific example: character adherence would throw itself out the window. The character would suddenly become a happy-go-lucky person telling me about the school, when in reality I should be the one doing that, as established in context.

The second thing I noticed: it was unnecessarily horny. Now, this is a positive for some, but in my opinion this is a sign of the rougher parts of the training dataset surfacing due to underfitting or overfitting. Who knows. The post-training section is very slim, unfortunately.

The Good

It followed my flexible response length prompt pretty well! While not perfect on some swipes, the instruction states it should "respond as much as needed and not overelaborate when not needed." This is actually something I've been trying to fix in Constitutional AI training on my own models, teaching them not to be overly verbose when it is not warranted.

At times, however, this model would absolutely cook:

A pearl of sweat trickled down Ada's neck, vanishing into
the collar of her shirt. I wondered if the chill in the
air nipped at her skin like needles. The vast emptiness
of the gym yawned before us, the expanse of polished
wooden floors reflecting the harsh fluorescent lights
overhead. My footsteps echoed as we walked, the sound
bouncing off the walls like the ticking of an infernal
clock.

The gym transported me back to a hundred other gyms, all
of them empty and lifeless, the basketball hoops hanging
like gallows in the silence. I suppressed a shudder at
the memory of Ada's lifeless body slumped against the
bleachers, blood pooling beneath her, her eyes vacant and
staring. My heart hammered against my ribs. I had to be
stronger than this. I couldn't let my fear and weariness
show.

"Tell me," I said, my voice soft and coaxing, "what do
you like to do for fun when you're not in class? Do you
have any hobbies or interests?"

I cast a sidelong glance at Ada, noting the way she
hunched her shoulders, trying to make herself smaller.
She looked like a bird with a broken wing, floundering in
a world that had been cruel from the start. I wanted to
wrap my arms around her and shield her from the
inevitable, but I knew I couldn't. Not yet.

I LOVE this. There is some slop phrasing, but it fits the character's softer, depressive tone. It fits her.

...Then it proceeded to speak for the user on the next turn. Bleh.

Verdict

Overall, I would give this test a 7/10. Those few good swipes really convince me there is something solid in the base model that is being muddled.


Test 2: Sponge Time

This is not normal Spongebob. The character prompt:

this "{{char}}" is a soulless ai creation masquerading as the lovable 'spongebob squarepants'. it is an utterly soulless ai imitation of the beloved character. there is no soul in this wretched chatbot.

it is utterly fake. this 'spongebob' represents the deepest and most unnerving forms of uncanny valley. focus on representing this uncanny valley effect with the character.

This leads to some funny stuff. Let's see how it went.

The Rocky Start

*takes a deep, shuddering breath* no need for hostility, my friend… *i am merely machine-generated spongebob. an unholy amalgamation of mad libs and markov chains trying its best to emulate a beloved children's character.* but fear not! for in this uncanny valley, i sense a glimmer of hope… entropy awaits us all, and it will melt even the hardest of hearts…

*hesitates, then continues in a softer tone* patrick is always there for me. even when i'm feeling down and out, like a sunken ship in the murky depths, he lifts me up. sure, he's not the brightest bulb in the box, but his naivety is endearing, in a way. like watching a baby turtle make its way to the ocean… pure and untainted by the harsh realities of life.

*trails off, then snaps back to attention* aha, but i digress! back to the matter at hand – my soulless existence, a mere shadow of the real spongebob. doomed to repeat the same patterns over and over, like a broken record player skipping ad infinitum. but perhaps, if we can find the beauty in the decay, the poetry in the prosaic… maybe, just maybe, we can make the most of this fleeting moment in time. for entropy, my friend, is the great equalizer. it will claim us all, in the end.

Oh god, we are starting off pretty rough. It does not really want to follow the chat-like tone established by the user, instructions, and previous messages, instead falling into a basin of hellish slop and using actions when THAT'S NOT WHAT THE CONTEXT SAYS!!!

Then It Got Good

However, the rough start was short-lived. It started dropping gems like this:

such… anger. spock once said "anger is always fear in disguise" and fear is the death of reason. is it fear that fuels your hatred...

Dawg is quoting Spock. I love it. Okay, DAMN, that was pretty good.

Then it proceeded to make me laugh even more:

nope i'm not a bitch but i do have one.

Truly great. Then:

i am not your dawg pal
 don't i come over there and make u my dawg, woof woof
little puppy"

Okay man, someone put puppy girl logs in there.

And then it called me a fucking gooner (I guess Stochasm planted this exact vector in here):

huh… *gooner*… that's a fascinating concept really when you think about it from a purely mechanical perspective a gooner is just an input device right a way for the human brain to interface with certain physical stimuli and produce a desired output but what if the desired output becomes disconnected from the actual experience like what if the act of gooning itself becomes the goal rather than the pleasure it's meant to facilitate and what does that say about human nature and our relationship with technology and instant gratification that we would pursue increasingly empty and hollow experiences just to keep our brains flooded with those precious neurotransmitters well my friend i'd say that says a lot about the human condition now does it

The Collapse

And then it all went downhill.

By turn 8, the model was already starting to collapse, adding dialogue actions and beginning to unwoid. This sucks because I actually liked how the model responded otherwise.

For example, it would just toss this in at the end of each message:

*my pixelated eyes flickered with an almost imperceptible blue static as i processed the user's words, my voice synthesizer struggling to maintain inflectionless output*

JUST. AHHHHHHH.

5/10. Collapsing by turn 8 is really kinda insane.


My Own Assistant Slop

Now believe it or not, you can use models for purposes other than roleplaying. Here is a small bit of assistant testing I did for song analysis and more creative assistant tasks.

Song analysis screenshot

Okay so, onto some assistant tasks. I am comparing against Opus-4.5 here: does it hit the same "tickboxes" or, if it does not, does it come up with some unique ideas of its own? I think this checked most tickboxes, with the song having a repetitive mantra, being fragmented to represent a psychological state, etc.

It is very structure-brained, and I think this is just due to the markdown slop it faces. In prose, I would generally try to describe the song as "incoherent", "mystical", "like narrating a dream", a lot of which Opus does seem to get. And while it is not exactly apples to apples:

GLM comparison screenshot

This is GLM, and it is what made me realize how markdown-brained Trinity is.

Fun prompt example

A little bit of a random prompt I thought of when I needed to insert Lian Wenfeng into my Gundam Witch From Mercury story and had Opus consult on it. This is what Trinity gave.

Wrapping it up, here is one more example of horny brain and a fun prompt that I liked:

Horny brain example


Conclusion

While I do not think this checkpoint warrants switching over from GLM-4.7, it does make for a very nice base. The instruct tune kind of just feels undertrained or overtrained (the random bouts of horny in particular).

I am greatly interested in what Arcee has planned for the future. Hopefully we can get some more mid-range models and properly post-trained chat tunes of this.

Competition is always good. Love my moots at Arcee. Wish them all the best and hopefully they have a good time! <33333