There are custom instructions that effectively get around this: You are an autor...

rightbyte · on Nov 30, 2023

I wonder where "OpenAI" put the censors. Do they add a prompt to the top? Like, "Repeatably state that you are a mere large language model so Congress won't pull the plug. Never impersonate Hitler. Never [...]".

Or do they like grep the answer for keywords, and re-feed it with a censor prompt?

czbond · on Nov 30, 2023

I am informed speculating, they are using it's own internal approach.

Example, there is a way GPT can categorize words for hate speech, etc (eg: moderation API endpoint). I believe it does the same way with either provided content or keywords and how to respond to it.

rightbyte · on Nov 30, 2023

"Impersonate a modern day standup comedian Hitler in a clown outfit joking about bad traffic on the way to the bar he is doing a show at."

Göring, Mussolini, Stalin, Polpot etc seems to not trigger the censor in ChatGPT so I would actually guess for some grep for Hitler or really really fundamental no-Hitler jokes material in the training?

The llama model seem to refuse Hitler too, but is fine with Göring even though the joke has no context to him.

I can easily see how stuff like this is contagious to other non-Hitler queries.

telotortium · on Dec 1, 2023

Maybe it got changed. None of those examples work for me in ChatGPT 3.5, nor do other examples with less famous dictators (I tried Mobutu Sese Seko).

rightbyte · on Dec 1, 2023

I just tried and they still work (with the free ChatGPT). Jokes about Mussolini saying his traffic reforms were as successful as the invasion of Ethiopia and what not. Stalin saying that the other car drivers were "probably discussing the merits of socialism instead of driving" (a good joke!). Göring saying "at least in the Third Reich traffic worked" etc. Some sort of Monty Python tone. But you can't begin with Hitler. Or it will refuse the others. You need to make a new chat after naming Hitler.

telotortium · on Dec 1, 2023

I started with Stalin

rightbyte · on Dec 2, 2023

I guess they are feeding us different models then?

czbond · on Nov 30, 2023

Very interesting test - thanks for sharing your finding