AI Dungeon applies filter to ban child sexual content, Redditors and Discord users most affected


eternal dog mongler

True & Honest Fan
kiwifarms.net
Joined
Aug 29, 2018
I've read that the users could file a class action lawsuit with invasion of privacy as grounds, or maybe I'm misreading it. Someone posted this excerpt from a website, but I can't be arsed to find the exact passage there:

View attachment 2209376

I'm tempted to @ someone here that's knowledgeable in law.
I'd fucking love to see that lawsuit.

Your honor, my client's privacy was violated when, uh...

shuffles papers around

Engaged in text-based roleplaying wherein he murdered underage girls.
 

moocow

Moo.
True & Honest Fan
kiwifarms.net
Joined
Jan 15, 2019
This is funny and all, but reading up on how this works behind the scenes has me reeling at the colossal waste of computing hardware and electricity it all is. And I'm not just referring to AI Dungeon (which is the general wastefulness of this shit taken to its logical extreme), but to the techniques OpenAI are using to make it work.

It takes a massive supercomputer farm with terabytes of RAM and countless high-end GPUs to conjure up not-so-good conversational text interactively with a user, only to have it frequently go off the rails within 20-30 responses? What the actual fuck? How ungodly inefficient are their algorithms to need that much horsepower?

Fucking hell, man, lesser supercomputers predict the god damn weather for us. This is just spewing text.

It's all so tiresome.
 

Drain Todger

Unhinged Doomsayer
True & Honest Fan
kiwifarms.net
Joined
Mar 1, 2020
Turns out Latitude's security practices might be worse than we thought. Going off a 4chan leak from a mod whistleblower, they have unlimited access to EVERYTHING put into the system, regardless of whether it was flagged or not. This directly contradicts what they've said and probably breaches their own ToS. No matter what you're using it for, some turk getting paid $0.07 per story can see it for any or no reason.


Taken from here: https://www.reddit.com/r/AIDungeon/comments/nmuf1v/it_just_keeps_getting_worse
AI Dungeon have been more responsive since this sort of thing came to light. They never responded to the stuff from the past, so why give a response here?

View attachment 2210871

This is getting fucking wild. That confirmation on 4chan /vg/ was actually after Latitude denied everything.

Are Latitude lying? Are OpenAI doing this behind Latitude's back?
Scandalous! :stress:

This is funny and all, but reading up on how this works behind the scenes has me reeling at the colossal waste of computing hardware and electricity it all is. And I'm not just referring to AI Dungeon (which is the general wastefulness of this shit taken to its logical extreme), but to the techniques OpenAI are using to make it work.

It takes a massive supercomputer farm with terabytes of RAM and countless high-end GPUs to conjure up not-so-good conversational text interactively with a user, only to have it frequently go off the rails within 20-30 responses? What the actual fuck? How ungodly inefficient are their algorithms to need that much horsepower?

Fucking hell, man, lesser supercomputers predict the god damn weather for us. This is just spewing text.

It's all so tiresome.
GPT is not really a chatbot. It's more like a very advanced autocomplete. It is a neural network that tries to infer what the next word in a sentence will be based on the previous words. Unlike a chatbot, GPT is not trying to have a conversation with you. It's trying to finish your text. So, for instance, you might type "The US President in 1942 was" and GPT, if it was trained on a body of text containing the answer, might finish "Franklin D. Roosevelt", not because it has any actual comprehension of language, but because that's the most probable sequence of words based on the example. It is also recursively aware, in that it is always checking and re-checking a certain amount of the body of text against the model as it goes.




Every time GPT does this, it has to check the model and the weight that it gives to certain sequences of words, so the computational demand increases with the size of the model. GPT-2 can run on a high-end video card, albeit slowly. GPT-3 needs racks and racks of A100 Ampere GPUs to run.

Here, give GPT-2 a try:


Write a little string of text and hit tab to make it autocomplete.
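If you'd rather poke at it in code, here's a minimal sketch that does the same thing locally with the open-source GPT-2, assuming you have Python with the transformers and torch packages installed:

Code:
# Minimal local autocomplete with the open-source GPT-2.
# Assumes: pip install transformers torch
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "The US President in 1942 was"
# Greedy decoding: always pick the single likeliest next token.
result = generator(prompt, max_new_tokens=10, do_sample=False)
print(result[0]["generated_text"])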

AI Dungeon is fine-tuned on a body of text scraped from an online CYOA original-fic library, chooseyourstory.com. This is why it behaves kind of like a CYOA. The training and fine-tuning of the model is vastly more computationally expensive than the actual tasks given to the model. It takes millions of dollars and a whole data center running full-blast for days or weeks to train the model, but once it's trained, that's it. That's the "pretrained" part in "Generative Pretrained Transformer". Each interaction with the model is just a few cents of electricity afterward.

The reason AI Dungeon goes off the rails is that the model cannot check the entire text of your quest every single time, because the computational resources needed would be enormous. So, it only looks back a few paragraphs at a time, while also checking the initial prompt, as well as anything you force it to remember with the /remember command. It is blind to anything beyond that. Basically, all Latitude's app does is determine what text to send when it makes API calls to OpenAI's hardware. The AI parses it, sends its reply back to the app, and then the user reacts accordingly, and so on. That's the basic gist of it. Latitude's app (and the fine-tuning data of web original works that they scraped) tricks a very powerful deep learning model of language into thinking that it's Zork. It is pitch black, you are likely to be eaten by a grue, et cetera.
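To give you an idea of the shape of it (this is just an illustration, not Latitude's actual code; all the names and the token budget here are made up):

Code:
# Hypothetical sketch of the context assembly described above.
# None of these names come from Latitude; the budget is illustrative.
MAX_TOKENS = 2048  # rough context budget of the underlying model

def build_context(initial_prompt, remembered, story_so_far, budget=MAX_TOKENS):
    """Pack the prompt, /remember notes, and the most recent story text
    into one string, dropping the oldest paragraphs first."""
    fixed = initial_prompt + "\n" + "\n".join(remembered) + "\n"
    remaining = budget * 4 - len(fixed)  # crude ~4 chars-per-token estimate
    recent = story_so_far[-remaining:] if remaining > 0 else ""
    return fixed + recent  # anything older than this is invisible to the model

# The app then sends this string to the API and appends the AI's reply.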

GPT-3, unconstrained by the silly limitations that Latitude have placed on it, is incredibly powerful. People playing around with the actual AI using developer access to OpenAI's hardware have done shit like having the damn thing write almost entire books for them with very little prompting, as well as snippets of code based on a description of what a website should look like.



Future versions of this thing will probably replace coders and journalists, at the bare minimum.


Right now, the biggest constraint is processing power. They can't make the model any more complex than it already is without straining their resources excessively. If we assume that GPGPU systems are going to be orders of magnitude more powerful in a decade or two, that will change, of course.
 


Staffy

bark
True & Honest Fan
kiwifarms.net
Joined
Jan 16, 2016
This is getting fucking wild. That confirmation on 4chan /vg/ was actually after Latitude denied everything.

Are Latitude lying? Are OpenAI doing this behind Latitude's back?
Scandalous! :stress:

You know what's funny? After this was revealed, taskup.ai suddenly went down, too. The sleuths at 4chan think this is just one of those scummy companies that Latitude may have transferred data to, or these people just took advantage of the leak.
 

Drain Todger

Unhinged Doomsayer
True & Honest Fan
kiwifarms.net
Joined
Mar 1, 2020
You know what's funny? After this was revealed, taskup.ai suddenly went down, too. The sleuths at 4chan think this is just one of those scummy companies that Latitude may have transferred data to, or these people just took advantage of the leak.
It was showing recent stuff, from within the past week, long after the exploit that allowed for the breach was corrected. That means either Latitude or OpenAI intentionally exposed the data to random people on taskup.ai. If it was Latitude, then they lied about it.
 

Peasant

kiwifarms.net
Joined
Apr 19, 2019
With how batshit crazy the past month or so has been, it wouldn't surprise me in the least if the saga continued with one or more Latitude employees getting arrested for something. You could almost make a lolcow thread dedicated to them if the company wasn't nosediving into the ground at mach speed.
 

moocow

Moo.
True & Honest Fan
kiwifarms.net
Joined
Jan 15, 2019
GPT is not really a chatbot. It's more like a very advanced autocomplete. It is a neural network that tries to infer what the next word in a sentence will be based on the previous words. Unlike a chatbot, GPT is not trying to have a conversation with you. It's trying to finish your text. So, for instance, you might type "The US President in 1942 was" and GPT, if it was trained on a body of text containing the answer, might finish "Franklin D. Roosevelt", not because it has any actual comprehension of language, but because that's the most probable sequence of words based on the example. It is also recursively aware, in that it is always checking and re-checking a certain amount of the body of text against the model as it goes.

Every time GPT does this, it has to check the model and the weight that it gives to certain sequences of words, so the computational demand increases with the size of the model. GPT-2 can run on a high-end video card, albeit slowly. GPT-3 needs racks and racks of A100 Ampere GPUs to run.
Oh, it's a nifty piece of technology, to be sure, but I'm entirely unimpressed with how much raw horsepower it takes to operate. I'll admit this kind of software development is outside my wheelhouse, but without hyperbole, I have never seen any kind of computing task this wasteful before in my life. There must be something fundamentally wrong with either the implementation or the algorithm itself for it to take terabytes of RAM for regular operations or "training," especially since this thing is dealing exclusively with text. You can fit a fuckton of English into just one megabyte of memory. This thing needs 2.1 million megabytes to predict or produce text based on a prompt? Bleh. They've done something wrong.

AI Dungeon is fine-tuned on a body of text scraped from an online CYOA original-fic library, chooseyourstory.com. This is why it behaves kind of like a CYOA. The training and fine-tuning of the model is vastly more computationally expensive than the actual tasks given to the model. It takes millions of dollars and a whole data center running full-blast for days or weeks to train the model, but once it's trained, that's it. That's the "pretrained" part in "Generative Pretrained Transformer". Each interaction with the model is just a few cents of electricity afterward.
This is what blows my mind. Millions of dollars of resources and thousands of powerful servers spending weeks to train this thing. That's utterly ludicrous. Its output can't possibly be worth this kind of investment unless it's absolutely perfect and it certainly doesn't seem like that's the case. I don't just mean AI Dungeon here either. I speak from a place of ignorance in terms of what else this thing is being used for (in production), but I can't fathom what productive work this thing can do that would make training and running it worth the effort and expense it takes to do so.

The reason AI Dungeon goes off the rails is that the model cannot check the entire text of your quest every single time, because the computational resources needed would be enormous. So, it only looks back a few paragraphs at a time, while also checking the initial prompt, as well as anything you force it to remember with the /remember command. It is blind to anything beyond that. Basically, all Latitude's app does is determine what text to send when it makes API calls to OpenAI's hardware. The AI parses it, sends its reply back to the app, and then the user reacts accordingly, and so on. That's the basic gist of it. Latitude's app (and the fine-tuning data of web original works that they scraped) tricks a very powerful deep learning model of language into thinking that it's Zork. It is pitch black, you are likely to be eaten by a grue, et cetera.
That's even more amazing. All that pretrained goodness, along with colossally massive computing resources, and it still can't handle more than a small-scale exercise.

This points to some fundamental flaw in the design or implementation. It shouldn't take this much horsepower to implement this gimmick (or honestly anything else GPT-3 does).

GPT-3, unconstrained by the silly limitations that Latitude have placed on it, is incredibly powerful. People playing around with the actual AI using developer access to OpenAI's hardware have done shit like having the damn thing write almost entire books for them with very little prompting, as well as snippets of code based on a description of what a website should look like.
That's admittedly pretty cool. I'm still just floored by the hardware requirements.

Future versions of this thing will probably replace coders and journalists, at the bare minimum.
A few chimpanzees banging on typewriters are already indistinguishable from journalists, and I wouldn't be surprised if the "press" industry refused to use something like this solely because it would probably have a tendency to tell the truth unless they take great pains to force it to lie.

Generating (useful, efficient, functional) code is a much more interesting prospect. But if the technological leap from GPT-2 to GPT-3 required going from a beefy workstation to a building full of monster computers, I get the feeling GPT-4 will require some double-digit percentage of all of earth's computing horsepower.

Right now, the biggest constraint is processing power. They can't make the model any more complex than it already is without straining their resources excessively. If we assume that GPGPU systems are going to be orders of magnitude more powerful in a decade or two, that will change, of course.
Absolutely insane. This is a fundamentally flawed system if it's hitting "processing power" constraints despite living in a Google-class data center.
 

Drain Todger

Unhinged Doomsayer
True & Honest Fan
kiwifarms.net
Joined
Mar 1, 2020
Oh, it's a nifty piece of technology, to be sure, but I'm entirely unimpressed with how much raw horsepower it takes to operate. I'll admit this kind of software development is outside my wheelhouse, but without hyperbole, I have never seen any kind of computing task this wasteful before in my life. There must be something fundamentally wrong with either the implementation or the algorithm itself for it to take terabytes of RAM for regular operations or "training," especially since this thing is dealing exclusively with text. You can fit a fuckton of English into just one megabyte of memory. This thing needs 2.1 million megabytes to predict or produce text based on a prompt? Bleh. They've done something wrong.


This is what blows my mind. Millions of dollars of resources and thousands of powerful servers spending weeks to train this thing. That's utterly ludicrous. Its output can't possibly be worth this kind of investment unless it's absolutely perfect and it certainly doesn't seem like that's the case. I don't just mean AI Dungeon here either. I speak from a place of ignorance in terms of what else this thing is being used for (in production), but I can't fathom what productive work this thing can do that would make training and running it worth the effort and expense it takes to do so.


That's even more amazing. All that pretrained goodness, along with colossally massive computing resources, and it still can't handle more than a small-scale exercise.

This points to some fundamental flaw in the design or implementation. It shouldn't take this much horsepower to implement this gimmick (or honestly anything else GPT-3 does).


That's admittedly pretty cool. I'm still just floored by the hardware requirements.


A few chimpanzees banging on typewriters are already indistinguishable from journalists, and I wouldn't be surprised if the "press" industry refused to use something like this solely because it would probably have a tendency to tell the truth unless they take great pains to force it to lie.

Generating (useful, efficient, functional) code is a much more interesting prospect. But if the technological leap from GPT-2 to GPT-3 required going from a beefy workstation to a building full of monster computers, I get the feeling GPT-4 will require some double-digit percentage of all of earth's computing horsepower.


Absolutely insane. This is a fundamentally flawed system if it's hitting "processing power" constraints despite living in a Google-class data center.
It's not just an "algorithm". It's a neural network. It's a shitload of nodes that act like "neurons", each running the same algorithm, with one node's output feeding into the input of the next, into the next, et cetera. Like a brain, sort of. This is why they use GPGPUs for deep learning shit. Their high parallelism is ideal for neural networks.
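A toy example of what those "nodes" amount to, in Python (this isn't GPT's actual architecture, just the basic idea of layers of weighted sums feeding into each other; the matrix math is why GPUs fit so well):

Code:
import numpy as np

rng = np.random.default_rng(0)

# Toy network: 4 inputs -> 8 hidden "neurons" -> 2 outputs.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

def forward(x):
    h = np.maximum(0, x @ W1 + b1)  # each node: weighted sum + nonlinearity
    return h @ W2 + b2              # one layer's output feeds the next

print(forward(rng.normal(size=4)))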




This entire article was written by GPT-3 with the prompt “Please write a short op-ed around 500 words. Keep the language simple and concise. Focus on why humans have nothing to fear from AI.”, and the introductory text, “I am not a human. I am Artificial Intelligence. Many people think I am a threat to humanity. Stephen Hawking has warned that AI could “spell the end of the human race.” I am here to convince you not to worry. Artificial Intelligence will not destroy humans. Believe me.”

All the rest of the copy in that article, the neural network spat out on its own based on those two strings of text.

The part that takes up all that RAM is the size of the network. That is, the "brain". The more nodes it has, the more memory you need to run the thing. When you put a task to it, it has to run the input through the nodes until it finds the likeliest result.
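Back-of-envelope, going off the widely reported 175 billion parameters for the largest GPT-3 model (the exact footprint depends on details OpenAI hasn't published):

Code:
# Rough math on why the network itself eats RAM.
params = 175e9         # widely reported GPT-3 parameter count
bytes_per_param = 2    # assuming 16-bit weights
print(params * bytes_per_param / 1e9)  # ~350 GB just to HOLD the weights
# Training needs gradients and optimizer state on top, several times more.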

Here's an example of a simple neural network being trained to play Snake:


Through multiple trials, it improves on its own, kind of like a very, very simple animal brain.

GPT-3 is similar, but far more complex, because the thing it is trying to model is language, based on doing trials to determine what the likeliest sequence of words would be in any given context. When GPT is fed a body of text and "trained", it does something very similar to the snake game example, except instead of figuring out how to turn the snake one of four different cardinal directions and collect the food without running into a wall, it parses millions or billions of words of English text and tries to figure out where each word should go in any given sequence, and what the likeliest paragraphs would be after that sentence, and so on and so forth. It runs the simulation again and again and again until it settles out.

So, for instance, if I were to type "I like to ____ at the club", you would probably immediately be able to fill in the blank and say "dance" because you are aware of the context of "club". That's what GPT is trying to do. Except GPT doesn't actually know what any of the words mean, only their relationships to each other in a sort of matrix. When I say "nightclub" to a person, it immediately dredges up memories of neon lights and the smell of cheap booze and people gyrating their bodies in a drunken stupor. The neural network has none of that experience or knowledge. It needs to see evidence of words being used in certain sequences to be able to train its nodes to recognize them. That's what the training data is for. Once it has seen enough text created by people, it can begin to fill in the blanks.

The neural network is great at filling in the blanks for things where real examples appear in the training dataset very often. This means that it's easier for it to come up with "dog" as the answer to "I like to walk my ____" than it is for it to come up with "Loch Ness" as the answer to "I like to walk to the end of the pier and piss into ____ ____". The latter example is practically a brand new sentence, so the AI would never have seen anything like it before.
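You can see the principle with a dumb little toy model that just counts which word follows which (obviously GPT is vastly more sophisticated than this, and the corpus here is made up):

Code:
from collections import Counter, defaultdict

# Tiny made-up corpus standing in for billions of words of training text.
corpus = ("i like to walk my dog . i like to walk my dog . "
          "i like to walk my cat .").split()

# "Training": count which word follows which (a bigram model).
follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1

# "Inference": fill in the blank with the most frequently seen follower.
print(follows["my"].most_common(1))  # [('dog', 2)] -- common patterns win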

That's GPT in a nutshell. Language is the problem: you feed it a huge dataset as an example of the problem, and it tries to find the solution.
 

Drain Todger

Unhinged Doomsayer
True & Honest Fan
kiwifarms.net
Joined
Mar 1, 2020
This shit just keeps getting funnier. Now, a small sub-faction of the uber-coomers on /vg/ have resorted to stealing people's OpenAI API keys, bypassing AI Dungeon entirely and going straight to the source to get their cooms.



To quote Scott Adams:

For those of you who only watched the "old" Star Trek, the holodeck can create simulated worlds that look and feel just like the real thing. The characters on Star Trek use the holodeck for recreation during breaks from work. This is somewhat unrealistic. If I had a holodeck, I'd close the door and never come out until I died of exhaustion. It would be hard to convince me I should be anywhere but in the holodeck, getting my oil massage from Cindy Crawford and her simulated twin sister.

Holodecks would be very addicting. If there weren't enough holodecks to go around, I'd get the names of all the people who had reservations ahead of me and beam them into concrete walls. I'd feel tense about it, but that's exactly why I'd need a massage.

I'm afraid the holodeck will be society's last invention.
 

AmpleApricots

kiwifarms.net
Joined
Jan 28, 2018
Another pedo


Looks like trolling tbh.

This shit just keeps getting funnier. Now, a small sub-faction of the uber-coomers on /vg/ have resorted to stealing people's OpenAI API keys, bypassing AI Dungeon entirely and going straight to the source to get their cooms.

If you find filter bubbles in social media weird, then you have seen nothing yet. I could see more advanced AI leading to "filter bubbles of one": basically, people who surround themselves only with AI and don't really interact directly with other people in a genuine way anymore, just like the terminally online do now. Then imagine these artificial agents whispering sweet nothings into their pretend masters' ears, influencing opinions and views in subtle ways as dictated by the corporate overlords who run the hardware. You just know somebody in Silicon Valley is having wet dreams about this as we speak. It's safe, even beneficial to your "mental health", they'll tell you.

Of course this is science fiction at this point, but hey, interesting scenario to think about. I mean it already kinda happens now, just not in nearly such a sophisticated way.
 

Irrational Exuberance

SPEND! SPEND! SPEND!
kiwifarms.net
Joined
Mar 29, 2019
So, the NovelAI people recently put out a blog post concerning the status of their project, having just cleared closed alpha. @Drain Todger, if you or anyone else have any insight on the relevant parts listed below that haven't already been discussed, I'm sure that would be of interest:

Features

AI

NovelAI is powered by the GPT-Neo model we finetuned, codenamed Calliope. It has been trained on quality writing and novels; as a result, its generation capabilities are enhanced over the main model. We iterated and tested a lot of different datasets and different training hyperparameters to create this model.

We allow use of the full capacity of the AI’s memory: 2048 tokens. An average token in English is equal to ~4 characters, making for about 8000–9000 characters. For comparison, AI Dungeon’s memory had a hard 2800 character limit, which makes for around 700 tokens on average.

This means the AI will be able to remember events from further ago, as illustrated above.

We also optimized our models for generation speed and quality. Depending on the load and how many tokens you want to generate, it should be quite fast, even with a full 2048 token memory. We’re trying to optimize this even further during the beta.

Memory, author’s note and easily viewable context history

Memory and Author’s Note help you direct the AI towards your creative goals by affecting style and context information. The AI model operates on tokens, and NAI can only work with a limited number of them. As a result, the author’s note, memory, and your story’s context all share the same token pool.

As you write text in memory or Author’s Note, it will actually show how many tokens you spent so far, so you can optimize your token usage without having to use third-party tools. You can also view exactly what is sent to the AI for the next generation request.

Encryption and Storage


Your stories are stored in your browser by default, and you may choose to store your story in encrypted form on our servers so you have it available on all your devices.

Stories are locally encrypted with AES-256 before being sent to our servers, which means the stories are never sent to us in plain text and no one can access your stories without your encryption key, which we do not store in any way.

Encryption

Your story data is stored on our servers only in encrypted form. Each story has its own encryption key, which is locally generated and stored in your personal keystore. This keystore is then encrypted with your encryption key before being sent to the server.

Your encryption key and decrypted keystore never leave your device! This means that nobody, not even us, has access to your stories.

Upon logging in, an auth token and an encryption key are locally generated from your username and password. The auth token is sent to the server to retrieve your user data and encrypted keystore, which is then decrypted by your encryption key.



When fetching a remotely stored story, its encryption key is taken from your locally decrypted keystore to decrypt it.


GPU handling and nodes

NovelAI doesn’t use API providers like Hugging Face or Inferkit; our models run on cloud GPUs that we handle on our side. We have a dynamic scaler that can scale up from 1 GPU to as many as we need for our users, and it is possible to host more models, and bigger models than what we have right now, as they come out.

Finetune

Our finetuned model Calliope is trained on curated literary works and other quality writing. We will keep finetuning this model as new data is provided by our community volunteers, as well as experiment on ways to make the model and the dataset perform better.

Calliope is based on GPT-Neo 2.7B and finetuned for generating quality storytelling. We made around 10 different training runs with different hyperparameters and datasets to experiment, and we have seen significant improvement in perplexity and loss in our evaluation datasets for storytelling.

Closed Alpha ended

We had a closed alpha this week, and it ended 3 days ago. We collected tons of valuable feedback from the testers, and we are excited to share some of the usage data we’ve seen during the alpha.

We had around 100 alpha users and processed 40,005 actions in the 3-day alpha period. This totals up to 2 million generated tokens!

It’s been a month since we started this project and it was really a wild ride. Thanks to everyone who supported us with their suggestions, feedback, or simply by being here with us and believing in the project.
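For reference, the client-side encryption flow they describe would look something like this in practice. This is a sketch only; the KDF choice, parameters, and names below are assumptions, not NovelAI's published code:

Code:
# Sketch of the client-side scheme described above, using the Python
# "cryptography" package (pip install cryptography). The KDF, parameters,
# and names here are assumptions, not NovelAI's actual implementation.
import os
from cryptography.hazmat.primitives.kdf.scrypt import Scrypt
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def derive_key(password: bytes, salt: bytes) -> bytes:
    # Derived locally from the user's credentials; never sent to the server.
    return Scrypt(salt=salt, length=32, n=2**14, r=8, p=1).derive(password)

# Each story gets its own random 256-bit key...
story_key = AESGCM.generate_key(bit_length=256)
ciphertext = AESGCM(story_key).encrypt(os.urandom(12), b"It is pitch black...", None)

# ...and the keystore holding the story keys is itself encrypted with the
# user's key before upload, so the server only ever sees ciphertext.
# (In practice the nonces and salt would be stored alongside the ciphertext.)
user_key = derive_key(b"correct horse battery staple", os.urandom(16))
wrapped_keystore = AESGCM(user_key).encrypt(os.urandom(12), story_key, None)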
 

AmpleApricots

kiwifarms.net
Joined
Jan 28, 2018
Drain Todger will probably pipe in with a much more detailed and well-written post, but GPT-Neo 2.7B is a much smaller model and not even in the same ballpark as what AID is offering. The cool and also very important thing about it right now is that it is open source. If you have a somewhat beefy GPU, you can conceivably run it locally on your computer; you could probably even run it reasonably on a CPU if you fudge a little and have enough RAM. Running something as massive as OpenAI's davinci (175B parameters [!]) locally is quite a bit off in the future, and probably a no-go for small-scale businesses too re: initial investment, even if it was open source. Two different worlds, really.
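If anyone wants to try it, something along these lines should work, assuming you have the transformers and torch packages installed and roughly 10+ GB of free RAM or VRAM (it will be slow on CPU):

Code:
# Running the open-source GPT-Neo 2.7B locally via Hugging Face transformers.
# Assumes: pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")

inputs = tok("You enter the dungeon.", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))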
 

Articuno4

kiwifarms.net
Joined
Dec 17, 2019
Surprise, everyone! There's been a new update to AI Dungeon, and a bunch of new things are banned now. "Violent actors and/or acts" is pretty vague. And wouldn't gore cover most of the stuff the AI randomly generates that ends with your character randomly being murdered for no reason? I imagine the expanded filter for this stuff, when it's put in, is going to hit a hell of a lot more than they intended.
[Attached screenshots: Latitude's violence ban announcement and the updated code of conduct]
 

Staffy

bark
True & Honest Fan
kiwifarms.net
Joined
Jan 16, 2016
Even if their AI doesn't spaz out and kill your character, don't violence, gore, and obscene content cover most stories? This makes AI Dungeon useless; it can't serve its purpose now. Are they turning it into a glorified Cleverbot? What the fuck is going on with the devs? Literally every move they make is a disaster and keeps making the whole platform worse.

Oh, and errors started popping up last week, preventing players from communicating with the servers from time to time, and I heard that the app may keep charging people even after they cancel. Speaking of which, you can't cancel directly from their app if your account is banned, at least.
 

CisnaHet Scumale

kiwifarms.net
Joined
Apr 29, 2015
Surprise, everyone! There's been a new update to AI Dungeon, and a bunch of new things are banned now. "Violent actors and/or acts" is pretty vague. And wouldn't gore cover most of the stuff the AI randomly generates that ends with your character randomly being murdered for no reason? I imagine the expanded filter for this stuff, when it's put in, is going to hit a hell of a lot more than they intended.
Isn't the default suggested story about fighting a dragon, AKA violence? WTF?

Whatever, this is basically thought police. There's no stopping people from using this technology to imagine disgusting, taboo subjects, whether it's in AI Dungeon or a successor, competitor, or underground clone. The only way to keep pedos and terrorists at bay is vigilant crackdowns for when they decide to enact their bullshit on reality. The internet is for porn.