Is AI Generated IF Discussion OK Here?

mathbrush · June 23, 2024, 6:06pm

I want to break down how AI works, as far as I know (I was part of a data science group in 2017 and have done machine learning before).

Sound, images, and video in general are really big and take up a lot of room. One goal in machine learning is to break things down into component parts and find out which ones matter the most; the rest can be discarded.

This video explains how to do this with specific sounds, like in a synthesizer:

Each noise you hear can be broken up into different amounts of smaller waves that, when added together, recreate the original sound. You can represent those sounds as a list of numbers. The sound in this image is:

-2/1,-2/2,-2/3,-2/4,…

It goes on forever. But you can cut it off after five or six numbers. And you can associate it with a word:

‘buzzing’: -2/1,-2/2,-2/3,-2/4,-2/5
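Here is a rough sketch of that idea in Python. It assumes the numbers are the amplitudes of sine waves at 1x, 2x, 3x, ... the base frequency (that reading is my assumption, and `synthesize` is just a name made up for this example). Adding up the scaled sine waves gives back an approximation of the sound, and more numbers give a sharper sawtooth.

```python
import numpy as np

def synthesize(coeffs, num_samples=1000):
    """Add up sine waves: coeffs[0]*sin(x) + coeffs[1]*sin(2x) + ..."""
    # One full period of the sound, sampled at evenly spaced points.
    x = np.linspace(0.0, 2.0 * np.pi, num_samples, endpoint=False)
    harmonics = np.arange(1, len(coeffs) + 1)
    basis = np.sin(np.outer(harmonics, x))  # row n-1 holds sin(n*x)
    return x, np.asarray(coeffs, dtype=float) @ basis

# The five-number version of 'buzzing': -2/1, -2/2, ..., -2/5
buzzing = [-2 / n for n in range(1, 6)]
x, rough_wave = synthesize(buzzing)

# Keeping many more terms gets close to the ideal sawtooth ramp.
many_terms = [-2 / n for n in range(1, 2001)]
_, sharp_wave = synthesize(many_terms)
```

With only five numbers the wave is wobbly but already recognizably a sawtooth, which is why cutting the list off early is a reasonable trade.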

If we find a big database of sounds labelled with what they represent, we can find the five numbers that best fit each of those sounds and get a database of sounds with synonyms that match:

‘pure tone/a note/tuner’: 1,0,0,0,0
‘buzzing/sawtooth’: -2/1,-2/2,-2/3,-2/4,-2/5

You could use this to make ‘AI sounds’. If someone types a keyword, look at all the sets of five numbers associated with sounds matching that keyword and pick one. You can fool around with how they are picked (just pick a random pre-existing one, take the average of all of them, weight them by how frequent they are in the database and then average them, etc.).
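A toy version of those three picking strategies might look like the sketch below. The database layout, the function names, the second 'buzzing' entry, and all the frequency counts are invented for illustration; only the 'pure tone' and 'sawtooth' numbers come from the examples above.

```python
import random
import numpy as np

# Toy database rows: (labels, five numbers, how often the sound appears).
# The third row and every count are made up for this example.
DATABASE = [
    ({"pure tone", "a note", "tuner"}, [1, 0, 0, 0, 0], 2),
    ({"buzzing", "sawtooth"}, [-2 / 1, -2 / 2, -2 / 3, -2 / 4, -2 / 5], 1),
    ({"buzzing"}, [-1, -0.5, -0.25, 0, 0], 3),
]

def matches(keyword):
    """Return (numbers, count) for every entry labelled with the keyword."""
    return [(np.array(nums, dtype=float), count)
            for labels, nums, count in DATABASE if keyword in labels]

def pick_random(keyword):
    """Just pick one pre-existing entry at random."""
    return random.choice(matches(keyword))[0]

def pick_average(keyword):
    """Plain average of all matching entries."""
    return np.mean([nums for nums, _ in matches(keyword)], axis=0)

def pick_weighted_average(keyword):
    """Average weighted by how frequent each entry is."""
    found = matches(keyword)
    return np.average([nums for nums, _ in found], axis=0,
                      weights=[count for _, count in found])
```

For example, `pick_average("buzzing")` blends the two buzzing entries equally, while `pick_weighted_average("buzzing")` leans toward the more frequent one. A keyword with no matches raises an error here, which a real tool would need to handle.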

While it’s obviously more complex, the same thing works for images. You can take an image and break it up into its most important components. One old way of doing that was SVD (singular value decomposition):

(This is an image that has been broken up into important components. Its most important component is displayed alone, first, and then the others are added in.)
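To make that concrete, here is a minimal sketch using NumPy's `np.linalg.svd`. The "image" is random numbers standing in for a grayscale picture, and `low_rank` is a name made up for this example. Keeping only the first `k` components gives a blurrier copy, and adding components back in shrinks the error.

```python
import numpy as np

def low_rank(image, k):
    """Rebuild the image from only its k most important SVD components."""
    U, S, Vt = np.linalg.svd(image, full_matrices=False)
    # Scale the first k columns of U by their singular values, then recombine.
    return (U[:, :k] * S[:k]) @ Vt[:k, :]

# Stand-in for a 32x48 grayscale image (random pixel values, not real data).
rng = np.random.default_rng(0)
image = rng.random((32, 48))

# How far each approximation is from the original as components are added in.
errors = [np.linalg.norm(image - low_rank(image, k)) for k in (1, 5, 20, 32)]
```

With all 32 components the image comes back essentially exactly; with fewer, you store less data and lose some detail. Real photos compress far better than random noise does, because most of their structure sits in the first few components.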

Nowadays they use more sophisticated techniques, but they still break images up into smaller, nicer chunks of data and associate those with the captions or labels of the images.

Then when people type in prompt words, it searches the database for components matching those words, and then chooses an image in that ‘space’ of components. A lot of algorithms ‘walk’ through that space, and so some websites show you how the image morphs over time.
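As a very simplified picture of 'walking' through a space of components, the sketch below moves in a straight line between two lists of numbers and records the steps along the way. It reuses the five-number sound examples because they are small; real image generators use far more elaborate procedures, and the `walk` function is only an illustration I made up.

```python
import numpy as np

def walk(start, end, steps=5):
    """Return evenly spaced points on the straight line from start to end."""
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    # t goes from 0 (all start) to 1 (all end); one row per step.
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * start + t * end

pure_tone = [1, 0, 0, 0, 0]
buzzing = [-2 / n for n in range(1, 6)]

# Each row is one intermediate 'frame' of the morph.
frames = walk(pure_tone, buzzing)
```

Turning each row back into a sound (or an image, with image components) and playing them in order is what produces the morphing effect.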

So AI does take a lot of sources, break them down, and recombine them. Most of the original image is still in there, especially if it’s heavily represented in the database (like ‘Mona Lisa’). So it’s kind of like a Ship of Theseus problem. If you break down art into its R, G, and B values (for instance), send one of each to three different people, and months later they happen to reassemble them, is that stealing art? A lot of people might say yes.

On the other hand, it’s kind of like doing research, where you take bits of many other people’s work and summarize it. But research involves citation, and it doesn’t usually quote or cite the private work of individuals who didn’t share it intentionally.