Artificial Intelligence and Creative Works?

“It rained” only has one element. It can be a beginning, a middle or an end, but not all three at once.

Conflict is a conditional element of evaluation in a story. It can be there, it often is there, but it’s not compulsory. (Conflict that is present in a story can of course be either natural or artificial, depending on how well the story is written).

As for “stolen”, I mean “taken without permission of the copyright owner and used for something other than the copyright permits”. The use of copyrighted material to populate an LLM without authorisation (except for certain specific exemptions that don’t cover any of the uses for text AI I’ve discussed apart from “having fun and experimenting with words”) is illegal where I live (the UK) and in many other countries. By extension, using an LLM populated with such materials amounts to handling stolen goods in the UK. The “black box” element of how an AI processes copyrighted texts doesn’t make the process any more legal in the eyes of the courts; it just makes it harder to prove the end user was aware of the wrongdoing involved.

Artists learn from each other. They do not copy from each other and then pass off their copying as original - which is what text AI is doing. (And yes, AI work often is recognisable as either a specific original work, or a combination of several original works. Especially given the way AI responds to stylistic prompts).

As for the Norway court case: this is where I wish I’d not relied so much on Twitter’s search function. For that is where I saw the thread in which the artist who lost was complaining about it (and searching on the generic search engines turns up lots of material about American court cases, which is not helping).

2 Likes

Thanks for clarifying that for me; now I understand what you’re driving at. But I still disagree with the statement that the set of all English sentences NECESSARILY contains more information than the set of all novels. One novel contains immensely more information than the sum of the information content of each of its sentences, because there is information content in the sequencing of those sentences. Now, the rigorous calculation of the entropy of even one English word is problematic, and the calculation of the entropy of one English sentence is even more complicated, because various rules of syntax and semantics place constraints upon what words can follow any given word. When we consider the entropy of an entire novel – well, that goes way beyond our reach for the time being. It’s possible to come up with rough approximations, but you have to apply a LOT of assumptions to do that.

The focal point of my argument, though, is that the information content of one novel is not obtained by adding up the information content of its component sentences, because you cannot assume much about the constraints upon the sequencing of sentences. If you are given the sentences, “Where shall I go? What shall I do?”, what is the probability that the ensuing sentence will be “Frankly, my dear, I don’t give a damn.”? These extremely low probabilities for the sequence of sentences are what boost the information content of a novel by orders of magnitude over the information content of a sentence.

Of course, if the number of English sentences exceeds the number of English novels by many orders of magnitude, then it is possible that the net information content of those sentences will exceed the net information content of the novels. I’m too lazy to do the calculation, and we both know that any calculation small enough to fit into this format will likely be easy to puncture. Nevertheless, I think it fair to say that the creation of an entire novel by combinatorial methods will be immensely (and I mean COSMIC levels of immensity) more difficult than the creation of a sentence by such methods. That is the point that we are discussing.

1 Like

Good point; the vocabulary of the spider-story is much simpler than the vocabulary of the prince-story. Yet, we could easily rephrase the stories to reverse that relationship:

“Once upon a time there was a microscopic but indefatigable arachnid…” :grinning:

1 Like

No, you have it precisely backwards. Novels are more structured than arbitrary sentences, and structure reduces complexity.

You talk about how there’s a wide range of sentences which could follow any given sentence in a novel. That is true, but the relationship between successive sentences of a novel is far more structured, and is therefore less random, and therefore contains less information, than that between any two sentences chosen at random from the corpus of all possible sentences. Those relationships are, to a first order approximation, what makes a novel a cognizable thing, and they necessarily reduce the complexity of (and therefore the information in) the text compared to random statements totalling the same nominal length.

3 Likes

Wow. Those are great resources! I forget why I passed on Hamlet on the Holodeck, but the others look interesting! I already have two of them, but the rest look like good stuff. Thank you for your reviews, too! :pray:

PS: “For sale: baby shoes, never worn.” - attributed to Ernest Hemingway

Why not? That sentence specifies an event. Much of its content is implicit. Similarly, this sentence, consisting of only two letters, is nevertheless a proper sentence:

“Go.”

Moreover, the specification of “beginning, middle, and end” is of little utility. An encyclopedia entry has a beginning, a middle, and an end. A parking ticket has them, too. So does a menu. I don’t think that the objection has much weight.

Let’s be careful to differentiate between what meets a formal definition of a story, what is required of a “proper” story (The Prince Who Fell Into a Hole is not a proper story), and what is required of a good story. I agree that conflict is not necessary for a technically adequate story, but I cannot recall experiencing any good story without conflict, and I can’t even recall any proper stories without conflict.

Your definition establishes that the use of stories by AI systems to produce new stories does not constitute stealing them. Copyright is “the right to issue copies” of a work. Reading a copyrighted work does not violate its copyright. Running a program that counts the number of q’s it contains does not violate its copyright. Running a program that analyzes its content by some algorithm does not violate its copyright.

Are you quite certain of this statement? I have read a number of pieces on this question and my conclusion is that the question has not yet been definitively answered. Let’s face it: this is a dramatically new phenomenon that copyright law was never intended to address. While courts might reach a conclusion one way or the other, they’ll be stretching the law no matter what; I therefore believe that this question will have to be addressed by explicit legislation.

I agree that an AI that produces output recognizable as copyrighted work is in violation of copyright. But such cases, even with the AI systems, are rare. It really doesn’t take much manipulation to generate something that is unquestionably different from the original – and I suspect that this will be even more salient in the matter of fiction. What should be our metric for stealing? If an AI generates a murder mystery that is solved by the realization that “the dog didn’t bark”, is that really a theft of Arthur Conan Doyle’s work? What about a soap opera in which a long, sensational sequence of events turns out to be a dream of a character? That device is used so often; is it truly copyrightable?

Do you think that you could compose rigorous language that would clearly differentiate between fair use and copyright violation? My fear is that any language that could protect your work from an AI system would simultaneously subject you to litigation for copyright violation. The law doesn’t recognize the difference between an AI generating text and a human generating text; any law must be based on what was actually generated, not who generated it.

3 Likes

Nope: you’re the one who has it backwards. I don’t know the foundation of your reasoning, but my own reasoning is based primarily on thermodynamics, with additional consideration of Shannon’s work on information theory. In thermodynamics, we talk about entropy and negentropy. Entropy is disorderliness, or lack of information; negentropy is orderliness, or information. Structure is a form of orderliness, hence structure adds information content.

Are you familiar with Maxwell’s Demon? He is a microscopic creature controlling a door between two chambers occupied by a gas. He opens the door when a gas molecule approaches the door from compartment A, but closes it when a gas molecule approaches it from compartment B. In this way, he concentrates molecules into compartment B, thereby creating a pressure difference between A and B that can be used to do work. This is an apparent violation of the Second Law of Thermodynamics. The key observation for this discussion is that Maxwell’s Demon imposes structure on what would otherwise be a random situation, thereby increasing the negentropy.

I am curious to learn the principle that leads you to believe that a random configuration contains more information than a structured configuration. I suspect that it has something to do with the number of bits required to specify that configuration, but I remind you that any set of data can be defined either by data or by process, and process is not without information content. We can have a set of paired data such as:

{1,2} {2,4} {3,6} {4,8}

or we can have the algorithm y = 2x

Your reasoning suggests that the former contains information, while the latter contains little or none. You are ignoring the information content of the algorithm as well as the execution costs required for the algorithm.

I could probably come up with a rigorous mathematical proof, using Shannon’s definition of information content, of the proposition that a simple algorithm (structure) contains more information than a set of data, but I’m too lazy.
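
To put the data-versus-process contrast in concrete form, here is a minimal sketch (Python, purely illustrative, using the same pairs and rule as above):

```python
# The same mapping expressed two ways: as explicit data, and as a process.

# 1. Data: every pair is stored outright.
pairs_as_data = [(1, 2), (2, 4), (3, 6), (4, 8)]

# 2. Process: a rule that can regenerate those pairs (and many more).
def pairs_as_process(x):
    """The algorithm y = 2x."""
    return 2 * x

# The rule reproduces the stored data exactly...
assert all(pairs_as_process(x) == y for x, y in pairs_as_data)

# ...and also answers questions the stored data never recorded.
print(pairs_as_process(1000))  # 2000
```

The rule reproduces the table and also covers cases the table never listed; that coverage is part of the information content of the process that I am saying you are ignoring.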

1 Like

Once again, no. You have this backwards. Entropy is not a “lack of information”, in either statistical thermodynamics or information theory. That’s not how Shannon originally defined it in “A Mathematical Theory of Communication” and it’s not how the term is used today. Random citation: the very first sentence of the Wikipedia article on entropy in information theory says as much.

6 Likes

Look, a lot of people are confused about entropy, negentropy, and their relationship to information. For example, here’s a portion of a book on information: https://www.fil.ion.ucl.ac.uk/~wpenny/course/info.pdf

In section 4.3 it states: “The entropy is the average information content”. That seems to settle the matter in favor of your interpretation, doesn’t it?

But wait! Just a few lines further it declares that “Entropy measures uncertainty.”

My point here is that there’s a great deal of inconsistency in the terminology, which makes it especially easy to get confused about what all these things mean. That’s why I prefer to ground my thinking about entropy in thermodynamics, which is based on physical phenomena and is therefore easier to stay grounded in.

Think in terms of the Second Law of Thermodynamics:

The entropy of an isolated system can never decrease.

And in fact we know that the entropy of isolated systems almost always increases, the only exceptions being absolutely static systems, such as those near absolute zero.

Now think about what this means about our information about an isolated system, such as a bacterium isolated in a container. We can know a great deal about that bacterium, because we know that in order to be alive, it has to be carrying out lots of biochemical reactions. We also know that all its molecules are confined within its cell membrane.

But because the bacterium is isolated, it has no food, no sustenance. It will die. Its cell membrane will break down and its volatile molecules will evaporate and spread through the volume of the container. Thus, with the passage of time, the entropy of the system increases – AND we know less about the system. Where previously we could identify the positions of many of the atoms in the cell, some of those atoms are now randomly scattered through the container. This is not a system of greater information, as you believe; it is a system about which we know less. It is more random, it has higher entropy, and we know less about it.

Your belief that entropy equals information content, when combined with the Second Law of Thermodynamics, means that, with the passage of time, the information content of every system (including the universe) increases. At the Big Bang, according to your interpretation, the entire universe had zero information content, but the information content has been steadily increasing as the universe has aged. All we have to do to gain more information about the universe is wait a while, and new information will simply appear out of nowhere.

That’s not how the universe works. With the passage of time, we know less and less, because information degrades with time. This should be obvious from the Uncertainty Principle.

A system with high entropy is one that we have little information about. A system with low entropy has greater information content.

I suggest that the source of your confusion might come from mixing together two immiscible concepts: the information content of a system and the amount of data required to specify its state. To completely specify the state of a gas in a container, we must specify the position and momentum of every particle in the container. If the gas is at maximum entropy, then it is spread randomly through the container, and it will take a lot of RAM to store all the different positions and momenta. But one low-entropy state would have all the gas packed into one corner of the container, so that the positions of all the particles could be specified by a much smaller statement along the lines of (x < a) & (y < b) & (z < c). Indeed, in the case of theoretically minimum entropy, we can specify the entire system with statements such as x = a, y = b, z = c, px = 0, py = 0, and pz = 0.
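
As a back-of-envelope illustration of that specification cost (a sketch only; the cell count and particle count are arbitrary, and momenta are ignored):

```python
import math

# Divide the container into CELLS equal-sized cells and record only which
# cell each particle occupies (momenta ignored for simplicity).
CELLS = 1_000_000        # arbitrary spatial resolution
N_PARTICLES = 1_000      # arbitrary particle count

# Maximum entropy: each particle could be in any cell.
bits_spread_out = N_PARTICLES * math.log2(CELLS)

# Lower entropy: every particle is known to sit in one corner, a region
# covering 1/8 of the cells ((x < a) & (y < b) & (z < c)).
bits_in_corner = N_PARTICLES * math.log2(CELLS / 8)

print(bits_spread_out - bits_in_corner)  # ~3000: three bits saved per particle
```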

Information theory has equations that parallel those used in thermodynamics, but because information is a purely mathematical construct, it is impossible to think about them in anything other than purely mathematical terms; there’s no way to understand those equations intuitively – you can only prove them.

This is why you misunderstand the meaning of Shannon’s paper. His formulation is highly abstract and easily misunderstood. You’d do better to read his later paper “Prediction and Entropy of Printed English”, in which he directly addresses the points we are considering here. Here’s a relevant quotation from that paper:

“From this analysis it appears that, in ordinary literary English, the long range statistical effects (up to 100 letters) reduce the entropy to something of the order of one bit per letter”

In other words, as you take into account constraints arising from semantic and syntactical factors extending over a longer length of text, the entropy of each character is reduced – that is, the information content of the passage is increased by the additional structural constraints, even as the amount of information required to STORE the text is reduced. Remember, the process elements are just as important as the data elements. Shannon carried out a fascinating experiment with his wife in predicting letters in a text. Nowadays this kind of experiment can be executed on vastly larger scales with computers.

There are quite a few academic papers discussing the information content of text in various languages. I have not followed this discipline, but the few papers I have read all make it clear that procedural constraints (semantic, syntactic, and thematic) decrease the number of bits required to store each letter of text – BUT only if you incorporate those constraints algorithmically. In other words, you substitute process for data. That doesn’t reduce information content; because process is universally applicable, it increases information content.
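
For anyone who wants to see that per-letter effect directly, here is a toy sketch (the sample text is invented, and a single character of context stands in for Shannon’s hundred letters):

```python
import math
from collections import Counter

# Toy estimate of per-character entropy, with and without one character of
# context. Real estimates (like Shannon's ~1 bit/letter) need far longer
# texts and far longer contexts.
text = ("the rain fell on the roof and the rain fell on the road "
        "and the sound of the rain filled the night")

def unigram_entropy(s):
    """Bits per character, ignoring all context."""
    counts = Counter(s)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conditional_entropy(s):
    """Bits per character given the single preceding character."""
    pair_counts = Counter(zip(s, s[1:]))
    prev_counts = Counter(s[:-1])
    total_pairs = sum(pair_counts.values())
    h = 0.0
    for (prev, _nxt), c in pair_counts.items():
        p_pair = c / total_pairs
        p_next_given_prev = c / prev_counts[prev]
        h -= p_pair * math.log2(p_next_given_prev)
    return h

print(f"no context:          {unigram_entropy(text):.2f} bits/char")
print(f"one char of context: {conditional_entropy(text):.2f} bits/char")
# The second number is smaller: each added constraint lowers the per-character
# entropy estimate, which is the effect Shannon measured over long ranges.
```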

1 Like

You appear to be confusing “information” and “knowledge”.

Increasing the amount of disorder in a system strictly increases its information, regardless of how much of that information any given observer happens to have access to. Increasing the amount of order in a system strictly decreases its information, regardless of how much of that information any given observer happens to have access to.

2 Likes

Shannon’s entropy relates to the transmission of messages. If the message is transmitted perfectly, there is no change in entropy, and no change in information.

If the message is garbled, the entropy in the message has increased, but there is less information in it than there was originally.

Therefore increasing entropy means less information.

Of course if you define information differently, you can get any result you want. But in the Shannon sense entropy is lack of information.

3 Likes

Ah, good! You are using a definition of information that does indeed match your interpretation. Your definition is the inverse of my own definition. There’s no point in arguing over whose definition is correct. I will only claim that my own definition matches the usage in thermodynamics and information theory.

And with that, let us conclude this discussion.

Specifying an event only works as a story if there’s a beginning, a middle and an end, which “It rained” still does not have. It also does not have cause and effect, the other essential requirement for something to be a story (formal, “proper” or otherwise). An encyclopaedia, a parking ticket and a menu do have beginnings, middles and ends; all three have implied stories in them, and I’ve seen people weave stories into all of these things. (Indeed, where I live the police are legally required to weave into a parking ticket the story of what happens if it is not paid. Not all stories are fiction.) Though I would agree that none of those three things is automatically a story, since none of them contains cause and effect by default in every instance. (There are places where parking tickets are allowed to not state - or even necessarily have - consequences of not paying them, for example.)

“Go.” is not a proper sentence as it is far too ambiguous - it’s not even clear whether it’s a verb or a noun without context, let alone whether there’s a noun-verb linkage. Dialogue gets away with it more often than not because there’s surrounding context, but that does not make it a proper sentence.

You do not appear to understand that the very moment the copyrighted information hits the AI system without authorisation, the theft has already commenced (since there was no right to issue that copy in the first place). Further use of it becomes further theft, since fragments of the copy are still covered by copyright as long as the copy can be traced (something quite trivial to do given how readily systems such as ChatGPT expose their source corpus material). The moment one hits “generate”, that’s processing copies into further copies (multiple further copies, in fact, since the output typically draws on multiple sources within the corpus). Mixing in interpolative calculations doesn’t change the legality or otherwise of the process. I’m coming to the conclusion that you don’t entirely understand how generative AIs work. (Reading something someone else hit “generate” on doesn’t breach copyright, at least not in the UK.)

I am 100% certain of the statement that the UK bans the use of copyrighted material to populate an LLM without authorisation, especially given that public-facing composition systems in other fields (human or machine) are required to abide by the same law, and that nothing in UK law exempts AI. (Note that in the UK most image-based memes are also illegal, even in circumstances where they would be legal in the USA, contrary to the impression one would get from looking at many UK social media accounts.) In short, I hold my position because what I have described is already UK law for human creation and for non-AI computer-assisted human creation.

I see it a bit differently, having been on both sides of the software engineering hiring process. The interview process at a lot of companies is terrible and focuses more on how thoroughly you have reviewed your undergraduate algorithms course, despite the fact that all of those algorithms are provided for you in standard libraries! Better questions focus on whether you can find the underlying problem in an unfamiliar setting (I’m not talking about stupid little puzzles about ferrying the members of U2 across a river or anything). For instance, I pose a networking problem to you with language from a different domain – can you apply basic ideas in new ways?

And as to experience, if AI has made experience irrelevant (after all, it has all the experience accumulated on the internet and in GitHub), then creative and smart people will have a leg up in the hiring process.

1 Like

You define “story” to mean whatever you believe, and I’ll define it to mean what I believe. We disagree. The definition of “story” is not unlike a definition of “art” – highly subjective.

Look up “imperative mood” in English. Wikipedia has a nice article on the topic, including this statement:

" An example of a verb used in the imperative mood is the English phrase “Go.”"

Please remove the first seven words from similarly phrased comments in the future. The copyrighted information has to be publicly accessible on the web in order for it to be scraped off the web by the software building the database. When any person reads a work of fiction on the web, that person is actually reading a copy of the work on their own browser. So what, then, is the moral or legal difference between a person reading a copy of the work and an AI scraper reading a copy of the work?

I am confused by this statement. If a copyrighted work exists on the web and a user reads part of it, then stops and returns later to continue reading, is that theft? How does such a case differ from the case of a scraper reading the text?

I agree that, if the AI program retains a direct copy of the copyrighted work internally, then this constitutes a violation of copyright. Would you agree that, if the AI program does NOT retain a direct copy of the copyrighted work, but instead retains an algorithmically modified representation of that copyrighted work, it is not violating copyright?

Here’s a simplified example of what I mean: suppose that the AI compiles and retains a list of how many a’s, b’s, c’s, and so forth appear in the article. Is that a copyright violation? What if the AI compiles and retains a list of how many digrams (pairs of letters) of each type appear? Is that a copyright violation? What about trigrams, and so on? Would such material constitute a copyright violation? Remember, the AI can also retain a link to the source without actually retaining its contents, and use that link for future reference.
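
For concreteness, here is roughly what such a retained summary might look like (a sketch only; the sample sentence is a stand-in, and real systems retain something far more elaborate):

```python
from collections import Counter

# What gets retained in my hypothetical: frequency tables, not the text itself.
text = "It was the best of times, it was the worst of times."  # stand-in sample

letters  = Counter(c for c in text.lower() if c.isalpha())
digrams  = Counter(zip(text, text[1:]))            # pairs of characters
trigrams = Counter(zip(text, text[1:], text[2:]))  # triples of characters

print(letters.most_common(3))   # [('t', 8), ('s', 6), ('e', 5)]
print(digrams.most_common(3))
# The original sentence cannot be read back out of these tables; only
# statistics about it are kept, plus (optionally) a link to the source.
```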

Now, now, let’s all play nice, OK?

This means that neither ChatGPT, DALL-E, Midjourney, nor any of their cousins can legally be accessed in the UK. Is this in fact the case? How is this law enforced? Does the UK have some sort of firewall against all such services? Have there been court cases in which plaintiffs have prevailed against any of these services? Could you cite any such cases?

3 Likes

It is not.

Shannon entropy is by definition the expectation value of the self-information of the variable being considered. By definition. In form it is identical to the formulation of entropy in other fields, e.g. the Gibbs formula from statistical mechanics/thermo. It is a logarithmic measure, and if you choose log2, then it is equivalent to the number of bits required on average to encode the variable.

In Shannon’s own original paper, which I previously linked, you can find his original derivation on page 11:

“Quantities of the form H = −K Σ p_i log p_i (the constant K merely amounts to a choice of a unit of measure) play a central role in information theory as *measures of information, choice and uncertainty*.”

(Emphasis mine.)

In information theory, entropy is not the lack of information. By definition, it is a measure of information. Again, I can’t stress this enough, by definition.

There are a few other kinds of information (and entropy) used in information theory (mutual information and relative entropy, when there’s more than one variable, and Fisher information, which is a different but related measure of the minimum error in any estimation of a given parameter of a distribution, for example). But if you’re talking about “information” in relationship to entropy in information theory, then absent any qualification it’s self-information (sometimes called “information content” in contexts where relative information isn’t involved).

I would offer to work the derivation from first principles if a) I knew how to format equations in posts here, and b) it wasn’t already getting pretty wildly off topic. If other readers are interested I would be happy to contribute to a different thread dedicated to information theory.
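
In lieu of formatted equations, here is a minimal numerical sketch of that definition (the example distributions are invented):

```python
import math

# Shannon entropy: the expected self-information, -log2 p(x), of a variable.
def entropy_bits(probabilities):
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# A fair coin carries 1 bit per outcome; a heavily biased coin carries less,
# because its outcomes are more predictable.
print(entropy_bits([0.5, 0.5]))  # 1.0
print(entropy_bits([0.9, 0.1]))  # ~0.47
print(entropy_bits([1.0]))       # 0.0 -- a certain outcome carries no information
```

With log base 2 this is exactly the “number of bits required on average to encode the variable” mentioned above.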

5 Likes

This exchange has disintegrated into argument as opposed to discussion, so I will make one point and then invite you to have the last word. The quote you provide declares that entropy is a measure of “information, choice, and uncertainty”. The word “measure” does not imply concomitance. “Measure” means only that the indicated value can be deduced from the measured value. For example, the ratio of oxygen isotopes – O18 to O16 – in an old sample of ice is a measure of the overall temperature when the water evaporated. But the ratio of O18 to O16 is INVERSELY related to the temperature: a higher ratio indicates a lower temperature. Thus, the fact that entropy is a measure of information does not in any way imply that entropy is the same thing as information.

This point is made clear by the fact that the statement you quote declares that entropy is a measure of “information, choice, and uncertainty”. Surely you will agree that information and uncertainty are inversely related. If “measure” implies concomitance, then the grouping of ‘information’ and ‘uncertainty’ together would surely be oxymoronic. Therefore, “measure” does not imply concomitance, and the quote you provide does not support your interpretation.

I invite you to have the last word and I offer my best wishes.

2 Likes

The syntactical gymnastics in this thread, on every side, are getting rather silly, frankly. No progress is being made in the discussion, so I’ve closed the topic.

15 Likes