The Divination Markup Project (& how you can contribute)
For *cough* some years now, I’ve been working on a treatise about generative randomness. Though the topic plays out in entertainment and creativity, the first part entails a dive into divination: practices in which tokens are randomized, selected, and read through a specialized grammar for new meaning.
That’s pretty dense, so let’s look at a few examples like (the divinatory and non-white-supremacist version of) runes, the I Ching, and Tarot. There are many ways to practice with these systems, but for purposes of this study, we’ll go with a layperson’s practice.
- In runes, the tokens are the runes themselves. They are shuffled in a bag, drawn, and placed in a spread. The meaning comes from the individual runes, the set of drawn runes, and the runes’ locations in the spread.
- In the I Ching, the tokens are coins or yarrow stalks that form trigrams and hexagrams, whose texts are considered as answers to a question posed by the querent.
- In Tarot, the tokens are cards. They are shuffled in a deck, drawn, and placed in a spread. As with runes, the root meaning comes from the individual cards, the set of drawn cards, and the cards’ locations in the spread.
In such sortilege systems, the interpretation text is critical. The ᚠ (FEHU) rune, for instance, probably means nothing to most people without referencing a book of runes, where they can learn that it represents wealth, abundance, success, security, and fertility.
The genesis of this project was a long time ago, when I noticed that across divination systems separated in provenance by time and place, many of the same meanings appear over and over. In Tarot, wealth is indicated by The Empress and many of the cards in the Pentacles suit. In the I Ching, hexagram 14: Affluence focuses on it. These repetitions raise several interesting questions at the macro level:
- What does each system care most about?
- What does each system care about uniquely?
- What do they care about as an aggregate?
- What are the core concepts shared by the most popular systems?
- What don’t they care about?
For people who like thinking about the human experience and human universals broadly, these are interesting in and of themselves. But they will also be useful as a basis for making new systems, either as raw materials to build from or as an illustration of blind spots—gaps that could be filled by new systems.
Wait, isn’t this about some kind of markup?
Yes. Hold on, I’m getting there.
These are not easy questions to answer. But I listed them above in the order they must be answered, and I do intend to answer them. A first step was to analyze one system, and for that system, I picked Tarot.
Analyzing the Rider-Waite Tarot
I completed this in 2022 and shared it around. It’s meant to be seen as a poster, so the text here is too small to read. If you want to read it, there’s a big image on imgur. (Oh and if you want to have one for your wall, I’ve posted it to Redbubble.)
Cool, right? Well, I think so, anyway.
The graphic lists the concepts that were found in the most cards, in descending order. The lines connect the concept to the individual card, and the sentence from the card text containing the concept appears along the line in teeny-tiny type. This gives readers the opportunity to go from concepts to cards or cards back up to their concepts, and reference the exact words from the text to back it all up.
And I’m eager to do more. As of this writing I’m working on the I Ching: The Book of Changes.
How I made it
The copy on the poster explains a bit of how I went about it, but for purposes of this post, I’ll re-summarize. I selected a text to analyze. I wanted one that was in the public domain and felt authoritative, so Arthur Edward Waite’s The Pictorial Key to the Tarot, first published in 1910, seemed like the best choice. Some version of that text is included with the most popular deck sold today, the Rider–Waite Tarot deck, illustrated by Pamela Colman Smith.
To analyze it, I found a copy of the text online, marked it up with some code that described what the senses of the noun phrases were (more on this below), and then tallied the resulting concepts. I could have just tallied the senses I came across, but in the interest of good science, I wanted the source data to be interrogatable and improvable in the future. To that end, I added markup to the text that allows a tight coupling between source and interpretation.
But that’s just a high-level summary of the process. In order to tell you more about the markup, we have to go over it again and get into some fairly arcane lexical detail.
Concepts are slippery, and to do any kind of tally of the concepts in a text we have to nail those slippery suckers down, and that takes some steps.
First, we want to look at only the descriptions of what the cards mean, as opposed to descriptions of the cards themselves. In fancy academic texts, the description of the token is called a “protasis” and the meaning is called an “apodosis.” For instance, the protasis of Colman Smith’s The Magician would describe the man, his red costume, the bower, his position holding up a wand, the table before him, etc. The apodosis from the Rider-Waite text is the list of things to consider when the card shows up in a spread: “the divine motive in man,” that sort of thing.
Analysis of protases would be interesting, but that’s not part of this project. I marked them up for Rider-Waite, but from here on out let’s just worry about apodoses. Other than those, you’ll only see tags for…
- the metadata about the card, like name, suit, and number
- the phrases describing the apodosis
The markup looks like HTML or XML, with named tags and attributes.
Since I’m going to be comparing apodoses across divinatory systems, I have to be more abstract about the container of the meaning. Sure, in Tarot it’s “card” but elsewhere it might be a hexagram, or a vein on a liver, or a rune; so I chose the abstraction “token” for it.
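To make that concrete, here’s a schematic sketch of the shape of the markup. The tag and attribute names are the real ones; the content is placeholder, and the wn_only attribute is explained in the contribution instructions below.

```xml
<token id="a_unique_token_id">
  ...the source text describing and interpreting the token...
  <apodosis wn_only="lemma.n.01, lemma.n.02">a phrase that carries part of the meaning</apodosis>
  ...more source text...
</token>
```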
Once I had the apodoses distinguished, I still had more work to do to get to core concepts I could tally.
- We need to merge plurals, like women and woman, to the same core concept (see the sketch after this list).
- We want to count close synonyms like wedding and marriage as the same. (I know, one is an event and the other can describe either the event, or the duration of the pairing, or the interpersonal dynamics.)
- We want to be loose about accepting multiple possible interpretations, but reject ones that are very unlikely. For example, when the text mentions an “end,” we shouldn’t include the sense of “tight end,” the position in American football (the 8th noun sense for “end” in WordNet).
- Longer phrases like “the divine motive in man” from the Magician aren’t really about “divinity,” “motive,” and “man” but something more like virtue, selflessness, and communalism. So the markup allows you to be specific, and tag phrases for the concepts they describe.
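The first two bullets are mostly handled by WordNet, which I’ll introduce properly in a moment. Here’s a quick sketch of what that looks like through NLTK’s Python interface (the word choices are just illustrations):

```python
from nltk.corpus import wordnet as wn

# Plurals: WordNet's morphological processor maps inflected forms back to a base lemma,
# including irregular plurals, so "women" and "woman" land on the same entry.
print(wn.morphy("women", wn.NOUN))  # -> 'woman'

# Close synonyms: "wedding" and "marriage" share a synset (the "act of marrying" sense),
# which is one way to count them as the same core concept.
shared = set(wn.synsets("wedding", pos=wn.NOUN)) & set(wn.synsets("marriage", pos=wn.NOUN))
for synset in shared:
    print(synset.name(), "-", synset.definition())
```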
Yes, the process is a little bit lossy, like all of data science, but we’re not doing engineering here. A Fermi-esque answer is fine.
The good news is that I didn’t have to create a “dictionary” of the core senses of words. That’s been done a number of times. The one I’m most familiar with and used for this was Princeton’s WordNet. (I have a linguist friend who for good reasons has urged me to use another, but it didn’t have libraries in Python, so WordNet it is.) This database is freely available, and provides unique descriptors for each sense of the overwhelming majority of English words. I use WordNet in the Python code extensively, but we’ll only need it for part of the markup work.
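If you’re curious what those unique descriptors look like from the Python side, here’s a minimal sketch using NLTK’s WordNet interface. (“Strength” is just an example; it shows up again in the I Ching snippet later in this post.)

```python
from nltk.corpus import wordnet as wn

# Each synset has a human-readable marker (lemma.pos.sense-number) and a gloss (definition).
for synset in wn.synsets("strength", pos=wn.NOUN):
    print(synset.name(), "-", synset.definition())

# Prints markers like 'strength.n.01' alongside their glosses. These markers are the
# values that end up in the wn_only attribute described in the instructions below.
```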
This irritates some practitioners
If you’re a serious practitioner of Tarot, you might be incredulous right now. That’s so old! That’s not what Tarot means now! That’s not how anyone does it now! It’s like trying to understand what Congress cares about by examining the Constitution!
I’ve certainly been waylaid by more than my share of aggressively earnest practitioners who want to correct me, to inform me of modern practice. After all, the poster clearly says, “What does Tarot care about?” at the top.
But I’m working within many constraints. The first is practical: there’s not enough space to say, “What does The Pictorial Key to the Tarot by A. E. Waite (1910) Care About?” and have it be legible. Welcome to the editorial ambiguities of graphic design. The larger problem is that answering this question any other way is a multi-decade study for which one person is simply not enough. Consider all the questions it entails, and what it would take to answer any of them to any modern practitioner’s satisfaction, much less a majority of them.
- Which is the “correct” modern text to analyze? For which practitioners?
- Why is one text authoritative and others less so?
- Should all modern texts be analyzed and the results compared and contrasted?
- How are we to deal with discrepancies across texts, and determine the “correct” set?
- Can I get permission to use those texts?
- Are digital texts available?
- Are translations of digital texts reliable?
- Should we take into account all of the “readings” that people make of these systems, and analyze those, too?
To a great extent, trying to answer these questions becomes a massive exercise in ethnography, several orders of magnitude more work, for little benefit to my primary thesis. It would be interesting, especially comparing an analysis of modern texts to this old Waite one, but this is not that project, and I am not that guy.
But hang on.
That’s where you might come in. Maybe you’re indignant about my choices and want to include a different, preferred text. Maybe you’re excited and want the project to progress faster than the snail’s pace I’m managing. Whatever the reason, if you want to help, below you will find instructions for marking up a text and sending it to me. I’ll run it through the same algorithms I used for the Tarot analysis.
The cool news is that it doesn’t have to be Tarot, either: this same set of instructions should work for any major sortilege interpretation text. If you know one that you’d like to do, check with me to see whether I’m already working on it, to save duplication of effort. If not, awesome: run with it.
The remainder of this article tells you how you can go about marking up a divinatory text using tools I have created for the purpose, and share it back with me for inclusion in the larger analysis, if that’s the sort of thing you want to do.
Imagine a world…
I’m not the only person interested in these concepts. I’ll bet a lot of mind-blowing analysis can be done on these texts once they’re marked up properly. (You might be able to do it with raw text and LLMs, but it introduces a degree of ambiguity that is less convincing for being less certain, IMHO.) Off the top of my head…
- How have core divinatory texts changed over time? What might the “geological eras” of divination be?
- How have they been adapted when passed across cultures?
- What do the unique topics of a given divinatory practice imply about the cultures and times in which they were created? What need did they fulfill?
- How long does it take divination to incorporate new concepts in the world, like evolution and computers?
A collection of marked-up divinatory texts that enabled analyses like this would be an amazing thing to have in the world, so to that end I’ll be licensing the marked-up texts for Creative Commons use by anyone.
How you can contribute
In this section I’ll describe the steps you can take to mark up a text.
First, find a divinatory text (and even more specifically, a sortilege one). I’m pretty much monolingual, so the text should be in English to allow me to validate it. Don’t worry about repeating work. If you do one that’s already been done, I’ll do a diff against the other one and use it to make improvements. If you really want to only work on a system that hasn’t been done before, reach out to me on social media and we can discuss your interest.
Whatever text you choose, make sure you have permission to work with it. I am not a lawyer, and I think this is squarely a fair-use case, but if the text is not in the public domain, cover your bases and get the author’s permission in writing, both to be safe and as a courtesy.
Mark up the tokens
For this it’s best to have a text editor that is good at editing markup like HTML or XML, because all of the markup uses tags. A key feature of the tool you select is that it needs to color code the elements of the XML differently, so it’s easy when looking at a block of XML to distinguish the tags from the attributes from the main text. Also, if it helps you keep your tags valid, that will save you a world of hurt. BBEdit is my tool of choice for this, but work with what fits you, your budget, and your preferences.
Open the text in the editor, and surround each token in <token> tags. The token is the thing that carries the protases and apodoses, like a card in Tarot, or a hexagram in the I Ching, or rune in runes, etc. Add the token’s unique name as an attribute named “id” in the token. The example below shows that the ID I’ve assigned to The Magician is, as you might imagine, “the_magician.” The exact way you choose the ID is not important, as long as it’s unique compared to the others in the same system, spelled identically every time you use it, and intuitive for a person to read. Note that longer texts may discuss the token in several places. Each of these should have the token tags and share the unique token ID.
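Schematically, with the actual card text elided, it looks like this:

```xml
<token id="the_magician">
  ...the text discussing The Magician...
</token>

<!-- a later passage about the same card gets its own token tags, reusing the same id -->
<token id="the_magician">
  ...more text about The Magician...
</token>
```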
Decide: Mark apodoses manually or with programmatic help?
Once you have manually marked up the tokens in the text, you need to make a decision whether to mark up the apodoses manually or using a Python tool I wrote.
I did the Tarot text manually. It was a royal pain, and took far more time than it should have (on the order of months). If you want to save yourself that work and are comfortable with Python, skip ahead to “Mark it up — using Python.” If you don’t want to wrangle Python, you can use a text editor; that process is described in the next section.
Marking up apodoses manually
Next, within the tokens, surround the apodoses in <apodosis> tags. (If you want to, you can also tag the protases, but I’m not processing these yet.) For efficiency, you’ll be tempted to surround as much text as possible in your tags, but it’s better to keep your tags around phrases that describe a single concept. For example, in the I Ching you could surround the entire sentence “The strength of the Creative and the mildness of the Receptive unite” with tags indicating strength and gentleness, but it’s better to zero in on the individual concepts, as in the example below.
The <apodosis wn_only="strength.n.01">strength</apodosis> of the Creative and the <apodosis wn_only="gentleness.n.02">mildness</apodosis> of the Receptive unite.
Some of these single concepts will be multi-word phrases, such as “the divine motive in man” for The Magician, but you’ll still be finding the individual concepts that match.
Pro-tip: after you mark up, say, three tokens, send the file to me. I’ll run it through the Python to make sure it’s working correctly. If something is amiss, let’s catch it early so you don’t sink a lot of time into work you’ll have to redo.
Next you’ll need to identify WordNet senses and include them as a quoted, comma-separated list in an attribute named “wn_only”. See the I Ching example above.
When I did the Tarot I was pretty loose with the spans I would select, and had a handful of differently-nuanced tags. If I felt like all of the senses for a word might work, I’d just leave the apodosis tag without any attributes. For future work I’m going to be more directive, and include WordNet sense IDs for each and every apodosis tag. So I’m asking you to do the same.
It’s not enough to just tag the words and phrases that are apodoses. For one thing, words can have senses that don’t make sense as a divinatory interpretation, and the florid phrases in most divinatory texts don’t have direct counterparts in WordNet. So we have to be more specific and tag individual WordNet sense markers. These are unique “names” for a synset that are more human-readable than the numerical identifiers that are also available. They look something like ability.n.2: the lemma, the part of speech, and the sense number. How do you figure this out if you’re not using Python? It’s a few steps.
1. Head to Princeton’s WordNet lookup website. It’s over 10 years old, and slow, and not the most well-designed thing, but you can look up words you’re thinking about there. In the display options, be sure to turn on “sense numbers”. “Gloss” (a lexicographer’s term for definitions) is turned on by default, but if it’s not, be sure and show it. Then look up the word you’re thinking about for your apodosis. Your results should look something like this.
2. Use the S: tools to the left of words to browse up and down the hypernym/hyponym tree to really zero in on the senses that match your apodosis. Keep in mind that there will almost always be several possible interpretations, and we want to capture the right ones. Also keep in mind we’re only interested in nouns, so don’t sweat the verbs and adjectives except as they are entailed in the noun. Finding the right WordNet sense is its own craft. I find it rewarding and exhausting.
3. Once you’ve found a sense that fits your apodosis, note the sense number to the right of the part-of-speech marker. The word there is the first part of the sense marker. The number after the # character is its index.
4. Construct the full sense marker with the word, a period, the letter n for noun, another period, and the index. So, for example, if you wanted to tag an apodosis for the sense of “power” that means “possession of the qualities (especially mental qualities) required to do something or get something done” you would use “ability” from “ability#2”, then “n”, then the “2” from “ability#2”. The final sense marker would be “ability.n.2”. That’s the sense marker for that particular sense of “power”.
5. Add that sense marker to the wn_only attribute of the <apodosis> tag as a comma-delimited list, like this: <apodosis wn_only="power.n.1, power.n.2, ability.n.2">
6. Repeat for each sense that you think could be associated with the apodosis. Be sure and close the apodosis tag! You can always check the validity of your markup by pasting the entire document into a site like this one, and correct any errors you may have made. I recommend doing this after every token. Doing it for an entire document is a nightmare—ask me how I know.
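If you’d rather check well-formedness locally than paste the document into a website, a few lines of Python will do it. This sketch assumes you’ve wrapped the whole file in a single root element (which XML parsers require); note also that bare ampersands in the source text will need to be escaped as &amp;.

```python
# Minimal well-formedness check for a marked-up file.
# Assumes the whole document is wrapped in a single root element, e.g. <document>...</document>.
import xml.etree.ElementTree as ET

try:
    ET.parse("marked_up_text.xml")  # hypothetical filename; use your own
    print("Markup is well-formed.")
except ET.ParseError as err:
    print("Markup problem:", err)   # the message includes the line and column of the first error
```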
Greedy or lazy?
When deciding which senses to include and which senses to leave out, I hope that you err on the side of being lazy. You may be familiar with these terms from regular expression programming, and I’m using them similarly here. Greedy means “include as much as you can” and lazy means “include just as much as is needed.”
To explain, it’s best to use an example: “end,” from the Death card in Tarot. Here are its senses in WordNet:
- end.n.01: either extremity of something that has length
- end.n.02: the point in time at which something ends
- end.n.03: the concluding part of any performance
- death.n.01: the event of dying or departure from life
- end.n.05: a boundary marking the extremities of something
- end.n.06: the surface at either extremity of a three-dimensional object
- end.n.07: one of two places from which people are communicating to each other
- end.n.08: (American football) a position on the line of scrimmage
- end.n.09: a final part or section
- end.n.10: a final state
- goal.n.01: the state of affairs that a plan is intended to achieve and that (when achieved) terminates behavior intended to achieve it
- ending.n.02: the last section of a communication
- end.n.13: a final point or proposition that is settled
- end.n.14: the part you are expected to play
- end.v.01: bring to an end or halt
A greedy approach would include all of these senses. And while divinatory practice might consider the vast majority of these (after all, part of its purpose is to encourage the querent to consider new possibilities), it stretches credibility to believe that the card would ever be referring to end.n.08, a tight end from American football. So the sense list for this apodosis tag would not include that one. That’s what I mean by lazy. You’ll need to do a similar review and culling of senses for each apodosis to get to a set that fits the likely possibilities while omitting the unlikely ones. That’s a judgment call you will have to make.
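Concretely, a lazy tagging for this “end” might look something like the line below. Exactly which senses make the cut is your judgment call; the point is that end.n.08 doesn’t.

<apodosis wn_only="end.n.02, death.n.01, end.n.09, end.n.10">end</apodosis>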
Go through the entire document tagging the apodoses for WordNet senses and if all the tags are properly formatted, you’re done and can send it to me via this Google Form.
There’s still a little more to talk about like, “Hey, what do I get for all this work?” and “What are the usage rights?” which will be at the end of this document. But for now, I need to talk to the folks who want to use the Python tools I wrote that make this manual process a lot easier.
Mark it up — using Python
Because the manual markup was such an enormous hassle, in early 2024 I wrote some imperfect Python that helps automate much of it, and even optionally uses chatGPT for suggestions to help you along.
Please keep in mind these many caveats:
- I am not a professional programmer. The code is messy and inefficient, even though I commented a lot.
- These instructions will not walk you through every step.
- I went as bare-bones as I could (it’s pretty much command line) to get it done rather than giving it a GUI.
- It may break on you. I can’t help you debug problems with your environment, and I won’t teach you Python.
- There is no undo implemented.
- I’ll almost certainly improve the code over time without thinking to cross-check these instructions and update as necessary. (I’ll try to do so for major changes.)
- I doubt I’ll get any takers for this, so I haven’t done work to make it widely accessible. (I can work on this if needed.)
If you’re fairly competent in Python, none of this should be a problem.
So, if you’re undaunted and want to proceed, head to the github project page and download all the files there. It will work without a chatGPT account, but if you have one, add your secret key to the file openai_requester.py.
The third-party libraries you’ll need include nltk, which provides the Python interface to WordNet (plus whatever openai_requester.py imports, if you opt into the chatGPT features).
From the Python default-installation libraries you’ll also need: sys, re, json, os, ast, datetime, difflib, and string.
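One setup note: NLTK doesn’t ship the WordNet data itself, so after installing the packages you’ll need to download the corpus once:

```python
# One-time setup: fetch the WordNet corpus that nltk's wordnet interface reads from.
import nltk
nltk.download("wordnet")
```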
The main file to run is divmarkup_MAIN.py.
When you run this file, it will ask you to load your source text document. To support working in sprints, it will parse the file you’ve selected to find the last line with an <apodosis> tag and offer to begin analysis at the next line.
It will skip over blank lines and any lines that already have apodosis tags. Note that this means if a line needs multiple apodosis tags, you will need to add the extras manually.
Thereafter, it will take you through each sentence in the file, going through cycles of identifying apodoses and selecting senses for each one.
Identifying apodoses
Once the program shows you a sentence, you can either enter the apodosis by typing it in, or (much more time-saving) use Pythonic slice notation to select the apodosis within the sentence. So for the sentence “Symbolically this connotes holding together and the laws that regulate it,” typing “hold:regulate it” will select everything but “Symbolically this”. It will prompt you to disambiguate if a slice command could mean multiple things.
If you have added an openAI secret key, you can request the program pass the sentence via API to get the GPT best guess on what the apodosis is. It’s right maybe 50% of the time for me.
Update: I originally had the code check chatGPT by default, but for efficacy and environmental reasons, I have made all chatGPT functions opt-in.
Once you have a selection, it will display the sentence with the apodosis in all caps so you can modify or confirm. At this prompt you can also manually save the file, but know that it automatically writes an autosave file so no work is lost if there’s a program error.
If you hit ENTER, it tells the program that the apodosis in caps is correct and you’re ready to select senses. Here’s an example of a simple apodosis interaction.
Selecting senses
The program will display the sentence and prompt you to name some words whose senses might match. It will provide a reminder of the controls, but an overview follows.
To do this, it expects a command character followed by control words. A description of each is below. The program only understands one command character per line.
You can look up the senses for words (without adding them to the set) via the command ‘?’ or have it list synonyms for a word with the command character ‘s’. These can help you explore what words you might want to add. Neither of these will refresh the synonym list for you, but all the others will.
You can add words with ‘&’ and remove words, along with all their senses, with ‘x’. You can provide a single word or a comma-separated list as the control words. Note that by default, any new word you add will have its first sense selected and the others deselected. This is meant to help you save time as WordNet puts the most common sense first.
For each word you add, it will enumerate a list of its senses beneath it, with their sense markers. An emoji to the right of each number tells you whether that sense is selected ✅ or unselected ❌; the emoji is followed by the sense marker and the gloss from WordNet.
You can tell it to select senses using the command character ‘+’ followed by the number of the sense in the currently displayed list. It can handle a comma-separated list of numbers or slice notation for ranges. Deselect using ‘-’. Note that the numbering applies only to the current listing: as you add and delete words, the number associated with a given sense will change. The sense markers are a hassle to type, so they are not used as control words. With “+” and “-” you can also use “all” to mean every sense displayed. You can show or hide deselected senses with the “/” command.
Note that the many-to-many relationship of words to senses means that some senses appear under multiple words. Selecting one of these repeated senses will affect all the others with the same sense marker.
If you have added an openAI secret key, you can request that the program pass the apodosis via API to get the chatGPT best guess on what the senses might be.
Update: I originally had the code check chatGPT by default, but for efficacy and environmental reasons, I have made all chatGPT functions opt-in.
Once you like the selected senses, you can press [ENTER] and the program will write out the autosave and move on to the next sentence to identify the next apodosis. The autosave will be marked with the date and time.
For deciding which senses to include and not, please see the “Greedy or lazy?” section above.
If you’re marking up a big text like the I Ching or Tarot, it will take you many sittings. At the moment, the program just keeps going until the file is done. As mentioned above the program auto-saves after every sense selection commitment, but you can also force it to write out a file with the “w” command. To end before you have reached the end of the file, you have to manually quit the program.
Go through the entire document tagging the apodoses for WordNet senses, and once you’re done, send it to me via this Google Form.
Rights and responsibilities to the data
My intent is to use submissions individually and aggregated with other files as part of analysis, writings, visuals, and presentations. I may represent the results in text or data files or visuals. I eventually intend to write a book, and the results of this material will be included in a chapter. I plan to share marked-up files in online repositories for others to examine and build on.
If you have opted to grant a CC BY 4.0 license, I will credit you in documents and on slides in presentations where the data or derivative works made with the data (like visualizations) are included. To do that I’ll use the name you provide in the form.
If you have opted to grant a CC0 1.0 Universal license, I understand that to mean you wish to remain anonymous and will not credit you.
Both of these licenses permit me to modify submissions for things like correcting errors or adding metadata, like provenance and license, and to merge it with other data for aggregate studies, etc.
Thank you
Finally, I know it’s a lot of work. (I’m a little relieved I can share all the effort that has gone on behind the scenes!) Should you choose to pitch in, you can help get these giant questions answered, and ensure that the content you think is important makes it into the survey.