Interestingness

Roughly speaking, the best version of an “interesting” lecture or book is one that changes how we think, answers a question we’ve wondered about, or simply expresses something novel. An article we find “interesting” because it agrees with us might look like an exception, but for this essay I would argue that even then the content is interesting because it expresses the idea in a novel way, more clearly than we could ourselves, which fits the criteria above.

In contrast, uninteresting materials may cover content we’ve seen before, use language we don’t understand, or seem irrelevant to our lives at the moment.

Whether or not we consider a book, article, or lecture to be interesting is clearly driven by context: the language we speak, our careers, personal history, hopes for the future, and so on.

By design, most academic materials are curated to have possible future value to the student, even though the student may disagree that the material is useful, as often happens when young people are first introduced to math.

If we were to design a search engine to find books, lectures, or articles that we would find “interesting,” we would have a few choices to make. Clearly we need variety, or we’ll see too much derivative material. There are likely fantastic materials out there that we could look for, if only we knew what words to use.

Amazon addresses this problem by feeding on other people’s knowledge: if you buy a book, you might proceed to the next book in the series, because a lot of other people have already done so. If Amazon maintained metadata about books published by academics, it would find other clues, like whether the author presents at conferences, whether someone has written a book about or citing them, and so on.
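To make the “other people did this already” idea concrete, here is a minimal sketch of counting which books are bought together. It is not Amazon’s actual method, which isn’t described here; the function name `also_bought` and the placeholder titles are assumptions made up for illustration.

```python
from collections import Counter
from itertools import permutations


def also_bought(histories):
    """Map each book to a Counter of the books bought alongside it.

    `histories` is a list of purchase histories, one list of titles per
    customer. This is a hypothetical input format for illustration only.
    """
    co_counts = {}
    for history in histories:
        # set() so a title bought twice by one customer counts once
        for book, other in permutations(set(history), 2):
            co_counts.setdefault(book, Counter())[other] += 1
    return co_counts


# Placeholder purchase histories, not real data
histories = [
    ["Series Vol. 1", "Series Vol. 2"],
    ["Series Vol. 1", "Series Vol. 2", "Unrelated Title"],
    ["Series Vol. 1", "Unrelated Title"],
    ["Series Vol. 1", "Series Vol. 2"],
]

recommendations = also_bought(histories)
top = recommendations["Series Vol. 1"].most_common(1)
print(top)
```

The weakness mentioned in this essay shows up right away: a count like this only surfaces what many people already found, so it favors sequels and bestsellers over material you didn’t know to look for.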

Amazon is also unable to judge the relationship an author has to their material. An author who is deaf or hard of hearing has the ability to engage with the subject of deaf history with a sense of ownership that someone else might not.

One way to solve these problems is clearly to bring expert knowledge to each topic, curating the list of materials down to something appropriate. I would consider this to be a responsibility of university professors. Fortunately their selections are becoming more visible, as many now have class websites that list required textbooks.

A second approach is that of library science: working from a taxonomy of topics, curate a list of materials, using measurable features to bring in or remove materials (e.g. patron requests and circulation).

With this orientation in mind, I think there are a few interesting offerings from the discipline of computer science which may offer assistance. Much research into “artificial intelligence” involves collecting a list of items, and finding those which are most related (or unrelated).

Google pioneered the approach of using citations to pass credibility from one webpage to another. This could be applied to books with footnotes. It could also apply to a network of conferences, universities, speakers, and co-authorships, with each relationship passing credibility from one to the next. If you had access to such a network, you could find people who’ve had productive careers, and work out who they are grooming to replace them by looking at who they currently work with.
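As a rough illustration of passing credibility along citations, here is a minimal PageRank-style sketch in plain Python. The function name `credibility_scores`, the damping value of 0.85, the iteration count, and the tiny citation graph are all assumptions chosen for the example, not a description of how any real search engine is configured.

```python
def credibility_scores(links, damping=0.85, iterations=50):
    """Iteratively pass credibility along citation links.

    `links` maps each item to the list of items it cites. Items that
    cite nothing spread their score evenly over every item.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every item keeps a small baseline score
        new_score = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * score[node] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                share = damping * score[node] / n
                for other in nodes:
                    new_score[other] += share
        score = new_score
    return score


# Hypothetical footnote graph: books A and B both cite C, and C cites D
citations = {
    "book_a": ["book_c"],
    "book_b": ["book_c"],
    "book_c": ["book_d"],
    "book_d": [],
}

scores = credibility_scores(citations)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

The same function would work on a graph of speakers and conferences or of co-authors; only the meaning of a link changes.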

Natural language processing provides interesting insights as well: techniques now exist to determine the sentiment of a piece of text and associate emotions with it, which would allow you to filter out things that aren’t appropriate to your goals, like depressing talks from a company lunch-and-learn program.

Full-text search software is tuned to generate a distance score between two pieces of text, which could be used to compare an author’s biography to their book description. Alternatively, entity recognition can be used to find people, places, and concepts referenced by text, which would allow you to determine where an author is from, and let this influence your selection of materials.
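A simple way to see what a score between two pieces of text looks like is cosine similarity over word counts. This sketch is a stand-in: real full-text search engines use more refined scoring, and the helper names `word_counts` and `cosine_similarity` and the sample texts are assumptions made up for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Return a score from 0.0 (no shared words) to 1.0 (same word mix)."""
    a, b = word_counts(text_a), word_counts(text_b)
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder texts, not real author data
biography = "She has taught deaf history and sign language for twenty years."
description = "A survey of deaf history and the growth of sign language."

similarity = cosine_similarity(biography, description)
print(round(similarity, 3))
```

A distance can be taken as one minus this score. A high score between a biography and a book description hints that the author is writing from their own background, though shared common words like “and” inflate it, so a real system would weight rare words more heavily.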

As software engineers have learned to handle ever larger volumes of data, efficient methods of sampling data have been developed. A high quality database of books or lectures should easily contain more than one person could consume in their entire life.

Consequently, sampling such a database for a good variety of materials is highly desirable. Ideally this would also give you authors from different backgrounds (nationality, academic/industry background, etc.) to get the best variety of perspectives.
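One way to sample for variety is to group materials by some attribute of the author and draw from each group in turn, a form of stratified sampling. The sketch below assumes a hypothetical catalog where each entry carries a `background` field; the function name `sample_for_variety` and the sample entries are made up for illustration.

```python
import random
from collections import defaultdict


def sample_for_variety(items, key, k, seed=None):
    """Pick up to k items, cycling through groups so each is represented."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    for members in groups.values():
        rng.shuffle(members)

    picked = []
    while len(picked) < k and groups:
        # Take one item from each remaining group per pass
        for label in list(groups):
            if len(picked) == k:
                break
            picked.append(groups[label].pop())
            if not groups[label]:
                del groups[label]
    return picked


# Placeholder catalog, not real data
catalog = [
    {"title": "Lecture 1", "background": "academic"},
    {"title": "Lecture 2", "background": "academic"},
    {"title": "Lecture 3", "background": "academic"},
    {"title": "Lecture 4", "background": "industry"},
    {"title": "Lecture 5", "background": "industry"},
    {"title": "Lecture 6", "background": "independent"},
]

selection = sample_for_variety(catalog, key=lambda entry: entry["background"], k=3, seed=1)
print([entry["title"] for entry in selection])
```

A plain random sample of three from this catalog would often return only academic lectures, because they make up half the list. Cycling through the groups trades that proportionality for breadth.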

Interested? Stay tuned…