I learned about the Open Syllabus Project in The New York Times in a piece by the project’s founders, Joe Karaganis and David McClure. It’s fascinating. The authors set out to scrape college Web sites and have put together the metadata for over 1 million syllabuses. You can find the open service they have put online here. The project identifies all of the materials assigned to undergraduates and ranks them by frequency. It’s still in beta. More information is forthcoming, but like many projects of this kind, it is being put out into the broader community for comment and further exploration. I am eager to see where this will lead. A new dataset can lead to new insights, many of which were not anticipated by the people who initiated the project.

The authors make a point about altmetrics and how this dataset and analysis could lead to a better understanding of the impact of publications. The “standard” metric, Journal Impact Factor (JIF), is no stranger to readers of the Kitchen, of course, and its many limitations have been rehearsed endlessly (including the differences between STM and HSS fields and between journals and books). When a scholar cites another, that tells us something; but it also says something about a work when it becomes adopted in course after course around the country; and it says more and different things when it is used in classrooms in different courses and even different fields. Play with the service a bit and you will be intrigued. Daniel Bell’s The Cultural Contradictions of Capitalism appears on 69 syllabuses; Barbara Kingsolver’s The Poisonwood Bible appears on 265. There is some noise in the data: Kingsolver’s book has a second listing, which would add another 12 instances of classroom usage. That will get scrubbed in time. It’s probably fair to say that the data is highly suggestive but not definitive. Metrics mavens will want more.

For my part, I continue to be puzzled as to why JIF is such a source of controversy. Of course there are different metrics, and some of the new metrics are very good indeed at identifying some impacts. The question is what you are measuring for and to whom. The value of JIF lies primarily in the fact that it is an administrative shorthand used by promotion committees. It has value in itself, but its outsized value lies outside the journals themselves and their publishers (who of course are blamed for everything). There are many currencies in the world, but what do you do when you pull into a gas station and they only take American? Different metrics measure different things for different audiences. A new thing, a new metric, can have value without negating the things that came before it. I am reminded of TED founder Richard Saul Wurman’s remark that we live in “the age of also.” Thus we can have JIF and also a count of Twitter followers, Facebook “likes,” and a number derived from how often a text is used in a classroom. And I would add another metric, now maligned by one and all: How many copies did it sell?

I suspect, though, that the greatest value of The Open Syllabus Project is not in its assertion of a new metric, and certainly not because it is open, but because it gathers a great deal of information together in one place, where it can be analyzed. Book publishers who don’t have access to this data otherwise (the big college textbook publishers all have proprietary databases of course adoptions) will find some of the information to be eye-opening. For example, I would bet that Kingsolver’s publisher has no idea how many courses assign her books; and that same publisher probably knows the identity of only a handful of the instructors who require it. Extending this line of thinking, the publisher of a book that is similar to Kingsolver’s in some ways may want to find out who is using Kingsolver in the classroom. The syllabus database, in other words, can measure a form of impact, but it can also serve as a marketing database.

Other uses will be made of this data. Let’s unleash the data scientists, and while we are at it, let’s aim to enrich the data and to make it fully compliant with the requirements of text and data mining. Kudos to Karaganis and McClure for this. Now let’s see what happens next.


Joseph Esposito

Joe Esposito is a management consultant for the publishing and digital services industries. Joe focuses on organizational strategy and new business development. He is active in both the for-profit and not-for-profit areas.


13 Thoughts on "The Open Syllabus Project, Altmetrics, and a New Dataset"

Good discussion. The point about a book’s value as reflected in how many courses adopt it (or, as is so often the case, make it “required reading”) is excellent. The answer to “how many copies did it sell?” is more often a publisher’s guarded proprietary information. However, some tools now exist that assist competitive surveillance for at least a portion of college adoptions.

What I find so interesting is that the textbook list has not changed that much since 1989!

“I would bet that Kingsolver’s publisher has no idea how many courses assign her books; and that same publisher probably knows the identity of only a handful of the instructors who require it….the publisher of a book that is similar to Kingsolver’s in some ways may want to find out who is using Kingsolver in the classroom….syllabus database can also serve as a marketing database.” Actually, any editor working at any publisher who had any interest whatsoever in where his or her books got adopted could do this very easily pre-Open Syllabus. Just do a Google search with the book’s title in quotes, plus the word “syllabus.” I’ve been doing this for years.

One possible limitation for metric and market research alike: isn’t there a tendency among academics to list everything relevant to a topic under further reading in order to offer depth to the eager student? Only the “essential reading” is a real mark of recommendation that something is important (as opposed to “correct”). I’m not sure that distinction is reflected currently.

Fascinating project though, thanks for sharing.

You use the most amusing Internet handle I can think of. I jumped when I saw your comment in my In box.

This is fascinating, especially in the context of a mythical literary canon. When we analyzed required reading in Spanish, through the contents of reading lists from all PhD-granting institutions in the United States, the findings showed very little agreement. Only one work and two authors were on 100% of the lists (not counting Anonymous), and conversely over 1,000 items appeared on one list only, out of close to 50 lists. In my book Confronting our Canons I argue that we, the experts, need to work out some (continually evolving) agreement about what should be required reading in the field, for two reasons: (1) to strengthen and perpetuate our discipline and (2) to prevent external entities from imposing a literary canon on us.

Yes, we need to know whether the books listed in these syllabi are under “required reading” or “suggested reading.” The distinction is very important for sales.

There are two other possible applications of these data. (1) Can they reveal how difficult it is to predict which books, published as monographs, get adopted? The answer to this question is very important to university presses as they make decisions about which books on their list to include in ebook aggregations licensed to libraries. If presses guess wrong too often, they risk sacrificing substantial income from lost paperback sales. (2) The ARL has proposed the idea, in its “Code of Best Practices” for fair use, that a book (whether a scholarly monograph or a trade book like Kingsolver’s) can be digitized and used under “transformative use” because the student classroom audience is different from the audience for which the work was originally written and therefore the work has been “re-purposed.” If this argument succeeds under court review, then very substantial sales could be lost, including to the publisher of Kingsolver!

I’m a bit skeptical about this.

We publish an introduction to linguistics book that has been widely adopted for years. The title is Language Files, and the print run for the 11th edition, released in 2011, was 45,000 copies; we just sold the last copy in that run. No, that’s not a typo. In the first year of sales of that edition the book sold over 10,000 copies. It’s a ridiculously successful book for us. When you try the hack Philip mentioned above (“do a Google search with the book’s title in quotes, plus the word ‘syllabus’”), you come up with over 4,000 results. Searching for this title using the Open Syllabus Explorer results in zero matches. I even scrolled through the entire list of linguistics books just to be sure I hadn’t missed it, and it simply wasn’t there. If it can miss a book that widely adopted, what else might be missing?

Thanks for sharing this, Tony. Interesting that my method turns up so many hits while Open Syllabus turns up nothing. If someone wants to criticize my method for turning up too many false positives, it can be further refined by including the author’s last name, the name of the publisher, or by limiting the search to only .edu domains or a specific range of years. I’d be curious to know the quality of the search results for your book using those further refinements.
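
For anyone who wants to try those refinements, here is a minimal sketch in Python of how such a query string might be assembled. The function name and its parameters are hypothetical, made up for illustration, and the `site:.edu` restriction assumes a search engine that supports the `site:` operator, as Google does. The year-range refinement is left out because it depends on the search engine.

```python
def build_syllabus_query(title, author_last_name=None, publisher=None, edu_only=False):
    """Build a search query for finding syllabi that list a given book.

    Hypothetical helper: it only assembles the query text and does not
    contact any search engine.
    """
    # Exact title in quotes, plus the word "syllabus" (the basic method).
    parts = [f'"{title}"', "syllabus"]
    # Optional refinements to cut down on false positives.
    if author_last_name:
        parts.append(author_last_name)
    if publisher:
        parts.append(f'"{publisher}"')
    if edu_only:
        # Assumes the search engine supports the site: operator.
        parts.append("site:.edu")
    return " ".join(parts)


# Basic query, then the same query limited to .edu domains.
print(build_syllabus_query("Language Files"))
print(build_syllabus_query("Language Files", edu_only=True))
```

The resulting string can be pasted into a search box as-is; how much each refinement reduces false positives would need to be checked by hand.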

As the chair of our Faculty Advisory Committee for our University Bookstore (still run independently – not outsourced), I can’t help but wonder if there aren’t other and better sources of textbook adoption than mining syllabi if that is the goal. The Higher Education Opportunity Act (2008 I think) required that institutions receiving certain federal aid (essentially all institutions) provide textbook information as part of the class schedule when students register starting July 1, 2010. Seems it would be easier to sort out the required texts from a schedule listing than from the text on syllabi? Especially since faculty may or may not be required to even have a syllabus much less list the textbooks on it? I of course have never tried to get at this textbook listing data in any systematic way so I’m sure there are challenges. But, definitely seems the better source.

I think there are two points to be made here. First, to get at a good record of textbook adoption and use, there are several methods, and there is no reason to think we should not use them all. Second, the syllabus database may have value beyond simply tracking textbook adoption. It is a resource whose utility is still under investigation.
