July 26, 2016

When it comes to making a bestseller, algorithms are good, but money is better


Neil Balthaser, whose start-up Intellogo we’ve covered on MobyLives before, has heard about The Bestseller Code, the recent book by book-tech duo Jodie Archer and Matthew Jockers that takes as its subject the formula behind bestselling books. (We’ve covered this, too.) And Balthaser, himself a book-tech gimme-data guru, agrees: machine learning can help predict a bestseller. This, despite the fact that Archer and Jockers’s “bestseller-ometer” awarded The Circle by Dave Eggers, not a bestseller, its highest bestseller-likelihood rating to date. Which, okay.

In any event, Balthaser took to Digital Book World last week to defend his compatriots’ cause in a piece that mostly reiterates what we’ve heard before (namely, that publishers are out of touch, that they don’t know what readers want, and that this is a problem). He began with a point of clarification in reply to another response to The Bestseller Code. In an essay from last month entitled “Full text examination by computer is very unlikely to predict bestsellers,” Mike Shatzkin of The Idea Logical Company bluntly doubts that any single text analysis method, no matter how sophisticated, can reliably predict bestsellers. Balthaser:

While I agree in theory with Shatzkin that an algorithm alone cannot predict whether a book will be a bestseller or not, that isn’t precisely what The Bestseller Code claims, nor what our experience working with machine learning at Intellogo defines. What we aim to do is identify similar tones, moods, topics and writing styles to those books that are topping bestseller lists—as we can only do through algorithms—and, in this way, better understand the reading audiences’ desires. Machine learning allows us to do just that.

Fine. But let’s first back up to Shatzkin. Shatzkin is right to question the “97% certainty” of predicting a bestseller advertised by the publisher of The Bestseller Code, largely because of the sheer number of hurdles that face books in the marketplace unrelated to the factors the algorithm considers: plot, setting, characters, punctuation, etc. Instead, Shatzkin identifies the real bestseller-making forces that a text analysis algorithm simply cannot take into account:

[I]t is much more predictive of bestsellers to look at the number of copies shipped and how many stores the book goes into. The Nora Roberts or James Patterson title that ships tens of thousands of copies with some going to every Barnes & Noble store will become a bestseller, regardless of the plot structure. And the greatest book in the world that ships 5,000 copies and only goes to a handful of B&N’s almost certainly won’t.

It isn’t just “books in stores.” Amazon orders printed books too. And there are ebook pre-orders (although damn few for any unknown author). From a publisher’s perspective, the book for which they can get an advance commitment from the supply chain (which today means “get it out in quantity”) will always have a better chance than the greatest book in the world for which they can’t.

Shatzkin is right on this. A title’s high score doesn’t guarantee its future on any list, especially if it’s self-published (a market, it’s worth mentioning, that nearly all of these book-tech discoveries aim to make hay of) or published by a small house. Why? Because publishing is a slow-moving ship. There are big gears below deck that need massaging. Few parties have the keys to the nether chambers of the hull, and only a few more can talk their way down. For example, without sales reps in the field across the country (something many indie presses and all self-published authors lack), it’s very unlikely that your Triple A-rated book will appear in even a single Barnes & Noble (though you can try). And even then, just being there is not enough. Should you make it into a Barnes & Noble, it’ll cost your publisher a great deal more to secure your book the visibility of the large display tables (at Barnes & Noble this is known as The Octagon), and that’s only if you’re invited. Then, and only then, do you have a real, predictable chance of becoming a bestseller.

Balthaser’s somewhat utopian response is unsurprising (this is tech, after all): “In this digital future, using machine learning platforms can provide publishers with opportunities to get real-time information about their readers, figure out what is working in the marketplace, and, perhaps, make the bestseller lists more of an accurate depiction of what readers want to read, not simply what is available.” (Emphasis mine.)

And he’s right: it’s a bummer that bestseller lists don’t necessarily reflect consumer demands. He’s also right to imply that it’s big publishing’s fault. But his conclusion here is unsatisfactory: “Though ‘big data’ can be a taboo subject when we talk about the romance of publishing, there are undeniable benefits to be had from using platforms that give publishers and authors information from which they can make informed decisions on how to invest their time and money.” These are benefits that most likely assist those who don’t actually need them (big publishers with a track record of publishing bestsellers), and not those whom the book-tech industry most frequently targets: self-publishers. Which is to say that while there is some merit to Balthaser’s assessment that algorithms can in fact assist the market in achieving its ultimate goal of being a mirror for consumer demands, they still do little good for small presses (who are usually small because they publish not what we want but what we need) and those with the fewest resources of all: the self-published.



Chad Felix is the Director of Library and Academic Marketing at Melville House, and a former bookseller.