Thursday, August 31, 2017

AgilOne Adds New Flexibility to An Already-Powerful Customer Data Platform

It’s more than four years since my original review of AgilOne, a pioneering Customer Data Platform. As you might imagine, the system has evolved quite a bit since then. In fact, the core data management portions have been entirely rebuilt, replacing the original fixed data model with a fully configurable model that lets the system easily adapt to each customer.

The new version uses a bouquet of colorfully-named big data technologies (Kafka, Parquet, Impala, Spark, Elastic Search, etc.) to support streaming inputs, machine learning, real time queries, ad hoc analytics, SQL access, and other things that don’t come naturally to Hadoop. It also runs on distributed processors that allow fast scaling to meet peak demands. That’s especially important to AgilOne since most of its clients are retailers whose business can spike sharply on days like Black Friday.

In other ways, though, AgilOne is still similar to the system I reviewed in 2013. It still provides sophisticated data quality, postal processing, and name/address matching, which are often missing in CDPs designed primarily for online data. It still has more than 300 predefined attributes for specialized analytics and processing, although the system can function without them. It still includes predictive models and provides a powerful query builder to create audience segments. Campaigns are still designed to deliver one message, such as an email, although users could define campaigns with related audiences to deliver a sequence of messages. There’s still a “Customer360” screen to display detailed information about individual customers, including full interaction history.

But there’s plenty new as well. There are more connectors to data sources, a new interface to let users add custom fields and calculations for themselves, and workflow diagrams to manage data processing flows. Personalization has been enhanced and the system exposes message-related data elements including product recommendations and the last products browsed, purchased, and abandoned. AgilOne now supports Web, mobile, and social channels and offers more options for email delivery. A/b tests have been added while analytics and reporting have been enhanced.

What should be clear is that AgilOne has an exceptionally broad (and deep) set of features. This puts it at one end of the spectrum of Customer Data Platforms. At the other end are CDPs that build a unified, sharable customer database and do nothing else. In between are CDPs that offer some subset of what AgilOne offers: advanced identity management, offline data support, predictive analytics, segmentation, multi-channel campaigns, real time interactions, advanced analytics, and high scalability. This variety is good for buyers, since it means there’s a better chance they can find a system that matches their needs. But it’s also confusing, especially for buyers who are just learning about CDPs and don’t realize how much they can differ. That confusion is something we’re worrying about a lot at the CDP Institute right now. If you have ideas for how to deal with it, let me know.

Friday, August 25, 2017

Self-Driving Marketing Campaigns: Possible But Not Easy

A recent Forrester study found that most marketers expect artificial intelligence to take over the more routine parts of their jobs, allowing them to focus on creative and strategic work.

That’s been my attitude as well. More precisely, I see AI enabling marketers to provide the highly tailored experiences that customers now demand. Without AI, it would be impossible to make the number of decisions necessary to do this. In short, complexity is the problem, AI is the solution, and we all get Friday afternoons off. Happy ending.

But maybe it's not so simple.

Here’s the thing: we all know that AI works because it can learn from data. That lets it make the best choice in each situation, taking into account many more factors than humans can build into conventional decision rules. We also all know that machines can automatically adjust their choices as they learn from new data, allowing them to continuously adapt to new situations.

Anyone who's dug a bit deeper knows two more things:

  • self-adjustment only works in circumstances similar to the initial training conditions. AI systems don’t know what to do when they’re faced with something totally unexpected. Smart developers build their systems to recognize such situations, alert human supervisors, and fail gracefully by taking an action that is likely to be safe. (This isn’t as easy as it sounds: a self-driving car shouldn’t stop in the middle of an intersection when it gets confused.)

  • AI systems of today and the near future are specialists. Each is trained to do a specific task like play chess, look for cancer in an X-ray, or bid on display ads. This means that something like a marketing campaign, which involves many specialized tasks, will require cooperation of many AIs. That’s not new: most marketing work today is done by human specialists, who also need to cooperate. But while cooperation comes naturally to (most) humans, it needs to be purposely added as a skill to an AI.*

By itself, this more nuanced picture isn’t especially problematic. Yes, marketers will need multiple AIs and those AIs will need to cooperate. Maintaining that cooperation will be work but presumably can itself eventually be managed by yet another specialized AI.

But let’s put that picture in a larger context.

The dominant feature of today’s business environment is accelerating change. AI itself is part of that change but there are other forces at play: notably, the “personal network effect” that drives companies like Facebook, Google, and Amazon to hoard increasing amounts of data about individual consumers. These forces will impose radical change on marketers’ relations with customers. And radical change is exactly what the marketers’ AI systems will be unable to handle.

So now we have a problem. It’s easy – and fun – to envision a complex collection of AI-driven components collaborating to create fully automated, perfectly personalized customer experiences. But that system will be prone to frequent failures as one or another component finds itself facing conditions it wasn’t trained to handle. If the systems are well designed (and we’re lucky), the components will shut themselves down when that happens. If we’re not so lucky, they’ll keep running and return increasingly inappropriate results. Yikes.

Where do we go from here? One conclusion would be that there’s a practical limit to how much of the marketing process can really be taken over by AI. Some people might find that comforting, at least for job security. Others would be sad.

A more positive conclusion is it’s still possible to build a completely AI-driven marketing process but it’s going to be harder than we thought. We’ll need to add a few more chores to the project plan:

  • build a coordination framework. We need to teach the different components to talk to each other, preferably in a language that humans can understand. They'll have to share information about what they’re doing and about the results they’re getting, so each component can learn from the experience of the others and can see the impact its choices have elsewhere.  It seems likely there will be an AI dedicated specifically to understanding and predicting those impacts throughout the system. Training that AI will be especially challenging. In keeping with the new tradition of naming AIs after famous people, let's call this one John Wanamaker. 

  • learn to monitor effectively. Someone has to keep an eye on the AIs to make sure they’re making good choices and otherwise generally functioning correctly. Each component needs to be monitored in its own terms and the coordination framework needs to be monitored as a whole. Yes, an AI could do that but it would be dangerous to remove humans from the loop entirely. This is one reason it’s important the coordination language be human-friendly.  Fortunately, result monitoring is a concern for all AI systems, so marketers should be able to piggyback on solutions built elsewhere. At the risk of seeming overly paranoid, I'd suggest the monitoring component be kept as separate as possible from the rest of the system.

  • build swappable components.  Different components will become obsolete or need retraining at different times, depending on when changes happen in the particular bits of marketing that they control. So we need to make it easy to take any given component offline or to substitute a new one. If we’ve built our coordination framework properly, this should be reasonably doable. Similarly, a proper framework will make it easy to inject new components when necessary: say, to manage a new output channel or take advantage of a new data source.  (This is starting to sound more like a backbone than a framework.  I guess it's both.)  There will be considerable art in deciding how what work to assign to a single component and what to split among different components. 

  • gather lots of data.  More data is almost always better, but there's a specific reason to do this for AI: when things change you might need data you didn’t need before, and you’ll be able to retrain your system more quickly if you’ve been capturing that data all along. Remember that AI is based on training sets, so building new training sets is a core activity.  The faster you can build new training sets the faster your systems will be back to functioning effectively. This makes it worth investing in data that has no immediate use. Of course, it may also turn out that deeper analysis finds new uses for data even when there hasn’t been a fundamental change. So storing lots of data would be useful for AI even in a stable world.

  • be flexible, be agile, expect the unexpected, look out black swans, etc.  This is the principle underlying all the previous items, but it's worth stating explicitly because there are surely other methods I haven't listed. If there’s a true black swan event – unpredictable, rare, and transformative – you might end up scrapping your system entirely. That, in itself, is a contingency to plan for. But you can also expect lots of smaller changes and want your system to be robust while giving up as little performance as possible during periods of stability.

Are there steps you should take right now to get ready for the AI-driven future? You betcha. I’ll be talking about them at the MarTech Conference in Boston in October.  I hope you’ll be there!

*Of course, separate AIs privately cooperating with each other is also the stuff of nightmares. But the story that Facebook shut down a chatbot experiment when the chatbots developed their own language is apparently overblown.**

** On the other hand, the Facebook incident was the second time in the past year that AIs were reported to have created a private language.  And that’s just what I found on the first page of Google search. Who knows what the Google search AI is hiding????

Sunday, August 20, 2017

Treasure Data Offers An Easy-to-Deploy Customer Data Platform

One of my favorite objections from potential buyers of Customer Data Platforms is that CDPs are simply “too good to be true”.   It’s a reasonable response from people who hear CDP vendors say they can quickly build a unified customer database but have seen many similar-seeming projects fail in the past.  I like the objection because I can so easily refute it by pointing to real-world case histories where CDPs have actually delivered on their promise.

One of the vendors I have in mind when I’m referring to those histories is Treasure Data. They’ve posted several case studies on the CDP Institute Library, including one where data was available within one month and another where it was ready in two hours.  Your mileage may vary, of course, but these cases illustrate the core CDP advantage of using preassembled components to ingest, organize, access, and analyze data. Without that preassembly, accessing just one source can take days, weeks, or even months to complete.

Even in the context of other CDP systems, Treasure Data stands out for its ability to connect with massive data sources quickly. The key is a proprietary data format that lets access new data sources with little explicit mapping: in slightly more technical terms, Treasure Data uses a columnar data structure where new attributes automatically appear as new columns. It also helps that the system runs on Amazon S3, so little time is spent setting up new clients or adding resources as existing clients grow.

Treasure Data ingests data using open source connectors Fluentd for streaming inputs and embulk  for batch transfers. It provides deterministic and probabilistic identity matching, integrated machine learning, always-on encryption, and precise control over which users can access which pieces of data. One caveat is there’s no user interface to manage this sort of processing: users basically write scripts and query statements. Treasure Data is working on a user interface to make this easier and to support complex workflows.

Data loaded into Treasure Data can be accessed through an integrated reporting tool and an interface that shows the set of events associated with a customer.  But most users will rely on prebuilt connectors for Python, R, Tableau, and Power BI.  Other SQL access is available using Hive, Presto and ODBC. While there’s no user interface for creating audiences, Treasure Data does provide the functions needed to assign customers to segments and then push those segments to email, Facebook, or Google. It also has an API that lets external systems retrieve the list of all segments associated with a single customer.  

Treasure Data clearly isn’t an all-in-one solution for customer data management.  But organizations with the necessary technical skills and systems can find it hugely increases the productivity of their resources.  The company was founded in 2011 and now has over 250 clients, about half from the data-intensive worlds of games, ecommerce, and ad tech. Annual cost starts around $100,000 per year.  The actual pricing models vary with the situation but are usually based on either the number of customer profiles being managed or total resource consumption.