Lots of opinions for those who want to dispense knowledge digitally ...

Monday, June 27, 2011

The Little (Search) Engine that Could

I'm guilty, and maybe many of us are, of taking search engines for granted. We expect that when we type in a search phrase, we will get back what we want. And we can definitely identify bad search: the notorious sites that never yield anything close to what we're looking for, or those that vomit back far too much to be useful. To be certain, strong semantic metadata, including a rich taxonomy, helps yield the right results. But when a site works really well, we rarely stop to consider the marvelous little engine pulling all the kibbles and bits together to make our user experience engaging.

When a company like Microsoft buys up a search engine, as it did with Fast Search & Transfer and its FAST ESP platform, the install base (or at least the guys and gals in charge of it) gulps, knowing the inevitable day will come when the big company picks a favorite operating system and says: sorry, we aren't going to support the rest, but don't worry, we'll give you a year to sort it out.

Here's the problem: ripping and replacing an entire search system is not a one-year effort. From the day a company starts looking for a new search engine to the day contracts are signed, SOWs are signed off on, and the plumbing, permissions, logistics, business rules and UX are worked out, well over a year, if not two, has passed.

When my boss, Mike Makeley, asked me to do an educational webinar on how MarkLogic Server can easily swap out the FAST ESP search engine while keeping all that plumbing in place, I was only lukewarm on the idea. Our educational webinars are usually geared to executives and offer strategies on "the art of what is possible" when it comes to online interaction. Talking about swapping out some boring old search engine sounded like changing the motor on the sump pump: necessary, but hardly glamorous. I mean, how much more could we say after showing that the MarkLogic Server replacement program leaves all the FAST “plumbing” in place and replaces only the FAST ESP engine? It's a quick, relatively painless strategy that delivers all the lift of MarkLogic and none of the agita of a total replacement. What I underestimated was just how big that lift would be.

I learned from my two webcast guests (Seth Shearer, MarkLogic's director of technical development, and Jagannath Saha, lead consultant for Avalon Consulting LLC's search practice) just how much enterprise search affects every audience engagement initiative a company offers. It makes perfect sense, but I had never really stopped to think about it.

Gone are the days of the "10 blue links" results page, said Saha. Today's search engine needs to conform to any data representation: density maps, bar charts, tag clouds, virtually any way you might want to present data visually. Here are some other things a search engine must do well:

The search box needs to support "typeahead," sometimes called "autocomplete," a maneuver that is fairly difficult to execute well: it presents readers with suggestions in real time as they type. To be useful, explained Saha, these suggestions must be delivered in milliseconds, no small challenge if the database is large. In a series of benchmarks by Avalon, MarkLogic was the speediest of the leading search engines at scouring vast amounts of data and suggesting typeaheads.

Further, said Shearer, typeahead must suggest only terms the reader is entitled to see. "You don't want people selecting suggestions to documents that they aren't allowed to see," he said.
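
To make those two requirements concrete, here is a minimal Python sketch of security-trimmed typeahead. It is not MarkLogic's API or Avalon's benchmark code; it assumes an in-memory, sorted list of suggestion terms, each tagged with a hypothetical set of groups allowed to see it. The binary search (Python's standard `bisect` module) shows why prefix lookup can stay fast as the term list grows, and the group check shows suggestions being trimmed to what the reader may see.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    text: str
    # Hypothetical ACL: groups permitted to see the documents behind this term.
    allowed_groups: frozenset


class Typeahead:
    def __init__(self, terms):
        # Sort once up front so each lookup is a binary search, not a full scan.
        self._terms = sorted(terms, key=lambda t: t.text.lower())
        self._keys = [t.text.lower() for t in self._terms]

    def suggest(self, prefix, user_groups, limit=5):
        """Return up to `limit` terms starting with `prefix` that the user may see."""
        prefix = prefix.lower()
        i = bisect.bisect_left(self._keys, prefix)  # first key >= prefix
        results = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            term = self._terms[i]
            # Security trimming: skip terms the reader is not entitled to see.
            if term.allowed_groups & user_groups:
                results.append(term.text)
                if len(results) == limit:
                    break
            i += 1
        return results


index = Typeahead([
    Term("merger", frozenset({"public"})),
    Term("merger terms (draft)", frozenset({"legal"})),
    Term("mercury", frozenset({"public"})),
    Term("metadata", frozenset({"public"})),
])
print(index.suggest("mer", {"public"}))           # ['mercury', 'merger']
print(index.suggest("mer", {"public", "legal"}))  # ['mercury', 'merger', 'merger terms (draft)']
```

Both readers type the same prefix, but only the one in the (hypothetical) legal group is offered the draft term.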

Consume all types of data
If your search engine is pickier than a two-year-old, it's time to ditch it. It needs to handle all types of data, from structured to unstructured, text files to binary. No excuses, no temper tantrums.

Not in-a-minute, NOW!
Search engines typically rely on relational databases, which in turn rely on indexes to speed up queries. The challenge is that there is often data sitting in a queue waiting to be re-indexed. Depending on how critical the business is, and on the volume of data, that re-index may run once a day, or even once a week. An XML database indexes in real time, meaning all data is searchable the second the search engine ingests it.
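
The difference between queued re-indexing and indexing at ingest can be shown with a toy inverted index in Python. This is an illustrative sketch only, not how any particular relational or XML database works internally, and the class names are hypothetical. `BatchIndex` holds new documents in a queue until `reindex()` runs (say, nightly), while `RealTimeIndex` indexes each document as it arrives.

```python
from collections import defaultdict


class BatchIndex:
    """New documents wait in a queue until reindex() runs (e.g. once a day)."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of documents containing it
        self.queue = []

    def ingest(self, doc_id, text):
        self.queue.append((doc_id, text))

    def reindex(self):
        for doc_id, text in self.queue:
            for word in text.lower().split():
                self.postings[word].add(doc_id)
        self.queue.clear()

    def search(self, word):
        return sorted(self.postings.get(word.lower(), set()))


class RealTimeIndex(BatchIndex):
    """Indexes at ingest time, so a document is searchable immediately."""

    def ingest(self, doc_id, text):
        super().ingest(doc_id, text)
        self.reindex()


batch, live = BatchIndex(), RealTimeIndex()
for idx in (batch, live):
    idx.ingest("d1", "Quarterly earnings beat forecasts")

print(batch.search("earnings"))  # [] until the next reindex()
print(live.search("earnings"))   # ['d1']
```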

It's no longer enough for a search engine to just serve; it must log, learn and analyze too. To increase engagement, an engine must be able to log user behavior: monitor which content is being consumed and the path a user took to get there, and stand ready to deliver trending topics. It also must be able to work out where the user is. With so much search now happening on mobile devices, the engine needs to track the user's geospatial coordinates and match them to geographically relevant content. And mighty speedily at that.
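
At its core, matching content to a user's location is a distance calculation. The Python sketch below uses the haversine great-circle formula and a brute-force scan over hypothetical content items; it is an illustration under those assumptions, not any search engine's geospatial API. A production engine would use a geospatial index instead of scanning every item, which is what keeps the lookup speedy at scale.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(items, lat, lon, radius_km):
    """Return titles of items within radius_km of (lat, lon), nearest first."""
    scored = [(haversine_km(lat, lon, i_lat, i_lon), title)
              for title, i_lat, i_lon in items]
    return [title for dist, title in sorted(scored) if dist <= radius_km]


# Hypothetical content items tagged with coordinates.
content = [
    ("Boston food festival", 42.3601, -71.0589),
    ("Cambridge lecture", 42.3736, -71.1097),
    ("NYC gallery opening", 40.7128, -74.0060),
]
# A user standing near Boston Common:
print(nearby(content, 42.3550, -71.0656, radius_km=10))
# ['Boston food festival', 'Cambridge lecture']
```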

The reality is that the search engine needs to evolve to satisfy the business needs of the entire enterprise. And search-based applications need a powerful engine to drive development. No longer are search engines dumbwaiters meekly serving up content; they are full-blown intelligence systems that need to be able to give as good as they get.

Tuesday, June 14, 2011

An XML Primer

Wow. I just did a "Bing" on "XML" and found 88,300,000 results. The third facet on the results page (faceted search is the reason I prefer Bing) was "XML Definition." Nineteen million pages fell under the "related searches" facet of XML definition. I zapped off a few other searches on popular tech terms and three-letter acronyms (RSS, IP address, namespace, API, RDF), and none of them had a facet called "Definition."

So what can be made of this? At any digital media seminar, workshop or webinar, or any content strategy session, XML is de rigueur. But could it be that people are throwing around this TLA without really knowing what they're talking about? The answer is: absolutely. And solution providers, technologists and product makers are guilty of not recognizing that the community is struggling to keep up.

Cathy Palmer and I took part in a web series put on by IDEAlliance this morning on making the case for XML. IDEAlliance is a nonprofit that develops standards and best practices around publishing and technology; it offers events virtually every week of the year, depending on practice area. Cathy is a trainer with New Horizons, a nationwide IT training company. A couple of hours later, I was listening to a Publishing Executive webinar featuring two book publishing executives. Peppered liberally throughout both webcasts was our little friend XML. And then came the question, asked in a variety of ways: "But what if we don't have XML? What do we do?" Cathy did a super job explaining how you can extract XML from InDesign files, while I offered that another way is to use a combination of machine (semantic analysis engines) and man (offshore teams) to create XML.

But how can executives create a content strategy (deciding among man, machine and markup) if they don't have a rudimentary understanding of what this eXtensible Markup Language is all about? The definition is easy; the why is more complex. XML is a markup method, now more than a decade old, that can be used to classify and add meaning to content so that it can be organized, “sliced and diced” and repurposed.

XML tags look similar to HTML (HyperText Markup Language) tags in that both use start and end tags, but that's about the extent of the similarities. HTML includes a set of predefined tags that control how information is rendered; e.g., the <b> tag (along with its closing tag, </b>) makes a word bold. XML uses the same angle-bracket syntax but has no predefined formatting or display tags; instead, XML provides a structure so you can effectively find information again.

This format-agnostic markup language means you can categorize sections of content, find them again and then transform them (using style sheets) for virtually any digital channel. XML allows bodies of content to be broken down into reusable components. For instance, maybe you would like to mark up statistics within a text, particularly if you know you will be searching for that same type of statistic again. Or maybe you want to mark up quotations by luminaries, charts by researchers, lyrics to songs or ingredients in recipes. Being able to search for, find and reassemble these components of content is the secret to repurposing.
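
Here is a small illustration of marking up reusable components and pulling them back out. The element names (`statistic`, `quotation`, `ingredient`) and the sample content are hypothetical, which is the point: XML lets you define your own. The sketch uses Python's standard `xml.etree.ElementTree` to find components anywhere in the document, and a few lines of Python stand in for the style sheet (in practice often XSLT) that would transform them for a given channel.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the tags describe what the content *is*, not how it looks.
ARTICLE = """
<article>
  <para>Sales rose, with <statistic unit="percent">12</statistic> growth in digital.</para>
  <quotation speaker="Hypothetical Speaker">Sample quotation text.</quotation>
  <recipe>
    <ingredient>2 cups flour</ingredient>
    <ingredient>1 egg</ingredient>
  </recipe>
</article>
"""

root = ET.fromstring(ARTICLE.strip())

# Find every component of a given type, wherever it sits in the document.
stats = [(s.text, s.get("unit")) for s in root.iter("statistic")]
quotations = [(q.get("speaker"), q.text) for q in root.iter("quotation")]
ingredients = [i.text for i in root.iter("ingredient")]


def to_html_list(items):
    """Stand-in for a style sheet: render components as an HTML bulleted list."""
    ul = ET.Element("ul")
    for text in items:
        ET.SubElement(ul, "li").text = text
    return ET.tostring(ul, encoding="unicode")


print(stats)                      # [('12', 'percent')]
print(to_html_list(ingredients))  # <ul><li>2 cups flour</li><li>1 egg</li></ul>
```

The same `ingredients` list could just as easily be rendered for an e-reader or a mobile app; only the transform changes, not the marked-up source.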

There's more to XML than that, but those are the high-level basics. The key to managing is understanding what you know (and don't know) and filling in the gaps.

Tuesday, May 17, 2011

Letting a Picture Tell the Story

A picture is worth a thousand words -- but did you know it could also represent one thousand pieces of content?

Two years ago I wrote about the Financial Times' (short-lived) Newssift, a business site that strove to build a business model around a new type of search, one with its metadata exposed. With a naive go-to-market plan and too many months (and dollars) spent in R&D, the parent company's patience was taxed, and Newssift was shuttered mere months after it launched.

As a business, Newssift failed. But the concept of using metadata to attract and engage readers, even if the graphics were simple pie charts, was a bold move. Today news organizations like the Washington Post, The New York Times and PBS are using vast troves of content to drive sophisticated visualization tools that tell a more complex story in an engaging way.

In some cases they are tapping their own archives on a given subject; in others, they are tapping the crowd. In a New York Times interactive that appeared on May 3, 2011, readers were asked to weigh the significance of, and their emotional response to, the news of Osama bin Laden's death. Each comment was semantically analyzed for sentiment, and that metadata was plotted on a multidimensional graph. Each blue dot represents a comment, which can be clicked on and viewed. The Times tapped its audience to create a visual that sums up the visceral feelings of its readers.

Patrick Sullivan, veteran UX guru and founder and CEO of Modus, a digital agency that helps organizations like PBS and Bloomberg create exciting digital and mobile applications and interfaces, finds this type of visualization a terrific way to engage readers and extend the long-form story. "Look at the corners of the map -- where strong polarization of feelings reside," he explained. "Of course, overwhelmingly, there is a universal feeling of satisfaction and that this was a significant event."

All three variables of information were captured and related in this graph. As you mouse over the points, a portion of the comment appears.

So how do news organizations, government agencies, museums and anyone else with large amounts of content do it? It's simple, really: by exposing the metadata, which plays right into the strengths of MarkLogic Server. Sullivan will be joining MarkLogic's Principal Technologist Matt Turner and me for a webcast on Wednesday, May 18, at 11 a.m. to talk about these visualization techniques and what organizations need to do to get their content ready to drive these terrific engagement tools. Sullivan assures me that this graphic depiction can work on mobile devices too!

Monday, March 28, 2011

Digital Services & Goods Purchasing Trends

Pew Internet, a division of Pew Research, asked 1,300 smartphone owners across the country whether they had purchased any of 16 different types of digital content.

Here is the breakdown of what they paid for, with 75 percent saying they have bought multiple items:
33% internet access
33% software
21% apps for cellphones
19% digital games
18% newspapers, mags or journals
16% videos, movies or TV shows
15% ringtones
12% digital photos
11% members-only premium content
10% ebooks
7% podcasts
5% tools or materials to use in video or computer games
5% cheats or codes for games
5% access to particular sites, such as online dating
2% adult content
6% misc other content

Who Buys
There are not a lot of surprises here. Men and women bought similarly, except that men outdid women in purchasing software. The most prolific purchasers are aged 30-49, college educated and in the higher income brackets. The average monthly spend was $47, but most people spend about $10 a month.
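
A $47 average alongside a typical $10 spend is characteristic of a skewed distribution: a few heavy spenders pull the mean up, while the middle buyer (the median) spends far less. The Python sketch below uses made-up monthly spends for eight hypothetical buyers, not Pew's data, to show the effect with the standard `statistics` module.

```python
import statistics

# Hypothetical monthly spends (in dollars) for eight buyers; not survey data.
monthly_spend = [5, 8, 10, 10, 10, 12, 15, 300]

mean = statistics.mean(monthly_spend)      # pulled up by the one $300 buyer
median = statistics.median(monthly_spend)  # what the middle buyer spends

print(f"mean=${mean:.2f}, median=${median:.2f}")  # mean=$46.25, median=$10.00
```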

"Skip" Prichard on Publishing

Skip Prichard, CEO of Ingram Content Group and a longtime executive in the publishing industry, gave the keynote at the Buying & Selling eContent conference in Scottsdale, Arizona. An alum of LexisNexis, Elsevier, Ingram Books and ProQuest, he has had a career that is an homage to digital convergence.

Recalling a reporter's request that he speak to the future of publishing, Prichard offered these tidbits:

  • Standard Operating Procedures need to be revised. Disrupt the status quo, because otherwise you can become a slave to tradition instead of looking at the situation with fresh eyes. Chances are your competition isn't bound by the same constraints as your SOPs.
  • Unprecedented Opportunity in terms of building audiences (look no further than Groupon). Of course, that opportunity means constant stress and pressure... because
  • Competition is everywhere. It is global, and it is relentless. There is no resting on laurels.
While little of what Prichard said was new or surprising to me, he refreshingly drew on his own experience to give real-world examples of how publishers need to embrace change. Too often, pundits and executives speak in such platitudes and generalities that the subject could just as easily be March Madness as digital publishing. Drawing on real experience turns blah-blah into rah-rah and, I believe, actually inspires people to take action.

Here's an exercise I'd find interesting: poll conference attendees to find out how they will apply what they learned.

Thursday, March 24, 2011

Aetna: Rx for Engagement

"A retained member is not necessarily a happy member," says Melissa Jones, marketing manager for Aetna's New England regions, explaining to a Custom Content Council audience why customer engagement matters more than merely measuring customer retention. To better engage its Medicare audience (all over the age of 65, with an average age of 75), Aetna entered the world of content marketing.

Jones' objective was to reverse a trend of Aetna Medicare members voluntarily terminating their memberships by creating editorial content relevant to this demographic. "We decided to create a pilot program to test the impact of an integrated communication program on member retention," she says. The goal was modest: a 1% improvement in retention would show the pilot's success and save the company millions.

The group came up with Healthful magazine, which has three standard sections: Health, Enjoyment and Reference. The editorial content is positive and slanted toward maintaining the emotional, mental and physical well-being of this adult audience. Jones was surprised at how important physical fitness was to this audience. "You can't believe how popular the Wii is at senior centers," she said.

Looking back, one of the harder chores was getting buy-in from the C-suite. The group persevered, and ultimately the company realized a 6 percent increase in retention. That success easily offsets the investment.

The group is looking to offset costs by experimenting with advertising. A cover wrap was sold to HEB Health & Wellness; the wrap went to 25,000 homes in Texas, and more than 31,000 coupons were redeemed, an outstanding success. "Soft-sell" ads dot the magazine: a feature on hearing encourages readers to call a number to find out more about hearing aids, and an article on adopting pets includes a sidebar on how PetsBest offers health insurance for pets.

I asked Melissa what ad revenue amounted to as a percentage of the cost of creating the content. She was quick to tell me that the effort was merely a pilot, and that there was no intention of turning the magazine into an ad-supported title; rather, it had to remain a value-add.

She had better not let the C-suite know how successful the ads are, then.

What Marketers Want

Before he became the pariah of Hollywood, Mel Gibson starred in "What Women Want" as a chauvinistic ad executive who, after a freak electrical accident, could suddenly "hear" the secret thoughts of women. Of course, he used this for his own personal gain.

Advertisers, too, have been trying to tap into users' secret thoughts, either by watching their behavior online or by analyzing their comments. In turn, they are using this intelligence to create content that engages their audiences. This evolution of marketing messages from paid advertising to information development has been a bane for some media companies and a boon for those that have created custom content departments.

At its annual conference in Charleston, S.C., today, the Custom Content Council (CCC) revealed an update to a benchmark survey it first conducted in 2006. CCC, whose mission is to match the needs of marketers with the information needs of audiences, commissioned Roper Public Affairs and Corporate Communication to survey 100 CMOs and find out their innermost thoughts on custom content, also called content marketing.

These CMOs, from a spectrum of industries, continue to see the upside of using content to build good relationships and, in fact, see the cost of producing it as "not too expensive." These marketers are using 15 different content channels to reach their audiences, notably websites and newsletters (both e- and print) and, now, video, mobile and webinars.

Roper also flipped the questions around and asked audiences about their perceptions of this content. While they admitted they knew it was meant to be persuasive, encouraging them to buy, they didn't care as long as it was informative.

The research, said Andy Seibert, President & Publisher of SmartMoney (a Dow Jones Company) and a CCC board member, "showed we don't have to convince marketers of the benefits of custom content."

It turns out that what readers want is what marketers want, which is good news for custom content creators.