The players and plays of the modern business analytics ecosystem. A survey of data storage solutions, data middleware, business intelligence vendors, data science solutions, data storytelling boutiques, and other institutions in the wider business analytics market.
This piece is an attempt to gather all of these vendors on one searchable page for convenient reference.
I'm not trying to be exhaustive, merely representative. However, do let me know if I'm missing your favourite vendor.
Last Updated: 2021-06-13
The core of business analytics is the idea that an enterprise can use data to improve both its products and services as well as its business operations. This data can come from internal processes, from customers and the marketplace, from local events and global trends, from sales and marketing efforts, from third parties, and many other sources. Companies compete on their ability to make sense of, and good use of, all the data they have access to.
Most business analytics pipelines begin with data storage. There are relational databases for well-structured data sets and various specialist data stores for specific applications and access patterns. Individual databases often get pooled and abstracted away from applications with the help of a data middleware layer. This middleware can simplify, secure and consolidate data access patterns, but more than that, this extra layer can also accelerate data processing and improve data flows in and out of the stores. This results in a better experience for data consumers.
In the application layer, business analytics divides roughly in two: data science and business intelligence (BI). There's much more to the story, but this is a good first approximation. Data science, here including AI and machine learning, is about having the machine look at the data in order to build a model. In contrast, business intelligence is a primarily human activity, where the objective is to generate and communicate data-driven insights. Both seek to produce useful results that can be used to improve products, services, and operations.
Data storytelling is a fresh approach to business analytics, particularly BI, with a focus on effective communication. Data storytelling is the craft of presenting data-driven insights in a compelling narrative form in order to drive action and change within an organisation. Data storytelling solves the last mile problem of business analytics.
This piece is a hopeless attempt to find some kind of order among the vendors and solutions in the wider business analytics sphere. I'll quickly run through the main names in data storage and then spend some time on data middleware solutions. For business analytics software, I decided to group vendors by revenue, to the extent that I could find recent numbers for them. I'm fairly confident I captured most major BI vendors, but the data science vendor coverage is spotty at best.
Data storytelling is still a fairly new concept, so the coverage is even more tentative. On one hand many vendors have some related functionality, and on the other hand there aren't enough true data storytelling companies in this space to even declare that a space exists yet. However, this is the category I'm most interested in, so I've given it a good chunk of attention. I'd be delighted to hear about other players and plays in this space, if you know of any.
Finally, just for completeness, I added a section on the wider ecosystem. This includes everything from data providers to consultants that can help you on your business analytics journey.
There are many ways to store enterprise data, several major vendors and numerous competing data storage approaches. Every business data problem domain, every industry niche, is saturated with solutions. Additionally, given the malleability of software, pretty much ANY data storage solution can be (ab)used for just about ANY application, up to a point. Further, companies tend to stick with their choices, unless they have a compelling reason to switch. Database solutions generally last for as long as the company itself lasts — sometimes decades.
When it comes to vendors, the old guard is still very much around, but there are strong newer players as well. Both hardware and software solutions have evolved significantly over time, but the mission has remained the same: enterprise data storage is about organising and preparing data for use in business applications.
Solutions in this space compete on the efficiency with which data can be moved in and out of the store, as well as on operational convenience and ease of use. Many data storage vendors wish to be more than just a database, bundling advanced query capabilities and even basic analytics into their product. Some vendors focus on being a trusted component in a layered approach, others seek to offer complete verticals.
We'll briefly go through some of the more visible data storage names here, primarily to delineate them from the rest of the categories to follow. Within the data storage space, much of the categorising is somewhat arbitrary as all systems see many kinds of usage, and many solutions support multiple operational modes.
The DB-Engines initiative is a great resource for all things data storage.
Relational databases have been around since the 1970s. Relational database management systems (RDBMS) are typically powered by SQL, Structured Query Language. The same language is used for querying data and for maintaining the database itself.
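The dual role of SQL can be sketched with Python's built-in sqlite3 module. The table and field names here are invented for illustration, but the point is general: the same language handles schema maintenance (DDL) and querying (DML).

```python
import sqlite3

# An in-memory SQLite database: one SQL dialect for both
# maintaining the database and querying the data.
conn = sqlite3.connect(":memory:")

# Maintenance: define the schema with a DDL statement.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("Acme", 120.0), ("Acme", 80.0), ("Globex", 200.0)],
)

# Querying: an analytical aggregate, revenue per customer.
rows = conn.execute(
    "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)  # [('Acme', 200.0), ('Globex', 200.0)]
```

SQLite is of course a far cry from an enterprise RDBMS, but the query surface is the same basic SQL that Oracle, Db2, and SQL Server all speak in their own dialects.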
There are both commercial and free solutions in the RDBMS space. Oracle (with Oracle Database), IBM (with Db2), and Microsoft (with SQL Server and Access) lead in the commercial space, followed by SAP (with HANA and legacy Sybase) and a whole host of smaller players.
Cloud databases are generally either virtual machine images (of database management systems) or "native" cloud data stores in a Database-as-a-service (DBaaS) setup. All major cloud platforms have their own managed or serviced offering in addition to plain server hosting.
Amazon Aurora is Amazon's cloud-native relational database, while Amazon RDS is an abstraction layer on top of standard database engines. Google Cloud has a native solution, Cloud SQL, and a distributed service called Spanner. Microsoft has Azure SQL and a selection of managed database solutions.
SingleStore (formerly MemSQL) is a distributed relational database, notable for its early orientation towards in-memory workloads. SkySQL is a MariaDB powered hybrid cloud DB that aims to help developers get by with a single database system.
Key-value databases approach data storage from a rather different point of view to RDBMS, focusing on a key-based lookup query pattern. In a key-value database the key is a globally unique identifier, while the value can be anything from an unstructured document to a rich object or record with many independently addressable fields.
Where relational storage is focused on relationships between entries and supporting complex queries, key-value databases optimise for scalability and operational simplicity. Key-value stores are effectively glorified dictionaries. Operationally, key-value stores are closely associated with the NoSQL school of data management, featuring usage patterns commonly seen with real-time web applications.
For caching systems, where values are never persisted (stored permanently), memcached and Redis are popular choices. Riak, Voldemort, and Apache Cassandra are focused on low latency and high availability. etcd is used in systems with strong consistency requirements.
For key-value transaction patterns, LMDB and FoundationDB are popular choices. Aerospike is focused on distributed transactions. For pure document storage, MongoDB, CouchDB and RethinkDB are popular choices.
Most standalone key-value solutions are open source in some sense, or even free software.
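The "glorified dictionary" model above can be made concrete with a toy sketch. This is not any particular vendor's API, just an illustration of the core key-value contract: a globally unique key, an opaque value, and lookup rather than relational querying.

```python
class KeyValueStore:
    """A toy in-memory key-value store: all access goes through the key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
# The value can be anything from a blob to a rich record.
store.put("user:42", {"name": "Ada", "plan": "pro"})
print(store.get("user:42")["plan"])  # pro
print(store.get("user:999"))         # None (no joins, no scans, just lookups)
```

Real systems layer distribution, replication, and persistence on top of this interface, but the programming model stays this simple, which is exactly why these stores scale so well.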
Time Series Databases
Time series databases are specialised systems for timestamped record keeping. These databases are optimised for temporal queries and high volume data ingestion. The idea is that when data is organised in a temporal order, certain aggregations and window functions are easier to compute. Time series databases see lots of use in the financial sector, and in IoT for sensor data capture, as well as in all kinds of event logging and monitoring systems.
Some of the popular players in this space include InfluxDB and Apache Druid. TimescaleDB is a relational time series database built on top of Postgres. QuestDB and ClickHouse are also worth a look. Prometheus and Graphite are focused on system monitoring and metric processing.
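The windowing idea mentioned above can be sketched in plain Python. A dedicated time series database does this at ingest speed over billions of rows, but the aggregation itself is simple: bucket records into fixed, non-overlapping time windows and aggregate each bucket. The sample readings here are invented.

```python
from collections import defaultdict

# Timestamped readings: (epoch seconds, value), e.g. from a sensor.
readings = [(0, 10.0), (15, 14.0), (30, 12.0), (70, 20.0), (95, 22.0)]

def tumbling_window_avg(rows, window_seconds):
    """Average readings within fixed, non-overlapping time windows."""
    buckets = defaultdict(list)
    for ts, value in rows:
        buckets[ts // window_seconds].append(value)
    return {w * window_seconds: sum(vs) / len(vs)
            for w, vs in sorted(buckets.items())}

print(tumbling_window_avg(readings, 60))  # {0: 12.0, 60: 21.0}
```

Because the data arrives roughly in timestamp order, a time series database can compute windows like this incrementally, without ever scanning the full history. That is the access-pattern advantage in a nutshell.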
Datomic doesn't quite fit in this group, but is worth mentioning as the leading Datalog database. Datomic approaches the problem of data storage in a fresh way by treating time as a first class feature of data entities.
Graph databases are data management solutions for cases where the data is naturally structured in the form of an entity graph of nodes and edges. Organising data in a graph makes the corresponding queries and operations intuitive and efficient, often effectively instantaneous. Graph storage makes sense when data has natural locality and queries focus on relationships between specific individual entities.
The most common modern use case for graph databases is social media and related data sets. Some enterprises have found success with a "knowledge graph" view of their master data and core operations. Within the wider data storage world, graph databases are a relatively niche product.
When it comes to query languages, there hasn't yet been as much consolidation as with relational databases. Gremlin, Cypher, and SPARQL are currently the main approaches. SPARQL is notable as a semantic query language example, for RDF systems.
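The locality argument can be sketched with a toy property graph, here as plain Python adjacency lists (the node names and the FOLLOWS edge label are invented). A two-hop relationship query, the kind Cypher or Gremlin would express in one line, costs only as much as the degrees of the nodes involved, not the size of the graph.

```python
# A toy social graph: nodes with labelled outgoing edges.
graph = {
    "alice": {"FOLLOWS": ["bob", "carol"]},
    "bob":   {"FOLLOWS": ["carol"]},
    "carol": {"FOLLOWS": ["dave"]},
    "dave":  {"FOLLOWS": []},
}

def neighbours(node, edge):
    return graph.get(node, {}).get(edge, [])

# "Who do Alice's follows follow?" -- a two-hop traversal.
# Each hop is a local lookup, so cost tracks node degree,
# not the total number of nodes or edges.
two_hop = sorted({n for mid in neighbours("alice", "FOLLOWS")
                    for n in neighbours(mid, "FOLLOWS")})
print(two_hop)  # ['carol', 'dave']
```

In a relational schema the same question would need a self-join on an edge table; a graph database stores the adjacency directly, which is why these traversals feel instantaneous.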
Neo4j is the dominant graph database management system. Some of the more viable smaller players include TigerGraph, the RDF focused AllegroGraph, and the Linux Foundation supported JanusGraph. In the knowledge graph analytics space, we have small players like Conweaver, Cambridge Semantics, and Maana.
On the cloud-native front we have Amazon's Neptune and the multi-model Azure Cosmos DB. Both support multiple graph data query and management patterns.
GraphQL deserves a mention as well. GraphQL is a query and manipulation language focused on the application programming interface, the API. Despite the name, GraphQL is not a graph database or even a true graph language. Instead, GraphQL is a different way to think about web service architecture, and specifically all the data flows, behaviours, and system operations that happen at the service interface level.
Enterprise search engines aim to make diverse enterprise-wide data sets searchable. Solutions in this space ingest or index data from a variety of sources within the company, and then make it all available for people to discover through a common interface. The idea is the same as with web search and desktop search, but for business data.
Search engines typically support workflows where users are searching for relevant information based on keywords or, in some cases, a more complex query. The search typically spans databases, file systems, document stores, and more. Notably, searching can span both structured and moderately unstructured data sources.
Some of the most popular solutions in this space include Elasticsearch, the open source Apache Solr, and Splunk. Elasticsearch and Solr are both powered by Lucene, a free search engine library. Splunk is focused on logging, monitoring, and other machine-generated data.
In terms of languages, the approaches vary significantly. Elasticsearch makes use of a JSON based query domain language, Query DSL. Solr supports several query parsers, but is primarily a search term system. Splunk has its own bespoke query language, Search Processing Language, SPL.
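To give a flavour of the Elasticsearch approach, here is the general shape of a Query DSL request body, expressed as a Python dict. The index field name is invented; the `match` full-text query and the `size` parameter are standard Query DSL.

```python
import json

# An Elasticsearch-style Query DSL body: the query itself is a
# JSON document, here a full-text match on a hypothetical field.
query = {
    "query": {
        "match": {"description": "quarterly revenue report"}
    },
    "size": 10,  # return at most ten hits
}

# The body is sent to the search endpoint as JSON.
body = json.dumps(query, indent=2)
print(body)
```

Compare this with Splunk's SPL, where the same intent would be a pipeline of search commands rather than a nested document; the contrast illustrates how little standardisation there is across search query languages.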
With data warehouses we are slowly moving higher up the business analytics stack. An enterprise data warehouse is a central data repository that can load in data from multiple sources and then produce specialised data marts or other views for various analytics and business intelligence workflows and other analytical consumers.
Any reasonably modern approach to data warehousing is focused on the cloud: the days of on-premise warehousing are numbered, except perhaps for some absurdly sensitive or otherwise complicated data. Indeed, cloud data warehouses are a driver for migrating companies away from on-premise setups.
The main activities with data warehouses are the same as with standard databases — moving data in and out. With data warehouses the focus is on organising data in a scalable way that makes sense for enterprise-wide data analysis. Databases are typically concerned with entities and their relationships, often in a transactional model. In contrast, a data warehouse can present data in denormalised views for easy analysis.
Major data warehouse vendors include the old guard software giants IBM (with Db2 Warehouse), Oracle, and SAP (with Data Warehouse Cloud). Some popular specialist vendors include Teradata, Yellowbrick Data, and many smaller players such as Panoply.
Apache Hive is another take on data warehousing, built on top of the now-obsolete Hadoop ecosystem.
In the managed cloud-native space we have several options. The usual suspects have their own systems, with Amazon's Redshift and Google's BigQuery leading the pack. Microsoft has the Azure SQL Data Warehouse and the more vertically integrated analytics data warehouse solution Synapse.
However, the system that everybody is talking about is a whole new entity. Snowflake is a cloud native data warehouse solution, or simply a "data cloud", as they put it themselves.
Mario Gabriele argues in a post that the recently IPO'd Snowflake effectively invented the cloud data warehouse model, well before the cloud platforms figured it out for themselves. Snowflake's key innovation was to separate storage (where the data lives) from compute (what to do with it), enabling a separation of concerns and cloud vendor agnostic "multi-cloud" setups.
In short, Snowflake has managed to become a standard abstraction layer piece in many cloud business data analytics stacks — and it sits right in the middle.
Next up, we'll look at some other data middleware solutions that, like Snowflake, sit somewhere between data stores and the analytics applications. The purpose of these layers is to hide the often messy and disparate data storage solutions from the applications that work best with well-behaved data. This improves the overall analytics data flow.
In recent years, as we all have tried to get a handle on Big Data, many new components have been added to the business analytics pipeline. In particular, many pieces fit somewhere in the middle, between where data is being generated and where it powers applications.
In this section we'll consider solutions that seek to rethink the data warehouse pattern or somehow try to accelerate data processing tasks. We'll also look at systems that try to enhance data in some way before it reaches consumers.
When it comes to data middleware, there are numerous technologies and vendors to choose from. All of the following products, services, and technologies can be seen as building blocks for enterprise data platforms. The classification is somewhat arbitrary, but at the same time it's useful to have at least some kind of a basic mental model of how things fit together.
A data lake is a repository for business data, with an emphasis on scale. If the term meaningfully differs from a regular data warehouse, it does so because of the technology and the choices that go into its implementation, all in pursuit of scale. Data lakes are primarily about technology, and as such, solutions and projects in this space typically fail because of non-technical concerns.
In some circles, data lakes get derided as data swamps. The number one reason why clear data lakes turn into muddy waters is simply neglect. If people aren't actively using and managing the data being stored, data consumers in the company may find the lake inaccessible and unreliable, and quickly lose interest. And that becomes a vicious cycle.
Cloud data warehousing vendor Panoply has written a helpful guide to modern data warehousing. One of their articles drills down on the difference between data warehouses and data lakes.
In the article, Panoply argue that a data lake, unlike a data warehouse, is focused on storing raw data instead of processed data. Similarly, reads and processing are done without a fixed schema. Data lakes also tend to have more savvy users, who can make more specific queries for their data needs. In a warehouse or data mart model, business analysts work with more standardised data sets prepared for them in advance.
In other words, the warehouse data model is persisted, while a lake has a more fluid nature.
Amazon's definition of a data lake matches this description: for Amazon, a data lake is just scalable object storage (S3), with access controls and governance on top. The data lake can be consumed directly via Amazon Athena, or it can appear as just another source in a Redshift setup. Indeed, Redshift is itself more of a lake house — a data warehouse when needed, but flexible enough to be a lake as well.
The lake house concept, pioneered by Databricks, pushes the data lake idea to its logical conclusion by having the infrastructure abstract away all of the data systems. There's only a uniform API layer with which to integrate applications. Some vertically integrated analytics solutions can then take the final step and wrap the data API together with a fully featured analytics platform and sell the whole thing as a bundle.
Consulting shops like Cloudwick take all this infrastructure and build their own turnkey value added solutions on top, in a "data lake as a service" scheme. Cloudwick's solution goes by the name Amorphic.
Data integration is the practice of combining independent data sources or data sets. The idea is that connected data sets enable more sophisticated analysis. In some sense data warehouses and data lakes solve exactly this problem, but at the same time they are both fairly low level and technical in their orientation. Data integration can perhaps be understood as a more high level view of rich data set construction.
Historically, data integration has followed the development of the ETL process model — Extract, Transform, Load. ETL is a relatively heavy data process where data engineers build a data warehouse from existing databases. Nowadays data stores are perhaps more interoperable.
ETL is often contrasted with ELT integrations, where the transform and load operations are done in reverse order. In ELT data is stored raw, as is. ELT is in some sense the true Big Data orientation, as this way data transformations do not limit the scale of the data capture operation.
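The difference between the two orderings is easiest to see side by side. In this hedged sketch the extract, transform, and load steps are stand-ins (the record shapes are invented); what matters is where the transform happens relative to the load.

```python
def extract():
    # Hypothetical raw records pulled from a source system.
    return [{"amount": "120.50"}, {"amount": "80.25"}]

def transform(rows):
    # Normalise types: strings from the source become numbers.
    return [{"amount": float(r["amount"])} for r in rows]

def load(rows, store):
    store.extend(rows)
    return store

# ETL: transform first, so the warehouse only ever sees clean data.
warehouse = load(transform(extract()), [])

# ELT: load the raw data as is, transform later inside the store.
lake = load(extract(), [])
curated = transform(lake)

print(warehouse == curated)  # True
```

The end results are the same here, but in ELT the raw records are still in the lake, so a future transform with different requirements can start from the original data. That is the Big Data argument for loading first.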
Fivetran is a data integration solution "focused on analytics, not engineering". In some sense Fivetran reinvents the data warehouse wheel, but with modern practices, reliability and data products as first class citizens. Stitch by Talend and Xplenty both do modern ETL as well.
Segment is perhaps the leading integration layer player for customer data. Segment takes mobile and app events, and presents them to analytics applications through a clean API. Snowplow is a smaller player in the same space, a "behavioural data middleware" solution with a focus on data quality. Qualtrics, ContentSquare, Tealium, Dynamic Yield and many others have solutions in this fairly crowded space.
If you start from customer analytics and take a step towards company wide operations, you'll soon discover the extremely crowded space of customer relationship management (CRM), starting with giants like Salesforce and strong players like Heap and Amplitude and Mixpanel.
Data integration is also a native component in many fully featured analytics platforms, which we'll consider later.
Once data has been loaded into a warehouse or a data lake, there are still improvements to be made before the data is consumed in analytics applications. Every library needs a librarian, every gallery needs a custodian, every garden needs a gardener, and every property needs a caretaker. I call all post-load data processing activities "data curation".
dbt, "data build tool", by Fishtown Analytics is a data transformation solution for data engineering and analytics self-service. In other words, dbt is focused on the T in ELT: you can ingest raw data into your storage solution and then transform it later to meet your requirements.
Quality assurance is another post-load data curation activity. Upstart Datafold is an "observability platform", providing automated testing, data profiling and anomaly detection for enterprise data streams. Acceldata is another fairly small player in this space. All kinds of time series databases play well together with data observation tools.
When event logs and other metrics are introduced to the mix as data flows, the game changes somewhat. In the log visualisation and analytics space we have players like Grafana and Coralogix. Splunk, a search engine for logs, is a major player here. New Relic is another observability/monitoring shop, focused on telemetry. Sumo Logic is here, too.
Reltio is a master data management (MDM) tool, a solution for organisations struggling with data duplication. If the same data is stored in many places, consistency becomes a huge challenge. Master data, maintaining a "single source of truth", is the recommended approach. Riversand and Profisee also operate in this space.
There's more to master data than just ensuring consistency within an organisation. Talend and Alation have solutions not just for data integration, but for data integrity, governance, data stewardship, and more. ZenOptics is a business intelligence upstart with a catalogue product focused on analytics governance.
Early DataOps pioneer Pentaho merged with Hitachi divisions to form Hitachi Vantara. Denodo believe in "data virtualisation", the practice of abstracting enterprise data repositories behind a shared interface.
Data acceleration refers to data flow enhancements that speed up data processing as data moves from storage to analytics. Processing can be accelerated by doing the work in batches in a distributed setup, or by doing it in real time, in a continuous fashion, as soon as the data arrives.
Dremio positions itself as a "data lake engine", in effect serving as an abstraction layer between storage and applications. The idea is that you can leave your data where it is, and Dremio makes it efficiently queryable through standard interfaces, appearing as just another relational database. Dremio provides a "self-service semantic layer", a flexible data platform for data consumers.
Notably, Dremio is powered by Apache Arrow, a modern standard for data storage and efficient data processing. Arrow's Flight interface is an improvement over previous database access interfaces, such as Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC).
Kyvos is another data accelerator, with quite a different approach. Kyvos target business intelligence applications directly in the form of a machine learning augmented query engine. Their solution, "Smart OLAP", provides not just an interface to cloud hosted data, but also automatically generated data aggregates and other analytics insights. The Kyvos approach can be seen as a refresh of the classical data warehouse query model.
Ververica, formerly data Artisans, sells a real time data processing platform built around Apache Flink, a stream processing engine. Stream processing can be thought of as a continuous version of a batch processing model: data comes in as a sequence, and a certain set of operations gets applied to each item. This stream view of data is highly useful for monitoring and control systems and other situations where processes need to react quickly to changes. Stream processing is closely related to event processing, where the idea is to react to events that occur irregularly.
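The "continuous batch" view can be sketched with Python generators. This is not Flink's API, just an illustration of the model: items are processed one at a time as they arrive, and the pipeline can react immediately. The sensor values and the threshold are invented.

```python
def sensor_stream():
    # Stand-in for an unbounded source, e.g. a Kafka topic or a Flink source.
    for reading in [21.0, 22.5, 95.0, 23.0]:
        yield reading

def process(stream, threshold=90.0):
    """Apply the same operation to each item as it arrives."""
    for value in stream:
        if value > threshold:
            yield ("ALERT", value)  # react immediately to the anomaly
        else:
            yield ("OK", value)

for tag, value in process(sensor_stream()):
    print(tag, value)
```

Swap the finite list for a socket or a message queue and nothing in `process` changes; that is the sense in which stream processing generalises batch processing to unbounded data.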
Ververica is a small player in a space once dominated by Apache Hadoop, a now mostly obsolete distributed data processing approach. Some of the pieces of the wider Hadoop ecosystem are still viable, but the original MapReduce programming model has been superseded by more direct approaches. MapR was one of several companies to rise and then fall with that particular tech stack. Cloudera has fared better after merging with Hortonworks. Smaller player Datameer was able to pivot away from Hadoop.
Apache Spark is another batch and streaming data solution, in a sense the king of the hill after Hadoop's demise. Several companies have built data pipelines and platforms around the technology, with Databricks naturally leading the pack. Databricks was founded by the UC Berkeley AMPLab team that produced the original Spark research.
Ascend.io is a data pipelines and data engineering focused Spark player. As engineers building for engineers, they emphasise declarative specification, automation and analytics self-service. Upstart Guavus is another Spark vendor.
Presto, originally developed by Facebook, is another data pipeline technology, another take on a data platform. The Presto approach is to use a distributed, or "federated", SQL engine to hide data storage solutions behind a common interface. A Presto cluster is notoriously difficult to maintain, and so naturally there are cloud native vendors offering it as a service. Ahana, Varada, Alluxio, and Starburst are all fairly small players in this distributed SQL niche.
Business intelligence, BI, is the main branch of business analytics. The task in BI is to generate data-driven insights and to communicate those insights effectively to the rest of the organisation. The objective is to drive action and change. So far, this has usually been done through reports and dashboards.
In this section we'll consider major business intelligence vendors grouped by total company revenue — that is, not just revenue from analytics. We'll also point at some of the smaller players in this perhaps surprisingly crowded software market. We'll also throw in a few free solutions.
Enterprise Software Giants ($10B+)
We'll start with vendors that have a wide enterprise software portfolio. At this level vendors often have multiple solutions in a variety of software categories, but their public filings often don't break things down very far. In other words, figuring out how much revenue these vendors generate from business analytics in particular is hopeless, so we won't even try.
The strategy for all players in this space is the same: sell a whole bundle of stuff for the whole enterprise, and try to add extra value by exploiting integration opportunities between the components you own. The challenge is in getting all the integrations to work as planned with the customer's data, workflows, and existing systems.
Microsoft is a leading business analytics vendor. Power BI is the main business intelligence solution, but integrations with the Office suite are the real product. Microsoft's SQL Server and its services suite — SSRS, SSIS, SSAS — complement the modern tooling with classic data warehouse analytics flows. The Azure cloud ecosystem, with its augmented analytics products and other capabilities, is a big part of Microsoft's Power Platform self-service offering.
Historically, Google hasn't had much success in the business analytics space. Google is a major player in the cloud infrastructure game and cloud services, but until quite recently their only play in consumer data and business analytics tooling was Google Data Studio. The studio supports easy integration with Google Analytics and has basic functionality for data connectivity, charts and dashboarding, sharing and collaboration, remixing and so on. There's also Google Cloud Datalab for data wrangling, and services like Colab for data science.
Things changed for Google in 2019, when the company announced the acquisition of Looker, a business intelligence solution that wasn't even running on Google Cloud at the time! The deal closed in 2020. Looker was a $150M revenue business, a modern BI solution for dashboards and integrated insights, data-driven workflows, custom apps, and more, with a focus on "data-driven experiences".
Amazon is in a similar situation as Google: strong cloud offering, but less of a presence in the end user solution space. Amazon's QuickSight is a capable BI solution, leaning heavily on serverless scale. QuickSight is BI with fully managed AWS goodies, all the way to pay-per-session pricing. Dashboards, natural language queries, augmentations, embedding, mobile — it's all there. Unbeatable connectivity. QuickSight is powered by SPICE, a "super-fast, parallel, in-memory calculation engine", so you get data acceleration out of the box as well.
Other cloud vendors are not far away. Understandably, nobody wants to just do commodity infrastructure. Why stick to the platform, when you can own the application space as well? For example, Alibaba Cloud has Quick BI.
Moving on to business software powerhouses, the name of the game is to sell a comprehensive business software bundle and then throw some basic analytics on top. It doesn't have to be great: the value is in the integrations between the app and all the other solutions you provided to the different business functions. So you sell, say, a marketing or a sales system and then a product development team can connect into those insights and vice versa. Business analytics is always pitched here as an enterprise wide layer helping people generate data-driven insights.
German software giant SAP has the Analytics Cloud solution, complete with predictive analytics. "Analytics as a planning tool" is the pitch. SAP has a long history of bringing analytics to the enterprise, having acquired pioneer company BusinessObjects back in 2007, when Big Data was barely a concept. SAP is famous for its powerful, yet unwieldy enterprise resource planning system, the latest iteration of which is the insanely named SAP S/4HANA. Implementations are long, often at great cost and with low satisfaction, but analytics-driven planning comes as part of the package.
Oracle have their own Analytics Cloud, an integrated platform for everything from "self-service visualisation and powerful inline data preparation to enterprise reporting, advanced analytics, and self-learning analytics". Everybody hates Oracle, but they sell hard, and they know what executives value. "Oracle buyers are buying peace of mind, not a piece of software." Oracle was built around their early relational database success, but nowadays Oracle wins by a thousand little cuts. Oracle got into analytics early on in 2007 by acquiring BI pioneer Hyperion.
IBM has an identical story. Their main BI brand is Cognos, a cloud analytics solution with origins in yet another BI pioneer of the same name, acquired in 2008. Cognos is pitched as an "all-inclusive BI solution for faster, more reliable data prep and reporting". IBM also have their statistical analysis toolset, SPSS, acquired in 2009.
The age of big acquisitions is far from over. Cloud CRM giant Salesforce made a bold move in 2019, acquiring the leading interactive BI toolmaker Tableau. Tableau was — and remains — the only true business intelligence operation with more than $1B in revenue.
"World's #1 CRM and #1 analytics platform come together to supercharge customers' digital transformations", the Salesforce press release read. Combining powerful BI and a comprehensive business data system is certainly a potent mix. Tableau is a sleek solution, managing to stand out in an industry saturated with OK solutions. Interestingly, Tableau isn't really cloud-native, relying on on-premises or equivalent deployment.
Finally, we have Adobe, with their customer analytics focused Experience Cloud, a collection of tools for digital marketing: content management, customer journeys, commerce, advertising, and more. This is business analytics with a focus on the customer. The main analytics tool is Adobe Analytics. Marketo, acquired by Adobe in 2018, is a leading solution for outbound B2B marketing.
Put together, the giants make up at least half of the modern business intelligence market.
Major Vendors ($1B-$10B)
In the major vendors tier, we have a selection of companies with a wide portfolio of products. Business analytics represents only a fraction of total company revenue for these players, but analytics and related activities add significant value to their platform plays. Many of these multi-billion dollar companies are fairly low profile — the brands are certainly not household names.
ServiceNow is a cloud-native enterprise software company with roots in the IT service management market. Their services power help desk and other support functions in larger organisations. This is business analytics through an event and incident lens. ServiceNow is currently expanding to cover more of the technical enterprise software market, focusing on DevOps, "development and operations". Over the years, they have made several interesting acquisitions, including Lightstep and the much hyped, yet aimless Element AI.
SAS is a privately held analytics solutions company founded back in 1976. Originally, SAS focused on statistical software for specific scientific uses, but today the company serves industries across the board. The stated mission is to "help every organization in every industry make smarter, data-driven decisions". SAS has always invested heavily in research and development, and after decades of operations, they have an extensive software suite to show for it.
Speaking of statistical analysis: statistics and mathematics software is big business. Mathworks, the company behind Matlab and Simulink, is a major vendor. Minitab is a veteran statistical software vendor with decades in the game. Wolfram — maker of Mathematica and related apps — and vendors like Maplesoft are smaller players in the mathematics software space. For more, see this list on statistical software.
OpenText, the largest software company in Canada, is in the information management business. Theirs is a process view of analytics and business data: the main task is organising, integrating and governing data as it passes through various business processes. They have solutions for marketing, process automation, business transactions, content management and more. Analytics, under the Magellan brand, brings it all together to drive action in the enterprise.
Infor is a major enterprise software vendor, "the amalgamation of more than 40 software acquisitions brought together by private equity backers," as Forbes put it in 2015. The main analytics product is the cloud-native Birst, acquired in 2017. Birst wasn't and isn't a leading BI platform, but it's more than enough for basic BI for customers who have already invested in Infor's data platform.
TIBCO is another major enterprise systems vendor, one of the few dotcom boom startups to make it big and survive the crash. The company was founded in 1997 and went public on NASDAQ only two years later. In 2001 the company was valued at $2B. Since then, the company has grown primarily through acquisitions, eventually getting bought out by private equity for $4.3B in 2014.
TIBCO's main solution for analytics and business intelligence is Spotfire — "Immersive, smart, real-time insights for everyone." The focus is on dashboards, augmented analytics and predictive analytics. In 2014, TIBCO also bought out Jaspersoft, a minor BI vendor focused on embedded analytics. In 2020, the company agreed to purchase veteran BI vendor ibi (Information Builders). Ibi is known for its FOCUS query language and related data solutions.
Informatica, founded 1993, is another early analytics player. Informatica also rode the bubble on NASDAQ and was then acquired in 2015 by private equity. Informatica have a data integration and data engineering focused product portfolio. Cloud analytics provides the query and data insights layer on top.
In the security/intelligence market, with an emphasis on the US public sector, we have the consulting-driven analytics giant Palantir and the equally large surveillance tech vendor Verint. Recorded Future and Novetta are smaller players in the same security focused (business) intelligence niche.
Finally, we have the remarkable story of Hewlett-Packard and its 2011 acquisition of Big Data pioneer Autonomy, one of the crown jewels of the British software scene. HP acquired Autonomy at an "absurdly high" $11.7B valuation, only to write down an eye-watering $8.8B related to the acquisition only a year later. The legal dispute over mismanagement and fraud charges is still ongoing. Regardless, HP Enterprise is a major IT infrastructure vendor, complete with a comprehensive portfolio of professional services and data and analytics focused software solutions.
Established Players ($500M-$1B)
The next tier is made up of more focused data and analytics enterprises. Each one is a leading player either directly in business intelligence or via data analytics infrastructure.
Cloudera is a "data cloud" company, founded in 2008 by three Silicon Valley engineers and a former Oracle executive. The company invested heavily in the Hadoop ecosystem, which gave them an early advantage, but then turned against them as the industry evolved. Cloudera merged with Hortonworks, another leading Hadoop vendor, in 2018. Today, the main Cloudera offering is the Data Platform: "An elastic cloud experience. Multi-function data analytics." Cloudera, listed on NYSE in 2017, was recently bought out of the market by private equity.
Qlik is the largest of what I would call the true business analytics SaaS vendors. The business idea is to help customers "turn raw data into remarkable outcomes" through data systems and user-friendly tools. Founded in 1993 in Sweden, the Qlik HQ was moved to the US in 2004 as part of a push to grow into international markets. After listing on the NASDAQ in 2010, the company was bought out by private equity for about $3B in 2016.
Qlik's flagship product is the Qlik Sense analytics suite, which builds on the success of the QlikView dashboarding tool. Qlik were one of the first to see the value in easy-to-use self-service analytics. Today, the company has a range of solutions for data integration and advanced analytics. Qlik is a recognised thought leader in the fairly crowded BI tools space, advocating for "data literacy" and "data democracy". Qlik's Statement of Direction (2021) is a great read.
Alteryx is another major business intelligence vendor with half a billion USD in annual revenue. Alteryx have a unified, integrated platform product, Analytic Process Automation (APA), and specific solutions for data preparation, blending and analytics. On the solution side there's perhaps a focus on automation, processes and workflows, though, like any major BI vendor, Alteryx has a holistic view of enterprise data flows. As a leading player, Alteryx has invested substantially in its community relations and the wider analytics ecosystem.
On the other end of the alphabet we have Zoho, a diversified business software vendor from India. Zoho's flagship products include its office application suite and a fully featured CRM, but the company also has a competitive BI and analytics offering. The Zoho Analytics tooling builds on the success of the predecessor Zoho Reports, and is focused on user-friendly interfaces. Self-service, embedded analytics, sharing, and drag&drop UIs all feature prominently.
MicroStrategy, founded in 1989 and listed on the NASDAQ since 1998, is one of the few old guard business intelligence vendors still standing. The company was an early innovator in data mining and BI tools, but hasn't been a leader in this space for a while. MicroStrategy is perhaps focused on usability, emphasising the ease with which data-driven insights can be surfaced in various contexts: in CRMs and ERPs, in embedded, mobile, and more. HyperIntelligence is the MicroStrategy BI data layer for powering next gen business.
Incredibly, MicroStrategy, perhaps finally seeing the writing on the wall, decided to pursue an aggressive bitcoin strategy in early 2021. To date, the company has spent over $2B on bitcoin. It remains to be seen how this play will pan out, but either way it's difficult to see MicroStrategy remaining a major BI player for long.
Challengers ($50M-$500M)
The challenger tier features both younger analytics companies trying to bring something new to the market, as well as some smaller established players focused on serving a particular niche.
Databricks, already mentioned a couple of times, is aiming to bring their Spark-powered one-stop shop to a larger audience. A recent Forbes piece outlines their ambition nicely. Databricks is aiming to be the definitive data processing layer for all kinds of applications. They have a data platform product that supports both analytics and business intelligence workflows as well as data science and machine learning / AI efforts.
Domo, founded in 2010, is a NASDAQ listed business intelligence vendor. Domo is one of those players that doesn't seem to have anything special going for it: it wins by simply executing well in every area. Domo has exceptionally clear messaging in its offering, and strong products for data integration, business intelligence and analytics throughout the enterprise. The Domo Appstore contains templates and recipes for common data integrations and connections. Domo user interfaces stand out in a positive way.
FanRuan, aka FineReport, is the largest BI vendor in China. Founded in 2006, the company operates at roughly the same scale as Domo. FanRuan's BI platform perhaps does not have the same level of polish as other players in the BI space, but it also doesn't make things needlessly complicated. FanRuan's solution has a clean layered structure: data stores, data integration, managed reports, and presentation.
Sisense is another medium-sized BI vendor, founded in Israel in 2004. The Sisense pitch is to "infuse" data and analytics into workflows and processes through smart dashboards and data integration. Nothing revolutionary here, just solid execution.
insightsoftware, founded in 2000, is a diversified business software vendor focused on the finance business function. insightsoftware has followed the Infor/TIBCO playbook, growing through several acquisitions and mergers. Over the years, the company has gobbled up several business intelligence vendors, such as CXO (of CXO-Cockpit), Longview (incl. ArcPlan), and Logi Analytics (incl. Zoomdata). The company strategy is founded on the idea that finance teams should own the applications they use. This way the company can remove inefficiencies, speed up processes, increase accuracy, and encourage wider participation.
Board is another player in the financially oriented BI submarket. Founded in 1994 in Switzerland, Board brings classic reporting, dashboard and data operations together with planning and simulation capabilities, all powered by a shared data repository. Board aims to be an all-in-one solution for enterprise decision-making, the leading software vendor for integrated data-driven business planning.
GoodData, founded in 2007, is a smaller BI vendor exploring a freemium pricing model. The company's analytics platform is a fully featured SaaS solution, with diverse data visualisation and embedding options, and tooling for data preparation and ETL. The GoodData UI library makes embedding easy. The GoodData analytics dashboarding tool comes as a cloud-native Community Edition, a container image that can be easily set up in any environment.
Datapine, founded in Berlin, Germany in 2012, is another no-nonsense modern BI vendor. The Datapine solution features polished self-service analytics dashboard tooling, smart data alerts, and numerous data connectivity options.
Thoughtspot, founded 2012, is a modern BI vendor with a fresh take on analytics. The Thoughtspot solution is built around natural language queries, simple unstructured questions, which the tool is then able to turn into SQL queries. The presentation layer then shows the results in an appropriate way. This search-first approach to analytics has enabled Thoughtspot to serve a whole new group of users.
Thoughtspot's SpotIQ feature further simplifies and enhances the analytics experience, bringing AI-driven insights into the analytics workspace. The company is also trialling a guided version of their search feature, Search Assist, and exploring ways of sharing analytics workspaces in the form of SpotApp bundles.
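As a toy illustration of the search-to-SQL idea — this is not ThoughtSpot's actual engine, and the table and column names are made up — a few lines of Python can map a simple "&lt;aggregate&gt; &lt;measure&gt; by &lt;dimension&gt;" question onto a query:

```python
import re

# Hypothetical mapping from question words to SQL aggregate functions.
AGGREGATES = {"total": "SUM", "average": "AVG", "count": "COUNT", "max": "MAX"}

def question_to_sql(question: str, table: str = "orders") -> str:
    """Turn a question like 'total sales by region' into a SQL query."""
    match = re.match(r"(\w+)\s+(\w+)\s+by\s+(\w+)", question.strip().lower())
    if not match:
        raise ValueError(f"Cannot parse question: {question!r}")
    agg_word, measure, dimension = match.groups()
    agg = AGGREGATES.get(agg_word)
    if agg is None:
        raise ValueError(f"Unknown aggregate: {agg_word!r}")
    return (f"SELECT {dimension}, {agg}({measure}) "
            f"FROM {table} GROUP BY {dimension}")

print(question_to_sql("total sales by region"))
# SELECT region, SUM(sales) FROM orders GROUP BY region
```

A production system naturally replaces the regular expression with a learned language model and resolves measures and dimensions against the customer's actual schema, but the translation target — plain SQL over governed data — is the same.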
Yellowfin is an Australian business intelligence vendor bootstrapped in 2003. Yellowfin is an innovative, leading BI platform focused on automated analysis, data storytelling and collaboration. It's still dashboarding, but the emphasis is on driving action and on automated real time monitoring. The Signals module generates both scheduled and personalised alerts, providing contextual support for analytics. Assisted Insights is Yellowfin's machine learning powered automated analysis feature.
Long Tail Players (<$50M)
There are numerous minor business analytics players, too many to study in detail. Most of them can do basic data modelling, dashboards and reports. Some are focused on one particular workflow or niche, some are backed by diversified business software providers. Many are somewhat regional.
Some vendors with at least a bit of traction in the analytics / business intelligence space, and a few newer players:
| Vendor | Focus | Location | Founded |
| --- | --- | --- | --- |
| Entrinsik | BI | North Carolina, US | 1984 |
| Dundas BI | BI | Ontario, Canada | 1992 |
| NVivo by QSR Intl. | "Qualitative research" | Australia | 1995 |
| InetSoft | BI | New Jersey, US | 1996 |
| Bime, now Zendesk | BI | France | 2009 |
| Metric Insights | BI | California, US | 2010 |
| Chartio, now Atlassian | BI | California, US | 2010 |
| InsightSquared | "Revenue intelligence" | Massachusetts, US | 2010 |
| Cyfe, now Traject | MarTech | Massachusetts, US | 2012 |
| Clari | "Revenue operations" | California, US | 2012 |
| Mode | BI and data science | California, US | 2013 |
| AskData | Search BI | California, US | 2014 |
| Sigma Computing | Spreadsheet BI | California, US | 2014 |
| Yaguara | "Operations for eCommerce" | Colorado, US | 2016 |
| Rational BI | Notebook BI | Sweden | 2018 |
| Sisu Data | Search BI | California, US | 2018 |
| Reveal BI by Infragistics | BI | New Jersey, US | 2019 |
| Bold BI by SyncFusion | BI | North Carolina, US | 2019 |
| Alaira by Slanted Theory | VR BI | UK | 2019 |
In addition to the countless BI SaaS vendors, there are also many business intelligence oriented free software solutions out there. These range from complete turnkey solutions with commercial support all the way to DIY charting libraries. For example:
Flexit Analytics give out their self-service dashboard analytics solution for free: they sell their training, support, and consulting services instead.
Mozaik is a free, modular toolkit for building dashboard visuals.
If business analytics is an umbrella term, then the business intelligence domain can be meaningfully distinguished from data science and related things. If business intelligence is about humans and their data-driven insights, then data science complements this view with a machine perspective.
In this section we'll take a look at some data science platforms and some AI / machine learning focused boutiques. Data science is still a booming industry and there are too many hyped-up AI shops to survey exhaustively, so we'll settle for a hopefully representative sample.
Many software giants have a portfolio of products for all things data science and machine learning. They offer both managed services and tooling for you to build your own pipelines. All the smaller players build solutions on top of commodity cloud infrastructure and focus on convenience and the end-to-end user experience.
Microsoft has Azure Cognitive Services and Azure Machine Learning, and some enhanced SQL offerings as well. Google has a wide range of AI and ML products, from AutoML to Translation. Amazon has numerous AI services and hosted infrastructure and framework options: Machine Learning for AWS is a good starting point.
Many solution vendors have products that span the whole range of business analytics, i.e., both data science and business intelligence workflows. Some notable platform plays of this kind include TIBCO, Cloudera, and Alteryx.
These days Databricks is easily the most exciting data science platform player. We have first class hosted Databricks from all the major cloud vendors: Azure Databricks, AWS Databricks, Google Cloud Databricks, Alibaba Databricks, ...
Altair, founded in 1985 and listed on NASDAQ since 2017, is another major platformer with cloud solutions for data analytics and AI. Altair specialise in simulation and high-performance computing, heavy engineering applications.
Dataiku, founded in 2013, is an AI-focused modern platform player. Dataiku is a leader in "enterprise AI" and managing the operational dimension of machine learning, also known as "MLOps". The company has a distinctively holistic view of AI as an organisational asset.
Domino Data Lab is another player in the enterprise data science / machine learning market — "Enterprise MLOps". Like the bigger players, Domino aims to be a complete end-to-end solution for data science, abstracting away the multitude of technologies in a modern research stack. Domino has a handy data science field guide for the curious.
RapidMiner is a data science platform vendor with a focus on visual workflows and comprehensive data science automation. The company has a history of open source solutions and it still operates on an open core model. RapidMiner got started as an open learning environment in 2001, incorporating some years later. RapidMiner built an educational license program on that foundation, and today have many users at universities. There's a fairly active community forum around the platform as well.
KNIME is an open source data science platform vendor. The main KNIME analytics platform is free to use and free to build on. For enterprise users, the company offers KNIME Server, a deployment solution for analytics applications and services. Many KNIME workflows feature graphical node views and node based tooling.
There are many similar server products out there for hosting data science projects. Anaconda is a leading vendor for the Python ecosystem. RStudio is the main solution family for the R crowd. The versatile Jupyter notebooks are becoming somewhat of an industry standard for developing and sharing data projects — for better or worse.
Plotly is both an open source graphing library and an enterprise low-code solution for ML and data science apps. The idea is that users can take their data science and research results, wrap them in basic UI components, and then deploy as interactive applications. Plotly Dash supports Python, R, and Julia.
Observable is a platform in a different sense of the word. The Observable way is to explore and visualise data, and then share findings with others — it's a data science platform for thinking and communication. The Observable workflow is built around interactive dashboards, inlined code, and visualisation by way of D3.js.
Machine Learning / AI
Given the advances in machine learning and AI over the last 10-20 years, it's no surprise that many companies are looking for ways in which AI can transform specific industries and all kinds of business processes at large. Trying to enumerate AI companies is a pointless exercise, not least because "AI" is not a very well defined concept. Nonetheless, in the context of business analytics, there are a few trends worth paying attention to.
First of all, many players are looking at industrial applications of AI, partly inspired by the notion of the Internet of Things (IoT) and mountains of sensor data. This is linked with the notion of Industry 4.0 and digital transformation more broadly. All of these buzzwords are a gold mine for consultants — and AI startups.
C3.ai is a leading enterprise AI software provider, with a focus on the data pipeline. The C3 AI Suite is a "model-driven architecture" for Enterprise AI: there's an abstraction layer that sits between commodity infrastructure and C3 AI apps. Data integration and AI/ML model building is done inside a proprietary development studio. A visual no-code user interface, C3 AI Ex Machina, completes the package. As the name implies, it's a complete suite.
Other major players in the industrial AI space include Uptake and JFrog. Andrew Ng's Landing.ai is a smaller player in this space. Veteran AI shop Logility is focused on supply chain management for small and medium enterprises.
Another big trend in AI is process automation, both in applications and in the data processing pipeline.
UiPath is a major vendor in robotic process automation (RPA). The idea is to automate repetitive digital tasks that are normally performed by people. "We make software robots, so people don’t have to be robots." UiPath, founded in 2005, started trading on the NYSE in April 2021. Automation Anywhere and Blue Prism operate in this space as well.
"Robotic process automation (RPA) is a software technology that makes it easy to build, deploy, and manage software robots that emulate human actions interacting with digital systems and software. Just like people, software robots can do things like understand what’s on a screen, complete the right keystrokes, navigate systems, identify and extract data, and perform a wide range of defined actions. But software robots can do it faster and more consistently than people, without the need to get up and stretch or take a coffee break."
Finnish startup Aito.ai is building solutions for smarter prediction in intelligent automation: machine learning for software robots. Their solution is a database with built-in machine learning functionality.
Faethm is a 2017 startup focused on the human side of the RPA revolution. The Faethm mission is to "prepare the world for the future of work". Their analytics platform helps people plan for AI scenarios. The idea is to help organisations understand the implications of emerging technologies — automation, augmentation, addition — at the job level, so organisations can evolve their workforce accordingly.
Automated machine learning, AutoML, is the other half of the automation story. With AutoML, the objective is to automate and accelerate every step of the machine learning project pipeline.
DataRobot is a leading enterprise AI platform focused on automation and "the end-to-end journey" as organisations turn raw data into ML models and deploy them in applications. H2O.ai is another end-to-end AI platform vendor.
Modzy, founded in 2019, views the AI infrastructure game in a more modular fashion. The Modzy solution is focused on ModelOps, or MLOps, the operations side of machine learning. Instead of an end-to-end framework, Modzy is simply one of many components in an ML fabric, the one in charge of deployment and maintenance, among other things. They support a variety of training frameworks, tools, data pipelines, languages and monitoring systems.
Modzy also have a model marketplace, a library of "certified, pre-trained and re-trainable AI models from leading machine learning companies". These packaged models are a great way to get started on ML/AI applications.
Finally, just to show the variety of tiny AI and machine learning shops out there, a sample of smaller players:
Compellon is a minor business analytics AI vendor focused on automated predictive analytics. They have a three-step process for non-specialist users: load and review a dataset, select variables, and evaluate automatically generated predictions. The Compellon 20-20 product generates explainable bespoke models from the provided dataset as well as data-driven recommendations on actions to take.
Tangent is focused on automated predictive modelling for company operations optimisation. Their solution TIM, Tangent Information Modeler, analyses time series data and generates models based on detected patterns. These can then be used for forecasting and anomaly detection.
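The anomaly detection side of this idea can be sketched with a generic moving-window z-score detector — a stand-in for this class of technique, not TIM's actual algorithm:

```python
from statistics import mean, stdev

def detect_anomalies(series, window=5, threshold=3.0):
    """Flag indices that deviate sharply from a rolling baseline.

    Each point is compared to the mean and standard deviation of the
    preceding `window` points; a z-score above `threshold` is flagged.
    """
    anomalies = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# A steady series with one spike at index 10
data = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11, 50, 11, 10]
print(detect_anomalies(data))  # [10]
```

Commercial tools add automatic model selection, seasonality handling and forecasting on top, but the core task — learning what "normal" looks like and flagging departures from it — is the same.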
MyDataModels is a French augmented analytics startup focused on small data sets and non-specialist users. Their TADA solution requires no coding or machine learning skills. Just upload your data file and immediately have AI powered predictive modelling capabilities at your disposal.
Squirro is a Swiss "augmented intelligence" vendor. Their Squirro AI studio solution is a "no-code AI interface for citizen data scientists", a model assembly tool built around project templates. They have an Insights Engine solution for augmented enterprise content and data search. Their Augmented Intelligence offering comprises pre-packaged solutions for common scenarios in different business functions.
Cuddle is an AI-powered mobile-first business analytics platform built around data-driven nudges. The Cuddle system detects patterns in enterprise data and sends alerts and actionable insights to users as notifications. Users can also use the Cuddle app to ask questions about data using natural language.
Kraken by Big Squid is a no-code AutoML platform for business intelligence. Kraken includes data tooling, a range of connectors, and automation around predictive data science. Kraken's what-if scenario building helps with scenario analysis. Kraken also supports hosting the model as an API or a synchronised process. Explainability is also a first class feature of the generated AI models. Big Squid also have a data science training/consulting service called SONAR.
Data storytelling is of great interest to me, so we'll take a closer look at some business analytics vendors that have products or services that specifically support data storytelling.
Fundamentally, all you need for data storytelling is a data-driven insight and some way to communicate it. The standard business intelligence approach is to assemble dashboards and reports, have business analysts work with those, and then separately assemble story presentations based on what they see. This is surprisingly challenging to do well, so it's no wonder it's rarely done to great effect.
Just think of what the analyst needs to do — often alone — if they prepare a data story from a dashboard. Just to get started the analyst has to find and prepare the data, maybe import and integrate data into a platform or clean it up in some way. Sometimes the data is ready to go, but certainly not always. For the story the analyst has to then slice and dice the data, interpret charts and trends, draw connections between data sets and events, articulate insights in words, and work out what it all means, who might be interested, and how best to communicate the findings.
In some sense all innovation in BI has to do with making these steps easier. Today, there are two main approaches that try to go beyond what one might call the dashboard paradigm: augmented analytics and natural language. Augmentation here typically refers to AI/ML automation and the practice of embedding these enhancements within existing dashboard tooling.
Natural-language generation, NLG, is about turning numeric data sets into a human-readable text form automatically. Natural language search is about using unstructured queries to find data sets and views on data.
Data storytelling itself is a kind of third way of thinking about analytics beyond dashboards. Data stories draw on both natural language and augmentation, but at the same time a storytelling focus represents a more comprehensive rethinking of what business intelligence could become.
June is a brand new YC-backed company focused on lightweight product analytics, all powered by Segment, a leading customer data platform. The June approach is to build actionable insights through pre-built templates and reports. The company doesn't use "data storytelling" in its marketing, but I see their solution as a way to generate sharp little data stories without the fuss.
Toucan Toco, founded 2015, is a French business intelligence vendor focused on data storytelling. The Toucans emphasise the importance of delivering compelling data visualisations and insights to non-technical decision makers. This video explains the Toucan Toco approach nicely.
The Toucan solution packages stories into carefully crafted interactive story dashboards that people can explore at their own pace. The packages can be easily deployed and accessed through desktop and mobile clients. Toucan Toco provides a guided framework for building stories, ready-to-populate data models, story templates created by domain experts, a pure design mode, and extensive corporate branding options, among other features.
datastories, founded 2011, is a small Belgian data storytelling company. For DS, data stories are a form of augmented analytics, a way of leveraging AI and machine learning in data story crafting. The company has built a "computational engine" that generates predictive models and other insightful views automatically. Users simply upload data files and select variables of interest, and the system generates an interactive narrative for exploring and sharing. datastories also provides a data storytelling library for use in Python applications and, for example, Jupyter notebook workflows.
Juice Analytics, founded 2004, is a small Nashville, Tennessee based data storytelling boutique. Their no-code platform Juicebox is a simple authoring tool for data presentations. Juicebox is aiming for a cloud-native PowerPoint/Keynote level UX, with which even novice users can put together visual narratives that look clean and professional. There's a built-in presentation mode as well.
Datatelling, founded 2015, is a French data storytelling focused BI startup. As a "cloud data storytelling platform", Datatelling is built around a single page app data story experience. Authors can quickly add new sections and charts, and can pull in data through a separate query module. This demo gives an overview of the product.
Livestories, founded 2013 and based in Seattle, Washington, is a data storytelling platform focused on the public sector. The Livestories tooling helps all kinds of public organisations — including cities and counties — use civic data. Their mission is to help public sector organisations communicate with and improve local communities. With data story templates both experts and non-experts can quickly author stories built on top of public data.
Nugit, founded 2013, bridges the gap between dashboards and data stories. The Singapore based company pioneered data story authoring functionality, and has been leading the story industry on automated data highlights, alerts and visualisations. Nugit users can create new stories from scratch or tailor template stories found in the Story Library.
Narrative Science, founded 2010, is an innovative data storytelling company based in Chicago, Illinois. Narrative is a leader in data storytelling driven business intelligence. Be sure to request a copy of their book on data storytelling. Narrative Science has two main products: Quill and Lexio. Quill takes an interactive dashboard and generates natural language descriptions from the data on display, capturing changes and trends and drivers. This interactive demo explains the concept really well.
Narrative's Lexio takes the same idea, but moves the analytics workflow beyond dashboards altogether. This video explains the concept. "Lexio is the first analytics tool that focuses on solving the adoption gap." Lexio is a tool for everyday analytics, almost like having a personal assistant. Lexio highlights changes in data, provides status updates, and suggests action items. It's like having a data-driven daily newspaper about your business.
Arria is a leader in the field of natural-language generation. The company has packaged their NLG expertise and research into a family of products for use in analytics and other data-driven applications. In each case Arria is able to generate concise textual representations of either event or time series data.
The main Arria offering is an authoring studio, where expert users can guide the NLG system and prepare text-generating endpoints. These endpoints can then be leveraged in various contexts, through integrations with BI and RPA platforms, Excel, and more. Arria provide solutions for natural language queries and question-answer functionality as well. This great demo shows Arria in action as a Power BI module.
Yseop, a French company, is another NLG pioneer with a similar approach. Yseop's solutions focus on combining NLG with process automation, especially in the financial and medical industries. The company's business intelligence plugins bring NLG to analytics dashboards in Power BI, Tableau and other platforms.
In addition to supporting third party modules, several integrated BI platform vendors have implemented their own systems to support data story authoring.
The Yellowfin platform has built-in support for data storytelling, including a handy presentation mode. There's functionality for automatic analysis and insight generation. The Stories and Present modules make story authoring as easy as blog writing, but with deep integrations into the platform's rich data visualisation capabilities. This demo shows how Yellowfin's data storytelling modules function. Gartner rates Yellowfin's data storytelling highly.
Seeing where the industry is moving, Microsoft added smart narratives to Power BI. This component supports automatic aggregates and trend analysis, as well as powerful ways of editing the generated narrative with formulas.
Tableau's Explain Data uses "powerful Bayesian statistical methods" to generate explanations for data, identifying causes and relationships. The tool "helps you make sense of the outliers, aberrations and anomalies". These explanations are mini data stories that can then be embedded into workbooks within the platform. In general, Tableau is very much invested in the data storytelling game.
GoodData believe in data storytelling as well, as highlighted by their recent platform upgrades. The emphasis seems to be on making the dashboard work as a data story authoring tool. They have a data storytelling ebook as well.
Another approach to generating data stories is to start with a dashboard template, fill the template with your data and — lo and behold — you have a basic data story to share. Incorta Blueprints is an example of this common pattern, with the templates tailored for different platforms. Sisense has an extensive library of dashboard examples as well.
There are many ways to implement the same blueprint/template idea. The data visualisation wizards at Observable recently launched Observable Templates, which builds on Observable's forking-based, community-driven approach to data communication. The user again just fills in the template and has a basic data visualisation foundation to build on. For Observable, Templates is very much a way to get more people to try out the platform.
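The blueprint/template mechanism itself is simple: a story skeleton with named slots, filled from a data source. A minimal sketch (the team, metric names and thresholds here are all invented for illustration; vendors' templates of course add charts, layout and branding):

```python
# Minimal sketch of the blueprint/template pattern: a story skeleton with
# named slots, filled from a dict of data. Names and figures are invented.
from string import Template

story_template = Template(
    "Weekly report for $team\n"
    "Sessions: $sessions (target: $target)\n"
    "Status: $status"
)

def fill_story(data):
    # Derive a simple narrative judgement, then substitute all slots.
    status = "on track" if data["sessions"] >= data["target"] else "behind target"
    return story_template.substitute(data, status=status)

print(fill_story({"team": "Growth", "sessions": 4200, "target": 4000}))
```

The appeal of the pattern is exactly this separation: the story structure is authored once by an expert, while each user only supplies data.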
"According to Gartner, data storytelling will be the most widespread means of consuming analytics by 2025. In addition, by then a full 75% of data stories will be automatically generated using augmented intelligence and machine learning rather than generated by data analysts."
To wrap things up, we'll take a look at the wider business analytics sector. Many players in the business analytics ecosystem are not direct solution vendors. There are various entities that simply help organisations make the most of the data they have access to. In this section we'll look at some of these players and their roles in the analytics game.
Naturally, this merely scratches the surface of the ecosystem — many of these parts are whole major industries in themselves.
Solution delivery is a vital part of the analytics software business. All modern business solutions are powered by the cloud, but on-premise installations have not gone away entirely. However, SaaS, software-as-a-service, is the currently dominant delivery model. Some SaaS tools are cloud-native: you simply integrate your data stores with a tool hosted elsewhere. Other tools you host yourself in a bring-your-own-licence (BYOL) scheme.
All major cloud providers have their own marketplace: there's one for AWS, Google Cloud, and Microsoft Azure. Among the challengers, Alibaba's marketplace looks to be the one with the most traction. Other cloud players — Oracle, IBM, Tencent — may have something to offer as well.
Third party data can enhance business analytics output significantly. External data can add detail to internal data sets, and can help add context to reporting. The challenge is in finding the right data and making effective use of it. The data you want may not be available, and the data that is available might not be useful. Data quality is also a major concern.
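At its simplest, enrichment means joining internal records with an external feed on a shared key — and handling the records the external source cannot cover, which is precisely the availability problem mentioned above. A hedged sketch with entirely made-up data:

```python
# Sketch of third-party data enrichment: join internal customer records
# with a hypothetical external firmographic feed on a shared key, and
# flag records the external source cannot cover. All data is invented.

internal = [
    {"customer_id": "C1", "revenue": 12_000},
    {"customer_id": "C2", "revenue": 8_500},
]
# Hypothetical external feed, keyed on the same customer id
external = {"C1": {"industry": "Retail", "employees": 250}}

def enrich(records, feed):
    enriched = []
    for rec in records:
        extra = feed.get(rec["customer_id"])
        # Merge external attributes when available; always record coverage.
        merged = {**rec, **(extra or {}), "enriched": extra is not None}
        enriched.append(merged)
    return enriched

for row in enrich(internal, external):
    print(row)
```

The coverage flag matters in practice: downstream reporting should distinguish "no external match" from genuine values, or the enriched data quietly skews analysis.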
Selling data is not a great business, but that hasn't stopped people from trying. There are data providers for every industry, and various aggregators and other players as well trying to connect data vendors and data buyers.
In the early days of social apps and smartphones, there was lots of hype around location data. Everybody was looking for new businesses that could arise from mobile data. Foursquare, originally a local social app, has since come to dominate the location data business. Foursquare merged with Factual in 2020. Placed is part of Foursquare as well.
Loqate is focused on address verification, which is highly useful for cleaning up contact details from user supplied forms and other data sources. Inrix provide data and services around traffic and location data.
Social media is a huge source of rich, unstructured data. The social media giants have their own data services, Twitter Enterprise being an example. LinkedIn, Facebook, Instagram — all the major players have their own advertisement data services and tools. Vendors like Keyhole, Union Metrics and TweetBinder can help you figure out what's happening in social. DataSift takes human-generated data and transforms it for use in business processes.
From social to society. Several organisations are looking at world data, providing a kind of data journalism public service. Gapminder and Our World in Data in particular are worth checking out. Knoema is a major data aggregator, offering an atlas of world data drawn from a range of public sources.
Knoema also have an extensive library of so-called alternative data resources: non-reporting data that could help financial analysts evaluate investment opportunities. This can mean anything from card transactions to shipping data and from satellite images to weather data. AlternativeData.org lists over 400 data providers. Consulting shops like Neudata lubricate the ecosystem by connecting alt data vendors and buyers.
There are numerous vendors for company financial data. Bloomberg and Thomson Reuters are almost household names. Pitchbook has data on private capital markets. Credit ratings data for financial entities is a major industry, with vendors such as Fitch, Moody's and S&P. Credit is also of interest at the level of the individual: credit reference data is big business as well.
Sometimes data needs complex processing to make it useful for particular applications. Appen is in the data annotation business, providing training data, data collection, smart labelling and linguistics services.
Finally, there are countless publicly available data sets online. Kaggle, the data science hub, has a wealth of data sets and runs competitions for analysing them. Google Dataset Search is perhaps the leading search engine for data sets.
Management consultants sell you digital transformation — again and again. They may not be able to deliver a solution, but they are generally pretty good at articulating problems. The leading management consulting boutiques are almost household names, yet, working in the shadows, they somehow manage to evade most mainstream press.
Consultants know what business looks like today and tomorrow. They know what the leading companies are doing and where even long-standing businesses may struggle. Consultants know what companies and whole industries are going through. It is in their interest to push industries forward to new technology, new processes, and new ways of working and thinking. The idea is to sell premium services to those who seem to not be keeping up.
Focusing on business analytics consulting, we have major players on the scene. McKinsey have their Analytics function and their solutions arm QuantumBlack, acquired in 2015. Boston Consulting Group, BCG, have their own approach to data and analytics, complete with their technology department GAMMA. Lighthouse, by BCG GAMMA, is a nice example of the kind of platform solution one might expect from a consulting shop.
Professional services giant Ernst & Young have their own digital department, EY Digital. The stated mission of the group is "unlocking human potential" and "accelerating new and better ways of working". Big Data and analytics is a major part of this effort. The focus is always on a holistic digital transformation of the entire business, from the front line to the board room.
KPMG want you to stay ahead with analytics in the cloud. KPMG are keen on working with established, more traditional players, with complex needs and lots of legacy data infrastructure. Converging on the cloud is the way.
Deloitte are big on alliances with well-established major analytics vendors from SAS to SAP, but without forgetting smaller players. The game is all about building insight-driven organisations and digital twins and APIs and a comprehensive digital strategy.
PricewaterhouseCoopers, PwC, have their own consulting services department, Digital, under the curious Strategy& brand. Company culture is a first class citizen at PwC: the Katzenbach Center helps organisations figure out company culture and new ways of working.
Booz Allen Hamilton have lots of experience working with the US public sector. Booz Allen have a long history of military and intelligence contracting, the lessons of which they have encapsulated in their Experiential Analytics solution.
There are also numerous consulting shops specialising in data science and AI. Faculty, formerly ASI Data Science, is an example of a high profile data science consulting boutique. They have an interesting fellowship program, which seeks to train and place data science talent.
Consultants are not far away when political organisations seek to use data effectively in their campaigns. Cambridge Analytica made headlines back in 2018, when the extent to which Facebook data was used in political advertising and targeting was brought to light. These events increased public interest in privacy and social media's influence on politics. The Simulmatics Corporation was an early pioneer of this data science niche.
Industry-wide business analytics research takes many shapes. All research organisations compete on visibility and influence when it comes to presenting their views on where the industry is and how things are changing.
For starters, the business schools of the world will tell you what you are doing wrong. Academic research in business analytics is plentiful, to say the least. Fortunately, major business and management departments typically have their own magazines and other public output, which is often far more accessible.
Harvard Business Review is probably the most influential university published magazine on business and management. HBR covers business analytics, digital transformation and related subjects surprisingly often. MIT Sloan School publication series Ideas Made To Matter is another quality resource, focused on data-driven insights.
The business press sometimes covers business analytics and related issues, particularly in the form of opinion pieces. Forbes really stands out here. Bloomberg Businessweek and The Wall Street Journal have some coverage as well.
Big Data and analytics vendors frequently make headlines in the tech and startups press. TechCrunch and VentureBeat are always at it. Outfits like BusinessInsider and Wired sometimes cover related phenomena.
On the more in-depth end of the spectrum we have industry and market research produced by independent research organisations. At their best, the reports produced by these organisations can be highly enlightening and influential. At the same time, the research from these commercial players can also be skewed due to their own interests and incentives. There are certainly some unhealthy dynamics in this space.
Gartner and IDC lead the way in business analytics market research. Smaller operators, such as Forrester, Valuates, ABI Research, Markets and Markets and Statista, all produce competing perspectives on related industries. Market research shops are pretty good at inventing new categories and spotting trends in how industries evolve. Some of these boutiques are rumoured to practice pay-to-play arrangements, which may bias their reporting.
The public sector can be another source of market data and other insights. Consider this fascinating inquiry into the Salesforce/Tableau merger back in 2019. The Competition and Markets Authority looked at a number of factors, including market share, to determine that no substantial lessening of competition would result from the merger.
Like any other industry, the data and business analytics community has its own trade shows. Datanami, an industry news aggregator, has a handy events calendar. They also have a useful mini yellow pages section for vendors.
Market research companies frequently run their own conferences. Gartner, for example, organise numerous events for a range of interest groups — both virtual and in-person. Major cities have their own local gatherings as well. London has Big Data London, for example.
If we move on to online communities, we have lots of options. For example, people like to review and compare vendors and solutions on YouTube, and analytics is no exception. Everyone is out to make recommendations based on their own experience. At their best, these can be a nice proxy for broader customer sentiment. The coverage can be rather haphazard, though, and people also tend to be rather loose about disclosing biases or even conflicts of interest.
Here's a sample of business analytics YouTube reviews:
- Dashboard Tools for Startups — Comparing Databox, Chartio, and Klipfolio on seven basic criteria.
- 4 BI Tools evaluated — A vampire shares his thoughts on Power BI, Tableau, Sisense, and Google Data Studio.
- Microsoft Power BI vs. Google Data Studio — Two popular solutions compared and contrasted.
Starting from market research and taking a step towards customers, you arrive in the wild plains of crowd-sourced review sites. The idea is to combine expert reviews and comments from people who claim to have first-hand experience of using products, which in our case means business analytics systems. The framing is such that the content is useful for a confused prospective customer or user. As with market research, players may need to establish at least some kind of relationship with highly visible review sites.
Finances Online have a nice review page for business intelligence software. They cover several vendors, both large and small, and offer some pointers on what solutions are popular, what their evaluation criteria were, what you should look out for, and so on. Technology Advice have a similar site, with a great collection of screenshots. The writing is crisp as well. BI-Survey is a slick review site dedicated to business intelligence.