What Are Columnar Databases?
A column data store is also known as a column-oriented DBMS or columnar database management system. Column storage DBMS stores data in columns rather than rows. Relational database management systems (RDBMS) store data in rows and data properties as column headers. Both row-based and column-based DBMSs use SQL as their query language, but column-oriented DBMSs may offer better performance. Imagine you need to list all names from a table based on an ID; rather than going through all the rows, you could just access a single column of the table.
Here are some of the key characteristics of column data store DBMS
- Column store DBMS uses a keyspace that is like a database schema in RDBMS.
- Column store DBMS has a concept called a column family. A column family is like a table on RDBMS. The keyspace contains all the column families in a database.
- A column family contains multiple rows. Each row has a unique key called Row Key, which is a unique identifier for that row.
- Each column in column store databases has a Name, Value, and TimeStamp fields.
- Each row can contain a different number of columns. All rows don’t have to have the same columns.
- Each column can contain multiple rows. All rows don’t have to have the same data type or size.
Keyspace
Column store DBMS uses a keyspace that is like a database schema in RDBMS. The keyspace contains all the column families. A keyspace name can be a CMS database that stores user profiles, documents, and documents metadata.
Column Family
A column family is like a table in RDBMS. A keyspace can have multiple column families. For example, a keyspace can have column families AuthorProfile, MemberProfile, Article, Blog, and Question.
Row Key
A column family contains multiple rows. Here is an example of a row of a column family. The first item in a row is a single value row key that is a unique key to identify a row.
Column
Each row can have multiple columns. A column in a column data store contains actual values. The data in a column store database is stored in a key/value pair with a timestamp. Each row can have a different number of columns.
AuthorProfile
The following is an AuthorProfile column family with three rows. Each row has a different number and type of columns. As you can see from the data, each row has a unique Row key.
Three rows of the column family are Mahesh, David, and Allen. The first row has three columns, Gender, Expertise, and Rank. The second row has two columns, Gender and Book. The third row has three columns, City, Book, and Rank.
Key benefits of column store databases
Key benefits of column store databases include faster performance in load, search, and aggregate functions. Column store databases are scalable and can read billions of records in seconds. Column store databases are also efficient in data compression and partitioning.
Popular Column databases
Some of the popular column-oriented DBMSs include Bigtable, Cassandra, HBase, Druid, Hypertable, MariaDB, Azure SQL Data Warehouse, Google BigQuery, IBM Db2, MemSQL, SQL Server, and SAP HANA.
Bigtable
Google Bigtable is a petabyte-scale, fully managed NoSQL database service for large analytical and operational workloads.
Key features include
- Low latency, massively scalable NoSQL
- Consistent sub-10ms latency
- Replication provides higher availability, higher durability, and resilience in the face of zonal failures
- Ideal for Ad Tech, Fintech, and IoT
- Storage engine for machine learning applications
- Easy integration with open-source big data tools
Cassandra
Apache Cassandra NoSQL database is highly scalable and highly available without compromising performance. Linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple data centers is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
HBase
Apache HBase is an open-source, distributed, scalable, NoSQL big data store that allows billions of rows of big data access in seconds.
Key features include
- Linear and modular scalability
- Strictly consistent reads and writes
- Automatic and configurable sharding of tables
- Automatic failover support between RegionServers
- Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables
- Easy to use Java API for client access
- Block cache and Bloom Filters for real-time queries
- Query predicate push down via server-side Filters
- Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options
- Extensible JRuby-based (JIRB) shell
References
Next Article >> What are NoSQL Databases
To learn more about these popular graph databases, visit Most Popular Graph Databases.