An Overview of NoSQL Databases

NoSQL Database Explained

NoSQL, also read as "Not only SQL", refers to databases that let developers store and manage unstructured data and perform complex analytical operations on it.

A wide range of NoSQL databases is now available, and developers can choose among them according to their requirements. Companies and developers no longer need to stay confined to a single kind of database platform.

NoSQL databases were first adopted by companies such as Amazon (DynamoDB) and Google to solve real problems. These companies found that SQL databases did not meet their requirements and decided that they needed a different solution.
They first tried the traditional approach of upgrading to faster hardware. When even that did not work, they tried to scale their existing relational solutions by denormalizing the schema. NoSQL databases store data in denormalized form and use different models to store the data depending on the requirements, as explained later in this blog.

Key Characteristics of NoSQL Database

Application developers faced many problems because of the mismatch between the in-memory data structures of applications and the relational data structure. With NoSQL databases, developers do not need to convert in-memory structures to a relational structure. The database can also be used as an integration point for the application.
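To make the mismatch concrete, the sketch below takes one in-memory object and shows the two shapes it can take: flattened into relational-style rows, or kept whole as a single JSON value. The order, its field names, and the `to_rows` helper are illustrative assumptions, not part of any particular database API.

```python
import json

# A hypothetical in-memory order with nested line items (illustrative data).
order = {
    "order_id": 1001,
    "customer": "alice",
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def to_rows(order):
    """Flatten the nested order into rows for two relational-style tables."""
    order_row = (order["order_id"], order["customer"])
    item_rows = [
        (order["order_id"], item["sku"], item["qty"])
        for item in order["items"]
    ]
    return order_row, item_rows


# Relational shape: one row for "orders", several rows for "order_items".
order_row, item_rows = to_rows(order)

# Aggregate shape: the whole object is written and read back as one value.
stored = json.dumps(order)
restored = json.loads(stored)
```

In the relational shape the application has to split the object on write and reassemble it on read. In the aggregate shape it stores and loads the object as it already exists in memory.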

Relational databases were not designed to run well on clusters.
The storage needs of an ERP application are very different from the data storage needs of Facebook and other such applications.

Organizations are shifting to NoSQL databases to achieve higher scalability, higher speed, and continuous availability.

Features of NoSQL Database

Need for Speed - When a very fast response time is required, the data should be kept in memory, so we have to choose a database that stores its data in memory.
Need for Scale - As the number of users and the volume of data grow, organizations require databases that scale easily.
Need for Continuous Availability - Slow performance can drive customers away, and nothing is worse than downtime. There is a difference between the high-availability approach that an RDBMS offers with a master-slave architecture and the continuous availability that NoSQL databases like Cassandra offer: no downtime, because redundant copies of the data are spread throughout a cluster across multiple locations.
Need for Location Independence - The ability to serve data quickly to multiple locations is critical. Because of its fundamental master-slave design, an RDBMS struggles to provide fast read access in many locations.

NoSQL databases can easily be spread across multiple data centers and cloud availability zones.

For example, Adobe uses DataStax Enterprise with an Apache Cassandra database cluster spanning two data centers to ensure that its customers can read and write data quickly, no matter where they are located.

A NoSQL database like Cassandra offers a much more flexible data model that can easily store structured, semi-structured, and unstructured data.

Moving From Relational Database to NoSQL Database

New Applications

Many teams whose applications were built on SQL begin with NoSQL by creating a new application from the ground up. Taking this route for an existing application, however, means rewriting it.

Augmentation (a process of making greater or larger in size)

Some choose to augment an existing system by adding a NoSQL component to it. This often happens with applications that have outgrown an RDBMS because of scaling issues, the need for better availability, or other issues. Part of the application continues to use the existing RDBMS, while other components of the application are modified to use the NoSQL database.

Full Rip-Replace

Some systems simply prove too costly to keep on an RDBMS, or cannot keep up with an increase in user concurrency. For these, a full replacement with a NoSQL database is done.

Requirements To Move From RDBMS To NoSQL Database

RDBMSs were not designed to scale out across many machines.
They handle things like foreign keys and maintain relations over the entire data set. The problem is handling that data, together with its foreign key relationships, on a large set of machines.
According to the CAP theorem, only two of the three properties (consistency, availability, and partition tolerance) can be achieved at once. If consistency is an absolute requirement, we have to give up one of the other two. Because RDBMSs follow ACID (Atomicity, Consistency, Isolation, Durability), they are difficult to scale. Almost all data stores handle things like -
- Concurrency
- Queries
- Transactions
- Schema
- Replication
- Scaling

Performance and scalability - The two are often at odds with each other; increasing one can decrease the other. Performance is about how we execute the same set of requests, over the same set of data, with -

Shorter time
Fewer resources
There is also a tradeoff between resource usage and processing time. In general, we can reduce processing time by consuming more resources. Conversely, we can reduce resource usage by accepting a longer processing time.
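One small illustration of this tradeoff is caching: keeping results in memory (more resources) so that repeated requests do less work (less time). The sketch below uses Python's standard `functools.lru_cache`; the functions and the request list are made up for the example, and the call counters stand in for processing time.

```python
from functools import lru_cache

# Counts how many times each version actually does the work.
calls = {"plain": 0, "cached": 0}


def slow_square(n):
    """Stand-in for an expensive computation; recomputes on every request."""
    calls["plain"] += 1
    return n * n


@lru_cache(maxsize=None)  # keeps every result in memory
def cached_square(n):
    """Same computation, but results are remembered between requests."""
    calls["cached"] += 1
    return n * n


# The same set of requests over the same data, executed both ways.
requests = [3, 4, 3, 4, 3]
plain_results = [slow_square(n) for n in requests]
cached_results = [cached_square(n) for n in requests]
```

Both versions return the same results, but the cached version does the work only once per distinct input, at the cost of holding the results in memory.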

Types of NoSQL Databases

NoSQL covers several types of data stores, each with a different way of storing data. Some of them are explained below:

Key-Value Database

A key-value store, or key-value database, uses an associative array (such as a map) in which each key is associated with one and only one value in a collection. This relationship is referred to as a key-value pair.
In a key-value pair, the key is represented as an arbitrary string, such as a hash value.
The value is stored as a BLOB.
Storing the value as a BLOB removes the need to index the data, which improves performance, but it also means we cannot control what is returned from a request based on the value.
Key-value stores do not have a query language. They only allow data to be stored, retrieved, and updated using simple get, put, and delete commands, and the data is retrieved by making a direct request to the object in memory or on disk.
Examples of key-value stores include Memcached and Riak.

Key-value databases are not all the same; there are important differences between them.

For example, Memcached data is not persistent, while Riak data is. If Memcached is used to cache user preferences, all the cached data is lost when the node goes down and has to be refreshed from the source system.

If we use Riak, we may not need to worry about losing data and can focus only on how to update it. It is important not only to decide that a key-value database fits your requirements but also to choose which key-value database to use.
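The Memcached scenario above can be sketched without any real cache server. In this minimal example a plain dictionary plays the role of the non-persistent cache, and `SOURCE_SYSTEM`, `VolatileCache`, and the preference data are hypothetical names chosen for illustration.

```python
# Hypothetical source system holding the durable copy of user preferences.
SOURCE_SYSTEM = {
    "user:1": {"theme": "dark"},
    "user:2": {"theme": "light"},
}


class VolatileCache:
    """In-memory cache that loses everything when the node restarts."""

    def __init__(self):
        self._data = {}
        self.misses = 0

    def get_preferences(self, key):
        if key not in self._data:
            # Cache miss: fall back to the source system and repopulate.
            self.misses += 1
            self._data[key] = SOURCE_SYSTEM[key]
        return self._data[key]

    def restart(self):
        """Simulate the node going down: all cached data is gone."""
        self._data.clear()


cache = VolatileCache()
cache.get_preferences("user:1")          # miss, loaded from the source
cache.get_preferences("user:1")          # hit, served from memory
cache.restart()
prefs = cache.get_preferences("user:1")  # miss again after the restart
```

After the restart the data is still available, but only because the source system can supply it again. A persistent store would not need that refresh.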

Queries - Queries are performed only on the basis of the key.
Schema - Data is stored as key-value pairs. The data corresponding to a key is stored in the format of a BLOB.
Scaling up - The keyspace is sharded. For example, data for keys starting with A goes to one server and data for keys starting with B goes to another. This exposes the system to data loss if a server goes down.
Replication - Data is written to multiple machines. If there are two servers in the cluster, the value of key “ABC” can end up being two different things on the two servers. Resolving this is a complex issue, and it creates problems during updates.
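The scaling behaviour described above can be sketched as follows. This is a toy model, not how any specific product routes keys: each "server" is a dictionary, keys are assumed to be non-empty strings, and the `ShardedStore` class with its first-character routing rule is an assumption made for illustration.

```python
class ShardedStore:
    """Toy sharded key-value store: each key lives on exactly one server."""

    def __init__(self, server_count):
        # Each "server" is just a dict in this sketch.
        self.servers = [dict() for _ in range(server_count)]

    def _server_for(self, key):
        # Route on the first character, so with two servers keys starting
        # with "A" and keys starting with "B" land on different servers.
        return ord(key[0].upper()) % len(self.servers)

    def put(self, key, value):
        self.servers[self._server_for(key)][key] = value

    def get(self, key):
        return self.servers[self._server_for(key)].get(key)

    def fail_server(self, index):
        """Simulate a crash: without replicas, that server's data is lost."""
        self.servers[index].clear()


store = ShardedStore(server_count=2)
store.put("Apple", 1)
store.put("Banana", 2)

apple_server = store._server_for("Apple")
banana_server = store._server_for("Banana")

# Losing one server loses only the keys that were routed to it.
store.fail_server(apple_server)
```

Adding replicas would protect against this loss, but it introduces the problem of keeping the copies of a key consistent during updates.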

Uses of Key-Value Store

Key-value stores are used when we need to access data such as -

session
shopping cart info

For example, in a shopping mall, information about a particular product is stored under a particular key. When the product is scanned, its barcode is used as the key to access all the information for that product.

A key-value database allows us to read and write values as follows -

Get(key) returns the value associated with the provided key.
Put(key, value) associates the value with the key.
Multi-get(key1, key2, ..., keyN) returns the list of values associated with the list of keys.
Delete(key) removes the entry for the key from the data store.
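The four operations above can be sketched as a minimal in-memory store. This is only an illustration backed by a Python dictionary: the `KeyValueStore` class, its method names, and the sample keys are assumptions for the example, and real key-value databases expose their own client APIs.

```python
class KeyValueStore:
    """Minimal in-memory key-value store; values are treated as opaque blobs."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Associate the value with the key, replacing any previous value."""
        self._data[key] = value

    def get(self, key):
        """Return the value for the key, or None if the key is absent."""
        return self._data.get(key)

    def multi_get(self, *keys):
        """Return the values for the given keys, in the same order."""
        return [self._data.get(key) for key in keys]

    def delete(self, key):
        """Remove the entry for the key; do nothing if it is absent."""
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", b'{"user": "alice"}')
store.put("cart:42", b'["sku-1", "sku-2"]')

session = store.get("session:42")
both = store.multi_get("session:42", "cart:42")
store.delete("cart:42")
missing = store.get("cart:42")
```

The store never looks inside the values. Everything is addressed by key, which is why sessions and shopping carts, which are naturally looked up by an identifier, fit this model well.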

Document Database

Document-based databases are similar to key-value databases in that they store data as key/value pairs. The difference is that the value is stored as a document in a format such as XML, JSON (JavaScript Object Notation), or BSON (a binary encoding of JSON objects).
The database understands the format of the data, so operations on it can be performed easily.
It allows the storage of complex data, which makes it a good choice for storing trees, collections, and dictionaries.
It does not support relations. Each document is standalone, but it can refer to other documents by storing their keys.
Document-based databases do not support joins, which largely avoids the problem of sharding the data across multiple nodes.
Some document-based databases are -
- MongoDB
- CouchDB
- Terrastore
- OrientDB
- RavenDB

Queries - Documents can be retrieved by key, as in key-value stores. We can also perform range queries on the basis of a key.

Transactions - Most document-based databases support transactions for a single document.

Schemaless - Schemaless means that no schema is required to store the data. Each document can have a different set of fields. The database understands the document format, such as JSON.

Scaling up - In this kind of database, each document is independent and joins are not supported, so the data can easily be sharded across multiple nodes that are independent of each other.
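The document model described in this section can be sketched with plain Python dictionaries standing in for JSON documents. The `DocumentStore` class, its `insert`, `get`, and `find` methods, and the sample documents are hypothetical and only meant to show the ideas: documents with different fields, a reference stored as a key, and the application following that reference itself because there are no joins.

```python
import uuid


class DocumentStore:
    """Toy schemaless document store; documents are plain dicts."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc):
        """Store a document under a generated key and return that key."""
        key = str(uuid.uuid4())
        self._docs[key] = doc
        return key

    def get(self, key):
        """Return the document stored under the key, or None."""
        return self._docs.get(key)

    def find(self, field, value):
        """Return documents whose field equals value; other docs are skipped."""
        return [doc for doc in self._docs.values() if doc.get(field) == value]


store = DocumentStore()

# Two documents with different fields live side by side; no schema is declared.
author_key = store.insert({"type": "author", "name": "Alice"})
post_key = store.insert({
    "type": "post",
    "title": "Intro to NoSQL",
    "tags": ["nosql", "databases"],
    "author_key": author_key,  # reference to another document by its key
})

# No joins: the application follows the stored key with a second lookup.
post = store.get(post_key)
author = store.get(post["author_key"])
posts = store.find("type", "post")
```

Because each document is self-contained and related documents are fetched by key, the documents could live on different nodes without the store needing to coordinate a join across them.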