

    NoSQL database Assignment Help



    MongoDB and Cassandra are both open-source NoSQL databases: the former is document-oriented and the latter is designed for very large datasets. NoSQL databases are designed mainly to improve scalability, storage speed, data access speed and security [11]. They can run across a large number of nodes and offer features that were not possible with an RDBMS. Reads and writes issued at the same time do not conflict. The data is distributed in clusters over thousands of machines and is accessed through nodes or routers. This paper compares the two databases in terms of performance, storage, retrieval time, scalability, reliability and security. Their data models differ: MongoDB is a document store and Cassandra is a wide-column store. Cassandra originated at Facebook, was released as open source in 2008 and is now maintained by the Apache Software Foundation; MongoDB was developed by MongoDB Inc. Cassandra is implemented in Java and MongoDB in C++ [11]. Both databases are schema-free. Cassandra has no server-side scripting, whereas MongoDB uses JavaScript for server-side scripts.
    No distributed database can fully satisfy all three CAP requirements (consistency, availability and partition tolerance) at once. MongoDB follows CP, whereas Cassandra follows AP. Under CP, some data may be inaccessible during a network partition, but whatever is returned is accurate; under AP, every request is answered, but some of the returned data may be out of date. Cassandra is mostly used for IoT, recommendation engines, fraud detection, playlists, product catalogs and messaging applications. It belongs to the scalability-oriented class of NoSQL databases [4]. MongoDB, in turn, helps businesses transform themselves by harnessing the power of their stored data. Organizations ranging from startups to large companies use it to build applications that perform complex tasks. Cassandra requires less administration than MongoDB. This report presents the main aspects of both databases and compares them.
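    The difference between CP and AP behaviour can be illustrated with a toy model. This is only a sketch under stated assumptions: the `TwoReplicaStore` class, its `partitioned` flag and its method names are hypothetical and do not correspond to the API of MongoDB or Cassandra.

```python
class TwoReplicaStore:
    """Toy store with two replicas and a switchable network partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.primary_value = None
        self.secondary_value = None
        self.partitioned = False  # True while the replicas cannot talk

    def write(self, value):
        # The write reaches the primary; replication needs the network.
        self.primary_value = value
        if not self.partitioned:
            self.secondary_value = value

    def read_from_secondary(self):
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data.
            raise RuntimeError("unavailable during partition")
        # AP: always answer, even though the value may be out of date.
        return self.secondary_value
```

    If a write arrives while the two replicas are partitioned, the AP store still answers reads from the secondary with the older value, whereas the CP store raises an error until the partition heals.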
    2. MongoDB
    MongoDB can run as a single standalone instance. High availability is provided by replica sets, which handle failures [5]. A sharded cluster divides a large data set and stores the parts on different machines. Combining replica sets with sharded clusters provides high redundancy, and the distribution of the data is transparent to applications. The main features of MongoDB are given below:
    a) Fast, iterative development.
    b) Flexible data model.
    c) Multi-datacenter scalability.
    d) Integrated feature set.
    e) Lower TCO.
    f) Long-term commitment.
    g) Flexibility.
    Data Management for MongoDB
    Linear scalability
    MongoDB provides cost-efficient horizontal scale-out through sharding, a process that is transparent to applications. Sharding distributes the data across multiple partitions, known as shards. Deploying MongoDB in this way removes the bottlenecks that limit a single server [6] and reduces complexity. As the data grows, it is spread across the cluster and the cluster can be enlarged. Unlike in other databases, this whole process is handled automatically, so the application developer does not have to write any sharding logic. MongoDB also supports several sharding policies, which makes it easy for developers to distribute data across the resources of the cluster and gives high scalability for different workloads. The policies are given below:
    Range sharding
    MongoDB is mainly used to store documents, and these documents are partitioned across shards according to the value of the shard key. Documents whose shard key values are close to each other are very likely to be stored on the same shard.
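    A minimal sketch of range-based placement, assuming numeric shard keys and hypothetical split points (the names `SPLIT_POINTS` and `shard_for_key` are illustrative and are not MongoDB functions):

```python
import bisect

# Hypothetical split points: shard 0 holds keys below 100, shard 1 holds
# keys from 100 up to (but not including) 200, shard 2 holds the rest.
SPLIT_POINTS = [100, 200]


def shard_for_key(shard_key, split_points=SPLIT_POINTS):
    """Return the index of the shard whose range contains shard_key."""
    return bisect.bisect_right(split_points, shard_key)
```

    Because placement depends only on where the key falls between the split points, neighbouring keys such as 41 and 42 land on the same shard, which is what makes range queries efficient.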
    Hashed sharding
    Hashed sharding distributes documents according to an MD5 hash of the shard key value; the hash is used for placement, not for encryption. This spreads the data evenly across the shards [7].
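    The idea of hash-based placement can be sketched as follows. This is a simplified illustration, not MongoDB's actual algorithm: the cluster size `NUM_SHARDS`, the function name `hashed_shard` and the modulo step are assumptions made for the example.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size


def hashed_shard(shard_key, num_shards=NUM_SHARDS):
    """Map a shard key to a shard index via an MD5 digest."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    hashed = int.from_bytes(digest[:8], "big")
    return hashed % num_shards
```

    Unlike range placement, consecutive keys are scattered across the shards, which evens out the write load at the cost of less efficient range queries.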
    Zone sharding
    Zone sharding lets administrators define their own rules for data placement within a sharded cluster. Each zone is associated with a range of shard key values. The administrator can refine these ranges at any time, and the data is migrated accordingly [9].
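    A minimal sketch of zone-based placement, assuming the shard key is a year and that zones are half-open ranges of key values. The zone names, shard names and the `shards_for_key` helper are all hypothetical.

```python
# Hypothetical zones: each maps a half-open range [low, high) of shard-key
# values to the shards that are allowed to hold documents in that range.
ZONES = [
    {"name": "recent", "low": 2020, "high": 2030,
     "shards": ["shard-ssd-1", "shard-ssd-2"]},
    {"name": "archive", "low": 2000, "high": 2020,
     "shards": ["shard-hdd-1"]},
]


def shards_for_key(key, zones=ZONES):
    """Return the shards allowed to hold a document with this shard key."""
    for zone in zones:
        if zone["low"] <= key < zone["high"]:
            return zone["shards"]
    return None  # no zone covers the key, so any shard may hold it
```

    An administrator who changes the `low` or `high` bound of a zone changes which shards a key maps to; in a real cluster the affected data would then be migrated.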
    2.1 Architecture of MongoDB
    The diagram below shows the model of the MongoDB architecture. It contains application servers, config servers and sharded MongoDB instances deployed as replica sets. A sharded cluster consists of shards, config servers and query routers. The data is stored in shards, each of which is a replica set that provides data consistency and availability [10]. The router in the diagram is the query router: it handles queries and provides the interface to the applications used by clients. Through the router, clients reach the data held in the shards. The main task of the router is to direct each operation to the appropriate shards and return the results to the client. A cluster can have several routers, which gives fast access to the data and high availability.
    The config servers store the metadata of the cluster. This metadata maps the cluster's data set to the shards, and the routers use it to locate particular data in the shards. A sharded cluster has three config servers, as shown in the diagram.
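    The interplay between query routers and config servers described above can be sketched with two small classes. This is an illustrative model only: `ConfigServer`, `QueryRouter` and the chunk-map layout are hypothetical names, and plain dictionaries stand in for the shards.

```python
class ConfigServer:
    """Holds cluster metadata: which shard owns which key range."""

    def __init__(self, chunk_map):
        # chunk_map: list of (low, high, shard_name); ranges are [low, high)
        self.chunk_map = chunk_map

    def shard_for(self, key):
        for low, high, shard in self.chunk_map:
            if low <= key < high:
                return shard
        raise KeyError(f"no chunk covers key {key!r}")


class QueryRouter:
    """Routes each operation to the shard named by the config metadata."""

    def __init__(self, config_server, shards):
        self.config = config_server
        self.shards = shards  # shard_name -> dict acting as the shard's data

    def insert(self, key, document):
        self.shards[self.config.shard_for(key)][key] = document

    def find(self, key):
        return self.shards[self.config.shard_for(key)].get(key)
```

    The client only ever talks to the router; the router consults the metadata to decide which shard to contact.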

    2.2 Security
    During the last decade, hacking and data-security incidents have increased significantly. By 2021, it is predicted that cybercrime might cost the global economy $6.2 trillion annually. Industry faces a constant threat to the security of its data. Data plays a vital role in business growth and analysis, and it is the task of administrators to protect it from manipulation and theft. MongoDB includes security measures for defending itself, controlling access to data and detecting changes to the database. The diagram below gives an overview of this security.
    Authentication and access to the database can be secured through external mechanisms. These include LDAP, Kerberos, PKI certificates and Windows Active Directory. The Lightweight Directory Access Protocol (LDAP) is widely used in business networks to access distributed directory services [9]. A client that wants to use LDAP must log in to the server and follow the protocol.
    Authentication provides much of the security, but highly secure authorization services are required as well. In MongoDB, user permissions can be set according to the access each user needs, and they can also be managed within an LDAP server. Auditing is provided as well, and administrators can use the audit log to track access.
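    Permission checks combined with auditing can be sketched as below. This is a simplified illustration rather than MongoDB's implementation: the role table, the `is_allowed` function and the in-memory `audit_log` list are assumptions made for the example.

```python
from datetime import datetime, timezone

# Hypothetical roles and the actions each role may perform.
ROLE_ACTIONS = {
    "read": {"find"},
    "readWrite": {"find", "insert", "update", "remove"},
}

audit_log = []  # stand-in for a persistent audit trail


def is_allowed(user, role, action):
    """Check an action against the user's role and record the attempt."""
    allowed = action in ROLE_ACTIONS.get(role, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "allowed": allowed,
    })
    return allowed
```

    Every attempt is logged whether or not it succeeds, so an administrator can later review both granted and denied access.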
    Encryption is one of the oldest and most effective measures for data security. MongoDB uses it to encrypt data on the network, and a separate engine encrypts and protects stored data. These built-in features give MongoDB proper management and performance in data access and protection. Encrypted data can be accessed only by authorized users.
    3. Cassandra 
    Cassandra is a column-oriented, distributed, fault-tolerant, scalable and high-performance database [8]. High availability is difficult to achieve with big-data storage, so the data is partitioned and stored in different locations. Cassandra provides this high availability, along with other features that are given below:
    a) Handles large amounts of data (big data).
    b) Access is fast and random.
    c) The schema is variable.
    d) All nodes eventually see the same data.
    e) Data is processed and accessed quickly.
    f) Data must be partitioned and distributed.
    g) Availability is higher than in other databases.
    Availability, consistency and partition tolerance cannot all be fully achieved at once. Cassandra gives high availability at the expense of consistency. It was developed by Avinash Lakshman to power search in Facebook messaging. Every node in the database plays the same role, so there is no single point of failure. As in MongoDB, the data is distributed across clusters [6]. The administrator can configure all replication strategies as needed. The database is designed as a distributed system, so it can span multiple data centers and a large number of nodes.
    It is particularly well suited to disaster recovery. Each new machine added significantly increases read and write throughput. Data is replicated automatically to a number of nodes to provide fault tolerance, which also protects data in cloud deployments. Cassandra integrates with Hadoop, including MapReduce support, and is also supported by Apache Hive [2]. Cassandra has its own query language, CQL. It is an alternative to SQL that adds a layer hiding the details of the database structure. Drivers are available for Java (JDBC) and a number of other languages.
    3.1 Architecture of Cassandra
    The structure of Cassandra comprises the node, cluster, data center, table, commit log, mem-table and bloom filter [7]. This section describes the architecture. First, it should be understood that Cassandra was designed on the assumption that system failures can and do occur. It is distributed peer-to-peer, and all nodes are equal.
    Data is partitioned automatically when it is written to the database. Hence there is no single place where data is written sequentially; it can be placed on any node. Each write first goes to the commit log and is then also written to an in-memory structure, the mem-table [4]. The diagram below shows the architecture of Cassandra: there are two Cassandra clusters, each with web client access and a number of nodes. The cluster configuration is provided by a middle-tier architecture.
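    The write path described above (commit log first, then mem-table) can be modelled with a toy node. This is a sketch, not Cassandra code: the `ToyNode` class and its `memtable_limit` parameter are hypothetical, and the flush of a full mem-table into a sorted, immutable table is taken from Cassandra's general design rather than from the text above.

```python
class ToyNode:
    """Toy model of a node's write path: commit log, then mem-table."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []  # append-only record of every write
        self.memtable = {}    # in-memory structure holding recent writes
        self.sstables = []    # sorted, immutable tables created by flushes
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability comes first
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # A full mem-table is written out as a sorted table and then reset.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search flushed tables from newest to oldest.
        for table in reversed(self.sstables):
            for stored_key, value in table:
                if stored_key == key:
                    return value
        return None
```

    Because every write is appended to the commit log before anything else happens, the mem-table can be rebuilt after a crash by replaying the log.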
    The architecture of Cassandra also supports replication of data for fault tolerance and efficiency. 
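    Replica placement on a peer-to-peer ring can be sketched as follows. The example is an assumption-laden simplification: the `Ring` class and `token` function are hypothetical, MD5 is only a stand-in hash, and real clusters use configurable partitioners and replication strategies.

```python
import bisect
import hashlib


def token(value):
    """Hash a string to a position on the ring (MD5 is a stand-in here)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Places each key on replication_factor consecutive nodes of a ring."""

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds number of nodes")
        self.rf = replication_factor
        self.entries = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key):
        # Walk clockwise from the first node whose token follows the key.
        size = len(self.entries)
        start = bisect.bisect_right(self.tokens, token(partition_key)) % size
        return [self.entries[(start + i) % size][1] for i in range(self.rf)]
```

    With five nodes and a replication factor of three, every key is stored on three distinct nodes, so the data stays readable if one of them fails.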

    3.2 Security  
    Data security is critically important today. Industry needs data that cannot be manipulated or accessed by third parties. Administrators can create users who are given permission to access the database, using the CREATE USER command. Cassandra stores users and their passwords internally in the cluster. Its query language can also be used to drop or alter such users [4]. Administrators control permission management and can grant users different levels of permission for accessing data. Cassandra provides a number of security features, which are given below:
    3.2.1 Encryption on client to node
    This is an additional security option provided by Cassandra. SSL helps ensure that data is not compromised: communication between the client and the database cluster is protected with SSL encryption, which is configured independently in Cassandra. For additional security, the settings in the cassandra.yaml file can be overridden at the virtual machine level, where the configuration and protocol can be changed to suit the organization's security requirements. Cassandra uses SSL encryption for client-to-node traffic, node-to-node traffic and server certificates. Secure Sockets Layer protects the data sent from the client machine, and it likewise protects data transferred within the cluster. Certificates must be generated for all of these protections.
    3.2.2 Authentication 
    Cassandra also supports pluggable authentication. The authenticator setting in the cassandra.yaml file lets administrators enable it. By default the authenticator is AllowAllAuthenticator, which performs no authentication and requires no credentials. PasswordAuthenticator is also available for standard authentication, and its credentials are stored in encrypted form [8].
    3.2.3 Authorization 
    Authorization is configured with the authorizer setting in the cassandra.yaml file. By default it is set to AllowAllAuthorizer, which performs no permission checks and grants all permissions to every user. Cassandra lets administrators add security and adjust it to their needs, so it is flexible enough to reach the level of security that the organization requires [6].
    4. Performance Test Plan
    The test plan was carried out by installing virtual machines on the host machine and executing the operations in them. The tests were arranged according to benchmarking tools. Visual Studio was installed on the host operating system specified below. Two virtual machines were created on Visual Studio, and Ubuntu 14 was installed on both. One VM was equipped with Cassandra and the other with MongoDB. The insert operations were carried out with Java code written in Eclipse, and the execution time was recorded. The specifications are given below:
    4.1 Machine detail:
    a) Processor: Intel(R) Core(TM) i5-2410M
    b) Operating system: Windows 8.1
    c) Architecture: x64
    d) RAM: 3GB
    e) Virtualization software: Visual Studio
    4.2 MongoDB virtual machine
    a) Operating system: Ubuntu 14
    b) Memory: 1GB 
    c) Hard disk: 15GB
    d) Processor: 1
    4.3 Cassandra virtual machine: 
    a) Operating system: Ubuntu 14
    b) Memory: 1GB
    c) Hard disk: 15GB
    d) Processor: 1
    Step 1: Install Visual Studio on the machine and install Ubuntu.
    Step 2: Install and set up MongoDB and Cassandra in two different virtual machines.
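    The timing of insert operations can be sketched with a small harness. The original test used Java code, which is not reproduced in this report; the Python sketch below is only an illustration, and the `time_inserts` function and the list standing in for a database table are assumptions.

```python
import time


def time_inserts(insert_one, records):
    """Insert every record with insert_one and return elapsed seconds."""
    start = time.perf_counter()
    for record in records:
        insert_one(record)
    return time.perf_counter() - start


# Stand-in for a database client: a plain list plays the role of the table.
# A real run would pass a function that calls the database driver instead.
fake_table = []
records = [{"id": i, "name": f"user{i}"} for i in range(10_000)]
elapsed = time_inserts(fake_table.append, records)
print(f"inserted {len(fake_table)} records in {elapsed:.4f} s")
```

    Passing the insert function as a parameter keeps the harness identical for both databases, so only the driver call differs between runs.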
    5. Evaluation and Results
    Both databases were evaluated by inserting data and recording the time in the table given below:
    a) Experimental results:
    b) Operation Table: 
    The chart is obtained from the table above. The analysis is based on the insertion of records.
    The workload graph below is for a workload of 50% reads and 50% updates on 600,000 records. The value for Cassandra is low compared to MongoDB, as shown below:

    Cassandra executes much faster than MongoDB for the workload of 50% reads and 50% updates. For search operations Cassandra also comes first, as it is designed for such operations [8].
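    A workload that mixes reads and updates in equal proportion can be generated as in the sketch below. It is an illustration only: the `run_mixed_workload` function, the dictionary standing in for the database and the fixed random seed are assumptions, not the benchmark tool used for the results above.

```python
import random


def run_mixed_workload(store, num_operations, read_fraction=0.5, seed=0):
    """Apply a random mix of reads and updates to a dict-like store."""
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    keys = list(store)
    counts = {"read": 0, "update": 0}
    for _ in range(num_operations):
        key = rng.choice(keys)
        if rng.random() < read_fraction:
            _ = store[key]  # read the current value
            counts["read"] += 1
        else:
            store[key] = store[key] + 1  # update the value in place
            counts["update"] += 1
    return counts
```

    Timing a call to this function against each database client would give one data point of the kind plotted in the workload graph.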
    This report compared MongoDB and Cassandra. An instance of each database was installed and configured on Ubuntu in two separate virtual machines. Records were inserted and the results were analyzed for both databases. It was concluded that for inserting huge amounts of data, that is, for writes, Cassandra performs much faster than MongoDB. The security measures of the two databases are not directly comparable: Cassandra gives administrators a number of options for setting the level of security, whereas MongoDB relies on its default security measures. Both databases perform authentication and authorization at the user level. For very large amounts of data, Cassandra should be used.