Security Challenges of Big Data
Explaining big data
When a computer operates and transmits messages, those messages are passed and stored as symbols, quantities and characters in the form of electric signals, and are recorded on optical or magnetic recording media. These are called data, or computer data. When such data grow so large and complex that ordinary data storage tools and systems can no longer maintain them, they are called big data (Kar & Bharti, 2018). Big data needs specialised storage capacity so that it can be maintained and transmitted.
The modern world runs on complex transaction processes, and the quantity of data exchanged is large, especially in large-scale industries. Big data is also used for research and for the exchange of information on certain platforms. A stock exchange is one example: according to specialists, the New York Stock Exchange generates about a terabyte of data on new trades every day. Social media platforms are another example (Quinn & Quinn, 2018). It is estimated that platforms such as Snapchat and Facebook generate several hundred terabytes of new data every single day, mainly chats and blogs, videos, photo uploads and comments posted on the platform. Jet engines are also prolific data generators: a single engine in flight is estimated to produce terabytes of data, so with around a thousand flights every day the jet engine data reaches several petabytes.
Big data can be found in three forms: structured, unstructured and semi-structured.
Data that can be processed, stored and accessed in a fixed format is called structured data. The format of such data is normally known in advance and is well defined, so people can derive meaning from it quickly and access it conveniently. However, structured data can still grow too large to manage, and it then falls under big data.
When the form of the data is unknown or unstructured, and its volume is so vast that it is almost unmanageable, it is classed as unstructured big data. A mixed collection of text files, videos and images is a typical example of big unstructured data (www.sas.com, 2019). Many organisations now depend on such data for their daily business and records. The results returned by Yahoo or any other search engine are another example of unstructured big data.
Semi-structured data
Semi-structured data contains both forms, structured and unstructured. It looks structured, but it does not follow a strictly defined schema. Data stored in an XML file is an example of semi-structured data.
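As a small illustration of this point, the sketch below parses a made-up XML fragment with Python's standard xml.etree.ElementTree module. The element names and values are hypothetical; the only point is that every record is tagged (so there is structure), yet the records do not share one fixed set of fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: both records are tagged, but their fields differ.
XML_DATA = """
<people>
  <person><name>Asha</name><age>35</age></person>
  <person><name>Ravi</name><email>ravi@example.com</email></person>
</people>
"""

def parse_people(xml_text):
    """Return one dict per <person>, keeping whatever fields each record has."""
    root = ET.fromstring(xml_text.strip())
    return [
        {child.tag: child.text for child in person}
        for person in root.findall("person")
    ]

records = parse_people(XML_DATA)
for record in records:
    print(record)
```

A relational table would force both records into the same columns; here each record simply carries the fields it has.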
Note: unstructured data includes transaction history files, log files and the like. OLTP systems, by contrast, are built to work with structured data.
Features of Big Data
As the name itself suggests, big data is huge. Size plays an important part in determining the value of a data set, and whether a data set can be called big data at all depends on its volume. Hence, Volume is a crucial feature of big data.
Variety is another important characteristic of big data. It refers to the different sources, both structured and unstructured, from which data is received. In the past, databases and spreadsheets were the only sources of data for most applications. Today photos, videos, audio, emails, PDFs and so on are also treated as important sources. It is this variety of unstructured data that creates problems in storing, mining and analysing the data.
Velocity is the speed at which data is generated. How fast data is generated and processed largely determines its real potential. Data flows in from sources such as business processes, networks, application logs, social media sites and mobile devices (Kar & Bharti, 2018). The velocity of big data deals with this massive and continuous flow.
Sometimes inconsistency in the data makes it difficult to handle and manage. This is what the Variability of big data refers to.
There are various benefits of big data processing:
Companies can support important decisions with outside intelligence:
We now have access to many social sites, such as Facebook, Instagram and Twitter. The data obtained from them helps companies to shape their business strategies, and information from these platforms becomes a valid input to organizational decision making.
Better Customer Service:
Big data technologies are replacing the customer feedback systems used in the past. Big data and natural language processing technologies are now used to read and evaluate customer responses, which leads to better customer service.
Efficiency in Operation:
Big data technologies can be used to create a landing zone for new data. This is a staging area where new data is held before it is moved to the data warehouse.
Big data requires massively parallel software running on thousands of servers. Relational database management systems and static packages generally cannot cope with it. The advantage of big data is the additional information it provides (Quinn & Quinn, 2018): correlations are found, business trends are identified, diseases are prevented, crimes are controlled and so on.
Even governments are using big data in decision making. Several database engineering concepts are now employed for decision making from big data, including data warehousing, OLAP, and distributed and parallel databases (Kar & Bharti, 2018). At present, mostly large organizations and governments can afford the infrastructure required to analyze such large volumes of data. Big data promises a great future and will contribute to fields such as health, crime prevention and market analysis.
Challenges of Big Data:
Single layer Protection:
A single layer of protection is not sufficient in today's fast-moving markets, and many problems arise from relying on it.
Non-relational (NoSQL) databases are evolving faster than their security, and companies are vulnerable to this to a large extent. The six major threats to NoSQL databases are:
Transactional integrity: NoSQL takes a very soft approach to transactional integrity, which is a major problem.
Weak authentication: Weak authentication mechanisms and weak password storage methods make the system vulnerable to outside threats.
Authorization: Authorization is often provided only at a higher level instead of at lower levels.
Injection attacks: Injection attacks can corrupt the data in different ways and disrupt the system.
Inconsistency: Users are not always given consistent results.
Insider attack: A weak security system leads to insider attacks.
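To make the point about weak password storage concrete, here is a minimal sketch of salted, slow password hashing using only Python's standard library (hashlib.pbkdf2_hmac). It is not tied to any particular NoSQL product; the function names and the iteration count are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune it for the actual deployment

def hash_password(password, salt=None):
    """Return (salt, derived_key) computed with PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password, salt, expected_key):
    """Recompute the key and compare it in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)

# Store only the salt and the derived key, never the password itself.
salt, stored_key = hash_password("correct horse")
print(verify_password("correct horse", salt, stored_key))
print(verify_password("wrong guess", salt, stored_key))
```

Storing the salt and derived key instead of the plain password means that a leaked user collection does not directly reveal credentials.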
Automated data transfers need additional security and monitoring, but these are often missing precisely because the process runs automatically, without human oversight.
The security problems of big data
Data security is crucial, so it is important to understand the problems. The major security problems that big data faces are:
1. Fake data may be generated
2. Mappers may come from untrusted sources
3. Problems with cryptographic protection
4. Sensitive data may be mined for information
5. Granular access control difficulties
6. Data provenance difficulties
7. Absence of security audits
1. The problem of false or fake data generation
False or fake data is sometimes generated, and it is a major concern and a serious threat to the maintenance of big data. To reduce the usable value of an organisation's big data, cyber-hackers deliberately fabricate or duplicate data and dump it into the data lake. Examples abound. In manufacturing companies, cyber-hackers gain access to the data control system and make it show wrong results for certain conditions and functions (Quinn & Quinn, 2018). As a result, a malfunction can go unnoticed because the data does not show the change, or a correctly working system can raise false alarms that disrupt its operation. A robust fraud detection system is needed to detect the intrusion of such cyber-hackers and cyber-criminals.
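One narrow piece of such a fraud detection system, catching records that are exact repeats of ones already accepted, can be sketched as follows. This is only an illustration under simple assumptions: the record fields are made up, and a real pipeline would also need timestamps or record IDs so that legitimate repeated readings are not rejected.

```python
import hashlib
import json

def fingerprint(record):
    """Stable SHA-256 fingerprint; keys are sorted so field order does not matter."""
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def filter_duplicates(records):
    """Split records into (accepted, rejected); rejected holds exact repeats."""
    seen = set()
    accepted, rejected = [], []
    for record in records:
        fp = fingerprint(record)
        if fp in seen:
            rejected.append(record)
        else:
            seen.add(fp)
            accepted.append(record)
    return accepted, rejected

# Hypothetical sensor readings arriving at the data lake.
incoming = [
    {"sensor": "s1", "temp": 71.5},
    {"temp": 71.5, "sensor": "s1"},  # same content, different key order
    {"sensor": "s2", "temp": 69.0},
]
accepted, rejected = filter_duplicates(incoming)
print(len(accepted), "accepted;", len(rejected), "rejected")
```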
2. Possible presence of untrusted mappers
After collection, big data is processed in parallel, using the MapReduce paradigm. The data is split into smaller chunks, and a mapper processes each chunk and allocates it to a particular storage option. If a malicious outsider gains access to the mappers' code, they can alter the settings of the existing mappers, delete some of them, or add foreign mappers of their own. In this way the data can be ruined by producing inappropriate lists of key/value pairs, and the reduce step will then produce faulty results. Tampered mappers can also give outsiders access to valuable information, so the data can be stolen. As already discussed, corrupting or stealing the data is not out of reach, because big data normally gets no additional layer of security. Protection generally relies on perimeter security, and to date big data platforms often lack adequate protection of their own.
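The effect of a tampered mapper can be shown with a toy, single-process word count. This sketch only imitates the map and reduce phases in plain Python and does not use any real MapReduce framework; rogue_mapper is a hypothetical example of altered mapper code.

```python
from collections import defaultdict

def trusted_mapper(line):
    """Emit (word, 1) for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]

def rogue_mapper(line):
    """Hypothetical tampered mapper: drops one word and inflates another."""
    pairs = []
    for word in line.split():
        word = word.lower()
        if word == "fraud":
            continue  # silently hide these records
        pairs.append((word, 10 if word == "ok" else 1))
    return pairs

def map_reduce(lines, mapper):
    """Group the mapper's key/value pairs by key, then sum the values per key."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

LINES = ["ok fraud ok", "fraud alert"]
honest = map_reduce(LINES, trusted_mapper)
tampered = map_reduce(LINES, rogue_mapper)
print(honest)
print(tampered)
```

The reduce step is identical in both runs; only the key/value pairs fed into it differ, which is why the faulty result is hard to spot from the output alone.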
3. Cryptographic Protection:
Big data, and the sensitive information it may hold, needs to be protected by encryption, but this security measure is often ignored. Sensitive information is stored in the cloud without encryption and becomes vulnerable to security threats. The reason is that constantly encrypting and decrypting big data slows the system down, which is seen as a disadvantage.
4. Mining of sensitive information:
Big data is protected by perimeter-based security, which means that every point of entry and exit is secured. What happens inside the perimeter is not watched as closely, however, so unethical IT specialists can mine unprotected data once it has entered the system. This is a serious threat to the company because sensitive information gets leaked. One solution is to add further perimeters. Security can also be improved through anonymization: if personal data is stored without names, addresses and phone numbers, a leak does far less harm.
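A minimal sketch of such anonymization is shown below. It drops the direct identifiers named above and replaces a customer ID with a keyed hash so that records can still be linked. The field names, the customer_id field and the secret key are all assumptions for illustration; removing direct identifiers alone does not guarantee that a person cannot be re-identified from the remaining fields.

```python
import hashlib
import hmac

DIRECT_IDENTIFIERS = {"name", "address", "phone"}  # fields named in the text
SECRET_KEY = b"replace-with-a-managed-secret"      # placeholder, not a real key

def anonymize(record, id_field="customer_id"):
    """Drop direct identifiers and pseudonymize the ID with HMAC-SHA256."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if id_field in cleaned:
        raw_id = str(cleaned[id_field]).encode("utf-8")
        cleaned[id_field] = hmac.new(SECRET_KEY, raw_id, hashlib.sha256).hexdigest()
    return cleaned

# Hypothetical customer record.
customer = {
    "customer_id": 1042,
    "name": "A. Customer",
    "address": "1 Example Street",
    "phone": "555-0100",
    "purchase": "laptop",
}
safe = anonymize(customer)
print(sorted(safe))
```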
5. Granular access control difficulties
Security needs different levels of permission and authorization for different users. Granularity is the ability to restrict specific actions for some people while permitting them for others. For example, in some applications we can only click and read according to the instructions given; we cannot change the default settings or get into the technical side of the application (Zeadally & Bhadra, 2015). These are low-granularity systems, in which users are given only broad read or write permissions with no finer distinctions. In a high-granularity system, by contrast, access can be granted or denied for each user and each item of data.
Granular access control limits the information people are allowed to see and access, and so sets the degree of security needed. It requires a strong authentication process. Decisions about granular access control are therefore critical: the limits of access must be set for various users, at various levels and for different purposes, without affecting the performance of the whole system. Regular auditing at this granular level helps to detect malicious activity or a cyber-attack early.
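The idea of field-level permissions combined with an audit trail can be sketched as follows. The roles, field names and policy table are hypothetical, and a real system would attach this check to an authenticated identity rather than a bare role string.

```python
# Hypothetical policy: which fields each role may read.
POLICY = {
    "analyst": {"read": {"region", "purchase_total"}},
    "support": {"read": {"name", "phone", "region"}},
    "admin": {"read": {"name", "phone", "region", "purchase_total"}},
}

def visible_fields(role, record, action="read"):
    """Return only the fields the role may access; unknown roles get nothing."""
    allowed = POLICY.get(role, {}).get(action, set())
    return {k: v for k, v in record.items() if k in allowed}

def audited_read(access_log, role, record):
    """Release the permitted fields and log which fields went to which role."""
    released = visible_fields(role, record)
    access_log.append((role, sorted(released)))
    return released

record = {"name": "A. Customer", "phone": "555-0100", "region": "EU", "purchase_total": 120}
log = []
analyst_view = audited_read(log, "analyst", record)
guest_view = audited_read(log, "guest", record)
print(analyst_view, guest_view, log)
```

Denying by default (an unknown role sees nothing) keeps a missing policy entry from turning into accidental access, and the log gives the regular audit something to review.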
The loss of privacy in the information
Companies store the data of their clients and stakeholders in big data storage. Government organisations may also use such systems to deliver services or conduct business. The stored data may be sensitive, and its theft puts the privacy of clients and stakeholders at risk. The people whose privacy is violated will regard this as a serious breach of trust, and nowadays such infringements are often followed by lawsuits. The company that lost the data cannot evade responsibility for its negligence in keeping it. Once data theft occurs, customers and stakeholders lose faith in the system, and trust once lost takes enormous effort to regain, if it can be regained at all.
Fundamental steps for strong big data security
Step 1: Build in security measures from the very start
It's the security team's responsibility to provide their expertise at each step in the development process. The security team must be approachable, open with their knowledge, and committed to finding a custom solution for securing big data technologies. The data analytics team must recognize its obligation to incorporate robust security measures into its big data innovations (A survey on security intelligence of Big data, 2018).
Step 2: Start with objectives rather than solutions
Too often, security teams prescribe simplistic solutions as requirements for any system an organization uses, despite the fact that those solutions won't successfully protect complex big data systems. Adequate big data security differs from normal operations and cannot be held to the same standard. In fact, a big data environment can't be secured with just one solution (Cui et al, 2016). Organizations must understand that only a customized blend of tactics has a chance of fully managing the risk. Security teams should adjust their thinking when it comes to big data security efforts. They can begin by asking what specific security objectives the team is trying to achieve. From there, they can work backwards to find the custom and alternative solutions that secure the environment.
Step 3: Customize the solution
There is no one-stop shop for big data security. Big data technologies are a set of open source frameworks stitched together to fill a specific need, which makes creating a security solution complicated. When big data platforms are treated more like custom applications and less like databases, there is a greater chance of using the appropriate security approach. The data analytics team and security team need to understand the low-level architecture to ensure they are taking all possible threats into account (Azmi, 2019). Currently, big data platforms are too complex to be secured with a one-size-fits-all solution. To address their complex security requirements, organizations need to customize a stack of tactics that address the security objectives identified at the beginning of the process.
Big data security analytics deals with collections of security data sets so large and complex that they become difficult (or impossible) to process using on-hand database management tools or traditional security data processing applications. The challenge with big data is that the unstructured nature of the information makes it difficult to classify, model and map the data when it is captured and stored. The problem is made worse by the fact that the data usually comes from outside sources, which often makes it hard to confirm its accuracy. If organizations capture all the information available, they risk wasting time and resources processing data that adds little or no value to the business. Another hurdle is that there can be a large number of users, each needing access to a distinct subset of the information. This means that the encryption solution chosen to protect the data has to reflect this new reality. Access control to the data also needs to be more granular, to ensure that people can obtain only the information they are authorized to see.