
I'd like to address a recent blog post in CloudTweaks titled, "Cloudera Not Cutting It With Big Data Security." The author makes a number of very salient and valid points about Hadoop security… or lack thereof.
Indeed, the Apache Hadoop platform, which includes HDFS and MapReduce and other projects like HBase, Mahout and Hive, was not designed for security. The Hadoop name, for better or worse, is nearly synonymous with big data because it delivers the "three V's" (velocity, variety, volume) at massive scale, enabling organizations to crunch, process, analyze and retain data like never before.
Clearly there are security and compliance implications to big data. Consider the following:
The blog suggests that Cloudera, and I presume other commercial Hadoop vendors, should do more to address the security concerns in Hadoop.
I believe Cloudera absolutely has the right approach to security.
Cloudera has some of the brightest Hadoop and Apache minds in the world. They're experts in enterprise-class systems management. That's what they do. Addressing the big data needs of customers should always remain the company's primary focus.
Cloudera has also cultivated one of the most comprehensive partner ecosystems in the big data market. This is important because it enables Cloudera to focus on its core strengths, while leveraging outside expertise in analytics, BI, cloud computing, and of course, security.
Think about it this way: Would you expect the same company that built your house to install the alarm, mow your lawn and provide Internet service?
Certainly not, so why then would you demand security from a company that specializes in Hadoop?
The right approach is to look for a company with expertise and experience in securing Hadoop platforms.
There's a false narrative that traditional relational databases inherently offer cutting-edge security, but the truth is, security was (and remains) a responsibility of the end user. Encryption, authentication, policy enforcement and other security tools are all available for CDH and Hadoop, even if they're not provided directly by Cloudera. A customer can work with Cloudera to locate the right vendor for their particular challenge and work collaboratively to build an integrated, secure Hadoop platform.
Data security - particularly big data security - is quickly becoming a hot topic, as Hadoop-related projects migrate from pilot to production environments. We believe it's incumbent upon big data providers to develop an ecosystem of tightly integrated vendors to complete their security offerings. Cloudera has certainly done that.
Gazzang is proud to be a certified Cloudera partner and will continue to provide enterprise-class data security to Hadoop users.
At Gazzang, we have a mantra that borders on religious fanaticism.
“Customers First. Always.”
It’s the reason we can claim deep expertise in securing unique, enterprise-scale big data environments. It’s the reason we know cloud encryption better than anyone else. And it’s the reason no one on our customer support team owns a bed.
Customers also have a significant impact on our product development cycles. A perfect example is today’s exciting Gazzang CloudEncrypt™ announcement.
Gazzang CloudEncrypt was designed to meet specific customer use cases for securing sensitive data at every stage of the Amazon EMR process. This is a very different challenge than encrypting data on a persisted cloud platform like Amazon EC2, which can be done with readily available solutions like Gazzang zNcrypt and zTrustee.
CloudEncrypt offers encryption and key management in ephemeral, burstable Amazon EMR processes. The solution, which you can read about in great detail in this white paper, was developed at the request of a handful of Gazzang customers that had two very clear needs in common:
More detailed customer use cases are covered in the white paper, but the top three we’ve heard thus far are as follows:
Customer feedback is a part of everything we do at Gazzang. The ability to learn from and innovate in response to what we hear from the companies we serve is a badge of honor that we wear proudly.
As always, we welcome your feedback on Gazzang CloudEncrypt, your solution for securing sensitive datasets and outputs on Amazon EMR.
Gazzang is hitting the road again. This time, we're in Atlanta. Home to the 1996 Summer Olympics, The Varsity and this week, MongoDB Atlanta. The annual conference is hosted by 10gen, the company behind MongoDB and a key partner for Gazzang. 10gen is a global leader in big data with an impressive customer list that includes Disney, Intuit, foursquare and CERN.
In advance of MongoDB Atlanta, I spoke with Matt Asay, 10gen's vice president of Corporate Strategy, about use cases for MongoDB and why the Gazzang relationship is important for 10gen customers:
Gazzang: What are you and 10gen most excited about this year?
Matt Asay: Over the past few years, NoSQL went from an industry curiosity to a driving force for two of the industry's most important trends: cloud and Big Data. Along the way, MongoDB has established itself as the industry's most popular NoSQL database, with broad adoption by a range of customers. Going into 2013, we're seeing early experiments with MongoDB turn into enterprise-wide deployments for some seriously mission-critical applications. It's awesome to see.
But it's also exciting to see open source becoming such an integral part of how the enterprise builds and uses software. I'm particularly excited to see open source at the forefront of innovation now, and in 2013 I think we're going to see projects like MongoDB, Hadoop, Android, and the various open-source cloud projects drive huge value for consumers and enterprises alike. It's an exciting time to be involved with open source.
Gazzang: What are some of the unique big data challenges that 10gen helps customers solve?
Matt: While MongoDB is often used to manage large volumes of data, most enterprises actually think of "Big Data" in terms of data velocity and variety, as a recent NewVantage survey highlights. Looking at the results, a mere 28% of enterprises today see volume of data as a primary driver for their Big Data projects, falling to 25% in three years. Instead, a whopping 64% are motivated by the need to analyze streaming data, analyze new data types, or analyze data from diverse sources. That number rises to 68% in three years.
With MongoDB, enterprises:
Some of these involve huge quantities of data, but Big Data's value isn't necessarily tied to volume. It's more about intelligently using one's data to engage customers or others in ways previously difficult or impossible.
Gazzang: Why do you think the health care industry has been quick to adopt MongoDB?
Matt: Few industries generate as much data as healthcare, which makes the ability to scale cost-effectively so important, and that is something easily managed with MongoDB. We have also seen healthcare organizations keen to blend structured and unstructured data to improve care, and NoSQL databases like MongoDB are an excellent way to effectively embrace a wide array of data sources. I also think MongoDB's document data store is a great fit for how healthcare organizations want to structure their data.
Gazzang: Why is data security important to your customers?
Matt: Many of our customers are in highly regulated industries like Financial Services and Healthcare. Security for these industries is not only a nice-to-have, it's a firm requirement. As important as it is to be able to scale one's databases, and to accept an array of data sources, it's critical for firms in such industries to ensure customer or other sensitive data is secure. And while we build strong security features into MongoDB itself, we're also very happy to work with security solutions like Gazzang to offer an even higher level of security.
Gazzang: Why is the Gazzang relationship important for 10gen?
Matt: As I mentioned, we take our customers' data security very seriously, and have built in advanced security functionality like Kerberos Authentication to help security-conscious customers rest easy. But Gazzang helps us to add an even richer layer of security to MongoDB, something especially important to customers in regulated industries.
Gazzang: What can attendees expect to see and learn at MongoDB ATL?
Matt: MongoDB ATL, like all MongoDB events, is very focused on enabling developers and IT operations to get productive with MongoDB. There are no vendor infomercials, from 10gen or any of our partners. We keep the agenda information-rich as our main concern is making sure more and more companies build exceptional applications with MongoDB.
Health care organizations are moving infrastructure and data to the cloud at a fairly rapid pace. A recent study suggests the cloud computing market in health care is expected to reach $5.4 billion by 2017. Enticing as the cloud is, when dealing with highly sensitive and regulated information, it's important to proceed with caution.
The good news for pharma companies, biotech firms and research hospitals - organizations most likely to move heavy big data payloads to the cloud - is that there are some security best practices that can protect data at rest in the cloud. Check out the infographic below, or send us an email at info@gazzang.com.

This week, a team of security researchers pulled a list of 126 billion files from public Amazon S3 buckets.
Within a subset of these buckets were, you guessed it, plain text files, many of which contained sensitive information like sales records and employee data. This InfoWorld article does a good job explaining how simple it was to access this data and where the security breakdown occurred.
When it comes to securing data in the cloud, the customer ultimately needs to take responsibility. Network World featured a good dialogue on this very topic earlier in the week.
The cloud can be an incredibly safe place to store sensitive data and run business-critical applications; perhaps even safer than your own data center. But it’s up to the customer to make sure the right security controls are in place. Encrypting your data and maintaining control of your keys is the best place to start.
This security technique ensures your cloud provider or anyone running an unauthorized program or process cannot access the data. It’s also a necessary step toward enabling compliance.
If the aforementioned data in S3 had been encrypted, even in a public bucket, the search results would have yielded nothing of value.
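To make "encrypt your data and keep control of your keys" concrete, here is a minimal sketch of client-side encryption performed before anything is written to cloud storage. It is an illustration only, not how any Gazzang product works. It assumes the third-party Python "cryptography" package is installed, and the function names, the sample record and the object name "hr/payroll.csv" are all made up for the example. The actual upload call is left out.

```python
# Requires the third-party "cryptography" package (pip install cryptography).
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # 96-bit nonce, the size recommended for AES-GCM


def encrypt_for_upload(key: bytes, plaintext: bytes, object_name: str) -> bytes:
    """Return nonce + ciphertext; the object name is bound in as associated data."""
    nonce = os.urandom(NONCE_SIZE)  # must never repeat under the same key
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, object_name.encode())
    return nonce + ciphertext


def decrypt_after_download(key: bytes, blob: bytes, object_name: str) -> bytes:
    """Split off the nonce and decrypt; raises InvalidTag on a wrong key or tampering."""
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    return AESGCM(key).decrypt(nonce, ciphertext, object_name.encode())


# The 256-bit key is generated and held by the customer, never stored with the data.
key = AESGCM.generate_key(bit_length=256)
record = b"employee_id,salary\n1001,85000\n"  # hypothetical sensitive record

# `blob` is what would be written to the bucket (upload call omitted).
blob = encrypt_for_upload(key, record, "hr/payroll.csv")
assert decrypt_after_download(key, blob, "hr/payroll.csv") == record

# Someone who finds the object but holds a different key gets nothing usable.
try:
    decrypt_after_download(AESGCM.generate_key(bit_length=256), blob, "hr/payroll.csv")
except InvalidTag:
    print("decryption failed without the right key")
```

The design point is that the key never travels with the data: whoever lists or downloads the stored object sees only the nonce and ciphertext, and decryption fails outright without the customer-held key.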
Our friends at the Linux Foundation just released their 2013 Enterprise End User Report, “Linux Adoption: Third Annual Survey of World’s Largest Enterprise Linux Users.” The report shows exactly what we are hearing from our customers: for enterprises using the cloud, Linux is by far the dominant platform. Of course, as more information moves to the cloud, Linux data security is becoming a hot topic. Encryption and key management are the two most effective ways to protect your data against unauthorized access or malicious attack, while also helping to maintain regulatory compliance and reduce risk.
The full report is definitely worth a read. Some of our favorite stats include:
For more on Cloud Security, check out: http://gazzang.com/solutions/cloud-security
The countdown officially begins today for HIPAA Omnibus Rule Compliance, which includes important changes in the way the Department of Health and Human Services (HHS) handles breach penalties.
The new rules not only extend security and privacy requirements to business associates and contractors (such as billing companies and those that perform services on behalf of a health care provider), but they also give HHS greater discretion to impose substantial penalties, which in turn gives the agency increased leverage to obtain six- and seven-figure settlements to resolve potential penalty proceedings.
The rules go into effect today, but organizations have until September 23, 2013 to comply. Heed the warnings; don’t wait until it’s too late.
Encryption and key management both play a key role in achieving HIPAA compliance. They render electronic protected health information (ePHI) unusable, unreadable, or indecipherable to unauthorized individuals. In the event of a data breach, encryption can help organizations protect sensitive PHI and may enable them to claim “Safe Harbor.”
Gazzang zNcrypt for Health Care™ can be applied easily, quickly, and economically as a solution for data privacy and security requirements defined within HIPAA and HITECH. Through AES-256 encryption, advanced key management, and process-based access controls, zNcrypt provides transparent data encryption for any database or application running on Linux, including big data environments.
Additionally, Gazzang zTrustee™ protects the Gazzang cryptographic keys with several layers of advanced techniques to ensure the key is only accessible by authorized parties.
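One way to picture the idea of process-based access control is a deny-by-default rule table that pairs a trusted executable with the data directory it may read. The sketch below is purely conceptual: the AccessRule class, the is_allowed function and the example paths are hypothetical, and none of it reflects zNcrypt's actual rule syntax or implementation.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class AccessRule:
    """Hypothetical rule: one executable may access data under one directory."""
    executable: str  # absolute path of the trusted binary
    data_root: str   # directory tree that binary is allowed to access


def is_allowed(rules: list[AccessRule], executable: str, target: str) -> bool:
    """Deny by default; allow only when a rule matches both the process and the path.

    Assumes both paths are already absolute and resolved (no ".." segments).
    """
    target_path = PurePosixPath(target)
    for rule in rules:
        root = PurePosixPath(rule.data_root)
        inside_root = target_path == root or root in target_path.parents
        if executable == rule.executable and inside_root:
            return True
    return False


# Only the database server may touch the protected data directory.
rules = [AccessRule("/usr/sbin/mysqld", "/var/lib/mysql")]

print(is_allowed(rules, "/usr/sbin/mysqld", "/var/lib/mysql/patients.ibd"))  # True
print(is_allowed(rules, "/bin/cat", "/var/lib/mysql/patients.ibd"))          # False
```

In this picture, even a user with file-system permissions cannot read the protected files through an arbitrary tool, because the check is keyed to the process making the request rather than to the user account alone.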
For more information check out our HIPAA and HITECH Compliance Guide.
Nearly 15% of universities competing in this year's NCAA tournament were breached during the past year. According to a database maintained by the Privacy Rights Clearinghouse, the following schools experienced some form of unintended data disclosure since July 2012:
In some cases, the breaches exposed Social Security numbers, usernames and passwords and other forms of personally identifiable information (PII). It's a good reminder that any institution handling information on behalf of students needs to take extra precaution to secure the data and ensure it's following disclosure rules laid out in the Family Educational Rights and Privacy Act.
2012 was a big year for big data, and few organizations felt this industry shift quite like DataStax, the company driving enterprise adoption of Apache Cassandra. DataStax is an important partner of Gazzang, and we’re excited to sponsor and exhibit at their NYC* Big Data Tech Day this week.
In anticipation of the event, I spoke with Billy Bosworth, CEO of DataStax, about a variety of topics including competing with Oracle, catering to customers and of course, securing big data. You can check out our Q&A below:
Larry: Big Data is a noisy space. What makes DataStax unique?
Billy: We are the first viable alternative to Oracle since Oracle. Let me explain what I mean by that. For decades, people have built their mission-critical online applications on Oracle. But the big data wave has caused a paradigm shift in the way online applications must be written. We tried sharded MySQL for a while, but that was way too complex for most businesses.
Now business continuity, scalability, and operational simplicity are “must haves.” That means distributed systems, with no single points of failure, which can span datacenters and clouds. A new bar has been set for mission critical apps, and we have the best solution on the market for those needs at enterprise scale.
Larry: How has the Big Data landscape changed for you in the past year?
Billy: It has changed in two important ways. First, the market now sees the two sides of big data: the data warehouse (Hadoop) and the online applications (NoSQL). Second, the lines of business realize the value in keeping their data hot within the context of the application.
In DataStax Enterprise, we solve all the hard challenges of scale, simplicity, and business continuity first, then we allow our customers to keep that data indexed for searching with Solr, and also allow for batch analysis with Hadoop. Finally, this is all controlled in a single, comprehensive security model. Having that type of comprehensive platform is crucial for lines of business to move quickly to attack new markets.
Larry: What is in store for you in 2013?
Billy: 2013 is the year that Apache Cassandra and DataStax Enterprise go mainstream. I think we will look back at this year as pivotal in the database industry. Our solution has reached the level of maturity where CIOs can integrate it into the enterprise and manage it like their traditional databases.
We are seeing an uptick of customers migrating applications from relational databases to Cassandra. It's no longer just for the early adopters; companies that aren't transforming their businesses through these modern databases risk being left behind.
Larry: What are some of the primary use cases for Apache Cassandra and DataStax Enterprise?
Billy: Thanks for asking that: I always say, "It's all about the use case." When you have a mission-critical, front of the business, transactional application that needs to run in real-time, and you need it to be built with disaster in mind, and you need to be able to scale it as you grow, then you need Apache Cassandra.
People quickly find that they also need to be able to operate the system simply even if one of their key people moves on, and to search and analyze data within context without slowing things down -- that's what they get from DataStax Enterprise. eBay, Adobe, Netflix, Ooyala, Healthcare Anytime and countless other organizations, big and small, use us for these reasons.
Larry: DataStax Enterprise 3.0 adds security controls unique in the NoSQL space. Why was that important for your customers?
Billy: Earlier I mentioned that this is the year of going mainstream. A significant barrier to adoption has been lack of database security in NoSQL databases. The Chief Security Officer at large enterprises has the power to stop the use of technologies that don't comply with enterprise security requirements. So while DataStax Enterprise, powered by Apache Cassandra, has been solving significant pains for the tech folks, it has been giving the security folks a bit of a headache until now with 3.0.
Larry: Why is Gazzang an important partner for DataStax?
Billy: It's important to note that we are focused on database security. There’s a big difference between security and compliance. When a customer needs a comprehensive security solution, we have found Gazzang to be a very strategic partner for us when we go into accounts that need to be fully compliant with things like PCI and HIPAA. Through our partnership we know we can address their concerns over the most stringent security and compliance standards.
Larry: What can attendees at NYC* Tech Day expect to see and learn?
Billy: We're very excited about NYC* Cassandra Tech Day, because it gives people the chance to network with their peers and learn from them, take a deep dive into different aspects of the database, discover new best practices, meet the experts in the field, and explore larger big data issues, such as use cases and the changing NoSQL landscape.
In a world flush with Big Data hype, I was pleased to read the Wall Street Journal story, How Big Data Is Changing the Whole Equation for Business. It includes some fascinating, in-production big data use cases. Real companies, using real data, for real insights. As I read about Catalyst IT Systems, Zynga, Ford Motor Co. and others, I kept waiting to read about how they are actually securing big data. And by data, I'm referring to both the data input and analytical output. Where does it reside, and how is it protected?
For example, Caesars Entertainment is analyzing health insurance claim data for its 65,000 employees and their covered family members. This includes how employees use medical services, how often they visit an emergency room and whether they choose generic or brand-name drugs. Tracking this data enables Caesars to find less-expensive healthcare alternatives and save millions in the process. An interesting use case to be sure, but one that involves the handling of HIPAA data.
InterContinental Hotels Group is analyzing information about its 71 million Priority Club rewards members, including income levels and travel preferences, which it then uses to run marketing campaigns. InterContinental says the campaign has been a success, with a higher rate of customer conversions than a similar campaign run just a year ago. In this case, the data being collected isn't regulated, but it's still sensitive and personally identifiable.
For these companies, and others that collect and store large volumes of data, it should be a business priority to encrypt that data and properly manage the crypto keys.