Data Classification Process (MIS)
Data classification is the process of analyzing structured or unstructured data
and organizing it into categories based on file type, contents, and other
metadata.
Data classification helps organizations answer important questions about their
data that inform how they mitigate risk and manage data governance policies. It
can tell you where you are storing your most important data or what kinds of
sensitive data your users create most often. Comprehensive data classification
is necessary (but not sufficient) to comply with modern data privacy
regulations.
Effective Information Classification in Five Steps
1. Establish a data classification policy, including objectives, workflows, a
data classification scheme, data owners, and handling.
2. Identify the sensitive data you store.
3. Apply labels by tagging data.
4. Use results to improve security and compliance.
5. Repeat regularly: data is dynamic, and classification is an ongoing process.
Data Classification Process
Data classification processes differ slightly depending on the objectives of
the project. Most data classification projects require automation to process
the sheer volume of data that companies create every day. In general, the
following best practices lead to successful data classification initiatives:
1. Define the Objectives of the Data Classification Process
· What are you looking for? Why?
· Which systems are in scope for the initial classification phase?
· What compliance regulations apply to your organization?
· Are there other business objectives you want to tackle? (e.g., risk
mitigation, storage optimization, analytics)
2. Categorize Data Types
· Identify what kinds of data the organization creates (e.g., customer lists,
financial records, source code, product plans)
· Delineate proprietary data vs. public data
· Do you expect to find GDPR, CCPA, or other regulated data?
3. Establish Classification Levels
· How many classification levels do you need?
· Document each level and provide examples
· Train users to classify data (if manual classification is planned)
4. Define the Automated Classification Process
· Define how to prioritize which data to scan first (e.g., prioritize active
over stale, open over protected)
· Establish the frequency and resources you will dedicate to automated data
classification
5. Define the Categories and Classification Criteria
· Define your high-level categories and provide examples (e.g., PII, PHI)
· Define or enable applicable classification patterns and labels
· Establish a process to review and validate both user-classified and
automated results
6. Define Outcomes and Usage of Classified Data
· Document risk mitigation steps and automated policies (e.g., move or archive
PHI if unused for 180 days, automatically remove global access groups from
folders with sensitive data)
· Define a process to apply analytics to classification results
· Establish expected outcomes from the analysis
7. Monitor and Maintain
· Establish an ongoing workflow to classify new or updated data
· Review the classification process and update it if necessary due to changes
in the business or new regulations
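As a rough illustration of steps 4 and 6 above, the following Python sketch
orders files for scanning (active before stale, open before protected) and
lists PHI files unused for 180 days as archive candidates. The FileRecord
fields, the function names, and the reuse of 180 days as the "stale" threshold
for scan ordering are assumptions made for this example, not features of any
particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    # Hypothetical record; a real scanner would read these from file metadata.
    path: str
    last_accessed: datetime
    open_access: bool                 # True if broadly accessible
    labels: frozenset = frozenset()   # classification labels, e.g. {"PHI"}


def scan_order(files, now, stale_after_days=180):
    """Order files for scanning: active before stale, open before protected."""
    cutoff = now - timedelta(days=stale_after_days)

    def key(f):
        is_stale = f.last_accessed < cutoff
        # False sorts before True, so active and open files come first.
        return (is_stale, not f.open_access, f.path)

    return sorted(files, key=key)


def archive_candidates(files, now, label="PHI", unused_days=180):
    """Return files carrying `label` that have not been accessed recently."""
    cutoff = now - timedelta(days=unused_days)
    return [f for f in files if label in f.labels and f.last_accessed < cutoff]
```

In practice the output of archive_candidates would feed a reviewed workflow
rather than move data directly, so that a person can validate the results
first.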
Purpose of Data Classification
In the most recent Market Guide for File Analysis Software, Gartner
lists four high-level use cases:
1. Risk Mitigation
· Limit access to personally identifiable information (PII)
· Control location and access to intellectual property (IP)
· Reduce attack surface area to sensitive data
· Integrate classification into DLP and other policy-enforcing applications
2. Governance/Compliance
· Identify data governed by GDPR, HIPAA, CCPA, PCI, SOX, and future regulations
· Apply metadata tags to protected data to enable additional tracking and
controls
· Enable quarantining, legal hold, archiving and other regulation-required
actions
· Facilitate “Right to be Forgotten” and Data Subject Access Requests (DSARs)
3. Efficiency and Optimization
· Enable efficient access to content based on type, usage, etc.
· Discover and eliminate stale or redundant data
· Move heavily utilized data to faster devices or cloud-based infrastructure
4. Analytics
· Enable metadata tagging to optimize business activities
· Inform the organization on location and usage of data
It’s important to note that classifying data, while a foundational first step,
is not typically enough on its own to take meaningful action on many of the
above use cases. Adding further metadata streams, such as permissions and data
usage activity, can dramatically increase your ability to use your
classification results to achieve key objectives.
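To make this concrete, here is a minimal Python sketch of how a classification
label might be combined with a permissions stream and a usage stream to pick a
follow-up action. The FileContext fields, the action names, and the thresholds
are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileContext:
    # Hypothetical record joining three metadata streams for one file.
    path: str
    label: str                    # classification result, e.g. "PII"
    globally_accessible: bool     # from the permissions stream
    days_since_last_access: int   # from the usage activity stream


def triage(f, sensitive_labels=("PII", "PHI"), stale_days=180):
    """Suggest an action; the label alone cannot separate these cases."""
    if f.label not in sensitive_labels:
        return "no action"
    if f.globally_accessible:
        return "restrict access"   # sensitive and overexposed
    if f.days_since_last_access > stale_days:
        return "archive"           # sensitive and unused
    return "monitor"
```

Two files with the same PII label can end up with different actions here,
which is the point: the label says what the data is, while permissions and
usage say how urgent it is.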
Building an Effective Data Classification Policy
A data classification policy is a document that includes a classification
framework, a list of responsibilities for identifying sensitive data, and
descriptions of the various data classification levels. Data classification
policies should not attempt to provide restrictions for how data is handled, as
this is a separate task that requires its own detailed policy document.
Successful Data Classification Policy – Step by Step
There are five key steps you need to take to develop and implement
a successful data classification policy. These steps are outlined below:
Step 1 – Get help and establish why. You will need to ensure that you have the
approval and help of key stakeholders within the business, in particular the
board. These people need to understand the importance of data classification
and the reasons why a policy is necessary. With the help of these stakeholders,
you can develop a pitch explaining why a data classification policy is required
and what goals it is meant to achieve.
Step 2 – Define the scope of the policy. You need to define how much of the
information within your organization will fall under the policy, what forms
that data takes, and where that data is stored.
Step 3 – Define responsibilities. You now need to determine which people within
your organization will be responsible for maintaining your classification
policy and the roles each person and department will play.
Step 4 – Define your classification levels. Classification levels were touched
on earlier; in brief, separate your content into four levels based on risk.
Restricted data poses the greatest threat, followed by high risk, medium risk
and low risk. Ensure your definitions for these levels, whichever language you
end up using, are concise and unambiguous.
Step 5 – Schedule regular reviews. The data classification policy needs to be
regularly reviewed and refined to ensure it stays current with compliance
regulations and your business structure. Define the process and timeline for
these reviews in the policy.
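The four levels from Step 4 can be modelled as an ordered type so that tools
can compare them. The sketch below is one possible representation in Python.
The Level and overall_level names are made up for this example, and the rule
that a folder takes the highest level of anything it contains is an assumption
your own policy may or may not adopt.

```python
from enum import IntEnum


class Level(IntEnum):
    # Ordered so that a higher value means higher risk.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    RESTRICTED = 4


def overall_level(levels):
    """Assumed rule: a container takes the highest level of its contents."""
    return max(levels, default=Level.LOW)
```

For example, a folder holding one LOW file and one RESTRICTED file would be
treated as RESTRICTED under this rule, and an empty folder defaults to LOW.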
A good classification policy:
1. Uses criteria that are straightforward and avoid ambiguity, but that are
generic enough to apply to different data sets and circumstances
2. Is clear and written in simple language
3. Fits the organization’s business
4. Is limited to 3 or 4 classification levels
5. Contains a point of contact for clarification
6. Establishes a review schedule
Indicators of a Successful Data Classification Policy
Each organization will have its own unique data classification policy; there is
no one-size-fits-all approach. However, there are some commonalities that
successful policies share. Here are a few of the indicators of a successful
data classification policy:
· You have to get your classification criteria nailed down. They should be
broad enough that they encompass all data in some way, but specific enough to
avoid ambiguity.
· A successful data classification policy will be written in the language of
the business, will be clear and concise, and will resonate with employees.
· The best policies are the simplest ones. Try to keep it to just a few pages
and no more than four classification levels.
· It should make employees aware of the person within the organization who is
responsible for resolving any problems with the classification policy that
might arise.
· A good data classification policy should make regular reviews a priority and
a necessity. Reviews should take place at least quarterly so as to keep abreast
of any new compliance regulations or changes within the company.
Why Data Classification Policies Fail
Unfortunately, many data classification policies fail before they even get off
the ground. To keep this from happening to you, avoid the following pitfalls:
· Using overly complicated language, jargon, abbreviations and other
complexities that make it difficult for employees to get to grips with the
meaning of the document.
· Developing policies and practices that do not fit in with the organization’s
workflows and are not backed up by employee training for implementation.
· Failing to explain to employees the importance of data classification and the
reason why the policy has to be implemented.
· Writing the policy and then leaving it for a long period of time without
regular reviews to update and improve it.
How to Select a Data Classification Solution
Look for these features:
1. Compound term search: Improves accuracy by minimizing false positives and
false negatives.
2. Index: Enables you to identify sensitive terms without re-crawling the data.
3. Flexible taxonomy manager: Makes it easy to add and modify terms and rules.
4. Workflows: Automatically take specific actions when a document is classified
in a certain way. For example, a workflow might move sensitive data away from a
public share.
5. Breadth of coverage: Supports both cloud and on-premises data sources,
including both structured and unstructured data.
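To show why compound term search helps with false positives, here is a small
Python sketch that only reports a number shaped like a US Social Security
number when a related keyword appears nearby. The pattern, the keyword list,
the 40-character window, and the function name are assumptions for this
example; commercial tools use their own, more thorough rules.

```python
import re

# Assumed patterns for illustration only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CONTEXT_TERMS = re.compile(r"\b(ssn|social security)\b", re.IGNORECASE)


def find_ssn_hits(text, window=40):
    """Return SSN-shaped matches that have a context term within `window` chars."""
    hits = []
    for m in SSN_PATTERN.finditer(text):
        start = max(0, m.start() - window)
        end = min(len(text), m.end() + window)
        # Require both the number pattern and a nearby keyword (compound match).
        if CONTEXT_TERMS.search(text[start:end]):
            hits.append(m.group())
    return hits
```

With this rule, "Employee SSN: 123-45-6789 on file" produces a hit, while
"Order ref 123-45-6789 shipped" does not, because no context term sits near
the number.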