Data Cleaner

Configuration options and features description

The Data Cleaner module performs a thorough search for Personally Identifiable Information (PII) across all pages in Confluence and anonymizes it irreversibly, including at a user's request. It also provides many built-in tools that facilitate and automate working with vulnerable data at every stage: tracking, comments and notifications, reporting, and extensive customization options.

To ensure the "right to be forgotten", all users whose personal data have to be removed can be replaced with a single service user created specifically for this purpose. Once two or more users have been anonymized this way, they can no longer be traced back, because all of them appear as the same "service user"; every additional anonymized user increases the level of protection.
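The effect of mapping every forgotten user to one service user can be illustrated with a small Python sketch. It assumes a simplified, hypothetical page model (a dict with `author` and `last_editor` fields) and a hypothetical service user name; it is not the app's data model or implementation.

```python
from collections import Counter

SERVICE_USER = "service-user"  # hypothetical name of the dedicated service user


def forget_users(pages, users_to_forget, service_user=SERVICE_USER):
    """Return copies of the pages with every forgotten user replaced by the service user."""
    result = []
    for page in pages:
        cleaned = dict(page)  # copy, so the input pages stay unchanged
        for field in ("author", "last_editor"):  # hypothetical user fields
            if cleaned.get(field) in users_to_forget:
                cleaned[field] = service_user
        result.append(cleaned)
    return result


pages = [
    {"title": "Onboarding", "author": "jane.doe", "last_editor": "max.mustermann"},
    {"title": "Release notes", "author": "max.mustermann", "last_editor": "admin"},
]
cleaned = forget_users(pages, {"jane.doe", "max.mustermann"})

# Both forgotten users now appear under one name and cannot be told apart.
print(Counter(p["author"] for p in cleaned))  # Counter({'service-user': 2})
```

With a single forgotten user, the service user still points to one person; each additional user mapped to the same account makes attribution harder.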

Main options include:

  • pattern-based search (100+ most popular and widely used built-in patterns and custom search via regular expressions);

  • complete anonymization of vulnerable data and creation of automated clean-up templates;

  • anonymization of a specific user's personal data at their request, ensuring their "right to be forgotten";

  • automated comments and notifications for other team members to keep track of PII anonymization status;

  • automated custom reports tracking the appearance of any vulnerable data.

Before you start:

  • make sure you have Confluence Administrator access rights;

  • check that a service user, on whose behalf all further actions will be performed, is defined on the plugin configuration page.

Please note that the anonymization process is irreversible! We strongly recommend testing it on a development (test) server first.

Configuration

Starting point: Data Cleaner Dashboard

The starting point for working with the Data Cleaner module is the Data Cleaner Dashboard. There are two ways to open it:

  • Navigate to "Manage apps", find the "Data Protection and Security Toolkit" section and click "Data Cleaner" (at the bottom of the menu on the left).

  • Open the Data Protection and Security Toolkit Home page and click the "Data Cleaner" button.

The Data Cleaner Dashboard provides an overview of all created templates with their descriptions and current status, and allows you to:

  • Start or stop search.

  • Start or stop anonymization.

Start Search

The "Start Search" button initiates a search through the content according to the template's configuration and builds a list of all affected items.

It is highly recommended to use this option before starting anonymization to understand which items will be affected and whether the template's configuration needs to be changed.

  • To initiate the search, click the “Start Search” button.

  • Track the current search status at the Data Cleaner Dashboard.

  • After the task is finished, click the “Status and History” button to check the results and get more details.
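The difference between a search and an anonymization run can be sketched as a dry run that only reports matches without touching the content. The Python sketch below is a hypothetical model (pages as dicts of object name to text, rules as name-to-regex pairs, a deliberately simple email regex), not the plugin's code.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    space: str
    page: str
    obj: str       # the page object where the data was found, e.g. "body"
    content: str   # the matched personal data
    rule: str
    status: str = "found"


def search(pages, rules):
    """Dry run: collect every match without modifying any page."""
    findings = []
    for page in pages:
        for obj, text in page["objects"].items():
            for rule_name, pattern in rules.items():
                for match in re.finditer(pattern, text):
                    findings.append(
                        Finding(page["space"], page["title"], obj, match.group(), rule_name)
                    )
    return findings


rules = {"email": r"[\w.+-]+@[\w-]+\.[\w.]+"}  # simplified, illustrative pattern
pages = [{"space": "HR", "title": "Contacts", "objects": {"body": "Mail jane@example.com today"}}]

for f in search(pages, rules):
    print(f.space, f.page, f.obj, f.content, f.rule, f.status)
```

Because `search` never writes back, it can be run repeatedly while the template's configuration is being tuned.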

Start Anonymization

The “Start Anonymization” button initiates the processing of Confluence items according to your template's configuration.

  • To start the anonymization process, click on the “Start Anonymization” button.

Check the current task status on the Data Cleaner Dashboard page. You can find a more detailed report about Data Cleaner tasks on the Data Cleaner History page.

Note: only one task can be processed at a time; all other tasks wait in a queue. You can stop the current task at any time and start another one.
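The "one task at a time" behaviour can be modelled as a single slot plus a FIFO queue. This is a simplified, hypothetical Python sketch of the queueing semantics only, not how the app schedules tasks internally.

```python
from collections import deque


class TaskRunner:
    """Runs at most one task at a time; later tasks wait in FIFO order."""

    def __init__(self):
        self.current = None
        self.queue = deque()
        self.history = []  # (task, final status) pairs

    def submit(self, task):
        if self.current is None:
            self.current = task
        else:
            self.queue.append(task)

    def finish_current(self):
        self._close("finished")

    def cancel_current(self):
        # Stopping the running task frees the slot for the next queued one.
        self._close("canceled")

    def _close(self, status):
        if self.current is None:
            return
        self.history.append((self.current, status))
        self.current = self.queue.popleft() if self.queue else None


runner = TaskRunner()
runner.submit("search: PII clean-up template")
runner.submit("anonymization: User Anonymization template")
print(runner.current)  # search: PII clean-up template
runner.cancel_current()
print(runner.current)  # anonymization: User Anonymization template
```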

From the menu on the right of each template, you can view the Status and History of its tasks and edit, clone or delete the template. To open the Data Cleaner Status and History page, click the Details button next to the template in the Actions column and choose “Status and History”.

Status and History page

The Status and History page provides the following details on search/anonymization:

  • space - the space the personal data belong to,

  • page - the page the personal data belong to,

  • object - the part of the page where the personal data were found,

  • affected content - the personal data that were found,

  • status - the current status of running and finished tasks, for example, "found" for found items (after a search) or "anonymized" for anonymized items (after an anonymization),

  • rule - the rule that was used during the search/anonymization process.

Filtering processed pages. You can filter the results by space, object, status and rule.

Sorting the results. The found data can be sorted by:

  • found time (the default; sorts by the time of processing),

  • page create date DESC (descending order),

  • page create date ASC (ascending order).
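Filtering and the three sort orders can be expressed as ordinary list operations over result rows. The row fields (`found_at`, `page_created`) and their values are hypothetical, and ascending order for "found time" is an assumption; the sketch only illustrates the semantics.

```python
from datetime import datetime

rows = [
    {"space": "HR", "object": "body", "status": "found", "rule": "email",
     "found_at": datetime(2024, 3, 4, 10, 0), "page_created": datetime(2023, 1, 15)},
    {"space": "DEV", "object": "comment", "status": "anonymized", "rule": "phone",
     "found_at": datetime(2024, 3, 4, 9, 0), "page_created": datetime(2024, 2, 1)},
    {"space": "HR", "object": "comment", "status": "found", "rule": "phone",
     "found_at": datetime(2024, 3, 4, 11, 0), "page_created": datetime(2022, 6, 30)},
]


def filter_rows(rows, **criteria):
    """Keep rows whose fields equal every given criterion, e.g. space="HR"."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]


SORT_ORDERS = {
    # Assumption: the default "found time" order is ascending.
    "found time": lambda rs: sorted(rs, key=lambda r: r["found_at"]),
    "page create date DESC": lambda rs: sorted(rs, key=lambda r: r["page_created"], reverse=True),
    "page create date ASC": lambda rs: sorted(rs, key=lambda r: r["page_created"]),
}

hr_found = filter_rows(rows, space="HR", status="found")
for row in SORT_ORDERS["page create date DESC"](hr_found):
    print(row["rule"], row["page_created"].date())
```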

On the Data Cleaner Status and History page, you can also start or stop a search or an anonymization. To initiate a new task or cancel the currently running one, click the button in the top right corner of the page.

  • If you stop a current anonymization task, the status will be changed to "Anonymization canceled".

  • If you start a new anonymization task, the status will be changed to "Anonymization started".

  • If you stop a current search task, the status will be changed to "Search canceled".

  • If you start a new search task, the status will be changed to "Search started".

You can also anonymize a specific piece of data directly from the "Status and History" page: click the "Actions" button on the right of the relevant line.
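The four status changes listed above form a small lookup from (task type, action) to the resulting status. A minimal Python sketch of that mapping, using only the status strings quoted in this section; the function name is hypothetical.

```python
STATUS_AFTER = {
    ("search", "start"): "Search started",
    ("search", "stop"): "Search canceled",
    ("anonymization", "start"): "Anonymization started",
    ("anonymization", "stop"): "Anonymization canceled",
}


def new_status(task_type: str, action: str) -> str:
    """Return the status shown after starting or stopping a task."""
    try:
        return STATUS_AFTER[(task_type, action)]
    except KeyError:
        raise ValueError(f"unsupported combination: {task_type!r}, {action!r}") from None


print(new_status("anonymization", "stop"))  # Anonymization canceled
```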

Templates

You can create your own custom template with different rules, or use one of the predefined templates.

Predefined templates

User Anonymization template:

Use this predefined template to anonymize all personal data of a specific user throughout Confluence.

  • Click the “Create User Anonymization template” button.

  • Fill in a Source user (for example, Jane Doe or Max Mustermann) and a Target user (for example, a service user) and click “Create”.

A new template with predefined configurations for user anonymization will be created. Make changes and edit the template if necessary.

  • Click “Save” and “Back to Data Cleaner Dashboard”. You can now start to use your new “User Anonymization template”.

Predefined clean-up template: finding personal data in one click

The Data Cleaner module has 100+ built-in patterns: national IDs, SSNs, phone, and credit card numbers for the majority of EU countries.
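Pattern-based detection typically pairs a regular expression with a validity check to reduce false positives. The sketch below shows the idea for credit card numbers using a regex plus the Luhn checksum; the pattern and function names are illustrative assumptions, not one of the app's actual built-in patterns.

```python
import re

# Hypothetical pattern: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Check the Luhn checksum used by payment card numbers."""
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    # Double every second digit from the right; subtract 9 when the result exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str):
    """Return regex candidates that also pass the Luhn check."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]


text = "Card 4111 1111 1111 1111, order 1234 5678 9012 3456."
print(find_card_numbers(text))  # ['4111 1111 1111 1111']
```

The order number has the right shape but fails the checksum, so only the card number is reported.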

Use a predefined template to find and anonymize the most common types of personally identifiable information.

  • Click on the “Create PII clean-up template” button.

A new template with predefined configurations for PII clean-up will be created. You can edit it if necessary.

  • Click “Save” and “Back to Data Cleaner Dashboard”. You can now start to use your new “PII clean-up template”.

You can also create a custom template with your own rules. Click the “Create custom template” button to open a form with four main sections:

  • General configuration,

  • Data Processing Rules,

  • Post Functions,

  • Reporting options.

Part 1. General configuration

In the General configuration section, you will see three fields to fill in:

  • Template Name - a name that distinguishes the template from the others,

  • Scope - the scope of pages involved in the search or anonymization process, defined with the Confluence Query Language (CQL),

  • Objects - Confluence page objects that will be used as a scope for search/anonymization, for example, Summary, Description, Comment, History, Creator, Reporter etc.

Example: processing only new and updated pages. The app allows you to create customized data rules and search only new Confluence pages (or blog posts) within a flexible time frame, for example, pages "created after the beginning of the week" or "updated after the beginning of the week" up to the current date. In other words, you can track the appearance of vulnerable data once a day or once a week without rescanning pages (or blog posts) that have already been processed. Processing only new and updated content is much faster than processing all existing content.
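As an illustration, a "new and updated this week" scope could be written in CQL roughly as below, using the standard CQL fields `type`, `created`, `lastmodified` and the `startOfWeek()` function. The Python snippet only builds and URL-encodes the query for Confluence's REST content search endpoint (`/rest/api/content/search`) so the scope can be previewed outside the app; the base URL is a placeholder, and whether the Scope field accepts exactly this query is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://confluence.example.com"  # placeholder, replace with your instance

# Pages and blog posts created or modified since the start of the current week.
SCOPE_CQL = (
    "type in (page, blogpost) "
    "and (created >= startOfWeek() or lastmodified >= startOfWeek())"
)


def search_url(cql: str, limit: int = 25) -> str:
    """Build a REST search URL for previewing which content a CQL scope selects."""
    query = urlencode({"cql": cql, "limit": limit})
    return f"{BASE_URL}/rest/api/content/search?{query}"


print(search_url(SCOPE_CQL))
```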

After all the fields are filled in, click “Save” and you will be automatically redirected to the next part.

Part 2. Data processing rules

In the “Data processing rules” part, you can create a new rule, use one of the built-in rules, or both.

To define your own custom rule, click the “Create new rule” button.

When creating a new rule, you define what to search for and how to replace the found data. The search target can be defined in three ways: as plain text, as a regular expression, or as a user. Choose one of the replace options, fill in “Replace with” and click “Save”.
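A rule pairs a search definition (plain text, regular expression, or user) with a replacement. The Python sketch below models that idea with hypothetical names; treating a "user" rule as a whole-word match on a username, and "XXX" as the default replacement, are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Literal


@dataclass
class Rule:
    name: str
    kind: Literal["text", "regex", "user"]
    search: str
    replace_with: str = "XXX"

    def pattern(self) -> re.Pattern:
        if self.kind == "regex":
            return re.compile(self.search)
        # Plain text and usernames are matched literally, so special characters are escaped.
        literal = re.escape(self.search)
        # Assumption: a username only matches as a whole word.
        return re.compile(rf"\b{literal}\b" if self.kind == "user" else literal)

    def apply(self, text: str) -> str:
        # A callable replacement inserts replace_with verbatim (no backslash processing).
        return self.pattern().sub(lambda _m: self.replace_with, text)


rules = [
    Rule("Project code", "text", "Project Falcon"),
    Rule("IBAN (DE)", "regex", r"\bDE\d{20}\b"),
    Rule("Forget Jane", "user", "jane.doe", replace_with="service-user"),
]
text = "Project Falcon: pay DE89370400440532013000, contact jane.doe"
for rule in rules:
    text = rule.apply(text)
print(text)  # XXX: pay XXX, contact service-user
```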

To use built-in rules, click the “Add Built-in rule” button, choose one or more built-in rules and click “Add selected rules”. You can tick the "select all" box to create a template with all built-in patterns, or narrow down the scope by filtering them.

Filtering. Built-in patterns can be filtered by a particular country (for example, Germany, Austria, Italy) or personal data type (for example, phone and credit card numbers), or both. 

Image: Built-in PII patterns overview in Data Protection & Security Toolkit Confluence (DLP)

To use the filter, start typing the name of the country or the personal data type, and the system automatically shows the matching built-in patterns.

To check whether the rules and the patterns used in them work correctly, click "check issue matches" and test the rules against a specific page, for example, a test page containing different types of PII.
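The filter can be thought of as a case-insensitive substring match on country and data type, where both criteria must hold when both are given. A hypothetical Python sketch with a made-up catalogue; the entries are illustrative, not the app's actual pattern list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuiltInPattern:
    name: str
    country: str
    data_type: str


# Illustrative catalogue only; the app ships its own list of 100+ patterns.
CATALOGUE = [
    BuiltInPattern("German phone number", "Germany", "phone number"),
    BuiltInPattern("Austrian phone number", "Austria", "phone number"),
    BuiltInPattern("Italian fiscal code", "Italy", "national ID"),
    BuiltInPattern("German IBAN", "Germany", "bank account"),
]


def filter_patterns(country: str = "", data_type: str = ""):
    """Case-insensitive substring filter; an empty argument matches everything."""
    return [
        p for p in CATALOGUE
        if country.lower() in p.country.lower()
        and data_type.lower() in p.data_type.lower()
    ]


print([p.name for p in filter_patterns(country="germ")])
print([p.name for p in filter_patterns(data_type="phone")])
print([p.name for p in filter_patterns(country="germ", data_type="phone")])
```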

A Confluence Administrator can edit, clone, disable and delete Data processing rules: click the “Actions” button next to the rule.

Note: after creation, a new Data processing rule is “Enabled” by default. To disable it, click the “Actions” button and choose “Disable”.

Part 3. Post Functions

Once PII is found, it needs to be processed. The app allows you to add standard comments to all affected pages and to create and send automated notifications to the employees responsible for these pages. If the personal data are not important for the company or were added by mistake, they can be automatically deleted or replaced with XXX combinations.

  • Go to “Post functions” and click on “Add Post Function”.

In the window that opens, fill in the fields that define what to do with the found data:

  • Post function name,

  • Post Function type - add a comment or send a notification,

  • Select a notification recipient,

  • Subject,

  • Message.
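A post function can be thought of as an action template that is rendered for each affected page. The sketch below is hypothetical: the fields mirror the form above, but the placeholder syntax (`{page}`, `{count}`) and the returned action dicts are assumptions, not the app's API.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class PostFunction:
    name: str
    kind: Literal["comment", "notification"]
    message: str
    recipient: Optional[str] = None  # only used for notifications
    subject: str = ""

    def render(self, page: str, count: int) -> dict:
        """Build the action to perform for one page with `count` findings."""
        body = self.message.format(page=page, count=count)
        if self.kind == "comment":
            return {"action": "add_comment", "page": page, "body": body}
        if self.recipient is None:
            raise ValueError("a notification needs a recipient")
        return {
            "action": "send_notification",
            "to": self.recipient,
            "subject": self.subject.format(page=page, count=count),
            "body": body,
        }


notify = PostFunction(
    name="Notify page owner",
    kind="notification",
    recipient="page-owner@example.com",
    subject="PII found on {page}",
    message="{count} item(s) of personal data were found on {page}.",
)
print(notify.render("Contacts", 2))
```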

After a post function is added, you can edit, disable or delete it. Click the “Actions” button next to the post function, make your changes and click "Update" to save them.

Part 4. Reporting options

Define what to do with the final results after the search or anonymization is finished:

  • leave them as they are,

  • send them by email,

  • print to atlassian-confluence.log or catalina.out.

Choose the preferred option and click "Save". Your template is now ready for use.
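The three reporting options amount to routing one summary to a different destination. A minimal Python sketch under assumptions: the summary format, the option names, the logger name `data-cleaner` and the sender address are hypothetical. The app writes to atlassian-confluence.log or catalina.out through Confluence's own logging, which this sketch only imitates with the standard `logging` module.

```python
import logging
from email.message import EmailMessage
from typing import Optional

logger = logging.getLogger("data-cleaner")  # hypothetical logger name


def summarize(template: str, counts: dict) -> str:
    parts = ", ".join(f"{status}: {n}" for status, n in sorted(counts.items()))
    return f"Template '{template}' finished ({parts})"


def report(template: str, counts: dict, option: str, to: str = "") -> Optional[EmailMessage]:
    summary = summarize(template, counts)
    if option == "none":
        return None  # results stay on the Status and History page only
    if option == "log":
        # Visible only if logging is configured, e.g. logging.basicConfig(level=logging.INFO).
        logger.info(summary)
        return None
    if option == "email":
        msg = EmailMessage()
        msg["Subject"] = f"Data Cleaner report: {template}"
        msg["From"] = "data-cleaner@example.com"  # hypothetical sender
        msg["To"] = to
        msg.set_content(summary)
        return msg  # hand over to smtplib or a mail gateway to send
    raise ValueError(f"unknown reporting option: {option}")


msg = report("PII clean-up template", {"found": 12, "anonymized": 3}, "email", to="dpo@example.com")
print(msg["Subject"])
print(msg.get_content())
```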

CCPA and GDPR references

With the help of our app, you can, for example, comply with the following requirements:

Use case: you, as a business, must delete certain personal information according to:

  • CCPA, Section 1798.105. Requirement under the “right to deletion”: upon a valid consumer request to delete personal information, a business must direct any service provider to delete the consumer's personal information.

  • GDPR, Article 17. Requirement under the “right to erasure” or “right to be forgotten”: data subjects have the right to request erasure from the controller. Upon a valid request for erasure, controllers are obligated to take reasonable steps to have processors erase the data.