Configuration options and features description

The Data Cleaner module searches all Confluence pages thoroughly for Personally Identifiable Information (PII) and anonymizes it irreversibly, including anonymization at a user's request. It also has many built-in tools that facilitate and automate work with vulnerable data at every stage: tracking, comments and notifications, reporting, and extensive customization.

To ensure the "right to be forgotten", replace all users whose personal data have to be removed with a single service user created specifically for this purpose. Once two or more users have been anonymized, they can no longer be traced back, because all of them appear as the same "service user". With every additional anonymized user, the quality of protection increases.
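The effect of mapping every anonymized user to one shared service user can be sketched as follows. This is a minimal illustration in Python, not the app's implementation: the `anonymize_authors` function, the page records, and the `gdpr-service-user` account name are all hypothetical.

```python
from typing import Dict, List, Set

SERVICE_USER = "gdpr-service-user"  # hypothetical account name


def anonymize_authors(pages: List[Dict[str, str]], users_to_forget: Set[str]) -> List[Dict[str, str]]:
    """Return copies of the page records with forgotten authors replaced by the service user."""
    result = []
    for page in pages:
        author = page["author"]
        if author in users_to_forget:
            author = SERVICE_USER
        result.append({**page, "author": author})
    return result


pages = [
    {"title": "Release notes", "author": "jane.doe"},
    {"title": "Onboarding", "author": "max.musterman"},
    {"title": "Roadmap", "author": "alice"},
]

cleaned = anonymize_authors(pages, {"jane.doe", "max.musterman"})
# Both forgotten users now share one identity, so their pages
# can no longer be told apart by author.
print([p["author"] for p in cleaned])
# ['gdpr-service-user', 'gdpr-service-user', 'alice']
```

The more users are folded into the same service user, the larger the group any single anonymized contribution could belong to.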

Main options include:

pattern-based search (100+ most popular and widely used built-in patterns and custom search via regular expressions);
complete anonymization of vulnerable data, creating automated clean-up templates;
anonymization of a specific user's personal data at their request, ensuring their "right to be forgotten";
automated comments and notifications for other team members to keep track of PII anonymization status;
automated custom reports tracking the appearance of any vulnerable data.

Before you start:

check that you have Confluence Administrator access rights;
check that you have defined a service user on the plugin configuration page; all further actions will be performed on its behalf.

Please note that the anonymization process is non-reversible! We highly recommend testing it on a development (test) server first.

Configuration

Starting point: Data Cleaner Dashboard

The starting point for working with the Data Cleaner module is the Data Cleaner Dashboard. There are two ways to get to the Data Cleaner Dashboard:

Navigate to "Manage apps", find "GDPR and Security" section and click on "Data Cleaner" (at the bottom of the menu on the left).
Get to the Data Cleaner Dashboard from the GDPR and Security Home page: click on the "Data Cleaner" button.

The Data Cleaner Dashboard provides an overview of all created templates with their descriptions and current status and allows you to:

start or stop a search,
start or stop anonymization.

Start Search

The "Start Search" button initiates a search through the content according to the template's configuration and lists all affected items.

It is highly recommended to use this option before starting anonymization in order to understand which items will be affected and whether the template's configuration needs to be changed.

To initiate the search, click the “Start Search” button.
Track the current search status at the Data Cleaner Dashboard.
After the task is finished, click the “Status and History” button to check the results and get more details.

Start Anonymization

The “Start Anonymization” button will initiate the processing of Confluence items according to your template's configurations.

To start the anonymization process, click on the “Start Anonymization” button.

Check the current task status on the Data Cleaner Dashboard page. You can find a more detailed report about Data Cleaner tasks on the Data Cleaner History page.

Note: only one task can be processed at a time; all other tasks wait in a queue. You can stop the current task at any time and start another one.

From the menu to the right of each template, you can get more details on the Status and History of the task and also edit, clone or delete the template. To navigate to the Data Cleaner Status and History page, click on the Details button next to the template in the Actions column and choose “Status and History”.

Status and History page

The Status and History page provides details on search/anonymization:

space - the space the personal data belong to,
page - the page the personal data belong to,
object - the part of the page where the personal data were found,
affected content - the personal data that were found,
status - the current status of tasks in progress and finished tasks, for example, "found" for found items (if a search was performed) or "anonymized" for anonymized items (if anonymization was performed),
rule - the rule that was used during the search/anonymization process.

Filtering processed pages. You can filter the results by space, object, status, or rule.

Sorting the results. It is possible to sort the found data by:

found time (sorting by default by the time of processing),
page create date DESC (descending order),
page create date ASC (ascending order).
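The filtering and sorting options above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the `Finding` record, its field names, and the sample data are hypothetical and do not reflect the app's internal data model, and ascending order for "found time" is assumed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Finding:
    space: str
    page: str
    obj: str            # the page object where the data were found
    status: str
    rule: str
    found_at: datetime
    page_created: datetime


def filter_findings(findings: List[Finding], **criteria: str) -> List[Finding]:
    """Keep only findings whose fields equal every given criterion."""
    return [f for f in findings
            if all(getattr(f, key) == value for key, value in criteria.items())]


findings = [
    Finding("HR", "Onboarding", "Comment", "found", "Phone number",
            datetime(2024, 3, 4, 10, 0), datetime(2023, 1, 15)),
    Finding("HR", "Contracts", "Description", "anonymized", "National ID",
            datetime(2024, 3, 4, 10, 5), datetime(2022, 6, 1)),
    Finding("DEV", "Runbook", "Comment", "found", "Phone number",
            datetime(2024, 3, 4, 10, 2), datetime(2024, 2, 20)),
]

found_only = filter_findings(findings, status="found")
by_found_time = sorted(found_only, key=lambda f: f.found_at)
newest_pages_first = sorted(found_only, key=lambda f: f.page_created, reverse=True)

print([f.page for f in by_found_time])       # ['Onboarding', 'Runbook']
print([f.page for f in newest_pages_first])  # ['Runbook', 'Onboarding']
```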

On the Data Cleaner Status and History page it is also possible to start or stop search or anonymization. In order to initiate a new task or cancel a currently running one, click on the button at the top right corner of the Data Cleaner History page.

If you stop a current anonymization task, the status changes to "Anonymization canceled".
If you start a new anonymization task, the status changes to "Anonymization started".
If you stop a current search task, the status changes to "Search canceled".
If you start a new search task, the status changes to "Search started".

It is also possible to anonymize a specific piece of data right from the "Status and History" page. To do so, click on the "Actions" button to the right of the relevant line.

Templates

You can create your own custom template with different rules or use one of the predefined templates.

Predefined templates

User Anonymization template:

Use this predefined template to anonymize all personal data throughout Confluence.

Click on the “Create User Anonymization template” button.
Fill in a Source user (for example, Jane Doe or Max Musterman) and a Target User (for example, a service user) and click “Create”.

A new template with predefined configurations for user anonymization will be created. Make changes and edit the template if necessary.

Click “Save” and “Back to Data Cleaner Dashboard”. You can now start to use your new “User Anonymization template”.

Predefined clean-up template: finding personal data in one click

The Data Cleaner module has 100+ built-in patterns: national IDs, SSNs, phone and credit card numbers for the majority of EU countries.

Use a predefined template to find and anonymize personally identifiable information of most common types.

Click on the “Create PII clean-up template” button.

A new template with predefined configurations for PII clean-up will be created. You can edit it if necessary.

Click “Save” and “Back to Data Cleaner Dashboard”. You can now start to use your new “PII clean-up template”.

Custom template: finding personal data with pattern-based search

It is possible to create a custom template with different rules. Click on the “Create custom template” button and you will see a form that you need to fill in. There are four main sections:

General configuration,
Data Processing Rules,
Post Functions,
Reporting options.

Part 1. General configuration

In the General configuration section there are three fields to fill in:

Template Name - any name that distinguishes the template from the others,
Scope - the pages to include in the search or anonymization process, defined using the Confluence Query Language (CQL),
Objects - Confluence page objects that will be used as a scope for search/anonymization, for example, Summary, Description, Comment, History, Creator, Reporter etc.

Sample: processing only new and updated pages. The app allows you to create customized "data rules" and search only new or recently updated Confluence pages (or blog posts) using a flexible time frame, for example, "created after the beginning of the week" or "updated after the beginning of the week" up to the current date. In other words, you can track the appearance of vulnerable data once a day or once a week without rescanning pages (or blog posts) that have already been checked. Processing only new and updated pages (or blog posts) is much faster than processing all existing ones.
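A time-limited scope of this kind might be written in CQL as shown by the small Python helper below. The helper `build_scope` and the space key `HR` are hypothetical and exist only to illustrate the query shape. The `created` and `lastmodified` fields and the `startOfWeek()` function are standard CQL, but whether the Scope field accepts every CQL construct is an assumption to verify in your instance.

```python
from typing import List, Optional


def build_scope(content_types: List[str], field: str = "created",
                since: str = "startOfWeek()", space: Optional[str] = None) -> str:
    """Compose a CQL scope string limited to recently created or updated content."""
    clauses = [f"type in ({', '.join(content_types)})"]
    if space:
        clauses.append(f'space = "{space}"')
    clauses.append(f"{field} >= {since}")
    return " AND ".join(clauses)


# Pages and blog posts created since the beginning of the week
print(build_scope(["page", "blogpost"]))
# type in (page, blogpost) AND created >= startOfWeek()

# Pages in one space updated since the beginning of the week
print(build_scope(["page"], field="lastmodified", space="HR"))
# type in (page) AND space = "HR" AND lastmodified >= startOfWeek()
```

The resulting string is what would be pasted into the Scope field; a template scheduled with such a scope only touches content that changed in the chosen window.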

After all the fields are filled in, click “Save” and you will be automatically redirected to the next part.

Part 2. Data processing rules

In the “Data processing rules” part, you can “Create new rule” and/or use one of the built-in rules.

In order to define your own custom rule, click on the “Create new rule” button.

When creating a new rule, you have to define what to search for and how to replace the found data. The search object can be defined in three ways: as plain text, as a regular expression (regex), or as a user. Choose one of the replace options, fill in “Replace with” and click “Save”.
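The idea behind a regex rule with a “Replace with” value can be tried out locally before it is entered in the form. The sketch below uses Python's `re` module with a deliberately simplified e-mail pattern; the `apply_rule` helper is hypothetical, and the regex dialect accepted by the app may differ from Python's, so treat the pattern as an illustration only.

```python
import re
from typing import Pattern, Tuple

# Deliberately simplified e-mail pattern, for illustration only.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def apply_rule(text: str, pattern: Pattern[str], replace_with: str) -> Tuple[str, int]:
    """Replace every match and report how many replacements were made."""
    return pattern.subn(replace_with, text)


text = "Contact jane.doe@example.com or max@example.org for access."
cleaned, count = apply_rule(text, EMAIL_PATTERN, "XXX")
print(cleaned)  # Contact XXX or XXX for access.
print(count)    # 2
```

Testing a pattern against sample text like this helps catch expressions that match too much or too little before any irreversible anonymization is run.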

In order to use built-in rules, click on the “Add Built-in rule” button, choose one or more built-in rules, and then click on “Add selected rules”. You can tick the "select all" box to create a template with all built-in patterns, or narrow down the selection by filtering.

Filtering. Built-in patterns can be filtered by a particular country (for example, Germany, Austria, Italy) or personal data type (for example, phone and credit card numbers), or both.

Image: Built-in PII patterns overview in GDPR (DSGVO) and Security for Confluence

To use the filter, start typing the name of the country or the personal data type, and the system will automatically show the matching built-in patterns.
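The type-ahead filtering described above behaves roughly like a case-insensitive substring match on country and data type. The following Python sketch shows that idea; the `BuiltInPattern` record and the four catalogue entries are hypothetical stand-ins, not the app's actual pattern list.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BuiltInPattern:
    name: str
    country: str
    data_type: str


# Hypothetical catalogue entries, for illustration only.
CATALOGUE = [
    BuiltInPattern("German phone number", "Germany", "phone number"),
    BuiltInPattern("German national ID", "Germany", "national ID"),
    BuiltInPattern("Austrian phone number", "Austria", "phone number"),
    BuiltInPattern("Italian credit card number", "Italy", "credit card number"),
]


def filter_patterns(patterns: List[BuiltInPattern], query: str) -> List[BuiltInPattern]:
    """Return patterns whose country or data type contains the typed text."""
    q = query.strip().lower()
    return [p for p in patterns
            if q in p.country.lower() or q in p.data_type.lower()]


print([p.name for p in filter_patterns(CATALOGUE, "germ")])
# ['German phone number', 'German national ID']
print([p.name for p in filter_patterns(CATALOGUE, "phone")])
# ['German phone number', 'Austrian phone number']
```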

To check whether the rules and the patterns used in them work correctly, click “check issue matches” and check the rules against a specific page, for example, a test page containing different types of PII.

A Confluence Administrator is able to edit, clone, disable and delete Data processing rules. Click on the Actions button next to the Data processing rule.

Note: After creation, the new Data processing rule has an “Enabled” status by default. In order to disable it click on the “Actions” button and choose “Disable”.

Part 3. Post Functions

Once PII is found, it needs processing. The app allows you to leave standard comments on all affected pages and to create and send automated notifications to the employees responsible for them. If the personal data are not important for the company or were added by mistake, they can be automatically deleted or replaced with XXX combinations.

Go to “Post functions” and click on “Add Post Function”.

In the window that opens, fill in the fields that define what to do with the found data:

Post function name,
Post Function type - add a comment or send a notification,
Select a notification recipient,
Subject,
Message.

After a post function is added, it is possible to edit, disable or delete it. Click on the “Actions” button next to the post function, make your changes, and click "Update" to save them.

Part 4. Reporting options

Define what to do with final results after searching or anonymization is finished:

leave them as they are,
send to an email,
print to atlassian-confluence.log or catalina.out.

Choose the preferable option and click "Save". Now your template is ready for use.

GDPR references

The main GDPR principles include “Data minimization” and “Storage limitation”. According to Article 5, personal data must be:

“kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed; personal data may be stored for longer periods insofar as the personal data will be processed solely for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) subject to implementation of the appropriate technical and organizational measures required by this Regulation in order to safeguard the rights and freedoms of the data subject (‘storage limitation’)".

This quote highlights the importance of having personal data usage under control. In other words, data protection officers or other employees responsible for data protection must know exactly when such data appears, how to monitor it, and what further actions may need to be taken. For example, you need to know how deletion of personal data works (if it is outdated, deadlines have expired, or the purpose of the data processing is no longer relevant). In addition, notifying employees when they are not GDPR-compliant is also important.

The "right to be forgotten" is also of great importance in the GDPR guidelines. More information on it can be found in Article 17 of the GDPR:

"The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay, and the controller shall have the obligation to erase personal data without undue delay where one of the following grounds applies[1]:

the personal data are no longer necessary in relation to the purposes for which they were collected or otherwise processed;

the data subject withdraws consent on which the processing is based according to point (a) of Article 6(1), or point (a) of Article 9(2), and where there is no other legal ground for the processing;
the data subject objects to the processing pursuant to Article 21(1) and there are no overriding legitimate grounds for the processing, or the data subject objects to the processing pursuant to Article 21(2);
the personal data have been unlawfully processed;
the personal data have to be erased for compliance with a legal obligation in Union or Member State law to which the controller is subject;
the personal data have been collected in relation to the offer of information society services referred to in Article 8(1)".

The "right to be forgotten" means that if a data subject requests to have their personal data erased, a company should be able to track and delete them within the established time frame, unless there are legal grounds to keep this information.