Data Cleaner for Confluence: "How to use" guide

Overview

Our guide shows you what the Data Cleaner for Confluence can be used for. After all, data protection comes with plenty of aspects that need to be taken into account. Our numerous features offer you many ways to use Confluence in a data privacy compliant way. Here we explain the best practices.

1 Overview
2 Data Cleaner Dashboard
3 Search templates creation page
- 3.1 Predefined PII cleanup template creation
  - 3.1.1 General Configuration
  - 3.1.2 Data Processing Rules
  - 3.1.3 Post Functions
  - 3.1.4 Total Results Post Functions
  - 3.1.5 Scheduling
- 3.2 Custom template creation
  - 3.2.1 General Configuration
  - 3.2.2 Data Processing Rules
  - 3.2.3 Post Functions
  - 3.2.4 Total Results Post Functions
  - 3.2.5 Scheduling
4 Search and Anonymization
- 4.1 Search
- 4.2 Anonymization
5 Quick Anonymization
- 5.1 Anonymize Users
- 5.2 Anonymize Content
6 Use cases

The best way to get started with Data Protection and Security Toolkit for Confluence is to create simple search templates that find personal user data with just a few clicks. Afterwards, you can, for example, anonymize the data that was found.

Data Cleaner Dashboard

Let’s start by opening the Confluence administration section using the settings button in the top right corner. Then, choose any item in the dropdown menu. In the sidebar that appears, find the Data Cleaner module and click on it. Now, you’ll see the Data Cleaner Dashboard page.

Data Cleaner Dashboard page

Search templates creation page

Find the Custom anonymization templates section on the Dashboard.

You can create your own template with the Create Custom template button, or use a predefined template to search for Personally Identifiable Information (PII) by clicking on the Create PII cleanup template button.

Predefined PII cleanup template creation

Click on the Create PII cleanup template button, read the announcement about rules, and click Create.

Announcement window

Take a look at the Edit Template page. As the template is predefined, all its parameters are already set up, and you can directly start using it.

General Configuration

Let’s take a look at the General configuration parameters in detail:

Parameter Name	Default Value	Description
Template name	Personally Identifiable Information Template	The name is predefined and will be displayed at your Dashboard.
Scope (CQL)	type in (page, blogpost)	Here, you can set which content the search covers. This CQL query covers all pages and blog posts in Confluence.
Objects	Title, Content body, Comment	Here, you can set which objects the search covers. Title, Content body, and Comment are predefined.

Data Processing Rules

On this tab, you can define what exactly you are going to search for. Rules for Standard international phone numbers, Email addresses, and Standard credit card numbers are already added and enabled.
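
To give a feel for what such rules look for, here is a minimal Python sketch. The patterns below are simplified assumptions made for illustration only; this guide does not show the app's actual built-in expressions, and the names EMAIL, PHONE, CARD, luhn_ok and find_pii are hypothetical.

```python
import re

# Simplified, hypothetical patterns. The app's real built-in rules are not
# documented here and are very likely more thorough.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+\d{1,3}[ -]?\d(?:[ -]?\d){5,13}")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits


def luhn_ok(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counted from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text):
    """Return (kind, match) pairs for emails, phone numbers and card numbers."""
    hits = [("email", m.group()) for m in EMAIL.finditer(text)]
    hits += [("phone", m.group()) for m in PHONE.finditer(text)]
    # The checksum filters out digit runs that only look like card numbers.
    hits += [("card", m.group()) for m in CARD.finditer(text) if luhn_ok(m.group())]
    return hits


sample = "Mail jane.doe@example.com or call +49 30 1234567. Card: 4111 1111 1111 1111."
for kind, value in find_pii(sample):
    print(kind, value)
```

The card number in the sample is a widely used test number, not real data.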

Post Functions

On this tab, you define how to react to search results. Note the disabled parameter Add comment to the ticket. You can enable it with the Actions menu/Enable option if you want to leave comments, or you can delete it with the Actions menu/Delete option if you just want to find the personal data and analyze the scope of the affected objects.

Total Results Post Functions

The Total Results Post Functions tab for this template is empty by default.

Scheduling

Scheduling for this template is disabled by default.

Click the Save button at the top of the page to save your changes.

Custom template creation

It is possible to create a custom template with different rules. Click on the Create custom template button and you will see a form that you need to fill in. There are the same sections and fields as for the predefined template:

General configuration,
Data Processing Rules,
Post Functions,
Total Results Post Functions,
Scheduling.

General Configuration

Here are the General configuration parameters which you can set:

Parameter Name	Default Value	Description
Template name	Empty	Enter any name that distinguishes the template from the others.
Scope (CQL)	Empty	Here, you can set which content the search covers. Use CQL format for this field. For hints about the CQL format, click on the Advanced Searching using CQL link under the field.
Objects	Empty	Here, you can set which objects the search covers. These can be fields of the text type (Title, Content body, Comment) or fields of the user type (Content creator/updater, Comment author, Attachment author) – just choose one of them.
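
As an illustration of what a Scope (CQL) value can look like, the following Python sketch assembles a CQL string from a list of content types and an optional space key. The helper name build_scope and the space key HR are assumptions made for this example; they are not part of the app.

```python
def build_scope(content_types, space_key=None):
    """Build a CQL string for the Scope (CQL) field (hypothetical helper)."""
    if not content_types:
        raise ValueError("at least one content type is required")
    cql = "type in ({})".format(", ".join(content_types))
    if space_key is not None:
        # Restrict the search to a single space.
        cql += ' and space = "{}"'.format(space_key)
    return cql


# Same scope as the predefined PII cleanup template:
print(build_scope(["page", "blogpost"]))
# Pages in one space only ("HR" is an example key):
print(build_scope(["page"], space_key="HR"))
```

The resulting string can be pasted into the Scope (CQL) field as is.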

Data Processing Rules

In the Data processing rules section, you can create a new rule and/or use one of the built-in rules.

In order to define your own custom rule, click on the Create new rule button and view the Data rule window.

Here is an overview of the Data rule parameters which you can set:

Parameter Name	Default Value	Description
Rule name	Empty	Enter any name that distinguishes the rule from the others.
Search type	Plain text	The search object can be defined as plain text, regex, user, or any user – just choose one of them.
What to search	Empty	Enter the text or regex you want to find, or select the user from the dropdown menu.
How to replace	None	Choose how the found data will be replaced. None: found data is not replaced. Empty string: found data is replaced with an empty string. User: available only for the user search type; one user is replaced with another. Plain text: the found text is replaced with new text. Symbols (replace just “###“-string): found data is replaced with the symbols ###. Symbol respect string (size of “#“-symbols will be equals size string): found data is replaced with as many “#” symbols as the found string has characters.
Replaces with	Empty	Enter the text that replaces the found text, or select the user who replaces the found user from the dropdown menu.
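
The text-based How to replace options can be pictured with a short Python sketch. This is only an illustration of the behaviour described in the table, not the app's implementation; the function name apply_replacement and the mode identifiers are assumptions, and the User option is left out because it acts on Confluence user references rather than on text.

```python
import re


def apply_replacement(text, pattern, mode, replace_with=""):
    """Replace every regex match in `text` according to a replacement mode."""
    if mode == "none":
        return text  # search only, nothing is changed
    if mode == "empty_string":
        return re.sub(pattern, "", text)
    if mode == "plain_text":
        # A function is used so that backslashes in replace_with stay literal.
        return re.sub(pattern, lambda m: replace_with, text)
    if mode == "symbols":
        return re.sub(pattern, "###", text)
    if mode == "symbol_respect_string":
        # One "#" per character of the matched string.
        return re.sub(pattern, lambda m: "#" * len(m.group()), text)
    raise ValueError("unknown mode: " + mode)


text = "Contact: jane@example.com"
pattern = r"\S+@\S+"
for mode in ("none", "empty_string", "symbols", "symbol_respect_string"):
    print(mode, "->", repr(apply_replacement(text, pattern, mode)))
print("plain_text ->", repr(apply_replacement(text, pattern, "plain_text", "[removed]")))
```

The difference between the two symbol options is that the fixed ### hides the length of the original value, while the length-preserving variant keeps the layout of the surrounding text intact.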

Click the Save button to save your settings.

In order to use built-in rules, click on the Add Built-in rule button, choose one or multiple built-in rules, and then click on Add selected rules. You can tick the Select all box to create a template with all built-in patterns, or narrow down the scope by filtering the data further.

Built-in patterns can be filtered by a particular country (for example, Germany, Austria, Italy) or personal data type (for example, phone and credit card numbers), or both.

To use the filter, start typing the name of the country or the personal data type, and the system will automatically show the matching built-in patterns.

A Confluence Administrator is able to edit, clone, disable and delete Data processing rules – just click on the Actions button next to the Data processing rule.

Note: After creation, a new Data processing rule has the Enabled status by default. In order to disable it, click on the Actions button and choose Disable.

Post Functions

The app lets you leave standard comments on found objects and create and send automated notifications to the employees responsible for these objects. Click on the Add Post Function button in the top right corner of the page to open the Post function window:

Let’s take a look at the Post function's parameters in detail:

Parameter Name	Default Value	Description
Post function name	Empty	Enter the post function name in this field.
Work with rules	Empty	From the dropdown menu, choose the rule for which the post function must run.
Applied for any Rule	Disabled	Enable the checkbox if you want to apply the post function to all rules.
Post function type	Empty	Choose one of the post function types from the dropdown menu. Add comment to the Ticket: when you choose this option, you can set the comment text in the Comment field, and this text will be displayed for all objects of the scope. Send email notification to: when you choose this option, you can select the notification recipient (last updater or watchers), enter the subject of your message, and enter the text of your message in the Message field.

Click the Save button to save your settings.

Total Results Post Functions

On this tab, you can set the reporting type if you want to be notified about search results. Choose the reporting type:

Send Notification: With this type, reports are sent by email. Enter the email addresses to which reports must be sent in the Emails field.
Print to atlassian-confluence.log: With this type, the information is written to the server log.

Click the Save button to save your changes.

Scheduling

Use the scheduling functionality when you need to run recurring actions on Confluence objects. Let’s look at the Scheduling parameters that you can set in the template:

Parameter Name	Default Value	Description
Enabled	Disabled	Enable it to execute the template by cron.
Cron Expression	0 0 1 * * ?	Set the periodicity of the task in cron format. The default value executes the template every day at 1 a.m. You can find hints about cron expressions when you hover over the “?” sign.
Task execution type	Search	Choose one of the task types. Search: the template runs a recurring search only. Anonymization: the template runs a recurring search of the objects and anonymizes them.
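
To read the default cron expression field by field, the Python sketch below labels each position. It assumes the six-field, Quartz-style order (seconds first), which matches the description of the default value as running every day at 1 a.m.; the function name describe_cron is hypothetical, and the sketch does not handle an optional seventh year field.

```python
# Assumed Quartz-style field order for a six-field expression.
FIELD_NAMES = ["second", "minute", "hour", "day_of_month", "month", "day_of_week"]


def describe_cron(expression):
    """Map each field of a six-field cron expression to its name."""
    parts = expression.split()
    if len(parts) != len(FIELD_NAMES):
        raise ValueError("expected 6 fields, got {}".format(len(parts)))
    return dict(zip(FIELD_NAMES, parts))


fields = describe_cron("0 0 1 * * ?")
for name in FIELD_NAMES:
    print(name, "=", fields[name])
```

Read this way, the default fires at second 0, minute 0, hour 1, on every day of every month, with the day of the week left unspecified.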

Click the Save button after you have edited everything. Now, your template is created.

Search and Anonymization

On the Dashboard, you can see the created template.

Search

Click on the Search button in the Action column to start the search for personal user data. It can take some time depending on the amount of content in your Confluence.

When the search is done, you see its dates and the message “Task finished” in the Status column. Click on it, and you’ll see the results on the Data Cleaner – History page.

On this page, you can see all names of the affected pages or blog posts (Content object column), fields where the data was found (Object column), and the data itself (Affected content column).

All found elements can be directly anonymized with the Start anonymization button at the top of the Data Cleaner – History page.

On the 3d-party addons anonymizers tab, you can see the anonymization results when the anonymization executes in third-party add-ons.

After everything is set up, you can go back to the Data Cleaner – History page and start searching again with the Search button (magnifying glass icon at the top of the page). This is very useful if you need to check for differences and find new content that matches your search pattern.

Anonymization

Open the Data Cleaner Dashboard, find your template, and click on its Anonymize button in the Action column to start the anonymization of personal user data. Confirm your action in the popup window before starting:

When the anonymization is done, you see its dates and the message “Task finished” in the Status column. Click on it, and you’ll see the results on the Data Cleaner – History page.

On this page, you can see all names of the affected pages or blog posts (Content object column), fields where the data was found (Object column), and the status Anonymized. The personal user data no longer appears anywhere in Confluence, and the Affected content column is empty. You can check the content pages to make sure the data is anonymized.

Quick Anonymization

You can start a quick user or content anonymization without creating any template, just by using the dialog window.

Anonymize Users

Step 1: Open the Data Cleaner Dashboard and select New User Anonymizer.

Step 2: In the next screen, you will see this Dashboard:

Step 3: Then, you can start typing any user’s name (here, Lucas Miller) to select the Source user, the person whose content you want to anonymize. You can also select already deactivated users. After that, you can choose a Target user to take the place of the source user after anonymization.

As the target user, you can either select an existing user or create your own dummy user, which will then take over the ownership.

Now, you can either start the anonymization as it is or select the custom anonymization:

Step 4: Now, you can again select the dry run or the direct anonymization and click on Next.

Step 5: In this step, you can pick some post-anonymization options and should select your Source User and Target User again. Click on Next.

Step 6: It’s time to further define the Scope of the projects that will be anonymized. Make sure to tick the Include all ticket related content checkbox so that you do not miss any relevant content. You can also search for users in projects that are already archived!

These options will work properly only for internal users. If users are synchronized with an Active Directory, there may be complications due to the permissions of the Active Directory. All user data on the Confluence side will be deleted, but there is no option to delete the data on the Active Directory side as well. Please keep this in mind when selecting these options.

Step 7: Before the process begins, we'll give you an overview of everything that will happen afterwards. You can see which user will be replaced by which user and which scope you have selected for the project.

Please keep in mind that anonymization cannot simply be undone. Perform the anonymization only if you are absolutely sure about this process.

At this point, it is necessary to check the box Yes, I confirm that I want to anonymize the data irreversibly.

Hit Anonymize to start the process.

Step 8: When the process is finished, you will see the results like this:

Congratulations, with a few clicks, you have transferred the ownership of one user to another and deactivated the source user at the same time!

Anonymize Content

Open the Data Cleaner Dashboard and navigate to the Anonymize Content section. Click on it, and you’ll see the dialog window where you must follow some steps:

Step 1: Choose the task type — Search data or Anonymize data, and click on the Continue button.

Then, if you’ve chosen the Search task, follow the next steps:

Step 2: Choose a user in the Target user field to substitute any other user in the Confluence objects.

Step 3: Define the scope of the Confluence pages for which the task must run. Use CQL format for this field. The Include all related content checkbox is enabled by default for this type of search.

Step 4: Check if you’ve set everything correctly and click on the Search button.

When the search is done, you see the results on the Data Cleaner — History page.

If you’ve chosen the Anonymize task, follow the next steps:

Step 2: Choose a user in the Target user field to substitute any other user in the Confluence objects.

Step 3: Define the scope of the Confluence pages for which the task must run. Use CQL format for this field. The Include all related content checkbox is enabled by default for this type of search.

Step 4: Review and confirm that you want to anonymize the data irreversibly using the checkbox at the bottom of the dialog window, and click on the Anonymize button.

When the anonymization is done, you see the results on the Data Cleaner — History page. You can check the content pages to make sure the data is anonymized.

Click on the View History section on the Data Cleaner Dashboard to view the history logs of quick anonymization processes.

Use cases

Data Cleaner for Confluence can be useful for:

deleting the personal data of an employee, e.g. an employee who left the company,
anonymizing users while importing or sharing content,
cleaning data when someone exercises their right to “opt-out”,
anonymizing specific areas in Confluence.