Demystifying Pega Blob and UDF: Unraveling the Core Concepts-Part 1

Before delving into the subject, it is worth understanding why Blobs are integrated into Pega.

In Pega, Blobs (Binary Large Objects) serve a crucial purpose due to the nature of the data they are designed to store. Here are several reasons why Blobs are essential in Pega:

1. Storage of Multimedia and Large Data: Blobs are specifically designed to handle binary data, such as images, audio files, videos, or other multimedia objects. These types of data can be quite large and complex, requiring a dedicated storage mechanism. Blobs efficiently manage the storage and retrieval of such large and unstructured data.

2. Versatility in Data Handling: Pega applications often deal with diverse types of content beyond simple text or numerical data. Blobs provide a versatile way to handle a wide range of data formats, allowing applications to work with multimedia content seamlessly.

3. Integration with Processes and Cases: In many business processes or cases managed by Pega applications, there is a need to associate and work with documents, images, or other multimedia elements. Blobs enable the seamless integration of these elements into the application workflow, facilitating a comprehensive approach to data management.

4. Enhanced User Experience: Pega applications often involve user interactions with various forms of media. Whether it's uploading images, attaching documents, or working with multimedia content, Blobs contribute to an enhanced user experience by providing a smooth and efficient way to handle such data.

5. Efficient Data Handling: Blobs are designed for efficient storage and retrieval of binary data, ensuring optimal performance when dealing with large files. This efficiency is crucial for maintaining application responsiveness and scalability, especially in scenarios where substantial amounts of multimedia data are involved.

6. Support for Unstructured Data: Blobs are particularly useful for managing unstructured data, which doesn't fit neatly into traditional databases. This is common for multimedia content, where the structure can vary widely, and Blobs provide a flexible solution for handling such data.

In summary, Blobs in Pega address the unique requirements of handling large, unstructured binary data, providing a robust and versatile solution for applications that involve multimedia content or other forms of complex data.

What is Blob?

A Blob (Binary Large Object) is a data type designed for storing binary data, commonly associated with multimedia objects such as images or audio files. By their nature, Blobs usually demand more storage space than other data types such as integers, characters, or strings. This kind of content falls into the category of unstructured data, in contrast to semi-structured data such as the XML that Pega uses. It's important to note that neither unstructured nor semi-structured data is typically interpretable by the Database Management System (DBMS) itself: the application is responsible for creating and editing the content, while the DBMS serves only as the storage repository.
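
To make that division of labor concrete, below is a minimal, generic JDBC sketch of an application writing binary content into a BLOB column. The connection URL, table, and column names (documents, doc_id, content) are hypothetical illustrations, not Pega's internal storage code.

import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BlobWriteSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details and table/column names.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/appdb", "appuser", "secret");
             InputStream image = new FileInputStream("photo.jpg");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO documents (doc_id, content) VALUES (?, ?)")) {
            ps.setInt(1, 101);
            // The application supplies and interprets the binary content;
            // the DBMS only stores it.
            ps.setBinaryStream(2, image);
            ps.executeUpdate();
        }
    }
}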

How does Pega utilize Blob?

Within Pega, certain tables, such as the Declare Index tables or the data type tables found in the CustomerData schema, lack a Blob column. However, the majority of tables include a Blob column named "pzPVStream," signifying its pivotal role as a key technology supporting Pega's data management capabilities.

Below is the process Pega follows when storing data in the database.

1. Obfuscation

Although Pega data is represented in XML form, it undergoes obfuscation, which makes it difficult to read directly. Essentially, obfuscation is a way to mask data, adding an extra layer of security by making its meaning unclear. It is similar to encryption, but obfuscation does not require a secret key and does not provide the same level of security as true encryption, since a skilled attacker could potentially decode it. It's worth noting that the obfuscation algorithm used by Pega is proprietary and isn't publicly disclosed.
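
Because Pega's obfuscation algorithm is proprietary and not publicly documented, the sketch below is only a generic illustration of the idea: a fixed mask hides the data without any secret key, and anyone who knows the scheme can reverse it. It is emphatically not Pega's algorithm.

import java.nio.charset.StandardCharsets;

public class ObfuscationSketch {
    // Arbitrary fixed mask; there is no secret key, which is what separates
    // obfuscation from true encryption.
    private static final byte MASK = 0x5A;

    static byte[] toggle(byte[] data) {
        byte[] result = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            result[i] = (byte) (data[i] ^ MASK); // the same call masks and unmasks
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] masked = toggle("<pyLabel>Sample</pyLabel>".getBytes(StandardCharsets.UTF_8));
        byte[] restored = toggle(masked);
        System.out.println(new String(restored, StandardCharsets.UTF_8)); // original text again
    }
}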

2. Compression

Following obfuscation, Pega uses the java.util.zip libraries to compress the data, potentially reducing the size of Blobs by a third or more. Compression (the DeflateStreams setting) is enabled by default, although it can be disabled through a Dynamic System Setting if desired.
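
As a small illustration of the underlying mechanism, here is a sketch that compresses a repetitive XML-like string with java.util.zip's Deflater. It demonstrates the general deflate technique only, not Pega's actual code path or its DeflateStreams configuration.

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CompressionSketch {
    public static void main(String[] args) {
        // Repetitive sample content (String.repeat requires Java 11+).
        byte[] input = "<pagedata><pyLabel>Sample case</pyLabel></pagedata>".repeat(50)
                .getBytes(StandardCharsets.UTF_8);

        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer)); // compress in chunks
        }
        deflater.end();

        System.out.println("Original: " + input.length
                + " bytes, compressed: " + out.size() + " bytes");
    }
}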

3. Encryption

In the Pega Platform, you have the option to encrypt the storage stream (BLOB) by utilizing either a platform or custom cipher. 
To implement this, update the class forms by following these actions: 
  • Open each class form intended to hold instances with encrypted Storage Stream values, then select the "Encrypt BLOB" checkbox located on the General tab. 
  • Save the class form.
  • Repeat these steps for all classes.

The benefits of employing Blobs include:
  • Reduced storage overhead through Compressed Blob.
  • Absence of size constraints.
  • Simplified management of complex or nested structures.
  • Elimination of the requirement for a Database Administrator (DBA) to execute intricate changes to the database schema.
  • Swift access to single objects.
  • Permits the evolution of the object model.
  • Relationally maps only the columns that are strictly required, eliminating the necessity for extensive SQL construction.
  • Enhances agility in data handling.

What is the need for UDFs in Pega?

Pega uses Java User-Defined Functions (UDFs) to facilitate reporting without requiring each property to be optimized (exposed as a database column) individually. The UDF is embedded in SQL statements wherever the property is referenced.

What is UDF?

A Java User-Defined Function (UDF) is a function crafted in the Java programming language, serving to execute customized calculations or operations on data within a database management system or big data technology supporting Java UDFs.

In databases such as Apache Hive and Apache Impala, Java UDFs offer an extension of the query language's functionality. This extension allows for the execution of custom operations on data stored in Hadoop or other prominent big data systems. Java UDFs are invoked similarly to built-in functions, providing the capability to perform intricate calculations and manipulate or transform data in ways not achievable with standard SQL commands.
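
To make this concrete, here is a minimal sketch of a custom Java UDF written against Hive's classic UDF API (org.apache.hadoop.hive.ql.exec.UDF); the class and function names are purely illustrative.

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// A trivial Hive UDF that upper-cases a string column.
public class ToUpperUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().toUpperCase());
    }
}

Once packaged into a JAR, such a function would typically be registered and then invoked like a built-in, for example: ADD JAR to_upper.jar; CREATE TEMPORARY FUNCTION to_upper AS 'ToUpperUDF'; SELECT to_upper(customer_name) FROM customers;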

Starting with PRPC 6.2 SP2, a collection of User-Defined Functions (UDFs), also referred to as the "Blob reader" or "DirectStreamReader," was introduced. These UDFs enable the direct retrieval of scalar property values from the Blob in the database.

There are three User-Defined Functions (UDFs) installed in the database:

(1) pr_read_from_stream

(2) pr_read_int_from_stream

(3) pr_read_decimal_from_stream

These functions share an identical structure, differing only in the data type of the returned value (String, Int, Decimal). Installation covers both the rules and data schemas, and the UDFs are defined and loaded separately for each of the four supported databases. On Oracle, PostgreSQL, and DB2 LUW/z/OS they are implemented in Java, while on Microsoft SQL Server they are implemented in C#. Notably, these were the first UDFs Pega distributed, marking its initial venture into executing Java (not to mention C#) directly within the database.

How to make use of UDF in Pega?

1. Prerequisites

To install the UDFs, Java must be enabled in database environments such as Oracle, PostgreSQL, and DB2, while for Microsoft SQL Server, the CLR (Common Language Runtime) must be enabled.

2. Installation

The UDFs are installed automatically as part of the standard installation and upgrade processes. If you wish to opt out of the UDF installation, you can modify the corresponding setting in the setupDatabase.properties file in the ./scripts directory of the distribution media. The default value is blank, which is treated as false and results in the UDFs being installed; to skip UDF installation, set it to true. This configuration applies to both command-line and GUI (IUA) installations. Note that UDF installation is optional, and all other functionality operates seamlessly without the UDFs.

Note: From Pega 8.8 onwards, the default value is true, so the UDFs are not installed by default and there is no need to modify the file to skip UDF installation.
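
As an illustration, skipping UDF installation amounts to a single entry in setupDatabase.properties; the flag is commonly bypass.udf.generation, but verify the exact name against the file shipped with your distribution.

bypass.udf.generation=true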



3. Parameters for UDF

  • Reference (ref): A property reference indicating the scalar property to return. The property specification must start with a "." (period).
  • Instance Key (insKey): The handle (pzInsKey) of the instance from which you wish to obtain the value, or NULL.
  • Blob Column (stream): The name of the Blob column (pzPVStream).

Sample Query using UDF:

SELECT pr_read_from_stream('.PropertyName', pzInsKey, pzPVStream) FROM tableName WHERE conditions;

Replace:
  • ".PropertyName" with the desired scalar property reference.
  • "InstanceKey" with the handle (pzInsKey) of the instance or NULL.
  • "BlobColumnName" with the Blob column name (pzPVStream).
  • "tableName" with the actual name of the table.
  • "conditions" with any applicable conditions for filtering.


Concluding Part 1 here, as the article has already grown long; the remaining content is covered in Part 2: Demystifying Pega Blob and UDF: Unraveling the Core Concepts-Part 2.

Happy Learning :):)
