Informix

 View Only
Expand all | Collapse all

Data Masking in Informix

  • 1.  Data Masking in Informix

    Posted Thu March 09, 2023 04:46 AM

    Dear All,

    Is "data masking" feature  available in INFORMIX ? If it is not available pl. let us know the similar features in INFORMIX ? 

    Thanks !

    Best Regards,

    Indika 



    ------------------------------
    Indika Jinadasa
    ------------------------------


  • 2.  RE: Data Masking in Informix

    Posted Thu March 09, 2023 06:01 AM

    There is not a "data masking" feature in INFORMIX.

    A request for enhancement was created months ago at:

    https://ibm-data-and-ai.ideas.ibm.com/ideas/INFX-I-467

    IBM has the InfoSphere Optim family of products that can achieve this and work with several databases:
    https://www.ibm.com/analytics/optim

    This is a feature I advocate for. Even external "plug-ins" cannot solve the real problem: In systems with sensitive data like healthcare environments, Even DBAs / Informix should not be able to "see" some data. But still able to perform selects on database.



    ------------------------------
    Vicente Salvador Cubedo
    ------------------------------



  • 3.  RE: Data Masking in Informix

    IBM Champion
    Posted Thu March 09, 2023 06:13 AM
    Hi,

    what is your understanding of "data masking" ?
    Should a specific column be "masked" in terms that the original content is overwritten with a number of "xxxxxxx" to make it unreadable ?
    Should it be depending on the role of the user ? (e.g. only for specific users at query time, but the original content is stored unmasked) 
    Should data of a table be stored encrypted ? (this is not masking)

    MARCUS HAARMANN






  • 4.  RE: Data Masking in Informix

    Posted Thu March 09, 2023 08:39 AM

    In my experience, data should be stored unmasked as some "client" should retrieve original data and others masked data.

    There should be a definition of masked columns: tablename + columname 

    By default, if a column is defined as masked depending of the column type:

    CHAR,VARCHAR,LVARCHAR -> should retrieve a basic hash E.G. MD5 of the content

    DATE OR DATETIME should change aleatory the day of the date and change hour minute or second if timestamp

    NUMERIC COLUMN (FLOAT, DECIMAL, INTEGER) should retrieve a numeric hash of the column using some internal algorithm so same original number always masks to same destination number.

    If a column is defined as hashed we need to know who is able to see the original data. All other connections should mask data in the resultset.

    So, here is the tricky think: implementation

    1. You can add a list of users able to see original data without masking
      Cons of this model is that a dba can add himself to this list, retrieve "protected" data and then remove himself from the authorization list. In this "sensible" info no unauthorized users should be able to get unmasked data
    2. You can define a password for clients to see unmasked data
      This password can ge defined at the database level and then only sessions sending proper password are able to see unmasked data (E.G. SET ENVIRONMENT mask_passwd "ABC123";)
      Then the installer of the application can set the database mask pasword and nobody (even the DBAs) are able to get unmasked data. Only at the application level, the client session is able to send the password and nobody knows it.

    Possible issues: in first implementation, I would allow only CHAR columns to be masked/hashed. The best ways is to allow lauch proper sqls without knowing the details, that is the reason I propose using a hashing algorith, so you can select how many names have your patients without knowing the real name.
    Cons of hashing vs masking is that some column sizes should change for example MD5 hashing of a CHAR(15) column, returns a CHAR(32) string

    This can be a mess, so maybe Informix should allow to mask/hash only columns larger than 32 or cut the hashing if column size is smaller.

    E.G.

    original CHAR(15) with content "ABCD" will return cb08ca4a7bb5f96

    original CHAR(32) with content "ABCD" will return cb08ca4a7bb5f9683c19133a84872ca7

    That's my two cents.



    ------------------------------
    Vicente Salvador Cubedo
    ------------------------------



  • 5.  RE: Data Masking in Informix

    IBM Champion
    Posted Thu March 09, 2023 09:33 AM
    If by masking the OP means:

    To create test data for non-production environments, such as application development and QA, while protecting personally identifiable, personally sensitive or commercially sensitive data. Development and QA department personnel usually do not have required security clearance to access actual production data. However, in order to create and test robust business applications, they need to work with data sets that closely resemble real-world data.

    Then check out AGS's Server Studio's Data Management Module which supports this feature.


    Art

    Art S. Kagel, President and Principal Consultant
    ASK Database Management


    Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.









  • 6.  RE: Data Masking in Informix

    Posted Fri March 10, 2023 02:30 AM

    As Art says. See S2 E2 here:

    https://www.serverstudio.co.uk/video-demos



    ------------------------------
    Doug Lawry
    Oninit Consulting
    ------------------------------



  • 7.  RE: Data Masking in Informix

    Posted Fri March 10, 2023 04:39 AM
    Hi Vicente,

    I'm not sure about the hashing. I think, "masking" simply should not show any meaningful data, i.e. something like just "XXXX..." or the zero-digit for numerical data (possibly 0 for integers and 0.0 for decimals/floats).

    Wouldn't hashing make the data somewhat "readable"?
    Once the hashing algorithm is known, any user could himself hash a value he's interested in, then use this hash value in a where clause to get all the rows with the value he's interested in ... no? You only could use "=" in the where clause, i.e. no ">" or "<", but still a hash value provides too much info, I think.

    Regards,
    Martin
    --

    Martin Fuerderer

    Software Engineer

    hcltechsw.com

    ::DISCLAIMER::

    The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. E-mail transmission is not guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or may contain viruses in transmission. The e mail and its contents (with or without referred errors) shall therefore not attach any liability on the originator or HCL or its affiliates. Views or opinions, if any, presented in this email are solely those of the author and may not necessarily reflect the views or opinions of HCL or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of this message without the prior written consent of authorized representative of HCL is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately. Before opening any email and/or attachments, please check them for viruses and other defects.






  • 8.  RE: Data Masking in Informix

    IBM Champion
    Posted Fri March 10, 2023 05:58 AM

    Martin:

    The problem with just replacing all PII with 'xxxxx', if this is for the purpose of masking production data for test use, is that the queries will not behave as they would with the real data because every identifier will have the same value of 'xxxxx'! You have to use a masking or obfuscation that leaves the distribution of values similar to production.

    Art



    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 9.  RE: Data Masking in Informix

    Posted Fri March 10, 2023 08:47 AM
    Hi Art,

    agree. The purpose for using masking should be clarified.

    I never used LBAC. It may provide a solution to some aspects of the requirement? I just know that setting up and using LBAC is not the easiest thing to do.

    Hi Andreas,

    I don't think that the intention of masking is to change the underlying data. I assume that it should display non-identifiable data to non-authorized users, while it should still be possible to display the original data to authorized users.

    Regards, Martin

    --

    Martin Fuerderer

    Software Engineer

    hcltechsw.com

    ::DISCLAIMER::

    The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. E-mail transmission is not guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or may contain viruses in transmission. The e mail and its contents (with or without referred errors) shall therefore not attach any liability on the originator or HCL or its affiliates. Views or opinions, if any, presented in this email are solely those of the author and may not necessarily reflect the views or opinions of HCL or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of this message without the prior written consent of authorized representative of HCL is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately. Before opening any email and/or attachments, please check them for viruses and other defects.






  • 10.  RE: Data Masking in Informix

    IBM Champion
    Posted Sat March 11, 2023 02:39 PM

    Hi,

    Hashing is not used for masking as you can build a raibbow tables to crack that.

    SQL Server has various methods but they do not appear to maintain the data distribution

    https://learn.microsoft.com/en-us/sql/relational-databases/security/dynamic-data-masking?view=sql-server-ver16

    DB2 has the CREATE MASK option https://www.ibm.com/docs/en/db2/11.5?topic=statements-create-mask

    where you define how to want to mask the data and you can even use group permissions to determine what is seen.

    David.



    ------------------------------
    David Williams
    ------------------------------



  • 11.  RE: Data Masking in Informix

    IBM Champion
    Posted Fri March 10, 2023 05:37 AM

    I'm not clear at which level such 'masking' is desired:  should data in storage be altered, physically/permanently (which would make it unusable for e.g. WHERE clauses, indices, etc.), or should it only be obfuscated when actually delivered to a client, i.e. at the point where it's being put onto the wire, or anything in between?  Should e.g. stored procedures also not be able to 'see' the real data? And probably more aspects ...



    ------------------------------
    Andreas Legner
    ------------------------------



  • 12.  RE: Data Masking in Informix

    IBM Champion
    Posted Sat March 11, 2023 02:43 PM

    Hi,

    Generally if ti is built into the database product  the real value is stored.

    That can be a problem, in the easrly versions of dynamic data masking for sql server people found ways to decode the underlying data.

    https://www.red-gate.com/simple-talk/blogs/unmasking-the-dynamic-data-masking/

    and there were defect logged e.g. https://support.microsoft.com/en-us/topic/kb4505726-fix-dynamic-data-masking-does-not-function-as-expected-in-sql-server-2017-1011127f-8c96-d5ea-6280-59ae9bc5b98a

    If it is an application that transfers and masks the data then the database gets the masked version only.

    David.



    ------------------------------
    David Williams
    ------------------------------