Informix

 View Only
Expand all | Collapse all

column level encryption and SQL performance

  • 1.  column level encryption and SQL performance

    Posted Thu August 06, 2020 09:29 AM
    Hello

    We are being asked about encrypting SSNs and see that a simple sql such as

    select id from t_contact where ssn = '350507267'

    goes from lightning quick on returning a result and using an existing index to becoming very slow and doing a table scan to return a row after encryption (looks like it must be un-encrypting all rows to get to the one needed)

     SELECT id FROM t_contact WHERE decrypt_char(ssn, 'XXXXXXXXXXXX') = '350507267'

    what are the options to quicken the return result - or are there none?

    thanks

    ------------------------------
    tom lehr
    ------------------------------

    #Informix


  • 2.  RE: column level encryption and SQL performance

    IBM Champion
    Posted Thu August 06, 2020 10:09 AM

    Have you tried a functional index on the SSN column?

     

    Cheers

    Paul

     






  • 3.  RE: column level encryption and SQL performance

    IBM Champion
    Posted Fri August 07, 2020 10:04 AM

    Once you wrap a function around a column, you render the index unusable, because how could it use the index? There's no point in indexing an encrypted column, by the way, because it won't encrypt the same way twice. Any index on that column should be dropped.

    The initial temptation would be to create a functional index that decrypts, but that completely defeats the security purpose, as the encrypted data would now be stored unencrypted in the index.

    The correct way to do this is to store a CRC of the unencrypted data in a separate column. So you've got two things stored in the table: The encrypted data, which is decryptable but not searchable; and the CRC, which is searchable but not decryptable. If someone wants to search by SSN, you generate the CRC for that SSN and search the CRC column for that; then, if necessary, decrypt the encrypted field.  Thing to note here is that it's not possible to do any sort of substring search here. Has to be an exact match.

    Hope this helps.



    ------------------------------
    TOM GIRSCH
    ------------------------------



  • 4.  RE: column level encryption and SQL performance

    IBM Champion
    Posted Fri August 07, 2020 10:24 AM

    Surely by definition a functional index uses a function on the column

     

    But yep, the index would be stored in 'clear'.

     

    As a SSN is a fixed length then you can easily (ish) reverse the CRC back to the SSN

     

    Cheers

    Paul

     






  • 5.  RE: column level encryption and SQL performance

    IBM Champion
    Posted Fri August 07, 2020 10:38 AM
    I suppose it's hypothetically possible? But you'd have to know what CRC algorithm was used and what kind of salt was involved

    ------------------------------
    TOM GIRSCH
    ------------------------------



  • 6.  RE: column level encryption and SQL performance

    IBM Champion
    Posted Fri August 07, 2020 10:39 AM
    Another way to put it: Create and store a hash, and index the hash.

    ------------------------------
    TOM GIRSCH
    ------------------------------



  • 7.  RE: column level encryption and SQL performance

    IBM Champion
    Posted Fri August 07, 2020 11:54 AM
    You could index a hash of the unencrypted SSN using a functional index and not store the hashed value in the table at all. Then encrypt the index by placing it in an EAR dbspace so it cannot be hacked. You would just have to use the hash function to search for the SSN you are looking for!

    create index my_ssn_hash_idx on mytable( hashfunc( decrypt( SSN ) ) in ear_dbspace;

    select * from mytable 
    where hashfunc( '123456789' ) = hashfunc( decrypt( ssn ) );

    Art

    Art S. Kagel, President and Principal Consultant
    ASK Database Management


    Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.








  • 8.  RE: column level encryption and SQL performance

    IBM Champion
    Posted Fri August 07, 2020 12:03 PM

    This approach would work, but for simplicity, I'd suggest including the decrypt-and-hash logic in the one function, both for simplicity's sake and because I'm not sure if decrypt_char is defined as NOT VARIANT (a requirement for inclusion in a functional index).

    So:

    create index my_ssn_hash_idx on mytable( decryptandhashfunc( encrypted_ssn ) in ear_dbspace;


    In our case, we found it easier to just store/index the hash and eschew the functional index. But your use case may be different.

    Also, to Art's point, if you're not using encryption at rest for dbspaces containing potentially sensitive data, you should be.



    ------------------------------
    TOM GIRSCH
    ------------------------------



  • 9.  RE: column level encryption and SQL performance

    IBM Champion
    Posted Fri August 07, 2020 12:10 PM
    Tom:

    Two issues:
    1. If you store the hashed value and index it then searching that column with a hash function return of the search filter will not be able to use the index which will not be a functional index so any search will be a sequential scan. OK if the table is small, but ...
    2. The problem with using a unified hash_and_decrypt function to build the index is the search filter has to use the same function or the optimizer will not be able to use the index. How do you have the function correctly know that the value you are passing in is not encrypted? I guess a second, constant, argument could be used, but ...
    Art

    Art S. Kagel, President and Principal Consultant
    ASK Database Management


    Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.








  • 10.  RE: column level encryption and SQL performance

    IBM Champion
    Posted Fri August 07, 2020 01:15 PM

    In both cases, I think we may be talking past one another. In case 1, I know it works because we do it here. The hash function exists entirely outside the database. In our case, it's an e-mail address rather than an SSN, but it's the same idea. We have two columns, roughly enc_email and crc_email, stored in the table. The crc_email column is indexed and searchable. When the record is initially stored (or the e-mail address updated), we encrypt the e-mail address using ENCRYPT_AES, and use a code library to generate the hash/crc value for that e-mail. If, later, we want to search on, e.g., tom@girsch.com, the code knows that it has to generate the crc for that and search on that.  So, e.g.:

    SELECT *
      FROM customer
     WHERE crc_email = '[crc-for-tom-at-girsch-dot-com]';


    crc_email is a char column, because the number generated by the CRC function is too big to efficiently store in a numeric data type. The query above uses a straight equality index search. As simple as can be, from a database perspective.

    Now what you can't do in our case is an ad-hoc SQL using "tom@girsch.com" because that's not what's indexed. The CRC is. But that, IMO, is a feature and not a bug. The whole point here is to disable unauthorized access. So you're accessing/searching protected information through the application or not at all.

    Side Note One: It's not strictly necessary to do the hash function external to the DB for this to work.  You could build a gen_crc function/procedure and do the following and it would work fine:

    SELECT *
      FROM customer
     WHERE crc_email = gen_crc('tom@girsch.com');

    Side Note Two: For any of this to work, of course, the algorithm that generates the hash/crc must always produce the same output given the same input. That's why a CRC is used here rather than encryption. If I plug the same string into the encryption function, I get 50 different outputs. If I plug the same string into the CRC algorithm 50 times, I get the exact same result 50 times.



    ------------------------------
    TOM GIRSCH
    ------------------------------



  • 11.  RE: column level encryption and SQL performance

    IBM Champion
    Posted Fri August 07, 2020 01:18 PM
    To point 2, I see your point, but I think it's also a problem with using DECRYPT in a functional index period. You need to provide the encryption key in order to decrypt; if you specify that as a parameter to the function, you can no longer build a functional index around it. And for security reasons you certainly don't want to hard-code the encryption key, nor do you want to store it with the record. So a functional index around decryption is, IMO, not feasible at all without creating a pretty glaring security hole.

    ------------------------------
    TOM GIRSCH
    ------------------------------



  • 12.  RE: column level encryption and SQL performance

    Posted Fri August 07, 2020 12:23 PM

    Just wanted to chime in with slightly different angle.

    Supposedly, the requirement for encrypting SSN aside from literal interpretation has a goal of not being able to connect personal information with the data collected in the database.

    Therefore, instead of encrypting and using actual SSN as a key - it's clear the proper secure implementation will include conversion of (encrypted) SSN to some internal "newkey" and then using it for data retrieval / operations. 

    Typically it should be done once through strongly audited process, kind of like 

    newkey = setcontext ( xxxxxx, xxxxx, xxxxx, ....)

    and encrypted information should not be limited to only SSN, but also include name, address, etc.

    The side effect is to reduce using SSN (or it's simple hashes) in the normal application flow, replacing it with "meaningless" key.

    Getting actual personal information from the "context" (newkey) - should be also tightly controlled and audited process, and application must be designed so that it's not normally needed.

    My 2 cents.



    ------------------------------
    Vladimir Kolobrodov
    ------------------------------



  • 13.  RE: column level encryption and SQL performance

    IBM Champion
    Posted Fri August 07, 2020 01:22 PM

    Vladimir:

    Agreed 100%

    We have [effectively] a meaningless customer ID (it's a SERIAL column) that we use to tie everything together. When a customer identifies themselves to us, we use the methods I described above to find and retrieve their customer ID, and use that for everything else moving forward.

    But worth pointing out, as you have, that using SSN (or any PII) as a key (especially a join key) is Very Bad.



    ------------------------------
    TOM GIRSCH
    ------------------------------



  • 14.  RE: column level encryption and SQL performance

    Posted Mon August 10, 2020 09:19 AM
    Thanks  to all for your input!
    Tom

    ------------------------------
    tom lehr
    ------------------------------