Test Data Manager

  • 1.  Issues with HASHLOV Function

    Posted May 31, 2018 08:17 AM

    Hi,

     

    I am using mainframe data maker to mask files. I have the issue below with the HASHLOV function.

     

    I am using a VSAM seedlist to mask a field. The seedlist contains only unique values. For some records, even though the values to be masked are different, HASHLOV hashes them to the same value. Please see the example below.

     

    For example:

    Before masking

    001

    003

    005

    001

     

    After masking

    010

    011

    011

    010

     

    Values 003 and 005 are both hashed to 011.

    I also tried the XREF option with HASHLOV, but had no luck. I cannot use SEQLOV because I also need the data to remain consistent after masking.

     

    Could you please help us fix the HASHLOV issue described above on the mainframe, or suggest an alternative function?

     

    Thanks,

    Abhinav 



  • 2.  Re: Issues with HASHLOV Function

    Posted May 31, 2018 08:26 AM

    Hi Abhinav,

     

    HASHLOV works as follows.

    1) The original value is hashed to a number.

    2) The total number of values in the seed list is divided by the number obtained in #1. This gives an index number.

    3) Using the index number from #2, a value is chosen from the seed list.

     

    Please explain the masking you require so that we can suggest another function.

     

    Some suggestions:

    1) Increase the number of values in the seed list.

    2) HASHLOV works mostly with string literals.

    3) You can also use XREF by constructing a table of old and new values.
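
    To illustrate the idea behind suggestion 3, here is a minimal Python sketch of a cross-reference table that gives each distinct original value its own replacement. It is only a conceptual illustration, not the product's XREF implementation: `build_xref` is a hypothetical helper, and the seedlist entries 012 and 013 are made up to extend the values from the first post.

```python
def build_xref(originals, seedlist):
    """Assign each distinct original value its own seedlist entry (one-to-one)."""
    distinct = sorted(set(originals))
    if len(distinct) > len(seedlist):
        raise ValueError("seedlist too small for a one-to-one mapping")
    # Pair the i-th distinct original with the i-th seedlist entry.
    return dict(zip(distinct, seedlist))


originals = ["001", "003", "005", "001"]   # values from the first post
seedlist = ["010", "011", "012", "013"]    # hypothetical replacement values

xref = build_xref(originals, seedlist)
masked = [xref[v] for v in originals]
print(masked)  # ['010', '011', '012', '010']
```

    Because the table is looked up rather than recomputed, the same input always maps to the same output, and two different inputs never share a replacement. The cost is building and consulting the table for every record.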

     

    Please let me know your thoughts after this, so we can discuss further.

     

    Regards,

    Raju Devolla.



  • 3.  Re: Issues with HASHLOV Function
    Best Answer

    Posted May 31, 2018 10:03 AM

    abhinav.bhatnagar

     

    FORMATENCRYPT might be a better masking function for your needs. It should also be faster, as it doesn't require a seedlist.

     

    However, if that doesn't give you the results you need and you still feel HASHLOV is the right fit, then, as RajuD mentioned, increasing the size of the seedlist is the best way to avoid this scenario. In addition, setting "MD5HASHLOV=Y" may change the output enough to meet your needs. I would suggest using this setting together with a larger seedlist. Be aware, however, that it can also increase collisions (that is, non-unique values, where two or more inputs produce the same output, which is exactly what you're encountering).

     

    Avoid using XREFs if possible, as they will significantly slow down your masking. If they are unavoidable, use them only where necessary.

     

    I've confirmed with the dev team that the description provided by RajuD of how HASHLOV works isn't 100% accurate. I cannot give full details to correct it, as this is proprietary information. However, I can say that we apply a MODULUS by the seedlist size, which ensures that we return an indexed/hashed value between 1 and the total size of the seedlist. This is also why increasing your seedlist is the best option available, should you decide to stick with HASHLOV.
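
    As a rough illustration of why a modulus by the seedlist size leads to collisions, here is a minimal Python sketch. It is not the actual HASHLOV algorithm, which is proprietary: MD5 from Python's `hashlib` is used only as a stand-in hash, and `pick_index` and `count_collisions` are hypothetical helper names.

```python
import hashlib


def pick_index(value: str, seedlist_size: int) -> int:
    """Map a value to a 1-based seedlist index: hash it, then take a modulus."""
    digest = hashlib.md5(value.encode("utf-8")).digest()  # stand-in hash only
    number = int.from_bytes(digest, "big")
    return number % seedlist_size + 1  # always between 1 and seedlist_size


def count_collisions(values, seedlist_size: int) -> int:
    """Count inputs whose index was already claimed by a different input."""
    claimed = {}
    collisions = 0
    for v in values:
        idx = pick_index(v, seedlist_size)
        if idx in claimed and claimed[idx] != v:
            collisions += 1
        else:
            claimed[idx] = v
    return collisions


values = [f"{n:03d}" for n in range(1, 101)]  # 100 distinct inputs: 001..100

# Same inputs, three seedlist sizes: smaller than, equal to, and much larger
# than the number of distinct inputs.
for size in (50, 100, 100_000):
    print(size, count_collisions(values, size))
```

    With fewer seedlist entries than distinct inputs, collisions are unavoidable. Even when the seedlist has exactly as many unique entries as there are inputs, nothing forces the hash to spread them one-to-one, so some inputs can still share an index. A much larger seedlist makes that less likely but does not guarantee uniqueness.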

     

    Hope that helps!

    Sean



  • 4.  Re: Issues with HASHLOV Function

    Posted Jun 06, 2018 10:46 AM

    Hi,

     

    We used the IGNORE function with XREF.

     

    Thanks