Doing Complex Stuff ...
Note: this article is somewhat complicated. It might help (but isn't necessary) if you review the other articles (mainly "Working with Boolean Data") in this wiki.
Great, you have a multi-column (this isn't necessary), multi-line (seriously? You have a one-line .csv file ... 😩 ) .csv file and you want to protect it with differential privacy so devious data firms 👿 can't get their hands on individual data. First, you'll need to grab lap-mechanism.py from the master branch or from the latest release. Then, call lapmech and pass in the following parameters:
The input file: this is somewhat self-explanatory, but you should pass in something like open('random_file.csv', 'r'), which is the file to be made differentially private. Make sure this file has read permissions.
The output file name: this should be the name of the file that the differentially private version of the data will be written to. This file does not need to exist before running the program! It should look something like this: 'random_file_name.csv'
epsilon: this is one of the most important parameters in the entire algorithm. Unlike the random response mechanism, the Laplace mechanism lets us quantify privacy loss through epsilon. This number should be a positive float like 3.14 or 2.71. Play around with different values to find which one works best, but remember that smaller values add more noise (stronger privacy, but bigger changes to the data) while larger values add less noise and may compromise privacy. Find the value that works best for your survey!
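To get a feel for that trade-off, here is a minimal sketch, assuming the textbook Laplace mechanism where the noise scale is delta_f / epsilon. The helper names noise_scale and laplace_noise are made up for this illustration and are not functions from lap-mechanism.py.

```python
import random


def noise_scale(delta_f: float, epsilon: float) -> float:
    """Scale b of the Laplace noise in the textbook mechanism: b = delta_f / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be a positive number")
    return delta_f / epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


# Hypothetical demo: a query with sensitivity 1.0 under three epsilon values.
rng = random.Random(0)
for epsilon in (0.1, 1.0, 10.0):
    scale = noise_scale(delta_f=1.0, epsilon=epsilon)
    draws = [abs(laplace_noise(scale, rng)) for _ in range(10_000)]
    average = sum(draws) / len(draws)
    print(f"epsilon={epsilon}: scale={scale:.2f}, average |noise| is about {average:.2f}")
```

The average size of the noise tracks the scale, so dividing epsilon by ten makes the typical change to each answer roughly ten times bigger.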
f: another benefit of the Laplace mechanism is that it allows for query-adaptive differential privacy! This means that, depending on the statistics you intend to report from your study, the data can be configured differently to allow for accurate results with minimal privacy loss. f should be a function that takes in an array (all the elements in a certain column of the data) and produces an output (like the average, standard deviation, etc.). This should be a valid Python function!
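A few candidate query functions might look like the sketch below. The names are made up for this illustration, and the float() conversion is an assumption: the column values may arrive as strings read from the .csv, so check what lapmech actually hands to f.

```python
import statistics


def column_mean(values):
    """Average of one column (values converted to float, in case they are strings)."""
    return statistics.fmean(float(v) for v in values)


def column_std(values):
    """Population standard deviation of one column."""
    return statistics.pstdev(float(v) for v in values)


def column_count_true(values):
    """Number of entries equal to 1, for a Boolean column coded as 0/1."""
    return sum(1 for v in values if float(v) == 1.0)


# Hypothetical Boolean column, as it might look after being read from a .csv file.
column = ["1", "0", "1", "1", "0"]
print(column_mean(column), column_std(column), column_count_true(column))
```

Any of these could then be passed in as f, as long as it accepts one column and returns a single number.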
There are two other parameters: sample_size and delta_f. If you're an expert 👨‍💻 you may recognize delta_f as the sensitivity of f. If you understand the basics behind the Laplace mechanism, feel free to pass in the correct value of delta_f if you know it ahead of time. If you're not an expert or you don't know what delta_f should be, just pass in an integer to sample_size by writing something like sample_size=15. Make sure this is an integer. It represents the number of examples sampled to estimate what delta_f should be. The more examples sampled, the more accurate the resulting dataset will be, but this will add more computational cost. If you're unsure of what to set sample_size to, just leave it blank.
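For intuition only, here is one way a sampled estimate of delta_f could work: drop one sampled record at a time and keep the largest change in f. This is a guess at the idea, not the code in lap-mechanism.py, and estimate_sensitivity is a hypothetical name. Also keep in mind that an estimate taken from the data like this is not the worst-case (global) sensitivity that the formal privacy guarantee is defined with.

```python
import random
import statistics


def estimate_sensitivity(f, column, sample_size, rng):
    """Largest change in f(column) when a single sampled record is removed."""
    if len(column) < 2:
        raise ValueError("need at least two records to compare neighbouring datasets")
    baseline = f(column)
    # Never sample more positions than the column has.
    k = min(sample_size, len(column))
    largest = 0.0
    for i in rng.sample(range(len(column)), k):
        neighbour = column[:i] + column[i + 1:]  # same data minus record i
        largest = max(largest, abs(f(neighbour) - baseline))
    return largest


# Hypothetical Boolean column; f is the average.
column = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
delta_f = estimate_sensitivity(statistics.fmean, column, sample_size=15, rng=random.Random(0))
print(f"estimated delta_f: {delta_f:.3f}")
```

Each sampled record costs one extra call to f, which is where the added computational cost of a larger sample_size comes from.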
This is complicated, so you can see an example of using this algorithm in lap-mechanism-ex.py. Make sure to download the example .csv files, too. Good luck and happy coding! Made with ❤️