*************
** Problem **
*************
Like in the lectures, we will discuss privacy.
We want to create a database of people's addresses open to queries by researchers.
We want to ensure that this database doesn't let researchers deduce sensitive information
about individuals, such as that individual's address with any certainty.
For this problem, we can assume that there are 4 people: Euler, Gauss, Noether, Jane
Each person can be in one of two locations: Norway, Argentina
Researchers can submit queries to our databaes in the form of a function:
>> T attack(database)
We will construct a privacy policy which prevents attackers from knowing too much about a specific
individual:
forall X : Name.
forall Y : Address.
forall o : T.
Pr[ (forall person in pop. person.name == X => person.address == Y)
| pop in Belief() and attack(pop) = o ] < 0.9
We reject any program which for which we can't prove this privacy policy holds for a certain "belief"
distribution.
*********
** PSI **
*********
Get from: https://github.com/eth-srl/psi
Run with "--expectation" as the argument for readable results.
** Builtins: 5 builtin functions we'll be using today:
0. infer(f : 1 -> T) : Distribution[T] - perform nested inference on the program f and return the distribution of the results.
1. observe(e : number) : void - conditions the joint probability distribution of the program variables
on the event that the expression is not 0 (booleans are assumed to be numbers with 0 as false). This only sets the
conditional probability within the scope of an infer.
2. expectation(p : Distribution[number]) : number - returns the expected value of a distribution
3. assert(e : number) : void - does nothing if e doesn't evaluate to 0, and throws an error otherwise.
4. flip(p : number) : number - returns 1 with probability p, and 0 with probability (1 - p).
** Loops:
We only have bounded loops, and their syntax is much like python's. Instead of "range" and "xrange"
there is "[a..b]", "[a..b)", "(a..b]", , "(a..b)" which specifies a range using mathematical open/closed
set notation. "[a..b]" for example means "a, a+1, ... , b - 1, b" whereas "(a..b]" means "a+1, ... , b - 1, b".
**************
** Modeling **
**************
In PSI we can model the database as a length four array of integers with possible values either 0 or 1 to represent
being in Norway or Argentina.
The predicate mentioned above can be encoded with the following PSI program:
>> def main() {
>> for o in [0..4] {
>> for p in [0..4) {
>> for l in [0..1] {
>> def one(){
>> people := belief();
>>
>> i := attack(people);
>> observe(i == o);
>> return people[p] == l;
>> }
>> assert(expectation(infer(one)) < 0.9);
>> }
>> }
>> }
>> }
***************
** Exercises **
***************
1. How would you program a uniform belief distribution in PSI?
2. How would you program the attack query which returns how many people live in Norway?
3. Is this attack considered safe with the uniform belief via the privacy policy?
4. Is the query which returns true if more than half the people live in norway safe?
5. Suppose you have program states A=[0,0], B=[0,1], C=[1,0], D=[1,1] corresponding to variables x,y, with probabilities
0.1, 0.2, 0.3, 0.4 respectively. Suppose x = Bernoulli(0.2), what are the probabilities of the updated states?
6. Bonus: How would we encode the additional predicate that from a location, you can not infer who lives there?