[Compliance] GDPR / Privacy Compliance of indefinitely stored addresses can not be achieved #6108

Open
fthobe opened this issue Feb 8, 2025 · 12 comments

Comments

@fthobe
Contributor

fthobe commented Feb 8, 2025

Concerns open PR: #3852 #3234 solidus_braintree #226

I am trying to wrap my head around this:

EU law is clear that personal data must be deleted upon customer request.

I do not see right now how we can make the address situation fit with that.

In Europe, if an address is not needed for fiscal reasons, there is no reason to store it.

This means in Europe we need a solution to:

  • Delete an address used in an order after 10 + 1 years (in some countries it might even be less) if, and only if, the address was used for a fiscal receipt created here (invoices are not included);
  • Delete an address immediately upon customer request if the address was not used on a fiscal receipt;
  • Delete an address once it reaches the end of the consented storage period (which can also be indefinite if the privacy terms of a website are written accordingly), unless the user requested deletion earlier.
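
The three rules above can be sketched as a single decision function. This is a minimal Ruby sketch, not Solidus code: `AddressRecord`, its fields, `FISCAL_RETENTION_YEARS` and `delete_address?` are hypothetical names, and the 11-year period only mirrors the "10 + 1 years" mentioned above, which differs per country.

```ruby
require "date"

# Hypothetical, simplified view of an address for retention purposes.
AddressRecord = Struct.new(
  :used_on_fiscal_receipt, # true if a fiscal receipt was issued with this address
  :receipt_date,           # Date of that receipt (nil if there is none)
  :consent_expires_on,     # Date the consented storage ends (nil = indefinite)
  keyword_init: true
)

# Assumption: "10 + 1 years"; the real period depends on the country.
FISCAL_RETENTION_YEARS = 11

# Returns true if the address should be deleted as of `today`.
def delete_address?(address, today:, deletion_requested: false)
  if address.used_on_fiscal_receipt
    # Rule 1: keep until the fiscal retention period has elapsed, then delete.
    return today >= address.receipt_date.next_year(FISCAL_RETENTION_YEARS)
  end

  # Rule 2: no fiscal receipt, so a customer request is honoured immediately.
  return true if deletion_requested

  # Rule 3: otherwise delete once the consented storage period has ended.
  !address.consent_expires_on.nil? && today >= address.consent_expires_on
end
```

A scheduled job could call `delete_address?` for every stored address, while an erasure request would call it with `deletion_requested: true`.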

I don't see how we can avoid this if we want to comply with European regulations.

Solidus Version:
All

@fthobe
Contributor Author

fthobe commented Feb 8, 2025

@tvdeyen @jarednorman @kennyadsl

I would like to put my finger directly on the sore spot here:
Are the privacy regulations of the EU / Canadian market relevant or not? Because the current implementation of addresses is not compliant, not even when the address is removed from the join table. Deleting a record means deleting the record, not just deleting the traces of or access to it, and the implementation has not been compliant for almost 6 years now (nor, I think, in Italy or Germany for many years before that, given regional laws predating EU standards).

I assume addresses were built the way they are so that they stay around for completed orders and tax records. By today's standards that behaviour is not legal in the EU, nor, I believe, in Canada.

I tried to leave additional comments in the PR, but I assume @mamhoff needs time to love me more and unblock me on his PRs.

Citing @kennyadsl

On the next major, we can remove deprecated code and the column safely.

I saw that #3852 has been pushed back across two major releases by now. In any case, we will now have to implement this PR (or an alternative to it) on our fork as a hotfix and build on it, to end up with a drop-in replacement for the current address model that deletes addresses for real. Getting to deletable addresses is not only unavoidable; I can't see how privacy compliance in Europe is even possible right now without them.

So my question is: what's the plan to tackle this, and can we contribute anything to it?

For me personally this is a can't-wait issue, and I don't assume I am the only company in Europe using this system. Honestly, I think this should be solved out of the community funds; the priority here is 120% and there's no way to push this topic any further down the road.

I assume you have already discussed internally how to solve it (I can imagine this issue has been on all of your bucket lists and you have probably already solved it for some of your customers). If I were you, I'd consider this an expense for the community fund, and I feel @mamhoff would be up to the task given that he already achieved wonders on promotions (depending, of course, on his personal will and availability).

Personally, I think this issue warrants a warning in the documentation, given the legal consequences that failure to comply can have in Europe.

Honestly I am not willing to touch it mostly because I had to fight with you over 4 months to get anywhere on meta data and this topic goes probably very deep into the logic of addresses on the platform itself without any clear guidance.

Given that this issue has implications for all merchants in Europe, Canada, the UK and at least several US states[1], what do we want to do here? We can't leave it like this for sure.

It would be great to get feedback from all of you, or at least from 2/3 of you, on how to move forward.

[1] California Consumer Privacy Act (CCPA), in effect since 2020 and amended by the California Privacy Rights Act (CPRA), in effect since 2023
Virginia Consumer Data Protection Act (VCDPA)
Colorado Privacy Act (CPA)
Connecticut Data Privacy Act (CTDPA)
Utah Consumer Privacy Act (UCPA)

@tvdeyen
Member

tvdeyen commented Feb 8, 2025 via email

@fthobe
Contributor Author

fthobe commented Feb 8, 2025

@tvdeyen I do not agree that the GDPR extension currently covers you.

Solving this once and for all for everybody would take some work. But considering that the EU privacy laws have settled very well (not many changes are to be expected), we should either fully drop compliance (with a big disclaimer, and without hindering anyone from implementing it themselves), document the gaps, or (at least try to) fully satisfy it (which probably also means making some changes to how sessions work).

I have some recollection of us discussing cookies in a comment, but this is such a well-defined building block that I would really consider covering it in one go:

  • addresses that comply with the right to be forgotten
  • cookies
  • tracking consent
  • newsletter consent
  • consent api

I would love to drill this down, and I would consider having a lawyer at least glance at it, as we did for the license questions.

Right now we can't reach compliance without significant effort in:

  • Europe
  • Canada
  • US States such as MA, CA (I do not have the full list here)

I imagine the US being a joyride on consumer issues the next 4 years as I expect states to behave differently according to political alignment on privacy issues.

I think we could have that ready by the end of next month / mid-April, covering all aspects, with a lawyer signing off on it.

@fthobe changed the title from "GDPR / Privacy Compliance of indefinitely stored addresses" to "GDPR / Privacy Compliance of indefinitely stored addresses can not be achieved" on Feb 8, 2025
@fthobe
Contributor Author

fthobe commented Feb 8, 2025

The "keep addresses forever" feature should also be reconsidered. There is no need to keep addresses for incomplete orders, and it is debatable that editing an address always creates a new address record without deleting the old (maybe invalid) one.

Keep in mind that storing addresses indefinitely is fine as long as the user consents to it. If you, let's say, "store the data for the purpose of minimizing data entry for future orders", retention is absolutely within your rights until the user either

  • alters it
  • deletes it
  • deletes the whole account

There's nothing legally wrong with it.

The problem is that currently the data is kept even upon "cancellation".

This does have some side effects, too:

For most ERP / billing systems the address data is a mutable resource that gets "burned" into invoices. The invoice billing address is an immutable copy of the address at the time of submission, while the address itself continues to be subject to change.
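
The "burned in" copy can be illustrated with a small Ruby sketch. `Address` and `Invoice` here are hypothetical stand-ins, not Solidus models; the only point is that the invoice holds its own frozen copy of the field values rather than a reference to the address-book entry.

```ruby
# Hypothetical mutable address-book entry.
Address = Struct.new(:name, :street, :city, :country, keyword_init: true)

# Hypothetical invoice that stores a frozen snapshot of the billing address.
class Invoice
  attr_reader :billing_address

  def initialize(address)
    # Copy every field value and freeze the copies, so later edits to (or
    # deletion of) the address-book entry cannot change the invoice.
    @billing_address = address.to_h.transform_values { |v| v.dup.freeze }.freeze
  end
end

address = Address.new(name: "Ada Lovelace", street: "1 Main St", city: "Berlin", country: "DE")
invoice = Invoice.new(address)

# The customer edits the address after the invoice was issued.
address.street = "2 Side St"

puts invoice.billing_address[:street] # the invoice still shows the original street
```

With a snapshot like this, the address-book entry could be changed or deleted independently of the fiscal document.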

I would also consider this a feature that can have a positive impact on each of your sales. Solving this problem with a compliant solution for, say, consent-based script injection for analytics is a strong sales pitch in 2024.

@fthobe
Contributor Author

fthobe commented Feb 8, 2025

By the way, given that a consent library could make good use of a jsonb array, this would be a magnificent reason to accept the metadata work and extend the user metadata to hold consent information, stored as jsonb, for consent management.
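
To make the idea concrete, here is one way such a consent array could look. This is only a sketch in plain Ruby: the entry keys (`purpose`, `granted`, `at`) and the helpers `record_consent` and `consented?` are hypothetical, and the array is treated as an append-only log whose JSON form is what would end up in a jsonb column.

```ruby
require "json"
require "time"

# Appends a consent decision and returns the new list (append-only log).
def record_consent(entries, purpose:, granted:, at: Time.now.utc)
  entries + [{ "purpose" => purpose, "granted" => granted, "at" => at.iso8601 }]
end

# The most recent entry for a purpose wins; no entry means no consent.
def consented?(entries, purpose)
  latest = entries.reverse.find { |e| e["purpose"] == purpose }
  !latest.nil? && latest["granted"] == true
end

entries = []
entries = record_consent(entries, purpose: "newsletter", granted: true)
entries = record_consent(entries, purpose: "tracking", granted: false)

# This JSON string is what a jsonb metadata column would hold.
puts JSON.generate(entries)
```

Keeping the full history rather than a single flag per purpose also documents when consent was given or withdrawn.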

@fthobe
Contributor Author

fthobe commented Feb 8, 2025

I put some thought into addresses (be prepared for an essay about government standards, naming conventions and other absurdities like why names are not validated on credit card payments; I know you all love me for those). I am trying to drill down into all problems with addresses here. I hope you can see this in the spirit of solving multiple address-related problems in 1-3 PRs.

First name, last name, VAT ID, tax ID and company name are all distinct fields for e-invoicing in Europe according to EN 16931-1 and related standards on billing addresses (shipping addresses are much more easygoing). In the mind of the EU, e-invoicing is coming to everybody soon, including B2B transactions, and, given that electronic receipts have become standard as well, probably also to online sales. Funnily enough, Italy was a leader in defining those standards. While I do not believe we need to offer invoice generation, collecting the information needed for invoices should be a reasonable default, at least in EU and NAFTA countries.

As much as I appreciate, from a technical point of view, the comment Thomas once made, quoted below (seriously, I agree with every word of it), the reality of doing business in Europe and the US is just a different one regarding address schemes.

Quote from Thomas in #3234

The problem
Names :)
Seriously, read https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
Also issues like solidusio/solidus_paypal_braintree#226 happen
The proposal
We should add a name column to Spree::Address then deprecate and later remove firstname and lastname.

The behavior of PayPal and Stripe is caused by historical differences in how names are normalized in card payments (normalizing proved so difficult that to this day the name is only partially validated, or not validated at all, in the credit card exchange). Anyone who wants a smile can read this article about that. Basically, the name is not checked at all during the card payment itself (although some banks have anti-fraud software that validates names), as the construction of names across countries (and even within the US alone) proved difficult back then. The Braintree solution to that (concatenating first name and last name) was not wrong, but dropping everything after a space was.
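
The difference between a lossy and a lossless split is easy to show. The following Ruby sketch is illustrative only and is not the code of the Braintree extension; `lossy_split`, `lossless_split` and `join_name` are hypothetical helpers.

```ruby
# Lossy: keeps only the first two space-separated parts and drops the rest.
def lossy_split(full_name)
  first, last = full_name.split(" ")
  [first, last]
end

# Lossless: the first token is the first name, everything else the last name.
def lossless_split(full_name)
  first, rest = full_name.strip.split(" ", 2)
  [first, rest.to_s]
end

# Concatenation is always possible and loses nothing.
def join_name(first, last)
  [first, last].map(&:to_s).reject(&:empty?).join(" ")
end

name = "Maria de la Cruz"
p lossy_split(name)                # ["Maria", "de"]
p lossless_split(name)             # ["Maria", "de la Cruz"]
p join_name(*lossless_split(name)) # "Maria de la Cruz"
```

The lossless variant still guesses where the first name ends, which is why joining two stored fields is the safer direction than splitting one.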

Anyhow, the direction is the one below: full XML field parity, so that data for invoices can be collected with first name, last name and company name according to need. I believe concatenating values is a smaller problem than splitting them, also because for fiscal purposes names are in the end normalised by the fiscal authorities and are usually recoverable through the ID and / or the company registry. So as much as there is outcome variability, it is usually narrowed down by the authorities (admittedly much less in NAFTA than in the EU). That is also why, unlike good wine, Thomas' remarks, while technically spot on, did not age well.

This brings me to this:

  1. Yes, names have a (quasi) infinite outcome variability, and Elon Musk's son "X Æ A-12" reminds us of that;
  2. No, that outcome variability does not apply to Solidus, as fiscal transactions are happening here.

So I would account for what is the universal standard in Europe and NAFTA: a first name and last name that can be very long, sometimes even alphanumeric, often accompanied by a company name (with fiscal and VAT ID in Europe).

So what I would propose is the following:

  1. implement the PR by Mamhoff
  2. extend the fields in main to have some time to fix Stripe / PayPal
  3. implement a drop-in replacement that contains all fields and has deletable addresses
  4. bugfix field association on payment / shipping modules
  5. there you have Solidus 5.1 and another reason to profoundly dislike me.

This is pretty much how Europe sees e-invoicing (and therefore addresses):
[image]

Desired address format

  • Business / individual / public government
  • First name
  • Last name
  • Company name / public entity
  • VAT ID (which in a dream scenario can be validated against the free EU API with soft fail)
  • Tax ID
  • Street 1
  • Street 2 (c/o, cont'd)
  • Zip
  • City
  • State
  • Country
  • Phone for shipping
  • Email for shipping, as it's becoming more common
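
As a rough sketch, the field list above could map to a structure like the following. This is plain Ruby for illustration, not a proposed `Spree::Address` schema: `FiscalAddress`, `ADDRESS_KINDS`, `valid_kind?` and `recipient` are hypothetical names, and the rule that individuals are addressed by their concatenated name while other kinds use the company name is an assumption.

```ruby
# Assumption: the three kinds from the first bullet of the list above.
ADDRESS_KINDS = %i[business individual public_entity].freeze

FiscalAddress = Struct.new(
  :kind, :first_name, :last_name, :company_name, :vat_id, :tax_id,
  :street1, :street2, :zip, :city, :state, :country, :phone, :email,
  keyword_init: true
) do
  def valid_kind?
    ADDRESS_KINDS.include?(kind)
  end

  # Name line for documents: concatenated for individuals, company name otherwise.
  def recipient
    if kind == :individual
      [first_name, last_name].compact.join(" ")
    else
      company_name.to_s
    end
  end
end

address = FiscalAddress.new(
  kind: :business, company_name: "Example GmbH", vat_id: "DE123456789",
  street1: "1 Main St", zip: "10115", city: "Berlin", country: "DE"
)
puts address.recipient
```

Storing the name parts separately keeps both options open: systems that want one combined name can join the fields, and systems that need them apart do not have to guess.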

@mamhoff
Contributor

mamhoff commented Feb 9, 2025

Let's keep this issue to deletion of addresses. My PR #3852 does not fully address this issue, as the address will continue being stored in the addresses table, but without any reference to users or orders. So we currently don't have a way to fully delete an address, and that needs to be addressed (pun intended).

Regarding the argument that we should add lots of fields to billing addresses, I think these arguments are all wrong.

  1. First name and Last name are not necessary for e-invoicing or EN16931-1 compliance. Here's for example the German implementation of that norm. All of the name fields are first name and last name combined: https://www.e-rechnung-bund.de/wp-content/uploads/2023/04/Uebersichtslisten-Eingabefelder-OZG-RE.pdf
  2. If you need to interop with services requiring first name and last name, we have an object that can help you with that: https://github.com/solidusio/solidus/blob/main/core/spec/models/spree/address/name_spec.rb
  3. The VAT ID is more appropriately stored with the user account, or - in some cases - with a "Company" entity that could have many users. Since addresses are used in many contexts, for example shipping, the presence of the VAT ID would violate the GDPR here (data frugality violation since the VAT ID is not necessary in this context). The same goes for the tax ID field.

@fthobe
Contributor Author

fthobe commented Feb 9, 2025

I am trying really hard not to copy and paste your Slack remark here. I am trying to stay positive, to answer all your points exhaustively, and to believe that you are acting in good faith here and not just contesting whatever I say for the pleasure of doing it.

First Name and Last Name

  1. First name and Last name are not necessary for e-invoicing or EN16931-1 compliance. Here's for example the German implementation of that norm. All of the name fields are first name and last name combined: https://www.e-rechnung-bund.de/wp-content/uploads/2023/04/Uebersichtslisten-Eingabefelder-OZG-RE.pdf

TL;DR: Germany, Belgium and France don't, but others do, and it does not hurt Germany to have them separated, as you can always combine strings.

Don't ask me why we ended up with 25 standards for one EU regulation, but that's how it is. Keep in mind that EU regulations are broad guidelines, not strict implementation rules.

Basically there are 5 concepts:

  1. First name and last name as separate values, such as in Spain.
  2. Concatenated, as you linked already (for example Germany and France, which by the way developed the standard together, but also Belgium).
  3. Company name used interchangeably with the concatenated first name and last name.
  4. Company name not interchangeable with the concatenated first name and last name, but a fully separate field.
  5. Either 1 or 2 works for individuals (Italy, for example, with the extended format but not with the simplified one), but the company name is an entirely different field, so you actually need to identify whether your address belongs to a company or a private individual (which you can do through a company name field, for example).

VAT ID should be stored with user accounts (Part 1)

The VAT ID is more appropriately stored with the user account, or - in some cases - with a "Company" entity that could have many users.

TL;DR: companies having one VAT ID is not a correct statement

A company can actually have many VAT IDs, and it is a frequently encountered problem, especially with ERP systems. So while setting it on the customer might work for many cases, it does not work for all. Putting the VAT ID / tax ID in the address instead works for all cases and is literally the same effort, also considering that reverse charge eligibility is determined by origin country, destination country and the VAT ID's VIES participation.

That's the reason why many B2B stores display prices incl. / excl. VAT.

Actually it gets even worse:
In some countries (Italy is among them) you have separate systems for fiscal correspondence with a destination code (imagine an email address just for receiving invoices as XML), which is also a "1 customer to n destination codes" situation.

Cases with more VAT IDs

  • Many (common) stock companies (the German equivalents are GmbH or AG) can have multiple VAT IDs (and, depending on your fiscal residence, destination codes) to break costs down per department or branch office for accounting, which is very common in large companies. Admittedly that is not possible in Germany, but it occurs much more frequently in, e.g., the UK or other countries;
  • (Common) stock companies and individual VAT ID holders (Gewerbeschein in the case of Germany or partita IVA in the case of Italy) are obliged by law in Europe to have a VAT ID for every country where they have an unincorporated branch office.

So no, keeping VAT IDs on the customer does not cover plenty of cases, while keeping them in the address does.

Other Quirks to be considered

In addition, just to make life worse:
You need the VAT ID to calculate taxes in Europe, as in many countries the VAT ID does not automatically allow you to receive cross-border orders without VAT (which is mandatory for many B2B cross-border transactions in Europe if both parties participate in the VIES scheme, as pretty much everybody does).

To give you the most extreme example:
The vendor is legally located in Germany but has a storage and fulfilment warehouse in the Netherlands:

  1. A German company ordering to the German HQ does not pay VAT
  2. A German company ordering to the Dutch office does pay VAT
  3. A German company ordering via drop shipping to a European customer may or may not have to pay VAT, depending on VAT ID and location
  4. A German company ordering via drop shipping to an NL customer may or may not have to pay VAT.

To determine if a VAT ID is eligible to receive the order without VAT you need to consult this platform otherwise you risk having to backpay VAT (if the national fiscal authority doesn't reject your invoice straight away).

VAT ID should be stored with user accounts (Part 2)

The VAT ID is more appropriately stored with the user account, or - in some cases - with a "Company" entity that could have many users.

Every company that wants to order needs an invoice with a VAT ID to deduct VAT; consumers don't. Our customer model is in reality a user model. Why should we ram a VAT ID into the user if we have the outcome variability described above? Just putting it into the address covers all cases, while not all of our users need one. It's called a billing address for a reason.

Privacy for VAT IDs

Since addresses are used in many contexts, for example shipping, the presence of the VAT ID would violate the GDPR here (data frugality violation since the VAT ID is not necessary in this context).

So you are right about the data frugality for consumer information, but you didn't get the application right in this case:

VAT IDs are not covered by privacy rules, for various reasons:

  1. GDPR does not apply to companies;
  2. The VAT ID is public information, stated (especially in Germany) in the company imprint and registered at the chamber of commerce. There is even a European platform to validate them (google VIES), and Germany (like all EU countries) provides an API to validate it;
  3. In many jurisdictions (not Germany) the same goes for the tax ID;
  4. The VAT ID / fiscal ID is even needed on customs declarations, so it is a natural part of shipment documentation, and in some countries you NEED it for reverse charge on the shipping label or accompanying documentation (which is why you can enter it right there with DHL);
  5. The VAT ID is needed to actually determine whether VAT needs to be applied or not (hence VAT ID) inside Europe;
  6. Even if points 1-4 did not exist, you would still need the data for fiscal reasons, and you could simply ignore it for shipping labels.

So I hope this exhaustively answers all your points.
The bottom line is: with the information stored in the address you can cover all cases, do not have privacy problems and have more granular control. Just jacking the VAT ID into the user / customer resource is not smart. Admittedly, that mistake has nevertheless been made in many systems, forcing you to keep multiple customer resources for one billing account.

Let's keep this issue to deletion of addresses. My PR #3852 does not fully address this issue, as the address will continue being stored in the addresses table, but without any reference to users or orders. So we currently don't have a way to fully delete an address, and that needs to be addressed (pun intended).

Do you want to do it or should we give it a try? We would throw in an Address PR directly afterwards, otherwise we can try to fix the whole issue.

@mamhoff
Contributor

mamhoff commented Feb 9, 2025

Please open two other issues: One for handling of the VAT ID, and one for the purported issues with the name field, so that we can decide separately on each of them. We're mixing concerns in this PR, and it's confusing. When doing that, please link to the resources you linked to above so we have a complete problem description for each of them. These are not linked (and should not be tackled by the same PR, either).

@fthobe
Contributor Author

fthobe commented Feb 9, 2025

Regarding the argument that we should add lots of fields to billing addresses, I think these arguments are all wrong.

I have heard of people not dying by admitting they are wrong. 😬

Please open two other issues: One for handling of the VAT ID, and one for the purported issues with the name field,…

I would actually consider structuring it differently.

I mean, we should align, with the lowest possible pain, with what's the legislative standard in privacy and data preparation for export (like API import into the system that actually handles stock, billing and taxes), so I would split this up into two issues:

  1. Change the address format
  2. Add new features to taxation

Also because calculating tax correctly (2) is not possible without a full address (1).

@fthobe changed the title from "GDPR / Privacy Compliance of indefinitely stored addresses can not be achieved" to "[Compliance] GDPR / Privacy Compliance of indefinitely stored addresses can not be achieved" on Feb 9, 2025
@mamhoff
Contributor

mamhoff commented Feb 10, 2025

Thank you for opening up the two issues. It's unfortunate that they still don't split out the schema changes, so we now keep discussing several things at once in a separate issue.

Regarding the deletion of addresses:

Currently, every address is unique. That means, if two user accounts store the same address in their address book, that address is now shared between user accounts. If user A wants "their" data to be deleted, we need to delete only that user's address entry.

Addresses can be added without a user account. When changing an address on an anonymous order, the old address stays in the database.

Any solution to this problem should avoid the following bugs:

  • a) in case of a shared address, changing the address on an anonymous order should not remove that same address from another user's address book.
  • b) deleting one user's addresses should not delete another user's addresses.
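
One way to satisfy both constraints is to treat address book entries and order addresses as references, and to delete the shared address row only when the last reference is gone. The sketch below is a minimal in-memory Ruby model, not Solidus code: `AddressStore` and all of its methods are hypothetical, and fiscal retention for completed orders is deliberately left out.

```ruby
require "set"

# Hypothetical in-memory stand-in for an addresses table plus its references.
class AddressStore
  attr_reader :addresses

  def initialize
    @addresses   = {}      # address_id => attributes
    @user_links  = Set.new # [user_id, address_id] address book entries
    @order_links = Set.new # [order_id, address_id] order addresses
  end

  def add_to_address_book(user_id, address_id, attrs)
    @addresses[address_id] ||= attrs
    @user_links << [user_id, address_id]
  end

  # Points an order at a (possibly new) address and cleans up the old one.
  def set_order_address(order_id, address_id, attrs)
    old = @order_links.select { |oid, _| oid == order_id }
    @order_links.subtract(old)
    @addresses[address_id] ||= attrs
    @order_links << [order_id, address_id]
    old.each { |_, id| purge_if_orphaned(id) }
  end

  # Removes only this user's address book entries.
  def forget_user(user_id)
    removed = @user_links.select { |uid, _| uid == user_id }
    @user_links.subtract(removed)
    removed.each { |_, id| purge_if_orphaned(id) }
  end

  private

  # Deletes the address row only when no user or order references it any more.
  def purge_if_orphaned(address_id)
    still_used = (@user_links | @order_links).any? { |_, id| id == address_id }
    @addresses.delete(address_id) unless still_used
  end
end
```

In this model, changing the address on an anonymous order never removes a row that some address book still points to (case a), and forgetting one user leaves other users' entries intact (case b).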

I do not currently have time for working on this.

@fthobe
Contributor Author

fthobe commented Feb 10, 2025

Thank you for opening up the two issues. It's unfortunate that they still don't split out the schema changes, so we now keep discussing several things at once in a separate issue.

I have replied on the other issue why I believe that VAT-IDs do not require a separate schema. I see your point, just don't agree with it ;)

Regarding the deletion of addresses:

Currently, every address is unique. That means, if two user accounts store the same address in their address book, that address is now shared between user accounts. If user A wants "their" data to be deleted, we need to delete only that user's address entry.

Man ok, so two ways around it:

  1. we check upon deletion what other associations are present;
  2. we drop the entire address system and make a replacement with all associated pain.

How would orders behave if we kill off an address used in an order? Please be patient with me here.

Also, is there a quantification of shared addresses? I have no idea how prevalent the issue is.
