District cleaner #310
base: master
Deduplicate the diff please. Something like:
-GREATER MUMBAI
+Mumbai
Don't need to know the remaining fields, just a unique list of districts impacted by the change |
What's the coverage of the change? (How many districts are matched, and left unmatched?) |
Only 23 out of 16320 districts were left unmatched and required manual patches. The rest of the districts matched the list |
Updated the gist |
The changes are too aggressive.
-NEAR NEW MONDHA (ANAJ MANDI) HINGOLI
+Gandhinagar
-IN FRONT OF KANYASHALA
+Kalahandi
-RAVI STEEL CHOWK, KAMRE, RATU ROAD
+Amravati
-BLOCK- KANDHLA, DIST - SHAMLI
-NAI BAZAR, BHARWARI
+Hazaribagh
-PATTI, PAKHWANIA
+Panipat
-PO-AKHAR, DUDHER
+Dhar
-LEFT BANK, ALEU, NEW MANALI, DISTT - KULLU
+Dibang Valley
-TAL JAWHAR DISTT THANE
+Jalandhar
-NIKETAN ASHRAM, DISTT. PAURI
+Amristar
I don't think we can merge this until we're sure about the accuracy of the data. In the meantime, I've found a nice source for an official list of districts in India, with district codes, that we can perhaps use: https://lgdirectory.gov.in/. Here's a cleaned-up version: https://github.com/planemad/india-local-government-directory/blob/main/administrative/2-district.csv It's missing a few districts; I've filed a PR for that. |
Updates the 'DISTRICT' field using fuzzy matching, replacing each value with the closest standardized district name. District names are taken from here
The changes made to the dataset can be seen here
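For illustration only, here is a minimal sketch of fuzzy matching with a similarity cutoff, so that address-like values stay unmatched instead of being forced onto an unrelated district. It assumes Python's standard-library difflib, which is not necessarily the matcher this PR uses. The short district list, the function names, and the 0.8 cutoff are all hypothetical stand-ins.

```python
import difflib

# Hypothetical stand-in for the standardized district list.
STANDARD_DISTRICTS = ["Mumbai", "Thane", "Kullu", "Shamli", "Hingoli", "Amravati"]


def match_district(raw, choices=STANDARD_DISTRICTS, cutoff=0.8):
    """Return the closest standardized name, or None if nothing clears the cutoff."""
    # Compare case-insensitively, but return the canonical spelling.
    lookup = {name.lower(): name for name in choices}
    hits = difflib.get_close_matches(
        raw.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None


def clean(raw_values, cutoff=0.8):
    """Return (unique old -> new changes, set of values left unmatched)."""
    changes = {}
    unmatched = set()
    for raw in raw_values:
        matched = match_district(raw, cutoff=cutoff)
        if matched is None:
            unmatched.add(raw)      # leave for a manual patch
        elif matched != raw:
            changes[raw] = matched  # dict keys deduplicate repeated rows
    return changes, unmatched


if __name__ == "__main__":
    rows = ["THANE", "Mumbay", "IN FRONT OF KANYASHALA", "THANE"]
    changes, unmatched = clean(rows)
    for old, new in sorted(changes.items()):
        print(f"-{old}\n+{new}")
    print(f"changed: {len(changes)} unique values, unmatched: {len(unmatched)}")
```

Because `changes` is keyed by the original value, it doubles as the deduplicated diff requested in the review, and the size of `unmatched` gives the coverage figure. With this cutoff a value such as GREATER MUMBAI is also left unmatched, so a stricter threshold trades more manual patches for fewer wrong replacements.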