Apply a function to all column names of a dataframe. I won't always want to_camel_case as the column transformation function, so func is an optional argument that defaults to to_camel_case.
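The function itself isn't shown here, so the following is only a minimal sketch of what it might look like. The body of to_camel_case is hypothetical, and I'm assuming transform_cnames renames the columns in place, since it is called without assigning its result.

```python
import re

import pandas as pd


def to_camel_case(name: str) -> str:
    """Hypothetical helper: 'first_name' or 'first name' becomes 'FirstName'."""
    parts = [p for p in re.split(r"[\s_\-]+", name.strip()) if p]
    # Upper-case only the first letter of each part, leaving the rest untouched
    return "".join(p[:1].upper() + p[1:] for p in parts)


def transform_cnames(df: pd.DataFrame, func=to_camel_case) -> None:
    """Apply a function to all column names of a dataframe (in place, by assumption)."""
    df.columns = [func(c) for c in df.columns]


# Default transformation
example = pd.DataFrame({"first_name": ["Ada"], "donor number": [1]})
transform_cnames(example)
print(list(example.columns))

# Any other str -> str function can be passed as func
other = pd.DataFrame({"Total Amount": [5]})
transform_cnames(other, func=str.lower)
print(list(other.columns))
```

Mutating in place keeps the loop over several dataframes short, at the cost of changing the caller's objects.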
for df in [contact_methods, contacts, gifts]:
    transform_cnames(df)
Validating that the contacts column names are now CamelCase
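One way to do this check programmatically, rather than by eye, is sketched below. The regular expression and the contacts_example dataframe are my own stand-ins, not part of the original notebook.

```python
import re

import pandas as pd

# Toy stand-in for the real contacts dataframe
contacts_example = pd.DataFrame(columns=["Number", "FirstName", "LastName"])

# One or more words, each starting with a capital letter
camel_case = re.compile(r"^(?:[A-Z][a-z0-9]*)+$")

not_camel = [c for c in contacts_example.columns if not camel_case.match(c)]
assert not not_camel, f"Columns not in CamelCase: {not_camel}"
```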
contacts = pd.concat([
    contacts,
    # Dataframe of donors not in contacts
    pd.DataFrame(
        donors_not_in_contacts[['DonorNumber', 'FirstName', 'LastName']]
        .drop_duplicates()
        .rename(columns={'DonorNumber': 'Number'})
        .drop_duplicates()
        .to_dict('records')
    ),
])
Splitting rows with Multiple People
contacts.FirstName.head(1)
0 Karita & Kelvin
Name: FirstName, dtype: object
Split the names on ' & ' or ' and ', then expand the resulting lists into new columns
contacts[['FirstName', 'SecondaryFirstName']] = contacts['FirstName'].str.split(' & | and ', expand=True).fillna('')
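To see what the split does, here is the same pattern on a small made-up dataframe (the names dataframe is an assumption for illustration). Note that this only works when no name contains more than one separator, since the split must yield exactly two columns.

```python
import pandas as pd

# Made-up data: two joint records and one single-person record
names = pd.DataFrame({"FirstName": ["Karita & Kelvin", "Ana and Ben", "Solo"]})

# A multi-character pattern is treated as a regular expression by str.split;
# expand=True returns one column per piece, and missing pieces are filled with ''
names[["FirstName", "SecondaryFirstName"]] = (
    names["FirstName"].str.split(" & | and ", expand=True).fillna("")
)
print(names)
```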
Additionally, I'm going to look for any records where there's a duplicated Number
The following Python code will return True for rows where the Number was deemed to be a duplicate:
contacts.Number.duplicated()
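A quick illustration on made-up numbers: by default the first occurrence is not flagged, only the later repeats are. Passing keep=False flags every row that shares a Number.

```python
import pandas as pd

# Made-up data: 101 and 102 each appear twice
toy = pd.DataFrame({"Number": [101, 102, 101, 103, 102]})

later_repeats = toy.Number.duplicated().tolist()
all_repeats = toy.Number.duplicated(keep=False).tolist()

print(later_repeats)  # [False, False, True, False, True]
print(all_repeats)    # [True, True, True, False, True]
```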
For all the rows that are duplicates, I’m going to:
1. Place them in a separate variable
2. Remove them from the original dataset
3. Add them as the 2nd Person on the non-duplicated row
4. Ensure they have proper first & last names
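Steps 1 and 2 could be sketched roughly as follows. The name records_to_join matches the loop used for step 3, but how it was built isn't shown, so this construction (and the contacts_example data) is an assumption.

```python
import pandas as pd

# Made-up data: Number 1 appears twice
contacts_example = pd.DataFrame({
    "Number": [1, 2, 1],
    "FirstName": ["Karita", "Maya", "Kelvin"],
    "LastName": ["Stone", "Lee", "Stone"],
})

is_dup = contacts_example.Number.duplicated()

# Step 1: place the duplicated rows in a separate variable
records_to_join = contacts_example[is_dup].to_dict("records")

# Step 2: remove them from the original dataset
contacts_example = contacts_example[~is_dup].reset_index(drop=True)

print(records_to_join)
print(contacts_example)
```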
for record in records_to_join:
    contacts.loc[
        contacts.Number.isin([record['Number']]),
        ['SecondaryFirstName', 'SecondaryLastName']
    ] = [record['FirstName'], record['LastName']]
Adding Secondary Last Name where it was previously missing
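A possible way to do this is sketched below. I'm assuming that a second person without a last name shares the primary person's LastName; that rule and the contacts_example data are illustrative assumptions, not something the data guarantees.

```python
import pandas as pd

# Made-up data: Kelvin has no secondary last name, Ben already has one
contacts_example = pd.DataFrame({
    "FirstName": ["Karita", "Maya", "Ana"],
    "LastName": ["Stone", "Lee", "Cruz"],
    "SecondaryFirstName": ["Kelvin", "", "Ben"],
    "SecondaryLastName": [None, None, "Ortiz"],
})

# Rows that have a second person but no last name for them
has_second = contacts_example["SecondaryFirstName"].fillna("") != ""
no_last = contacts_example["SecondaryLastName"].fillna("") == ""
mask = has_second & no_last

# Assumption: fall back to the primary person's last name
contacts_example.loc[mask, "SecondaryLastName"] = contacts_example.loc[mask, "LastName"]
print(contacts_example)
```

Rows with no second person are left alone, and an existing secondary last name is never overwritten.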