GPT Solution

Ask GPT to perform the data cleaning

Configuring OpenAI Credentials

openai.api_key = os.getenv("OPENAI_API_KEY")

source

GptPrompt

 GptPrompt (messages)

Class For sending messages to Open AI using GPT-3.5 Turbo 16k 0613

OpenAI Solutoins

Solutions to a subeset of the data modifications using OpenAI API :)

CreditCardType

Can only be Visa, Mastercard, AMEX, Discover

- Solution:
- Asking Chat GPT to format the unique CreditCardTypes into the pre-approved list of values

credit_card_prompt =[ 
    {
      "role": "system",
      "content": """
      You will be given a comma separated list of items. 
      Each item is supposed to be a unique credit card type taken from a column on a database table. 
      The only acceptable credit card types are [Visa, Mastercard, AMEX, Discover]
      Your job is to examine each item in the list to see if it matches one of the acceptable credit card types or not.
      For each item in the list that is not ALREADY in the list of acceptable credit card types you will need to provide which credit card type it matches with. 
      If a item doesn't match ANY of the acceptable credit card types [Visa, Mastercard, AMEX, Discover] then match it with an empty string ''
      Format your response in JSON
      """
    },
    {
      "role": "user",
      "content": "Americn Ex,AMEX,Visa,Master car,Mastercard,Discover,Jazz"
    },
    {
      "role": "assistant",
      "content": """{
              "Americn Ex": "AMEX",
              "Master car": "Mastercard",
              "Jazz":""
          }
          """
    },
]
CreditCardCleaner = GptPrompt(messages=credit_card_prompt)
values_to_replace = CreditCardCleaner.call_gpt(to_csl(gifts['CreditCardType']))
values_to_replace
{'American Ex': 'AMEX', 'Master card': 'Mastercard', '': ''}
gifts['CreditCardType'] = gifts['CreditCardType'].replace(values_to_replace)

GiftType

Can only be Cash, Check, Credit, Other, or Reversing Transaction

- Solution:
- Asking Chat GPT to format the unique GiftTypes into the pre-approved list of values
- Replace ‘Reversing Transaction’ using simple logic in Python

gift_type_prompt = [
        {
          "role": "system",
          "content": """
          You will be given a comma separated list of items. 
          Each item is supposed to be a unique payment method taken from a column on a database table. 
          The only acceptable payment methods are [Cash, Check, Credit, or Other]
          Your job is to examine each item in the list to see if it matches one of the acceptable payment methods or not. If it doesn't map to 
          For each item in the list that is not ALREADY in the list of acceptable payment method  you will need to provide which payment method  type it matches with
          Format your response in JSON
          """
        },
        {
          "role": "user",
          "content": "$,cash,Credit,AMEX,Square"
        },
        {
          "role": "assistant",
          "content": """{
                  "$: "Cash",
                  "cash: "Cash",
                  "AMEX":"Credit",
                  "Square":"Other",
                  
              }
              """
        },
      ]
GiftTypeCleaner = GptPrompt(messages=gift_type_prompt)
values_to_replace = GiftTypeCleaner.call_gpt(to_csl(gifts['PaymentMethod']))
values_to_replace[''] = 'Other'
values_to_replace
{'PayPal': 'Other',
 'check': 'Check',
 'cash': 'Cash',
 'credit card': 'Credit',
 'money order': 'Other',
 '': 'Other'}
gifts.apply(lambda row: 'Reversing Transaction' if row['AmountReceived'] < 0 else values_to_replace[row['PaymentMethod']], axis=1)
0                     Other
1                     Check
2                      Cash
3                    Credit
4                      Cash
5                     Other
6                     Check
7                     Check
8                     Check
9                    Credit
10                   Credit
11                    Other
12                    Check
13    Reversing Transaction
14                    Other
15                    Other
16                     Cash
17                    Other
18                    Other
19                    Other
20                    Check
21                   Credit
22                     Cash
23                    Other
24                    Other
25                     Cash
26    Reversing Transaction
27                   Credit
28                     Cash
29                    Other
30                   Credit
dtype: object

Pandas GPT Solutions

ContactType

`is required and can only be Household or Organization

- Source Table: Contacts Table
- Solution:
- Create procedure to add new column ContactType

df = contacts.copy()
gpt_response = df.ask("create a new column called ContactType. The value is required and can only be either Household or Organization. If CompanyName is '' assume it's a household")
gpt_response[['Number', 'CompanyName', 'ContactType']].head(5)
Number CompanyName ContactType
0 653377813-7 Household
1 390551098-7 Household
2 093004505-X Household
3 729707142-0 A Company Co. Organization
4 488464926-5 Household

Postal Code

if address is present and is US, must be a valid zip code, either 12345 or 12345-1234

- Source Table: Contacts - Solution: - Create procedure to remove any postal codees that doesn’t match the approved format from the usps

contacts[['Postal']]
Postal
0 20535-871
1 89130
2
3 8104
4 49560
5 837016
6
7 30066
8 68164
9
10
df = contacts.copy()
gpt_response = df.ask("Clean the Postal Column. If address is present and is US, must be a valid zip code, either 12345 or 12345-1234. Don't delete rows with an invalid zip, just replace the invalid zip with ''")
gpt_response[['Postal']]
Postal
0
1 89130-
2
3
4 49560-
5
6
7 30066-
8 68164-
9
10

IsDeceased

can only be TRUE or FALSE

- Source Table: Contacts
- Solution:
- Create procedure to update Deceased to TRUE/FALSE

gpt_response.Deceased.unique()
array(['', 'No', 'Yes'], dtype=object)
df = contacts.copy()
gpt_response = df.ask('Can you convert the Deceased column to a boolean. Assume empty strings '' are False')
gpt_response.Deceased.unique()
  Number CompanyName  ... SecondaryLegacyIndividualId ContactName
0      1         ABC  ...                                        
1      2         DEF  ...                                        
2      3         GHI  ...                                        
3      4         JKL  ...                                        
4      5         MNO  ...                                        

[5 rows x 17 columns]
array([False,  True])

Export

import nbdev
nbdev.nbdev_export('03_Gpt_Solution.ipynb')