= os.getenv("OPENAI_API_KEY") openai.api_key
GPT Solution
Configuring OpenAI Credentials
GptPrompt
GptPrompt (messages)
Class For sending messages to Open AI using GPT-3.5 Turbo 16k 0613
OpenAI Solutoins
Solutions to a subeset of the data modifications using OpenAI API :)
CreditCardType
Can only be Visa, Mastercard, AMEX, Discover
- Solution:
- Asking Chat GPT to format the unique CreditCardTypes into the pre-approved list of values
=[
credit_card_prompt
{"role": "system",
"content": """
You will be given a comma separated list of items.
Each item is supposed to be a unique credit card type taken from a column on a database table.
The only acceptable credit card types are [Visa, Mastercard, AMEX, Discover]
Your job is to examine each item in the list to see if it matches one of the acceptable credit card types or not.
For each item in the list that is not ALREADY in the list of acceptable credit card types you will need to provide which credit card type it matches with.
If a item doesn't match ANY of the acceptable credit card types [Visa, Mastercard, AMEX, Discover] then match it with an empty string ''
Format your response in JSON
"""
},
{"role": "user",
"content": "Americn Ex,AMEX,Visa,Master car,Mastercard,Discover,Jazz"
},
{"role": "assistant",
"content": """{
"Americn Ex": "AMEX",
"Master car": "Mastercard",
"Jazz":""
}
"""
}, ]
= GptPrompt(messages=credit_card_prompt) CreditCardCleaner
= CreditCardCleaner.call_gpt(to_csl(gifts['CreditCardType'])) values_to_replace
values_to_replace
{'American Ex': 'AMEX', 'Master card': 'Mastercard', '': ''}
'CreditCardType'] = gifts['CreditCardType'].replace(values_to_replace) gifts[
GiftType
Can only be Cash, Check, Credit, Other, or Reversing Transaction
- Solution:
- Asking Chat GPT to format the unique GiftTypes into the pre-approved list of values
- Replace ‘Reversing Transaction’ using simple logic in Python
= [
gift_type_prompt
{"role": "system",
"content": """
You will be given a comma separated list of items.
Each item is supposed to be a unique payment method taken from a column on a database table.
The only acceptable payment methods are [Cash, Check, Credit, or Other]
Your job is to examine each item in the list to see if it matches one of the acceptable payment methods or not. If it doesn't map to
For each item in the list that is not ALREADY in the list of acceptable payment method you will need to provide which payment method type it matches with
Format your response in JSON
"""
},
{"role": "user",
"content": "$,cash,Credit,AMEX,Square"
},
{"role": "assistant",
"content": """{
"$: "Cash",
"cash: "Cash",
"AMEX":"Credit",
"Square":"Other",
}
"""
}, ]
= GptPrompt(messages=gift_type_prompt) GiftTypeCleaner
= GiftTypeCleaner.call_gpt(to_csl(gifts['PaymentMethod']))
values_to_replace ''] = 'Other' values_to_replace[
values_to_replace
{'PayPal': 'Other',
'check': 'Check',
'cash': 'Cash',
'credit card': 'Credit',
'money order': 'Other',
'': 'Other'}
apply(lambda row: 'Reversing Transaction' if row['AmountReceived'] < 0 else values_to_replace[row['PaymentMethod']], axis=1) gifts.
0 Other
1 Check
2 Cash
3 Credit
4 Cash
5 Other
6 Check
7 Check
8 Check
9 Credit
10 Credit
11 Other
12 Check
13 Reversing Transaction
14 Other
15 Other
16 Cash
17 Other
18 Other
19 Other
20 Check
21 Credit
22 Cash
23 Other
24 Other
25 Cash
26 Reversing Transaction
27 Credit
28 Cash
29 Other
30 Credit
dtype: object
Pandas GPT Solutions
ContactType
`is required and can only be Household or Organization
- Source Table: Contacts Table
- Solution:
- Create procedure to add new column ContactType
= contacts.copy()
df = df.ask("create a new column called ContactType. The value is required and can only be either Household or Organization. If CompanyName is '' assume it's a household")
gpt_response 'Number', 'CompanyName', 'ContactType']].head(5) gpt_response[[
Number | CompanyName | ContactType | |
---|---|---|---|
0 | 653377813-7 | Household | |
1 | 390551098-7 | Household | |
2 | 093004505-X | Household | |
3 | 729707142-0 | A Company Co. | Organization |
4 | 488464926-5 | Household |
Postal Code
if address is present and is US, must be a valid zip code, either 12345 or 12345-1234
- Source Table: Contacts - Solution: - Create procedure to remove any postal codees that doesn’t match the approved format from the usps
'Postal']] contacts[[
Postal | |
---|---|
0 | 20535-871 |
1 | 89130 |
2 | |
3 | 8104 |
4 | 49560 |
5 | 837016 |
6 | |
7 | 30066 |
8 | 68164 |
9 | |
10 |
= contacts.copy()
df = df.ask("Clean the Postal Column. If address is present and is US, must be a valid zip code, either 12345 or 12345-1234. Don't delete rows with an invalid zip, just replace the invalid zip with ''")
gpt_response 'Postal']] gpt_response[[
Postal | |
---|---|
0 | |
1 | 89130- |
2 | |
3 | |
4 | 49560- |
5 | |
6 | |
7 | 30066- |
8 | 68164- |
9 | |
10 |
IsDeceased
can only be TRUE or FALSE
- Source Table: Contacts
- Solution:
- Create procedure to update Deceased to TRUE/FALSE
gpt_response.Deceased.unique()
array(['', 'No', 'Yes'], dtype=object)
= contacts.copy()
df = df.ask('Can you convert the Deceased column to a boolean. Assume empty strings '' are False')
gpt_response gpt_response.Deceased.unique()
Number CompanyName ... SecondaryLegacyIndividualId ContactName
0 1 ABC ...
1 2 DEF ...
2 3 GHI ...
3 4 JKL ...
4 5 MNO ...
[5 rows x 17 columns]
array([False, True])
Export
import nbdev
'03_Gpt_Solution.ipynb') nbdev.nbdev_export(