ChatGPT Code Interpreter in Data Analytics: Data Anonymization and Revenue Analysis

ChatGPT has a wide range of potential applications, including data analytics. OpenAI has also recently made the Code Interpreter extension available to ChatGPT Plus subscribers. Code Interpreter lets users execute Python code directly within the chat interface, upload files, and have the model work with them.

The scandiweb Analytics team recently explored the Code Interpreter extension for ChatGPT. This article highlights our key findings and shows how to anonymize data before feeding it to the model. We also share prompt ideas for revenue analysis based on Magento (Adobe Commerce) sales export data.

Data anonymization

Sharing sensitive personal information with ChatGPT or any other Artificial Intelligence (AI) system poses potential risks. First, ChatGPT conversations can be accessed and read by OpenAI employees. Second, while AI systems and the companies that run them take measures to protect data, no system is entirely immune to breaches or unauthorized access. If those security measures are ever compromised, personal information could be exposed.

That is why you should anonymize PII (personally identifiable information) first. For eCommerce sites, PII includes user data (e.g., names, surnames, and emails) as well as fields that reveal whom the data belongs to (e.g., hostnames, URLs, and sometimes product names).

Below is a Python walkthrough of the anonymization process, using a Magento (Adobe Commerce) backend sales data export as the example.

  1. Load the libraries you will use.
import pandas as pd
import hashlib
  1. Define a function for hashing columns. It appends a hashed copy of the column to the data frame. If ChatGPT finds interesting connections involving hashed values, you can look up the corresponding unhashed values in Google Colab. Alternatively, you can define a function that replaces a column with its hashed values.
def hash_column(df, column_name):
    # Make a copy of the DataFrame to avoid altering the original
    df_copy = df.copy()
    # Create a new column in the DataFrame that is a hashed version of the original column
    df_copy[column_name + '_hashed'] = df_copy[column_name].apply(lambda x: hashlib.sha256(str(x).encode()).hexdigest())
    return df_copy
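The step above mentions the alternative of replacing a column with its hashes instead of appending a copy. Below is a minimal sketch of that variant. The function name `replace_with_hash` and the optional `salt` parameter are our own illustrative additions, not part of the original walkthrough: because email addresses are easy to guess, anyone could hash a guessed address with plain SHA-256 and compare it against your upload, while prepending a secret salt that you never share makes that comparison impossible. The toy DataFrame is invented for illustration.

```python
import hashlib

import pandas as pd


def replace_with_hash(df, column_name, salt=""):
    """Return a copy of df where column_name holds SHA-256 hashes instead of raw values.

    `salt` is a hypothetical secret string; keep it out of anything you upload.
    """
    df_copy = df.copy()  # leave the original DataFrame untouched
    df_copy[column_name] = df_copy[column_name].apply(
        lambda x: hashlib.sha256((salt + str(x)).encode()).hexdigest()
    )
    return df_copy


# Toy data for illustration only
orders = pd.DataFrame({
    "Customer Email": ["a@example.com", "b@example.com", "a@example.com"],
    "Grand Total (Purchased)": [120.0, 75.5, 40.0],
})
anonymized = replace_with_hash(orders, "Customer Email", salt="my-secret")
print(anonymized["Customer Email"].nunique())  # 2: equal inputs map to equal hashes
```

Hashing is deterministic, so repeat customers still group together after anonymization. If you use a salt, remember to prepend the same salt whenever you later hash values to look them up.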
  1. Load your data and display the first few rows to inspect it.
df = pd.read_csv('/your-path/your_file.csv')
df.head(3)  # look at the first three rows
  1. To make the data easier to work with, drop the columns you won't use that may contain personal data. We will not use the Billing Address, Bill-to Name, Shipping Information, Customer Name, Ship-to Name, and Shipping Address columns.
df = df.drop(columns=['Billing Address', 'Bill-to Name', 'Shipping Information', 'Customer Name', 'Ship-to Name', 'Shipping Address'])
  1. Add hashed columns and create a new data frame.
df_hashed = hash_column(df, 'Purchase Point')
df_hashed = hash_column(df_hashed, 'Customer Email')
df_hashed = hash_column(df_hashed, 'Inventory Source')

  1. Drop the original PII columns that now have hashed counterparts. Note that the drop is applied to df_hashed, not df; otherwise the hashed columns would be lost.
df_hashed = df_hashed.drop(columns=['Purchase Point', 'Customer Email', 'Inventory Source'])
  1. Finally, save the anonymized data and download it so you can upload it to ChatGPT.
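Before uploading anything, it can be worth adding a safety check that refuses to save the file if a raw PII column slipped through. This is a minimal sketch: `save_anonymized` and `PII_COLUMNS` are hypothetical names, and the list simply mirrors the columns dropped or hashed in the steps above, so extend it for your own export.

```python
import pandas as pd

# Columns this walkthrough treats as PII; extend the list for your own export
PII_COLUMNS = [
    "Billing Address", "Bill-to Name", "Shipping Information",
    "Customer Name", "Ship-to Name", "Shipping Address",
    "Purchase Point", "Customer Email", "Inventory Source",
]


def save_anonymized(df, path):
    """Raise if any raw PII column is still present; otherwise write df to a CSV file."""
    leftover = [col for col in PII_COLUMNS if col in df.columns]
    if leftover:
        raise ValueError(f"Raw PII columns still present: {leftover}")
    df.to_csv(path, index=False)  # no index column in the uploaded file
    return path
```

For example, `save_anonymized(df_hashed, '/your-path/your_file_anonymized.csv')`. In Google Colab, `from google.colab import files` followed by `files.download(path)` then saves the file to your machine.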

What’s next?

After anonymizing the data, you can upload it to ChatGPT, ask questions, and replicate the generated code in Google Colab using the unhashed values.
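Mapping a hash from ChatGPT's answer back to the real value works because SHA-256 is deterministic: hashing the original values again reproduces the same strings. Here is a minimal sketch of a local lookup table. `build_hash_lookup` is a hypothetical helper that hashes exactly like `hash_column` above (no salt), and the toy data is invented. Build and keep this table only on your own machine.

```python
import hashlib

import pandas as pd


def build_hash_lookup(df, column_name):
    """Map each SHA-256 hash (computed like hash_column) back to its original value.

    Run this on the unanonymized data locally; never upload the result.
    """
    originals = df[column_name].unique()
    return {hashlib.sha256(str(v).encode()).hexdigest(): v for v in originals}


# Toy data for illustration only
orders = pd.DataFrame({"Customer Email": ["a@example.com", "b@example.com", "a@example.com"]})
lookup = build_hash_lookup(orders, "Customer Email")

# Suppose ChatGPT flags this hash as your top customer
flagged = hashlib.sha256("b@example.com".encode()).hexdigest()
print(lookup[flagged])  # b@example.com
```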

Revenue analysis

Revenue analysis provides valuable insights into the financial performance of a business. By analyzing data related to revenue, companies can better plan their budgets and make informed decisions about resource allocation.

For example, revenue analysis can inform customized email marketing campaigns tailored to individual customer preferences and purchase histories. By understanding customer behavior and preferences, businesses can create more effective marketing campaigns that are more likely to increase revenue.

When working with Code Interpreter, you can get creative and ask it anything. If you need inspiration, though, here are some prompts for analyzing revenue-related data:

Note that our data is a Magento (Adobe Commerce) sales export. 

You can also ask the model for additional interpretations and recommendations.

Prompts for revenue analysis

  1. Provide a histogram with the distribution of order values using the ‘Grand Total (Purchased)’ column. Clean outliers.
  2. Provide a statistical summary for the revenue using the ‘Grand Total (Purchased)’ column: count, mean, standard deviation, minimum, 25th percentile, 50th percentile, 75th percentile, maximum.
  3. Provide a doughnut chart with the shares of the number of purchases performed by customers. Display the percentage share just for the top-4 values and show the rest as “Other”. Display the legend separately on the right.
  4. Who are the top 50 customers by revenue? Display a vertical bar chart with their “Grand Total (Purchased)” values.
  5. Provide a statistical summary of an average customer and of the top-50 customers: avg. number of purchases, avg. order value, avg. first purchase value, avg. LTV.
  6. What are the shares of order statuses?
  7. What percentage of revenue is lost on canceled orders?
  8. What shares of revenue are brought by each customer group? Provide a pie chart. Display the percentage share just for top-4 values and show the rest as “Other”.
  9. How does the first purchase value correlate with LTV? Use the following columns: ‘Purchase Date’, ‘Grand Total (Purchased)’, ‘Customer Email_hashed’.
  10. Which payment methods were used most frequently, and did any particular payment method show a correlation with higher sales?
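To check the model's answers, or to rerun them locally on the unhashed data, you can reproduce some of these prompts directly in pandas. The sketch below covers prompts 2, 7, and 9. It assumes the column names quoted in the prompts plus a 'Status' column whose canceled value is 'Canceled'; both of those are assumptions about your export, so adjust them as needed. The function names and the toy DataFrame are hypothetical.

```python
import pandas as pd

REVENUE = "Grand Total (Purchased)"


def revenue_summary(df):
    """Prompt 2: count, mean, std, min, 25th/50th/75th percentiles, and max of order revenue."""
    return df[REVENUE].describe()


def canceled_revenue_share(df, status_col="Status", canceled_label="Canceled"):
    """Prompt 7: percentage of total revenue tied to canceled orders.

    The status column name and canceled label are assumptions; check your export.
    """
    total = df[REVENUE].sum()
    canceled = df.loc[df[status_col] == canceled_label, REVENUE].sum()
    return 100 * canceled / total


def first_purchase_vs_ltv(df):
    """Prompt 9: Pearson correlation between each customer's first order value and LTV."""
    data = df.assign(**{"Purchase Date": pd.to_datetime(df["Purchase Date"])})
    data = data.sort_values("Purchase Date")  # so "first" means earliest order
    per_customer = data.groupby("Customer Email_hashed")[REVENUE].agg(
        first_purchase="first", ltv="sum"
    )
    return per_customer["first_purchase"].corr(per_customer["ltv"])


# Toy data for illustration only
orders = pd.DataFrame({
    "Purchase Date": ["2023-01-05", "2023-02-10", "2023-01-20", "2023-03-01", "2023-02-15"],
    REVENUE: [100.0, 50.0, 20.0, 30.0, 200.0],
    "Customer Email_hashed": ["h1", "h1", "h2", "h2", "h3"],
    "Status": ["Complete", "Canceled", "Complete", "Complete", "Complete"],
})
print(revenue_summary(orders))
print(canceled_revenue_share(orders))  # 12.5
print(first_purchase_vs_ltv(orders))
```

In this sketch, LTV is simply the sum of all of a customer's orders, canceled ones included. Depending on how you define LTV, you may want to filter out canceled orders first.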

Revenue analysis is an important tool for businesses looking to optimize their financial performance and improve their bottom line. By leveraging data analytics tools such as ChatGPT’s Code Interpreter, companies can gain valuable insights into their revenue streams and make data-driven decisions that drive growth.


ChatGPT’s Code Interpreter has many potential applications in data analytics, including revenue analysis. However, remember to take precautions when sharing sensitive personal information or data with any AI system! Anonymizing data before feeding it to the model is one way to mitigate the risks.

With prompts tailored to Magento (Adobe Commerce) sales export data, businesses can quickly explore their financial performance and base their decisions on evidence rather than guesswork.

Does your eCommerce business need a proper data & analytics setup to make sense of your data and use it for business growth? Learn more about the BI & Actionable Analytics services we offer, or get in touch to speak with an Analytics Expert!
