Using AI in Research and Teaching: A Practical Guide

[N.B. I gave a version of this post for an internal discussion at SOM. I’ve modified it for general consumption. Many of the examples I walked through live – I will try to add videos for those, in the future, but for now they will just sit as “examples” in the text.]

The goal for today’s post is to highlight a number of useful ways that you can use AI in your research and teaching. I’m using AI super loosely here – what I really mean is the relatively new innovations of Large Language Models that have sprung up following the release of ChatGPT 3.

Today, I will cover a few topics:

  1. VS Code + Copilot installation
  2. Examples using Chat LLMs for research and teaching
  3. Images from Midjourney
  4. Running your own LLM locally

VS Code + Copilot

A big chunk of what I want to show you revolves around VS Code. This is a free code editor that is quite powerful and has a lot of extensions that make it a great choice for coding. Benefits include:

  1. Free
  2. Good IDE with broad support
  3. Built-in Git that is quite good (see next post)
  4. Great remote access for servers
  5. Copilot (LLM for inline code suggestions and chat program)
  6. Extensions with open community (good for many software platforms, also alows extensions in CodeGPT)

I wrote a blog post last year on setting this up, but I’ll walk through it here as well. Much of my version is in Mac, so sorry if you’re a Windows user!

  1. Install VS Code
  2. Setup Extensions

    Extensions are first- and third-party additions to VS Code that give additional functionality. Often, these will give improved code highlighting such as for Stata, R, or LaTeX. And, as we will see in a second, they also open up a collaborative LLM that helps you code (and will chat with you to fix and explain code). Some useful language-specific extensions:

    • LaTeX

      img

      Then you’ll have a similar LaTeX environment to many of the LaTeX editors, but with the added benefit of being able to use an LLM to help you write LaTeX code.

      img

    • Python
      • Python, Pylance, Python Debugger, autopep8, Jupyter
    • R I now use this instead of RStudio1 (which has its own LLM, but I prefer the VS Code environment)
    • Stata
      • stataRun, Stata Enhanced
      • This works in Mac OS well, not so well in Windows2
      • Only modifications that I had to do to get this working was to go into the stataRun extension and change the program from StataSE to StataMP (you can edit your preferences for an extension by going to the extension and clicking the little gear next to where the install/uninstall button is):

        stataRun

      • Historically, Stata was not as good in LLMs as other language. Owen Ozier wrote about this but my sense is that has changed a lot in the last year.
      • Useful keystrokes:
        • shift+cmd+a: Run All Lines → This will run the entire file
        • shift+cmd+s: Run Selection → To run a given selection. If no selection is made it default to Run All
        • shift+cmd+c: Run Current Line → This will run the current line code
        • shift+cmd+d: Run Downward Lines → from current line to end
        • shift+cmd+f: Run Frontward Lines → from first line to current line
    • Remote
      • Makes it very easy to work on remote servers, like high performance clusters
  3. Install Copilot + Chat
  4. [Optional] You can setup other preferences such as:
    • Theme
    • Font
    • Keybindings

Some examples of how Copilot is awesome

  1. R

    One easy task is to define a scripted program that does a particular task, such as downloading the population for states and making a plot. To try this out, write the following in a comment at the top of a blank R script:

     ## This program downloads population counts for all states in the US in 2018.
    
     ## It then makes a bar chart for each state, with states on the y-axis, 
     ## and population on the x-axis, from largest to smallest state.
    
     ## It uses the tidycensus package to download the data, 
     ## and ggplot2 to make the bar charts.
    

    This approach requires the tidycensus package installed (and ggplot2).

    After you enter this comment, hit return twice, and wait for Copilot to propose a line of code. Then hit tab to accept the proposals. As you move across lines, it will write out the code.

    One interesting fact — you will have to hit return more than once, because it wants the proper line breaks!

    alt text

  2. Latex

    This is where things got even more amazing. Consider the writing from a recent paper that we just got published. I can open a new latex document, and enter our opening paragraph, and then the first two sentences. Github Copilot proposes the rest of the paragraph. It’s not even a totally unreasonable proposal! [It’s not correct, in our setting, but it’s still a remarkable attempt!]

    alt text

    I’ve had other settings where it will propose explanations for regression tables that are not even half-bad. It’s a pretty killer tool to have for someone who struggles to get words down on the page (like me) — the words may not be great, but then you can edit them and fix it up. Pretty remarkable.

  3. Explaining code

    Often, you’ll inherit code (or read coauthors’ code) that you don’t understand. Copilot can help explain what the code is doing. For example, consider the following SAS code (from WRDS):

     /* ************************************************************ */
     /* This code calculates Bond Returns using FISD and Trace Data  */
     /* Should Run "Clean_Trace_Standard" and "Clean_Trace_Enhanced" */ 
     /*             prior to running this code 			            */
     /* Author: Rabih Moussawi, Qingyi (Freda) Drechsler             */ 
     /* Date: November 2018											*/
     /* Updated: April 2022                                          */
     /* Current Version: v9                                          */
     /* ************************************************************ */;
    
     <Comments removed by PGP>
    
     /* ************************************************ */
     /* Step 1:                                          */  
     /* Combine cleaned Standard and Enhanced TRACE Data */
     /* ************************************************ */
    
     /* Get Trace Enhanced Cutoff Date */
     proc sql noprint;
     select distinct max(TRD_EXCTN_DT) format date9. into :cutoffdate
     from tracee.trace_enhanced;
     quit;
    
     %put &cutoffdate;
    
     /* Enhanced TRACE has an 18m lag and post the CutOffDate use data directly from standard TRACE */
     data trace_std; set trace_standard_clean; 
     where trd_exctn_dt > "&cutoffdate"d and cmsn_trd ne 'Y' and rptd_pr>0;
     SEQNUM = msg_seq_nb;
     drop msg_seq_nb orig_msg_seq_nb;
     run;
    
     data trace_enh; set trace_enhanced_clean;
     where cmsn_trd ne 'Y' and rptd_pr>0;
     SEQNUM=input(msg_seq_nb,best12.); 
     label SEQNUM="Message Sequence Number";
     drop msg_seq_nb orig_msg_seq_nb;
     run;
    

    You may not know SAS very well (or at all). You can ask Copilot to explain what this code does by selecting the text alt text

    Then, you can even ask it to write similar code in Python’s Pandas (or whatever language you like): alt text

Some more examples (videos to come):

  1. Example prompt/script for scraping data
  2. Converting HTML to markdown
  3. Creating comments for code

Copilot vs. CodeGPT

Known issues I’ve had with VS Code

  1. For Mac users, launch vscode from the command line rather than from Applications – the PATH variable isn’t the same for each, and many of the relevant programs are on the command line PATH variable. You need to install the code command for VS Code for this.
  2. Sometimes the LaTeX compiler or Python interpreter can’t be found. This is a PATH issue.
  3. Sometimes httpgd messes up inside VS Code. I often will have the graphs in a separate window, instead, but can be annoying.

Chat LLMs for Research and Teaching [Living Document]4

I am going to walk through a number of use cases here. There are many more, and I still don’t use these LLMs as effectively as I should. For any of these tasks below, I have in mind that you would use the chat interface for ChatGPT or Claude. I pay for both (!) because I find them useful for different things. You should pay for it!5 There are a few things below that are distinctive (Midjourney, for example) that are separate from ChatGPT and Claude.

  1. Write script to scrape data
    • Example: Pull NBER Working paper codes
    • Example: pull all journal of finance brattle prize winners. Look at how citation counts correlate with probability of winning
  2. Construct a make file for a project
  3. Quiz questions for class -> mapping to qualtrics Prompt for class:
     I'm writing a multiple choice short quiz for a class called 
     "Investment Management" for MBA students.
     You're going to help me write the quizzes. 
     My questions look like this:
    
     `latex
    
     \question The CAPM model is a single factor model. What is the factor? 
     \begin{choices}
     \choice The risk-free rate
     \correctchoice The market risk premium
     \choice The market return
     \choice The market volatility
     \end{choices} 
     `
    
     Generally, I want you to ask questions that probe understanding and
     require students to use logic, not just memory. Feel free to use 
     situational questions like "Imagine that...". Please acknowledge.
    

    Now we’ll give it my materials from a day of class:

     Let's write five questions evaluating the material 
     in these slides that are attached.
    

    I currently manually copy this into a Qualtrics survey, but figuring out how to automate that process as well (takes about 3 minutes, not a huge deal, but would be nice to do automatically). Would be even easier to do with Canvas, as it allows for QTI format (which can be converted from Markdown).

  4. Making clip art for slides

    Often, it is nice to make “clip art” for slides. Either title slides, or simply decoration to make your slides nicer (and more engaging) without adding text.
    I use Midjourney to construct this type of “art.”

    An example prompt:

    a researcher being helped by a robot to do programming in sepia tone --ar 1:1

    alt text

    Example slides here

    Here are some tutorials on how to use Midjourney:

  5. Chatbots and their various forms

    Open AI has ChatGPT.

    Anthropic has Claude.

    Others include Gemini, Mixtral, and many alternatives (open source too). There are many to use. If you want to use your own, you need to set it up locally. This can be non-trivial, especially for the powerful ones. Would be good to have more of an expert chime in, but here are a few fun exmaples:

    1. https://ollama.com/
    2. https://github.com/simonw/llm
    3. https://github.com/npiv/chatblade

    For example, here’s an LLM running locally with minimal setup thanks to Ollama:

     $ ollama run llama3
     >>> What are the four horsemen of the apocalypse for an economist?
     What a great question!
    
     In economics, the "Four Horsemen of the Apocalypse" refers to four major economic crises that had significant impacts on the global economy in the 20th century. These crises were:
    
     1. **The Great Depression (1929-1939)**: A severe and prolonged downturn in economic activity that affected many countries worldwide, leading to widespread unemployment, business failures, and a significant decline in international trade.
     2. **The Oil Shock of 1973 (1973-1974)**: A sudden and sharp increase in oil prices, triggered by an Arab-Israeli war, which led to a global energy crisis, high inflation, and a recession in many countries.
     3. **The Latin American Debt Crisis (1980s)**: A widespread default on international debts by several Latin American countries, which was exacerbated by high interest rates, currency fluctuations, and structural adjustment policies imposed by the
     International Monetary Fund (IMF).
     4. **The Asian Financial Crisis (1997-1998)**: A rapid and severe decline in asset values and a collapse of currencies in several East Asian economies, particularly Thailand, Indonesia, Malaysia, and South Korea, triggered by poor economic
     management, excessive debt, and capital flight.
    
     These crises were characterized by:
    
     * Sharp declines in economic activity
     * High levels of uncertainty and volatility
     * Significant changes in monetary and fiscal policies
     * International coordination and cooperation to address the crises
    
     Economists often use these four crises as case studies to analyze and learn from the causes, consequences, and responses to such events.
    

    But there are good and real research applications that you should consider, especially once combined with tuning.

  6. Making your evaluations more paletable (and making evals more useful than just scores)

    Here’s two sets of evals from my courses: Investment Management and Applied Empirical Methods.

    Let’s go through an evaluation and ask it to digest it for us:

         I just got course evaluations for my class, Investment Management. I've attached the PDF of the reviews. Can you summarize and read through and give me the highlights? I'd like to hear positive stuff, to feel good, and also any constructive feedback I can take.
    
     Any particularly nice comments that I should add to my file of "feel good" comments?
    
     Can you give me any suggestions on how I can address the concerns raised by the students?
    
     Can you do the same now for these reviews for my PhD class?
    
  7. Converting math to code

    Let’s write some math on the board and ask it to convert it to LaTeX code.

  8. Convert tables in papers to graphs (almost) instantly

    Let’s take a table from a paper and ask it to convert it to a graph. alt text alt text alt text alt text alt text

    I then futzed around with it in Claude (asked it to rotate, etc.) and got this: alt text

    Pretty great for making new figures for an MBA class!

  1. I still use RStudio occasionally, but I find it to be slow to startup and I can’t have multiple projects open at the same time. This is because VS Code actually scales linearly with the number of projects you have open, while RStudio does not. I usually have 10-15 projects left open, and they’re always quick to start back up when I restart VS Code. 

  2. See the discussion in the extension. To quote the package maintainer: “Installation instructions for Windows basically follows the original Atom stata-exec building manual with some modifications for code.” But the list of instructions looks challenging… I promise you it’s worth it! 

  3. Thanks to Joshua Mask for introducing this to me! 

  4. This list is a work-in-progress. I will be adding to it as I think of more examples. If you have any suggestions, please let me know! 

  5. Often, there are educational discounts or your university pays for an account already!