
Information Extraction from Emails via AI NLP


Ramón Arnau

Manager at Arteco Consulting SL

Learn how to use AI models to process natural-language emails and feed automatic integrations

In this article, we will explore how to combine an open-source language model with Python to process emails and extract structured information, while keeping the data completely confidential. We will build a system that reads confirmation emails and automatically creates hotel bookings from them, running a pre-trained community model locally.

What is an LLM and why use it to process emails?

Large Language Models (LLMs) are advanced artificial intelligence systems capable of understanding and generating text in a human-like manner. By using an open-source LLM to process emails, we can:

  1. Interpret the natural language of emails
  2. Accurately extract relevant information
  3. Adapt to different email formats and styles
  4. Handle variations in the way information is presented

Setting Up the Development Environment

To get started, we need to set up our Python environment and install the necessary libraries:

pip install transformers torch pandas

If the pip command is not found, try pip3 instead. We also recommend working inside a virtual environment (for example, with virtualenv) so that these dependencies stay isolated from the rest of your system.
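Before going further, it can help to confirm that the three libraries are visible to the interpreter you are actually running. The following is a minimal sketch that uses only the standard library; the helper name `missing_packages` is ours and is not part of any of the installed packages.

```python
from importlib.util import find_spec

# Packages installed by the pip command above.
REQUIRED = ("transformers", "torch", "pandas")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If a package is reported as missing inside a virtual environment, the usual cause is that pip installed it into a different Python installation.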

Selection and Implementation of the LLM

For this project, we will use deepset/roberta-base-squad2, an open-source RoBERTa model fine-tuned for question answering that performs well on natural language processing tasks:

from transformers import pipeline
model_name = "deepset/roberta-base-squad2"
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)

Processing Emails and Extracting Information

This model answers a question by locating the answer within a given context, so we will write a function that iterates over the questions we want to ask about the email:

def extract_info_from_email(email_content):
    questions = [
        "What is the hotel name?",
        "What is the check-in date?",
        "What is the check-out date?",
        "How many adults are staying?",
        "How many children are staying?"
    ]

    results = {}

    for question in questions:
        QA_input = {
            'question': question,
            'context': email_content
        }
        res = nlp(QA_input)
        results[question] = res['answer']

    return results
    

The operation is very simple: each predefined question is evaluated against the text of the email, which is passed to the function as an argument. When the loop finishes, the results dictionary maps every question to the answer returned by the pre-trained model.
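Besides `answer`, each result returned by the question-answering pipeline also carries a `score` between 0 and 1. The sketch below shows one way to discard low-confidence answers instead of trusting them blindly. The function name `extract_with_confidence` and the 0.3 threshold are illustrative assumptions, not values recommended by the model authors; the threshold should be tuned on your own emails.

```python
def extract_with_confidence(qa, email_content, questions, min_score=0.3):
    """Ask each question and keep only answers scored at or above min_score.

    `qa` is any callable with the same interface as the `nlp` pipeline:
    it receives {'question': ..., 'context': ...} and returns a dict
    containing at least the keys 'answer' and 'score'.
    """
    results = {}
    for question in questions:
        res = qa({"question": question, "context": email_content})
        # Treat low-confidence answers as missing (None).
        results[question] = res["answer"] if res["score"] >= min_score else None
    return results
```

With the pipeline created earlier, it would be called as `extract_with_confidence(nlp, email_content, questions)`. Passing the pipeline as an argument also makes the function easy to test with a stub.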

Creating Hotel Reservations from Extracted Information

Continuing with the example, we will convert the model's predicted responses into a structure that we can process. For example, a hotel reservation class:

class HotelReservation:
    def __init__(self, hotel, check_in, check_out, adults, children):
        self.hotel = hotel
        self.check_in = check_in
        self.check_out = check_out
        self.adults = adults
        self.children = children
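
The class above stores the dates as plain strings. As a hedged alternative, the following sketch converts them into `date` objects so that later code can compare them or compute the length of the stay. `ParsedReservation`, `parse_date` and `DATE_FORMAT` are hypothetical names, and the format is assumed to match the sample emails in this article ("July 15, 2023", English month names); real emails will need more formats.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Assumed date format, matching the sample emails ("July 15, 2023").
DATE_FORMAT = "%B %d, %Y"


def parse_date(text):
    """Parse a date such as 'July 15, 2023'; return None if it does not match."""
    try:
        return datetime.strptime(text.strip(), DATE_FORMAT).date()
    except (ValueError, AttributeError):
        return None


@dataclass
class ParsedReservation:
    hotel: str
    check_in: Optional[date]
    check_out: Optional[date]
    adults: int
    children: int

    @property
    def nights(self):
        """Number of nights, or None when either date is missing."""
        if self.check_in is None or self.check_out is None:
            return None
        return (self.check_out - self.check_in).days
```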
        

Test Data as Emails

For testing purposes, we will use emails written in natural language, such as the following:

Dear Guest,

We are pleased to confirm your reservation at Hotel Sunset. Your check-in date is July 15, 2023, and your check-out date is July 20, 2023. The reservation is for 2 adults and 1 child.

Thank you for choosing Hotel Sunset. We look forward to welcoming you.

Best regards, Hotel Sunset Reservations Team

If the model is large enough and well trained, it should answer the questions posed about this text without difficulty.
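In practice, messages arrive with headers and MIME structure rather than as clean text like the example above. The following sketch uses Python's standard `email` package to pull out the plain-text body before handing it to the model. The function name `plain_text_body` is ours, and HTML-only messages are deliberately left out of scope here.

```python
from email import message_from_string
from email.policy import default


def plain_text_body(raw_message):
    """Return the plain-text body of a raw email, or '' if there is none."""
    msg = message_from_string(raw_message, policy=default)
    # Ask only for a text/plain part; HTML-only messages yield None.
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content().strip() if part is not None else ""
```

The returned string can then be passed as the context for the questions.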

Implementing the Complete System

The process is very straightforward. Let’s see how the complete system would look with all the previously mentioned functions:

from transformers import pipeline

model_name = "deepset/roberta-base-squad2"
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)

# Define the function that extracts information from the email content
def extract_info_from_email(email_content):
    questions = [
        "What is the hotel name?",
        "What is the check-in date?",
        "What is the check-out date?",
        "How many adults are staying?",
        "How many children are staying?"
    ]
    
    results = {}
    
    for question in questions:
        QA_input = {
            'question': question,
            'context': email_content
        }
        res = nlp(QA_input)
        results[question] = res['answer']
    
    return results

# Define the hotel reservation class
class HotelReservation:
    def __init__(self, hotel, check_in, check_out, adults, children):
        self.hotel = hotel
        self.check_in = check_in
        self.check_out = check_out
        self.adults = adults
        self.children = children

import re

# Extract the first integer from an answer such as "2 adults"; default to 0
def parse_count(answer):
    match = re.search(r"\d+", answer or "")
    return int(match.group()) if match else 0

# Define the function that creates a reservation from the email content
def create_reservation_from_email(email_content):
    info = extract_info_from_email(email_content)
    
    return HotelReservation(
        hotel=info.get("What is the hotel name?", "Unknown"),
        check_in=info.get("What is the check-in date?", "Unknown"),
        check_out=info.get("What is the check-out date?", "Unknown"),
        adults=parse_count(info.get("How many adults are staying?")),
        children=parse_count(info.get("How many children are staying?"))
    )

import pandas as pd

# Define the function that processes the emails and creates reservations
def process_emails_and_create_reservations(emails):
    reservations = []
    
    for email in emails:
        reservation = create_reservation_from_email(email)
        reservations.append(reservation)
    
    return pd.DataFrame([vars(r) for r in reservations])

# Usage example
emails = [
    """
    Dear Guest,
    We are pleased to confirm your reservation at Hotel Sunset. 
    Your check-in date is July 15, 2023, and 
    your check-out date is July 20, 2023. 
    The reservation is for 2 adults and 1 child.
    Thank you for choosing Hotel Sunset. 
    We look forward to welcoming you.
    Best regards,
    Hotel Sunset Reservations Team
    """,
    """
    Dear Guest,
    We are excited to confirm your booking at Playa Resort. 
    Your check-in date is August 1, 2023, and 
    your check-out date is August 7, 2023. 
    The reservation includes 3 adults and 0 children.
    We appreciate your choice of Playa Resort 
    and can't wait to have you with us.
    Best wishes,
    Playa Resort Booking Team
    """
]

reservations_df = process_emails_and_create_reservations(emails)
print(reservations_df)

Execution Result

The first run performs a few one-time steps, such as downloading the model from the internet and caching it locally. Subsequent runs reuse the cached copy. Data privacy is preserved because everything happens locally, and no email content leaves your computer.
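Once the model has been downloaded, you can make the local-only behavior explicit by asking the Hugging Face libraries not to contact the network at all. The sketch below sets the `HF_HUB_OFFLINE` environment variable before `transformers` is imported. The helper name `load_local_pipeline` is hypothetical, and the call is expected to fail if the model is not yet in the local cache.

```python
import os

# Set before `transformers` is imported so the setting is picked up.
os.environ["HF_HUB_OFFLINE"] = "1"

MODEL_NAME = "deepset/roberta-base-squad2"


def load_local_pipeline():
    """Build the question-answering pipeline from the local cache only."""
    # Imported here, after the environment variable has been set.
    from transformers import pipeline

    return pipeline("question-answering", model=MODEL_NAME, tokenizer=MODEL_NAME)
```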

If the execution is successful, you will see an output similar to the following:

python3 main.py
          hotel        check_in       check_out  adults  children
0  Hotel Sunset   July 15, 2023   July 20, 2023       2         1
1  Playa Resort  August 1, 2023  August 7, 2023       3         0

In this simple way, we can automate data ingestion, reducing both costs and typing errors.

Conclusions and Next Steps

In this article, we explored how to use an open-source LLM along with Python to automate the process of creating hotel reservations from confirmation emails. This approach can significantly improve efficiency and accuracy in reservation management.

Some next steps could include:

  1. Improving model accuracy with fine-tuning
  2. Implementing error handling and data validation
  3. Integrating the system with a reservation database
  4. Developing an API interface for integration with other tools
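As a rough illustration of steps 2 and 3, the sketch below validates each extracted reservation and stores only the valid ones in SQLite using Python's standard library. Everything here is an assumption made for the example: the `reservations` table layout, the function names `validate` and `save_valid`, the validation rules, and the date format taken from the sample emails.

```python
import sqlite3
from datetime import datetime

DATE_FORMAT = "%B %d, %Y"  # assumed format, as in the sample emails


def validate(row):
    """Return a list of problems found in one extracted reservation dict."""
    problems = []
    if not row.get("hotel") or row["hotel"] == "Unknown":
        problems.append("missing hotel name")
    try:
        check_in = datetime.strptime(row["check_in"], DATE_FORMAT)
        check_out = datetime.strptime(row["check_out"], DATE_FORMAT)
        if check_out <= check_in:
            problems.append("check-out is not after check-in")
    except (KeyError, TypeError, ValueError):
        problems.append("unparseable dates")
    if int(row.get("adults") or 0) < 1:
        problems.append("no adults in reservation")
    return problems


def save_valid(conn, rows):
    """Insert valid rows into a hypothetical table; return the rejected ones."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reservations "
        "(hotel TEXT, check_in TEXT, check_out TEXT, adults INTEGER, children INTEGER)"
    )
    rejected = []
    for row in rows:
        problems = validate(row)
        if problems:
            rejected.append((row, problems))
            continue
        conn.execute(
            "INSERT INTO reservations "
            "VALUES (:hotel, :check_in, :check_out, :adults, :children)",
            row,
        )
    conn.commit()
    return rejected
```

The rows can be produced from the objects built earlier with `[vars(r) for r in reservations]`, and the rejected list can be routed to a person for manual review.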

With these tools and techniques, you can create intelligent systems that automate complex information processing tasks, saving time and reducing errors in various business contexts.

If you think solutions like these, but more advanced, could help improve processes in your organization, feel free to reach out to us.
