Improvement of a Ticketing System using Machine Learning

Master-Praktikum: Machine Learning for Information Systems Students

UCC Munich - Technische Universität München (TUM)




  1. select only the columns we actually need from tickets.csv
  2. match each ticket ID in tickets.csv to its corresponding ticket file and read that file in
  3. extract the first message of the conversation from the ticket file
  4. clean the message of stray or unwanted characters
  5. filter NaN values, empty strings, and similar unusable entries out of the dataframe
  6. detect the language of each message
  7. translate English messages into German via the Google Translate API
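
Steps 4 and 5 above could look roughly like the following minimal sketch. It uses only the Python standard library, and the function names clean_message and clean_messages are hypothetical. What counts as an unwanted character is also an assumption here (HTML entities, control characters, and runs of whitespace); the project's actual cleaning rules may differ.

```python
import html
import math
import re
import unicodedata

# Assumption: "unwanted characters" means ASCII control characters and
# redundant whitespace; HTML entities are decoded first.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
_WHITESPACE = re.compile(r"\s+")


def clean_message(raw):
    """Return a normalized message, or None if nothing usable remains."""
    # Treat None and float NaN (as pandas produces for missing cells) as missing.
    if raw is None or (isinstance(raw, float) and math.isnan(raw)):
        return None
    text = html.unescape(str(raw))              # "&amp;" -> "&", "&nbsp;" -> U+00A0
    text = unicodedata.normalize("NFKC", text)  # unify compatibility forms
    text = _CONTROL_CHARS.sub(" ", text)        # drop control characters
    text = _WHITESPACE.sub(" ", text).strip()   # collapse whitespace
    return text or None


def clean_messages(raw_messages):
    """Clean every message and drop the ones that end up empty or missing."""
    cleaned = (clean_message(message) for message in raw_messages)
    return [message for message in cleaned if message is not None]
```

With a pandas dataframe, the same function could be applied to the message column and the resulting missing values dropped afterwards.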


ML Model

Model Structure:

  1. first, we convert the text into a vector of token counts with CountVectorizer; in the process this lowercases everything and, given a stop-word list, removes “stop words” such as und or die
  2. then we apply tf-idf (term frequency times inverse document frequency) to these counts via scikit-learn's TfidfTransformer; this down-weights words that occur in many tickets and highlights the words that set a ticket apart from the others
  3. finally, we use the Naive Bayes classifier MultinomialNB to assign the input to one of a set of discrete classes (the operator or the ticket category)
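
The three steps above can be chained in a scikit-learn Pipeline. The following is a minimal sketch, not the project's actual configuration: the stop-word list, the function name build_ticket_classifier, the toy tickets, and the labels "account" and "hardware" are all made up for illustration. Note that CountVectorizer only removes stop words when a list is passed via stop_words, and scikit-learn's only built-in list is English, so a German list has to be supplied.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical, deliberately tiny German stop-word list for illustration.
GERMAN_STOP_WORDS = ["und", "die", "der", "das", "ich", "ist"]


def build_ticket_classifier():
    """Token counts -> tf-idf weights -> multinomial Naive Bayes."""
    return Pipeline([
        ("counts", CountVectorizer(lowercase=True, stop_words=GERMAN_STOP_WORDS)),
        ("tfidf", TfidfTransformer()),
        ("nb", MultinomialNB()),
    ])


# Made-up toy tickets; the real training data comes from the preprocessed dataframe.
train_texts = [
    "Ich habe mein Passwort vergessen, bitte zurücksetzen",
    "Login nicht möglich, das Passwort wird abgelehnt",
    "Der Drucker druckt nicht und zeigt einen Papierstau",
    "Monitor ist kaputt, der Drucker auch",
]
train_labels = ["account", "account", "hardware", "hardware"]

classifier = build_ticket_classifier().fit(train_texts, train_labels)
predicted = classifier.predict(["Passwort zurücksetzen", "Drucker kaputt"])
```

On this toy data the two new tickets are assigned to "account" and "hardware" respectively. To predict the operator instead of the category, the same pipeline would be fitted with operator names as labels.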