SIFSearch Progress & Developments

Authors: Julia Chen, Ravi Panguluri, & Kaushik Vejju

Introduction

Since last semester, we’ve worked on designing and building out SIFSearch, the internal search engine for the Smith Investment Fund. During this time, we’ve introduced new features to this application and have refactored much of its functionality to ensure a modular and scalable implementation. In the following sections, we’ll be covering the changes to SIFSearch, beginning with its new design, followed by a discussion of this application’s key features.

SIFSearch Implementation Changes

In the first SIFSearch blog post, we mentioned the critical role that the Django framework played in this application, serving as the home for both the frontend and backend code. Given that it follows the Model-View-Template (MVT) pattern, Django provided us with the ability to define SIFSearch’s data, design a backend API, and render HTML templates to be displayed on the client-side. These capabilities were useful in our initial development of SIFSearch, but we did run into limitations, which we’ll be discussing in the following sections.

Frontend Changes

Earlier, we were using Django’s templating features to create SIFSearch’s user interface (UI). Given that the front-end wasn’t too complex at the time, this was a solid choice. Aside from their ease of use, Django templates let us embed backend variables and simple template logic directly in HTML, so back-end data reached the front-end immediately. However, the limitations of Django templates became apparent as SIFSearch’s user interface became more interactive. Adding front-end features grew tedious, as templates offered little support for reusable UI components or for a Single Page Application (SPA) experience. The application also lacked modularity, since the HTML templates directly embedded variables from the backend API, which worked against a decoupled design. To resolve these issues, we pivoted to an alternative solution for the front-end: React.

Aside from the initial setup, redesigning SIFSearch’s UI through React was a great choice. From a developer perspective, writing and maintaining the front-end code became easier. In addition, the clear separation of the front-end and back-end allowed us to operate on each part of the application independently. Using React, we developed multiple UI components, including Login, Registration, and Upload. For components related to searching, such as a search bar, pagination, and a refinement list, we utilized InstantSearch.js, a library provided by Algolia.

Here’s a look at the revised frontend:

Backend Changes

We also made adjustments to the Django backend. One notable change was a shift from SQLite to PostgreSQL for the database system. By default, a Django web application is configured with a lightweight SQLite database, which is well suited to read-heavy workloads and low-traffic websites. While SIFSearch needs to write to the database when users upload new search entries, the number of writes is minimal compared to the read/search operations, so write performance was not what pushed us away from SQLite.

The limitations of SQLite arise when dealing with data representation. The core data of SIFSearch is the search entries a user uploads. Internally, a search entry is represented by the SearchEntry class, which includes fields such as name, description, link, file, and tag. SQLite supports basic types like strings, so the initial representation of a search entry was suitable. However, when we were designing a feature to attach multiple tags to a user-uploaded search entry, we learned that SQLite has no native support for complex types, such as lists. Because we wanted this to be an important feature of our application, we decided to shift to PostgreSQL for its support of complex data types. PostgreSQL offers other benefits as well, including scalability and concurrency control, but these weren’t as important for our use case.
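To make the limitation concrete, here is a minimal sketch using Python’s built-in sqlite3 module rather than Django’s ORM. The search_entry table, its columns, and the helper names are hypothetical and much simpler than the real SearchEntry model. It shows that a Python list cannot be stored directly, and that one common workaround is to serialize the list to a JSON string.

```python
import json
import sqlite3

# In-memory database with a simplified, hypothetical search_entry table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_entry (name TEXT, tags TEXT)")


def insert_raw_list(name, tags):
    """Try to store a Python list directly; return True only if SQLite accepts it."""
    try:
        conn.execute("INSERT INTO search_entry VALUES (?, ?)", (name, tags))
        return True
    except sqlite3.Error:
        # The sqlite3 module cannot bind a list as a parameter value.
        return False


def insert_json_tags(name, tags):
    """Workaround: serialize the list to a JSON string before storing it."""
    conn.execute(
        "INSERT INTO search_entry VALUES (?, ?)", (name, json.dumps(tags))
    )


def load_tags(name):
    """Read the JSON string back and decode it into a list."""
    row = conn.execute(
        "SELECT tags FROM search_entry WHERE name = ?", (name,)
    ).fetchone()
    return json.loads(row[0])


accepted = insert_raw_list("Momentum study", ["equities", "momentum"])
insert_json_tags("Momentum study", ["equities", "momentum"])
tags = load_tags("Momentum study")
```

The JSON workaround functions, but every read and write has to encode or decode the column, and the database cannot query individual tags natively. On PostgreSQL, Django offers ArrayField (in django.contrib.postgres.fields) as one way to model such a list column directly.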

Another change to our Django backend was the creation of a User model, along with methods for registering new Users and letting them log in to and log out of the application. These changes were necessary because we wanted to add an authentication system to SIFSearch, which is covered in more detail in the New Features section.

Implementation Diagram

Here’s a diagram that provides a general overview of SIFSearch's implementation, with the new changes mentioned above.

The “URLS” component in the diagram above represents the path to a resource in the application. In Django, incoming requests are routed through a urls.py file, which maps each request path to a handler function in a views.py file (represented by the “VIEWS” component above). We did not implement the search functionality ourselves. Instead, we are leveraging the Algolia API to perform search operations by reading from the PostgreSQL database.
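The path-to-handler mapping described above can be pictured with a toy dispatcher in plain Python. This is only an illustration of the idea, not Django’s actual URL resolver. In a real project, urls.py declares a urlpatterns list of path() entries. The view names, paths, and response dictionaries here are all hypothetical.

```python
# Hypothetical view functions, standing in for handlers in views.py.
def login_view(request):
    return {"status": 200, "body": "login"}


def upload_view(request):
    return {"status": 200, "body": "upload"}


# Hypothetical route table, standing in for the urlpatterns list in urls.py.
URL_PATTERNS = {
    "/login/": login_view,
    "/upload/": upload_view,
}


def dispatch(path, request=None):
    """Look up the view registered for a path and call it; 404 if none matches."""
    view = URL_PATTERNS.get(path)
    if view is None:
        return {"status": 404, "body": "not found"}
    return view(request)


login_response = dispatch("/login/")
missing_response = dispatch("/search/")
```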

New Features

Login, Registration, & Authentication System

Having a login system increases the security of an application and provides each user with a customized experience. For these reasons, it was essential for us to create a login system for SIFSearch.

As mentioned in the previous section, the creation of a login system required us to define a new User model for the application. A User consists of the username, email, and password fields, all of which are represented as strings. Shown below is the class definition:

One important aspect of the code above is the extension of the AbstractUser class. Doing this allows us to create a custom User model specific to the Django application.

We also needed to write methods for handling operations performed on Users, such as logging in, registering, and logging out. These were written in a views.py file, and each one handles POST requests.

For registering new users, it was especially important that the passwords were hashed. We accomplished this by implementing a custom UserSerializer class, which is shown below:

The create() method above overrides the default create() method of Django REST Framework’s serializers.ModelSerializer class. In this implementation of create(), we extract the password from the rest of the user data and invoke Django’s set_password() method to take care of hashing. We’ve also made the password a write_only field, as shown in the Meta class, which contains metadata about the UserSerializer. This choice acts as a security measure, preventing the User’s password, which is sensitive information, from being part of the serialized output. In other words, the password is only accepted as input and is never returned in a response.
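The two ideas in this paragraph, hashing the password on creation and never returning it, can be sketched with the standard library alone. This is not the UserSerializer itself, and it does not use Django or Django REST Framework. The function names, the dictionary layout, and the iteration count are assumptions made for illustration. It uses salted PBKDF2 with SHA-256, the same family of algorithm as Django’s default password hasher.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not Django's actual setting


def hash_password(password, salt=None):
    """Return (salt, digest) for a password using salted PBKDF2-HMAC-SHA256."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def check_password(password, salt, digest):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, digest)


def create_user(data):
    """Pop the raw password out of the input and store only its hash."""
    data = dict(data)  # copy so the caller's dictionary is not modified
    salt, digest = hash_password(data.pop("password"))
    return {**data, "salt": salt, "password_hash": digest}


def serialize_user(user):
    """Write-only behaviour: password material never appears in the output."""
    return {"username": user["username"], "email": user["email"]}


user = create_user(
    {"username": "analyst1", "email": "analyst1@example.com", "password": "s3cret!"}
)
public = serialize_user(user)
```

The raw password exists only inside create_user(). What gets stored is the salt and digest, and what gets returned to the client is the username and email.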

On the front-end side, here are screenshots of the registration and login components:

Parsing Jupyter Notebooks

This feature was implemented by Abhinav Modugula, a previous SIF member who worked on SIFSearch last semester. Given that much of SIF’s research is written in Jupyter notebooks, Abhinav wanted to provide a way for users to search for code blocks within .ipynb files.

To implement this functionality, he made use of the nbformat package, which provides a set of APIs to work with Jupyter notebooks. Shown below is his code for extracting the code blocks from an uploaded .ipynb file:

Abhinav’s new feature required us to add a field to the SearchEntry model. A SearchEntry now includes a codeblock property (represented as a CharField). If a user doesn’t upload a Jupyter notebook, the codeblock field is left as an empty string. Otherwise, it contains the exact text of the code block.
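As a rough illustration of the extraction step, the sketch below pulls the code cells out of a notebook using only the standard library. It is not Abhinav’s implementation, which relies on nbformat. Instead, it reads the notebook’s JSON directly and assumes the version 4 layout, with a top-level "cells" list. The function name, the sample notebook, and the choice to join the cells into one string for the codeblock field are assumptions.

```python
import json


def extract_code_blocks(notebook_json):
    """Return the source text of every code cell in a version-4 notebook."""
    notebook = json.loads(notebook_json)
    blocks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # On disk, "source" may be a single string or a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        blocks.append(source)
    return blocks


# A tiny hand-written notebook used only for illustration.
sample_notebook = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
        {
            "cell_type": "code",
            "metadata": {},
            "outputs": [],
            "execution_count": None,
            "source": ["import math\n", "print(math.pi)"],
        },
    ],
})

code_blocks = extract_code_blocks(sample_notebook)
# Hypothetical: one string suitable for a single text field.
codeblock_field = "\n\n".join(code_blocks)
```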

While the code block feature is convenient, it does present us with challenges to address. For one, potential storage issues could arise if the uploaded Jupyter notebooks contain a large number of code blocks. In addition, presenting the code blocks on the front-end is also challenging, as we’d want to present the code in the same way a text editor or IDE does rather than as a simple line of text. These are areas to consider as we continue our development of SIFSearch.

Next Steps

As we head into next semester, our goal is to have SIFSearch hosted for production. At the same time, we’d also like to develop new features and improve on existing ones. For one, we’d like to refine our authentication system, giving users the ability to log in to SIFSearch with their social accounts (Google, Facebook, etc.) and to change and reset their passwords. Additionally, we are working to add JWT (JSON Web Token) authentication to this application in order to securely transmit information between the client side and the backend. Another feature we’re working on is searching with multiple tags. This requires changes to the SearchEntry model and additional configuration for the Algolia API, which we’ll need to look into further.

In all, this past Fall semester has been a great learning experience for us. We’ve deepened our understanding of Django and have worked on integrating technologies such as React and PostgreSQL into our application. We look forward to making more progress next semester and to seeing SIFSearch used by the team.
