Database Smart Web Tool (DABS) is a web tool that can compare, visualize and recommend various database management systems (DBMS) according to user (e.g. a developer) preferences and specific usages. The tool supports filtering databases by name, supported languages, implementation languages, operating systems, license types and project types.
It’s a smart platform that provides an interactive way to find the most suitable database solutions for any project development. DABS can facilitate, in an easy and accessible way, the process through which the users can discover information regarding databases. The user-friendly web interface offers a good experience for developers, but also for non-technical people.
No. | Description |
---|---|
NFR. 1 | Scalability - The system shall be able to handle a large volume of databases that keeps increasing. |
NFR. 2 | Security - All data inside the system or its part shall be protected against malware attacks or unauthorized access. |
NFR. 3 | Availability - The system shall be accessible for users 100% of the time. |
NFR. 4 | Performance - The system shall be able to determine how fast an application responds to commands (Speed and Response Time). When a user searches for a certain database or uses certain filters, the results must load within 0.2 seconds. When the "Compare" button is pressed, the result screen (pop-up) must load within 0.1 seconds. |
NFR. 5 | Maintainability - The system must be capable of being maintained cost-effectively over its expected lifetime. |
NFR. 6 | Usability & Accessibility - The system should be able to be used by as many people as possible. |
NFR. 7 | Capacity - The system should be to deliver sufficient functionality required by the end users. |
No. | Description |
---|---|
Start page: filters & recommendations | |
FR. 1 | The user should be able to enter data in the search bar. |
FR. 2 | The user should be able to search for a certain database by name. |
FR. 3 | The user should be able to filter the search based on the desired database characteristics. |
FR. 4 | The database filters must display information about: specific usages, paradigms, query languages, platforms and running environments and specific contexts, country of origin, open source status and implementation language. |
FR. 5 | In the Recommendation area, the system must display the most popular databases used or searched. |
Search results, database comparison | |
FR. 6 | The user should be able to select a maximum of 3 databases. |
FR. 7 | The user should be able to press the "Compare" button, for comparing the databases selected to the previous step. |
FR. 8 | A pop-up will be opened, showing a table with the comparison between the databases (at least 2). |
FR. 9 | The differences between the databases’ features will be coloured red. |
View databases | |
FR. 10 | The user is able to press on a specific database name from the table. |
FR. 10 | The system will automatically highlight the database name (with the "bold" option) when the user is pressing on it. |
FR. 10 | The system will display a pop-up with information regarding the selected database (a table, a text and a website reference). |
FR. 10 | The user is able to press on the website reference (ex: "Wikipedia") and will be taken to another website to read more details about the selected database. |
The design of the application was built using Miro[8], an online whiteboard for team collaboration, and the latest version can be accessed here (beware that the actual implementation might differ from designs).
A brief demo of the application can be seen in this video.
The main key of the Start Page will be the search bar and the filters that help the users choose the characteristics they are looking for.
Besides the name input, the user can select information about: implementation programming language
(C
, C++
,
etc.), programming language supported (Java
, Python
,
etc.), operating system (Linux
, Windows
,
etc.), project type (Commercial
, Open Source
,
etc.), and license (Apache v2
, BSD
,
etc.).
On the Most popular section of the Start Page will be displayed four of the most searched or used databases. Each one will be located in a different card, with some general information, so that the people that are not sure what is more suitable for their project will have the recommendation of what they can use.
Implementation outcome:
The user can either search after a keyword or choose some features from the dropdowns and toggles presented below the search bar:
After the user presses the Search button, a table will be shown with the results regarding the database name, programming languages used for implementation, programming languages supported, licenses, operating systems and project types.
Implementation outcome:
The user has the possibility to select a maximum of 3 databases for comparing them, by pressing the "Compare" button, after which a pop-up appears. In this pop-up, the selected databases are presented in a table format with differences between their features coloured with red.
This feature is post-MVP.
The user is able to press on a database name and a pop-up will be shown with the details of the selected database. In this pop-up, there’s also a table composed of 2 columns (the first column indicates the feature name and the second column is the particular feature description). In the right part of the pop-up is a text with some details about the database selected by the user and at the bottom of the pop-up there's a reference to various pages providing more information about the database (Homepage, Wikipedia, Source code - depending on what data we have available in the backend).
Implementation outcome:
The relational database management system chosen is PostgreSQL[1], also known as Postgres, due to our familiarity with it over the years. It is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance.
PostgreSQL features transactions with Atomicity, Consistency, Isolation, Durability (ACID) properties, automatically updatable views, materialized views, triggers, foreign keys, and stored procedures. It is designed to handle a range of workloads, from single machines to data warehouses or Web services with many concurrent users. It is the default database for macOS Server and is also available for Windows, Linux, FreeBSD, and OpenBSD.
The dataset was provided directly by Associate Professor of Databaseology in the Computer Science Department at Carnegie Mellon University, Andy Pavlo, who is one of the main maintainers of dbdb.io, the Database of Databases. Permission to use the dataset for this project has been granted.
The data was sent as a Django database dump archive, which contained a 257mb one-line JSON
file.
The format in which the data was received was not fit to be used by our application, so further operations were necessary in order to clean and restructure it.
Initially, we tried to format the one-line JSON
file to human-friendly JSON
formatting by using jq.
That didn’t really make much of a difference since the resulting JSON
[3]
file was still quite big,
almost impossible to open with the usual tools (Notepad
, Notepad++
, Visual
Studio
and even vi
).
Further, it was clear that the JSON
format was not proper for us to understand the data
structure and
content. The aim is to import the data into a database so it’s easy to query by any backend. Databases go
well with CSV
imports, so we converted the JSON
file into a CSV
file
using a simple Python
[4] script:
The resulting CSV
file contained about 1 million rows and 53 columns, out of which many turned
out to be
redundant, missing dependencies and/or not useful to us (we removed about 900.000 rows which were nginx logs
from the actual dbdb.io website). Importing the CSV
file to a PostgreSQL database was less
straight forward
than expected, but, using pgfutter and pgAdmin, it finally resulted in a new table.
The table structure after importing the CSV
consisted of a table with columns such as:
Working with a single table with 100.000 rows and 53 columns can be cumbersome, so, after some data exploration in the remaining dataset, we worked out a possible improved data structure:
This new structure, however, needed to be created from scratch. The tables and relationships were created directly using pgAdmin UI, but the data from the initial table was copied to the new tables using a PHP script [5] which can be found in our GitLab repository here
In UML, use-case diagrams [7] model the behavior of a system and help to capture the requirements of the system. Use-case diagrams describe the high-level functions and scope of a system. These diagrams also identify the interactions between the system and its actors (the users).
For our Data Smart Web Tool, we identified 3 use cases:
The steps regarding the process map with BPMN[2] based on the Database Smart Web Tool are:
The process map drawn with BPMN looks like this:
The canvas for a map drawn with BPMN consists of lanes and pools:
Special thanks goes to Andy Pavlo, Associate Professor of Databaseology in the Computer Science Department at Carnegie Mellon University, who promptly sent us the dataset of dbdb.io [6].