Gus Vieweg's Tech Blog

Rolling Stone Simplified | How I Scratched My Own Itch

Gus Vieweg — Wed, 28 Sep 2022 15:18:29 GMT

How I Scratched My Own Itch is a series wherein I recount a project I wrote for myself. I discuss the problem, solution, and lessons learned in building it.

Context

My girlfriend, Addie, and I have been on a cross-country road trip throughout 2022. To keep us entertained and to broaden our horizons, we thought it would be fun to listen to Rolling Stone's "500 Greatest Albums of All Time" list.

Boy, were we right! Lots of great, new music. My favorites so far:

If You're Feeling Sinister by Belle and Sebastian at #481
Journey in Satchidananda by Alice Coltrane at #446
Cosmo's Factory by Creedence Clearwater Revival at #413
Beauty and the Beat by The Go-Go's at #400

Note: As of this publish date, we have only gotten through the bottom 114 albums.

The Problem

It was frustrating to discover which albums were up next. Each of the entries on the Rolling Stone site has a blurb justifying each album's ranking by detailing its historical and musical significance. Addie and I love reading the blurbs.

Rolling Stone's site is a long list sectioned into 50 albums per advertisement-riddled page. On the rural roads of Wyoming over spotty data networks, it could take well over four minutes to load the page, let alone scroll to the album we were on.

We knew there was a better way.

Solution

The solution we devised had two parts:

Part 1. Scrape the Rolling Stone site.

We could distill the Rolling Stone pages down to only the information we cared about.

Fortunately, listicles such as the Top 500 Albums list are rooted in some sort of repeatable format within a content management system. This allowed us to select the album elements by class name, extract the information we needed, and save it to CSV.
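The real import script is JavaScript (import.js, linked below). As a rough Python sketch of the same idea, using only the standard library, the snippet below pulls text out of elements by class name and writes it as CSV. The class names (album-entry, album-rank, album-title, album-blurb) and the sample HTML are made up for illustration; the actual Rolling Stone markup will differ. It also only captures the first chunk of text inside each matched element, so nested tags would need more care.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical class names; the real Rolling Stone markup will differ.
ENTRY_CLASS = "album-entry"
FIELD_CLASSES = {"album-rank": "rank", "album-title": "title", "album-blurb": "blurb"}


class AlbumParser(HTMLParser):
    """Collect the text of elements whose class appears in FIELD_CLASSES."""

    def __init__(self):
        super().__init__()
        self.albums = []    # one dict per album, in page order
        self._field = None  # field waiting for its text, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls == ENTRY_CLASS:
                self.albums.append({})  # start a new album record
            elif cls in FIELD_CLASSES and self.albums:
                self._field = FIELD_CLASSES[cls]

    def handle_data(self, data):
        # Store the first non-blank text that follows a matched tag.
        if self._field and data.strip():
            self.albums[-1][self._field] = data.strip()
            self._field = None


def albums_to_csv(html):
    """Parse album entries out of `html` and return them as CSV text."""
    parser = AlbumParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["rank", "title", "blurb"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(parser.albums)
    return out.getvalue()


# Made-up sample markup, not copied from the Rolling Stone site.
SAMPLE = """
<div class="album-entry">
  <span class="album-rank">481</span>
  <h2 class="album-title">If You're Feeling Sinister</h2>
  <p class="album-blurb">Placeholder blurb text.</p>
</div>
"""

print(albums_to_csv(SAMPLE))
```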

The import script we created can be found here: rs500-simplified/import.js

Part 2. Serve this scraped data in a simple web format.

Now that we had the data we cared about, we could build a simple, ad-free site.

Part 2a. Setting up the data

We scraped the pages we needed and saved them into a Google Sheet. This would serve as our backend - a purely static, uneditable sheet. Since we were not updating any cells, this was a perfect solution. No need for a "true" database - we are only reading/selecting information.

With this setup, we could use Python's googleapiclient package to query this Google Sheet.
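A minimal sketch of what such a query might look like. The sheet layout here is an assumption: one header row, then one album per row ordered by rank, with four columns in a tab named Sheet1. The environment variable names in the setup comment are also hypothetical. The service object is passed in so the lookup logic can be tried without network access.

```python
def row_range(rank, sheet_name="Sheet1", last_column="D"):
    """A1 range for the row holding `rank`, assuming row 1 is a header row."""
    row = rank + 1
    return f"{sheet_name}!A{row}:{last_column}{row}"


def fetch_album_row(service, spreadsheet_id, rank):
    """Return the cell values for one album, or None if the row is empty."""
    result = (
        service.spreadsheets()
        .values()
        .get(spreadsheetId=spreadsheet_id, range=row_range(rank))
        .execute()
    )
    rows = result.get("values", [])
    return rows[0] if rows else None


# Typical setup (needs google-api-python-client and an API key or credentials;
# the environment variable names are placeholders):
#   import os
#   from googleapiclient.discovery import build
#   service = build("sheets", "v4", developerKey=os.environ["GOOGLE_API_KEY"])
#   fetch_album_row(service, os.environ["SPREADSHEET_ID"], 481)
```

Because the sheet is read-only for this use case, a single values().get call per page view is all the "database layer" amounts to.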

Part 2b. Building the website

We wanted to get this deployed as soon as possible, as we started working on this a few days before our drive from Sun Prairie, Wisconsin to Fayetteville, Arkansas (11 hours!).

To do so, we used Python and Flask to create a simple website. When someone requests the /album/ endpoint with a rank (for example, /album/481), we pull in the Google Sheet row corresponding to that rank, then render a template from the data in that row.
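The real server lives in app.py (linked below). What follows is only a sketch of the shape such a route could take, not the actual code. The route pattern /album/<int:rank> is inferred from the example URL /album/481, the column order (rank, title, artist, blurb) and the inline template are assumptions, and lookup_row is a hypothetical callable standing in for the Google Sheet query. The "next album" link assumes the list is being worked through from #500 toward #1.

```python
def album_context(row):
    """Map a sheet row (list of strings) to template variables.

    Assumes the columns are rank, title, artist, blurb, in that order.
    Missing trailing cells are treated as empty strings.
    """
    rank, title, artist, blurb = (list(row) + [""] * 4)[:4]
    return {
        "rank": int(rank),
        "title": title,
        "artist": artist,
        "blurb": blurb,
        "next_rank": int(rank) - 1,  # counting down toward #1
    }


# Inline template to keep the sketch in one file.
PAGE = """<h1>#{{ rank }}: {{ title }} by {{ artist }}</h1>
<p>{{ blurb }}</p>
{% if next_rank >= 1 %}<a href="/album/{{ next_rank }}">Next album</a>{% endif %}"""


def create_app(lookup_row):
    """Build the Flask app; `lookup_row(rank)` returns a sheet row or None."""
    # Imported here so the helper above can be used without Flask installed.
    from flask import Flask, abort, render_template_string

    app = Flask(__name__)

    @app.route("/album/<int:rank>")
    def album(rank):
        row = lookup_row(rank)
        if row is None:
            abort(404)  # no such rank in the sheet
        return render_template_string(PAGE, **album_context(row))

    return app
```

Keeping the row-to-template mapping in its own function means the only Flask-specific code is the route itself, which fits the "just spit out routes" appeal of Flask.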

It's a very simple site, but that's the point! No ads, just info!

The Flask server we made can be found here: rs500-simplified/app.py

Part 2c. Deploying the website

I've had some experience deploying sites on PythonAnywhere (now that Heroku is paid only...), which is why we chose Flask. We didn't need the architecture that Django provides.

We needed simple, fast, and efficient.

So we:

Got the googleapiclient all set up on a PythonAnywhere server
Created a .env file with our environment variables
Edited the WSGI configuration according to PythonAnywhere's guidelines
And hit deploy!

And bada-bing-bada-boom, we were greeted with this simple page when we accessed album #481:

Rolling Stone... simplified!

It loads SO much faster than the Rolling Stone site but still has all the information we want! And, we can load the next album with ease.

Check out the site here: http://awvieweg.pythonanywhere.com/album/481

Lessons Learned

1. If you have the luxury to do so, avoid perfection.

This application was a great example of the 80/20 rule and diminishing returns.

There were so many opportunities to do things "right" and make our application bulletproof, but we ignored them for the sake of speed. We don't need a homepage. We don't need a foolproof web scraping solution (see Lessons Learned #2). We don't need any fancy templating.

We built this for ourselves and the minimum viable product was good enough.

2. A little manual work can go a long way.

Initially, when we were trying to build the web scraper, we wanted to execute the script once and have it pull in every entry without fail. However, the Rolling Stone site is constructed in a way where the URL doesn't change and the page loads are unpredictable. We tried an approach whereby we attempted to accommodate these hiccups but ran into issue after issue.

Eventually, we realized we could go one "loaded" page at a time. We may have to execute a simpler script ten times, but the overall time savings would be massive.

3. Use the right technology for the job.

Even though this was my first time with Flask, I had a feeling it would work perfectly. You can just spit out routes without having to set up the massive infrastructure I was used to from Django.

A Google Sheet backend was the right choice because we were only reading data. We didn't need anything more complex than that. Addie and I were able to work on the data at the same time in a very clear and visible manner.

Thanks for reading!

If you want to check out the whole repository, click here.
If you want to see the site, click here (you may have to refresh if it breaks on you).

How to Install the Progress OpenEdge 12.0 ODBC Driver on a CentOS Server

Gus Vieweg — Fri, 23 Sep 2022 16:04:50 GMT

I’m writing this because I could not find a succinct guide on how to install the Progress OpenEdge 12.0 ODBC Driver on a CentOS server.

Context

To connect to our Progress-based database, I was previously using a .jar file in a spoofed Java Runtime Environment. Python packages used included jaydebeapi and jPype1. The configuration was confusing and isolated other developers on the team. Once Oracle started auditing our company to track Java usage based on IP address, we determined it was time to evaluate other methodologies of connecting to our Progress database. Installing a driver on our CentOS server was our best bet.

I imagine at least one other developer somewhere in the world will appreciate this guide. If you are that developer, hello and welcome. I hope this serves you well.

Let’s begin!

1. OpenEdge Configuration

I was fortunate enough to stumble upon this blog post on installing OpenEdge SQL Client Access on an Ubuntu machine.

Follow the above blog post until you reach the section “Confirm correctness of OpenEdge ODBC drivers.”

The blog will instruct you to confirm success by using the command:

# in your terminal
cd /usr/dlc/odbc/lib/
ldd pgoe27.so

The output should match the one shown in the blog post: libpgicu27.so will be reported as "not found".

The blog post’s solution is for Ubuntu machines. To configure libpgicu27.so in CentOS, run the following from the directory you are in (/usr/dlc/odbc/lib):

# in your terminal
sudo cp libpgicu27.so /lib64
ldd pgoe27.so

The libpgicu27.so file should now be linked to the copy you made above. Nice!
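If you would rather check this from a script than by eye, here is a small sketch that runs ldd and lists any dependency still reported as "not found". The sample text is illustrative ldd-style output written for this example, not captured from a real machine, and check_driver assumes the driver path used in this guide.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_driver(path="/usr/dlc/odbc/lib/pgoe27.so"):
    """Run ldd on the driver and return any unresolved dependencies."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return missing_libraries(result.stdout)


# Illustrative ldd-style output, written by hand for this example.
SAMPLE = """\
\tlibpgicu27.so => not found
\tlibc.so.6 => /lib64/libc.so.6 (0x0)
"""
print(missing_libraries(SAMPLE))  # ['libpgicu27.so']
```

An empty list from check_driver() would mean every dependency of pgoe27.so resolves.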

2. `~/.bashrc` Configuration

Now that we’ve successfully installed the driver and all files point to where they are supposed to, we need to make sure pyodbc and other ODBC libraries can find it.

I have distilled what I’ve learned from Progress’s documentation into the following steps:

We need to export some environment variables from our ~/.bashrc file. Add the following to your ~/.bashrc or similar configuration file.

# ~/.bashrc
export ODBCINI="/etc/odbc.ini"
export ODBCINST="/etc/odbcinst.ini"

This next one is a bit different, as it is appending rather than declaring.

Run:

# in your terminal
echo $LD_LIBRARY_PATH

Take its output (in my case, it was /usr/local/lib) and append :/usr/dlc/odbc:/usr/dlc/odbc/lib. Save that string as LD_LIBRARY_PATH in your ~/.bashrc file:

# ~/.bashrc
export LD_LIBRARY_PATH="/usr/local/lib:/usr/dlc/odbc:/usr/dlc/odbc/lib"

Save your ~/.bashrc file and, from your terminal, run:

# in your terminal
source ~/.bashrc

Your changes to ~/.bashrc are now applied. Nice!
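The append step for LD_LIBRARY_PATH is easy to get wrong by hand (a doubled colon, a duplicated directory). As a sketch, the helper below builds the export line from whatever the variable currently holds. The two driver directories are the ones used in this guide; everything else is just string handling.

```python
import os

# The two directories this guide appends to LD_LIBRARY_PATH.
DRIVER_DIRS = ["/usr/dlc/odbc", "/usr/dlc/odbc/lib"]


def extend_library_path(current, extra=DRIVER_DIRS):
    """Append `extra` directories to a colon-separated path, skipping duplicates."""
    parts = [p for p in current.split(":") if p]  # drop empty segments
    for directory in extra:
        if directory not in parts:
            parts.append(directory)
    return ":".join(parts)


# Print the line to paste into ~/.bashrc, based on the current environment.
value = extend_library_path(os.environ.get("LD_LIBRARY_PATH", ""))
print(f'export LD_LIBRARY_PATH="{value}"')
```

With a current value of /usr/local/lib, this yields the same string shown in the ~/.bashrc snippet above.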

3. ODBC File Configuration

ODBC File Configuration depends on whether you create a Data Source Name (DSN) or use an instanced ODBC configuration for your drivers.

To be more efficient (and because I couldn’t understand Progress’s instanced ODBC docs), we can use a Data Source Name (DSN) in our /etc/odbc.ini to hold some of these variables for us.

3a. Data Source Name Configuration

Open your /etc/odbc.ini file and append the below configuration. Note that you will need to supply variables where I have put YOUR_* placeholders.

# /etc/odbc.ini
[ODBC Data Sources]
ProgressOpenEdge=DataDirect 7.1 Progress OpenEdge Wire Protocol

[ProgressOpenEdge]
Driver=/usr/dlc/odbc/lib/pgoe27.so
Description=DataDirect 7.1 Progress OpenEdge Wire Protocol
AllowedOpenSSLVersions=1.1.1,1.0.2
AlternateServers=
ArraySize=
ConnectionRetryCount=0
ConnectionRetryDelay=3
CryptoLibName=
CryptoProtocolVersion=TLSv1.2, TLSv1.1, TLSv1,SSLv3
Database=YOUR_DATABASE
DefaultIsolationLevel=1
EnableTimestampWithTimezone=1
Encryption Method=0
FailoverGranularity=0
FailoverMode=0
FailoverPreconnect=0
HostName=YOUR_HOSTNAME
HostNameInCertificate=
KeepAlive=0
LoadBalancing=0
LoginTimeout=15
PortNumber=YOUR_PORT_NUMBER
QueryTimeout=0
SSLLibName=
TrustStore=
TrustStorePassword=
UseWideCharacterTypes=0
ValidateServerCertificate=1

[ODBC File DSN]
DefaultDSNDir=

[ODBC]
IANAAppCodePage=4
InstallDir=/usr/dlc/odbc/
Trace=0
TraceFile=odbctrace.out
TraceDll=/usr/dlc/odbc/lib/pgoe27.so
ODBCTraceMaxFileSize=102400
ODBCTraceMaxNumFiles=10

The fields Database, HostName, and PortNumber will be based on your Progress database’s configuration. You may also need to adjust other variables depending on other database factors. The above is what worked for me.

Here are the fields I amended:

Under [ODBC Data Sources]
Added ProgressOpenEdge=DataDirect 7.1 Progress OpenEdge Wire Protocol based on Progress’s example odbc.ini.
Under [ProgressOpenEdge]
Amended Driver to /usr/dlc/odbc/lib/pgoe27.so (our installation file from Step 1).
Amended the fields Database, HostName, and PortNumber to placeholders you will change.
Under [ODBC]
Amended InstallDir to /usr/dlc/odbc/ (our parent installation directory from Step 1; see Progress’s “Configuring Through The System Information (odbc.ini) File”).
Amended TraceDll to /usr/dlc/odbc/lib/pgoe27.so (our installation file from Step 1).
NOTE. I lucked out on IANAAppCodePage=4 (the default). There’s an IANAAppCodePage mapping table on Progress’s site, but I couldn’t easily deduce the code page of my database. Fortunately, my friends at Zedfox included the code page of their Progress database in their “Testing OpenEdge ODBC connection” section, on line 12 of their Test.php script. Thanks fellas!
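One easy mistake is leaving a YOUR_* placeholder in the file. As a quick sanity check, here is a sketch that parses odbc.ini-style text with Python's configparser and lists the keys that still hold placeholders. The sample is a trimmed-down version of the configuration above, and db.example.com is a made-up hostname.

```python
import configparser


def unfilled_placeholders(ini_text, section="ProgressOpenEdge"):
    """Return the keys in `section` whose values still start with 'YOUR_'."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case (HostName, not hostname)
    parser.read_string(ini_text)
    return [key for key, value in parser[section].items() if value.startswith("YOUR_")]


# Trimmed-down sample; db.example.com is a made-up hostname.
SAMPLE = """\
[ODBC Data Sources]
ProgressOpenEdge=DataDirect 7.1 Progress OpenEdge Wire Protocol

[ProgressOpenEdge]
Driver=/usr/dlc/odbc/lib/pgoe27.so
Database=YOUR_DATABASE
HostName=db.example.com
PortNumber=YOUR_PORT_NUMBER
"""
print(unfilled_placeholders(SAMPLE))  # ['Database', 'PortNumber']
```

To run it against the real file, pass in the contents of /etc/odbc.ini, for example open("/etc/odbc.ini").read().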

You’ve successfully altered the Data Source Name (DSN) configuration. Nice! Continue to Step 4.

4. Testing the Connection

Navigate to your main Python directory and boot up a Python shell. For me, I use:

# in your terminal
pipenv run python

Run the following Python commands in that shell:

# in a python shell
import os
os.getenv("ODBCINI")         # If you don't get /etc/odbc.ini, go back to Step 2
os.getenv("ODBCINST")        # If you don't get /etc/odbcinst.ini, go back to Step 2
os.getenv("LD_LIBRARY_PATH") # If it doesn't end with :/usr/dlc/odbc:/usr/dlc/odbc/lib, go back to Step 2

import pyodbc
pyodbc.dataSources()
# Expected output: {'ProgressOpenEdge': '/usr/dlc/odbc/lib/pgoe27.so', ...}
# If you don't see 'ProgressOpenEdge' mapping to
# '/usr/dlc/odbc/lib/pgoe27.so', return to Step 3a and double-check your configuration settings.

pyodbc.connect("DSN=ProgressOpenEdge;uid=YOUR_USERNAME;pwd=YOUR_PASSWORD")

You should be greeted with a pyodbc.Connection object. If you are, you have successfully installed the Progress OpenEdge 12.0 ODBC Driver on CentOS! Nice!
