Small Scale SEO Log File Analysis Part 1 – ELK Stack On Intel NUC MicroPC


This is a 2-part blog post on running ELK Stack on an Intel NUC microPC, on a LAN (I’ll soon post a tactic I use to overcome dynamic IP addressing, as despite having a 300Mb FTTH connection, BT won’t give me a static IP!).

Part One (you’re already reading it… welcome!): Covers why the feck I want to do this, and how I installed ELK Stack on the NUC, running Ubuntu. Those who aren’t interested in the set-up can skip to the 2nd blog post…

Part Two: Covers actually using ELK Stack on the NUC to perform logfile analysis for SEO.

Note: If you’re an expert at either ELK Stack or Log File analysis, this post (and part 2) may be a bit basic for you.

Why Do SEOs Analyse Apache Access Logfiles?

With so many excellent analytics tools out there, it may seem to some that it’s a bit odd (or geeky?!) to obsess over log files. Let’s face it, to most people, this doesn’t look that sexy!

However, whilst analytics tools tend to offer a great UI, a shorter learning curve, and easy drill-down report creation, server log files (in the case of this post, Apache access logs) also have a lot going for them:

  • Logged server-side
  • Offers a great way to track bot (e.g. GoogleBot) crawls
  • Can be useful for spotting security issues
  • Identify 404 errors on pages that may not have GA or other code added

What Is ELK Stack? & Why Run On An Intel NUC?

Introducing ELK

ELK Stack (which is being renamed Elastic Stack) is a combination of awesome tools that work together to help you ingest, filter and visualise log files (and not just Apache logs, either). ELK stands for Elasticsearch, Logstash, and Kibana. All three are open source, making this a great option for a fun project, but the tools are also professional enough to handle large-scale log file analysis.

Why Install ELK Stack On An Intel NUC?

Okay, partly this was a bit of a fun project, but NUCs can handle a decent amount of work. They’re a whole different ball game from something like a Raspberry Pi, yet still pretty compact and not too expensive. The specs of the NUC I’m using, for example:

  • Intel i7 Dual Core CPU
  • 8GB DDR3 RAM
  • 128GB M.2 SSD (pretty fast: 2150MB/s read, 1550MB/s write)

Now, if you want to run ELK properly, it’s not THAT expensive, but the cost isn’t insignificant:

(Screenshot: Elastic Stack pricing)

That said, a hosted ELK Stack cluster with Elastic like the above will of course be much more optimised, professional, and robust than my NUC experiment. Still, in my mind, not as much fun! Installing ELK from scratch in the Linux terminal and then editing the config files manually is also a great way to get comfortable with each of the component parts of the ELK Stack, what you can expect from them, and what issues you may run into.

Installing ELK Stack on Ubuntu

I’ll try to keep this brief, but feel free to skip to the 2nd blog post if you’re just here for the log analysis aspect. I won’t be offended, honest. Not a bit.

Skip To Logfile Analysis

For Those Not Skipping This Section, Some Housekeeping Before We Get Our Geek-On!

I used a few different sources for install instructions for ELK on Ubuntu, but none quite hit the mark for one reason or another. So below are the instructions for what worked for me – I’ll list the sources of a few related blog posts at the end of this post.

First, let’s be good and make sure everything is up to date:
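    sudo apt-get update
    sudo apt-get upgrade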

Now we need to know if you have Java 8:
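    java -version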

If not, we need to install it:
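    # OpenJDK 8 from the standard Ubuntu repositories does the job
    sudo apt-get install openjdk-8-jdk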

Install Elasticsearch

You’ll need to import the Elasticsearch public GPG key:
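    wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -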

Next, we’ll do the following (the commands for all three steps are just after this list):

  • Download Elasticsearch (check for the latest version before downloading here: https://www.elastic.co/jp/downloads/elasticsearch – right-click on the Deb package to copy its link)
  • Install it
  • Set it to run on system start
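Something along these lines does all three (the version number below is only an example – swap in whatever the downloads page gives you):

    wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.deb
    sudo dpkg -i elasticsearch-6.2.4.deb
    sudo systemctl enable elasticsearch.service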

Now Elasticsearch is installed, we can fire it up!
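    sudo systemctl start elasticsearch.service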

A quick test to make sure all is running:
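    curl -X GET "http://localhost:9200"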

…Should give something along the lines of:
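    {
      "name" : "node-1",
      "cluster_name" : "elasticsearch",
      "version" : {
        "number" : "6.2.4"
      },
      "tagline" : "You Know, for Search"
    }

(Trimmed down a bit – the full response also includes a cluster UUID and build details, and your node name and version numbers will obviously differ.)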

Next-Up, Let’s Install Logstash

Depending on what you already have set up in Ubuntu, you might need to install the apt-transport-https package (Elastic’s instructions note this for Debian-based systems):
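    sudo apt-get install apt-transport-https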

Save the repository definition to /etc/apt/sources.list.d/elastic-6.x.list:
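    echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list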

Okay, let’s update then install:
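    sudo apt-get update && sudo apt-get install logstash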

That’s Logstash installed… But not configured. We’ll get to shipping logfiles to Logstash later, don’t worry 😉

Start Logstash with:
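    sudo systemctl start logstash.service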

Lastly, Because We Want Pretty Charts Let’s Install Kibana!

Grab the latest version from here (right-click on the Deb): https://www.elastic.co/jp/downloads/kibana
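    # example version only – use the link you copied from the downloads page
    wget https://artifacts.elastic.co/downloads/kibana/kibana-6.2.4-amd64.deb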

Aaaannndddd install!:
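    # the filename will match whatever version you downloaded
    sudo dpkg -i kibana-6.2.4-amd64.deb
    sudo systemctl enable kibana.service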

Configuring Kibana:

You’ll need to edit the Kibana config file with your fave editor (I’m a noob, so I use nano):
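    sudo nano /etc/kibana/kibana.yml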

You need to make sure the server.port & server.host are configured (and NOT commented out). I use a local static IP for my server.host (for reasons I’ll get onto later!) and the port is left at the default (5601).
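In my case the relevant lines end up looking something like this (the IP below is just an example of a local static address; use your own):

    server.port: 5601
    server.host: "192.168.1.20"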

So – Start Kibana!:
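    sudo systemctl start kibana.service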

If you navigate to your defined IP & Port in your browser, you should now get the Kibana welcome screen (if you don’t, check over the steps above, or check the list of sources this info was pulled from at the end of the post).

Shipping Logs To ELK Stack


There are various ways to ship server logs to ELK. Beats offers a pretty easy, automated way. However, for this project I was lazy… I popped a server log file onto my machine and gave it as a source in the Logstash conf file. I guess we should go over how to do this, right?

I’d recommend reading up on configuring Logstash properly, but briefly, this is what I did:
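Something along these lines, saved as something like /etc/logstash/conf.d/apache.conf. Treat this as a sketch rather than gospel: the paths and index name are illustrative, the filter shown is just the standard Apache combined-log grok match, and the user agent / GeoIP handling I mention further down isn’t shown here.

    input {
      file {
        # any .log file dropped into this folder gets picked up
        path => "/home/user/logs/*.log"
        # read dropped files from the top rather than tailing them
        start_position => "beginning"
        sincedb_path => "/dev/null"
        type => "websitename"
      }
    }

    filter {
      # parse standard Apache combined-format access log lines into fields
      grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
      # use the logged timestamp as the event timestamp
      date {
        match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      }
    }

    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        # one index per 'type', so each website/project gets its own index
        index => "%{type}"
      }
    }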

Now, let’s break that down a bit, starting with the first section: the input.

This covers the input of the logfile into Logstash. As you can see, I’m inputting a file. There are many, many ways you can ship a file to Logstash. Beats is a good way to automate the process, but for testing the set-up I was a bit lazy and just dropped a logfile onto the NUC. Notice the wildcard in the filename (*.log)? This means ANY .log file I drop in that folder will be processed and have the ‘type’ set to ‘websitename’ (the last line of the file block).

I gave it a ‘type’ as I’ll later add another input (with logfiles for a different website) and give that a different ‘type’. This comes into play in the last section, the output: I use the placeholder %{type} to define the index name, which is then used in Kibana’s UI. This means I will have different indices for different websites/projects. Note that there are many ways you can configure Logstash; this is just the way I found easiest. I’m in NO WAY saying it’s the BEST way to set things up, it’s just a way I found that works for me…

As with anything SEO related, DO NOT take anyone’s word for it… the best option is to question, test, and figure out what works for you.

(and if you find a set-up that works better for you don’t be shy! Please share!)

The middle section is the filter section. This handles the parsing of the log file. I took this from 2 or 3 different guides, but in short it handles mapping the log file entries onto the fields you’ll see in Kibana. I’ve also installed two Elasticsearch plugins for this to work though, namely:

  • Ingest User Agent Processor Plugin – https://www.elastic.co/guide/en/elasticsearch/plugins/master/ingest-user-agent.html
  • Ingest GEOIP Processor Plugin – https://www.elastic.co/guide/en/elasticsearch/plugins/master/ingest-geoip.html

The former splits up the user agent field, segmenting it into multiple fields and making it easier to search on things like the name of the user agent.

The latter uses the MaxMind GeoIP database to map the geolocation of the IP addresses in your logfiles.

Installing both is easy; however, I found that a lot of the guides online didn’t specify the same path as mine. One way to make sure you get the right path is to check the running Elasticsearch process:
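    ps -ef | grep elasticsearch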

Look for this line:
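    -Des.path.home=/usr/share/elasticsearch

(That /usr/share/elasticsearch location is what a deb install typically uses; if yours differs, adjust the paths below accordingly.)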

The elasticsearch-plugin binary is usually in the ‘bin’ folder, under the ‘home’ path mentioned above.

In my case, based on the above, the correct commands would be:
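(Assuming the /usr/share/elasticsearch home path from above.)

    sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-user-agent
    sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-geoip

    # restart Elasticsearch so the new plugins are loaded
    sudo systemctl restart elasticsearch.service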

This, combined with the entries in the config file (the filters), should make slicing & dicing the log file data much nicer.
OKAY – Part 2… LOG FILE ANALYSIS FOR SEO.

 

Sources Used:

I found these sources useful when performing the above install. None of them worked fully for me, but I learned enough from the posts below to be successful, so I’m citing them here (in no particular order).

 

 
