Posted in November by Greg Pavlik

Programming Note

This is a continuation of Part 1 and Part 2 of this series. Since considerable time has passed and changes have been made to the testing since those articles were posted, this post will start from the beginning.

The Business Case for Geolocating IP Numbers

Business intelligence and data science teams can get valuable insights from knowing the geolocation of website visitors. Suppose a product launch gets lots of web traffic, but the only source of information on visitors is the web log. Some web server statistics report on traffic grouped by nation, but what if we want much more granular information, and want to incorporate it into the main data warehouse?

The First Step – Getting Web Visitor IP Numbers

Let’s take one sample web server among many, Apache Web Server, and quickly examine the structure of a log entry. Here’s a line in a sample Apache Web Server log:

64.242.88.10 - "GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846

The Apache documentation gives more detailed information on the meaning of each field in the line, but for now we’re going to concentrate on 1) how to load the web logs into Snowflake, and 2) the key aspects for business intelligence and geolocation.

Even without reading the Apache documentation, it’s clear that the web log is space delimited and wraps any fields containing spaces inside double quotes. Snowflake provides a very simple way to ingest structured data in flat files using File Formats. You can create a file format using SnowSQL or the Snowflake Web UI; both are covered in the Snowflake documentation.

Note: Although the Apache Web Log is space delimited, we will use the CSV option for the Snowflake File Format; simply change the delimiter from a comma to a space.

Note: For this exercise, we’ll use a database named WEBLOG. The Apache Web Server data will go into a schema named APACHE_WEB_SERVER, and the geolocating data will go in a schema named IP2LOCATION. You can change the sample code as necessary if you’d prefer to use another database or different schema names.

Here is the Snowflake File Format you can use to import Apache Web Server logs:

ALTER FILE FORMAT "WEBLOG"."APACHE_WEB_SERVER".APACHE_WEB_LOG SET COMPRESSION = 'AUTO'
FIELD_DELIMITER = ' ' RECORD_DELIMITER = '\n' SKIP_HEADER = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '\042' TRIM_SPACE = FALSE ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
ESCAPE = 'NONE' ESCAPE_UNENCLOSED_FIELD = '\134' DATE_FORMAT = 'AUTO'
TIMESTAMP_FORMAT = 'AUTO' NULL_IF = ('');

Where to Get Sample Apache Web Server Logs

One of the things needed to test geolocating a web server log is, well, a web server log. As it turns out, finding one is not easy. After some research, I located a partially obfuscated Apache web log from the US Securities and Exchange Commission (SEC). Their weblogs obfuscate the final quad of the IPv4 dotted quad, so the addresses look like this: 192.168.1.jjr.

After loading the data, we now need to geolocate the web hits.
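To see why a space delimiter combined with optional double-quote enclosure is enough to split an Apache log entry, here is a minimal Python sketch. It is not Snowflake code; it only applies the same two rules, using Python's standard csv module, to the sample line shown in this post. The helper name split_log_line is hypothetical and exists only for illustration.

```python
import csv

# The sample log line exactly as it appears in this post.
SAMPLE_LINE = (
    '64.242.88.10 - "GET /twiki/bin/edit/Main/Double_bounce_sender'
    '?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846'
)


def split_log_line(line):
    """Split one log line on spaces, keeping double-quoted fields whole.

    Hypothetical helper: it mirrors a CSV-style format whose field
    delimiter is a space and whose optional enclosure is a double quote.
    """
    reader = csv.reader([line], delimiter=" ", quotechar='"')
    return next(reader)


fields = split_log_line(SAMPLE_LINE)

# Print each field with its column position.
for position, value in enumerate(fields, start=1):
    print(position, value)
```

The quoted request string comes back as a single field even though it contains spaces, which is the behavior the space-delimited CSV approach relies on. Note that a complete entry in Apache's Common Log Format also carries a bracketed timestamp containing a space; since brackets are not quotes, that timestamp would be split into two fields under these rules.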