An article from jgp: The Best Spark in Town: Yesterday, Apache Spark v2.2.0 was released. Excitement started a few months ago, reaching a “summit” during Spark Summit where a […]
Meet Cactar, the Ancient Mongolian Warlord of Data Quality
An article from jgp: A Little History: On August 18, 1227, the well-known Mongolian emperor Genghis Khan passed away. Despite numerous criticisms, based on rumors of genocide and brutality, he united […]
Spark Boosts IBM Event Store
An article from jgp: IBM just announced Event Store, a hybrid datastore to store events. The originality? Events can be streamed in and it is based on Apache Spark. IBM […]
The Key to Machine Learning is Prepping the Right Data
An article from jgp: Earlier this month, I was in San Francisco, CA, to attend Spark Summit 2017. I gave a talk on the phase before you can apply Machine […]
A new Informix Book is Out for MacOS and Java
An article from jgp: Why a book on Informix? Informix has been a passion for almost 20 years. Very often, a younger version of myself would say: “I don’t like […]
HDP 2.6 is Out: Spark 2, Hive 2, and Zeppelin 0.7 are GA
An article from jgp: Hortonworks Data Platform (HDP) v2.6 has been released and you can download the platform from their website. The sandbox is not yet available in v2.6. New […]
Recent Publications
An article from jgp: A quick flashback on a few articles I published recently. You Are Not a Machine, So Learn Machine Learning published by Database Trends and Applications on February 21st, […]
The Netherlands Welcomes Trump, What About the Rest of Europe?
An article from jgp: Following President Trump’s election, some European countries have started reacting through their humorists in a very original way, mixing apprehension, gratitude, and (a little bit of) […]
What are Spark Checkpoints on Dataframes?
An article from jgp: Let’s understand what checkpoints can do for your Spark dataframes and go through a Java example of how we can use them. Checkpoint on Dataframe: In […]
Apache Spark Event in the Triangle
An article from jgp: A quick post to share the next Spark event that we will run in the NC Triangle (RTP – Chapel Hill, Durham, Raleigh). This event will […]