<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<title>divisionbyzero.net</title>
	<subtitle>i wear this chaos well</subtitle>
	<link href="https://divisionbyzero.net/atom.xml" rel="self" type="application/atom+xml"/>
  <link href="https://divisionbyzero.net"/>
	<generator uri="https://www.getzola.org/">Zola</generator>
	<updated>2022-11-12T00:00:00+00:00</updated>
	<id>https://divisionbyzero.net/atom.xml</id>
	<entry xml:lang="en">
		<title>Goodbye, Twitter</title>
		<published>2022-11-12T00:00:00+00:00</published>
		<updated>2022-11-12T00:00:00+00:00</updated>
		<link rel="alternate" href="https://divisionbyzero.net/goodbye-twitter/" type="text/html"/>
		<id>https://divisionbyzero.net/goodbye-twitter/</id>
		<content type="html">&lt;p&gt;I joined Twitter in 2008. It allowed me to connect to the InfoSec community in
a way I couldn&#x27;t in person at the time. I had a lot of positive experiences,
and it opened a few doors for me professionally. Today, after reading about
more senior folks resigning and rumors that Musk is searching for ways to
monetize user data in unethical ways, it&#x27;s time to say good-bye.&lt;&#x2F;p&gt;
&lt;p&gt;I am now happily reliving the best experiences of early Twitter on the &lt;a
rel=&quot;me&quot; href=&quot;https:&#x2F;&#x2F;hachyderm.io&#x2F;@reyjrar&quot;&gt;hachyderm.io&lt;&#x2F;a&gt; Mastodon
instance.&lt;&#x2F;p&gt;
&lt;p&gt;If you&#x27;re considering leaving Twitter, there are a few things you might want to
do to ensure your data isn&#x27;t used in whatever the off-the-rails cry-baby
billionaire dreams up next.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;&lt;h2 id=&quot;download-your-archive&quot;&gt;Download Your Archive&lt;&#x2F;h2&gt;
&lt;p&gt;You might want to keep a copy of your Twitter data before you nuke it. This
step can take a few hours or days to process, so start here.&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Log in to Twitter&lt;&#x2F;li&gt;
&lt;li&gt;Click on &amp;quot;More&amp;quot;&lt;&#x2F;li&gt;
&lt;li&gt;Click on &amp;quot;Settings and Support&amp;quot;&lt;&#x2F;li&gt;
&lt;li&gt;Click on &amp;quot;Settings and Privacy&amp;quot;&lt;&#x2F;li&gt;
&lt;li&gt;Under &amp;quot;Your Account&amp;quot; select &amp;quot;Download archive of your data&amp;quot;&lt;&#x2F;li&gt;
&lt;li&gt;Follow the verification process&lt;&#x2F;li&gt;
&lt;li&gt;Click on &amp;quot;Request archive&amp;quot;&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;At this point, your request will be queued to process. It could take a few
days to get your data.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;preparing-for-the-migration&quot;&gt;Preparing for the Migration&lt;&#x2F;h2&gt;
&lt;p&gt;If you&#x27;re migrating to Mastodon, you&#x27;ll want to give your followers a few days
or weeks of notice.  You&#x27;ll need to choose an instance and get up and running
there.  I highly recommend &lt;a href=&quot;https:&#x2F;&#x2F;hachyderm.io&quot;&gt;hachyderm.io&lt;&#x2F;a&gt; as the
community there is a great cross section of folks from all walks of life who
universally agree to be kind to one another. There&#x27;s even a great &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;hachyderm&#x2F;community&#x2F;blob&#x2F;main&#x2F;welcome&#x2F;README.md&quot;&gt;welcome
guide&lt;&#x2F;a&gt; to
cover getting used to Mastodon and the hachyderm community.&lt;&#x2F;p&gt;
&lt;p&gt;Once you select your Mastodon instance, you&#x27;ll want to notify your Twitter
followers.&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Set your &amp;quot;display name&amp;quot; on your Twitter profile to your Mastodon handle
so the various tools will be able to find you. Mine is
&lt;code&gt;@reyjrar@hachyderm.io&lt;&#x2F;code&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Set your account to private by selecting &amp;quot;Protect my tweets&amp;quot; in &amp;quot;Your
account&amp;quot; &amp;gt; &amp;quot;Account Information&amp;quot; &amp;gt; &amp;quot;Protect Tweets&amp;quot;&lt;&#x2F;li&gt;
&lt;li&gt;Tweet that you&#x27;re moving to Mastodon, and pin that tweet to your profile.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;While you&#x27;re waiting for your archive download, poke around in the Twitter
settings for privacy, advertising, and location tracking and disable all that
crap. For instance, under &amp;quot;Privacy and safety&amp;quot; you might want to disable
everything in &amp;quot;Discoverability&amp;quot; so you won&#x27;t be discovered in searches
anymore.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;your-archive-is-ready&quot;&gt;Your Archive is Ready&lt;&#x2F;h2&gt;
&lt;p&gt;Download the archive and check it out. It&#x27;s important to note that some of the
media in the archive may not be provided locally.  There are a number of tools
on GitHub for downloading the remote content. I did not use these, so I don&#x27;t
have any recommendations.  If you are concerned about maintaining the images
and videos in your archive, you might need to find one of those tools and run
them before deleting your account and tweets.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Update:&lt;&#x2F;strong&gt; &lt;a href=&quot;https:&#x2F;&#x2F;mathstodon.xyz&#x2F;@timhutton&quot;&gt;@timhutton&lt;&#x2F;a&gt; created &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;timhutton&#x2F;twitter-archive-parser&quot;&gt;a tool to
make your archive
better&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;ready-to-delete&quot;&gt;Ready to Delete&lt;&#x2F;h2&gt;
&lt;p&gt;It&#x27;s not an easy decision. I understand that. If you&#x27;re going to do it, you
might as well do it right.  If you&#x27;re not in Brazil, the EU, the UK, or
California, you probably won&#x27;t be able to delete your account. You&#x27;ll have
to rely on a third party service to get rid of as much of your data as you
can.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;tweetdeleter&quot;&gt;TweetDeleter&lt;&#x2F;h3&gt;
&lt;p&gt;I chose to use &lt;a href=&quot;https:&#x2F;&#x2F;tweetdeleter.com&quot;&gt;TweetDeleter&lt;&#x2F;a&gt;, but there are other
options.  Here&#x27;s a trick: if you link TweetDeleter to your account, but don&#x27;t
purchase anything, after a few days they will send you an email with a 20%
off deal.  You&#x27;ll need to do this before you delete your Twitter account.&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Link TweetDeleter to your account&lt;&#x2F;li&gt;
&lt;li&gt;Check out the UI, but wait a few days&lt;&#x2F;li&gt;
&lt;li&gt;Wait for the discount code email&lt;&#x2F;li&gt;
&lt;li&gt;Redeem it, choose &amp;quot;Monthly Billing&amp;quot;&lt;&#x2F;li&gt;
&lt;li&gt;Select &amp;quot;Unlimited&amp;quot; Plan, it should cost &lt;strong&gt;$9.99&lt;&#x2F;strong&gt; for the monthly option&lt;&#x2F;li&gt;
&lt;li&gt;Delete all tweets&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;It might take a while to delete all your tweets, so be patient. After an hour
or two, you should be able to refresh your Twitter profile and see no tweets.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Go back to TweetDeleter and into Account settings and &amp;quot;Cancel
Subscription&amp;quot;&lt;&#x2F;strong&gt;. You must do this before you delete your account or else you&#x27;ll
lose access to your TweetDeleter account and potentially wind up with a
recurring bill.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;deactivating-your-twitter-account&quot;&gt;Deactivating Your Twitter Account&lt;&#x2F;h2&gt;
&lt;p&gt;It&#x27;s sad, I know. But it&#x27;s time to let go.&lt;&#x2F;p&gt;
&lt;p&gt;Here&#x27;s the most paranoid way to &amp;quot;delete&amp;quot; your account.&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Change&lt;&#x2F;strong&gt; your Twitter username, it doesn&#x27;t matter what you change it to.&lt;&#x2F;li&gt;
&lt;li&gt;Navigate to &amp;quot;Your account&amp;quot; &amp;gt; &lt;strong&gt;&amp;quot;Deactivate account&amp;quot;&lt;&#x2F;strong&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Create&lt;&#x2F;strong&gt; a new Twitter account with your old username, and lock it down:
&lt;ul&gt;
&lt;li&gt;Set your display name to your Mastodon handle&lt;&#x2F;li&gt;
&lt;li&gt;Set &amp;quot;&lt;strong&gt;Protect my tweets&lt;&#x2F;strong&gt;&amp;quot;&lt;&#x2F;li&gt;
&lt;li&gt;Disable all discoverability options&lt;&#x2F;li&gt;
&lt;li&gt;Unfollow the one user you had to follow to create the account&lt;&#x2F;li&gt;
&lt;li&gt;Disable all the location tracking&lt;&#x2F;li&gt;
&lt;li&gt;Disable all the advertising customizations&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;OK, so, if you&#x27;re giving up Twitter, why rename&#x2F;create the blank account? It&#x27;s
just a placeholder for your account name so internet weirdos don&#x27;t try to
impersonate you as Twitter burns to the ground.  Yes, it&#x27;s paranoid, but I
think it&#x27;s a good idea, especially if you gave a lot of conference talks and
there are videos of you with your slides referencing you on Twitter. It will help
people find you, at least while Twitter is still a thing.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;deleting-your-twitter-account&quot;&gt;Deleting Your Twitter Account&lt;&#x2F;h2&gt;
&lt;p&gt;If you&#x27;re in Brazil, the EU, the UK, or California, you live somewhere
with reasonable privacy laws which allow you to request that your data be
removed from the company&#x27;s servers.  If they fail to do so, then you can
notify the relevant government body and they will levy fines against Twitter
until they comply.&lt;&#x2F;p&gt;
&lt;p&gt;Unfortunately, this process will take a little while to complete, which is why
we went through all the rest of this pain.&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Head over to &lt;a href=&quot;https:&#x2F;&#x2F;yourdigitalrights.org&#x2F;d&#x2F;twitter.com&quot;&gt;yourdigitalrights.org&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Fill out your name&lt;&#x2F;li&gt;
&lt;li&gt;Select the relevant privacy law from the drop down&lt;&#x2F;li&gt;
&lt;li&gt;Under additional identifying information list:
&lt;ul&gt;
&lt;li&gt;The &amp;quot;renamed&amp;quot; old Twitter handle&lt;&#x2F;li&gt;
&lt;li&gt;Any email addresses associated with that account&lt;&#x2F;li&gt;
&lt;li&gt;Any phone numbers associated with that account&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;Click the down arrow on &amp;quot;Review and Send&amp;quot; and &amp;quot;Copy Text to Clipboard&amp;quot;&lt;&#x2F;li&gt;
&lt;li&gt;Paste into your email client
&lt;ul&gt;
&lt;li&gt;The first two lines include the to and subject&lt;&#x2F;li&gt;
&lt;li&gt;To: &lt;a href=&quot;mailto:copyright@twitter.com&quot;&gt;copyright@twitter.com&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Subject: &lt;strong&gt;Data deletion request&lt;&#x2F;strong&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;Send the email&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;This starts the process. Most likely, they will need to verify your identity
and place of residence to ensure they are authorized to delete your
information.  I am on this step of the process and expect it to take a week or
two to complete. I will update this article as I get more information.&lt;&#x2F;p&gt;
</content>
	</entry>
	<entry xml:lang="en">
		<title>My Experience with Burnout</title>
		<published>2021-04-14T00:00:00+00:00</published>
		<updated>2021-04-14T00:00:00+00:00</updated>
		<link rel="alternate" href="https://divisionbyzero.net/my-experience-with-burnout/" type="text/html"/>
		<id>https://divisionbyzero.net/my-experience-with-burnout/</id>
		<content type="html">&lt;p&gt;For nearly 4 years, I dealt with high levels of stress in my life without
seeking help. As a consequence, my stress response got stuck &amp;quot;on&amp;quot;. While I
removed myself from the primary stressor, I took on new stress with an
international move, new job, a new house, and reverse culture shock coming
back to the USA. Even though these were mostly positive changes, my body kept
the stress response active. I knew something was wrong, but I told myself I
could manage it. I thrived in stressful situations. I knew my limits.&lt;&#x2F;p&gt;
&lt;p&gt;I was catastrophically wrong. My inability to recognize the severity of my
situation led to three devastating physical health issues I am still actively
managing every day. I wish I had reached out for help sooner.&lt;&#x2F;p&gt;
&lt;p&gt;These are the steps I am taking to manage my mental, emotional, and physical
health:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;I &lt;strong&gt;started working with a mental health professional&lt;&#x2F;strong&gt;&lt;&#x2F;li&gt;
&lt;li&gt;I &lt;strong&gt;removed myself&lt;&#x2F;strong&gt; from stressful situations&lt;&#x2F;li&gt;
&lt;li&gt;I &lt;strong&gt;exercise&lt;&#x2F;strong&gt; regularly&lt;&#x2F;li&gt;
&lt;li&gt;I &lt;strong&gt;value my attention&lt;&#x2F;strong&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;I&#x27;d like to share my story of how the stress I experienced manifested
physically, if for no other reason than to serve as a warning to folks
currently dealing with anxiety and stress. I wish someone had told me,
&amp;quot;you don&#x27;t have to do this alone. It&#x27;s OK to ask for help even if you feel
like others are in a worse place.&amp;quot;&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;&lt;h2 id=&quot;my-story&quot;&gt;My Story&lt;&#x2F;h2&gt;
&lt;p&gt;I woke up Thursday, May 31st, 2018 with an ear ache in my right ear.  The
scalp on my right side felt slightly tingly and sensitive to the touch.  I was
flying on Saturday to &lt;a href=&quot;http:&#x2F;&#x2F;monitorama.com&#x2F;2018&#x2F;pdx.html&quot;&gt;Monitorama in
Portland&lt;&#x2F;a&gt;, so I made an appointment with
my doctor to make sure I could fly with my ear bothering me.  On Friday, the
ear ache had gotten worse and the sensitivity on my scalp was undeniable. My
doctor told me that my ear drum was not inflamed, but was irritated. She
didn&#x27;t see signs of a typical ear infection and told me to dose up on
over-the-counter pain and allergy medications.  She said I was OK to fly and
it would probably clear up in a few days. There was no explanation for the
tingling and sensitivity on my scalp.&lt;&#x2F;p&gt;
&lt;p&gt;Saturday morning, I woke up and made strawberry pancakes for breakfast. As I
took a bite of the pancakes, all I could taste was bitterness, acidity, and a
strange metallic flavor.  I asked my wife if they tasted OK to her, and she
said they were fine. I poured on more maple syrup, but I couldn&#x27;t taste it. I
was excited to be going to Monitorama, and chalked this strange taste
malfunction up to the allergy medicine.  I packed my bags and my wife drove me
to the airport.  The flight was difficult. My ear ache was agitated by the
pressure changes and the food I brought with me tasted off, like I wasn&#x27;t able
to taste the sweet notes in anything I ate. The pain in my ear and on the side
of my face intensified.  I took more allergy and pain medication and tried
desperately to sleep.&lt;&#x2F;p&gt;
&lt;p&gt;I woke up Sunday morning in so much pain I could barely think.  I couldn&#x27;t lie
on my right side because the whole right side of my face felt like it
was being stabbed by thousands of needles. The pain medication was not
helping.  I looked for a doctor&#x27;s office and managed to get an appointment
first thing Monday morning. I needed food so I ventured out through the pain
to find something to eat.&lt;&#x2F;p&gt;
&lt;p&gt;I found a sandwich shop and ordered a toasted sandwich with fries. The fries
tasted edible and I had hope the worst was over. I took a bite of the sandwich
and spit it out into the trashcan. It tasted like cigarette ashes smelled. I
disassembled the sandwich and was able to eat the avocado on it, but the rest
of the sandwich was off in a way I couldn&#x27;t stomach. I finished the fries and
avocado and climbed back into bed to try and get some sleep.  Anytime I&#x27;d roll
over on my right side, I&#x27;d shoot straight up in the bed, engulfed in
all-encompassing pain. I had a doctor&#x27;s appointment first thing in the morning.
Instead of helping set up for the conference, I walked to the doctor&#x27;s office
and got there before they opened.&lt;&#x2F;p&gt;
&lt;p&gt;The doctor diagnosed me with a subdermal skin infection on the right side of
my face. Unfortunately, the diagnosis did not explain my inability to taste
sweetness which was slowly driving me mad. I was adamant that I needed an
explanation and she referred me to an Ear, Nose, Throat (ENT) doctor for a
same day appointment.&lt;&#x2F;p&gt;
&lt;p&gt;Later at the ENT doctor, I explained my symptoms and he performed an exam
including what appeared to be several neurological tests. After the exam he
said &amp;quot;I have good news, and I have bad news. The good news is I know what you
have, the bad news is that it&#x27;s Ramsay Hunt syndrome, also known as Herpes
Zoster Oticus.&amp;quot; In layman&#x27;s terms, the virus that causes chicken pox had
come back to life as shingles, only instead of affecting the nerves in my
torso, it had localized on the nerve in my jaw and ear. He said this type of
presentation was rare, and even rarer for someone only 37 years old. He asked me
if I was under a lot of stress. Yes, near constant for over 6 years.&lt;&#x2F;p&gt;
&lt;p&gt;He prescribed high dose steroids, antivirals, and pain medication
and told me I should head home immediately.  I wasn&#x27;t yet in the contagious
phase, but I would be soon. He warned me to stay away from infants and
pregnant women.  I went back to my hotel, changed my flight, checked out of my
hotel and flew home.  I would miss Monitorama this year. That was almost as
painful as my illness.  Two of my colleagues had made it to the conference
this year and I had been looking forward to spending the week with them for months.
I would now be stuck at home, watching the live stream as I phased in and out of
consciousness from the medications.&lt;&#x2F;p&gt;
&lt;p&gt;Back at home, I saw an ENT doctor and an ophthalmologist. There was a very real
possibility of the infection spreading to the optic nerve and blinding me in
my right eye. I was on an emotional roller coaster between the steroids, the
pain medications, and the anti-virals.  The first anti-viral didn&#x27;t wipe out
the infection, and I had to get a second anti-viral added to the mix.&lt;&#x2F;p&gt;
&lt;p&gt;In a matter of minutes, I would cycle between being depressed almost to the
point of not being able to move, to so angry that I was literally screaming at
my computer screen. I had no control over my emotions and they erupted from my
keyboard into work chats. It&#x27;s a testament to my manager I wasn&#x27;t fired. Every
night I told my wife I was going to quit my job because I was either so
depressed or angry.&lt;&#x2F;p&gt;
&lt;p&gt;The cloud of medications started to lift after 3 weeks. A neurologist placed
me on gabapentin to manage the nerve pain.  Unfortunately, it made me dizzy
and sleepy.  My on-call rotation came up, and I had to get a co-worker to
cover it because on the gabapentin, I could sleep, but nothing woke me up
until it wore off.&lt;&#x2F;p&gt;
&lt;p&gt;My ability to taste sweetness had not returned.  Things I used to love,
like coffee, beer, grilled food, and chocolate ice cream, tasted so terrible
they made me gag. I could eat food and feel physically full, or even bloated, but
my brain wasn&#x27;t getting any feedback from &amp;quot;sweetness&amp;quot; receptors on my tongue.
Oddly, this left me feeling mentally starving and physically full. It was the
strangest and most uncomfortable feeling I&#x27;ve ever had.&lt;&#x2F;p&gt;
&lt;p&gt;I exhausted my paid time off very quickly, and thanks to a change to our PTO
policy, I was not allowed to go negative. My manager encouraged me to try to
make use of the company&#x27;s short term disability policy.  Unfortunately, the
illness had only devastated my mental and emotional health. The physical
effects it had were not enough for me to qualify, so I had to continue working
remotely. Exhausted, still in moderate pain, and weaning myself off steroids,
I was an unpleasant person in every aspect of my life. I was spiraling.&lt;&#x2F;p&gt;
&lt;p&gt;At this point, a coworker reached out to me. He had been dealing with his own
personal health issues. In the course of his research, he had read that among
people with sensory impairments, suicide rates were the highest among patients
who had lost some or all of their ability to taste. He said he was concerned
about me and asked me if I had considered seeing a mental health professional.&lt;&#x2F;p&gt;
&lt;p&gt;I had always admired people who had shared that they were working with mental
health professionals. I thought of them as wise and brave. However, this
machoism-duality in my brain told me &amp;quot;I&#x27;m strong enough to deal with my own
issues.&amp;quot; It wasn&#x27;t until a male friend I admired admitted to me that he
was seeing a mental health professional and said something along the lines of
&amp;quot;it&#x27;s OK if you see one too&amp;quot; that I let myself be OK with the idea.&lt;&#x2F;p&gt;
&lt;p&gt;I started to see a mental health professional in early August that year.
I spent most of my time in therapy complaining. During the first few months,
maybe even year, I used therapy to point out what the world had done to me. I
also felt like I had to prove to my therapist that I didn&#x27;t need to be in
therapy. It&#x27;s complicated, but I was raised to put on a happy face for
strangers. Only the immediate family really ever saw each other&#x27;s flaws. It
took a while for me to get comfortable with the idea of sharing honestly. Even
longer for me to really commit to the process in a way that started to have
dramatic impact on my life.&lt;&#x2F;p&gt;
&lt;p&gt;During the &amp;quot;toe in the water&amp;quot; phase with therapy, I had my first experience
with acid reflux. The first time feeling that type of pain is terrifying. I
started freaking out, and had a full on anxiety attack.  I was diagnosed with
GERD. Only 6 months after getting back a mostly normal sense of taste, I&#x27;d
have to give up all my favorite foods and beverages to manage the condition.
My doctor said both conditions were likely triggered by &amp;quot;stress.&amp;quot;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;getting-better&quot;&gt;Getting Better&lt;&#x2F;h2&gt;
&lt;p&gt;My life has been changed forever by the physical health issues caused by my
inability to address the stress in my life. I still experience occasional loss of
taste due to nerve damage caused by the Ramsay Hunt Syndrome. I am battling
with acid reflux, so my intake of coffee and spicy foods is severely limited to
none. I regret not taking my stress more seriously.&lt;&#x2F;p&gt;
&lt;p&gt;You may be thinking, &amp;quot;I have a job, my loved ones are relatively safe, I don&#x27;t
have it as hard as everyone else.&amp;quot; You may not be wrong about that. You would
be wrong if you delayed reaching out or ignored your own mental health.
Something Matty Stratton said in his talk &lt;a href=&quot;https:&#x2F;&#x2F;speaking.mattstratton.com&#x2F;pFDGrd&quot;&gt;Fight, Flight, or
Freeze&lt;&#x2F;a&gt; stuck with me: &amp;quot;Stress is
relative.&amp;quot; You may not have the same problems as someone else, but that
doesn&#x27;t mean your problems will cause you any less stress. Your experience of
stress is unique to you. You shouldn&#x27;t compare your circumstances to others as
an excuse to not reach out for help you feel you need.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;consult-a-mental-health-professional&quot;&gt;Consult a mental health professional&lt;&#x2F;h3&gt;
&lt;p&gt;If you only take one suggestion on my list, find a mental health professional.
I was too close to my issues to see the whole picture.  The guidance and
perspective of a third party expert will multiply the positive effects of your
actions. Be open and honest, and do your best to put the work in from the
start. I was slow to commit to the process and that delayed my recovery and
led to more physical health problems.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;remove-yourself-from-the-stressful-situations&quot;&gt;Remove yourself from the stressful situations&lt;&#x2F;h3&gt;
&lt;p&gt;Even if you can&#x27;t remove yourself from the most stressful situation in your
life, there&#x27;s a good chance there are lesser stressors you can quit. Drop
those things and replace them with activities that bring you happiness and
calm. I dropped social media apps and limited my TV time. I filled that time
with more cycling, meditation, reading, and walking my dogs.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;exercise&quot;&gt;Exercise&lt;&#x2F;h3&gt;
&lt;p&gt;I know, exercise. Everyone says exercise. There&#x27;s &lt;a href=&quot;https:&#x2F;&#x2F;www.womenshealthmag.com&#x2F;uk&#x2F;health&#x2F;mental-health&#x2F;a27098268&#x2F;how-to-de-stress&quot;&gt;research showing exercise
helps our bodies and brains close a stress
cycle&lt;&#x2F;a&gt;.
When you experience a threat in nature, you react with fight, flight, or
freeze. Your cortisol and adrenaline spike. Your breathing becomes shallow.
Your digestion slows.  Exercise reduces your body&#x27;s cortisol and adrenaline
levels naturally. This slows your breathing, improves digestion, and makes it
easier to sleep. I like the &lt;a href=&quot;https:&#x2F;&#x2F;www.downdogapp.com&quot;&gt;DownDog Apps&lt;&#x2F;a&gt; as
they provide timed exercises that don&#x27;t require any gym equipment.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;value-your-attention&quot;&gt;Value your attention&lt;&#x2F;h3&gt;
&lt;blockquote&gt;
&lt;p&gt;What you focus on is who you become.&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;I highly recommend you watch &lt;a href=&quot;https:&#x2F;&#x2F;www.netflix.com&#x2F;title&#x2F;81254224&quot;&gt;The Social
Dilemma&lt;&#x2F;a&gt;. If you don&#x27;t value your
attention, there are companies making billions of dollars valuing it for you.
After watching that documentary, I made some hard decisions and deleted a ton
of apps from my phone as well as disabling nearly every notification.  Here&#x27;s
what I did to my phone:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Deleted any app that fed my dopamine receptors: social media, games, etc.&lt;&#x2F;li&gt;
&lt;li&gt;Disabled all but essential notifications: calls and texts from known
callers, and my on-call app&lt;&#x2F;li&gt;
&lt;li&gt;Deleted all open tabs from my browsers and switched to &lt;a href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Firefox_Focus&quot;&gt;Firefox
Focus&lt;&#x2F;a&gt; on my phone&lt;&#x2F;li&gt;
&lt;li&gt;Set an aggressive &amp;quot;Do Not Disturb&amp;quot; window, 8pm - 7am&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;I made these changes as a revolt against the companies using my personal data
for profit, but the changes had a dramatic impact on my wellness. Nothing else
in the list aside from exercise had such an immediate and measured impact on
my well-being. It&#x27;s also really easy to try these things for a week and revert
them if they don&#x27;t help you.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;mindfulness-meditation&quot;&gt;Mindfulness meditation&lt;&#x2F;h4&gt;
&lt;p&gt;Nothing will help you value your attention more than understanding and
training your attention. There is no better way to do so than with mindfulness
meditation. My therapist recommended meditation constantly.  I
didn&#x27;t fully commit to it at first. When I did, I was put off by the presence
of spirituality and religion in the guided meditation apps. As &lt;a href=&quot;https:&#x2F;&#x2F;livingroomrebellion.com&#x2F;prose&#x2F;non-fiction&#x2F;2015&#x2F;10&#x2F;18&#x2F;american-atheist.html&quot;&gt;an
atheist&lt;&#x2F;a&gt;,
there&#x27;s nothing that can snap me out of a mindful state like mentions
of religion and spirituality.&lt;&#x2F;p&gt;
&lt;p&gt;It wasn&#x27;t until I listened to the &lt;a href=&quot;https:&#x2F;&#x2F;podcasts.apple.com&#x2F;us&#x2F;podcast&#x2F;lets-be-reasonable-sam-harris-and-rufus-in-conversation&#x2F;id1482067226?i=1000492486555&quot;&gt;Next Big Idea&#x27;s Episode with Sam
Harris&lt;&#x2F;a&gt;
that I found a practice I could use reliably. Sam Harris is a neuroscientist and
approaches mindfulness meditation from a purely scientific and secular angle.
His pragmatic approach to guided meditations keeps me engaged.&lt;&#x2F;p&gt;
&lt;p&gt;Meditation isn&#x27;t something that shows enormous benefits all at once. I
initially felt calmer during the meditations, and then got back to my
day-to-day life. Over the course of a few weeks, I began to notice I was able
to set markers for myself. When I experienced a negative emotion, I could
pause and acknowledge it. That process strips the negative emotions of their
ability to control me. I&#x27;m still learning. I get upset, frustrated, angry,
sad, and make all kinds of mistakes. I just make considerably fewer missteps
now that I&#x27;ve started a mindfulness practice.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;i-hope-this-helps&quot;&gt;I hope this helps&lt;&#x2F;h2&gt;
&lt;p&gt;I wanted to share my experience in the hopes that someone recognizes the
stress and anxiety in their life and decides to take action. I&#x27;m gradually
working my way back to homeostasis, but if I had acted sooner, my life would
be a lot different. I&#x27;d probably still be able to enjoy coffee and spicy
foods!&lt;&#x2F;p&gt;
&lt;p&gt;Take care of yourself, because if you don&#x27;t, no one else will.&lt;&#x2F;p&gt;
</content>
	</entry>
	<entry xml:lang="en">
		<title>ElasticSearch CLI Tools - Part 1</title>
		<published>2019-05-18T00:00:00+00:00</published>
		<updated>2019-05-18T00:00:00+00:00</updated>
		<link rel="alternate" href="https://divisionbyzero.net/elasticsearch-cli-tools/" type="text/html"/>
		<id>https://divisionbyzero.net/elasticsearch-cli-tools/</id>
		<content type="html">&lt;p&gt;While working at Booking.com, I was looking for a solution to logging that
matched the ease of use and power that &lt;a href=&quot;https:&#x2F;&#x2F;graphiteapp.org&quot;&gt;Graphite&lt;&#x2F;a&gt; provided
for metrics.  Reluctant to bring a new technology into production, I talked to
co-workers and one mentioned that they were using
&lt;a href=&quot;https:&#x2F;&#x2F;www.elastic.co&#x2F;products&#x2F;elasticsearch&quot;&gt;ElasticSearch&lt;&#x2F;a&gt; in some
front-end systems for search and disambiguation.  He mentioned hearing there
were a few projects using ElasticSearch for storing log data.&lt;&#x2F;p&gt;
&lt;p&gt;This began my love-hate-love relationship with ElasticSearch.  I&#x27;ve spent the
past 8 years working with ElasticSearch professionally and in my spare time.
Graphite and ElasticSearch are two projects that change the game in terms of
exploring your data.  The countless insights I&#x27;ve gained into system
performance, application performance, and system and network security with
these tools are unparalleled.  Tools like &lt;a href=&quot;https:&#x2F;&#x2F;grafana.com&quot;&gt;Grafana&lt;&#x2F;a&gt; and
&lt;a href=&quot;https:&#x2F;&#x2F;www.elastic.co&#x2F;products&#x2F;kibana&quot;&gt;Kibana&lt;&#x2F;a&gt; allow you to visualize your
data quickly and beautifully.  As a system and security engineer, sometimes
this isn&#x27;t enough.  I spend most of my day in a terminal and needed something
to explore and pivot through the data there.&lt;&#x2F;p&gt;
&lt;p&gt;This is the first part in a multi-part series about a tool I created to make
ElasticSearch&#x27;s powerful search interface more accessible from the terminal.
This tool has been essential to nearly every incident I&#x27;ve investigated.  It 
was developed with the help, patience, and amazing ideas from co-workers both
at Booking.com and now at &lt;a href=&quot;https:&#x2F;&#x2F;www.craigslist.org&quot;&gt;Craigslist&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;&lt;h2 id=&quot;perl-setup&quot;&gt;Perl Setup&lt;&#x2F;h2&gt;
&lt;p&gt;I&#x27;m a &lt;a href=&quot;https:&#x2F;&#x2F;perl.org&quot;&gt;Perl&lt;&#x2F;a&gt; programmer.  You may have strong feelings about
that, but Perl has been good to me.  The freedom to write code as beautifully,
or as ugly, as I need to get the job done is liberating.  I recommend using
Perl 5.28 or newer with &lt;a href=&quot;https:&#x2F;&#x2F;perlbrew.pl&#x2F;&quot;&gt;Perlbrew&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;You should be comfortable with the command line, so follow the steps to
install Perlbrew from its homepage.  After that:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;$ perlbrew init
&lt;&#x2F;span&gt;&lt;span&gt;$ perlbrew install -j 8 -n --thread 5.28.2
&lt;&#x2F;span&gt;&lt;span&gt;$ perlbrew switch 5.28.2
&lt;&#x2F;span&gt;&lt;span&gt;$ perlbrew install-cpanm
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Now that you have a working, local, user-managed Perl, we&#x27;ll install the
toolset.&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;$ cpanm App::ElasticSearch::Utilities
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The utilities and their dependencies will be installed in your local,
user-managed Perl path.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;some-utilities-installed&quot;&gt;(Some) Utilities Installed&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;es-alias-manager.pl&lt;&#x2F;code&gt; - Alternative to
&lt;a href=&quot;https:&#x2F;&#x2F;www.elastic.co&#x2F;guide&#x2F;en&#x2F;elasticsearch&#x2F;client&#x2F;curator&#x2F;current&#x2F;index.html&quot;&gt;curator&lt;&#x2F;a&gt;
for managing aliases for indexes&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;es-apply-settings.pl&lt;&#x2F;code&gt; - Applies settings to an index based on index name
and age.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;es-copy-index.pl&lt;&#x2F;code&gt; - Tool for copying all (or based on a search) documents
from an index on the same or a different cluster to another index,
optionally supports supplying alternate settings&#x2F;mappings for the
destination index if it&#x27;s being created&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;es-daily-index-maintenance.pl&lt;&#x2F;code&gt; - Alternative to
&lt;a href=&quot;https:&#x2F;&#x2F;www.elastic.co&#x2F;guide&#x2F;en&#x2F;elasticsearch&#x2F;client&#x2F;curator&#x2F;current&#x2F;index.html&quot;&gt;curator&lt;&#x2F;a&gt;
for maintaining index life spans&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;es-graphite-dynamic.pl&lt;&#x2F;code&gt; - Script to extract ElasticSearch Performance
metrics into &lt;a href=&quot;https:&#x2F;&#x2F;graphiteapp.org&quot;&gt;Graphite&lt;&#x2F;a&gt; directly or via
collectd&#x2F;diamond.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;es-status.pl&lt;&#x2F;code&gt; - A quick &amp;quot;how&#x27;s the cluster&amp;quot; status overview&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;es-storage-overview.pl&lt;&#x2F;code&gt; - Check how much storage each node and&#x2F;or index is
consuming in the cluster.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;And finally, the tool I&#x27;m going to be talking about: &lt;code&gt;es-search.pl&lt;&#x2F;code&gt;.  This is
a tool designed with the UNIX philosophy in mind to enable workflows where the
output of one query can be fed into another.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;configuration&quot;&gt;Configuration&lt;&#x2F;h2&gt;
&lt;p&gt;In order to ensure we have the most fun with the tool, let&#x27;s set up some
defaults to make our command lines shorter.  All of the scripts (and if you&#x27;re
so inclined, the entirety of the &lt;code&gt;App::ElasticSearch::Utilities&lt;&#x2F;code&gt; functions)
use this config file to determine how to find, connect, and talk to your
ElasticSearch cluster.&lt;&#x2F;p&gt;
&lt;p&gt;Create a &lt;code&gt;~&#x2F;.es-utils.yaml&lt;&#x2F;code&gt; file with something like this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;yaml&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-yaml &quot;&gt;&lt;code class=&quot;language-yaml&quot; data-lang=&quot;yaml&quot;&gt;&lt;span&gt;---
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;host&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;localhost
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;port&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;9200
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;base&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;syslog
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;days&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;1
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;timestamp&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;@timestamp&amp;#39;
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;host&lt;&#x2F;code&gt; - The hostname or IP of the node you&#x27;d like to connect
to, defaults to &lt;strong&gt;localhost&lt;&#x2F;strong&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;port&lt;&#x2F;code&gt; - The port to use to connect, the default is &lt;strong&gt;9200&lt;&#x2F;strong&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;base&lt;&#x2F;code&gt; - Default index base name, defaults to &lt;strong&gt;logstash&lt;&#x2F;strong&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;days&lt;&#x2F;code&gt; - Default number of days to search, defaults to &lt;strong&gt;7&lt;&#x2F;strong&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;timestamp&lt;&#x2F;code&gt; - Default name of the field containing the timestamp for logging
events, defaults to &lt;strong&gt;@timestamp&lt;&#x2F;strong&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
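&lt;p&gt;To make the effect of &lt;code&gt;base&lt;&#x2F;code&gt; and &lt;code&gt;days&lt;&#x2F;code&gt; concrete, here is a minimal sketch in Python
(an illustration only, not the tool&#x27;s Perl code).  It overlays a user config on
the defaults listed above and, assuming daily indexes named
&lt;code&gt;base-YYYY.MM.DD&lt;&#x2F;code&gt;, lists the index names a search would cover.  The names
&lt;code&gt;DEFAULTS&lt;&#x2F;code&gt;, &lt;code&gt;effective_config&lt;&#x2F;code&gt;, and &lt;code&gt;daily_indexes&lt;&#x2F;code&gt; are hypothetical.&lt;&#x2F;p&gt;

```python
from datetime import date, timedelta

# Defaults as described in the list above.
DEFAULTS = {
    "host": "localhost",
    "port": 9200,
    "base": "logstash",
    "days": 7,
    "timestamp": "@timestamp",
}


def effective_config(user_config):
    """Overlay the user's settings on the defaults."""
    merged = dict(DEFAULTS)
    merged.update(user_config)
    return merged


def daily_indexes(config, today):
    """Index names for the last config['days'] days, newest first.

    Assumes one index per day named base-YYYY.MM.DD.
    """
    return [
        "{}-{}".format(config["base"], (today - timedelta(days=n)).strftime("%Y.%m.%d"))
        for n in range(config["days"])
    ]


config = effective_config({"base": "syslog", "days": 1})
print(daily_indexes(config, date(2019, 5, 19)))
```

&lt;p&gt;With &lt;code&gt;base: syslog&lt;&#x2F;code&gt; and &lt;code&gt;days: 1&lt;&#x2F;code&gt;, the sketch yields a single index name for the current day.&lt;&#x2F;p&gt;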
&lt;h3 id=&quot;index-bases&quot;&gt;Index Bases&lt;&#x2F;h3&gt;
&lt;p&gt;The idea behind this tool is to make things as simple as possible.  If you&#x27;re
like me, you probably use index names to differentiate where shards are
allocated and, ultimately, how long shards will exist on your cluster.  On
large indices, where data is variably interesting, I tend to use this pattern:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;I want to index HTTP access logs, I&#x27;ll designate the mappings keying off
the pattern: &lt;code&gt;*-access-*&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;My logs span multiple datacenters, so I&#x27;ll set allocation rules to make
shards in each datacenter &lt;strong&gt;stay&lt;&#x2F;strong&gt; in that datacenter.  If my datacenter tag
is &lt;code&gt;sfo&lt;&#x2F;code&gt;, I&#x27;d set a pattern &lt;code&gt;sfo-*&lt;&#x2F;code&gt; to grab those shards&lt;&#x2F;li&gt;
&lt;li&gt;There may be &lt;em&gt;lower value&lt;&#x2F;em&gt; data in the logs, like requests for images, CSS, or
JavaScript assets.  I want these around, but when they&#x27;re 90% of my logging
volume and they generally become less interesting more quickly, I&#x27;ll want
shorter retention rules applied to them.  These indexes might include a tag in
the index name of &lt;code&gt;*-bulk-*&lt;&#x2F;code&gt; to make them distinguishable.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;At the end of this madness I might have a list of indexes like:&lt;&#x2F;p&gt;
&lt;table class=&quot;table table-sm table-bordered table-striped&quot;&gt;
&lt;thead class=&quot;thead-dark&quot;&gt;
&lt;tr&gt;&lt;th&gt;Index Name&lt;&#x2F;th&gt;&lt;th&gt;Alias&lt;&#x2F;th&gt;&lt;th&gt;Retention&lt;&#x2F;th&gt;&lt;th&gt;Content&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt; ams-access-2019.05.19 &lt;&#x2F;td&gt;&lt;td&gt; access-2019.05.19 &lt;&#x2F;td&gt;&lt;td&gt; 90d &lt;&#x2F;td&gt;&lt;td&gt; Normal access logs for &lt;code&gt;ams&lt;&#x2F;code&gt; servers &lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt; ams-access-bulk-2019.05.19 &lt;&#x2F;td&gt;&lt;td&gt; access-2019.05.19 &lt;&#x2F;td&gt;&lt;td&gt; 7d &lt;&#x2F;td&gt;&lt;td&gt; Uninteresting access logs for &lt;code&gt;ams&lt;&#x2F;code&gt; servers &lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt; ams-syslog-2019.05.19 &lt;&#x2F;td&gt;&lt;td&gt; syslog-2019.05.19 &lt;&#x2F;td&gt;&lt;td&gt; 90d &lt;&#x2F;td&gt;&lt;td&gt; Syslog data for &lt;code&gt;ams&lt;&#x2F;code&gt; servers &lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt; sfo-access-2019.05.19 &lt;&#x2F;td&gt;&lt;td&gt; access-2019.05.19 &lt;&#x2F;td&gt;&lt;td&gt; 90d &lt;&#x2F;td&gt;&lt;td&gt; Normal access logs for &lt;code&gt;sfo&lt;&#x2F;code&gt; servers &lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt; sfo-access-bulk-2019.05.19 &lt;&#x2F;td&gt;&lt;td&gt; access-2019.05.19 &lt;&#x2F;td&gt;&lt;td&gt; 7d &lt;&#x2F;td&gt;&lt;td&gt; Uninteresting access logs for &lt;code&gt;sfo&lt;&#x2F;code&gt; servers &lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt; sfo-syslog-2019.05.19 &lt;&#x2F;td&gt;&lt;td&gt; syslog-2019.05.19 &lt;&#x2F;td&gt;&lt;td&gt; 90d &lt;&#x2F;td&gt;&lt;td&gt; Syslog data for &lt;code&gt;sfo&lt;&#x2F;code&gt; servers &lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;
&lt;&#x2F;table&gt;
&lt;p&gt;If I wanted to search those indexes, I could just use &lt;code&gt;--base access&lt;&#x2F;code&gt;, as they
will all be parsed to the correct bases.  If you&#x27;re not sure which &lt;em&gt;bases&lt;&#x2F;em&gt;
&lt;code&gt;es-search.pl&lt;&#x2F;code&gt; thinks you have available, ask it to tell
you:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;$ es-search.pl --bases
&lt;&#x2F;span&gt;&lt;span&gt;Bases available for search:
&lt;&#x2F;span&gt;&lt;span&gt;	access
&lt;&#x2F;span&gt;&lt;span&gt;    ams-access
&lt;&#x2F;span&gt;&lt;span&gt;    ams-access-bulk
&lt;&#x2F;span&gt;&lt;span&gt;    ams-syslog
&lt;&#x2F;span&gt;&lt;span&gt;    sfo-access
&lt;&#x2F;span&gt;&lt;span&gt;    sfo-access-bulk
&lt;&#x2F;span&gt;&lt;span&gt;    sfo-syslog
&lt;&#x2F;span&gt;&lt;span&gt;    syslog
&lt;&#x2F;span&gt;&lt;span&gt;# Bases: 8 from a combined 6 indices.
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
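&lt;p&gt;One way to picture how a base selects indexes is the Python sketch below.  It is
only a guess at the idea, not the tool&#x27;s actual parsing rules: it strips an
assumed trailing &lt;code&gt;YYYY.MM.DD&lt;&#x2F;code&gt; date and treats a base as matching when its
dash-separated parts appear together in the remaining name.  The function names
&lt;code&gt;strip_date&lt;&#x2F;code&gt; and &lt;code&gt;matches_base&lt;&#x2F;code&gt; are hypothetical.&lt;&#x2F;p&gt;

```python
import re

# Assumed date suffix: a separator followed by YYYY.MM.DD at the end of the name.
DATE_SUFFIX = re.compile(r"[-_]\d{4}\.\d{2}\.\d{2}$")


def strip_date(index):
    """Drop the trailing date from an index name."""
    return DATE_SUFFIX.sub("", index)


def matches_base(index, base):
    """True if the base's dash-separated parts appear contiguously in the index name."""
    parts = strip_date(index).split("-")
    wanted = base.split("-")
    width = len(wanted)
    return any(parts[i:i + width] == wanted for i in range(len(parts) - width + 1))


indexes = [
    "ams-access-2019.05.19",
    "ams-access-bulk-2019.05.19",
    "ams-syslog-2019.05.19",
    "sfo-access-2019.05.19",
    "sfo-access-bulk-2019.05.19",
    "sfo-syslog-2019.05.19",
]

# Select every index that a search with the base "access" would cover in this sketch.
print([name for name in indexes if matches_base(name, "access")])
```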
&lt;h3 id=&quot;handling-more-than-one-index-base-with-ease&quot;&gt;Handling More Than One Index Base with Ease!&lt;&#x2F;h3&gt;
&lt;p&gt;That&#x27;s all fine and good if all of your indexes contain the same document
types.  That&#x27;s unlikely as you should be splitting different document types up
into separate indices, if not clusters.  If you want to work with
&lt;code&gt;es-search.pl&lt;&#x2F;code&gt; across all those indexes easily, it will need to know the
correct timestamp field.  To enable per-base timestamp fields, you can just
add a &lt;code&gt;meta&lt;&#x2F;code&gt; section to your &lt;code&gt;~&#x2F;.es-utils.yaml&lt;&#x2F;code&gt; file.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;yaml&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-yaml &quot;&gt;&lt;code class=&quot;language-yaml&quot; data-lang=&quot;yaml&quot;&gt;&lt;span&gt;---
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;host&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;localhost
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;port&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;9200
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;base&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;syslog
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;days&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;1
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;meta&lt;&#x2F;span&gt;&lt;span&gt;:
&lt;&#x2F;span&gt;&lt;span&gt;  &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;access&lt;&#x2F;span&gt;&lt;span&gt;:
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;timestamp&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;timestamp
&lt;&#x2F;span&gt;&lt;span&gt;  &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;ossec&lt;&#x2F;span&gt;&lt;span&gt;:
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;timestamp&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;ts
&lt;&#x2F;span&gt;&lt;span&gt;  &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;zeek&lt;&#x2F;span&gt;&lt;span&gt;:
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;timestamp&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;event_ts
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Now &lt;code&gt;es-search.pl&lt;&#x2F;code&gt; and the rest of the utilities will know that when you
specify &lt;code&gt;--base zeek&lt;&#x2F;code&gt; the timestamp field to sort on will be &lt;code&gt;event_ts&lt;&#x2F;code&gt; and
you won&#x27;t need to think about adding &lt;code&gt;--timestamp event_ts&lt;&#x2F;code&gt; to the command
line.&lt;&#x2F;p&gt;
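&lt;p&gt;The lookup this enables can be sketched in a few lines of Python.  This is an
illustration rather than the utilities&#x27; real code, and the precedence shown
(command line first, then the per-base &lt;code&gt;meta&lt;&#x2F;code&gt; entry, then the global
setting, then &lt;code&gt;@timestamp&lt;&#x2F;code&gt;) is an assumption.  The function name
&lt;code&gt;timestamp_field&lt;&#x2F;code&gt; is hypothetical.&lt;&#x2F;p&gt;

```python
# The meta section from the YAML above, as a Python dict.
config = {
    "base": "syslog",
    "days": 1,
    "meta": {
        "access": {"timestamp": "timestamp"},
        "ossec": {"timestamp": "ts"},
        "zeek": {"timestamp": "event_ts"},
    },
}


def timestamp_field(config, base, override=None):
    """Pick the timestamp field for a base.

    Assumed order: explicit override, per-base meta, global setting, default.
    """
    if override is not None:
        return override
    per_base = config.get("meta", {}).get(base, {})
    return per_base.get("timestamp", config.get("timestamp", "@timestamp"))


print(timestamp_field(config, "zeek"))
print(timestamp_field(config, "syslog"))
```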
&lt;h2 id=&quot;seeing-data&quot;&gt;Seeing Data&lt;&#x2F;h2&gt;
&lt;p&gt;Now that you&#x27;re configured, we can just run: &lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;$ es-search.pl
&lt;&#x2F;span&gt;&lt;span&gt;= Querying Indexes: syslog-2019.05.19
&lt;&#x2F;span&gt;&lt;span&gt;---
&lt;&#x2F;span&gt;&lt;span&gt;action: connect
&lt;&#x2F;span&gt;&lt;span&gt;hostname: janus
&lt;&#x2F;span&gt;&lt;span&gt;message: &amp;#39;connect from unknown[102.165.34.33]&amp;#39;
&lt;&#x2F;span&gt;&lt;span&gt;proc: smtpd
&lt;&#x2F;span&gt;&lt;span&gt;proc_id: 30775
&lt;&#x2F;span&gt;&lt;span&gt;program: postfix&#x2F;smtpd
&lt;&#x2F;span&gt;&lt;span&gt;src: unknown
&lt;&#x2F;span&gt;&lt;span&gt;src_ip: 102.165.34.33
&lt;&#x2F;span&gt;&lt;span&gt;tags:
&lt;&#x2F;span&gt;&lt;span&gt;  - decoder_syslog
&lt;&#x2F;span&gt;&lt;span&gt;  - mail
&lt;&#x2F;span&gt;&lt;span&gt;  - postfix
&lt;&#x2F;span&gt;&lt;span&gt;timestamp: 2019-05-19T02:07:34.861416
&lt;&#x2F;span&gt;&lt;span&gt;total_time: 0.004363
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;&amp;lt;snip&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;# Search Parameters:
&lt;&#x2F;span&gt;&lt;span&gt;#    {&amp;quot;bool&amp;quot;:{}}
&lt;&#x2F;span&gt;&lt;span&gt;# Displaying 20 of 357 in 0.0584328174591064 seconds.
&lt;&#x2F;span&gt;&lt;span&gt;# Indexes (1 of 1) searched: syslog-2019.05.19
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Each document&#x27;s &lt;code&gt;_source&lt;&#x2F;code&gt; is YAML printed to the screen.  This is not the usual
use case for &lt;code&gt;es-search.pl&lt;&#x2F;code&gt;, so let&#x27;s do better.  It&#x27;s also likely that the
documents you&#x27;re viewing may not contain all the valid fields in the index.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;finding-the-fields-in-the-index&quot;&gt;Finding the Fields in the Index&lt;&#x2F;h2&gt;
&lt;p&gt;When you start working with ElasticSearch indexes, you may not know all the
fields available for search.  &lt;code&gt;es-search.pl&lt;&#x2F;code&gt; allows you to explore a bit:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;$ es-search.pl --base syslog --fields
&lt;&#x2F;span&gt;&lt;span&gt;Fields available for search:
&lt;&#x2F;span&gt;&lt;span&gt;	- action
&lt;&#x2F;span&gt;&lt;span&gt;	- dev
&lt;&#x2F;span&gt;&lt;span&gt;	- dst_geoip.continent
&lt;&#x2F;span&gt;&lt;span&gt;	- dst_geoip.country
&lt;&#x2F;span&gt;&lt;span&gt;	- dst_geoip.location
&lt;&#x2F;span&gt;&lt;span&gt;	- dst_ip
&lt;&#x2F;span&gt;&lt;span&gt;	- dst_port
&lt;&#x2F;span&gt;&lt;span&gt;	- exe
&lt;&#x2F;span&gt;&lt;span&gt;	- file
&lt;&#x2F;span&gt;&lt;span&gt;	- hostname
&lt;&#x2F;span&gt;&lt;span&gt;	- in_bytes
&lt;&#x2F;span&gt;&lt;span&gt;	- message
&lt;&#x2F;span&gt;&lt;span&gt;	- out_bytes
&lt;&#x2F;span&gt;&lt;span&gt;	- proc
&lt;&#x2F;span&gt;&lt;span&gt;	- proc_id
&lt;&#x2F;span&gt;&lt;span&gt;	- program
&lt;&#x2F;span&gt;&lt;span&gt;	- proto_app
&lt;&#x2F;span&gt;&lt;span&gt;	- rec_id
&lt;&#x2F;span&gt;&lt;span&gt;	- src
&lt;&#x2F;span&gt;&lt;span&gt;	- src_geoip.city
&lt;&#x2F;span&gt;&lt;span&gt;	- src_geoip.continent
&lt;&#x2F;span&gt;&lt;span&gt;	- src_geoip.country
&lt;&#x2F;span&gt;&lt;span&gt;	- src_geoip.location
&lt;&#x2F;span&gt;&lt;span&gt;	- src_geoip.postal_code
&lt;&#x2F;span&gt;&lt;span&gt;	- src_ip
&lt;&#x2F;span&gt;&lt;span&gt;	- src_port
&lt;&#x2F;span&gt;&lt;span&gt;	- src_user
&lt;&#x2F;span&gt;&lt;span&gt;	- tags
&lt;&#x2F;span&gt;&lt;span&gt;	- timestamp
&lt;&#x2F;span&gt;&lt;span&gt;	- timing.phase
&lt;&#x2F;span&gt;&lt;span&gt;	- timing.seconds
&lt;&#x2F;span&gt;&lt;span&gt;	- total_time
&lt;&#x2F;span&gt;&lt;span&gt;# Fields: 32 from a combined 1 indices.
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This will help you understand what an index contains.  Maybe you wanna see
what&#x27;s in a field?  There are two ways: the first with search, the second with
aggregations.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;finding-field-values-with-search&quot;&gt;Finding Field Values with Search&lt;&#x2F;h3&gt;
&lt;p&gt;The simplest and least taxing way to ask ElasticSearch what a field contains
is to query the index and return the relevant field.  To optimize for
documents containing the field, we can use the &lt;code&gt;--exists &amp;lt;fieldname&amp;gt;&lt;&#x2F;code&gt; filter.&lt;&#x2F;p&gt;
&lt;p&gt;If I just want to see the most recent 20 documents where the field &lt;code&gt;proc&lt;&#x2F;code&gt;
exists and &lt;em&gt;just&lt;&#x2F;em&gt; see the &lt;code&gt;proc&lt;&#x2F;code&gt; entry, it&#x27;s as simple as:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;$ es-search.pl --exists proc --show proc
&lt;&#x2F;span&gt;&lt;span&gt;= Querying Indexes: syslog-2019.05.19
&lt;&#x2F;span&gt;&lt;span&gt;timestamp    proc
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:04:06.135686    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:04:06.135786    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:04:05.856884    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:03:46.471311    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:03:46.471352    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:03:46.199116    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:03:37.013022    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:03:37.012866    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:03:36.741711    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:03:18.239108    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:03:18.239135    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:03:17.947805    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:03:07.837098    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:03:07.837133    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:03:07.553645    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:03:07.342514    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:03:07.342686    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:03:07.067929    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:02:57.157830    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;2019-05-19T02:02:57.157612    smtpd
&lt;&#x2F;span&gt;&lt;span&gt;# Search Parameters:
&lt;&#x2F;span&gt;&lt;span&gt;#    {&amp;quot;bool&amp;quot;:{&amp;quot;must&amp;quot;:[{&amp;quot;exists&amp;quot;:{&amp;quot;field&amp;quot;:&amp;quot;proc&amp;quot;}}]}}
&lt;&#x2F;span&gt;&lt;span&gt;# Displaying 20 of 85 in 0.0445699691772461 seconds.
&lt;&#x2F;span&gt;&lt;span&gt;# Indexes (1 of 1) searched: syslog-2019.05.19
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This might not give me the best understanding of what the field is, but
already, I know that &lt;code&gt;postfix&lt;&#x2F;code&gt; log entries are setting this field.&lt;&#x2F;p&gt;
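&lt;p&gt;The &lt;code&gt;Search Parameters&lt;&#x2F;code&gt; line in that output is ElasticSearch query DSL.  As a
rough Python sketch (not necessarily the request &lt;code&gt;es-search.pl&lt;&#x2F;code&gt; sends), a
full search body for the same question might look like the one below.  The helper
name &lt;code&gt;exists_search&lt;&#x2F;code&gt; is hypothetical, and the &lt;code&gt;sort&lt;&#x2F;code&gt; and &lt;code&gt;_source&lt;&#x2F;code&gt; entries
are assumptions about how &quot;most recent 20&quot; and &lt;code&gt;--show&lt;&#x2F;code&gt; could translate.&lt;&#x2F;p&gt;

```python
import json


def exists_search(field, timestamp="timestamp", size=20):
    """Search body: newest documents that have the field, returning only that field."""
    return {
        "size": size,
        "sort": [{timestamp: {"order": "desc"}}],
        "_source": [timestamp, field],
        "query": {"bool": {"must": [{"exists": {"field": field}}]}},
    }


body = exists_search("proc")
# Print only the query part, in compact form, for comparison with Search Parameters.
print(json.dumps(body["query"], separators=(",", ":")))
```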
&lt;h3 id=&quot;finding-field-values-with-aggregations&quot;&gt;Finding Field Values with Aggregations&lt;&#x2F;h3&gt;
&lt;p&gt;We can do a lot better by leveraging aggregations in ElasticSearch.  To do so,
we ask &lt;code&gt;es-search.pl&lt;&#x2F;code&gt; for the top values.&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;$ es-search.pl --top proc
&lt;&#x2F;span&gt;&lt;span&gt;= Querying Indexes: syslog-2019.05.19
&lt;&#x2F;span&gt;&lt;span&gt;count    proc
&lt;&#x2F;span&gt;&lt;span&gt;224  smtpd
&lt;&#x2F;span&gt;&lt;span&gt;27   smtps_smtpd
&lt;&#x2F;span&gt;&lt;span&gt;12   qmgr
&lt;&#x2F;span&gt;&lt;span&gt;9    localsmtp_smtpd
&lt;&#x2F;span&gt;&lt;span&gt;6    cleanup
&lt;&#x2F;span&gt;&lt;span&gt;4    submission_smtpd
&lt;&#x2F;span&gt;&lt;span&gt;3    anvil
&lt;&#x2F;span&gt;&lt;span&gt;3    lmtp
&lt;&#x2F;span&gt;&lt;span&gt;3    pipe
&lt;&#x2F;span&gt;&lt;span&gt;# Search Parameters:
&lt;&#x2F;span&gt;&lt;span&gt;#    {&amp;quot;bool&amp;quot;:{}}
&lt;&#x2F;span&gt;&lt;span&gt;# Displaying 9 of 693 in 0.00798892974853516 seconds.
&lt;&#x2F;span&gt;&lt;span&gt;# Indexes (1 of 1) searched: syslog-2019.05.19
&lt;&#x2F;span&gt;&lt;span&gt;#
&lt;&#x2F;span&gt;&lt;span&gt;# Totals across batch
&lt;&#x2F;span&gt;&lt;span&gt;#
&lt;&#x2F;span&gt;&lt;span&gt;count    proc
&lt;&#x2F;span&gt;&lt;span&gt;224  smtpd
&lt;&#x2F;span&gt;&lt;span&gt;27   smtps_smtpd
&lt;&#x2F;span&gt;&lt;span&gt;12   qmgr
&lt;&#x2F;span&gt;&lt;span&gt;9    localsmtp_smtpd
&lt;&#x2F;span&gt;&lt;span&gt;6    cleanup
&lt;&#x2F;span&gt;&lt;span&gt;4    submission_smtpd
&lt;&#x2F;span&gt;&lt;span&gt;3    anvil
&lt;&#x2F;span&gt;&lt;span&gt;3    pipe
&lt;&#x2F;span&gt;&lt;span&gt;3    lmtp
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;We now have the top 20 (or fewer if there aren&#x27;t 20 in total) values in the &lt;code&gt;proc&lt;&#x2F;code&gt;
field.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;putting-it-together&quot;&gt;Putting It Together&lt;&#x2F;h3&gt;
&lt;p&gt;It looks like &lt;code&gt;proc&lt;&#x2F;code&gt; is the &lt;em&gt;component&lt;&#x2F;em&gt; piece for &lt;code&gt;postfix&lt;&#x2F;code&gt; syslog data.  To be
sure, let&#x27;s ask ElasticSearch for the top programs with the top 10 procs each.
Since &lt;code&gt;es-search.pl&lt;&#x2F;code&gt; is designed to make this easy, we type almost exactly
that:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;$ es-search.pl --top program --with proc:10 --exists proc
&lt;&#x2F;span&gt;&lt;span&gt;= Querying Indexes: syslog-2019.05.19
&lt;&#x2F;span&gt;&lt;span&gt;count  program
&lt;&#x2F;span&gt;&lt;span&gt;224  postfix&#x2F;smtpd              terms.proc    smtpd    224
&lt;&#x2F;span&gt;&lt;span&gt;35   postfix&#x2F;smtps&#x2F;smtpd    	terms.proc    smtps_smtpd    35
&lt;&#x2F;span&gt;&lt;span&gt;12   postfix&#x2F;qmgr    			terms.proc    qmgr    12
&lt;&#x2F;span&gt;&lt;span&gt;9    postfix&#x2F;localsmtp&#x2F;smtpd    terms.proc    localsmtp_smtpd    9
&lt;&#x2F;span&gt;&lt;span&gt;6    postfix&#x2F;anvil    			terms.proc    anvil    6
&lt;&#x2F;span&gt;&lt;span&gt;6    postfix&#x2F;cleanup            terms.proc    cleanup    6
&lt;&#x2F;span&gt;&lt;span&gt;6    postfix&#x2F;submission&#x2F;smtpd   terms.proc    submission_smtpd    6
&lt;&#x2F;span&gt;&lt;span&gt;3    postfix&#x2F;lmtp               terms.proc    lmtp    3
&lt;&#x2F;span&gt;&lt;span&gt;3    postfix&#x2F;pipe               terms.proc    pipe    3
&lt;&#x2F;span&gt;&lt;span&gt;# Search Parameters:
&lt;&#x2F;span&gt;&lt;span&gt;#    {&amp;quot;bool&amp;quot;:{&amp;quot;must&amp;quot;:[{&amp;quot;exists&amp;quot;:{&amp;quot;field&amp;quot;:&amp;quot;proc&amp;quot;}}]}}
&lt;&#x2F;span&gt;&lt;span&gt;# Displaying 9 of 304 in 0.0130970478057861 seconds.
&lt;&#x2F;span&gt;&lt;span&gt;# Indexes (1 of 1) searched: syslog-2019.05.19
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Let&#x27;s break down that query:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;--top program&lt;&#x2F;code&gt; - Top aggregation, infers &lt;code&gt;terms&lt;&#x2F;code&gt;, uses the value of &lt;code&gt;--size&lt;&#x2F;code&gt; which defaults to &lt;em&gt;20&lt;&#x2F;em&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;--with proc:10&lt;&#x2F;code&gt; - Sub aggregation, form is &lt;strong&gt;agg_type&lt;&#x2F;strong&gt;:&lt;strong&gt;field_name&lt;&#x2F;strong&gt;:&lt;strong&gt;sub_size&lt;&#x2F;strong&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;agg_type&lt;&#x2F;code&gt; - defaults to terms and can be omitted, but can also be: &lt;code&gt;significant_terms&lt;&#x2F;code&gt;,&lt;code&gt;max&lt;&#x2F;code&gt;, &lt;code&gt;min&lt;&#x2F;code&gt;, &lt;code&gt;sum&lt;&#x2F;code&gt;, &lt;code&gt;avg&lt;&#x2F;code&gt;, &lt;code&gt;cardinality&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;field_name&lt;&#x2F;code&gt; - is required and is the sub aggregate field name&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;sub_size&lt;&#x2F;code&gt; - defaults to &lt;strong&gt;3&lt;&#x2F;strong&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;--exists proc&lt;&#x2F;code&gt; - Filter the entire aggregation to just documents with the proc field&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
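&lt;p&gt;The breakdown above maps fairly directly onto a nested aggregation.  Here is a
minimal Python sketch of that mapping, offered as an illustration and not as the
tool&#x27;s implementation.  The function names, the aggregation labels &lt;code&gt;top&lt;&#x2F;code&gt; and
&lt;code&gt;with&lt;&#x2F;code&gt;, and the rule for telling an omitted &lt;code&gt;agg_type&lt;&#x2F;code&gt; apart from a field
name are all assumptions.&lt;&#x2F;p&gt;

```python
# Aggregation types named in the list above.
AGG_TYPES = {"terms", "significant_terms", "max", "min", "sum", "avg", "cardinality"}


def parse_with(spec):
    """Split agg_type:field_name:sub_size, filling in the documented defaults."""
    parts = spec.split(":")
    agg_type, size = "terms", 3
    # Assumption: a leading known aggregation name followed by more parts is the agg_type.
    if parts[0] in AGG_TYPES and len(parts) != 1:
        agg_type = parts.pop(0)
    field = parts.pop(0)
    if parts:
        size = int(parts[0])
    return agg_type, field, size


def top_with(top_field, with_spec, size=20):
    """Aggregation body: top values of top_field, each with one sub-aggregation."""
    agg_type, field, sub_size = parse_with(with_spec)
    sub = {"field": field}
    if agg_type in ("terms", "significant_terms"):
        sub["size"] = sub_size  # metric aggregations such as max take no size
    return {
        "top": {
            "terms": {"field": top_field, "size": size},
            "aggs": {"with": {agg_type: sub}},
        }
    }


print(parse_with("proc:10"))
print(top_with("program", "proc:10"))
```

&lt;p&gt;In this sketch, &lt;code&gt;proc:10&lt;&#x2F;code&gt; becomes a &lt;code&gt;terms&lt;&#x2F;code&gt; sub-aggregation on &lt;code&gt;proc&lt;&#x2F;code&gt; with a size of 10.&lt;&#x2F;p&gt;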
&lt;h2 id=&quot;wrapping-up-for-now&quot;&gt;Wrapping up for now&lt;&#x2F;h2&gt;
&lt;p&gt;I think this is a reasonable point to pause.  This provides you with enough
information to start getting your feet wet with the tool.  In the next part,
I&#x27;ll examine building useful queries and how this tool enables pivoting and
data exploration.&lt;&#x2F;p&gt;
&lt;p&gt;If you can&#x27;t wait til next time, run &lt;code&gt;es-search.pl --manual&lt;&#x2F;code&gt; to get an
in-depth overview of the options available.  See below for that man page online:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;GitHub Project Page: &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;reyjrar&#x2F;es-utils&#x2F;&quot;&gt;reyjrar&#x2F;es-utils&lt;&#x2F;a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;es-search.pl&lt;&#x2F;code&gt; &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;reyjrar&#x2F;es-utils&#x2F;blob&#x2F;master&#x2F;Searching.mkdn&quot;&gt;man page&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;MetaCPAN Project Page: &lt;a href=&quot;https:&#x2F;&#x2F;metacpan.org&#x2F;pod&#x2F;App::ElasticSearch::Utilities&quot;&gt;BLHOTSKY&#x2F;App-ElasticSearch-Utilities&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
	</entry>
	<entry xml:lang="en">
		<title>systemd-resolved is broken</title>
		<published>2017-12-20T00:00:00+00:00</published>
		<updated>2017-12-20T00:00:00+00:00</updated>
		<link rel="alternate" href="https://divisionbyzero.net/systemd-resolved-is-broken/" type="text/html"/>
		<id>https://divisionbyzero.net/systemd-resolved-is-broken/</id>
		<content type="html">&lt;p&gt;Full disclosure, I&#x27;m not a fan of systemd.  I started working with Linux in
the late 90&#x27;s and watched it grow from a marginalized operating system to the
most dominant operating system in the datacenter.  I&#x27;ve lived through so many
&amp;quot;year of the Linux desktop&amp;quot; years I remember when it wasn&#x27;t a joke.  From my
vantage point, administering Linux servers professionally for nearly 20 years,
systemd is Linux on the desktop at the cost of Linux in the datacenter.&lt;&#x2F;p&gt;
&lt;p&gt;Why do I feel this way? It&#x27;s mostly the reinvention and incorrect
implementations of core UNIX tools and modalities.  There&#x27;s a lot of
information on systemd out there.  There&#x27;s a lot of bias involved.  So, today,
I&#x27;m not going to talk about that.  I am going to address a critical mistake in
the systemd-resolved daemon which implements DNS lookups for systems running
systemd.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;ll jump right to the work-around.  If you&#x27;re running a system which is using
systemd, you should probably be running systemd-resolved configured to use a
single DNS resolver, 127.0.0.1, and run &lt;a href=&quot;https:&#x2F;&#x2F;unbound.net&#x2F;&quot;&gt;Unbound&lt;&#x2F;a&gt;.
There are resources on how to configure and run Unbound, but the best is
&lt;a href=&quot;https:&#x2F;&#x2F;calomel.org&#x2F;unbound_dns.html&quot;&gt;Calomel&#x27;s Unbound Tutorial&lt;&#x2F;a&gt;. If you
need to maintain consistent, reliable DNS resolution that&#x27;s compatible with
previous versions of Linux, the only way to do that is to have a single DNS
server in &#x2F;etc&#x2F;resolv.conf.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;&lt;h2 id=&quot;why-this-matters&quot;&gt;Why This Matters&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;systemd&#x2F;systemd&#x2F;issues&#x2F;5755&quot;&gt;This thread on
systemd-resolved&lt;&#x2F;a&gt; explains the
issue.  Yes, putting external DNS servers into your internal servers&#x27;
&#x2F;etc&#x2F;resolv.conf is not great form, but that&#x27;s completely missing the point
exposed in this bug report.&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;systemd-resolved is implementing state tracking against a stateless
protocol.&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;Not only that, but it does it poorly.  In the cases described by the
commenters, a temporary blip in connectivity to internal DNS servers wound up
blacklisting them indefinitely.  In my nearly 20 years as a Linux admin, I&#x27;ve
seen nearly every junior admin come up with the same idea after their first
DNS outage, &amp;quot;Why don&#x27;t we just keep track of what DNS servers respond and then
ignore ones that are failing?&amp;quot;  It sounds great, but because DNS is a
stateless protocol by design, determining &amp;quot;working server&amp;quot; from &amp;quot;not working
server&amp;quot; is profoundly more difficult than issuing an HTTP request to a status
handler.  It&#x27;s complicated.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-old-behavior&quot;&gt;The Old Behavior&lt;&#x2F;h2&gt;
&lt;p&gt;There are a lot of misconceptions about glibc&#x27;s resolver library, so I&#x27;m
hoping to squash a bit of that and address &lt;em&gt;how&lt;&#x2F;em&gt; most name resolutions work on
most Linux systems.  Yes, it&#x27;s possible to use a different resolver library
and those libraries may not implement resolution the same way.  However, I want to
talk about the glibc resolver and how it interacts with &#x2F;etc&#x2F;resolv.conf on a
stock CentOS 6 system and every UNIX and Linux prior.&lt;&#x2F;p&gt;
&lt;p&gt;Here&#x27;s a sample &#x2F;etc&#x2F;resolv.conf:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;search edgeofsanity.net
&lt;&#x2F;span&gt;&lt;span&gt;nameserver 192.168.1.1
&lt;&#x2F;span&gt;&lt;span&gt;nameserver 9.9.9.9
&lt;&#x2F;span&gt;&lt;span&gt;nameserver 8.8.8.8
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;First things first, &#x2F;etc&#x2F;resolv.conf supports only three nameservers, and
further servers are ignored.  I&#x27;ve seen up to eight servers in resolv.conf files
administered by experienced, knowledgeable folks. Remember, only the first
three are ever queried.&lt;&#x2F;p&gt;
&lt;p&gt;So what happens with this resolv.conf?  Well, if 192.168.1.1 is responding to
queries, it will always be used to resolve every query.  If a query goes
unanswered past the timeout (5 seconds by default), the query will be
resent to 192.168.1.1 once more before advancing to 9.9.9.9.  These counters
are tracked internally by the &lt;strong&gt;process&lt;&#x2F;strong&gt; running the resolver library.  These
are not global counters, they are local to each &lt;strong&gt;process&lt;&#x2F;strong&gt;.  This particular
failure case is also &lt;strong&gt;per-query&lt;&#x2F;strong&gt;, meaning each DNS query will have to
time out twice against 192.168.1.1 before advancing to 9.9.9.9.&lt;&#x2F;p&gt;
&lt;p&gt;Why? Well, a timeout could happen for any number of reasons.  A timeout of a
nameserver for one query doesn&#x27;t predict a timeout in the future to the same
server for the same query.  It&#x27;s complicated.&lt;&#x2F;p&gt;
&lt;p&gt;What this configuration guarantees is that every query will take at least 10
seconds to resolve if 192.168.1.1 is down.  This is less than ideal, so we can
&lt;em&gt;improve&lt;&#x2F;em&gt; that a little by adding options.&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;search edgeofsanity.net
&lt;&#x2F;span&gt;&lt;span&gt;nameserver 192.168.1.1
&lt;&#x2F;span&gt;&lt;span&gt;nameserver 9.9.9.9
&lt;&#x2F;span&gt;&lt;span&gt;nameserver 8.8.8.8
&lt;&#x2F;span&gt;&lt;span&gt;options timeout:1 attempts:1
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;By setting &lt;code&gt;timeout&lt;&#x2F;code&gt; to 1 second and &lt;code&gt;attempts&lt;&#x2F;code&gt; to 1, we&#x27;ll try 9.9.9.9 if
192.168.1.1 doesn&#x27;t respond within 1 second.  Again, this is per-query,
per-process, so every query will always try 192.168.1.1 before moving on to
9.9.9.9, because, repeat after me, &amp;quot;a timeout of a single query to a single DNS
server cannot predict that even the same query to the same server will time out
at any point in the future.&amp;quot;&lt;&#x2F;p&gt;
&lt;p&gt;This &lt;em&gt;improves&lt;&#x2F;em&gt; the failure case for 192.168.1.1 becoming unavailable, but
it&#x27;s still 1+ second for every DNS query, which is unacceptably slow for any
web-scale service.  There&#x27;s another option we can introduce to decrease the
impact the DNS server being unavailable has on our servers:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;search edgeofsanity.net
&lt;&#x2F;span&gt;&lt;span&gt;nameserver 192.168.1.1
&lt;&#x2F;span&gt;&lt;span&gt;nameserver 9.9.9.9
&lt;&#x2F;span&gt;&lt;span&gt;nameserver 8.8.8.8
&lt;&#x2F;span&gt;&lt;span&gt;options timeout:1 attempts:1 rotate
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;We introduce the &lt;code&gt;rotate&lt;&#x2F;code&gt; option to the config file.  If you were to run
&lt;code&gt;while true; do getent hosts www.google.com; done&lt;&#x2F;code&gt;, you&#x27;d probably be surprised
to see &lt;strong&gt;EVERY&lt;&#x2F;strong&gt; query going to 192.168.1.1.  Maybe you can guess why that is?
That&#x27;s right! The &lt;code&gt;rotate&lt;&#x2F;code&gt; option is per-process, so each time we run &lt;code&gt;getent&lt;&#x2F;code&gt;
we start a new process, which starts at the first name server for the first
query and continues on to the next server for the next query.  Failures
per-query are still processed the same way.&lt;&#x2F;p&gt;
&lt;p&gt;If you had a failure of 192.168.1.1, you&#x27;d have more than 33% of DNS queries
taking 1+ seconds to resolve.  Why? Again, &lt;code&gt;rotate&lt;&#x2F;code&gt; is per-process, so long-
running processes will rotate through the bad server every 3 queries.
However, every new process will always start at the beginning of the list.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-new-behavior&quot;&gt;The New Behavior&lt;&#x2F;h2&gt;
&lt;p&gt;OK, so what&#x27;s described in the GitHub issue is systemd-resolved&#x27;s author
deciding to break a fundamental design in the DNS resolution on UNIX systems.
Servers are &lt;em&gt;never&lt;&#x2F;em&gt; skipped in the previous glibc resolver world.  This is
because, and I&#x27;ll say it again, a timeout for a single DNS query to a single
DNS server does not predict a timeout for that same query to that same server
at any point in the future. The systemd-resolved behavior now adds this state
to a stateless protocol, which leads to unpredictable and inconsistent
behavior in one of the lowest-level, most misunderstood, and most critical
components in your infrastructure.&lt;&#x2F;p&gt;
&lt;p&gt;There is a way to work around this: if every DNS server in the list of DNS
servers is marked as being problematic, systemd-resolved falls back to the
default behavior of going through every server in the list and resetting their
state.  The easiest way to ensure this happens is to list a single nameserver
in the &#x2F;etc&#x2F;resolv.conf settings.  This will force a short circuiting in the
state tracking logic.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;in-closing&quot;&gt;In Closing&lt;&#x2F;h2&gt;
&lt;p&gt;I&#x27;m not going to bash systemd or any of its authors or maintainers.  They&#x27;re
doing their best to solve hard problems.  I do disagree fundamentally with
their direction and assumptions, but they&#x27;re writing code and dealing with
angry communities, and I won&#x27;t pile on.  However, this behavior is
fundamentally different than everything else in the space and represents what
I fear is a naivety and disinterest in understanding the problem space.  If
you administer Linux systems professionally, you need to be aware of this
difference and how it will impact your infrastructure if there are issues with
upstream DNS providers.&lt;&#x2F;p&gt;
&lt;p&gt;It&#x27;s entirely possible this change in behavior will have no or very little
impact on your infrastructure.  It&#x27;s important to understand this difference
as DNS is often impacted by or impacting the availability of your services.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;updates&quot;&gt;Updates&lt;&#x2F;h2&gt;
&lt;p&gt;First, I got something wrong.  In the case where we have &lt;code&gt;rotate&lt;&#x2F;code&gt; enabled and 3
nameservers, approximately 50% of queries will take 1+ seconds to resolve.
This is because the state isn&#x27;t magic, it&#x27;s a simple pointer that&#x27;s
incremented each time.  Consider, query #1 goes to 192.168.1.1, it times out,
the pointer is advanced to 9.9.9.9 and it succeeds.  Query #2 comes in and
that pointer is advanced to 8.8.8.8, it succeeds.  Query #3 comes in, the
pointer is advanced to 192.168.1.1, it times out and moves on to 9.9.9.9.
Rinse and repeat.&lt;&#x2F;p&gt;
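&lt;p&gt;To make that arithmetic concrete, here&#x27;s a toy Python model of the pointer behavior described in this update.  It is a sketch of that description, not glibc&#x27;s actual code: the function name &lt;code&gt;slow_fraction&lt;&#x2F;code&gt; and the list of up or down flags are made up for illustration, and it only counts whether a query hit a timeout, not how long it took.&lt;&#x2F;p&gt;

```python
def slow_fraction(servers_up, queries):
    """Fraction of queries that hit at least one timeout under the pointer model."""
    if not any(servers_up):
        raise ValueError("at least one server must be up")
    pointer = 0
    slow = 0
    for _ in range(queries):
        timed_out = False
        # Walk forward from the pointer until a live server answers.
        while not servers_up[pointer]:
            timed_out = True
            pointer = (pointer + 1) % len(servers_up)
        if timed_out:
            slow += 1
        # The next query starts at the server after the one that answered.
        pointer = (pointer + 1) % len(servers_up)
    return slow / queries


# First of three nameservers is down, the other two are healthy.
print(slow_fraction([False, True, True], 1000))  # 0.5
```

&lt;p&gt;With the first of three servers down, every other query lands on the dead server first, which is where the 50% figure comes from.&lt;&#x2F;p&gt;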
&lt;p&gt;&lt;a href=&quot;https:&#x2F;&#x2F;twitter.com&#x2F;bigonbe&#x2F;status&#x2F;943767027080138753&quot;&gt;Laurent Bigonville&lt;&#x2F;a&gt;
suggested removing &lt;code&gt;resolve&lt;&#x2F;code&gt; from &#x2F;etc&#x2F;nsswitch.conf.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a href=&quot;https:&#x2F;&#x2F;twitter.com&#x2F;PaulVixie&quot;&gt;Paul Vixie&lt;&#x2F;a&gt; noted that application developers
should consider using &lt;a href=&quot;https:&#x2F;&#x2F;getdnsapi.net&#x2F;&quot;&gt;getdns&lt;&#x2F;a&gt; in their applications
as it&#x27;s a modern, smart resolver library.&lt;&#x2F;p&gt;
&lt;p&gt;Weronika Pawlak graciously translated &lt;a href=&quot;https:&#x2F;&#x2F;www.piecesauto-pro.fr&#x2F;blog&#x2F;2018&#x2F;02&#x2F;20&#x2F;systemd-ratkaistu-rikki&#x2F;&quot;&gt;this post into
Finnish&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
</content>
	</entry>
	<entry xml:lang="en">
		<title>VPNs and Internet Privacy</title>
		<published>2017-07-16T00:00:00+00:00</published>
		<updated>2017-07-16T00:00:00+00:00</updated>
		<link rel="alternate" href="https://divisionbyzero.net/vpns-internet-privacy/" type="text/html"/>
		<id>https://divisionbyzero.net/vpns-internet-privacy/</id>
		<content type="html">&lt;p&gt;After getting a few questions from concerned folks about VPN services, I
realized this might be better served as an article. This way anyone who is
curious about how to protect themselves better online can reference it.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-bad-news&quot;&gt;The Bad News&lt;&#x2F;h3&gt;
&lt;p&gt;Well, there&#x27;s really no easy way to say this: &lt;strong&gt;There is very little, if any,
privacy on the Internet.&lt;&#x2F;strong&gt;  Even after following all of the advice I&#x27;m about
to give, all sorts of clever folks in the Valley and beyond are envisioning
&lt;em&gt;clever&lt;&#x2F;em&gt; new ways to improve the &amp;quot;User Experience&amp;quot; (UX) and in the process
accidentally creating newer, clever means to circumvent any and all privacy
controls you might deploy.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;This never used to bother me all that much.  I worked at one of the largest
e-commerce sites in the world.  I routinely used meta-data to piece together
stories about interactions with our site that literally scared me, both because
of what someone had attempted, and because I was able to replicate it with so
very little data.  I am &lt;strong&gt;beyond&lt;&#x2F;strong&gt; amused that Donald Trump signing &lt;a href=&quot;https:&#x2F;&#x2F;www.congress.gov&#x2F;bill&#x2F;115th-congress&#x2F;senate-joint-resolution&#x2F;34&quot;&gt;S.J. Res
34&lt;&#x2F;a&gt;
into law is getting so much attention.  The &lt;a href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Snowden_leaks&quot;&gt;Snowden
leaks&lt;&#x2F;a&gt; were &lt;strong&gt;MUCH&lt;&#x2F;strong&gt; more
terrifying than this legislation, and &lt;em&gt;spoiler alert&lt;&#x2F;em&gt;, your ISP was already
tracking you and making money off your habits.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;i-came-here-for-the-vpns-smart-ass&quot;&gt;I Came Here For The VPNs Smart Ass&lt;&#x2F;h3&gt;
&lt;p&gt;Right, so you came here for advice on which VPN service to use.  There have
been lots of opinion pieces already written on this, so feel free to search
&lt;a href=&quot;https:&#x2F;&#x2F;duckduckgo.com&quot;&gt;DuckDuckGo&lt;&#x2F;a&gt; for the &amp;quot;best VPNs of 2017.&amp;quot;  The truth
is, a lot of these VPN services are probably much worse than your ISP, who for
all their shortcomings, is a legitimate, accountable business.  This is my
first piece of advice:&lt;&#x2F;p&gt;
&lt;h2 id=&quot;step-1-choose-your-isp-wisely&quot;&gt;Step 1: Choose your ISP wisely&lt;&#x2F;h2&gt;
&lt;p&gt;Yes, Comcast and Verizon have the fastest speeds.  They don&#x27;t rank so well in
privacy.  If that&#x27;s important to you, &lt;strong&gt;vote with your business&lt;&#x2F;strong&gt;.  The &lt;a href=&quot;https:&#x2F;&#x2F;www.eff.org&#x2F;who-has-your-back-government-data-requests-2017&quot;&gt;EFF
Who has your
back?&lt;&#x2F;a&gt;
reports the ISPs with the best reputations on Privacy concerns.  I am lucky to
live in a &lt;a href=&quot;https:&#x2F;&#x2F;sonic.net&quot;&gt;Sonic.net&lt;&#x2F;a&gt; coverage area.  Find small, local
ISPs near you and talk to them about their privacy policies.  They&#x27;ll love to
talk to you about those things to get the word around.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;vpns-aren-t-private-in-the-way-you-think-they-are&quot;&gt;VPNs Aren&#x27;t Private in the Way You Think They Are&lt;&#x2F;h3&gt;
&lt;p&gt;You&#x27;ll hear some privacy advocates recommend &lt;strong&gt;against&lt;&#x2F;strong&gt; using VPN services in
the US. Using VPN services in any US-allied nation is probably
&lt;em&gt;worse&lt;&#x2F;em&gt; than using your own ISP.  Why? Allies share intelligence.  OK, I&#x27;ll
say that again: Allies share intelligence.  It may still be illegal for the US
to spy on US citizens, but it&#x27;s not illegal for the UK, Australia, New
Zealand, and Canada to spy on US citizens and share anything they find with
the US Government in exchange for the same favor.  Using VPN services in
countries not allied with the US could be safer from the point of view of
access by the US government to your traffic, but it&#x27;s likely to raise flags
and cause you to be under more scrutiny from the government anyways.&lt;&#x2F;p&gt;
&lt;p&gt;All things considered, you probably want to use a VPN service operating under
the laws of the country where you reside. If the reason you&#x27;re using the service is
to circumvent laws in the country where you reside, this article probably
isn&#x27;t for you.  None of my recommendations will protect you from yourself and
none of it will save you from prosecution.  I wish it could, but that&#x27;s not
the Internet &lt;strong&gt;we&lt;&#x2F;strong&gt; built.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-is-a-vpn&quot;&gt;What is a VPN?&lt;&#x2F;h2&gt;
&lt;p&gt;First you have to know the Internet is just a large mesh of computers, each with
connectivity to one or more computers.  These computers agree to transmit data
for one another to end points they&#x27;re not directly connected to.  There&#x27;s lots of
math and physics involved.  For our purposes, just think of it as a socialist
group of computers sending data down a path towards its destination.  There&#x27;s
some meta-data wrapped around the data to help the computers determine where the
traffic is from and where it might need to go next.  There&#x27;s a constant exchange
between all the computers informing one another to which computers they can
transmit data.&lt;&#x2F;p&gt;
&lt;p&gt;There&#x27;s a lot of trust in this system, and up until recently, most of the data
itself was unencrypted. Think of it as sending postcards.  Everyone could see
the source, destination, and the entire message. If they carried it, they could
modify it without much ado.  Since the Snowden leaks, the Internet has gotten
serious about encryption.  This means there are more coded messages flying
around, but it&#x27;s still on the back of postcards and anyone in the middle can
read the source and destination.&lt;&#x2F;p&gt;
&lt;p&gt;So, now VPNs.  You probably had to use a VPN at work at some point to access
internal company resources while traveling or working from home.  A &lt;em&gt;Virtual
Private Network&lt;&#x2F;em&gt; uses the Internet to allow two computers that are not next to
each other on their &lt;em&gt;physical network&lt;&#x2F;em&gt; to behave as though they were on the same physical
network.  They usually encrypt the traffic at a very low level to prevent
computers in between from knowing the actual source or destination of the
traffic.&lt;&#x2F;p&gt;
&lt;p&gt;This is good for making internal resources accessible to employees from anywhere
in the world, but it doesn&#x27;t exactly gain you much in the way of privacy.  At
the VPN service, all of your data is unwrapped and shipped off over the Internet
to its destination.  You can transmit encrypted traffic, such as HTTPS, inside
a VPN and the VPN service won&#x27;t know anything about the content of the message,
but they will still know its source and destination.&lt;&#x2F;p&gt;
&lt;p&gt;So, they won&#x27;t know what was in the content, but they will know that after
loading an advertisement that your browser prefetched DNS records for
&amp;quot;erectile-dysfunction.com&amp;quot; you then created an encrypted connection to
&amp;quot;erectile-dysfunction.com&amp;quot; and loaded not one, but several pages, including one
that called out to &amp;quot;verified-by-visa.com.&amp;quot;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-is-dns&quot;&gt;What is DNS?&lt;&#x2F;h2&gt;
&lt;p&gt;Computers prefer numbers, humans prefer letters and words.  To resolve this
issue, (&lt;em&gt;haha, you see what I did there? Don&#x27;t worry if you don&#x27;t, I&#x27;m laughing
by myself&lt;&#x2F;em&gt;) we have the &lt;strong&gt;Domain Name System&lt;&#x2F;strong&gt; or &lt;strong&gt;DNS&lt;&#x2F;strong&gt;.  DNS allows you to
type: &amp;quot;www.google.com&amp;quot; and your computer knows it needs to send data to
&amp;quot;172.217.5.100&amp;quot; or whatever your DNS server says is www.google.com.  It sounds
simple, but it&#x27;s a system with implied trust and some really cool tricks.  This
makes DNS an incredibly complex topic, so I&#x27;ll gloss over all the technical
details and say it does this pretty well, and it does so in plain-text.  Anyone
who sits between you and your local DNS resolver can see every name your
computer tries to resolve without any obstruction, again like a postcard.&lt;&#x2F;p&gt;
&lt;p&gt;Almost 100% of DNS traffic is unencrypted today.
&lt;a href=&quot;https:&#x2F;&#x2F;www.dnscrypt.org&#x2F;&quot;&gt;DNSCrypt&lt;&#x2F;a&gt; is looking to change that, but it&#x27;s just
now gaining traction and you probably aren&#x27;t using it.&lt;&#x2F;p&gt;
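&lt;p&gt;If you want to watch a lookup happen, here&#x27;s a minimal Python sketch that asks the operating system&#x27;s resolver for the addresses behind a name.  The helper name &lt;code&gt;resolve&lt;&#x2F;code&gt; is made up for this example; &lt;code&gt;socket.getaddrinfo&lt;&#x2F;code&gt; is the standard library call doing the work.  The answer depends on your resolver and network, so no output is shown.&lt;&#x2F;p&gt;

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated addresses the system resolver reports."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); the address
    # string is the first element of sockaddr for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "localhost" keeps the example off the network; try a real name to see
    # what your own DNS server hands back.
    print(resolve("localhost"))
```

&lt;p&gt;Swap in any public hostname and that name usually travels to your DNS server in plain text, which is the postcard problem described above.&lt;&#x2F;p&gt;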
&lt;h3 id=&quot;omg-why-do-i-care&quot;&gt;OMG, Why Do I Care?&lt;&#x2F;h3&gt;
&lt;p&gt;Great question! Web pages started out pretty simple, but included ways to link
and even include content from other web sites on your page.  Over time, things
got much more complex, and data gets loaded from everywhere.  The average web
page loads over &lt;a href=&quot;http:&#x2F;&#x2F;www.websiteoptimization.com&#x2F;speed&#x2F;tweak&#x2F;average-web-page&#x2F;&quot;&gt;100 assets and almost 2 megabytes of
files&lt;&#x2F;a&gt;.  This
is a lot of data and a lot of network requests.  To keep that in perspective,
the original &lt;a href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Legend_Of_Zelda&quot;&gt;Legend of Zelda&lt;&#x2F;a&gt;
came in around 128 kilobytes, so your average web page is using about &lt;strong&gt;15&lt;&#x2F;strong&gt;
&lt;em&gt;Legend of Zeldas&lt;&#x2F;em&gt; worth of mostly garbage.&lt;&#x2F;p&gt;
&lt;p&gt;Since we all love speed and browsers are throttled by your home internet
speed, a clever person in the Valley or beyond came up with an idea to improve
the User Experience.  They realized that once you load a page and start reading
it, your browser sits idle waiting for the next command.  Most of the time,
the next command involves clicking a link, so resolving the hostname, fetching
the page, reading the page for any assets it needs, resolving more names,
fetching those objects.  It takes time and you have to issue those first
requests to know how many of the supporting requests are required.&lt;&#x2F;p&gt;
&lt;p&gt;So, this engineer thought, &amp;quot;what if we scan the page you&#x27;re reading for links
while you&#x27;re reading and start that process?&amp;quot;  And that&#x27;s what they did.  Most
modern browsers use &amp;quot;content pre-fetching&amp;quot; to give you an artificially fast internet
connection.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;step-2-disable-prefetching&quot;&gt;Step 2: Disable Prefetching&lt;&#x2F;h2&gt;
&lt;p&gt;The terrible truth about prefetching is anyone who can see &lt;em&gt;only your DNS
requests&lt;&#x2F;em&gt; can probably reverse engineer the words you just typed into Google&#x27;s
Search Bar &lt;strong&gt;REGARDLESS&lt;&#x2F;strong&gt; of whether or not you are using HTTPS.  Here&#x27;s a paper
from 2010 detailing &lt;a href=&quot;https:&#x2F;&#x2F;www.usenix.org&#x2F;legacy&#x2F;events&#x2F;leet10&#x2F;tech&#x2F;full_papers&#x2F;Krishnan.pdf&quot;&gt;the privacy implications of DNS
prefetching&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
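&lt;p&gt;To get a feel for what leaks, here&#x27;s a small Python sketch that lists the hostnames a prefetching browser would have to look up for the links on a page.  The function name and the sample links are hypothetical, and real browsers pick what to prefetch using their own heuristics.  The point is only that each distinct hostname becomes a DNS query an observer can see.&lt;&#x2F;p&gt;

```python
from urllib.parse import urlsplit


def prefetch_hostnames(links):
    """Return the unique hostnames, in first-seen order, that the links point at."""
    seen = []
    for link in links:
        host = urlsplit(link).hostname
        # Relative links have no hostname and need no extra DNS lookup.
        if host and host not in seen:
            seen.append(host)
    return seen


# Hypothetical links found on a search results page.
links = [
    "https://en.wikipedia.org/wiki/Virtual_private_network",
    "https://www.eff.org/issues/privacy",
    "https://en.wikipedia.org/wiki/Domain_Name_System",
    "/settings",
]
print(prefetch_hostnames(links))  # ['en.wikipedia.org', 'www.eff.org']
```

&lt;p&gt;Nobody clicked anything in that example, yet both hostnames would already have shown up in the DNS traffic.&lt;&#x2F;p&gt;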
&lt;h3 id=&quot;firefox-all&quot;&gt;Firefox (All)&lt;&#x2F;h3&gt;
&lt;ol&gt;
&lt;li&gt;Open a new tab&lt;&#x2F;li&gt;
&lt;li&gt;Type: about:config&lt;&#x2F;li&gt;
&lt;li&gt;Agree you&#x27;re breaking your warranty&lt;&#x2F;li&gt;
&lt;li&gt;In the search bar, type: network.dns.disablePrefetch&lt;&#x2F;li&gt;
&lt;li&gt;If the value is &amp;quot;false&amp;quot;, double-click on it to change it to &amp;quot;true&amp;quot;&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;h3 id=&quot;google-chrome-mac&quot;&gt;Google Chrome (Mac)&lt;&#x2F;h3&gt;
&lt;ol&gt;
&lt;li&gt;Select &amp;quot;Preferences&amp;quot; from the menu bar&lt;&#x2F;li&gt;
&lt;li&gt;Select &amp;quot;Advanced&amp;quot;&lt;&#x2F;li&gt;
&lt;li&gt;Disable the following:&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;ul&gt;
&lt;li&gt;Use a web service to help resolve errors&lt;&#x2F;li&gt;
&lt;li&gt;Use a prediction service to help complete searches and URLs from the URL bar&lt;&#x2F;li&gt;
&lt;li&gt;Use a prediction service to load pages more quickly&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;safari-mac&quot;&gt;Safari (Mac)&lt;&#x2F;h3&gt;
&lt;p&gt;From &lt;a href=&quot;https:&#x2F;&#x2F;discussions.apple.com&#x2F;message&#x2F;12292589#message12292589&quot;&gt;Apple Discussions&lt;&#x2F;a&gt;:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Open Terminal&lt;&#x2F;li&gt;
&lt;li&gt;defaults write com.apple.safari WebKitDNSPrefetchingEnabled -boolean false&lt;&#x2F;li&gt;
&lt;li&gt;Restart Safari&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;h2 id=&quot;step-3-tweak-your-browser-privacy-settings&quot;&gt;Step 3: Tweak Your Browser Privacy Settings&lt;&#x2F;h2&gt;
&lt;p&gt;I won&#x27;t go into details on these, but search
&lt;a href=&quot;https:&#x2F;&#x2F;duckduckgo.com&quot;&gt;DuckDuckGo&lt;&#x2F;a&gt; for how to do each on your browser.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Disable Usage Reporting&lt;&#x2F;strong&gt; - Reports to the browser developers how you use
their product.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Disable Crash Reporting&lt;&#x2F;strong&gt; - Reports crashes, including potentially
sensitive or confidential information to the browser developers when your
browser crashes.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Disable WebRTC&lt;&#x2F;strong&gt; - WebRTC allows advertisers to fingerprint you by getting
information about your home network.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Disable Cookies (Third Party)&lt;&#x2F;strong&gt; - Completely disabling cookies will prevent
most sites from working, but this limits it to just sites you visit.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Uninstall Flash and Java&lt;&#x2F;strong&gt; - It&#x27;s 2017, these two need to GTFO
of our browsers. I don&#x27;t have enough curse words to describe why.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;step-4-extend-your-browser&quot;&gt;Step 4: Extend Your Browser&lt;&#x2F;h2&gt;
&lt;p&gt;To enhance your privacy everywhere, regardless of whether you&#x27;re using a VPN
or not, there are some tools freely available.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;ad-blocking&quot;&gt;Ad-Blocking&lt;&#x2F;h3&gt;
&lt;p&gt;I work in IT Security. I know, &amp;quot;ad blocking takes revenue away from the little
guys.&amp;quot;  Unfortunately, the reality is most ad networks do a sub-optimal job of
curating their content.  It&#x27;s not uncommon for malware or viruses to be served
via a legitimate ad network.  Even if that weren&#x27;t the case, the advertiser
and the content-provider may have a contract in place with privacy clauses,
but their contract doesn&#x27;t extend to you, the casual web surfer.  So, you
may trust a particular website, but that doesn&#x27;t mean you necessarily trust
all of their ad partners.&lt;&#x2F;p&gt;
&lt;p&gt;For privacy and security reasons, you need to be using an ad-blocker.  The
best of the breed is &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;gorhill&#x2F;uBlock&quot;&gt;uBlock Origin&lt;&#x2F;a&gt;.  It&#x27;s
light weight, efficient, and super configurable.  Be sure to dig into the
options and go nuts with block lists.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;privacy-enhancements&quot;&gt;Privacy Enhancements&lt;&#x2F;h3&gt;
&lt;p&gt;The EFF publishes two incredibly useful extensions:
&lt;a href=&quot;https:&#x2F;&#x2F;www.eff.org&#x2F;https-everywhere&quot;&gt;HTTPSEverywhere&lt;&#x2F;a&gt; and
&lt;a href=&quot;https:&#x2F;&#x2F;www.eff.org&#x2F;privacybadger&quot;&gt;PrivacyBadger&lt;&#x2F;a&gt;.  HTTPSEverywhere preloads
a new browser feature that prevents snooping and traffic interception by
forcing all communication from popular sites to be HTTPS.  PrivacyBadger is
the EFF&#x27;s curated list of privacy threats.  It&#x27;s a good combo to complement
the community lists from uBlock Origin.&lt;&#x2F;p&gt;
&lt;p&gt;As I mentioned, web pages these days load resources from all over the place.
There&#x27;s a number of common libraries used in websites that are loaded from
&amp;quot;Content Delivery Networks&amp;quot; (CDNs).  These CDNs wind up seeing almost
everything you do on the internet because they receive the context of where
these assets are loaded.  CDNs are generally seen as good because they speed
up the internet by having bigger, faster connections in more places than most
content creators can reasonably afford.  They come at a cost to privacy though,
as they see what you&#x27;re doing across thousands, if not millions, of sites.&lt;&#x2F;p&gt;
&lt;p&gt;Enter &lt;a href=&quot;https:&#x2F;&#x2F;decentraleyes.org&#x2F;&quot;&gt;Decentraleyes&lt;&#x2F;a&gt;. It contains most
of those common libraries locally. When it can serve the library from a
version it has loaded on your computer, it injects the library locally instead
of fetching it from the CDN.  This dramatically reduces the amount of network
usage and the number of connections per site loaded.&lt;&#x2F;p&gt;
&lt;p&gt;If you happen to be on &lt;a href=&quot;https:&#x2F;&#x2F;getfirefox.org&quot;&gt;Firefox&lt;&#x2F;a&gt;, and you probably
should be, there&#x27;s another extension called
&lt;a href=&quot;https:&#x2F;&#x2F;addons.mozilla.org&#x2F;en-US&#x2F;firefox&#x2F;addon&#x2F;betterprivacy&#x2F;&quot;&gt;BetterPrivacy&lt;&#x2F;a&gt;.
This extension helps clear out persistent tracking data via configurable
thresholds.  You can configure it to wipe them every time you close the
browser, or every X minutes.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;link-and-click-tracking&quot;&gt;Link and Click Tracking&lt;&#x2F;h3&gt;
&lt;p&gt;Do you ever read the full links people post these days?  Or hover your mouse
over a link in Google&#x27;s search results?  There&#x27;s a lot of extra junk in those
links, often with a &lt;code&gt;utm_&lt;&#x2F;code&gt; prefix.  This contains information about the ad
campaign, the medium, and in the case of mobile devices, the application
information used to view the link.  The thing is, these links will work the
same without all of that crap.  I use
&lt;a href=&quot;https:&#x2F;&#x2F;www.ghacks.net&#x2F;2016&#x2F;02&#x2F;07&#x2F;pure-url-removes-optional-url-parameters&#x2F;&quot;&gt;PureURL&lt;&#x2F;a&gt;
to strip unnecessary parameters from links to prevent leaking sensitive data
from my devices.&lt;&#x2F;p&gt;
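&lt;p&gt;Here&#x27;s a rough Python sketch of the idea, assuming the only thing we want to drop is query parameters whose names start with &lt;code&gt;utm_&lt;&#x2F;code&gt;.  This is not how PureURL is implemented, and &lt;code&gt;strip_tracking&lt;&#x2F;code&gt; and the sample link are made up for illustration.  It just shows that the link still points at the same page once the tracking parameters are gone.&lt;&#x2F;p&gt;

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_tracking(url, prefix="utm_"):
    """Return url with every query parameter starting with prefix removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(prefix)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Build a hypothetical tracked link: one real parameter, two campaign tags.
query = urlencode([("id", "42"), ("utm_source", "newsletter"), ("utm_medium", "email")])
tracked = "https://example.com/article?" + query
print(strip_tracking(tracked))  # https://example.com/article?id=42
```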
&lt;p&gt;Again, if you&#x27;re running Firefox, there&#x27;s an additional extension I recommend,
&lt;a href=&quot;https:&#x2F;&#x2F;addons.mozilla.org&#x2F;en-US&#x2F;firefox&#x2F;addon&#x2F;google-privacy&#x2F;&quot;&gt;GooglePrivacy&lt;&#x2F;a&gt;
which does the same thing PureURL does, but for Google&#x27;s search results and
their specific internal link tracking.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;paranoid-mode&quot;&gt;Paranoid Mode&lt;&#x2F;h3&gt;
&lt;p&gt;For the extra paranoid, I recommend the following extensions, but keep in mind,
most of the internet stops working without you explicitly whitelisting
resources in one or more of these utilities:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;noscript.net&#x2F;&quot;&gt;NoScript&lt;&#x2F;a&gt; - Disables JavaScript loading, provides
boundary enforcement, prevents &lt;a href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Clickjacking&quot;&gt;Click
Jacking&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;gorhill&#x2F;uMatrix&quot;&gt;uMatrix&lt;&#x2F;a&gt; -
&lt;a href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Little_Snitch&quot;&gt;LittleSnitch&lt;&#x2F;a&gt; for
everything your browser does, everything.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;addons.mozilla.org&#x2F;en-US&#x2F;firefox&#x2F;addon&#x2F;certificate-patrol&#x2F;&quot;&gt;CertificatePatrol&lt;&#x2F;a&gt; -
Reports when the security certificate for a site changes.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;step-5-understand-private-browsing&quot;&gt;Step 5: Understand &amp;quot;Private Browsing&amp;quot;&lt;&#x2F;h2&gt;
&lt;p&gt;&amp;quot;But wait,&amp;quot; you interject, &amp;quot;Why go through this madness when my browser has
private browsing mode?&amp;quot;  Excellent question!  Private browsing is more
&amp;quot;private from other users on my computer&amp;quot; than &amp;quot;privacy from the government.&amp;quot;&lt;&#x2F;p&gt;
&lt;p&gt;When you start a private browsing session, the short and long term caching and
storage for your browser are pointed to a new, unique location for as long as
the private browser window is open.  Think of it as changing your clothes, but
grabbing a different wallet, with different IDs and credit cards each session.&lt;&#x2F;p&gt;
&lt;p&gt;That &lt;em&gt;sounds&lt;&#x2F;em&gt; good, but you don&#x27;t change your internet address, and you&#x27;re not
changing your DNS servers.  This means, to your ISP or anyone able to see your
network traffic, you&#x27;re still you.  So, from an ISP or government perspective,
nothing&#x27;s changed.&lt;&#x2F;p&gt;
&lt;p&gt;Private browsing disables history tracking and makes you look like a new user
to the websites you&#x27;re visiting, but that&#x27;s about it.  Another nice feature is
the local cache is cleared when you close the window.  So all those
embarrassing pictures of the Icy Hot Stuntaz won&#x27;t be sitting around for your
loved ones to find.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;step-6-vpn-up&quot;&gt;Step 6: VPN Up&lt;&#x2F;h2&gt;
&lt;p&gt;OK, at this point, you&#x27;ve locked your browsers down, but you may have a few
good reasons to use a VPN:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;You&#x27;re going to be using a network you don&#x27;t trust, like mobile networks or
other people&#x27;s WiFi.&lt;&#x2F;li&gt;
&lt;li&gt;You&#x27;re on a mobile device. Mobile networks are terrible for privacy and
there aren&#x27;t many choices aside from VPNs.&lt;&#x2F;li&gt;
&lt;li&gt;You&#x27;re in a country like China where your communication is severely
hindered by your local government.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Those are the only reasons most reasonably savvy users may find they need a
VPN.  As I stated before, I&#x27;d recommend choosing a VPN service that operates
legally in your own country.  Some VPN service providers operate in many
countries and allow you to route your traffic to any one of those countries.  If
given the option, even if you live in the USA, I&#x27;d recommend using a VPN end
point in your country.  The exception to this being use-case #3.  In those
instances, choose a country that&#x27;s least likely to cooperate with your
government, but understand that you may wind up drawing attention from your
local government in doing so.&lt;&#x2F;p&gt;
&lt;p&gt;My preference is to set up your own VPN server by building a
&lt;a href=&quot;https:&#x2F;&#x2F;pfsense.org&quot;&gt;pfSense&lt;&#x2F;a&gt; image on Microsoft Azure or Amazon&#x27;s AWS.  The
WebUI for pfSense makes it pretty easy to configure your VPN server and even export
profiles to use on your devices.&lt;&#x2F;p&gt;
&lt;p&gt;If that&#x27;s too technical for you, I&#x27;d look into VPN providers with a history of
transparency who are vocal and active with their privacy protections.  Some
signs a VPN Provider takes privacy seriously:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Zero logging policy - This is almost always a lie as logs are necessary for
support, so ask about what is logged and how long it stays on disk.&lt;&#x2F;li&gt;
&lt;li&gt;Warrant Canary - The government doesn&#x27;t allow companies to advertise when a
warrant has been served.  However, librarians came up with a clever system
called a &amp;quot;warrant canary.&amp;quot;  It&#x27;s a notice posted stating &amp;quot;No warrants have
been served in the past X days.&amp;quot;  When a warrant is served, that posting is
removed.  It&#x27;s a legal grey area and any service that takes privacy
seriously will have one.&lt;&#x2F;li&gt;
&lt;li&gt;Accept BitCoin for payment.  I don&#x27;t have enough time to talk about BitCoin
here, but if you want to protect your billing information, you need to use
BitCoin.&lt;&#x2F;li&gt;
&lt;li&gt;Disclosure of the laws they operate under and where their servers are
physically located.&lt;&#x2F;li&gt;
&lt;li&gt;Do they provide DNS service?  If not, then you&#x27;re still leaking data over
your ISP.&lt;&#x2F;li&gt;
&lt;li&gt;Disclosure of all third party services.  Who do they use to send you mail
when your bill is due?  Are they using third-party solutions for
monitoring, instrumentation, or operations?&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;I wish I could tell you, &amp;quot;use this service,&amp;quot; but it&#x27;s not that simple.
There&#x27;s a lot to consider when making this choice and the answer depends on
your expectations and your comfort levels.  I believe the only way to be
certain is to run your own VPN service, but with the rest of the tooling I
mentioned in place to protect you.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;abandon-all-hope&quot;&gt;Abandon all hope..&lt;&#x2F;h2&gt;
&lt;p&gt;I probably bummed you out.  I&#x27;m not sorry.  I&#x27;ve been watching privacy erode
on the internet for the last 20 years.  It&#x27;s hard to do privacy right on the
internet.  You&#x27;re often an accidental mouse-click away from blowing all
your protections.  Hopefully this helps you navigate a bit smarter and
understand a bit better that anyone purporting to sell you privacy on the
internet is just blowing smoke up your ass.&lt;&#x2F;p&gt;
&lt;p&gt;Stay safe, my friends.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;translations&quot;&gt;Translations&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;Karolin Lohmus graciously translated &lt;a href=&quot;https:&#x2F;&#x2F;www.espertoautoricambi.it&#x2F;science&#x2F;2017&#x2F;08&#x2F;03&#x2F;vpnide-ja-eraelu-puutumatuse-kohta-internetis&#x2F;&quot;&gt;this article into Estonian&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Artur Weber &amp;amp; Adelina Domingos graciously translated &lt;a href=&quot;https:&#x2F;&#x2F;www.homeyou.com&#x2F;~edu&#x2F;privacidade-na-internet&quot;&gt;this article into Portuguese&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
	</entry>
	<entry xml:lang="en">
		<title>In which he authors a book on OSSEC</title>
		<published>2013-08-04T00:00:00+00:00</published>
		<updated>2013-08-04T00:00:00+00:00</updated>
		<link rel="alternate" href="https://divisionbyzero.net/ossec-book/" type="text/html"/>
		<id>https://divisionbyzero.net/ossec-book/</id>
		<content type="html">&lt;p&gt;In 2004, when I was starting a new job at the &lt;a href=&quot;http:&#x2F;&#x2F;www.grc.nia.nih.gov&quot;&gt;National Institute on
Aging&#x27;s Intramural Research Program&lt;&#x2F;a&gt; I began
evaluating products to meet
&lt;a href=&quot;http:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Federal_Information_Security_Management_Act_of_2002&quot;&gt;FISMA&lt;&#x2F;a&gt; requirements for
file integrity monitoring.  We had already purchased a copy of Tripwire, but I
was being driven mad by the volume of alerting from the system.  I wanted
something open source.  I wanted something that would save me time, rather
than waste 2 hours a day clicking through a GUI confirming file changes
caused by system updates and daily operations.&lt;&#x2F;p&gt;
&lt;p&gt;At the time, I found two projects:
&lt;a href=&quot;http:&#x2F;&#x2F;www.la-samhna.de&#x2F;samhain&#x2F;&quot;&gt;Samhain&lt;&#x2F;a&gt; and
&lt;a href=&quot;http:&#x2F;&#x2F;www.ossec.net&quot;&gt;OSSEC-HIDS&lt;&#x2F;a&gt;.  Samhain is a great project that does
one thing and does that one thing very well.  However, I was buried in a
mountain of FISMA compliance requirements and OSSEC offered more than file
integrity monitoring; OSSEC offered a framework for distributed analysis of
logs, file changes, and other anomalous events in the same open source
project.&lt;&#x2F;p&gt;
&lt;p&gt;I now work at &lt;a href=&quot;http:&#x2F;&#x2F;www.booking.com&quot;&gt;Booking.com&lt;&#x2F;a&gt; and manage one of the
world&#x27;s largest distributions of OSSEC-HIDS.  My team and I are active
contributors to the OSSEC Community.  After nearly a decade of experience
deploying, managing, and extracting value from OSSEC, I was approached to
write a book introducing new users to OSSEC.  After 6 months of work, the
book has been published!&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a href=&quot;http:&#x2F;&#x2F;www.amazon.com&#x2F;gp&#x2F;product&#x2F;1782167641&#x2F;ref=as_li_qf_sp_asin_il_tl?ie=UTF8&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=1782167641&amp;amp;linkCode=as2&amp;amp;tag=edgofsan0a-20&amp;amp;linkId=NJYGUMGEKPI2NKOU&quot;&gt;Instant OSSEC Host-based Intrusion
Detection&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;&lt;h2 id=&quot;writing-a-book-is-hard-kids&quot;&gt;Writing a book is hard, kids.&lt;&#x2F;h2&gt;
&lt;p&gt;I write a lot.  I thought, &amp;quot;writing a book would be fun and easy!&amp;quot;  And, in
some ways it is.   It&#x27;s also very challenging writing for an editor.  It&#x27;s
also very humbling to receive feedback on your book from the technical
reviewers.  Most of my writing never makes it anywhere public.  Even the
writing that ends up on my blog generates very little feedback.  Now,
imagine someone scrutinizing every single technical detail in 50 pages of your
writing.  Every sentence fact-checked and marked up.  Every mistake you made
highlighted and commented on by multiple technical reviewers.  That&#x27;s hard.&lt;&#x2F;p&gt;
&lt;p&gt;It doesn&#x27;t stop there.  From the first draft, the editor points out every
instance of awkward wording or incorrect grammar.  Once your editors and
technical reviewers are satisfied, the book moves on to the Technical
Editor.  This person is responsible for ensuring language consistency and
compliance with the conventions of the dialect in which the book is written.
If you&#x27;re not comfortable with your work being torn apart and reassembled,
writing a book may not be for you.  It was grueling at times.&lt;&#x2F;p&gt;
&lt;p&gt;It does help to have a good group of people involved in the process.  I was
thankful to have a great team of editors and reviewers for this book at
&lt;a href=&quot;http:&#x2F;&#x2F;www.packtpub.com&quot;&gt;Packt Publishing&lt;&#x2F;a&gt;.  Now that the job is done
(&lt;em&gt;side note: the job is never done&lt;&#x2F;em&gt;), I am grateful to have been given this
opportunity.  It was a very rewarding experience, and like all rewarding
experiences, it wasn&#x27;t easy.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;ossec-hids-and-so-can-you&quot;&gt;OSSEC-HIDS and so can you!&lt;&#x2F;h2&gt;
&lt;p&gt;So, what&#x27;s this OSSEC-HIDS this book is all about?  I, like most of you,
struggle with putting my logging data to good use.  I have several posts on
this blog dedicated to extracting value from logging.  I also have
compliance requirements, previously FISMA, currently PCI-DSS.  OSSEC is an
open source security project which incorporates a number of useful features
that make it worth a look.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Distributed log analysis&lt;&#x2F;li&gt;
&lt;li&gt;File integrity monitoring&lt;&#x2F;li&gt;
&lt;li&gt;Rootkit detection&lt;&#x2F;li&gt;
&lt;li&gt;Policy auditing&lt;&#x2F;li&gt;
&lt;li&gt;Alerting and active response system&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;All of these features interoperate in the same event system, providing
richer context in an infinitely customizable environment.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;distributed-log-analysis&quot;&gt;Distributed log analysis&lt;&#x2F;h3&gt;
&lt;p&gt;OSSEC runs best in a client&#x2F;server model.  You can have hybrid deployments
to aggregate and analyze events at different points in your network.  The
clients run as agents, propagating log data to the servers.  The servers
then evaluate the log data using decoders to extract data and rules to
analyze the data.&lt;&#x2F;p&gt;
&lt;p&gt;OSSEC ships with an impressive list of decoders and rules capable of
analyzing and classifying events from standard system services including,
but not limited to: ssh, ftp, email, web servers, LDAP, Active Directory,
Windows EventLogs, Windows Registry, Cisco PIX, Juniper NetScreens, and most
Linux&#x2F;BSD&#x2F;Solaris system daemons and kernel messaging.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;a href=&quot;http:&#x2F;&#x2F;jentalkstoomuch.blogspot.com&#x2F;2010&#x2F;09&#x2F;writing-custom-ossec-rules-for-your.html&quot;&gt;OSSEC decoders and rules are
extensible&lt;&#x2F;a&gt;,
so if your use case isn&#x27;t addressed it&#x27;s easy to add.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;file-integrity-monitoring&quot;&gt;File integrity monitoring&lt;&#x2F;h3&gt;
&lt;p&gt;If you work in IT security, you hate file integrity monitoring.  It&#x27;s
because of products like Tripwire, which do file integrity monitoring, but
are so loud and obnoxious about it, that the value is drowned in a sea of
unending harassment of correct, but uninteresting events.  I desperately
wanted out of what I found largely unactionable alerts and incessant
pointing and clicking.&lt;&#x2F;p&gt;
&lt;p&gt;OSSEC&#x27;s implementation of FIM is similar to every other product in this
space.  You can monitor changes in modification times, access times,
checksums (MD5 and SHA1), owners, groups, and permissions or any combination
of those attributes.  OSSEC&#x27;s file integrity events occur inside the same
framework as the log analysis, so it&#x27;s possible to write rules to evaluate
and correlate them.  It&#x27;s even possible to fire off scripts when they occur
to perform validation against other databases.&lt;&#x2F;p&gt;
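&lt;p&gt;To make the idea concrete, here&#x27;s a rough Python sketch of what a file integrity check boils down to: snapshot a file&#x27;s checksums and ownership, then report which attributes differ from the baseline.  This is &lt;em&gt;not&lt;&#x2F;em&gt; OSSEC code; the function names and the set of watched attributes are made up for illustration.&lt;&#x2F;p&gt;

```python
import hashlib
import os
import stat


def snapshot(path):
    """Capture checksums and ownership details for one file."""
    with open(path, "rb") as fh:
        data = fh.read()
    st = os.stat(path)
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "mtime": st.st_mtime,
        "owner": st.st_uid,
        "group": st.st_gid,
        "permissions": stat.S_IMODE(st.st_mode),
    }


def changed_attributes(baseline, current,
                       watch=("md5", "sha1", "owner", "group", "permissions")):
    """Return the watched attribute names whose values differ, sorted."""
    return sorted(name for name in watch if baseline[name] != current[name])
```

&lt;p&gt;Leaving the modification time out of the default watch list is a deliberate choice in this sketch: it changes constantly during normal operations, which is exactly the kind of noise you want to be able to switch off.&lt;&#x2F;p&gt;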
&lt;p&gt;Just this week I discovered this post on &lt;a href=&quot;http:&#x2F;&#x2F;blog.rootshell.be&#x2F;2013&#x2F;05&#x2F;13&#x2F;improving-file-integrity-monitoring-with-ossec&#x2F;&quot;&gt;Improving file integrity
monitoring with
OSSEC&lt;&#x2F;a&gt;
that allows the servers to keep a validated list of checksums to ignore!
It&#x27;s also possible to audit the Windows Registry for changes using OSSEC&#x27;s
file integrity daemon.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;rootkit-detection&quot;&gt;Rootkit detection&lt;&#x2F;h3&gt;
&lt;p&gt;Rootkit detection by itself isn&#x27;t that valuable.  In the context of the
logging events and file integrity data, it can be &lt;strong&gt;invaluable&lt;&#x2F;strong&gt; in quickly
identifying compromised servers.  OSSEC uses a rootkit and trojan&#x27;d database
of file checksums to detect bad files on your system.  It also performs
checks similar to
&lt;a href=&quot;http:&#x2F;&#x2F;freecode.com&#x2F;projects&#x2F;chkrootkit&quot;&gt;chkrootkit&lt;&#x2F;a&gt; or
&lt;a href=&quot;http:&#x2F;&#x2F;aide.sourceforge.net&quot;&gt;AIDE&lt;&#x2F;a&gt; by looking for hidden files, programs,
or open ports and checking for out of place files and directories.&lt;&#x2F;p&gt;
&lt;p&gt;Again, the events occur inside the OSSEC framework and can be customized
using the OSSEC rules.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;policy-auditing&quot;&gt;Policy auditing&lt;&#x2F;h3&gt;
&lt;p&gt;If you find yourself in an environment covered by a set of IT regulations,
you may have to perform audits of critical daemon configurations to verify
you are implementing &amp;quot;Industry Best Practices.&amp;quot;  While, as a rational human
being, the phrase &amp;quot;industry best practices&amp;quot; makes my skin crawl, security
professionals need to establish and ensure compliance with a security
baseline.  OSSEC provides a policy auditing framework which runs alongside
its file integrity monitoring daemon to look for both wanted and unwanted
phrases in configuration files.  OSSEC ships with policies for auditing
according to the &lt;a href=&quot;http:&#x2F;&#x2F;benchmarks.cisecurity.org&#x2F;downloads&#x2F;&quot;&gt;Security Benchmarks from the Center for Internet
Security&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;alerting-and-active-response&quot;&gt;Alerting and active response&lt;&#x2F;h3&gt;
&lt;p&gt;The real power comes from the flexibility of OSSEC&#x27;s alerting system.  Using
rules you can modify alert levels and perform aggregate analysis.  It&#x27;s
simple to write a rule that says &amp;quot;If an IP fails logins 5 times in 10
minutes, alert.&amp;quot;  You may recognize this functionality as provided by the
popular &lt;a href=&quot;http:&#x2F;&#x2F;www.fail2ban.org&quot;&gt;Fail2Ban&lt;&#x2F;a&gt; project.  The real power is
that OSSEC&#x27;s analysis can occur across &lt;em&gt;every&lt;&#x2F;em&gt; device on your network.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a href=&quot;http:&#x2F;&#x2F;www.ossec.net&#x2F;doc&#x2F;syntax&#x2F;head_rules.html&quot;&gt;A detailed overview of the rules syntax can be found on the OSSEC
site&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
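&lt;p&gt;If you&#x27;re curious what that kind of frequency rule does under the hood, here&#x27;s a minimal Python sketch of the logic.  It is only an illustration and not how OSSEC implements it; the class name, the defaults, and the assumption that events arrive in time order are all mine.&lt;&#x2F;p&gt;

```python
from collections import defaultdict, deque


class FailedLoginTracker:
    """Alert when one source IP fails max_failures logins within window_seconds."""

    def __init__(self, max_failures=5, window_seconds=600):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        # Maps each IP to the timestamps of its recent failures.
        self._failures = defaultdict(deque)

    def record_failure(self, ip, timestamp):
        """Record one failed login (timestamp in seconds); return True to alert.

        Assumes timestamps for a given IP never go backwards.
        """
        recent = self._failures[ip]
        recent.append(timestamp)
        # Drop failures that have aged out of the window.
        while timestamp - recent[0] >= self.window_seconds:
            recent.popleft()
        return len(recent) >= self.max_failures
```

&lt;p&gt;Feed it the source IP and time of each failed login, and it returns true on the failure that crosses the threshold.  Failures from other IPs are tracked separately.&lt;&#x2F;p&gt;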
&lt;p&gt;With the OSSEC active response system, it&#x27;s possible to implement the
Fail2Ban functionality and blacklist IPs that are misbehaving.  Active
response is a fancy way of saying &amp;quot;when an alert is triggered, run a script
somewhere.&amp;quot;  You can specify running the script on the agent that generated
the alarm, a specific agent on your network, or &lt;em&gt;every&lt;&#x2F;em&gt; agent on your
network.&lt;&#x2F;p&gt;
&lt;p&gt;The active response system is incredibly powerful and flexible.  I encourage
you to check the project out, whether or not you buy my book.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;updates&quot;&gt;Updates&lt;&#x2F;h3&gt;
&lt;p&gt;This post was graciously &lt;a href=&quot;https:&#x2F;&#x2F;indepthguide.com&#x2F;translations&#x2F;#doc-ru:in-which-he-authors-a-book-on-ossec&quot;&gt;translated into
Russian&lt;&#x2F;a&gt;
by a volunteer translation team.&lt;&#x2F;p&gt;
</content>
	</entry>
	<entry xml:lang="en">
		<title>ElasticSearch for Logging</title>
		<published>2012-12-26T00:00:00+00:00</published>
		<updated>2012-12-26T00:00:00+00:00</updated>
		<link rel="alternate" href="https://divisionbyzero.net/elasticsearch-for-logging/" type="text/html"/>
		<id>https://divisionbyzero.net/elasticsearch-for-logging/</id>
		<content type="html">&lt;p&gt;We use &lt;a href=&quot;http:&#x2F;&#x2F;elasticsearch.org&quot;&gt;ElasticSearch&lt;&#x2F;a&gt; at my job for web front-end
searches.  Performance is critical, and for our purposes, the data is mostly
static.  We update the search indexes daily, but have no problems running on
old indexes for weeks.  The majority of the traffic to this cluster is
search; it is a &amp;quot;read heavy&amp;quot; cluster.  We had some performance hiccups at
the beginning, but we worked closely with Shay Banon of ElasticSearch to
eliminate those problems.  Now our front end clusters are very reliable,
resilient, and fast.&lt;&#x2F;p&gt;
&lt;p&gt;I am now working to implement a centralized logging infrastructure that
meets compliance requirements, but is also useful.  The goal of the logging
infrastructure is to emulate as much of the Splunk functionality as
possible.  My &lt;a href=&quot;http:&#x2F;&#x2F;edgeofsanity.net&#x2F;article&#x2F;2012&#x2F;06&#x2F;17&#x2F;central-logging-with-open-source-software.html&quot;&gt;previous write-up on
logging&lt;&#x2F;a&gt;
explains why we decided against &lt;a href=&quot;http:&#x2F;&#x2F;splunk.com&quot;&gt;Splunk&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;After evaluating a number of options, I&#x27;ve decided to utilize ElasticSearch
as the storage back-end for that system.  This type of cluster is &lt;strong&gt;very
different&lt;&#x2F;strong&gt; from the cluster we&#x27;ve implemented for heavy search loads.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;&lt;h3 id=&quot;translations&quot;&gt;Translations&lt;&#x2F;h3&gt;
&lt;p&gt;A &lt;a href=&quot;http:&#x2F;&#x2F;www.everycloudtech.com&#x2F;elasticsearch-for-logging&quot;&gt;Russian translation&lt;&#x2F;a&gt; of this post provided by &lt;a href=&quot;http:&#x2F;&#x2F;www.everycloudtech.com&#x2F;&quot;&gt;EveryCloud&lt;&#x2F;a&gt;.&lt;br &#x2F;&gt;
A &lt;a href=&quot;http:&#x2F;&#x2F;fangpeishi.com&#x2F;elasticsearch-for-logging_zh.html&quot;&gt;Chinese translation&lt;&#x2F;a&gt; of this post provided by &lt;a href=&quot;http:&#x2F;&#x2F;fangpeishi.com&#x2F;&quot;&gt;FangPeishi&lt;&#x2F;a&gt;.&lt;br &#x2F;&gt;
A &lt;a href=&quot;http:&#x2F;&#x2F;www.opensourceinitiative.net&#x2F;edu&#x2F;elasticsearch-for-logging&quot;&gt;Ukrainian translation&lt;&#x2F;a&gt; of this post provided by &lt;a href=&quot;http:&#x2F;&#x2F;www.opensourceinitiative.net&#x2F;&quot;&gt;Open Source Initiative&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;index-layouts&quot;&gt;Index Layouts&lt;&#x2F;h2&gt;
&lt;p&gt;The two popular open source log routing systems are
&lt;a href=&quot;http:&#x2F;&#x2F;graylog2.org&quot;&gt;Graylog2&lt;&#x2F;a&gt; and &lt;a href=&quot;http:&#x2F;&#x2F;logstash.net&quot;&gt;LogStash&lt;&#x2F;a&gt;.  As of
this writing, the &lt;strong&gt;stable&lt;&#x2F;strong&gt; Graylog2 release supports only writing&#x2F;reading
from a single index.  As I pointed out in a prior article, this presents
&lt;strong&gt;enormous&lt;&#x2F;strong&gt; scaling issues for Graylog2.  The 0.10.0 release of Graylog2
will include the ability to index to multiple indexes.  However, my
experience has been with LogStash indexes as that was the only scalable
option in the past.&lt;&#x2F;p&gt;
&lt;p&gt;In order to get the most out of ElasticSearch for logging, you need to use
multiple indexes.  There are a few ways to handle when to roll over the
index, but LogStash&#x27;s default automatic daily rotation turns out to make the
most sense.  So, you&#x27;ll have something like:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;logstash-2012.12.19&lt;&#x2F;li&gt;
&lt;li&gt;logstash-2012.12.20&lt;&#x2F;li&gt;
&lt;li&gt;logstash-2012.12.21&lt;&#x2F;li&gt;
&lt;li&gt;logstash-THE_WORLD_HAS_ENDED&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;You could keep track of how many documents are in each index, then roll over
after a million or billion or whatever arbitrary number you decide, but
you&#x27;re just creating more work for yourself later.  There are some edge
cases where other indexing schemes may be more efficient, but for most users,
an index a day is the simplest, most efficient use of your resources.&lt;&#x2F;p&gt;
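&lt;p&gt;One nice side effect of an index a day is that the index name is a pure function of the event&#x27;s timestamp.  Here&#x27;s a small Python sketch of that naming scheme.  The helper names are hypothetical, and I&#x27;m assuming the day boundary is taken in UTC.&lt;&#x2F;p&gt;

```python
from datetime import date, datetime, timedelta, timezone


def daily_index_name(timestamp, prefix="logstash"):
    """Return the daily index for a timezone-aware datetime, using the UTC day."""
    day = timestamp.astimezone(timezone.utc).strftime("%Y.%m.%d")
    return "{}-{}".format(prefix, day)


def indexes_between(first_day, last_day, prefix="logstash"):
    """List the daily indexes covering first_day through last_day, inclusive."""
    span = (last_day - first_day).days
    return [
        "{}-{}".format(prefix, (first_day + timedelta(days=n)).strftime("%Y.%m.%d"))
        for n in range(span + 1)
    ]
```

&lt;p&gt;The second helper shows why this layout is handy at search time: a query over a date range only needs to touch the indexes for those days.&lt;&#x2F;p&gt;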
&lt;h2 id=&quot;get-serious-or-go-home&quot;&gt;Get serious, or go home.&lt;&#x2F;h2&gt;
&lt;p&gt;Both LogStash and Graylog2 ship with built-in ElasticSearch implementations.
This is great for demonstration or development purposes.  &lt;em&gt;&lt;strong&gt;DO NOT USE THIS
BUILT-IN SERVER FOR REAL PURPOSES!&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt;  I am surprised by the number of
LogStash and Graylog2 users ending up in #elasticsearch on irc.freenode.org
who are using the built-in ElasticSearch storage engine and surprised that
it falls over!&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Run a standalone ElasticSearch Cluster!&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;You will need separate hardware for this.  Java applications like LogStash
and ElasticSearch are memory and disk-cache intensive.  Commit the hardware
to the log processing boxes and &lt;strong&gt;separate&lt;&#x2F;strong&gt; boxes to the ElasticSearch
cluster.  Java has some weird issues with memory.  We&#x27;ve found that you
don&#x27;t want to go past 32 GB of RAM dedicated to ElasticSearch, and you should reserve
at least 8 GB for the OS to use for file-system caching.&lt;&#x2F;p&gt;
&lt;p&gt;My cluster is handling ~60 GB of log data a day in my development
environment with 3 search nodes at 24 GB of RAM each and is underwhelmed.
This brings up the next question, &lt;em&gt;how many servers for my cluster?&lt;&#x2F;em&gt;   Start
with 3 servers in your ElasticSearch cluster.  This gives you the
flexibility to shut down a server and maintain full use of your cluster.  You
can always add more hardware!&lt;&#x2F;p&gt;
&lt;h2 id=&quot;installing-elasticsearch&quot;&gt;Installing ElasticSearch&lt;&#x2F;h2&gt;
&lt;p&gt;I&#x27;m not going to cover installing ElasticSearch; you can &lt;a href=&quot;http:&#x2F;&#x2F;www.elasticsearch.org&#x2F;guide&#x2F;reference&#x2F;setup&#x2F;installation.html&quot;&gt;read more about
it&lt;&#x2F;a&gt; on
the documentation site.  You may even decide to utilize the .deb or
possibly &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;tavisto&#x2F;elasticsearch-rpms&quot;&gt;roll an rpm&lt;&#x2F;a&gt; and
create a recipe for managing ElasticSearch with
&lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;Aethylred&#x2F;puppet-elasticsearch&quot;&gt;Puppet&lt;&#x2F;a&gt; or
&lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;karmi&#x2F;cookbook-elasticsearch&quot;&gt;Chef&lt;&#x2F;a&gt;.  The only thing I
will say about installation is that, despite how much it hurts, it&#x27;s best to run
ElasticSearch under the &lt;a href=&quot;http:&#x2F;&#x2F;www.java.com&#x2F;en&#x2F;download&#x2F;index.jsp&quot;&gt;Sun
JVM&lt;&#x2F;a&gt;.  This is how the developers
of ElasticSearch run ElasticSearch and so can you!&lt;&#x2F;p&gt;
&lt;h2 id=&quot;elasticsearch-configuration-os-and-java&quot;&gt;ElasticSearch Configuration: OS and Java&lt;&#x2F;h2&gt;
&lt;p&gt;There are some things you &lt;em&gt;really&lt;&#x2F;em&gt; need to configure on the host system.
I&#x27;m assuming you&#x27;re running Linux as the host system here.  You should run
ElasticSearch as an unprivileged user.  My cluster runs as the
&#x27;elasticsearch&#x27; user, so we tweak the kernel limits on processes and memory
in &#x27;&#x2F;etc&#x2F;security&#x2F;limits.conf&#x27;:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# Ensure ElasticSearch can open files and lock memory!
&lt;&#x2F;span&gt;&lt;span&gt;elasticsearch   soft    nofile          65536
&lt;&#x2F;span&gt;&lt;span&gt;elasticsearch   hard    nofile          65536
&lt;&#x2F;span&gt;&lt;span&gt;elasticsearch   -       memlock         unlimited
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;You should also set ElasticSearch&#x27;s minimum and maximum memory pool sizes
to the &lt;em&gt;same value&lt;&#x2F;em&gt;.  This takes care of all the memory allocation at
startup, so you don&#x27;t have threads waiting to get more memory from the
kernel.  I&#x27;ve built ElasticSearch on a RedHat system and have this in my
&#x27;&#x2F;etc&#x2F;sysconfig&#x2F;elasticsearch&#x27; which sets environment variables for the
daemon at startup:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# Allocate 14 Gigs of RAM
&lt;&#x2F;span&gt;&lt;span&gt;ES_MIN_MEM=14g
&lt;&#x2F;span&gt;&lt;span&gt;ES_MAX_MEM=&amp;quot;$ES_MIN_MEM&amp;quot;
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This file is managed by Puppet and sets the memory equal to 50% of the RAM +
2 gigs.  This isn&#x27;t rocket science, and it&#x27;s covered in &lt;strong&gt;every&lt;&#x2F;strong&gt;
ElasticSearch tuning guide.&lt;&#x2F;p&gt;
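&lt;p&gt;For the curious, the sizing rule is simple enough to sketch in a few lines of Python.  This is not my actual Puppet code, just an illustration; the function names are hypothetical, and the whole-gigabyte rounding and the 32 GB ceiling are assumptions based on the numbers in this post.&lt;&#x2F;p&gt;

```python
def heap_size_gb(total_ram_gb, max_heap_gb=32):
    """Half of the machine's RAM plus 2 GB, capped at max_heap_gb."""
    return min(total_ram_gb // 2 + 2, max_heap_gb)


def sysconfig_lines(total_ram_gb):
    """Render the two memory settings, pinning min and max to the same value."""
    heap = heap_size_gb(total_ram_gb)
    return [
        "ES_MIN_MEM={}g".format(heap),
        'ES_MAX_MEM="$ES_MIN_MEM"',
    ]
```

&lt;p&gt;For a 24 GB node this works out to 14 GB of heap, which leaves 10 GB for the OS and its file-system cache.&lt;&#x2F;p&gt;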
&lt;h2 id=&quot;elasticsearch-configuration-elasticsearch-yml&quot;&gt;ElasticSearch Configuration: elasticsearch.yml&lt;&#x2F;h2&gt;
&lt;p&gt;There are some things we can tune in the &#x27;elasticsearch.yml&#x27; file which will
dramatically improve performance for write-heavy nodes.  The first is to set
&lt;code&gt;bootstrap.mlockall&lt;&#x2F;code&gt; to true.  This forces the JVM to allocate all of
&lt;code&gt;ES_MIN_MEM&lt;&#x2F;code&gt; immediately.  This means Java has all the memory it needs at
start up!  Another concern of a write heavy cluster is the imbalance in
memory allocated to the indexing&#x2F;bulk engine.&lt;&#x2F;p&gt;
&lt;p&gt;ElasticSearch is assuming you&#x27;re going to be using it mostly for searches,
so the majority of your memory allocation is safeguarded for those
searches.  This isn&#x27;t the case with this cluster, so by tweaking
&lt;code&gt;indices.memory.index_buffer_size&lt;&#x2F;code&gt; to 30% we can restore the balance we need
for this use case.  In my setup, I also up the refresh interval and the
transaction count for log flushing.  Otherwise, ElasticSearch would be
flushing the translog nearly every second.&lt;&#x2F;p&gt;
&lt;p&gt;The other thing we need to tweak to avoid catastrophic failure is the
threadpool settings.  ElasticSearch will do what it believes is best to
achieve the best performance.  We&#x27;ve found out, in production, that this can
mean spawning thousands upon thousands of threads to handle incoming
requests.  This will knock your whole cluster over quickly under heavy load.
To avoid this, we set the max number of threads per pool: search, index, and
bulk.  The majority of our operations will be bulks, so we give that 60
threads, and other operations 20.  We also set the maximum number of
requests that can queue for processing to 300 for bulk, and 100 for
everything else.  This way, if the cluster becomes overloaded it will turn
down new requests, but it will leave you enough file descriptors and PIDs
to ssh into the boxes and figure out what went wrong.&lt;&#x2F;p&gt;
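&lt;p&gt;The important behavior here is the bounded queue: once it is full, new work gets rejected instead of piling up.  Here&#x27;s a toy Python model of that idea.  It is not how ElasticSearch implements its thread pools, and the class and method names are invented for this sketch.&lt;&#x2F;p&gt;

```python
import queue


class BoundedWorkQueue:
    """Toy model of a fixed pool's request queue: accept until full, then reject."""

    def __init__(self, queue_size):
        self._pending = queue.Queue(maxsize=queue_size)
        self.rejected = 0

    def submit(self, request):
        """Return True if the request was queued, False if it was turned away."""
        try:
            self._pending.put_nowait(request)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def drain(self, count):
        """Simulate workers finishing up to count queued requests."""
        finished = 0
        while finished != count and not self._pending.empty():
            self._pending.get_nowait()
            finished += 1
        return finished
```

&lt;p&gt;A client that gets turned away can back off and retry, which is a much better failure mode than a node that has spawned so many threads you can&#x27;t log in to fix it.&lt;&#x2F;p&gt;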
&lt;p&gt;Pulling that all together, here&#x27;s my config file:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;yaml&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-yaml &quot;&gt;&lt;code class=&quot;language-yaml&quot; data-lang=&quot;yaml&quot;&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;##################################################################
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# &#x2F;etc&#x2F;elasticsearch&#x2F;elasticsearch.yml
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Base configuration for a write heavy cluster
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Cluster &#x2F; Node Basics
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;cluster.name&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;logng
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Nodes can have arbitrary attributes we can use for routing
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;node.name&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;logsearch-01
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;node.datacenter&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;amsterdam
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Force all memory to be locked, forcing the JVM to never swap
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;bootstrap.mlockall&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;true
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;## Threadpool Settings ##
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Search pool
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;threadpool.search.type&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;fixed
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;threadpool.search.size&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;20
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;threadpool.search.queue_size&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;100
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Bulk pool
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;threadpool.bulk.type&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;fixed
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;threadpool.bulk.size&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;60
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;threadpool.bulk.queue_size&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;300
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Index pool
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;threadpool.index.type&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;fixed
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;threadpool.index.size&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;20
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;threadpool.index.queue_size&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;100
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Indices settings
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;indices.memory.index_buffer_size&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;30%
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;indices.memory.min_shard_index_buffer_size&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;12mb
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;indices.memory.min_index_buffer_size&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;96mb
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Cache Sizes
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;indices.fielddata.cache.size&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;15%
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;indices.fielddata.cache.expire&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;6h
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;indices.cache.filter.size&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;15%
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;indices.cache.filter.expire&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;6h
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Indexing Settings for Writes
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;index.refresh_interval&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;30s
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;index.translog.flush_threshold_ops&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;50000
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Minimum nodes alive to constitute an operational cluster
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;discovery.zen.minimum_master_nodes&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;2
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Unicast Discovery (disable multicast)
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;discovery.zen.ping.multicast.enabled&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;false
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;discovery.zen.ping.unicast.hosts&lt;&#x2F;span&gt;&lt;span&gt;: [ &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;logsearch-01&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;logsearch-02&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;logsearch-03&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;]
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h2 id=&quot;elasticsearch-configuration-index-templates&quot;&gt;ElasticSearch Configuration: Index Templates&lt;&#x2F;h2&gt;
&lt;p&gt;As I stated, I developed this cluster based on LogStash due to the
shortcomings of the Graylog2 implementation at the time.  This section will
contain the word &amp;quot;logstash&amp;quot;, but you can easily adapt this to a Graylog2 or
homemade index mapping.&lt;&#x2F;p&gt;
&lt;p&gt;Since we&#x27;ve decided to create an index a day, there are two ways to configure
the mapping and features of each index.  We can either create the indexes
explicitly with the settings we want, or we can use a template such that any
index created implicitly by writing data to it, has the features and
configurations we want!  Templates make the most sense in this case, you
we&#x27;ll create them on the now running cluster!&lt;&#x2F;p&gt;
&lt;p&gt;My template settings are:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;javascript&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-javascript &quot;&gt;&lt;code class=&quot;language-javascript&quot; data-lang=&quot;javascript&quot;&gt;&lt;span&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;template&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;logstash-*&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;,
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;settings&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt; : {
&lt;&#x2F;span&gt;&lt;span&gt;        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;index.number_of_shards&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;3&lt;&#x2F;span&gt;&lt;span&gt;,
&lt;&#x2F;span&gt;&lt;span&gt;        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;index.number_of_replicas&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;1&lt;&#x2F;span&gt;&lt;span&gt;,
&lt;&#x2F;span&gt;&lt;span&gt;        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;index.query.default_field&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;@message&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;,
&lt;&#x2F;span&gt;&lt;span&gt;        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;index.routing.allocation.total_shards_per_node&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;2&lt;&#x2F;span&gt;&lt;span&gt;,
&lt;&#x2F;span&gt;&lt;span&gt;        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;index.auto_expand_replicas&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;false
&lt;&#x2F;span&gt;&lt;span&gt;    },
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;mappings&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: {
&lt;&#x2F;span&gt;&lt;span&gt;        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;_default_&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: {
&lt;&#x2F;span&gt;&lt;span&gt;            &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;_all&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: { &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;enabled&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;false &lt;&#x2F;span&gt;&lt;span&gt;},
&lt;&#x2F;span&gt;&lt;span&gt;            &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;_source&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: { &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;compress&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;false &lt;&#x2F;span&gt;&lt;span&gt;},
&lt;&#x2F;span&gt;&lt;span&gt;            &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;dynamic_templates&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: [
&lt;&#x2F;span&gt;&lt;span&gt;                {
&lt;&#x2F;span&gt;&lt;span&gt;                    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;fields_template&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: {
&lt;&#x2F;span&gt;&lt;span&gt;                        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;mapping&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: { &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;type&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;string&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;index&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;not_analyzed&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;},
&lt;&#x2F;span&gt;&lt;span&gt;                        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;path_match&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;@fields.*&amp;quot;
&lt;&#x2F;span&gt;&lt;span&gt;                    }
&lt;&#x2F;span&gt;&lt;span&gt;                },
&lt;&#x2F;span&gt;&lt;span&gt;                {
&lt;&#x2F;span&gt;&lt;span&gt;                    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;tags_template&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: {
&lt;&#x2F;span&gt;&lt;span&gt;                        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;mapping&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: { &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;type&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;string&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;index&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;not_analyzed&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;},
&lt;&#x2F;span&gt;&lt;span&gt;                        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;path_match&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;@tags.*&amp;quot;
&lt;&#x2F;span&gt;&lt;span&gt;                    }
&lt;&#x2F;span&gt;&lt;span&gt;                }
&lt;&#x2F;span&gt;&lt;span&gt;            ],
&lt;&#x2F;span&gt;&lt;span&gt;            &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;properties&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: {
&lt;&#x2F;span&gt;&lt;span&gt;                &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;@fields&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: { &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;type&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;object&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;dynamic&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;true&lt;&#x2F;span&gt;&lt;span&gt;, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;path&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;full&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;},
&lt;&#x2F;span&gt;&lt;span&gt;                &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;@source&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: { &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;type&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;string&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;index&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;not_analyzed&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;},
&lt;&#x2F;span&gt;&lt;span&gt;                &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;@source_host&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: { &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;type&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;string&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;index&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;not_analyzed&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;},
&lt;&#x2F;span&gt;&lt;span&gt;                &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;@source_path&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: { &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;type&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;string&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;index&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;not_analyzed&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;},
&lt;&#x2F;span&gt;&lt;span&gt;                &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;@timestamp&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: { &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;type&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;date&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;index&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;not_analyzed&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;},
&lt;&#x2F;span&gt;&lt;span&gt;                &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;@type&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: { &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;type&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;string&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;index&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;not_analyzed&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;},
&lt;&#x2F;span&gt;&lt;span&gt;                &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;@message&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: { &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;type&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;string&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;analyzer&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;: &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;whitespace&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;}
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;             }
&lt;&#x2F;span&gt;&lt;span&gt;        }
&lt;&#x2F;span&gt;&lt;span&gt;    }
&lt;&#x2F;span&gt;&lt;span&gt;}
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;To apply the settings to the cluster, we create or update the template with
a PUT:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;curl -XPUT &amp;#39;http:&#x2F;&#x2F;localhost:9200&#x2F;_template&#x2F;template_logstash&#x2F;&amp;#39; -d @logstash-template.json
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Setting the template to &lt;code&gt;logstash-*&lt;&#x2F;code&gt; means all new indexes created that
start with &#x27;logstash-&#x27; will have these settings applied.  I override the
default search behavior by disabling the &lt;code&gt;_all&lt;&#x2F;code&gt; fields search and set the
default attribute to &lt;code&gt;@message&lt;&#x2F;code&gt;.  This field will be the raw syslog message.
It&#x27;s also the only field that doesn&#x27;t have the analyzer disabled.  Don&#x27;t
freak out.  This is saving space and indexing time.  It means searching
other fields in the document will match using exact matches rather than
fuzzy searches, but that&#x27;s O.K.  We can still get that warm fuzzy feeling by
searching the &lt;code&gt;@message&lt;&#x2F;code&gt; field!  This will dramatically reduce the storage
size.&lt;&#x2F;p&gt;
&lt;p&gt;In previous write-ups, before ElasticSearch 0.19, you may have seen the
&lt;code&gt;&amp;quot;_source&amp;quot;: { &amp;quot;compress&amp;quot;: true }&lt;&#x2F;code&gt; attribute set.  This is not recommended
for logging data.  This attribute determines whether &lt;em&gt;each&lt;&#x2F;em&gt; document (read:
log message) is stored using compression.  As these documents tend to be
&lt;em&gt;very&lt;&#x2F;em&gt; small, compression doesn&#x27;t really save much space.  It does cost
extra processing at the time of indexing and retrieval.  It&#x27;s best to
explicitly disable compression for a logging cluster.  The setting which
enabled store compression in our &lt;code&gt;elasticsearch.yml&lt;&#x2F;code&gt; uses block level
compression which is much more efficient.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;index-settings&quot;&gt;Index Settings&lt;&#x2F;h3&gt;
&lt;p&gt;The index settings are tuned to a 3 node cluster.  We can change everything
but the &lt;code&gt;index.number_of_shards&lt;&#x2F;code&gt; on the fly if we need to grow or shrink the
cluster.  This setup isn&#x27;t exactly perfect, as we sometimes end up with
orphaned (unallocated) shards.  This is easy enough to correct by moving
shards around with &lt;a href=&quot;http:&#x2F;&#x2F;www.elasticsearch.org&#x2F;guide&#x2F;reference&#x2F;api&#x2F;admin-cluster-reroute.html&quot;&gt;the ElasticSearch
API&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Instead of replicating the entire index to the entire cluster, we add
storage capacity as we add nodes.  This way we have a &amp;quot;RAID like&amp;quot; setup for
shard allocation.  I have a 3 node cluster, and I create 3 shards per index.
This means the master or &amp;quot;write&amp;quot; shard can be balanced to one on each node.
For redundancy, I set the number of replicas to one.  This means there are 6
shards for each index.  Each node is only allowed to have 2 shards per
index.&lt;&#x2F;p&gt;
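&lt;p&gt;To make the arithmetic above concrete, here is a minimal Python sketch of the
shard count for one index.  The function name &lt;code&gt;shard_layout&lt;&#x2F;code&gt; and its
parameters are made up for illustration and are not part of ElasticSearch.  It
only counts available slots; it ignores the rule that a replica is never placed
on the same node as its primary.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;
```python
def shard_layout(nodes, primaries, replicas, max_per_node):
    """Return (total_shards, cluster_capacity, unallocated) for one index."""
    total = primaries * (1 + replicas)      # primaries plus their replica copies
    capacity = nodes * max_per_node         # total_shards_per_node on every node
    unallocated = max(0, total - capacity)  # shards with nowhere to go
    return total, capacity, unallocated


# The three-node layout from the template above.
print(shard_layout(nodes=3, primaries=3, replicas=1, max_per_node=2))  # (6, 6, 0)
# The same index with one node out of the cluster.
print(shard_layout(nodes=2, primaries=3, replicas=1, max_per_node=2))  # (6, 4, 2)
```
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;With all three nodes up, the 6 shards exactly fill the 6 available slots.  With
one node out, only 4 slots remain, so 2 shards stay unallocated until the node
returns.&lt;&#x2F;p&gt;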
&lt;p&gt;You&#x27;ll need to experiment with these settings for your needs.  Take into
account how many nodes you can afford to lose before you lose functionality.
You&#x27;ll need to adjust the number of replicas based on that.  I&#x27;ve gone with
a simple recipe here of having 1 shard replica.  This means I can
only afford to have a single node out of the cluster.  So far, I&#x27;ve found
that having &lt;code&gt;number_of_replicas&lt;&#x2F;code&gt; equal to ( 2&#x2F;3 * number of nodes) - 1 to be
a good number, YMMV.&lt;&#x2F;p&gt;
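&lt;p&gt;The rule of thumb above can be written as a tiny Python helper.  This is only a
sketch: the name &lt;code&gt;suggested_replicas&lt;&#x2F;code&gt; is hypothetical, and rounding down
and never going below zero are my assumptions, since the formula doesn&#x27;t say
what to do when the node count isn&#x27;t a multiple of three.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;
```python
def suggested_replicas(nodes):
    """Rule of thumb: (2/3 * number of nodes) - 1, never below zero."""
    # Floor division and the zero floor are assumptions, not part of the rule.
    return max(0, (2 * nodes) // 3 - 1)


for n in (3, 6, 9):
    print(n, suggested_replicas(n))
```
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For the 3 node cluster described here this gives 1 replica, which matches the
template.&lt;&#x2F;p&gt;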
&lt;h3 id=&quot;automatically-expand-replicas&quot;&gt;Automatically Expand Replicas&lt;&#x2F;h3&gt;
&lt;p&gt;It&#x27;s also best to disable ElasticSearch&#x27;s default behavior to automatically
expand the number of replicas based on how many nodes are in the cluster.
We assume responsibility for managing this manually and gain performance,
especially when we need to stop or restart a node in the cluster.
Auto-expansion is a great feature for search-heavy indexes with small to
medium data sets.  Without reconfiguring, adding another node will increase
performance.  However, if you have a lot of data in your indexes and this
feature is enabled here&#x27;s what happens when a node restarts:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Everything is good. Number of replicas = 1.&lt;&#x2F;li&gt;
&lt;li&gt;Node A shuts down&lt;&#x2F;li&gt;
&lt;li&gt;Cluster notices node down, goes yellow
&lt;ul&gt;
&lt;li&gt;replicas = 0, expected 1&lt;&#x2F;li&gt;
&lt;li&gt;number of nodes now = 1&lt;&#x2F;li&gt;
&lt;li&gt;number of replicas expected = 0 now&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;Cluster health upgraded to green, Everything Spiffy&lt;&#x2F;li&gt;
&lt;li&gt;Node A comes back online&lt;&#x2F;li&gt;
&lt;li&gt;Cluster sends number of replicas expected and actual for all indexes&lt;&#x2F;li&gt;
&lt;li&gt;Node A realizes its shards are unnecessary, and deletes data&lt;&#x2F;li&gt;
&lt;li&gt;Cluster increments number of nodes, replicas expected = 1, actual = 0&lt;&#x2F;li&gt;
&lt;li&gt;Node A is notified that number of replicas is not yet met&lt;&#x2F;li&gt;
&lt;li&gt;Node A replicates &lt;em&gt;every&lt;&#x2F;em&gt; shard back into its index, over the network&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;As you can see, this is less than desirable, especially with a busy cluster.
Please be aware of this behavior in production and watch your network graphs
when you add&#x2F;remove nodes from your cluster.  If you see spikes, you may
want to manage this manually.  You lose some of the magic, but you may find
it to be black magic anyways.  By disabling the auto expansion of replicas,
this happens:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Everything is good.&lt;&#x2F;li&gt;
&lt;li&gt;Node A shuts down&lt;&#x2F;li&gt;
&lt;li&gt;Cluster notices node down, cluster status yellow&lt;&#x2F;li&gt;
&lt;li&gt;Cluster health does not recover, expected replicas != actual replicas&lt;&#x2F;li&gt;
&lt;li&gt;Node A comes back up&lt;&#x2F;li&gt;
&lt;li&gt;Cluster sends number of replicas expected and actual for all indexes&lt;&#x2F;li&gt;
&lt;li&gt;Node A notifies cluster that it has copies of shards&lt;&#x2F;li&gt;
&lt;li&gt;Cluster expected and actual replicas now equal, health green&lt;&#x2F;li&gt;
&lt;li&gt;Cluster checksums the shards and replicates any out-of-date shards to Node A&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;This is what most people expect the cluster to do by default, but the logic
involved in determining cluster state makes it difficult to accomplish.
Again, the magic behavior of &lt;code&gt;auto_expand_replicas&lt;&#x2F;code&gt; makes sense in most use
cases, but not in our case.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;maintenance-and-monitoring&quot;&gt;Maintenance and Monitoring&lt;&#x2F;h2&gt;
&lt;p&gt;I wrote a &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;reyjrar&#x2F;es-utils&quot;&gt;few scripts&lt;&#x2F;a&gt; for working
with ElasticSearch in a production environment.  They include scripts for exporting
metrics to Graphite and Cacti.  There is also a Nagios monitoring check
which is very configurable.  We use these utilities to keep track of the
performance and health of our various clusters including the logging
cluster.  I&#x27;ll be updating that in the next few days to include my logstash
index maintenance script.&lt;&#x2F;p&gt;
&lt;p&gt;As you write your data to the log cluster, ElasticSearch is creating Lucene
indexes of the log messages in the background.  There is a buffer of
incoming documents and based on your settings, that data is flushed to a
Lucene index.  Lucene indexes are expensive to create&#x2F;update, but &lt;strong&gt;fast&lt;&#x2F;strong&gt;
to search.  This means a single shard may contain hundreds of Lucene
indexes, often referred to as segments.  These segments can each be searched
quickly, but only one can be processed per thread.  This can begin to have a
negative effect on performance.  We have seen a 10% degradation in search
speed with indexes with 20+ segments.&lt;&#x2F;p&gt;
&lt;p&gt;Luckily, ElasticSearch provides &lt;a href=&quot;http:&#x2F;&#x2F;www.elasticsearch.org&#x2F;guide&#x2F;reference&#x2F;api&#x2F;admin-indices-optimize.html&quot;&gt;an API for optimizing the Lucene
segments&lt;&#x2F;a&gt;.
You shouldn&#x27;t optimize an index that&#x27;s currently indexing data.  The new
data will just create more segments on those shards.  So how do we know that
we&#x27;re done writing to an index?  Well, if you remember, I recommended using
daily indexes.  This means, you can run a cron job daily (or hourly) to
check for any indexes with yesterday&#x27;s date or older and make sure they&#x27;re
optimized (or &lt;code&gt;max_num_segments = 1&lt;&#x2F;code&gt;).  If you&#x27;ve chosen some other scheme
for creating index names, you&#x27;ve just created more work for yourself.&lt;&#x2F;p&gt;
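&lt;p&gt;As a rough illustration of that cron job, the Python sketch below picks out the
daily indexes that are no longer being written to.  It assumes index names of
the form &lt;code&gt;logstash-YYYY.MM.DD&lt;&#x2F;code&gt;, and the helper names
&lt;code&gt;indexes_to_optimize&lt;&#x2F;code&gt; and &lt;code&gt;optimize_path&lt;&#x2F;code&gt; are hypothetical.  It only
builds the request paths for the optimize API; sending the POST requests and
skipping indexes that are already down to one segment are left out.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;
```python
from datetime import date, datetime


def indexes_to_optimize(index_names, today, prefix="logstash-"):
    """Return daily indexes dated before `today`, oldest first."""
    finished = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # not a daily index name, leave it alone
        if today > day:
            finished.append((day, name))
    return [name for _, name in sorted(finished)]


def optimize_path(index_name):
    """Path to POST to in order to merge the index down to one segment."""
    return "/{0}/_optimize?max_num_segments=1".format(index_name)


names = ["logstash-2012.12.01", "logstash-2012.11.30",
         "logstash-2012.12.02", "some-other-index"]
for name in indexes_to_optimize(names, today=date(2012, 12, 2)):
    print(optimize_path(name))
```
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Today&#x27;s index is skipped on purpose, since it is still receiving data.&lt;&#x2F;p&gt;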
&lt;h2 id=&quot;future-explorations&quot;&gt;Future explorations&lt;&#x2F;h2&gt;
&lt;p&gt;This post is substantially longer than I expected.  I&#x27;m just scratching the
surface on the design and implementation of ElasticSearch clusters for
logging data.  My cluster will be moving from development into production
soon (though it currently provides production functionality).  When I do,
I&#x27;m going to face some additional challenges and I have a notebook full of
ideas on how to structure indexes and the cluster to handle the load and
some of the privacy-related problems that arise when you suddenly provide
simple, fast access to massive amounts of data.&lt;&#x2F;p&gt;
</content>
	</entry>
	<entry xml:lang="en">
		<title>OSSEC HIDS Extension - Accumulator</title>
		<published>2012-11-26T00:00:00+00:00</published>
		<updated>2012-11-26T00:00:00+00:00</updated>
		<link rel="alternate" href="https://divisionbyzero.net/ossec-hids-accumulator/" type="text/html"/>
		<id>https://divisionbyzero.net/ossec-hids-accumulator/</id>
		<content type="html">&lt;p&gt;If you haven&#x27;t looked at &lt;a href=&quot;http:&#x2F;&#x2F;ossec.net&quot;&gt;OSSEC HIDS&lt;&#x2F;a&gt;, here&#x27;s the overview:&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;OSSEC is a scalable, multi-platform, open source Host-based Intrusion
Detection System (HIDS). It has a powerful correlation and analysis engine,
integrating log analysis, file integrity checking, Windows registry
monitoring, centralized policy enforcement, rootkit detection, real-time
alerting and active response.&lt;&#x2F;p&gt;
&lt;p&gt;It runs on most operating systems, including Linux, OpenBSD, FreeBSD, MacOS,
Solaris and Windows.&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;OSSEC is a great product, but I ran into an issue when attempting to fulfill
a requirement for PCI-DSS which involved reviewing our LDAP logs.  I &lt;em&gt;knew&lt;&#x2F;em&gt;
OSSEC would make this simple.  I started writing a rule and realized I had
hit a significant roadblock.  OpenLDAP logs events as they happen and only
logs data relevant to that particular event.  A connect event has the ports
and IPs, and the bind event contains the username, but only the connection
id is the same in the two events.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;Out of the box, OSSEC can handle multiline events &lt;strong&gt;if&lt;&#x2F;strong&gt; those events are a
fixed number of sequential lines (as with Windows Event Logging).
Unfortunately, the OpenLDAP logs do not span a fixed number of lines, so the
ability to alert on multiple unsuccessful logins from the same IP is not
available for this type of logging, demonstrated below:&lt;&#x2F;p&gt;
&lt;p&gt;A connection is established:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;Jan 11 09:26:57 hostname slapd[20872]: conn=999999 fd=64 ACCEPT from IP=10.1.2.37:33957 (IP=10.1.2.2:389)
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A bind (or login event):&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;Jan 11 09:26:57 hostname slapd[20872]: conn=999999 op=0 BIND dn=&amp;quot;uid=example,ou=People,dc=example,dc=com&amp;quot; method=128
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;An unsuccessful login:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;Jan 11 09:26:57 hostname slapd[20872]: conn=999999 op=0 RESULT tag=97 err=49 text=
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A retry:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;Jan 11 09:26:57 hostname slapd[20872]: conn=999999 op=1 BIND dn=&amp;quot;uid=example,ou=People,dc=example,dc=com&amp;quot; method=128
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Success:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;Jan 11 09:26:57 hostname slapd[20872]: conn=999999 op=1 RESULT tag=97 err=0 text=
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Connection is closed:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;Jan 11 09:26:57 hostname slapd[20872]: conn=999999 op=2 UNBIND
&lt;&#x2F;span&gt;&lt;span&gt;Jan 11 09:26:57 hostname slapd[20872]: conn=999999 fd=64
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;You can see that all the events have the same &lt;code&gt;conn=999999&lt;&#x2F;code&gt; in the log
event.  My accumulator &lt;a href=&quot;https:&#x2F;&#x2F;gist.github.com&#x2F;4150352&quot;&gt;patch&lt;&#x2F;a&gt;, currently
merged into my &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;reyjrar&#x2F;ossec-hids&#x2F;&quot;&gt;OSSEC GitHub Repo&lt;&#x2F;a&gt;,
allows events to accumulate data using this connection id by means of a
decoder extension.&lt;&#x2F;p&gt;
&lt;p&gt;The patch replaces the standard &lt;code&gt;openldap&lt;&#x2F;code&gt; decoder with my decoder
extension, which is as simple as adding an &lt;code&gt;&amp;lt;accumulate&#x2F;&amp;gt;&lt;&#x2F;code&gt; tag to every
decoder whose events should accumulate data.  Here&#x27;s the relevant section
from &lt;code&gt;&#x2F;var&#x2F;ossec&#x2F;etc&#x2F;decoders.xml&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;xml&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-xml &quot;&gt;&lt;code class=&quot;language-xml&quot; data-lang=&quot;xml&quot;&gt;&lt;span&gt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;decoder &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;name&lt;&#x2F;span&gt;&lt;span&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;openldap&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;program_name&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;^slapd&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;program_name&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;accumulate&lt;&#x2F;span&gt;&lt;span&gt;&#x2F;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;decoder&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;decoder &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;name&lt;&#x2F;span&gt;&lt;span&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;openldap-connect&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;parent&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;openldap&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;parent&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;prematch&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;ACCEPT&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;prematch&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;regex&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;^conn=(\d+) fd=\d+ ACCEPT from IP=(\S+):&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;regex&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;order&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;id, srcip&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;order&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;accumulate&lt;&#x2F;span&gt;&lt;span&gt;&#x2F;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;decoder&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;decoder &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;name&lt;&#x2F;span&gt;&lt;span&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;openldap-bind&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;parent&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;openldap&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;parent&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;prematch&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;BIND &amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;prematch&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;regex&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;^conn=(\d+) op=\d+ BIND dn=&amp;quot;\w+=(\w+),&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;regex&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;order&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;id, dstuser&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;order&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;accumulate&lt;&#x2F;span&gt;&lt;span&gt;&#x2F;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;decoder&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;decoder &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;name&lt;&#x2F;span&gt;&lt;span&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;openldap-result&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;accumulate&lt;&#x2F;span&gt;&lt;span&gt;&#x2F;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;parent&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;openldap&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;parent&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;prematch&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt; RESULT &amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;prematch&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;regex&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;^conn=(\d+) op=\d+ RESULT &amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;regex&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;order&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;id&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;order&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;decoder&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;In the above decoders, when the connect event happens, we start an accumulate
cache for the event with a key of &lt;em&gt;&amp;quot;hostname openldap id&amp;quot;&lt;&#x2F;em&gt;, which stores srcip.
The subsequently decoded BIND event uses the same key and adds &amp;quot;dstuser&amp;quot; to the
&lt;em&gt;&amp;quot;hostname openldap id&amp;quot;&lt;&#x2F;em&gt; cache.  When we get to a RESULT line, that line pulls
both of those values from the accumulator cache, so they are available for the
rules to trigger using a &lt;code&gt;&amp;lt;same_source_ip&#x2F;&amp;gt;&lt;&#x2F;code&gt; aggregation.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;Accumulate()&lt;&#x2F;code&gt; requires an id to be extracted by the decoder, and uses that
recurring id to add data to events as they are parsed, using an in-memory cache.
The default, currently configurable only at build time, is to keep the data in
memory for up to 5 minutes.  Old entries are expired from the hash every 100
lookups or 10 minutes, whichever happens first.&lt;&#x2F;p&gt;
&lt;p&gt;Using this patch, as new data comes in, that data is then available to be used
by the rule engine for smarter rules.  Here&#x27;s an example using the above decoder
configuration to alert on 5 unsuccessful login attempts within 60 seconds (the
rule&#x27;s timeframe is given in seconds) from the same IP address:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;xml&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-xml &quot;&gt;&lt;code class=&quot;language-xml&quot; data-lang=&quot;xml&quot;&gt;&lt;span&gt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;rule &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;id&lt;&#x2F;span&gt;&lt;span&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;100000&amp;quot; &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;level&lt;&#x2F;span&gt;&lt;span&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;1&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;decoded_as&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;openldap&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;decoded_as&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;match&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt; RESULT tag=97 err=49&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;match&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;rule&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;rule &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;id&lt;&#x2F;span&gt;&lt;span&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;100001&amp;quot; &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;level&lt;&#x2F;span&gt;&lt;span&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;10&amp;quot; &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;frequency&lt;&#x2F;span&gt;&lt;span&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;5&amp;quot; &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;timeframe&lt;&#x2F;span&gt;&lt;span&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;60&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;if_matched_sid&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;100000&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;if_matched_sid&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;same_source_ip&lt;&#x2F;span&gt;&lt;span&gt;&#x2F;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;description&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;Multiple failed-logins from same source IP&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;description&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;&amp;lt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;rule&lt;&#x2F;span&gt;&lt;span&gt;&amp;gt;
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
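&lt;p&gt;Because the accumulated &lt;code&gt;dstuser&lt;&#x2F;code&gt; is also attached to the RESULT event, the
same approach should work per account.  Here&#x27;s a minimal sketch, assuming that
&lt;code&gt;&amp;lt;same_user&#x2F;&amp;gt;&lt;&#x2F;code&gt; compares the decoded &lt;code&gt;dstuser&lt;&#x2F;code&gt; in your OSSEC version; the
rule id 100002 is an arbitrary, hypothetical choice:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;xml&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-xml &quot;&gt;&lt;code class=&quot;language-xml&quot; data-lang=&quot;xml&quot;&gt;&lt;span&gt;&amp;lt;!-- Hypothetical variant: same failed-login rule, grouped by user instead of IP --&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;&amp;lt;rule id=&amp;quot;100002&amp;quot; level=&amp;quot;10&amp;quot; frequency=&amp;quot;5&amp;quot; timeframe=&amp;quot;60&amp;quot;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;if_matched_sid&amp;gt;100000&amp;lt;&#x2F;if_matched_sid&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;same_user&#x2F;&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &amp;lt;description&amp;gt;Multiple failed-logins for the same user&amp;lt;&#x2F;description&amp;gt;
&lt;&#x2F;span&gt;&lt;span&gt;&amp;lt;&#x2F;rule&amp;gt;
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;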
&lt;p&gt;Now, I can close an item on my PCI-DSS backlog!  The other nice piece here is
that, because OSSEC is distributed, I can do this match across multiple OpenLDAP
servers.  I&#x27;m working on an &lt;a href=&quot;http:&#x2F;&#x2F;www.ossec.net&#x2F;doc&#x2F;manual&#x2F;ar&#x2F;ar-unix.html&quot;&gt;active
response&lt;&#x2F;a&gt; script to
automatically disable accounts across a multi-datacenter OpenLDAP
infrastructure with proper reporting and automatic Help Desk notifications!&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;m looking for testers and any suggestions for improvements on the interface as
there is now a decent amount of interest on the OSSEC development mailing list
for incorporating this feature into the 2.8 release.&lt;&#x2F;p&gt;
</content>
	</entry>
	<entry xml:lang="en">
		<title>Using a ProxyCommand to Leap Frog Your Bastions</title>
		<published>2012-10-15T00:00:00+00:00</published>
		<updated>2012-10-15T00:00:00+00:00</updated>
		<link rel="alternate" href="https://divisionbyzero.net/ssh-leap-frog/" type="text/html"/>
		<id>https://divisionbyzero.net/ssh-leap-frog/</id>
		<content type="html">&lt;p&gt;I do most of my work over SSH.  Even when I&#x27;m working in my browser or
pgAdminIII, I&#x27;m &lt;em&gt;usually&lt;&#x2F;em&gt; doing that over SSH tunnels.  VPN Software has been
around for quite some time and it&#x27;s still mostly disappointing and usually run
by the least competent group in any IT department.  I developed a workflow using
SSH so that from my laptop, either on the corporate network or at home, I can ssh
&#x2F;directly&#x2F; to the server I&#x27;m interested in working on.&lt;&#x2F;p&gt;
&lt;p&gt;In order to accomplish this, I have made some compromises.  First off, when I&#x27;m
SSH-ing from home, I am &#x2F;required&#x2F; to type the fully qualified domain name
(FQDN) of the server.  I use the presence of the domain name to activate
the proper leap frogging.  I also decided to use ControlMasters with SSH, which
can leave me with a terminal without a prompt when I forget which shell is my
master.  Overall, the pros outweigh the cons and I&#x27;m more productive because of
it.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;&lt;h2 id=&quot;controlmaster&quot;&gt;ControlMaster&lt;&#x2F;h2&gt;
&lt;p&gt;Using a ControlMaster with ssh allows multiple sessions to share the same TCP
connection.  This means subsequent connections are &lt;em&gt;much&lt;&#x2F;em&gt; faster to open, but
it places a constraint on the original connection: every session riding on it must
be closed before the ControlMaster connection can close.  This may or may not be
desirable, but it does come in handy when using ProxyCommand to bounce around
through jump hosts, as the connection establishment overhead is removed.&lt;&#x2F;p&gt;
&lt;p&gt;Adding these lines to your &lt;code&gt;~&#x2F;.ssh&#x2F;config&lt;&#x2F;code&gt; will enable ControlMaster for all
connections:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# Use Control Master so we type passwords less
&lt;&#x2F;span&gt;&lt;span&gt;ControlMaster auto
&lt;&#x2F;span&gt;&lt;span&gt;ControlPath ~&#x2F;.ssh&#x2F;ssh_control_%h_%p_%r
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
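&lt;p&gt;If the terminal stuck holding the master gets annoying, one option is to let
the master connection live in the background instead.  A minimal sketch, assuming
your OpenSSH client is new enough to support &lt;code&gt;ControlPersist&lt;&#x2F;code&gt; (5.6 or later);
the 10 minute timeout is an arbitrary choice:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# Keep the master alive in the background after the first session exits
&lt;&#x2F;span&gt;&lt;span&gt;ControlMaster auto
&lt;&#x2F;span&gt;&lt;span&gt;ControlPath ~&#x2F;.ssh&#x2F;ssh_control_%h_%p_%r
&lt;&#x2F;span&gt;&lt;span&gt;# Assumption: 10 idle minutes is long enough; tune to taste
&lt;&#x2F;span&gt;&lt;span&gt;ControlPersist 10m
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;With that in place, &lt;code&gt;ssh -O check somehost&lt;&#x2F;code&gt; reports whether a master is
running for that host, and &lt;code&gt;ssh -O exit somehost&lt;&#x2F;code&gt; asks it to shut down.&lt;&#x2F;p&gt;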
&lt;h2 id=&quot;estasblishing-a-jump-host&quot;&gt;Establishing a Jump Host&lt;&#x2F;h2&gt;
&lt;p&gt;I find it&#x27;s best to alias the jump hosts to host names that don&#x27;t exist in DNS.
Ideally, I&#x27;ll never log in to these hosts directly, so I can even forget these
names.  Let&#x27;s create an alias for our bastion host.  Let&#x27;s call it &#x27;bastion&#x27;, and
say it runs ssh on port 65022.&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# Bastion Host
&lt;&#x2F;span&gt;&lt;span&gt;Host bastion
&lt;&#x2F;span&gt;&lt;span&gt;  Hostname corporate-bastion.example.com
&lt;&#x2F;span&gt;&lt;span&gt;  Port 65022
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h2 id=&quot;using-your-aliases&quot;&gt;Using Your Aliases&lt;&#x2F;h2&gt;
&lt;p&gt;It&#x27;s really simple:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# Use the bastion to connect to internal resources
&lt;&#x2F;span&gt;&lt;span&gt;Host *.internal.example.com
&lt;&#x2F;span&gt;&lt;span&gt;    ProxyCommand ssh bastion nc %h %p
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
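&lt;p&gt;If &lt;code&gt;nc&lt;&#x2F;code&gt; isn&#x27;t installed on the bastion, the ssh client can do the forwarding
itself.  A sketch of the same stanza, assuming OpenSSH 5.4 or later on your laptop
and a bastion that permits TCP forwarding:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# Same idea without nc: -W forwards stdin&#x2F;stdout to %h:%p through the bastion
&lt;&#x2F;span&gt;&lt;span&gt;Host *.internal.example.com
&lt;&#x2F;span&gt;&lt;span&gt;    ProxyCommand ssh -W %h:%p bastion
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;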
&lt;p&gt;If you&#x27;re configuring your networks in a way that makes sense, this configuration
will work from home or work.  Usually, the internal.example.com zones live
inside your DNS search path while you&#x27;re on the corporate network.  You probably
also have SSH access directly to your servers from the corporate network, so
while at work you can:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;$ ssh webserver-001
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;And then once you go home, you can access the same server directly with:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;$ ssh webserver-001.internal.example.com
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Sure, it&#x27;s more typing, but it gets you exactly where you want quickly.  Chances
are you are using zsh or bash with autocompletion and you can just hit &#x2F;tab&#x2F;
when you get to the first &#x27;.&#x27; to have the autocomplete work its magic.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;it-s-complicated&quot;&gt;It&#x27;s Complicated ..&lt;&#x2F;h1&gt;
&lt;p&gt;Sure it is.  It&#x27;s always more complicated.  And we can achieve the same things
with your insanely complicated series of jump hosts.  Chances are, you&#x27;ve got
some &amp;quot;high security&amp;quot; shit going on with your network, and you need to use jump
hosts internally.  Maybe your external bastion machine only provides SSH access
to the other internal bastions on your network.  Well, that&#x27;s plenty O.K.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;multiple-jump-hosts&quot;&gt;Multiple Jump Hosts&lt;&#x2F;h2&gt;
&lt;p&gt;Again, I like to pick aliases not in DNS to avoid confusion.  DNS should be the
authoritative place for things on your network, so don&#x27;t collide with it.  I
can&#x27;t help you if you insist on being stupid.  Let&#x27;s say we have a setup where
we need to connect to an external bastion, then to our internal bastion host
from the outside.  This gives us multiple layers of security, but can drive a
man insane with all the nonsense required to scp, rsync, or tunnel to those
hosts behind two bastions.&lt;&#x2F;p&gt;
&lt;p&gt;When you have multiple jump hosts in play, ControlMaster comes in handy.  It
dramatically reduces connection time and complexity.  Should one of the bastion
hosts require a two-factor authentication scheme, ControlMaster will make your
life incredibly easy.  Here&#x27;s an example of how I might set this up:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# Use Control Master so we type passwords less
&lt;&#x2F;span&gt;&lt;span&gt;ControlMaster auto
&lt;&#x2F;span&gt;&lt;span&gt;ControlPath ~&#x2F;.ssh&#x2F;ssh_control_%h_%p_%r
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;# External Bastion Host
&lt;&#x2F;span&gt;&lt;span&gt;Host extbastion
&lt;&#x2F;span&gt;&lt;span&gt;  Hostname external-bastion.example.com
&lt;&#x2F;span&gt;&lt;span&gt;  Port 65022
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;# Internal Bastion Host
&lt;&#x2F;span&gt;&lt;span&gt;Host intbastion
&lt;&#x2F;span&gt;&lt;span&gt;    ProxyCommand ssh extbastion nc internal-bastion.example.com 22
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;So far, everything looks the same as what we saw earlier.  To use the two jump
hosts together, we just chain through the internal bastion host!&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# Use the internal bastion from the external bastion to connect to internal resources
&lt;&#x2F;span&gt;&lt;span&gt;Host *.internal.example.com
&lt;&#x2F;span&gt;&lt;span&gt;    ProxyCommand ssh intbastion nc %h %p
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
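&lt;p&gt;For what it&#x27;s worth, OpenSSH 7.3 and later can express the same chain with the
&lt;code&gt;ProxyJump&lt;&#x2F;code&gt; directive, which needs no &lt;code&gt;nc&lt;&#x2F;code&gt; on either bastion.  A sketch
reusing the aliases above, assuming a new enough client and bastions that allow
TCP forwarding:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# External Bastion Host
&lt;&#x2F;span&gt;&lt;span&gt;Host extbastion
&lt;&#x2F;span&gt;&lt;span&gt;  Hostname external-bastion.example.com
&lt;&#x2F;span&gt;&lt;span&gt;  Port 65022
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;# Internal Bastion Host, reached through the external one
&lt;&#x2F;span&gt;&lt;span&gt;Host intbastion
&lt;&#x2F;span&gt;&lt;span&gt;  Hostname internal-bastion.example.com
&lt;&#x2F;span&gt;&lt;span&gt;  ProxyJump extbastion
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;# Internal resources, reached through the internal bastion
&lt;&#x2F;span&gt;&lt;span&gt;Host *.internal.example.com
&lt;&#x2F;span&gt;&lt;span&gt;  ProxyJump intbastion
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;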
&lt;p&gt;And there you go, keep chaining on the jump hosts and it will keep working.
Again, I don&#x27;t like to overlap my Host aliases with DNS, so make good decisions
while naming your aliases.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;bonus-round&quot;&gt;Bonus Round!&lt;&#x2F;h2&gt;
&lt;p&gt;As an FYI, you can use environment variables in your ProxyCommands! So, maybe
you do this in your &lt;code&gt;~&#x2F;.bashrc&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;export &lt;&#x2F;span&gt;&lt;span&gt;SSH_PROXY&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;int&amp;#39;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;alias &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;extssh&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;SSH_PROXY=ext ssh&amp;quot;
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;And add these entries to your &lt;code&gt;~&#x2F;.ssh&#x2F;config&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# External Bastion Host
&lt;&#x2F;span&gt;&lt;span&gt;Host jumper
&lt;&#x2F;span&gt;&lt;span&gt;  Hostname external-bastion.example.com
&lt;&#x2F;span&gt;&lt;span&gt;  Port 65022
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;# Internal Bastion Host
&lt;&#x2F;span&gt;&lt;span&gt;Host bastion-int
&lt;&#x2F;span&gt;&lt;span&gt;    Hostname internal-bastion.example.com
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;# External Access to Internal Bastion Host
&lt;&#x2F;span&gt;&lt;span&gt;Host bastion-ext
&lt;&#x2F;span&gt;&lt;span&gt;    ProxyCommand ssh jumper nc internal-bastion.example.com 22
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;# Proxy based on environment variable SSH_PROXY
&lt;&#x2F;span&gt;&lt;span&gt;Host *.internal.example.com
&lt;&#x2F;span&gt;&lt;span&gt;    ProxyCommand ssh bastion-$SSH_PROXY nc %h %p
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Now you can choose how to use your ssh jump hosts using shell
environment variables:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;$ ssh webserver-001.internal.example.com
&lt;&#x2F;span&gt;&lt;span&gt;# Desktop -&amp;gt; internal-bastion.example.com -&amp;gt; webserver-001.internal.example.com
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;$ extssh webserver-001.internal.example.com
&lt;&#x2F;span&gt;&lt;span&gt;# Desktop -&amp;gt; external-bastion.example.com -&amp;gt; internal-bastion.example.com -&amp;gt; webserver-001.internal.example.com
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Enjoy!&lt;&#x2F;p&gt;
</content>
	</entry>
	<entry xml:lang="en">
		<title>Silly Graphite Trick with ElasticSearch</title>
		<published>2012-07-09T00:00:00+00:00</published>
		<updated>2012-07-09T00:00:00+00:00</updated>
		<link rel="alternate" href="https://divisionbyzero.net/silly-graphite-trick/" type="text/html"/>
		<id>https://divisionbyzero.net/silly-graphite-trick/</id>
		<content type="html">&lt;p&gt;First things first.  I&#x27;ve stated that you should drop everything and install
&lt;a href=&quot;http:&#x2F;&#x2F;graphite.wikidot.com&quot;&gt;Graphite&lt;&#x2F;a&gt;.  If you didn&#x27;t already, please do
that now.  Go ahead, I&#x27;ll wait.&lt;&#x2F;p&gt;
&lt;p&gt;Good?  Good.  I don&#x27;t frequently insist on anything like I do with Graphite.
There are a lot of reasons for that.  If you don&#x27;t believe me, please see
&lt;a href=&quot;http:&#x2F;&#x2F;twitter.com&#x2F;obfuscurity&quot;&gt;@obfuscurity&lt;&#x2F;a&gt;&#x27;s awesome &lt;a href=&quot;http:&#x2F;&#x2F;obfuscurity.com&#x2F;Tags&#x2F;Graphite&quot;&gt;Graphite series on
his blog&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;When you get back we&#x27;ll talk about how to monitor
&lt;a href=&quot;http:&#x2F;&#x2F;elasticsearch.org&quot;&gt;ElasticSearch&lt;&#x2F;a&gt; with Graphite for fun and profit!&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;&lt;h2 id=&quot;monitoring-elasticsearch-baseline&quot;&gt;Monitoring ElasticSearch : Baseline&lt;&#x2F;h2&gt;
&lt;p&gt;Today&#x27;s silly graphite trick involves monitoring
&lt;a href=&quot;http:&#x2F;&#x2F;elasticsearch.org&quot;&gt;ElasticSearch&lt;&#x2F;a&gt;.  There are a number of solutions
available for live monitoring of an ElasticSearch cluster;
&lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;lukas-vlcek&#x2F;bigdesk&quot;&gt;BigDesk&lt;&#x2F;a&gt; and
&lt;a href=&quot;http:&#x2F;&#x2F;mobz.github.com&#x2F;elasticsearch-head&#x2F;&quot;&gt;Head&lt;&#x2F;a&gt; are both great for
running and monitoring your cluster in real time.&lt;&#x2F;p&gt;
&lt;p&gt;However, BigDesk lacks the customization and the incorporation of data from
other sources.  Graphite makes it so brain dead simple to store and track
metrics that it&#x27;s really only your own fault for not tracking the surface area
of the illuminated portion of the moon.  Then you &lt;em&gt;really&lt;&#x2F;em&gt; can track your
metrics against phases of the moon!  Graphite also allows you to store your
raw data and manipulate it for display purposes.&lt;&#x2F;p&gt;
&lt;p&gt;I strongly recommend you consider
&lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;BrightcoveOS&#x2F;Diamond&quot;&gt;Diamond&lt;&#x2F;a&gt; for polling system metrics
and outputting them to your Graphite infrastructure.  After a lot of fiddling
with the &lt;a href=&quot;http:&#x2F;&#x2F;joemiller.me&#x2F;2011&#x2F;04&#x2F;14&#x2F;collectd-graphite-plugin&#x2F;&quot;&gt;collectd graphite
plugin&lt;&#x2F;a&gt;, we were
forced to go with Diamond because the collectd plugin required a
carbon-aggregator to translate and flatten our metrics namespace into a sane
and useable format.  That worked at first, but running hundreds or thousands
of servers through a carbon-aggregator that has to manipulate almost every
metric doesn&#x27;t scale.  Even with 20 aggregators running on multiple
collectors, we struggled to map the data in real time.  Diamond allowed us to
scale horizontally by mapping metrics on the end point hosts and shipping them
pickled to a carbon relay.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;getting-es-metrics-into-graphite&quot;&gt;Getting ES Metrics into Graphite&lt;&#x2F;h2&gt;
&lt;p&gt;So, use Diamond to get base system data into Graphite.  Now, we need to get
some insight into the ElasticSearch daemon itself.  I looked a while back and
was unable to find anything useful, so I made it myself.  ElasticSearch
provides a number of API calls to retrieve statistics, but I chose the &lt;a href=&quot;http:&#x2F;&#x2F;www.elasticsearch.org&#x2F;guide&#x2F;reference&#x2F;api&#x2F;admin-cluster-nodes-stats.html&quot;&gt;Node
Stats&lt;&#x2F;a&gt;
as a starting point for pulling data out of ES.&lt;&#x2F;p&gt;
&lt;p&gt;At the time, we weren&#x27;t sure that Graphite was going to displace Cacti as our
monitoring solution (&lt;em&gt;well, I was&lt;&#x2F;em&gt;) so my
&lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;reyjrar&#x2F;graphite-scripts&#x2F;blob&#x2F;master&#x2F;bin&#x2F;perf_elastic_search.pl&quot;&gt;perf_elastic_search.pl&lt;&#x2F;a&gt;
script will output the metrics in the correct format for Graphite or Cacti.
Though Cacti output is untested because I gave up after 3 or 4 hours of trying
to figure out how to get data into Cacti.&lt;&#x2F;p&gt;
&lt;p&gt;I recommend running this on the ElasticSearch nodes with the &lt;code&gt;--local&lt;&#x2F;code&gt; option
specified, from a 1 minute cron job:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;   * * * * * &#x2F;path&#x2F;to&#x2F;perf_elastic_search.pl --local --carbon-base=es
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h2 id=&quot;my-silly-trick&quot;&gt;My Silly Trick&lt;&#x2F;h2&gt;
&lt;p&gt;Now, if you read through Jason&#x27;s awesome series of posts, you&#x27;re probably able
to do a lot of cool stuff with Graphite.  If you&#x27;re studious, you can do
something far cooler than the parlor trick I want to show now!  Enough
suspense, here it is:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;divisionbyzero.net&#x2F;silly-graphite-trick&#x2F;Graphite-ES-JVM-Pauses.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;AWESOME, RIGHT?!@#!@? I know.  So, first off,
&lt;a href=&quot;http:&#x2F;&#x2F;graphite.readthedocs.org&#x2F;en&#x2F;0.9.10&#x2F;functions.html#graphite.render.functions.drawAsInfinite&quot;&gt;drawAsInfinite&lt;&#x2F;a&gt;
is awesome.  How awesome?  Well, I&#x27;ll tell you.  Any value &lt;strong&gt;greater than
zero&lt;&#x2F;strong&gt; is displayed as a vertical line on your graph.  Sounds neat, right?
Combine that with the
&lt;a href=&quot;http:&#x2F;&#x2F;graphite.readthedocs.org&#x2F;en&#x2F;0.9.10&#x2F;functions.html#graphite.render.functions.offset&quot;&gt;offset&lt;&#x2F;a&gt;
function and you can strike a vertical line anywhere on the graph where a data
point is greater than a certain value.&lt;&#x2F;p&gt;
&lt;p&gt;In the above graph, I&#x27;ve chosen to draw a vertical line anywhere the JVM
paused for more than 1 second.  BUT WAIT!  ES outputs
MILLISECONDS, and a lot of times it&#x27;s pausing for only just a few milliseconds
or even nanoseconds.  Enough to be greater than zero, but too low for me to
care.  I really want to know when the JVM pauses for more than 1 second, so I
extract and graph that data like so:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt; drawAsInfinite(offset(nonNegativeDerivative(es.searchnode-01.jvm.gc.time_ms), -1000))
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;divisionbyzero.net&#x2F;silly-graphite-trick&#x2F;01-Graphite-ES-JVM-Pauses.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;ES reports total time spent in GC as a counter, so we use
&lt;a href=&quot;http:&#x2F;&#x2F;graphite.readthedocs.org&#x2F;en&#x2F;0.9.10&#x2F;functions.html#graphite.render.functions.nonNegativeDerivative&quot;&gt;nonNegativeDerivative&lt;&#x2F;a&gt;
to extract the changes.  Since we&#x27;re in milliseconds, we subtract 1,000 and
now only data points with more than 1 second of GC time will be displayed.  That by
itself is interesting, but where it gets fun is when you start correlating
that data with other events.&lt;&#x2F;p&gt;
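&lt;p&gt;To make the arithmetic concrete, here&#x27;s how a few made-up samples of the GC
counter would flow through those three functions, assuming one sample per minute.
The numbers are purely illustrative, not taken from my cluster:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;time_ms counter   nonNegativeDerivative   offset(..., -1000)   drawAsInfinite
&lt;&#x2F;span&gt;&lt;span&gt;41200             (none)                  (none)               nothing
&lt;&#x2F;span&gt;&lt;span&gt;41350             150                     -850                 nothing
&lt;&#x2F;span&gt;&lt;span&gt;43100             1750                    750                  vertical line
&lt;&#x2F;span&gt;&lt;span&gt;43180             80                      -920                 nothing
&lt;&#x2F;span&gt;&lt;span&gt;120 (restart)     (none)                  (none)               nothing
&lt;&#x2F;span&gt;&lt;span&gt;1900              1780                    780                  vertical line
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The first sample has nothing to diff against, and the drop at the restart is
discarded rather than drawn as a negative spike.&lt;&#x2F;p&gt;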
&lt;p&gt;I added the red line, which is the size of the JVM Heap:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;color(es.searchnode-01.jvm.mem.heap.used_bytes,&amp;quot;red&amp;quot;)
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;divisionbyzero.net&#x2F;silly-graphite-trick&#x2F;02-Graphite-ES-Heap.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;That was interesting, but this box is doing a lot of indexing as it&#x27;s
receiving my log data, so it can easily hit 7,000-8,000 messages per second.  So I
wanted to graph time spent indexing along with these points.  The ES tracker
keeps a counter of the time, which increments until the node restarts; you can see
the restarts as drops here:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;es.searchnode-01.indices.indexing.time_ms
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;divisionbyzero.net&#x2F;silly-graphite-trick&#x2F;03-Graphite-ES-Indexing.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;No worries, using the
&lt;a href=&quot;http:&#x2F;&#x2F;graphite.readthedocs.org&#x2F;en&#x2F;0.9.10&#x2F;functions.html#graphite.render.functions.derivative&quot;&gt;derivative&lt;&#x2F;a&gt;
or
&lt;a href=&quot;http:&#x2F;&#x2F;graphite.readthedocs.org&#x2F;en&#x2F;0.9.10&#x2F;functions.html#graphite.render.functions.nonNegativeDerivative&quot;&gt;nonNegativeDerivative&lt;&#x2F;a&gt;
functions, you can extract the change and graph the line.  Since we have restarts,
we&#x27;re going to use the nonNegativeDerivative function to ignore the drops:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;nonNegativeDerivative(es.searchnode-01.indices.indexing.time_ms)
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;divisionbyzero.net&#x2F;silly-graphite-trick&#x2F;04-Graphite-ES-Derive.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;When we add this back into the graph, you&#x27;ll notice that the line hugs the X
axis pretty tightly:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;divisionbyzero.net&#x2F;silly-graphite-trick&#x2F;05-Graphite-ES-Flattened.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;This is mainly because that number is so small compared to the bigger JVM Heap
numbers.  No worries, we&#x27;ll call
&lt;a href=&quot;http:&#x2F;&#x2F;graphite.readthedocs.org&#x2F;en&#x2F;0.9.10&#x2F;functions.html#graphite.render.functions.secondYAxis&quot;&gt;secondYAxis&lt;&#x2F;a&gt;
to cram as much data into these pixels as we can!&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;secondYAxis(nonNegativeDerivative(es.searchnode-01.indices.indexing.time_ms))
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;divisionbyzero.net&#x2F;silly-graphite-trick&#x2F;Graphite-ES-Finished.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
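&lt;p&gt;For reference, the finished graph boils down to three targets.  Here&#x27;s a sketch
of the full list as you might enter it in the composer, using
nonNegativeDerivative for the indexing line so that restarts don&#x27;t show up as
negative spikes:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;drawAsInfinite(offset(nonNegativeDerivative(es.searchnode-01.jvm.gc.time_ms), -1000))
&lt;&#x2F;span&gt;&lt;span&gt;color(es.searchnode-01.jvm.mem.heap.used_bytes,&amp;quot;red&amp;quot;)
&lt;&#x2F;span&gt;&lt;span&gt;secondYAxis(nonNegativeDerivative(es.searchnode-01.indices.indexing.time_ms))
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;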
&lt;p&gt;Yay!&lt;&#x2F;p&gt;
</content>
	</entry>
	<entry xml:lang="en">
		<title>Follow-up Central Logging</title>
		<published>2012-06-18T00:00:00+00:00</published>
		<updated>2012-06-18T00:00:00+00:00</updated>
		<link rel="alternate" href="https://divisionbyzero.net/follow-up-central-logging/" type="text/html"/>
		<id>https://divisionbyzero.net/follow-up-central-logging/</id>
		<content type="html">&lt;p&gt;The reaction to my &lt;a href=&quot;http:&#x2F;&#x2F;divisionbyzero.net&#x2F;article&#x2F;2012&#x2F;06&#x2F;17&#x2F;central-logging-with-open-source-software.html&quot;&gt;Central
Logging&lt;&#x2F;a&gt;
post has been significantly greater and more positive than I could&#x27;ve
expected, so I wanted to recap some of the conversation that came out of this.
I am pleasantly surprised by most of the comments on the &lt;a href=&quot;http:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=4122991&quot;&gt;Hacker News
Thread&lt;&#x2F;a&gt;.  So, here&#x27;s a real quick
recap of the responses I&#x27;ve received.  I will continue this series this
weekend with more technical details.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;So here are some reflections on the feedback so far.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;another-alternative-elsa&quot;&gt;Another Alternative: ELSA&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;a href=&quot;https:&#x2F;&#x2F;twitter.com&#x2F;#!&#x2F;spazm&quot;&gt;@spazm&lt;&#x2F;a&gt; recommended checking out
&lt;a href=&quot;http:&#x2F;&#x2F;code.google.com&#x2F;p&#x2F;enterprise-log-search-and-archive&#x2F;&quot;&gt;ELSA&lt;&#x2F;a&gt;.  He was not alone,
as &lt;a href=&quot;https:&#x2F;&#x2F;twitter.com&#x2F;#!&#x2F;_viq&quot;&gt;@_viq&lt;&#x2F;a&gt; echoed @spazm&#x27;s suggestion.  This trend
continued on HackerNews with another recommendation for ELSA from
&lt;a href=&quot;http:&#x2F;&#x2F;news.ycombinator.com&#x2F;user?id=ova&quot;&gt;ova&lt;&#x2F;a&gt;.  I figured this warranted an
investigation.  Unfortunately I have not had a chance to play around with ELSA
yet.&lt;&#x2F;p&gt;
&lt;p&gt;I read through the docs and found it&#x27;s using
&lt;a href=&quot;http:&#x2F;&#x2F;sphinxsearch.com&#x2F;&quot;&gt;Sphinx&lt;&#x2F;a&gt; as the search backend.  From my cursory
research, the main difference between &lt;a href=&quot;http:&#x2F;&#x2F;elasticsearch.org&quot;&gt;ElasticSearch&lt;&#x2F;a&gt;
and Sphinx seems to be ease of configuration and cluster setup, with
ElasticSearch winning.  That said, Sphinx seems to crush ElasticSearch on single
node search capabilities.  This is based on the limited information I could find
in the few minutes I had to spend on researching it.&lt;&#x2F;p&gt;
&lt;p&gt;I will spend some time researching ELSA as it is a &lt;a href=&quot;http:&#x2F;&#x2F;perl.org&quot;&gt;Perl&lt;&#x2F;a&gt;
project.  I am a sucker for Perl apps!&lt;&#x2F;p&gt;
&lt;h2 id=&quot;diy&quot;&gt;DIY&lt;&#x2F;h2&gt;
&lt;p&gt;A number of folks are currently involved in the evaluation of logging tools.  I
was a bit disheartened by the number of people considering rolling their own.
While I love the idea of reinventing the wheel and have done so many, many
times, I have to agree with &lt;a href=&quot;http:&#x2F;&#x2F;ranum.com&#x2F;security&#x2F;computer_security&#x2F;archives&#x2F;logging-notes.pdf&quot;&gt;Marcus Ranum&lt;&#x2F;a&gt;.
Logging is hard, and you&#x27;re probably in over your head.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;d urge those going down this road to investigate contributing to Open Source
Software already in this space.  If we could strengthen a few projects in this
sphere rather than just constantly building more disposable wheels, we all win.
Again, believe me, I sincerely understand that you want to build your own,
however, this is more complicated than you can imagine.  Why do I know that?
Because I built &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;reyjrar&#x2F;eris&quot;&gt;my&lt;&#x2F;a&gt;
&lt;a href=&quot;https:&#x2F;&#x2F;metacpan.org&#x2F;module&#x2F;POE::Component::Server::eris&quot;&gt;own&lt;&#x2F;a&gt;
&lt;a href=&quot;https:&#x2F;&#x2F;metacpan.org&#x2F;module&#x2F;POE::Component::Client::eris&quot;&gt;wheels&lt;&#x2F;a&gt; in this space as well!&lt;&#x2F;p&gt;
&lt;h2 id=&quot;splunk-vs&quot;&gt;Splunk vs.???&lt;&#x2F;h2&gt;
&lt;p&gt;I did haphazardly request feedback from someone who&#x27;s had experience with
Splunk.  I realize now that I really need to write up and screenshot the
capabilities of the &lt;a href=&quot;http:&#x2F;&#x2F;logstash.org&quot;&gt;Logstash&lt;&#x2F;a&gt; &#x2F;
&lt;a href=&quot;http:&#x2F;&#x2F;rashidkpc.github.com&#x2F;Kibana&#x2F;&quot;&gt;Kibana&lt;&#x2F;a&gt; &#x2F;
&lt;a href=&quot;http:&#x2F;&#x2F;graphite.wikidot.com&quot;&gt;Graphite&lt;&#x2F;a&gt; setup.  I plan on doing that later this
week, so until then I&#x27;ll assume responsibility for the poor feedback in this
area.  I read the Hacker News comments and got the impression that you either
use Splunk and &lt;em&gt;never use anything else again&lt;&#x2F;em&gt; &lt;strong&gt;or&lt;&#x2F;strong&gt; you think Splunk is too
expensive.  Both positions lack the evidence and rigor I was looking to elicit,
but again, my fault.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;thank-you&quot;&gt;Thank you!&lt;&#x2F;h1&gt;
&lt;p&gt;A big thank you to everyone who engaged in a discussion or helped spread the
word.  I was surprised at the amazing response.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;translations&quot;&gt;Translations!&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;Vicky Rotarova kindly volunteered to translate this page into Belarusian.  You can find the &lt;a href=&quot;http:&#x2F;&#x2F;www.piecesdiscount24.fr&#x2F;edu&#x2F;?p=10951&quot;&gt;Belarusian translation here&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.homeyou.com&#x2F;~edu&#x2F;&quot;&gt;Artur Weber&lt;&#x2F;a&gt; kindly volunteered to translate this page into Portuguese. You can find the &lt;a href=&quot;https:&#x2F;&#x2F;www.homeyou.com&#x2F;~edu&#x2F;registro-central-de-acompanhamento&quot;&gt;Portuguese translation here&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
	</entry>
	<entry xml:lang="en">
		<title>Central Logging with Open Source Software</title>
		<published>2012-06-17T00:00:00+00:00</published>
		<updated>2012-06-17T00:00:00+00:00</updated>
		<link rel="alternate" href="https://divisionbyzero.net/central-logging-with-open-source-software/" type="text/html"/>
		<id>https://divisionbyzero.net/central-logging-with-open-source-software/</id>
		<content type="html">&lt;p&gt;I have worn many hats over the past few years: System Administrator,
&lt;a href=&quot;http:&#x2F;&#x2F;www.postgresql.org&quot;&gt;PostgreSQL&lt;&#x2F;a&gt; and MySQL DBA, &lt;a href=&quot;http:&#x2F;&#x2F;perl.org&quot;&gt;Perl&lt;&#x2F;a&gt;
Programmer, PHP Programmer, Network Administrator, and Security
Engineer&#x2F;Officer.  The common thread is having the data I need available,
&lt;strong&gt;searchable&lt;&#x2F;strong&gt;, and &lt;strong&gt;visible&lt;&#x2F;strong&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;So what data am I talking about?  Honestly, &lt;em&gt;everything&lt;&#x2F;em&gt;.  System logs,
application logs, events, system performance data, and network traffic data
are key requirements for making any tough infrastructure decision, if not key
to the trivial infrastructure and implementation decisions we have to make
every day.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;m in the midst of implementing a comprehensive solution, and this post is a
brain dump and road map for how I went about it, and why.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;&lt;h2 id=&quot;step-1-syslog&quot;&gt;Step 1: syslog&lt;&#x2F;h2&gt;
&lt;p&gt;I usually start with a sane solution for transporting the events occurring on
UNIX and Windows servers to a central log host.  You may need to aggregate
events at a data center level depending on throughput.  There are a number of
options available to you for central logging with syslog; the favorites seem
to be:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;http:&#x2F;&#x2F;www.balabit.com&#x2F;network-security&#x2F;syslog-ng&quot;&gt;syslog-ng&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;http:&#x2F;&#x2F;www.rsyslog.com&#x2F;&quot;&gt;rsyslog&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Both are excellent choices, but if you have a tight budget and are reading
between the lines of popular regulatory policy (SOX, PCI-DSS, FISMA, FERPA, etc.),
you may want to give some thought to two features in particular: guaranteed
delivery, and encrypted transfer.  These are not hard and fast rules that
auditors check for right now, but they will in the near future.&lt;&#x2F;p&gt;
&lt;p&gt;With rsyslog, both features are available in the open source solution,
whereas this is not the case with syslog-ng.  However, rsyslog does not run on
Windows, so if you have a large number of Windows Servers, you probably need
to spend money on a central logging solution anyway.  I am not in this
position, so I choose &lt;strong&gt;rsyslog&lt;&#x2F;strong&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;The main drawback to rsyslog is the configuration file syntax.  Syslog-ng
decided to do away with legacy syslog config file syntax in favor of a
readable, sensible format.  Rsyslog decided to maintain the legacy syslog
configuration syntax and extend it for new features.  This is maddening, but
if you don&#x27;t have the budget for syslog-ng and need encryption and&#x2F;or
guaranteed delivery, you can make it work.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;central-log-server&quot;&gt;Central Log Server&lt;&#x2F;h3&gt;
&lt;p&gt;First, we need to set up a place for our logs to land.  Configuring the rsyslog
central server means configuring where we want the logs to live, and how we&#x27;d
like to receive them.  I&#x27;m calling this host &#x27;logstorage-01.&#x27;&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;ll explain the configuration step by step.  The first part sets the default
templates, work directory, and loads the modules we need:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# Rsyslog Defaults
&lt;&#x2F;span&gt;&lt;span&gt;$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
&lt;&#x2F;span&gt;&lt;span&gt;$WorkDirectory &#x2F;var&#x2F;run&#x2F;rsyslog
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;# Modules
&lt;&#x2F;span&gt;&lt;span&gt;$ModLoad immark
&lt;&#x2F;span&gt;&lt;span&gt;$ModLoad imudp
&lt;&#x2F;span&gt;&lt;span&gt;$ModLoad imtcp
&lt;&#x2F;span&gt;&lt;span&gt;$ModLoad imklog
&lt;&#x2F;span&gt;&lt;span&gt;$ModLoad imuxsock
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Now, I establish that I want to listen on TCP and UDP port 514:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;## Enable Listeners
&lt;&#x2F;span&gt;&lt;span&gt;$InputTCPServerRun 514
&lt;&#x2F;span&gt;&lt;span&gt;$UDPServerRun 514
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Rsyslog uses templates for both filenames and output.  This is an example of
both.  The RemoteHost template will be used to determine the filename for each
message that comes in.  The ArcSightFormat is going to be used to reformat the
message in a way that an ArcSight Agent can handle.&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# Templates
&lt;&#x2F;span&gt;&lt;span&gt;$template RemoteHost,&amp;quot;&#x2F;var&#x2F;log&#x2F;remote&#x2F;%HOSTNAME%&#x2F;%$YEAR%&#x2F;%$MONTH%-%$DAY%.log&amp;quot;
&lt;&#x2F;span&gt;&lt;span&gt;$template ArcSightFormat,&amp;quot;&amp;lt;%PRI%&amp;gt;%TIMESTAMP% %fromhost-ip% %syslogtag%%msg:::sp-if-no-1st-sp%%msg:::drop-last-lf%\n&amp;quot;
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
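&lt;p&gt;To make the filename template concrete, here is a minimal Python sketch (not
rsyslog code) of how RemoteHost would expand for one message.  The function
name and the sample host are made up for illustration, and it assumes
%$YEAR%, %$MONTH%, and %$DAY% are zero-padded values taken from the log
server&#x27;s clock, which is my understanding of those properties.&lt;&#x2F;p&gt;

```python
from datetime import datetime


def remote_host_path(hostname, now):
    """Mimic the RemoteHost template for one message (illustrative only)."""
    # %HOSTNAME% comes from the message itself; the $-prefixed properties
    # are assumed to come from the log server's clock at processing time.
    return "/var/log/remote/{host}/{year}/{month}-{day}.log".format(
        host=hostname,
        year=now.strftime("%Y"),
        month=now.strftime("%m"),
        day=now.strftime("%d"),
    )


print(remote_host_path("web-01", datetime(2012, 6, 17, 9, 30)))
```

&lt;p&gt;Each host ends up with one directory per year and one file per day, which
keeps archival and cleanup simple.&lt;&#x2F;p&gt;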
&lt;p&gt;Now, we&#x27;re ready to start doing things with messages.  The first action I
choose is to discard all connection related messages from snmpd as these
consume a lot of disk space.  You can disable this type of logging in snmpd,
but it also serves as a good example of log filtering and the discard action &#x27;~&#x27;.&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# Discard SNMPD Connection Messages
&lt;&#x2F;span&gt;&lt;span&gt;if $programname == &amp;#39;snmpd&amp;#39; and ( $msg contains &amp;#39;Connection from UDP&amp;#39; or $msg contains &amp;#39;Received SNMP packet(s) from UDP&amp;#39; ) then ~
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
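&lt;p&gt;If the expression syntax is hard to read, the same test can be sketched in
Python.  This is only a model of the logic; the function name is hypothetical,
and it assumes &#x27;contains&#x27; is a case-sensitive substring match.&lt;&#x2F;p&gt;

```python
def is_snmpd_noise(programname, msg):
    """Return True when a message would be caught by the snmpd discard rule."""
    return programname == "snmpd" and (
        "Connection from UDP" in msg
        or "Received SNMP packet(s) from UDP" in msg
    )


samples = [
    ("snmpd", "Connection from UDP: [10.0.0.5]:41234"),
    ("snmpd", "Cannot open configuration file"),
    ("sshd", "Connection from UDP: not actually snmpd"),
]
for program, text in samples:
    print(program, is_snmpd_noise(program, text))
```

&lt;p&gt;Only the first sample is dropped: both the program name and one of the two
strings have to match.&lt;&#x2F;p&gt;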
&lt;p&gt;At this point, due to the &#x27;~&#x27; any message matching snmpd and the strings I&#x27;ve
specified will have been discarded.  I now want to log &lt;em&gt;everything&lt;&#x2F;em&gt; to disk
using my &lt;strong&gt;RemoteHost&lt;&#x2F;strong&gt; template.  It is important to note that local syslog
messages will also be caught by this next rule:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# Archival Storage
&lt;&#x2F;span&gt;&lt;span&gt;#    All messages, local and remote, are stored by this rule
&lt;&#x2F;span&gt;&lt;span&gt;*.* ?RemoteHost
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The *.* tells rsyslog to log everything, and ?RemoteHost is the template
used for the file name.  This next rule demonstrates how to send selected
messages to a UDP listener using a message format:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# ArcSight
&lt;&#x2F;span&gt;&lt;span&gt;if $programname == &amp;#39;named&amp;#39; then @arcsight.example.com;ArcSightFormat
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;So, in this example, anything from named is forwarded to arcsight.example.com
over UDP (@) port 514 (default) using the format (;) ArcSightFormat for the
message.&lt;&#x2F;p&gt;
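&lt;p&gt;The &amp;lt;%PRI%&amp;gt; at the front of ArcSightFormat is the standard syslog priority
value: the facility code times 8 plus the severity code.  A small Python
sketch of that arithmetic follows; the helper names are made up, and the table
only lists a subset of the standard facility codes.&lt;&#x2F;p&gt;

```python
# Standard syslog facility codes (a subset) and severities in numeric order.
FACILITIES = {"kern": 0, "user": 1, "mail": 2, "daemon": 3, "auth": 4,
              "news": 7, "uucp": 8, "cron": 9, "authpriv": 10, "local7": 23}
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def encode_pri(facility, severity):
    """PRI is the facility code times 8 plus the severity code."""
    return FACILITIES[facility] * 8 + SEVERITIES.index(severity)


def decode_pri(pri):
    """Split a PRI value back into (facility, severity) names."""
    code, level = divmod(pri, 8)
    names = {number: name for name, number in FACILITIES.items()}
    return names[code], SEVERITIES[level]


print(encode_pri("daemon", "info"))  # 3 * 8 + 6
print(decode_pri(30))
```

&lt;p&gt;Keeping PRI in the forwarded message lets the receiver recover the original
facility and severity.&lt;&#x2F;p&gt;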
&lt;p&gt;It&#x27;s at this point that our log archival and any additional remote forwarding
we need is complete.  The next thing we do is discard any messages not sourced
from &#x27;logstorage-01&#x27;:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# If not sourced locally, stop processing message.
&lt;&#x2F;span&gt;&lt;span&gt;:source , !isequal , &amp;quot;logstorage-01&amp;quot; ~
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;So now only local events are left and we implement local logging.  This format
should be familiar to anyone who&#x27;s worked with syslogd before:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# Local Logging
&lt;&#x2F;span&gt;&lt;span&gt;*.info;mail.none;authpriv.none;cron.none                &#x2F;var&#x2F;log&#x2F;messages
&lt;&#x2F;span&gt;&lt;span&gt;authpriv.*                                              &#x2F;var&#x2F;log&#x2F;secure
&lt;&#x2F;span&gt;&lt;span&gt;mail.*                                                  -&#x2F;var&#x2F;log&#x2F;maillog
&lt;&#x2F;span&gt;&lt;span&gt;kern.*                                                  &#x2F;var&#x2F;log&#x2F;kern.log
&lt;&#x2F;span&gt;&lt;span&gt;cron.*                                                  &#x2F;var&#x2F;log&#x2F;cron
&lt;&#x2F;span&gt;&lt;span&gt;*.emerg                                                 *
&lt;&#x2F;span&gt;&lt;span&gt;uucp,news.crit                                          &#x2F;var&#x2F;log&#x2F;spooler
&lt;&#x2F;span&gt;&lt;span&gt;local7.*                                                &#x2F;var&#x2F;log&#x2F;boot.log
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
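&lt;p&gt;For anyone who hasn&#x27;t worked with syslogd selectors, here is a rough Python
model of how a line such as &#x27;*.info;mail.none&#x27; decides whether a message
is written.  It is a simplified sketch under my own assumptions: it ignores
the &#x27;=&#x27; and &#x27;!&#x27; modifiers, and the function name is hypothetical.&lt;&#x2F;p&gt;

```python
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def selector_matches(selector, facility, severity):
    """Check one message against a classic selector such as '*.info;mail.none'."""
    matched = False
    for part in selector.split(";"):
        facilities, level = part.split(".")
        if facilities != "*" and facility not in facilities.split(","):
            continue  # this part says nothing about our facility
        if level == "none":
            matched = False  # explicit exclusion overrides earlier parts
        elif level == "*":
            matched = True
        else:
            # A lower index is more severe; 'info' means info or worse.
            matched = SEVERITIES.index(level) >= SEVERITIES.index(severity)
    return matched


messages_rule = "*.info;mail.none;authpriv.none;cron.none"
print(selector_matches(messages_rule, "daemon", "notice"))
print(selector_matches(messages_rule, "mail", "err"))
```

&lt;p&gt;Later parts of a selector override earlier ones for the facilities they name,
which is why mail, authpriv, and cron stay out of &#x2F;var&#x2F;log&#x2F;messages.&lt;&#x2F;p&gt;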
&lt;h2 id=&quot;setting-up-the-clients&quot;&gt;Setting up the clients&lt;&#x2F;h2&gt;
&lt;p&gt;Next, we&#x27;d like to receive logs on the central server, so we need to set up our
clients to send messages.  At this point, I&#x27;m not configuring encryption of
messages.  I would like guaranteed delivery of the messages to the central log
server.  rsyslog has a few ways to do this, including its own protocol (RELP) for
delivery.  I don&#x27;t need insane amounts of guarantee; using TCP and an on-disk
queue will get me most of the way there and is simple to implement.&lt;&#x2F;p&gt;
&lt;p&gt;So, here&#x27;s my rsyslog.conf, one step at a time:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# Rsyslog Defaults
&lt;&#x2F;span&gt;&lt;span&gt;$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
&lt;&#x2F;span&gt;&lt;span&gt;$WorkDirectory &#x2F;var&#x2F;run&#x2F;rsyslog  # Default Location for Work Files
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;# Modules
&lt;&#x2F;span&gt;&lt;span&gt;$ModLoad immark
&lt;&#x2F;span&gt;&lt;span&gt;$ModLoad imklog
&lt;&#x2F;span&gt;&lt;span&gt;$ModLoad imuxsock
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Nothing crazy there: load the modules we need and set the standard template for messages.&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# Local Logging
&lt;&#x2F;span&gt;&lt;span&gt;*.info;mail.none;authpriv.none;cron.none                &#x2F;var&#x2F;log&#x2F;messages
&lt;&#x2F;span&gt;&lt;span&gt;authpriv.*                                              &#x2F;var&#x2F;log&#x2F;secure
&lt;&#x2F;span&gt;&lt;span&gt;mail.*                                                  -&#x2F;var&#x2F;log&#x2F;maillog
&lt;&#x2F;span&gt;&lt;span&gt;kern.*                                                  &#x2F;var&#x2F;log&#x2F;kern.log
&lt;&#x2F;span&gt;&lt;span&gt;cron.*                                                  &#x2F;var&#x2F;log&#x2F;cron
&lt;&#x2F;span&gt;&lt;span&gt;*.emerg                                                 *
&lt;&#x2F;span&gt;&lt;span&gt;uucp,news.crit                                          &#x2F;var&#x2F;log&#x2F;spooler
&lt;&#x2F;span&gt;&lt;span&gt;local7.*                                                &#x2F;var&#x2F;log&#x2F;boot.log
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Again, nothing earth shattering.  Basic syslogd style capture of messages to
disk.  Now we&#x27;re ready to send messages to our central log storage server.  So
this would be a good time to remove anything we don&#x27;t want to send from the
stream.  Again, I&#x27;ve used an snmpd connection message filter as a
demonstration.  Anything matching it will be discarded by the &#x27;~&#x27;.&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# Discard SNMPD Spam
&lt;&#x2F;span&gt;&lt;span&gt;if $programname == &amp;#39;snmpd&amp;#39; and ( $msg contains &amp;#39;Connection from UDP&amp;#39; or $msg contains &amp;#39;Received SNMP packet(s) from UDP&amp;#39; ) then ~
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;So, now we&#x27;re ready to actually send log messages to the central server.  The
first thing we need to configure is the on-disk queue.  We do this as follows:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;# Remote Logging with On Disk Queuing Enabled
&lt;&#x2F;span&gt;&lt;span&gt;$ActionQueueType LinkedList         # Asynchronous Forwarding Mechanism
&lt;&#x2F;span&gt;&lt;span&gt;$ActionQueueFileName centralwork    # Enable disk mode queue
&lt;&#x2F;span&gt;&lt;span&gt;$ActionResumeRetryCount -1          # Infinite Retries
&lt;&#x2F;span&gt;&lt;span&gt;$ActionQueueSaveOnShutdown on       # Save Queue on Exit for reprocessing
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
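&lt;p&gt;The idea behind those four settings can be sketched as a toy Python class:
queue messages, keep retrying while the destination is down, and save what is
left when shutting down.  This is only a model of the behavior; rsyslog&#x27;s real
queue is far more sophisticated, and every name in the sketch is made up.&lt;&#x2F;p&gt;

```python
import json
import os
from collections import deque


class SpoolingForwarder:
    """Toy model: in-memory queue, retry forever, persist on shutdown."""

    def __init__(self, send, spool_path):
        self.send = send              # callable that raises OSError on failure
        self.spool_path = spool_path  # hypothetical spool file location
        self.queue = deque()
        # Reload anything a previous run saved on shutdown.
        if os.path.exists(spool_path):
            with open(spool_path) as handle:
                self.queue.extend(json.load(handle))
            os.remove(spool_path)

    def submit(self, message):
        self.queue.append(message)
        self.flush()

    def flush(self):
        """Send in order; stop at the first failure and keep the rest."""
        while self.queue:
            try:
                self.send(self.queue[0])
            except OSError:
                return False  # destination is down, retry on the next flush
            self.queue.popleft()
        return True

    def shutdown(self):
        """Save undelivered messages so the next start can resend them."""
        if self.queue:
            with open(self.spool_path, "w") as handle:
                json.dump(list(self.queue), handle)
```

&lt;p&gt;Messages are only removed from the queue after a successful send, so an outage
of the central server delays delivery rather than losing events.&lt;&#x2F;p&gt;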
&lt;p&gt;And the last thing we need is a destination for the logs for this queue:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;*.*     @@logstorage-01.example.com:514
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;One thing to point out is the use of &#x27;@@&#x27;, which specifies that we want to use TCP.  A single &#x27;@&#x27; would send over UDP.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;reflections&quot;&gt;Reflections&lt;&#x2F;h2&gt;
&lt;p&gt;What we have at this point is a fairly reliable transport of our syslog
messages from our UNIX hosts to our central log server.  Remember, we
configured the central log server with both TCP and UDP listeners.  This means
that for systems which rsyslog doesn&#x27;t support and may not be able to use TCP
delivery, we can send legacy UDP messages to logstorage-01 and it will work.&lt;&#x2F;p&gt;
&lt;p&gt;To get Windows servers participating, you may want to investigate:
&lt;a href=&quot;http:&#x2F;&#x2F;syslog-win32.sourceforge.net&#x2F;&quot;&gt;syslog-win32&lt;&#x2F;a&gt;,
&lt;a href=&quot;http:&#x2F;&#x2F;www.intersectalliance.com&#x2F;projects&#x2F;BackLogNT&#x2F;&quot;&gt;S.N.A.R.E.&lt;&#x2F;a&gt;, or
&lt;a href=&quot;http:&#x2F;&#x2F;code.google.com&#x2F;p&#x2F;eventlog-to-syslog&#x2F;&quot;&gt;eventlog-to-syslog&lt;&#x2F;a&gt;.  I don&#x27;t
have much experience with them, but they will communicate to rsyslog in this
setup.&lt;&#x2F;p&gt;
&lt;p&gt;Now that we have rsyslog configured, we could use syslog as the backend for
our application logging.  Don&#x27;t get too angry, you can always use something
like &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;facebook&#x2F;scribe&quot;&gt;scribe&lt;&#x2F;a&gt; or
&lt;a href=&quot;https:&#x2F;&#x2F;cwiki.apache.org&#x2F;FLUME&#x2F;&quot;&gt;Flume&lt;&#x2F;a&gt; as well.  That&#x27;s the subject of
another write-up though.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;step-2-doing-something-with-these-messages&quot;&gt;Step 2: Doing something with these messages&lt;&#x2F;h1&gt;
&lt;p&gt;Before going any further, I&#x27;d like to address the 1,000 lb Gorilla in the
room, &lt;a href=&quot;http:&#x2F;&#x2F;www.splunk.com&#x2F;&quot;&gt;Splunk&lt;&#x2F;a&gt;.  I have never worked with Splunk.
Even for my ~150 servers in my previous job, I exceeded the 500 MB of logs
allowed per day.  (I am a fan of syslog, why invent another logging protocol
if there&#x27;s already one available?)  That being said, people I trust who have
experience with it say it&#x27;s amazing.&lt;&#x2F;p&gt;
&lt;p&gt;As a matter of fact, I&#x27;ve never encountered someone who&#x27;s used Splunk and had
anything remotely negative to say about its performance, scalability, or user
experience.  The only complaint I&#x27;ve ever heard is that it is expensive; not
&amp;quot;organic beef&amp;quot; expensive, but Aston Martin expensive.  If you have that kind
of budget to spend on logging, go for it.  I&#x27;ve been working for far too many
poor companies for too long and could not fathom spending hundreds of
thousands of dollars on logging software.&lt;&#x2F;p&gt;
&lt;p&gt;I am insanely curious about how close to Splunk&#x27;s interface and utility I can
get with open source software.  What follows is my attempt to do just that.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;graylog2&quot;&gt;Graylog2&lt;&#x2F;h2&gt;
&lt;p&gt;When I first started down this road, someone suggested I take a look at
&lt;a href=&quot;http:&#x2F;&#x2F;graylog2.org&#x2F;&quot;&gt;Graylog2&lt;&#x2F;a&gt;.  Its user interface is fantastic and it
leverages cool sounding technologies like &amp;quot;MongoDB&amp;quot; and uses a cartoon gorilla
from &lt;a href=&quot;http:&#x2F;&#x2F;theoatmeal.com&quot;&gt;The Oatmeal&lt;&#x2F;a&gt;.  When you log in, as the interface
loads it says: &amp;quot;Mounting party hats!&amp;quot;  How cutting edge is that?  Awesome.&lt;&#x2F;p&gt;
&lt;p&gt;I love software that has a sense of humor.  It adds to the user experience.
Under the hood, Graylog2 has a number of awesome features including its own
log format for passing messages around in a way that allows for easy
serialization and deserialization of data in the log stream.  This format is
&lt;a href=&quot;http:&#x2F;&#x2F;graylog2.org&#x2F;about&#x2F;gelf&quot;&gt;GELF&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;I didn&#x27;t have too many problems installing Graylog2, and honestly I was very
happy with the interface and configurability.  I sent only a small stream of
log traffic to it and was able to get data out very quickly.  I also noticed
that the release I downloaded was the first to support
&lt;a href=&quot;http:&#x2F;&#x2F;elasticsearch.org&quot;&gt;ElasticSearch&lt;&#x2F;a&gt; as a storage backend in addition to
MongoDB.&lt;&#x2F;p&gt;
&lt;p&gt;For those of you unfamiliar with ElasticSearch, it&#x27;s a clustered full-text
search platform based on &lt;a href=&quot;http:&#x2F;&#x2F;lucene.apache.org&#x2F;core&#x2F;&quot;&gt;Lucene&lt;&#x2F;a&gt;.  If you&#x27;ve
never had the privilege of working with ElasticSearch, I can tell you it is
magic.  I support several ElasticSearch clusters in a production environment
and can tell you firsthand it&#x27;s a wonderful product from my standpoint.  The
only complaint I have is it&#x27;s too much magic.  It makes me feel insignificant
as a system administrator because I have to do very little to support it.
It&#x27;s also incredibly fast and incredibly scalable.&lt;&#x2F;p&gt;
&lt;p&gt;Well, it&#x27;s scalable if you design your indexes in a certain way.  And this is
where the show stops for Graylog2.  You see, ElasticSearch uses sharding to
distribute data across the cluster.  You can specify how many shards you want
an index to have when you create it.  You can also specify how many copies of
each shard you&#x27;d like to keep across the cluster for redundancy.  You can even
do some neat things like saying &amp;quot;keep a copy at each datacenter and never have
all copies of one shard in the same rack in the same datacenter.&amp;quot;  This
means you can scale performance at the time of index creation.  If I have
5 shards, I can scale up to 5 cluster nodes and gain performance; after
that, I&#x27;m simply gaining redundancy, as only one copy of each shard will be
the primary at any one time.&lt;&#x2F;p&gt;
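&lt;p&gt;As a rough illustration, the shard and replica counts are just two settings
supplied when an index is created.  The Python sketch below builds that
settings body and counts the shard copies the cluster has to place.  The
setting names number_of_shards and number_of_replicas are ElasticSearch&#x27;s;
the helper functions are hypothetical.&lt;&#x2F;p&gt;

```python
import json


def index_settings(shards, replicas):
    """Settings body to send when creating an index (shape only)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


def total_shard_copies(shards, replicas):
    """Each shard exists once as a primary plus once per replica."""
    return shards * (1 + replicas)


print(json.dumps(index_settings(5, 1), indent=2))
print(total_shard_copies(5, 1))
```

&lt;p&gt;The shard count is fixed for the life of the index, which is why getting it
right at creation time matters so much.&lt;&#x2F;p&gt;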
&lt;p&gt;So why is this a problem with Graylog2?  Well, Graylog2 uses a single index
for its entire database.  Perhaps this is a side-effect of their relatively
late adoption of ElasticSearch as a backend for log storage.  But it means
that you need to build the index at the time of initial installation to
cope with the load of the logs for the future of your logging solution.
Sound easy?  Well, it&#x27;s not.  If you have a large volume of logs and you
intend on keeping them around for compliance reasons for a long period of
time, Graylog2&#x27;s use of ElasticSearch will cause significant performance
problems for you, even if you were to know you need 20 nodes in your
cluster.&lt;&#x2F;p&gt;
&lt;p&gt;So, for a large installation with high volume, I cannot recommend Graylog2.
It&#x27;s beautiful, it&#x27;s fun, but the ElasticSearch indexing scheme is currently
broken.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;logstash&quot;&gt;Logstash&lt;&#x2F;h2&gt;
&lt;p&gt;So, what else is there?  Well, there&#x27;s &lt;a href=&quot;http:&#x2F;&#x2F;logstash.net&quot;&gt;Logstash&lt;&#x2F;a&gt;.
Logstash is more of a log routing or translation tool than anything else.
Take a look at the list of inputs, filters, and outputs it supports:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#272822;color:#f8f8f2;&quot;&gt;&lt;code&gt;&lt;span&gt;inputs      filters          outputs
&lt;&#x2F;span&gt;&lt;span&gt;-------     -------------    -------------------
&lt;&#x2F;span&gt;&lt;span&gt;amqp        date             amqp
&lt;&#x2F;span&gt;&lt;span&gt;exec        dns              elasticsearch
&lt;&#x2F;span&gt;&lt;span&gt;file        gelfify          elasticsearch_river
&lt;&#x2F;span&gt;&lt;span&gt;gelf        grep             file
&lt;&#x2F;span&gt;&lt;span&gt;redis       grok             ganglia
&lt;&#x2F;span&gt;&lt;span&gt;stdin       grokdiscovery    gelf
&lt;&#x2F;span&gt;&lt;span&gt;stomp       json             graphite
&lt;&#x2F;span&gt;&lt;span&gt;syslog      multiline        internal
&lt;&#x2F;span&gt;&lt;span&gt;tcp         mutate           loggly
&lt;&#x2F;span&gt;&lt;span&gt;twitter     split            mongodb
&lt;&#x2F;span&gt;&lt;span&gt;xmpp                         nagios
&lt;&#x2F;span&gt;&lt;span&gt;zeromq                       null
&lt;&#x2F;span&gt;&lt;span&gt;                             redis
&lt;&#x2F;span&gt;&lt;span&gt;                             statsd
&lt;&#x2F;span&gt;&lt;span&gt;                             stdout
&lt;&#x2F;span&gt;&lt;span&gt;                             stomp
&lt;&#x2F;span&gt;&lt;span&gt;                             tcp
&lt;&#x2F;span&gt;&lt;span&gt;                             websocket
&lt;&#x2F;span&gt;&lt;span&gt;                             xmpp
&lt;&#x2F;span&gt;&lt;span&gt;                             zabbix
&lt;&#x2F;span&gt;&lt;span&gt;                             zeromq
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;AH HA!  You&#x27;ll notice that one of the outputs logstash supports is
ElasticSearch.  So why use logstash instead of Graylog2?  It has to do with
the indexes.  Graylog2 implements a single index &#x27;graylog2&#x27; in the
ElasticSearch cluster.  This makes the search API fairly simple, as I simply
specify that index to search from and give my filter criteria.  The downside:
this index is ENORMOUS, so simple or unbounded searches could
dramatically impact the availability of the entire cluster.&lt;&#x2F;p&gt;
&lt;p&gt;Logstash&#x27;s developers seem to have more experience with the ElasticSearch
model and designed it with scalability in mind.  The logstash elasticsearch
output mechanism creates a new index &lt;em&gt;every day&lt;&#x2F;em&gt;.  This means there&#x27;s a little
more logic needed on the search front-end to specify which indexes to look in
for the data you&#x27;re querying, &lt;strong&gt;but&lt;&#x2F;strong&gt; you can change the sharding definitions
on a daily basis and grow your cluster as your needs change.  This also
allows for index optimization.  Someone much smarter than me can explain
better, but if an index is in a read-only (or infrequent write) state, like
yesterday&#x27;s index, it can be highly optimized with Lucene to yield better
performance.&lt;&#x2F;p&gt;
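&lt;p&gt;The extra front-end logic amounts to working out which daily indexes cover
the time range being searched.  Here is a minimal Python sketch, assuming the
indexes are named logstash-YYYY.MM.DD and that ElasticSearch is given a
comma-separated list of index names in the search path.  The function names
are made up.&lt;&#x2F;p&gt;

```python
from datetime import date, timedelta


def daily_indexes(start, end, prefix="logstash-"):
    """List one index name per day, inclusive of both endpoints."""
    span = (end - start).days
    return [prefix + (start + timedelta(days=n)).strftime("%Y.%m.%d")
            for n in range(span + 1)]


def search_path(start, end):
    """Build a multi-index search path for the given date range."""
    return "/" + ",".join(daily_indexes(start, end)) + "/_search"


print(search_path(date(2012, 6, 16), date(2012, 6, 17)))
```

&lt;p&gt;A search over the last two days only touches two small indexes instead of
one enormous one.&lt;&#x2F;p&gt;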
&lt;h3 id=&quot;extracting-custom-data&quot;&gt;Extracting Custom Data&lt;&#x2F;h3&gt;
&lt;p&gt;One of my first uses for Logstash was to provide a better UI for
&lt;a href=&quot;http:&#x2F;&#x2F;www.ossec.net&quot;&gt;OSSEC-HIDS&lt;&#x2F;a&gt;.  OSSEC does an amazing job of security
monitoring hosts and aggregations of hosts, but the interface is fairly behind
where I feel it needs to be.  However, I could say the same thing about
Logstash.  That&#x27;s fine, because we can leverage the strengths of Logstash to
provide better interfaces.&lt;&#x2F;p&gt;
&lt;p&gt;I found &lt;a href=&quot;http:&#x2F;&#x2F;ddpbsd.blogspot.nl&#x2F;2011&#x2F;10&#x2F;3woo-you-got-your-ossec-in-my-logstash_26.html&quot;&gt;this awesome write-up on getting OSSEC alerts to
Logstash&lt;&#x2F;a&gt;
for processing.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;where-logstash-fails&quot;&gt;Where Logstash fails&lt;&#x2F;h3&gt;
&lt;p&gt;Logstash isn&#x27;t perfect; its front-end leaves MUCH to be desired.  However,
given the infrastructure and flexibility it affords, I&#x27;d prefer the developers
focus on the inputs, filters, and outputs rather than spend valuable resources
on front-ends.  If you&#x27;ve learned anything in the open source community, it&#x27;s
that someone will fix that problem.  And it turns out they have:&lt;&#x2F;p&gt;
&lt;h2 id=&quot;searchable-and-visible-kibana&quot;&gt;Searchable and Visible: Kibana&lt;&#x2F;h2&gt;
&lt;p&gt;As I&#x27;ve pointed out, the ElasticSearch storage in logstash is excellent.  It
uses the indexes exactly as they were designed to be used.  This means the
performance, reliability, and scalability of logstash&#x27;s storage backend are on
par with Splunk.  However, its front-end has &lt;em&gt;nothing&lt;&#x2F;em&gt; on Splunk.  Enter
&lt;a href=&quot;http:&#x2F;&#x2F;rashidkpc.github.com&#x2F;Kibana&#x2F;&quot;&gt;Kibana&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Kibana is a &lt;a href=&quot;http:&#x2F;&#x2F;twitter.github.com&#x2F;bootstrap&#x2F;&quot;&gt;Bootstrap&lt;&#x2F;a&gt;-based PHP
front-end which leverages the indexes Logstash creates in ElasticSearch to
provide a beautiful front-end to the log searching.  It also adds in the
functionality to do trending and analysis of logs from Logstash!  Keep in mind
you can create your own fields in logstash using grok, so we can extract,
trend, score, and analyze data in real-time in a fairly beautiful and powerful
interface.&lt;&#x2F;p&gt;
&lt;p&gt;Kibana fills the gap in the Logstash interface perfectly.  It doesn&#x27;t
give me everything I&#x27;d get with Splunk, but I&#x27;ve only scratched the surface
of the functionality I can extract with Logstash.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;step-3-all-your-stats-are-belong-to-graphite&quot;&gt;Step 3: All your stats are belong to Graphite&lt;&#x2F;h1&gt;
&lt;p&gt;&lt;a href=&quot;http:&#x2F;&#x2F;graphite.wikidot.com&#x2F;&quot;&gt;Graphite&lt;&#x2F;a&gt; is awesome.  If you&#x27;re not using it,
&lt;em&gt;&lt;strong&gt;DROP EVERYTHING AND GET IT RUNNING NOW&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt;.  For a foray into its
awesomeness, here&#x27;s a &lt;a href=&quot;http:&#x2F;&#x2F;www.slideshare.net&#x2F;reyjrar&#x2F;graphite-overview&quot;&gt;quick overview of its
features&lt;&#x2F;a&gt;.  This
presentation is based off something that &lt;a href=&quot;https:&#x2F;&#x2F;twitter.com&#x2F;obfuscurity&quot;&gt;Jason
Dixon&lt;&#x2F;a&gt; shared with me, so please make sure
you visit his blog and read his series of misnamed &amp;quot;Unhelpful Graphite Tips&amp;quot;:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http:&#x2F;&#x2F;obfuscurity.com&#x2F;2012&#x2F;04&#x2F;Unhelpful-Graphite-Tip-1&quot;&gt;Unhelpful Graphite Tip #1&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;http:&#x2F;&#x2F;obfuscurity.com&#x2F;2012&#x2F;04&#x2F;Unhelpful-Graphite-Tip-2&quot;&gt;Unhelpful Graphite Tip #2&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;http:&#x2F;&#x2F;obfuscurity.com&#x2F;2012&#x2F;04&#x2F;Unhelpful-Graphite-Tip-3&quot;&gt;Unhelpful Graphite Tip #3&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;http:&#x2F;&#x2F;obfuscurity.com&#x2F;2012&#x2F;04&#x2F;Unhelpful-Graphite-Tip-4&quot;&gt;Unhelpful Graphite Tip #4&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;http:&#x2F;&#x2F;obfuscurity.com&#x2F;2012&#x2F;04&#x2F;Unhelpful-Graphite-Tip-5&quot;&gt;Unhelpful Graphite Tip #5&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;http:&#x2F;&#x2F;obfuscurity.com&#x2F;2012&#x2F;04&#x2F;Unhelpful-Graphite-Tip-6&quot;&gt;Unhelpful Graphite Tip #6&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;http:&#x2F;&#x2F;obfuscurity.com&#x2F;2012&#x2F;04&#x2F;Unhelpful-Graphite-Tip-7&quot;&gt;Unhelpful Graphite Tip #7&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;http:&#x2F;&#x2F;obfuscurity.com&#x2F;2012&#x2F;04&#x2F;Unhelpful-Graphite-Tip-8&quot;&gt;Unhelpful Graphite Tip #8&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;http:&#x2F;&#x2F;obfuscurity.com&#x2F;2012&#x2F;04&#x2F;Unhelpful-Graphite-Tip-9&quot;&gt;Unhelpful Graphite Tip #9&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;http:&#x2F;&#x2F;obfuscurity.com&#x2F;2012&#x2F;04&#x2F;Unhelpful-Graphite-Tip-10&quot;&gt;Unhelpful Graphite Tip #10&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;If you&#x27;ve taken the time to read the overview and those tips, you&#x27;re probably
thinking &amp;quot;OMG I CAN GRAPH EVERYTHING.&amp;quot;  If you&#x27;re not thinking that, re-read
everything and reconsider.  Perhaps you missed
&lt;a href=&quot;http:&#x2F;&#x2F;graphite.readthedocs.org&#x2F;en&#x2F;0.9.9&#x2F;functions.html#graphite.render.functions.timeShift&quot;&gt;timeShift()&lt;&#x2F;a&gt;
? Perhaps you&#x27;re not a wanna-be statistics nerd like me?  I strongly suggest
you become one, maybe &lt;a href=&quot;http:&#x2F;&#x2F;greenteapress.com&#x2F;thinkstats&#x2F;&quot;&gt;start here&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
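&lt;p&gt;To give a taste of timeShift(), here is a small Python sketch that builds a
Graphite render URL plotting a metric next to the same metric from one week
earlier.  The hostname and metric name are invented, and it assumes the render
endpoint accepts target, from, and format parameters as I understand the
Graphite documentation describes.&lt;&#x2F;p&gt;

```python
from urllib.parse import urlencode


def render_url(base, metric, shift="1w", window="-24h"):
    """Build a render URL comparing a metric with its time-shifted self."""
    targets = [metric, 'timeShift({0},"{1}")'.format(metric, shift)]
    params = [("target", target) for target in targets]
    params += [("from", window), ("format", "json")]
    return base + "/render?" + urlencode(params)


# graphite.example.com and the metric path are placeholders.
print(render_url("http://graphite.example.com", "servers.web-01.requests"))
```

&lt;p&gt;Overlaying this week on last week is often the quickest way to tell whether
a spike is unusual or just what a Tuesday looks like.&lt;&#x2F;p&gt;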
&lt;p&gt;So, you can write grok patterns to extract metrics from your logs, and
then use &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;etsy&#x2F;statsd&quot;&gt;statsd&lt;&#x2F;a&gt; to track those metrics in
Graphite.&lt;&#x2F;p&gt;
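&lt;p&gt;On the wire, a statsd counter is nothing more than a short text line sent
over UDP.  The Python sketch below formats and sends one.  The bucket name is
made up, and the host and port are assumptions (8125 is the port statsd
commonly listens on).&lt;&#x2F;p&gt;

```python
import socket


def statsd_counter(bucket, value=1):
    """Format a counter increment in the statsd line protocol."""
    return "{0}:{1}|c".format(bucket, value).encode("ascii")


def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget a single metric datagram at a statsd daemon."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()


# Hypothetical bucket name for failed SSH logins extracted from the logs.
print(statsd_counter("logs.sshd.failed_login"))
```

&lt;p&gt;Because it is UDP, a slow or missing statsd daemon never blocks the thing
doing the counting.&lt;&#x2F;p&gt;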
&lt;h1 id=&quot;stfu-information-overload&quot;&gt;STFU, Information Overload&lt;&#x2F;h1&gt;
&lt;p&gt;Agreed, this is a lot for a single post, and I still have so much to say on
all the technology here.  I will try to keep brain-dumping as I fine-tune my
setup.  I aimed to answer the question, how close can I get to Splunk with
open source software?  I honestly don&#x27;t know.  I&#x27;m relying on you folks to
find out.  Start a conversation with me on twitter,
&lt;a href=&quot;https:&#x2F;&#x2F;twitter.com&#x2F;reyjrar&#x2F;&quot;&gt;@reyjrar&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;update&quot;&gt;UPDATE!&lt;&#x2F;h1&gt;
&lt;ul&gt;
&lt;li&gt;Anja Skrba kindly volunteered to translate this page into Serbian.  You can find his &lt;a href=&quot;http:&#x2F;&#x2F;science.webhostinggeeks.com&#x2F;centralno-logovanje-sa-softverom-otvorenog-izvora&quot;&gt;Serbian translation here&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;http:&#x2F;&#x2F;10-themes.com&#x2F;&quot;&gt;Atilla Debredeni&lt;&#x2F;a&gt; volunteered to translate this page into Hungarian. You can find his &lt;a href=&quot;http:&#x2F;&#x2F;10-themes.com&#x2F;edu&#x2F;kozponti-naplozasa-open-source-szoftver&#x2F;&quot;&gt;Hungarian translation here&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;http:&#x2F;&#x2F;theautoz.com&quot;&gt;Irakli Nishnianidze&lt;&#x2F;a&gt; volunteered to translate this page into Georgian. You can find his &lt;a href=&quot;http:&#x2F;&#x2F;theautoz.com&#x2F;blog&#x2F;central-logging-with-open-source-software&#x2F;&quot;&gt;Georgian translation here&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
	</entry>
	<entry xml:lang="en">
		<title>Statistics, Risk Analysis, and Misunderstandings</title>
		<published>2010-06-11T00:00:00+00:00</published>
		<updated>2010-06-11T00:00:00+00:00</updated>
		<link rel="alternate" href="https://divisionbyzero.net/statistics-risk-analysis-and-misunderstandings/" type="text/html"/>
		<id>https://divisionbyzero.net/statistics-risk-analysis-and-misunderstandings/</id>
		<content type="html">&lt;p&gt;I married a Statistician, so &lt;a href=&quot;http:&#x2F;&#x2F;lesswrong.com&#x2F;lw&#x2F;2bu&#x2F;your_intuitions_are_not_magic&quot;&gt;this
article&lt;&#x2F;a&gt; sums up the
lectures I receive on a daily basis.  Risk Management is statistical analysis,
and I&#x27;m not sure how many folks in IT Security have Graduate level Stat
exposure.  So, understanding our statistical shortcomings is key.  You
need to read that entire article, twice.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;This statement struck me, as I&#x27;ve noticed a scary trend in IT Security:&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;quot;People who know a little bit of statistics - enough to use statistical
techniques, not enough to understand why or how they work - often end up
horribly misusing them.  Statistical tests are complicated mathematical
techniques, and to work, they tend to make numerous assumptions.  The
problem is that if those assumptions are not valid, most statistical tests
do not cleanly fail and produce obviously false results.&amp;quot;&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;As we outsource more security, and buy more products, we must be careful, as
this statement is also true:&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;quot;People who know a little bit of &lt;em&gt;IT Security&lt;&#x2F;em&gt; - enough to use an &lt;em&gt;IDS or
SIEM&lt;&#x2F;em&gt;, not enough to understand why or how they work - often end up
horribly misusing them.  &lt;em&gt;Security tools&lt;&#x2F;em&gt; use &lt;em&gt;complicated technical
techniques&lt;&#x2F;em&gt;, and to work, they tend to make numerous assumptions. The
problem is that if those assumptions are not valid, most security tools do
not cleanly fail and produce obviously false results.&amp;quot;&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;My wife&#x27;s constant guidance in Statistics has been invaluable to my evaluations
of IT Security Policy and Implementation.  When I came across this article
thanks to &lt;a href=&quot;http:&#x2F;&#x2F;twitter.com&#x2F;alexhutton&quot;&gt;@alexhutton&lt;&#x2F;a&gt;, I had to share it!&lt;&#x2F;p&gt;
</content>
	</entry>
	<entry xml:lang="en">
		<title>Screen Scraping HTML</title>
		<published>2005-04-06T00:00:00+00:00</published>
		<updated>2005-04-06T00:00:00+00:00</updated>
		<link rel="alternate" href="https://divisionbyzero.net/html-parsing/" type="text/html"/>
		<id>https://divisionbyzero.net/html-parsing/</id>
		<content type="html">&lt;p&gt;We&#x27;ve all found useful information on the web.  Occasionally, it&#x27;s even
necessary to retrieve that information in an automated fashion.  It could be
just for your own amusement, possibly a new web service that hasn&#x27;t yet
published an API, or even a critical business partner who only exposes a web
based interface to you.&lt;&#x2F;p&gt;
&lt;p&gt;Of course, screen scraping web pages is not the optimal solution to any
problem, and I highly advise you to look into APIs or formal web services
that will provide a more consistent and intentional programming interface.
Potential problems could arise for a number of reasons.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;&lt;h2 id=&quot;step-0-considerations&quot;&gt;Step 0 : Considerations&lt;&#x2F;h2&gt;
&lt;p&gt;The most obvious and annoying problem is that you are not guaranteed any form of
consistency in the presentation of your data.  Websites are under
construction constantly.  Even when they look the same, programmers and
designers are behind the scenes tweaking little pieces to optimize,
straighten, or update.  This means that your data is likely to move or
disappear entirely.  As you can imagine, this can lead to erroneous data or
your program failing to complete.&lt;&#x2F;p&gt;
&lt;p&gt;A problem that you might not think of immediately is the impact of your
screen scraping on the target&#x27;s web server.  During the development phase
especially, you should give serious thought to mirroring the website using
any number of mirroring applications available on the web.  This will protect
against you accidentally Denial of Servicing the target&#x27;s web site.  Once
you move to production, out of common courtesy, you should limit the running
of your program to as few times as possible while still getting the accuracy
you require.  Obviously, if this is a business-to-business transaction,
you should keep the other guy in the loop.  It won&#x27;t be good for your
business relationships should you trip the other company&#x27;s Intrusion
Detection System and then have to explain what you&#x27;re doing to a defensive
security administrator.&lt;&#x2F;p&gt;
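&lt;p&gt;As a rough sketch of the mirroring advice above, the mirror() function in LWP::Simple keeps a local copy of a single page and only downloads it again when the server reports a change.  The URL and file name here are placeholders, not part of any real setup:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;perl&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-perl &quot;&gt;&lt;code class=&quot;language-perl&quot; data-lang=&quot;perl&quot;&gt;use strict;
use warnings;
use LWP::Simple qw(mirror is_success);

# Placeholder URL and cache file name, for illustration only
my $url   = &amp;#39;http:&#x2F;&#x2F;www.example.com&#x2F;&amp;#39;;
my $cache = &amp;#39;example-cache.html&amp;#39;;

# mirror() sends an If-Modified-Since header when the file already exists,
# so an unchanged page is not transferred a second time
my $status = mirror($url, $cache);

# 304 is &amp;quot;Not Modified&amp;quot;: the local copy is still current
if (is_success($status) || $status == 304) {
    print &amp;quot;Develop against the local copy in $cache\n&amp;quot;;
}
else {
    warn &amp;quot;Fetch failed with HTTP status $status\n&amp;quot;;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;While you develop, point your parser at the cached file rather than the live site.&lt;&#x2F;p&gt;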
&lt;p&gt;Along the same lines, consider the legality of the screen scraping.  To a
web server, your traffic could masquerade as 100% interactive, valid
traffic, but upon closer inspection, a wise system administrator will likely
put the pieces together.  Search that company&#x27;s website for &amp;quot;Acceptable Use
Policies&amp;quot; and &amp;quot;Terms of Service.&amp;quot;  In some cases, they may not apply but it&#x27;s
likely that the privilege to access the data is granted only after agreeing
to one of the two aforementioned documents.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;step-1-research&quot;&gt;Step 1 : Research&lt;&#x2F;h2&gt;
&lt;P&gt;
At this point, it&#x27;s necessary to dive into the task at hand.  Go through the
motions manually in a web browser that supports thorough debugging.  My
experience with &lt;a
href=&quot;http:&#x2F;&#x2F;www.mozilla.org&#x2F;products&#x2F;firefox&#x2F;&quot;&gt;Firefox&lt;&#x2F;a&gt; has always been
a positive one.  Through the use of tools like the &lt;a
href=&quot;http:&#x2F;&#x2F;www.mozilla.org&#x2F;projects&#x2F;inspector&#x2F;&quot;&gt;DOM Inspector&lt;&#x2F;a&gt;, the
built in Javascript Debugger, and extensions like &lt;a
href=&quot;http:&#x2F;&#x2F;www.chrispederick.com&#x2F;work&#x2F;firefox&#x2F;webdeveloper&#x2F;&quot;&gt;Web
Developer&lt;&#x2F;a&gt;, &lt;a
href=&quot;http:&#x2F;&#x2F;www.webalice.it&#x2F;davide.ficano&#x2F;firefox&#x2F;viewsourcewith.html&quot;&gt;View
Source With ..&lt;&#x2F;a&gt;, and &lt;a
href=&quot;http:&#x2F;&#x2F;www.hacksrus.com&#x2F;~ginda&#x2F;venkman&#x2F;&quot;&gt;Venkman&lt;&#x2F;a&gt;, it&#x27;s been one of
the best platforms for web development I&#x27;ve encountered.  Incidentally, the
elements of web design are critical to the automated extraction of that
data. There are two phases to debug to write a good screen scraper.
&lt;&#x2F;p&gt;
&lt;h3&gt;The Request&lt;&#x2F;h3&gt;
&lt;p&gt;A web server is not a mind reader, it has to know what you&#x27;re after.
HTTP Requests tell the web server what document to serve and how to serve
it.  The request can be issued through the address bar, a form, or a link.
As you navigate the site, take note of the parameters passed in the Query
String of the URL.  If you need to login, use the &lt;b&gt;Web Developer&lt;&#x2F;b&gt;
Extension to &quot;Display Form Details&quot; and take note of the names of the login
prompt and the form objects themselves.  Also, it&#x27;s important to take note of
the &quot;METHOD&quot; the form is going to use, either &quot;GET&quot; or &quot;POST&quot;.  As you go
through, sketch out the process on a scrap piece of paper with details on
the parameters along the way.  If you&#x27;re clicking on links to get where you
need, use the right click option of &quot;View Link Properties&quot; to get details.
&lt;&#x2F;p&gt;
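&lt;p&gt;Once you have the parameters written down, a minimal sketch like the following can rebuild the same Query String in code.  It assumes the URI module from CPAN; the host, path, and parameter names are made up for illustration:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;perl&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-perl &quot;&gt;&lt;code class=&quot;language-perl&quot; data-lang=&quot;perl&quot;&gt;use strict;
use warnings;
use URI;

# Hypothetical endpoint, for illustration only
my $uri = URI-&amp;gt;new(&amp;#39;http:&#x2F;&#x2F;www.example.com&#x2F;search&amp;#39;);

# query_form() escapes each value and joins the pairs in the order given
$uri-&amp;gt;query_form(
    where =&amp;gt; &amp;#39;21224&amp;#39;,
    what  =&amp;gt; &amp;#39;Weather36HourUndeclared&amp;#39;,
);

# The finished URL is what a &amp;quot;GET&amp;quot; form would have requested
print $uri-&amp;gt;as_string, &amp;quot;\n&amp;quot;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;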
&lt;p&gt;
A key thing people often miss when doing web automation is the effect of
client side scripting.  You can use &lt;b&gt;Venkman&lt;&#x2F;b&gt; to step through the
entire run of client side code.  You want to pay attention to hidden form
fields that are often set &quot;onClick&quot; of the submit button, or through other
types of normal user interaction.  Without knowing and setting these hidden
fields to the correct value, the page will refuse to load or cause problems.
Granted, this isn&#x27;t good practice on the site designer&#x27;s part, as a growing
number of security-aware web surfers are limiting or disabling client side
scripting entirely.
&lt;&#x2F;P&gt;
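&lt;p&gt;To see which hidden fields a form carries before any script touches them, a small sketch using HTML::Form (a CPAN module from the libwww-perl family) can list every input.  The form below is a made-up example, inlined so the snippet runs without a network connection:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;perl&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-perl &quot;&gt;&lt;code class=&quot;language-perl&quot; data-lang=&quot;perl&quot;&gt;use strict;
use warnings;
use HTML::Form;

# Hypothetical login page, for illustration only
my $html = &amp;lt;&amp;lt;&amp;#39;HTML&amp;#39;;
&amp;lt;form name=&amp;quot;login&amp;quot; method=&amp;quot;post&amp;quot; action=&amp;quot;&#x2F;login&amp;quot;&amp;gt;
  &amp;lt;input type=&amp;quot;text&amp;quot; name=&amp;quot;user&amp;quot;&amp;gt;
  &amp;lt;input type=&amp;quot;password&amp;quot; name=&amp;quot;pass&amp;quot;&amp;gt;
  &amp;lt;input type=&amp;quot;hidden&amp;quot; name=&amp;quot;token&amp;quot; value=&amp;quot;abc123&amp;quot;&amp;gt;
  &amp;lt;input type=&amp;quot;submit&amp;quot; name=&amp;quot;go&amp;quot; value=&amp;quot;Log In&amp;quot;&amp;gt;
&amp;lt;&#x2F;form&amp;gt;
HTML

# parse() returns one HTML::Form object per form found in the markup;
# the second argument is the base URL used to resolve the form action
my ($form) = HTML::Form-&amp;gt;parse($html, &amp;#39;http:&#x2F;&#x2F;www.example.com&#x2F;&amp;#39;);

# The METHOD and the resolved action URL
printf &amp;quot;%s %s\n&amp;quot;, $form-&amp;gt;method, $form-&amp;gt;action;

# One line per input: type, name, and current value (if any)
for my $input ($form-&amp;gt;inputs) {
    printf &amp;quot;%-8s %-6s %s\n&amp;quot;,
        $input-&amp;gt;type,
        defined $input-&amp;gt;name  ? $input-&amp;gt;name  : &amp;#39;&amp;#39;,
        defined $input-&amp;gt;value ? $input-&amp;gt;value : &amp;#39;&amp;#39;;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This only shows the values present in the HTML as served.  Anything set later by client side code still has to be found by stepping through the script.&lt;&#x2F;p&gt;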
&lt;h3&gt;The Response&lt;&#x2F;h3&gt;
&lt;p&gt;After sketching out the path to your data, you&#x27;ve finally arrived at the
page that contains the data itself.  You now need to map out the page in a
way that your data can be identified from the rest of the insignificant
details, styling, and advertisements!  I&#x27;ve always believed in syntax
highlighting and have become accustomed to &lt;a
href=&quot;http:&#x2F;&#x2F;www.vim.org&quot;&gt;vim&#x27;s&lt;&#x2F;a&gt; flavor of highlighting.  I&#x27;ve got the
&lt;b&gt;View Source With ..&lt;&#x2F;b&gt; Extension configured to use gvim.  So I right
click and with any luck, the page source is displayed in the gvim buffer
with syntax highlighting enabled.  If the page has a weird extension, or no
extension, I might have to &quot;set syntax=html&quot; if it&#x27;s not presenting the
proper page headers.  Search through the source file, correlating the visual
representations in the browser with the source code that&#x27;s generating them.
You&#x27;ll need to find landmarks in the HTML to use as a means to guide your
parser through an obscure landscape of markup language.  If you&#x27;re having
problems, another indispensable tool provided by &lt;b&gt;Firefox&lt;&#x2F;b&gt; is the &quot;View
Selection Source&quot;.  To use it, simply highlight some content and then right
click -&amp;gt; &quot;View Selection Source&quot;.  A Mozilla Source viewer opens with
just the HTML that generated the selected content highlighted with some
surrounding HTML to provide context.
&lt;&#x2F;p&gt;
&lt;p&gt;You&#x27;re going to have to start thinking like a machine.  Think Simple, 1&#x27;s
and 0&#x27;s, true and false!  I usually start at my data and start working back,
looking for a unique tag or pattern that I can use to locate the data moving
forward.  Look not only at the HTML Elements (&amp;lt;b&amp;gt;,&amp;lt;td&amp;gt;, etc) but
at their attributes (color=&quot;#FF0000&quot;,colspan=&quot;3&quot;) to profile the areas
containing and surrounding your data.
&lt;&#x2F;p&gt;
&lt;p&gt;The lay of the land is changing these days.  It should be getting much
easier to treat HTML as a data source thanks to Web Standards and the alarming
number of web designers pushing whole-heartedly for their adoption.  The old
table based layouts, styled by font tags and animated GIFs, are giving way to
&quot;Document Object Model&quot; aware design and styling fueled mostly by Cascading
Style Sheets (CSS).  CSS works most effectively when the document layout
emulates an object.  There are &quot;classes&quot;, &quot;ids&quot;, and tags that establish
relationships.  CSS makes it trivial for Web Designers with passion and
experience in Design Arts to cooperate with Web Programmers whose passion
is the Art of Programming and whose idea of &quot;progressive design&quot; is white
text on a black background!  The cues that Programmers and Designers specify
to ensure interoperability of Content and Presentation give the Screen
Scraper a legible road map by which to extract their data.  If you see
&quot;div&quot;, &quot;span&quot;, &quot;tbody&quot;, or &quot;thead&quot; elements bearing attributes like &quot;class&quot;
and &quot;id&quot;, favor using these elements as landmarks.  Though nothing is
guaranteed, it&#x27;s much more likely that these elements will maintain their
relationships, as they&#x27;re more often the result of divisional cooperation than
entropy.
&lt;&#x2F;P&gt;
&lt;p&gt;One of the simplest ways to keep your bearings is to print out the section
of HTML you&#x27;re targeting, and sketch out some simple logic to be able to
quickly identify it.  I use a highlighter and a red pen to make notes on the
print out that I can glance at as a sanity check.&lt;&#x2F;p&gt;
&lt;h2&gt;Step 2 : Automated Retrieval of Your Content&lt;&#x2F;h2&gt;
&lt;p&gt;Depending on how complicated the path to your data is, there are a number of
tools available.  Basic &quot;GET&quot; method requests that don&#x27;t require cookies,
session management, or form tracking can take advantage of the simple
interface provided by the &lt;a
href=&quot;http:&#x2F;&#x2F;search.cpan.org&#x2F;~gaas&#x2F;libwww-perl-5.803&#x2F;lib&#x2F;LWP&#x2F;Simple.pm&quot;&gt;LWP::Simple&lt;&#x2F;a&gt;
package.  &lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;perl&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-perl &quot;&gt;&lt;code class=&quot;language-perl&quot; data-lang=&quot;perl&quot;&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;use &lt;&#x2F;span&gt;&lt;span&gt;strict;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;use &lt;&#x2F;span&gt;&lt;span&gt;LWP::Simple;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$url &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;q&lt;&#x2F;span&gt;&lt;span&gt;|&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;http:&#x2F;&#x2F;www.weather.com&#x2F;weather&#x2F;local&#x2F;21224&lt;&#x2F;span&gt;&lt;span&gt;|;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$content &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;get $url;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;print &lt;&#x2F;span&gt;&lt;span&gt;$content;
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That&#x27;s it.  Simple.&lt;&#x2F;p&gt;
&lt;p&gt;More complex problems with cookies and logins will require a more
sophisticated tool.  &lt;a
href=&quot;http:&#x2F;&#x2F;search.cpan.org&#x2F;~petdance&#x2F;WWW-Mechanize-1.12&#x2F;lib&#x2F;WWW&#x2F;Mechanize.pm&quot;&gt;WWW::Mechanize&lt;&#x2F;a&gt;
offers a simple solution to a complex path to your data with the ability
to store cookies and construct form objects that can intelligently
initialize themselves. An example:
&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;perl&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-perl &quot;&gt;&lt;code class=&quot;language-perl&quot; data-lang=&quot;perl&quot;&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;use &lt;&#x2F;span&gt;&lt;span&gt;strict;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;use &lt;&#x2F;span&gt;&lt;span&gt;WWW::Mechanize;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$authPage &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;q&lt;&#x2F;span&gt;&lt;span&gt;|&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;http:&#x2F;&#x2F;www.weather.com&lt;&#x2F;span&gt;&lt;span&gt;|;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$authForm &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;whatwhere&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;%formVars &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;(
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;where   &lt;&#x2F;span&gt;&lt;span&gt;=&amp;gt; &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;21224&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;,
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;what    &lt;&#x2F;span&gt;&lt;span&gt;=&amp;gt; &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;Weather36HourUndeclared&amp;#39;
&lt;&#x2F;span&gt;&lt;span&gt;);
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# or optionally, set the fields in visible order
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;@visible &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;qw&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;21224&lt;&#x2F;span&gt;&lt;span&gt;);
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Create a &amp;quot;bot&amp;quot;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$bot &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;new WWW::Mechanize();
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Masquerade as Mac Firefox
&lt;&#x2F;span&gt;&lt;span&gt;$bot-&amp;gt;agent_alias(&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;Mac Mozilla&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;);
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Retrieve the page with our &amp;quot;login form&amp;quot;
&lt;&#x2F;span&gt;&lt;span&gt;$bot-&amp;gt;get($authPage);
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# fill out the form!
&lt;&#x2F;span&gt;&lt;span&gt;$bot-&amp;gt;form_name($authForm);
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;while&lt;&#x2F;span&gt;&lt;span&gt;( &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;($k,$v) &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;each &lt;&#x2F;span&gt;&lt;span&gt;%formVars ) {
&lt;&#x2F;span&gt;&lt;span&gt;    $bot-&amp;gt;field($k,$v);
&lt;&#x2F;span&gt;&lt;span&gt;}
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# OR
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# $bot-&amp;gt;set_visible(@visible);
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# submit the form!
&lt;&#x2F;span&gt;&lt;span&gt;$bot-&amp;gt;submit();
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Print the Content
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;print &lt;&#x2F;span&gt;&lt;span&gt;$bot-&amp;gt;content();
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h2 id=&quot;step-3-data-processing&quot;&gt;Step 3 : Data Processing&lt;&#x2F;h2&gt;
&lt;p&gt;
There are two main ways to parse markup languages like HTML, XHTML, and XML.
I&#x27;ve always preferred dealing with the &quot;Event Driven&quot; methodology.
Essentially, as the document is parsed, new tags trigger events in the code,
calling functions you&#x27;ve defined with the attributes of the tag included as
arguments.  The content between a start and end tag is handled through
another callback function that you&#x27;ve defined.  This method requires that
you build your own data structures.  The second method parses the entire
document, building a tree like object from it which it then returns to the
programmer as an object.  This second method is very useful when you have to
process an entire document, modify its contents and then transform it back
into markup language.  Usually, a screen scraping program cares very little
for the &quot;entire document&quot; and more for the interesting tidbits; everything
else can be ignored.
&lt;&#x2F;P&gt;
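&lt;p&gt;For comparison, here is a minimal sketch of the second, tree building method.  It assumes the HTML::TreeBuilder module from CPAN, and the HTML fragment and temperature value are invented so the snippet runs offline:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;perl&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-perl &quot;&gt;&lt;code class=&quot;language-perl&quot; data-lang=&quot;perl&quot;&gt;use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical fragment, for illustration only
my $html = &amp;#39;&amp;lt;div id=&amp;quot;obs&amp;quot;&amp;gt;&amp;lt;b class=&amp;quot;obsTempTextA&amp;quot;&amp;gt;72&amp;lt;&#x2F;b&amp;gt; and sunny&amp;lt;&#x2F;div&amp;gt;&amp;#39;;

# Parse the whole document into a tree of element objects
my $tree = HTML::TreeBuilder-&amp;gt;new_from_content($html);

# look_down() searches the tree; in scalar context it returns the
# first element matching every criterion
my $temp = $tree-&amp;gt;look_down(_tag =&amp;gt; &amp;#39;b&amp;#39;, class =&amp;gt; &amp;#39;obsTempTextA&amp;#39;);
my $text = $temp ? $temp-&amp;gt;as_text : &amp;#39;&amp;#39;;
print &amp;quot;$text\n&amp;quot;;

# Trees contain circular references, so free them explicitly
$tree-&amp;gt;delete;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The trade-off is that the whole page is held in memory as objects, which is more than most screen scrapers need.&lt;&#x2F;p&gt;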
&lt;h3&gt;HTML::Parser&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;a
href=&quot;http:&#x2F;&#x2F;search.cpan.org&#x2F;~gaas&#x2F;HTML-Parser-3.45&#x2F;Parser.pm&quot;&gt;HTML::Parser&lt;&#x2F;a&gt;
is an event driven HTML parser module available on &lt;a
href=&quot;http:&#x2F;&#x2F;www.cpan.org&quot;&gt;CPAN&lt;&#x2F;a&gt;. Using the above content retrieval code
snippet, delete the &quot;print $bot-&gt;content();&quot; line, and insert this code,
with &quot;use&quot; statements at the top for consistency.
&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;perl&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-perl &quot;&gt;&lt;code class=&quot;language-perl&quot; data-lang=&quot;perl&quot;&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;use &lt;&#x2F;span&gt;&lt;span&gt;HTML::Parser;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# store the content;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$content &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;$bot-&amp;gt;content();
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# variables for use in our parsing sub routines:
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$grabText &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;undef&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$textStr &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Parser Engine
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$parser &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;new HTML::Parser(
&lt;&#x2F;span&gt;&lt;span&gt;                &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;start_h &lt;&#x2F;span&gt;&lt;span&gt;=&amp;gt; [ &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;\&amp;amp;&lt;&#x2F;span&gt;&lt;span&gt;tagStart, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;tagname, attr&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;],
&lt;&#x2F;span&gt;&lt;span&gt;                &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;end_h   &lt;&#x2F;span&gt;&lt;span&gt;=&amp;gt; [ &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;\&amp;amp;&lt;&#x2F;span&gt;&lt;span&gt;tagStop, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;tagname&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;],
&lt;&#x2F;span&gt;&lt;span&gt;                &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;text_h  &lt;&#x2F;span&gt;&lt;span&gt;=&amp;gt; [ &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;\&amp;amp;&lt;&#x2F;span&gt;&lt;span&gt;handleText, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;dtext&amp;quot; &lt;&#x2F;span&gt;&lt;span&gt;]
&lt;&#x2F;span&gt;&lt;span&gt;);
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Call the parser!
&lt;&#x2F;span&gt;&lt;span&gt;$parser-&amp;gt;parse($content);
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Display the results between the tag
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;print &lt;&#x2F;span&gt;&lt;span&gt;$textStr;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Handle the start tag
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#f92672;&quot;&gt;sub &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;tagStart &lt;&#x2F;span&gt;&lt;span&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;        &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;($tagname,$attr) &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;@_;
&lt;&#x2F;span&gt;&lt;span&gt;        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;if&lt;&#x2F;span&gt;&lt;span&gt;((&lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;lc &lt;&#x2F;span&gt;&lt;span&gt;$tagname &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;eq &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;b&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;) &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;&amp;amp;&amp;amp; &lt;&#x2F;span&gt;&lt;span&gt;$attr-&amp;gt;{&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;class&lt;&#x2F;span&gt;&lt;span&gt;} &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;eq &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;obsTempTextA&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;) {
&lt;&#x2F;span&gt;&lt;span&gt;                $grabText &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;1&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span&gt;        }
&lt;&#x2F;span&gt;&lt;span&gt;}
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Handle the end tag
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#f92672;&quot;&gt;sub &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;tagStop &lt;&#x2F;span&gt;&lt;span&gt;{   $grabText &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;undef&lt;&#x2F;span&gt;&lt;span&gt;; }
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# check to see if we&amp;#39;re grabbing the text;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#f92672;&quot;&gt;sub &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;handleText &lt;&#x2F;span&gt;&lt;span&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;        $textStr &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;.= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;shift &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;if &lt;&#x2F;span&gt;&lt;span&gt;$grabText;
&lt;&#x2F;span&gt;&lt;span&gt;}
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Using this, it&#x27;s simple to extract the temperature from the variable
$textStr.  If you wanted to extract more information, you could use a more
complex data structure to hold all the variables.  The important thing to
remember about the event based model is everything happens linearly.  It&#x27;s
good practice to keep state, either through a simple scalar, like the
$grabText var above, or in an array or hash.  If you&#x27;re dealing with data
that&#x27;s nested in several layers of tags, you might consider something like
this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;perl&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-perl &quot;&gt;&lt;code class=&quot;language-perl&quot; data-lang=&quot;perl&quot;&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;@nestedTags &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;();
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#f92672;&quot;&gt;sub &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;tagStart &lt;&#x2F;span&gt;&lt;span&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;($tag,$attr) &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;@_;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;if&lt;&#x2F;span&gt;&lt;span&gt;($tag &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;eq &lt;&#x2F;span&gt;&lt;span&gt;$tagWeAreLookingFor) {
&lt;&#x2F;span&gt;&lt;span&gt;        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;push &lt;&#x2F;span&gt;&lt;span&gt;@nestedTags,$tag;
&lt;&#x2F;span&gt;&lt;span&gt;    }
&lt;&#x2F;span&gt;&lt;span&gt;}
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#f92672;&quot;&gt;sub &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;handleText &lt;&#x2F;span&gt;&lt;span&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$text &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;shift&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# In here, we can check how deep the @nestedTags array is, and do
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# different things based on that depth
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;if&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;scalar &lt;&#x2F;span&gt;&lt;span&gt;@nestedTags &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;== &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;4&lt;&#x2F;span&gt;&lt;span&gt;) {
&lt;&#x2F;span&gt;&lt;span&gt;        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;print &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;Four Tags deep, we found: &lt;&#x2F;span&gt;&lt;span&gt;$text&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;!&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;\n&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span&gt;    }
&lt;&#x2F;span&gt;&lt;span&gt;}
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#f92672;&quot;&gt;sub &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;tagStop &lt;&#x2F;span&gt;&lt;span&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$tag &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;shift&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;pop &lt;&#x2F;span&gt;&lt;span&gt;@nestedTags &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;if &lt;&#x2F;span&gt;&lt;span&gt;$tag &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;eq &lt;&#x2F;span&gt;&lt;span&gt;$tagWeAreLookingFor;
&lt;&#x2F;span&gt;&lt;span&gt;}
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This model works great for most screen scraping, as we&#x27;re usually
interested in key pieces of data on a page-by-page basis.  However,
it can quickly turn your program into a mess of handler subroutines and
complex tracking variables that make managing your screen scraper closer to
voodoo than programming.  Thankfully, HTML::Parser is fully prepared to make
our lives easier by supporting subclassing.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;step-4-subclassing-for-sanity&quot;&gt;Step 4 : SubClassing for Sanity&lt;&#x2F;h2&gt;
&lt;p&gt;I usually like to have one subclassed HTML::Parser class per page.  In that
class I&#x27;ll include accessors to the relevant data on that page.  That way, I
can just "use" my class where I&#x27;m processing the data for that one page and
keep the main program relatively free of unnecessary clutter.
&lt;&#x2F;p&gt;
&lt;p&gt;The following script uses a simple interface to pull down the current
temperature in Fahrenheit.  The accessor method allows the user to specify
the units they&#x27;d like the temperature back in.
&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;perl&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-perl &quot;&gt;&lt;code class=&quot;language-perl&quot; data-lang=&quot;perl&quot;&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#!&#x2F;usr&#x2F;bin&#x2F;perl
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;use &lt;&#x2F;span&gt;&lt;span&gt;strict;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;use &lt;&#x2F;span&gt;&lt;span&gt;LWP::Simple;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;use &lt;&#x2F;span&gt;&lt;span&gt;MyParsers::Weather::Current;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$parser &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;new MyParsers::Weather::Current;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$content &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;get &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;http:&#x2F;&#x2F;www.weather.com&#x2F;weather&#x2F;local&#x2F;21224&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;$parser-&amp;gt;parse($content);
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;print &lt;&#x2F;span&gt;&lt;span&gt;$parser-&amp;gt;getTemperature, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot; degrees fahrenheit.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;\n&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;print &lt;&#x2F;span&gt;&lt;span&gt;$parser-&amp;gt;getTemperature(&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;celsius&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;), &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot; degrees celsius.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;\n&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;print &lt;&#x2F;span&gt;&lt;span&gt;$parser-&amp;gt;getTemperature(&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;kelvin&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;), &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot; degrees kelvin.&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;\n&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The script uses a homemade module &quot;MyParsers::Weather::Current&quot; to handle
all the parsing.  The code for that module is provided below.
&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;perl&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-perl &quot;&gt;&lt;code class=&quot;language-perl&quot; data-lang=&quot;perl&quot;&gt;&lt;span style=&quot;font-style:italic;color:#f92672;&quot;&gt;package &lt;&#x2F;span&gt;&lt;span&gt;MyParsers::Weather::Current;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;use &lt;&#x2F;span&gt;&lt;span&gt;strict;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;use &lt;&#x2F;span&gt;&lt;span&gt;HTML::Parser;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Inherit
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;our &lt;&#x2F;span&gt;&lt;span&gt;@ISA &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;qw&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;HTML::Parser&lt;&#x2F;span&gt;&lt;span&gt;);
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;%ExtraVariables &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;(
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;_found		&lt;&#x2F;span&gt;&lt;span&gt;=&amp;gt; &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;undef&lt;&#x2F;span&gt;&lt;span&gt;,
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;_grabText	&lt;&#x2F;span&gt;&lt;span&gt;=&amp;gt; &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;undef&lt;&#x2F;span&gt;&lt;span&gt;,
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;temp_F		&lt;&#x2F;span&gt;&lt;span&gt;=&amp;gt; &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;undef&lt;&#x2F;span&gt;&lt;span&gt;,
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;temp_C		&lt;&#x2F;span&gt;&lt;span&gt;=&amp;gt; &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;undef
&lt;&#x2F;span&gt;&lt;span&gt;);
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Class Functions
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#f92672;&quot;&gt;sub &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;new &lt;&#x2F;span&gt;&lt;span&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Call the Parent Constructor
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$self &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;HTML::Parser::new(@_);
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Call our local initialization function
&lt;&#x2F;span&gt;&lt;span&gt;    $self-&amp;gt;_init();
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;return &lt;&#x2F;span&gt;&lt;span&gt;$self;
&lt;&#x2F;span&gt;&lt;span&gt;}
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Internal Init Function to Setup the Parser.
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#f92672;&quot;&gt;sub &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;_init &lt;&#x2F;span&gt;&lt;span&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$self &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;shift&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# init() is provided by the parent class
&lt;&#x2F;span&gt;&lt;span&gt;    $self-&amp;gt;init(
&lt;&#x2F;span&gt;&lt;span&gt;        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;start_h	&lt;&#x2F;span&gt;&lt;span&gt;=&amp;gt;  [ &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;\&amp;amp;&lt;&#x2F;span&gt;&lt;span&gt;_handler_tagStart, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;self, tagname, attr&amp;#39; &lt;&#x2F;span&gt;&lt;span&gt;],
&lt;&#x2F;span&gt;&lt;span&gt;        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;end_h	&lt;&#x2F;span&gt;&lt;span&gt;=&amp;gt;  [ &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;\&amp;amp;&lt;&#x2F;span&gt;&lt;span&gt;_handler_tagStop, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;self, tagname&amp;#39; &lt;&#x2F;span&gt;&lt;span&gt;],
&lt;&#x2F;span&gt;&lt;span&gt;        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;text_h	&lt;&#x2F;span&gt;&lt;span&gt;=&amp;gt;  [ &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;\&amp;amp;&lt;&#x2F;span&gt;&lt;span&gt;_handler_text, &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;self, dtext&amp;#39; &lt;&#x2F;span&gt;&lt;span&gt;],
&lt;&#x2F;span&gt;&lt;span&gt;    );
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Set up the rest of the object
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;foreach &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$k (&lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;keys &lt;&#x2F;span&gt;&lt;span&gt;%ExtraVariables) {
&lt;&#x2F;span&gt;&lt;span&gt;        $self-&amp;gt;{$k} &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;$ExtraVariables{$k};
&lt;&#x2F;span&gt;&lt;span&gt;    }
&lt;&#x2F;span&gt;&lt;span&gt;}
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Accessors
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#f92672;&quot;&gt;sub &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;getTemperature &lt;&#x2F;span&gt;&lt;span&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;($self,$type) &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;@_;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;unless&lt;&#x2F;span&gt;&lt;span&gt;( $self-&amp;gt;{&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;_found&lt;&#x2F;span&gt;&lt;span&gt;} ) {
&lt;&#x2F;span&gt;&lt;span&gt;        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;print &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;STDERR &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;either you forgot to call parse, or the temp data was
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;    not found!&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;\n&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span&gt;        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;return&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span&gt;    }
&lt;&#x2F;span&gt;&lt;span&gt;    $type &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;fahrenheit&amp;#39; &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;unless &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;length &lt;&#x2F;span&gt;&lt;span&gt;$type;
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Build the hash key from the upper-cased first letter of the unit
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$t &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;temp_&amp;#39; &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;. &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;uc substr&lt;&#x2F;span&gt;&lt;span&gt;($type,&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;0&lt;&#x2F;span&gt;&lt;span&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;1&lt;&#x2F;span&gt;&lt;span&gt;);
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;return &lt;&#x2F;span&gt;&lt;span&gt;$self-&amp;gt;{$t} &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;if &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;exists &lt;&#x2F;span&gt;&lt;span&gt;$self-&amp;gt;{$t};
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;print &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;STDERR &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;Unknown Temperature Type (&lt;&#x2F;span&gt;&lt;span&gt;$type&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;) !&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;\n&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;return &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;undef&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span&gt;}
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Parsing Functions
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#f92672;&quot;&gt;sub &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;_handler_tagStart &lt;&#x2F;span&gt;&lt;span&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;($self,$tag,$attr) &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;@_;
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;if&lt;&#x2F;span&gt;&lt;span&gt;((&lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;lc &lt;&#x2F;span&gt;&lt;span&gt;$tag &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;eq &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;b&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;) &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;&amp;amp;&amp;amp; &lt;&#x2F;span&gt;&lt;span&gt;$attr-&amp;gt;{&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;class&lt;&#x2F;span&gt;&lt;span&gt;} &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;eq &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;obsTempTextA&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;) {
&lt;&#x2F;span&gt;&lt;span&gt;        $self-&amp;gt;{&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;_grabText&lt;&#x2F;span&gt;&lt;span&gt;} &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;1&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span&gt;        $self-&amp;gt;{&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;_found&lt;&#x2F;span&gt;&lt;span&gt;} &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;1&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span&gt;    }
&lt;&#x2F;span&gt;&lt;span&gt;}
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#f92672;&quot;&gt;sub &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;_handler_tagStop &lt;&#x2F;span&gt;&lt;span&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$self &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;shift&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span&gt;    $self-&amp;gt;{&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;_grabText&lt;&#x2F;span&gt;&lt;span&gt;} &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;undef&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span&gt;}
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#f92672;&quot;&gt;sub &lt;&#x2F;span&gt;&lt;span style=&quot;color:#a6e22e;&quot;&gt;_handler_text &lt;&#x2F;span&gt;&lt;span&gt;{
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;($self,$text) &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;@_;
&lt;&#x2F;span&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;if&lt;&#x2F;span&gt;&lt;span&gt;($self-&amp;gt;{&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;_grabText&lt;&#x2F;span&gt;&lt;span&gt;}) {
&lt;&#x2F;span&gt;&lt;span&gt;        &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;if&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my&lt;&#x2F;span&gt;&lt;span&gt;($temp,$forc) &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;($text &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;=~ &lt;&#x2F;span&gt;&lt;span&gt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;(\d+).*([&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;CF&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;])&lt;&#x2F;span&gt;&lt;span&gt;&#x2F;)) {
&lt;&#x2F;span&gt;&lt;span&gt;            &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;if&lt;&#x2F;span&gt;&lt;span&gt;($forc &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;eq &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;C&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;) {
&lt;&#x2F;span&gt;&lt;span&gt;                $self-&amp;gt;{&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;temp_C&lt;&#x2F;span&gt;&lt;span&gt;} &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;$temp;
&lt;&#x2F;span&gt;&lt;span&gt;                &lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span&gt;                &lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Decimal places aren&amp;#39;t really useful in Fahrenheit, so truncate
&lt;&#x2F;span&gt;&lt;span&gt;                $self-&amp;gt;{&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;temp_F&lt;&#x2F;span&gt;&lt;span&gt;} &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;int&lt;&#x2F;span&gt;&lt;span&gt;((&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;9&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;5&lt;&#x2F;span&gt;&lt;span&gt;) &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;* &lt;&#x2F;span&gt;&lt;span&gt;$temp &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;+ &lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;32&lt;&#x2F;span&gt;&lt;span&gt;);
&lt;&#x2F;span&gt;&lt;span&gt;            }
&lt;&#x2F;span&gt;&lt;span&gt;            &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;elsif&lt;&#x2F;span&gt;&lt;span&gt;($forc &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;eq &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;F&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;) {
&lt;&#x2F;span&gt;&lt;span&gt;                $self-&amp;gt;{&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;temp_F&lt;&#x2F;span&gt;&lt;span&gt;} &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;$temp;
&lt;&#x2F;span&gt;&lt;span&gt;                &lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;#
&lt;&#x2F;span&gt;&lt;span&gt;                &lt;&#x2F;span&gt;&lt;span style=&quot;color:#75715e;&quot;&gt;# Use precision to 2 decimal places
&lt;&#x2F;span&gt;&lt;span&gt;                $self-&amp;gt;{&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;temp_C&lt;&#x2F;span&gt;&lt;span&gt;} &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#66d9ef;&quot;&gt;sprintf&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;%.2f&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;, (&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;5&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;9&lt;&#x2F;span&gt;&lt;span&gt;) &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;* &lt;&#x2F;span&gt;&lt;span&gt;($temp&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;-&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;32&lt;&#x2F;span&gt;&lt;span&gt;));
&lt;&#x2F;span&gt;&lt;span&gt;            }
&lt;&#x2F;span&gt;&lt;span&gt;        }
&lt;&#x2F;span&gt;&lt;span&gt;    }
&lt;&#x2F;span&gt;&lt;span&gt;}
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
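&lt;p&gt;The driver script also asks for kelvin, but the module as written only
stores &lt;code&gt;temp_F&lt;&#x2F;code&gt; and &lt;code&gt;temp_C&lt;&#x2F;code&gt;, so that call ends up in the
&amp;quot;Unknown Temperature Type&amp;quot; branch.  One possible way to fill the gap is
sketched here; the helper name &lt;code&gt;celsius_to_kelvin&lt;&#x2F;code&gt; is hypothetical
and not part of the original module.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;perl&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-perl &quot;&gt;&lt;code class=&quot;language-perl&quot; data-lang=&quot;perl&quot;&gt;&lt;span&gt;use strict;
use warnings;

# Hypothetical helper (not in the original module): convert a Celsius
# reading to Kelvin, keeping two decimal places like temp_C does.
sub celsius_to_kelvin {
    my $celsius = shift;
    return sprintf(&amp;quot;%.2f&amp;quot;, $celsius + 273.15);
}

print celsius_to_kelvin(20), &amp;quot; kelvin\n&amp;quot;;
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Storing the result under a &lt;code&gt;temp_K&lt;&#x2F;code&gt; key inside
&lt;code&gt;_handler_text&lt;&#x2F;code&gt; should let the existing accessor find it, since the
accessor builds its key from the first letter of the requested unit.&lt;&#x2F;p&gt;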
&lt;h2 id=&quot;wrapping-up&quot;&gt;Wrapping Up&lt;&#x2F;h2&gt;
&lt;p&gt;HTML can be an incredibly effective transport mechanism for data, even if
the original author hadn&#x27;t intended it to be that way.  With the advent of
Web Services and Standards Compliant designs utilizing Cascading Style
Sheets, it&#x27;s becoming more and more interoperable and cooperative.  Learning
to use screen scraping techniques can provide a wealth of information for
the programmer to analyze and format to their heart&#x27;s content.&lt;&#x2F;p&gt;
&lt;p&gt;As an exercise, you might want to expand on the
&lt;code&gt;MyParsers::Weather::Current&lt;&#x2F;code&gt; object to pull additional information from
weather.com&#x27;s page, and add a few more accessors!  If you&#x27;d really like a
challenge, it&#x27;d be kind of fun to write a parser for each of the major
weather sites, pull down the forecast data, and use a weighted
average based on each site&#x27;s past accuracy to get an
&amp;quot;educated guess&amp;quot; at the weather conditions!&lt;&#x2F;p&gt;
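&lt;p&gt;As a starting point for that challenge, the weighted average itself might
look something like the sketch below.  The site names, forecast values, and
accuracy weights are all made up for illustration, and
&lt;code&gt;weighted_average&lt;&#x2F;code&gt; is a hypothetical helper.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;perl&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-perl &quot;&gt;&lt;code class=&quot;language-perl&quot; data-lang=&quot;perl&quot;&gt;&lt;span&gt;use strict;
use warnings;

# Hypothetical forecasts (degrees Fahrenheit) and accuracy weights;
# the site names and numbers are invented for this example.
my %forecast = ( siteA =&amp;gt; 70,  siteB =&amp;gt; 74,  siteC =&amp;gt; 68 );
my %weight   = ( siteA =&amp;gt; 0.5, siteB =&amp;gt; 0.3, siteC =&amp;gt; 0.2 );

# Weighted average: sum(weight * value) divided by sum(weight)
sub weighted_average {
    my ($values, $weights) = @_;
    my ($sum, $total) = (0, 0);
    foreach my $site (keys %$values) {
        $sum   += $weights-&amp;gt;{$site} * $values-&amp;gt;{$site};
        $total += $weights-&amp;gt;{$site};
    }
    # Avoid dividing by zero when no weights were supplied
    return $total ? $sum &#x2F; $total : undef;
}

printf &amp;quot;Educated guess: %.1f degrees\n&amp;quot;, weighted_average(\%forecast, \%weight);
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;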
</content>
	</entry>
	<entry xml:lang="en">
		<title>Regular Expression Primer</title>
		<published>2004-03-24T00:00:00+00:00</published>
		<updated>2004-03-24T00:00:00+00:00</updated>
		<link rel="alternate" href="https://divisionbyzero.net/regex-primer/" type="text/html"/>
		<id>https://divisionbyzero.net/regex-primer/</id>
		<content type="html">&lt;p&gt;&amp;quot;Regular Expression&amp;quot; is a fancy way to say &amp;quot;pattern matcher.&amp;quot;  Humans
can match patterns with relative ease.  A machine has a bit more difficulty
deciphering patterns, especially in text.  As computing became more
powerful, the methods for matching text grew into more flexible dialects.&lt;&#x2F;p&gt;
&lt;p&gt;Regular expressions can be one of the toughest concepts to grasp and use
effectively in any programming language.  Perl is no exception, as its
regular expression engine is perhaps the most advanced regex engine in
existence.  Its power and flexibility also serve to confuse and intimidate
many newcomers. It is important to understand the regular expression engine,
as it&#x27;s often the cause of serious bottlenecks in programs of all shapes and
sizes.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;&#x2F;h2&gt;
&lt;p&gt;This introduction aims to cover the basics of regular expressions as they
pertain specifically to host and network administration.  There are a large
number of resources available which present regular expressions in a much
broader context.  A few key things to note before proceeding:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;quot;regex&amp;quot; is short for &amp;quot;Regular Expression&amp;quot;&lt;&#x2F;li&gt;
&lt;li&gt;&amp;quot;regex engine&amp;quot; is the component which translates regex into patterns&lt;&#x2F;li&gt;
&lt;li&gt;&lt;b&gt;&#x2F;abcd&#x2F;&lt;&#x2F;b&gt; isn&#x27;t just &#x27;abcd&#x27;, it&#x27;s &#x27;a&#x27; followed by &#x27;b&#x27; followed by &#x27;c&#x27; followed by &#x27;d&#x27;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
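&lt;p&gt;The last point can be seen in a minimal sketch like the following, where
the test strings are arbitrary examples.  The pattern only matches when the
four characters appear consecutively and in order, anywhere in the string.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;perl&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-perl &quot;&gt;&lt;code class=&quot;language-perl&quot; data-lang=&quot;perl&quot;&gt;&lt;span&gt;use strict;
use warnings;

# &#x2F;abcd&#x2F; needs &amp;#39;a&amp;#39;, &amp;#39;b&amp;#39;, &amp;#39;c&amp;#39;, &amp;#39;d&amp;#39; side by side, in that order
foreach my $string (&amp;#39;abcd&amp;#39;, &amp;#39;xxabcdxx&amp;#39;, &amp;#39;abdc&amp;#39;, &amp;#39;ab cd&amp;#39;) {
    my $result = ($string =~ &#x2F;abcd&#x2F;) ? &amp;#39;matches&amp;#39; : &amp;#39;does not match&amp;#39;;
    print &amp;quot;&amp;#39;$string&amp;#39; $result\n&amp;quot;;
}
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;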
&lt;h2 id=&quot;meta-characters&quot;&gt;Meta-characters&lt;&#x2F;h2&gt;
&lt;p&gt;Meta-characters are those characters to which the regex engine already
assigns special meaning.  In order to match these characters literally, it&#x27;s
necessary to prefix them with a backslash &amp;quot;\&amp;quot;.  The following
meta-characters will be covered in depth:&lt;&#x2F;p&gt;
&lt;table border=0 cellspacing=1 cellpadding=1&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;\&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;Escape the character that follows&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;.&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;Match any &lt;u&gt;single&lt;&#x2F;u&gt; character&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;a|z&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;Matches &lt;i&gt;a&lt;&#x2F;i&gt; or &lt;i&gt;z&lt;&#x2F;i&gt;&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;^&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;&lt;i&gt;Anchor&lt;&#x2F;i&gt;, Matches the beginning of a string&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;$&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;&lt;i&gt;Anchor&lt;&#x2F;i&gt;, Matches the end of a string&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;*&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;&lt;i&gt;Quantifier&lt;&#x2F;i&gt;, 0 or more of previous group or
character&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;+&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;&lt;i&gt;Quantifier&lt;&#x2F;i&gt;, 1 or more of previous group or
character&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;?&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;&lt;i&gt;Quantifier&lt;&#x2F;i&gt;, 0 or 1 of previous group or
character&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;{a,z}&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;&lt;i&gt;Quantifier&lt;&#x2F;i&gt;, matches if the previous group was found
between &lt;i&gt;a&lt;&#x2F;i&gt; and &lt;i&gt;z&lt;&#x2F;i&gt; times.&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;{a,}&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;&lt;i&gt;Quantifier&lt;&#x2F;i&gt;, matches if the previous group was found
at least &lt;i&gt;a&lt;&#x2F;i&gt; times.&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;{,z}&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;&lt;i&gt;Quantifier&lt;&#x2F;i&gt;, matches if the previous group was found
no more than &lt;i&gt;z&lt;&#x2F;i&gt; times.&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;{n}&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;&lt;i&gt;Quantifier&lt;&#x2F;i&gt;, matches if the previous group was found
&lt;u&gt;exactly&lt;&#x2F;u&gt; &lt;i&gt;n&lt;&#x2F;i&gt; times.&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;[abcd]&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;&lt;i&gt;Character Class&lt;&#x2F;i&gt;, matches if the character in this
position is either an a, b, c, or d&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;[^abcd]&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;&lt;i&gt;Inverted Character Class&lt;&#x2F;i&gt;, matches if the character in this
position is &lt;u&gt;not&lt;&#x2F;u&gt; an a, b, c, or d&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;(abcd)&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;&lt;i&gt;Grouping&lt;&#x2F;i&gt;, groups the matches in the parentheses
into a reference&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;&#x2F;table&gt;
&lt;h2&gt;Grouping&lt;&#x2F;h2&gt;
&lt;p&gt;Grouping affords three main benefits: the ability to capture the data
that a regex matches, the ability to link several patterns together for
quantifying, and the ability to reference the data matched by that group
later in the regex.  Capturing and linking are the two most common uses for
grouping.  Back references are rarely needed to accomplish most tasks and
are beyond the scope of this article.&lt;&#x2F;p&gt;
&lt;p&gt;Matching text is useful, but usually a programmer is searching for a
word, IP address, or URL buried inside of some text relatively positioned
near a distinguishable mark of some sort.  Usually, that mark is not
important to the rest of the program, but the text recovered from knowing
its position is invaluable.  In this case, grouping is used to capture the
important data, while leaving the rest of the regex to be forgotten as soon
as it&#x27;s finished evaluating.  Perl &quot;remembers&quot; the results of the groups in
special variables &lt;b&gt;$1, $2, $3,&lt;&#x2F;b&gt; etc. based on the position of the
opening parenthesis.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;perl&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-perl &quot;&gt;&lt;code class=&quot;language-perl&quot; data-lang=&quot;perl&quot;&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$line &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;First Name:     Bob&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span&gt;$line &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;=~ &lt;&#x2F;span&gt;&lt;span&gt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;^&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;First Name:&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;\s+(\S+)&lt;&#x2F;span&gt;&lt;span&gt;&#x2F;;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$first_name &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;$1;
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;b&gt;$first_name&lt;&#x2F;b&gt; will now contain &quot;Bob&quot;&lt;&#x2F;p&gt;
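&lt;p&gt;As a minimal sketch of the capture described above, the match can be wrapped in a check so that &lt;b&gt;$1&lt;&#x2F;b&gt; is only read after the regex succeeds.  The helper name &lt;b&gt;first_name&lt;&#x2F;b&gt; and the second sample line are illustrative assumptions, not part of the original example.&lt;&#x2F;p&gt;

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured first name, or undef when
# the line does not match.
sub first_name {
    my ($line) = @_;
    # Only trust $1 after a successful match; a failed match leaves
    # $1 holding whatever an earlier successful match stored.
    if ($line =~ /^First Name:\s+(\S+)/) {
        return $1;
    }
    return;
}

for my $line ('First Name:     Bob', 'Last Name: Smith') {
    my $name = first_name($line);
    print defined $name ? "captured: $name\n" : "no match\n";
}
```

&lt;p&gt;The first line prints &quot;captured: Bob&quot; and the second prints &quot;no match&quot;.&lt;&#x2F;p&gt;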
&lt;p&gt;Using a group to link together pieces of a pattern makes it possible to
quantify that group as a whole.  This is the simplest use of grouping and is
incredibly powerful at the same time.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;perl&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-perl &quot;&gt;&lt;code class=&quot;language-perl&quot; data-lang=&quot;perl&quot;&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;&#x2F;^&lt;&#x2F;span&gt;&lt;span&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;ab&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;+&lt;&#x2F;span&gt;&lt;span&gt;$&#x2F;
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This regex will match &lt;i&gt;&#x27;ab&#x27;&lt;&#x2F;i&gt;, as well as &lt;i&gt;&#x27;abab&#x27;&lt;&#x2F;i&gt;, and
&lt;i&gt;&#x27;abababababababababababab&#x27;&lt;&#x2F;i&gt;.&lt;&#x2F;p&gt;
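&lt;p&gt;To see what the parentheses contribute, the sketch below compares the grouped pattern with the same pattern minus the group.  The helper names and the sample strings are assumptions made for illustration.&lt;&#x2F;p&gt;

```perl
use strict;
use warnings;

# Hypothetical helpers: each returns 1 when the whole string matches.
sub grouped   { return $_[0] =~ /^(ab)+$/ ? 1 : 0 }   # + repeats "ab"
sub ungrouped { return $_[0] =~ /^ab+$/   ? 1 : 0 }   # + repeats only "b"

for my $text ('ab', 'abab', 'abbb') {
    printf "%-5s grouped=%d ungrouped=%d\n",
        $text, grouped($text), ungrouped($text);
}
```

&lt;p&gt;Both patterns accept &#x27;ab&#x27;, only the grouped pattern accepts &#x27;abab&#x27;, and only the ungrouped pattern accepts &#x27;abbb&#x27;.&lt;&#x2F;p&gt;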
&lt;h2&gt;Character Classes&lt;&#x2F;h2&gt;
&lt;p&gt;Character classes are sets of characters that can be in a set
position.  Assuming a line begins with a number, using a combination of the
&quot;beginning of string&quot; meta-character &#x27;^&#x27; and a character class which
represents any numeric character, it would be easy to match:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;perl&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-perl &quot;&gt;&lt;code class=&quot;language-perl&quot; data-lang=&quot;perl&quot;&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;&#x2F;^&lt;&#x2F;span&gt;&lt;span&gt;[&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;0&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;-&lt;&#x2F;span&gt;&lt;span style=&quot;color:#ae81ff;&quot;&gt;9&lt;&#x2F;span&gt;&lt;span&gt;]&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;&#x2F;
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This matches the line.  It may be more desirable to match lines with one
or more numeric characters at the beginning:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;&#x2F;^[0-9]+&#x2F;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If more precision is required, it&#x27;s possible to specify the number
of digits that will satisfy our match using a more specific quantifier:
&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;&#x2F;^[0-9]{1,6}&#x2F;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This will match a line that begins with one to six digits.
Surprisingly, this regex will still match lines that begin with 7, 8, 9, or
more digits, because nothing stops the match from ending after the sixth
digit.  In order to match only lines that start with 1 to 6 digits, we need
to tell the regex engine that the next character can&#x27;t be a digit.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;&#x2F;^[0-9]{1,6}[^0-9]&#x2F;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Using the Inverse Character Class we can be explicit and avoid any
confusion between our interpretation and the regex&#x27;s matching.&lt;&#x2F;p&gt;
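&lt;p&gt;The difference between the last two patterns can be checked with a short sketch.  The helper names and the sample lines are hypothetical and chosen only to exercise both patterns.&lt;&#x2F;p&gt;

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two patterns discussed above.
sub loose_match  { return $_[0] =~ /^[0-9]{1,6}/       ? 1 : 0 }
sub strict_match { return $_[0] =~ /^[0-9]{1,6}[^0-9]/ ? 1 : 0 }

# Sample lines invented for this sketch.
for my $line ('42 apples', '123456 items', '1234567 items', 'no digits') {
    printf "%-15s loose=%d strict=%d\n",
        $line, loose_match($line), strict_match($line);
}
```

&lt;p&gt;Only the line with seven leading digits separates the two: the loose pattern accepts it and the strict pattern rejects it.  Note that the strict pattern needs some non-digit character after the digits, so a string holding nothing but &#x27;123&#x27; would not match it.&lt;&#x2F;p&gt;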
&lt;p&gt;The caret in the Character Class served to invert the class.  Inside
the Character Class Meta-characters (&lt;b&gt;[]&lt;&#x2F;b&gt;) there are three
meta-characters:&lt;&#x2F;p&gt;
&lt;table border=0 cellspacing=1 cellpadding=1&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;^&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;(as the first character only) Invert the character class&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;\&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;Escape the next character&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;-&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;Range modifier, translates a-d to abcd&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;&#x2F;table&gt;
&lt;p&gt;A simple example:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;&#x2F;[ab]&#x2F;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Matches a or b&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;&#x2F;[^ab]&#x2F;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Matches any character that&#x27;s NOT a or b&lt;&#x2F;p&gt;
&lt;p&gt;Breaking down the more complex regex, the engine reads it as:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;^&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Anchor, tells the engine to start the match at the beginning of the
string&lt;&#x2F;p&gt;
&lt;p&gt;&lt;i&gt;followed by ..&lt;&#x2F;i&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;[0-9]{1,6}&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Match 1 to 6 characters that are any of the following:
0,1,2,3,4,5,6,7,8,9&lt;&#x2F;p&gt;
&lt;p&gt;&lt;i&gt;followed by ..&lt;&#x2F;i&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;[^0-9]&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;p&gt;any character that is &lt;u&gt;not&lt;&#x2F;u&gt; one of the following:
0,1,2,3,4,5,6,7,8,9&lt;&#x2F;p&gt;
&lt;p&gt;Perl provides aliases to commonly used character classes to save typing
and reduce some of the complexity of regular expression authoring.&lt;&#x2F;p&gt;
&lt;table border=0 cellspacing=1 cellpadding=1&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot;&gt;Alias&lt;&#x2F;th&gt;
&lt;th align=&quot;left&quot;&gt;Meaning&lt;&#x2F;th&gt;
&lt;th align=&quot;left&quot;&gt;Equivalent Character Class&lt;&#x2F;th&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;\d&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;Matches a digit&lt;&#x2F;td&gt;
&lt;td align=&quot;left&quot;&gt;[0-9]&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;\D&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;Matches a non-digit&lt;&#x2F;td&gt;
&lt;td align=&quot;left&quot;&gt;[^0-9]&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;\w&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;Matches a word character, alphanumeric or underscore&lt;&#x2F;td&gt;
&lt;td align=&quot;left&quot;&gt;[a-zA-Z0-9_]&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;\W&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;Matches a non-word character, neither alphanumeric nor underscore&lt;&#x2F;td&gt;
&lt;td align=&quot;left&quot;&gt;[^a-zA-Z0-9_]&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;\s&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;Matches a whitespace character&lt;&#x2F;td&gt;
&lt;td align=&quot;left&quot;&gt;[ \t\r\n\f]&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot; width=&quot;5em&quot;&gt;\S&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;Matches a non-whitespace character&lt;&#x2F;td&gt;
&lt;td align=&quot;left&quot;&gt;[^ \t\r\n\f]&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;&#x2F;table&gt;
&lt;p&gt;Using these aliases, it&#x27;s possible to rewrite the previous example as:&lt;&#x2F;p&gt;
&lt;code&gt;&#x2F;^\d{1,6}\D&#x2F;&lt;&#x2F;code&gt;
&lt;h3&gt;Common Character Class Gotcha&lt;&#x2F;h3&gt;
&lt;p&gt;In an attempt to match an IP address, which contains 4 numbers
ranging from 0 to 255 separated by periods, programmers often try something
along the lines of the following:&lt;&#x2F;p&gt;
&lt;code&gt;&#x2F;([0-255]\.){3}[0-255]&#x2F;&lt;&#x2F;code&gt;
&lt;p&gt;At first glance, and even on close examination, it&#x27;s difficult to understand why
this regex does not match what the programmer is attempting to match.  When
using a character class, the key word is &lt;i&gt;character&lt;&#x2F;i&gt;.  In this context,
the regex engine is not concerned with numbers, but &lt;i&gt;characters&lt;&#x2F;i&gt;.  What
this regex optimizes to is:&lt;&#x2F;p&gt;
&lt;code&gt;&#x2F;([0125]\.){3}[0125]&#x2F;&lt;&#x2F;code&gt;
&lt;p&gt;This is most assuredly not what was intended.  The range modifier inside
of a character class evaluates the expression &quot;0-255&quot; as &quot;0-2&quot; + &quot;5&quot; + &quot;5&quot;, or
&quot;0125&quot;, as duplicate entries in a character class are optimized out.  The
regex to properly match an IP address is very complicated and beyond the
scope of this article.  Assuming no one is attempting to enter IPs in the
888.888.888.0&#x2F;24 range, a programmer might construct this regex:&lt;&#x2F;p&gt;
&lt;code&gt;&#x2F;(\d{1,3}\.?){4}&#x2F;&lt;&#x2F;code&gt;
&lt;p&gt;&lt;i&gt;Stay tuned for an in-depth discussion on this regex.&lt;&#x2F;i&gt;&lt;&#x2F;p&gt;
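&lt;p&gt;The character class gotcha above is easy to confirm with a small sketch.  The helper names and the test strings are assumptions; the point is only that &lt;b&gt;[0-255]&lt;&#x2F;b&gt; and &lt;b&gt;[0125]&lt;&#x2F;b&gt; accept exactly the same characters.&lt;&#x2F;p&gt;

```perl
use strict;
use warnings;

# Hypothetical helpers: return 1 when every character in the string
# belongs to the character class.
sub only_class_0_255 { return $_[0] =~ /^[0-255]+$/ ? 1 : 0 }
sub only_class_0125  { return $_[0] =~ /^[0125]+$/  ? 1 : 0 }

for my $text ('255', '120', '199', '34') {
    printf "%-4s [0-255]=%d [0125]=%d\n",
        $text, only_class_0_255($text), only_class_0125($text);
}
```

&lt;p&gt;Both helpers accept &#x27;255&#x27; and &#x27;120&#x27; and reject &#x27;199&#x27; and &#x27;34&#x27;, even though 199 and 34 are perfectly valid octet values.&lt;&#x2F;p&gt;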
&lt;h2&gt;Quantifiers&lt;&#x2F;h2&gt;
&lt;p&gt;Quantifiers allow a programmer to specify how many instances of a
pattern to match, either as a fixed count or as an open-ended range.  There are
four quantifiers:&lt;&#x2F;p&gt;
&lt;table border=0 cellspacing=1 cellpadding=1&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot;&gt;?&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;Matches 0 or 1 consecutive instances of the previous group or
character&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot;&gt;*&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;Matches 0 or more consecutive instances of the previous group or
character&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot;&gt;+&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;Matches 1 or more consecutive instances of the previous group or
character&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot;&gt;{a,z}&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;Range quantifier, specify a minimum (&lt;i&gt;a&lt;&#x2F;i&gt;) and a
maximum (&lt;i&gt;z&lt;&#x2F;i&gt;) number of consecutive instances to match&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;&#x2F;table&gt;
&lt;h3&gt;Zero or More (*)&lt;&#x2F;h3&gt;
&lt;p&gt;The &#x27;*&#x27; quantifier is almost always misused.  Luckily, in most cases
the effect is negligible, but it could still have some unexpected results if a
programmer slips.  Given the lines:&lt;&#x2F;p&gt;
&lt;ol&gt;
    &lt;li&gt;a dog runs&lt;&#x2F;li&gt;
    &lt;li&gt;the dog jumps&lt;&#x2F;li&gt;
    &lt;li&gt;aaa is a car club&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Which lines will successfully match the following regex?&lt;&#x2F;p&gt;
&lt;code&gt;&#x2F;a*&#x2F;&lt;&#x2F;code&gt;
&lt;p&gt;Surprisingly, all three lines will match.  The regex engine will
always be able to find &quot;zero or more a&#x27;s&quot; in any line of text you send
it.&lt;&#x2F;p&gt;
&lt;p&gt;While this may not seem to be incredibly useful, it actually is.
There are times when a programmer needs to match some text if it&#x27;s there, or
just have an empty string or null if that text isn&#x27;t found.  This is where
the &quot;zero or more&quot; quantifier earns its keep.&lt;&#x2F;p&gt;
&lt;h3&gt;One or More (+)&lt;&#x2F;h3&gt;
&lt;p&gt;The &#x27;+&#x27; quantifier almost always is what a programmer means when they
use the &#x27;*&#x27; quantifier.  In the previous example:&lt;&#x2F;p&gt;
&lt;ol&gt;
    &lt;li&gt;a dog runs&lt;&#x2F;li&gt;
    &lt;li&gt;the dog jumps&lt;&#x2F;li&gt;
    &lt;li&gt;aaa is a car club&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Which lines will successfully match the following regex?&lt;&#x2F;p&gt;
&lt;code&gt;&#x2F;a+&#x2F;&lt;&#x2F;code&gt;
&lt;p&gt;In this case, only lines 1 and 3 will successfully match as there is
no &#x27;a&#x27; one or more times in line 2.  Oftentimes, this is what was intended
when a &#x27;*&#x27; was used.&lt;&#x2F;p&gt;
&lt;h3&gt;Range Modifier ({a,z})&lt;&#x2F;h3&gt;
&lt;p&gt;The range modifier allows the programmer finer-grained control over the
number of consecutive matches to consider.&lt;&#x2F;p&gt;
&lt;ol&gt;
    &lt;li&gt;a dog runs&lt;&#x2F;li&gt;
    &lt;li&gt;the dog jumps&lt;&#x2F;li&gt;
    &lt;li&gt;aaa is a car club&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Which lines will successfully match the following regex?&lt;&#x2F;p&gt;
&lt;code&gt;&#x2F;a{2,5}&#x2F;&lt;&#x2F;code&gt;
&lt;p&gt;In this case, only line 3 will successfully match.  The regex engine
is looking for 2 - 5 consecutive a&#x27;s.&lt;&#x2F;p&gt;
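&lt;p&gt;The three quantifier quizzes above can be verified with a short sketch.  The helper &lt;b&gt;matching_lines&lt;&#x2F;b&gt; is a hypothetical name; it reports the numbers of the sample lines that match a given regex.&lt;&#x2F;p&gt;

```perl
use strict;
use warnings;

# The three sample lines used throughout this section.
my @lines = ('a dog runs', 'the dog jumps', 'aaa is a car club');

# Hypothetical helper: returns the 1-based numbers of the lines that
# match the compiled regex passed in.
sub matching_lines {
    my ($regex) = @_;
    return grep { $lines[$_ - 1] =~ $regex } 1 .. scalar @lines;
}

print 'a*     : ', join(', ', matching_lines(qr/a*/)),     "\n";
print 'a+     : ', join(', ', matching_lines(qr/a+/)),     "\n";
print 'a{2,5} : ', join(', ', matching_lines(qr/a{2,5}/)), "\n";
```

&lt;p&gt;The output lists lines 1, 2, 3 for &lt;b&gt;a*&lt;&#x2F;b&gt;, lines 1, 3 for &lt;b&gt;a+&lt;&#x2F;b&gt;, and line 3 alone for &lt;b&gt;a{2,5}&lt;&#x2F;b&gt;.&lt;&#x2F;p&gt;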
&lt;h2&gt;Greedy VS Non-Greedy&lt;&#x2F;h2&gt;
&lt;p&gt;Quantifiers come in two flavors, &quot;&lt;b&gt;Greedy&lt;&#x2F;b&gt;&quot; and
&quot;&lt;b&gt;Non-Greedy&lt;&#x2F;b&gt;&quot;.  The only difference between the two is their relative
ambition to match.  Most regex bottlenecks are a direct result of poorly
written &lt;b&gt;Greedy&lt;&#x2F;b&gt; or &lt;b&gt;Non-Greedy&lt;&#x2F;b&gt; matches.&lt;&#x2F;p&gt;
&lt;p&gt;The regex engine wants to match every pattern it&#x27;s passed and it will
do everything in its power to match that regex.  This is why a regex can slow
down a program so easily.  Misunderstanding the intention of the regex
engine could result in a large regex being evaluated millions of times over
a formidable text sample.&lt;&#x2F;p&gt;
&lt;h3&gt;Greedy Matching&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;b&gt;Greedy&lt;&#x2F;b&gt; matching seems to be the de-facto standard for most
regex tasks.  These quantifiers are dubbed &quot;greedy&quot; because they are very
ambitious and attempt to match as many times as they can while still
allowing the rest of the regular expression to match.  All of the
quantifiers presented thus far are greedy.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;perl&quot; style=&quot;background-color:#272822;color:#f8f8f2;&quot; class=&quot;language-perl &quot;&gt;&lt;code class=&quot;language-perl&quot; data-lang=&quot;perl&quot;&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$string&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span style=&quot;color:#e6db74;&quot;&gt;&amp;#39;1 2 3 4 5 6 7 8 9 10 11 12 13 14 15&amp;#39;&lt;&#x2F;span&gt;&lt;span&gt;;
&lt;&#x2F;span&gt;&lt;span&gt;$string &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;=~ &lt;&#x2F;span&gt;&lt;span&gt;&#x2F;&lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;^.*(\d+).*$&lt;&#x2F;span&gt;&lt;span&gt;&#x2F;;
&lt;&#x2F;span&gt;&lt;span style=&quot;font-style:italic;color:#66d9ef;&quot;&gt;my &lt;&#x2F;span&gt;&lt;span&gt;$number &lt;&#x2F;span&gt;&lt;span style=&quot;color:#f92672;&quot;&gt;= &lt;&#x2F;span&gt;&lt;span&gt;$1;
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;What will &lt;b&gt;$number&lt;&#x2F;b&gt; contain?  Without understanding greedy and
non-greedy, and how the regex engine goes about satisfying the pattern, it&#x27;s
very difficult for a beginner to answer correctly.  The answer is the &quot;5&quot;
highlighted in red below:&lt;&#x2F;p&gt;
&lt;code&gt;
&#x27;1 2 3 4 5 6 7 8 9 10 11 12 13 14 1&lt;font color=&quot;#FF0000&quot;&gt;&lt;b&gt;5&lt;&#x2F;b&gt;&lt;&#x2F;font&gt;&#x27;
&lt;&#x2F;code&gt;
&lt;p&gt;This example is confusing to beginners.  Inspecting how the match is
made should clear things up and hopefully take a giant leap towards
understanding regex in general.  Greedy quantifiers match the maximum number
of instances they can on their first pass.  If the rest of the regex fails
as a result of the greedy quantifier, it will give up its bounty, one
character at a time until the entire regex can match.&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;b&gt;&#x27;^&#x27;&lt;&#x2F;b&gt; - Start at the &quot;beginning of string&quot; anchor.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;b&gt;&#x27;.*&#x27;&lt;&#x2F;b&gt; - Match any character zero or more times.  The regex
engine matches this greedily, until it fails at the end of string.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;b&gt;&#x27;(\d+)&#x27;&lt;&#x2F;b&gt; - Fails.  There is no string left to match, so the
&lt;b&gt;.*&lt;&#x2F;b&gt; match gives up the character &#x27;5&#x27; to the regex.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;b&gt;&#x27;(\d+)&#x27;&lt;&#x2F;b&gt; - Succeeds matching &#x27;5&#x27; and storing it in $1.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;b&gt;&#x27;.*&#x27;&lt;&#x2F;b&gt; - Succeeds as it matches any character zero &lt;strike&gt;or more&lt;&#x2F;strike&gt;
times.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;b&gt;&#x27;$&#x27;&lt;&#x2F;b&gt; - Succeeds, anchor position is currently end of string&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;h3&gt;Introducing Non-Greedy Quantifiers&lt;&#x2F;h3&gt;
&lt;p&gt;Non-Greedy quantifiers are the lazy quantifiers.  Where their greedy
counterparts match the maximum number of instances before allowing the regex
engine to continue, non-greedy quantifiers surrender control as soon as the
minimum number of instances is satisfied.  Non-greedy quantifiers will match
more than the minimum only when it&#x27;s necessary to have the entire regex
succeed.  The non-greedy quantifiers are the same as the greedy quantifiers
immediately followed by a &#x27;?&#x27;:&lt;&#x2F;p&gt;
&lt;table border=0 cellspacing=1 cellpadding=1&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot;&gt;*?&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;Matches 0 or more consecutive instances of the previous group or
character, &lt;b&gt;non-greedily&lt;&#x2F;b&gt;&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot;&gt;+?&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;Matches 1 or more consecutive instances of the previous group or
character, &lt;b&gt;non-greedily&lt;&#x2F;b&gt;&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot;&gt;{a,z}?&lt;&#x2F;th&gt;
&lt;td align=&quot;left&quot;&gt;Range quantifier, specify a minimum (&lt;i&gt;a&lt;&#x2F;i&gt;) and a
maximum (&lt;i&gt;z&lt;&#x2F;i&gt;) number of consecutive instances to match,
&lt;b&gt;non-greedily&lt;&#x2F;b&gt;&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;&#x2F;table&gt;
&lt;p&gt;The previous example can also demonstrate the laziness of the non-greedy
match.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;my $string=&#x27;1 2 3 4 5 6 7 8 9 10 11 12 13 14 15&#x27;;
$string =~ &#x2F;^.*?(\d+).*?$&#x2F;;
my $number = $1;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;What will &lt;b&gt;$number&lt;&#x2F;b&gt; contain this time?  This time the answer is the &quot;1&quot;
highlighted in red below:&lt;&#x2F;p&gt;
&lt;code&gt;
&#x27;&lt;font color=&quot;#FF0000&quot;&gt;&lt;b&gt;1&lt;&#x2F;b&gt;&lt;&#x2F;font&gt; 2 3 4 5 6 7 8 9 10 11 12 13 14 15&#x27;
&lt;&#x2F;code&gt;
&lt;ol&gt;
&lt;li&gt;&lt;b&gt;&#x27;^&#x27;&lt;&#x2F;b&gt; - Start at the &quot;beginning of string&quot; anchor.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;b&gt;&#x27;.*?&#x27;&lt;&#x2F;b&gt; - Match any character zero &lt;strike&gt;or more&lt;&#x2F;strike&gt; times.  The regex
engine lazily accepts no characters for this non-greedy match.  It will
allow characters to match this pattern only if those characters &lt;b&gt;must&lt;&#x2F;b&gt;
be matched to satisfy the entire regex.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;b&gt;&#x27;(\d+)&#x27;&lt;&#x2F;b&gt; - Succeeds matching &#x27;1&#x27; and storing it in $1.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;b&gt;&#x27;.*?&#x27;&lt;&#x2F;b&gt; - Succeeds as it matches any character zero &lt;strike&gt;or more&lt;&#x2F;strike&gt;
times.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;b&gt;&#x27;$&#x27;&lt;&#x2F;b&gt; - Fails, position is not currently end of string&lt;&#x2F;li&gt;
&lt;li&gt;&lt;b&gt;&#x27;.*?&#x27;&lt;&#x2F;b&gt; - In an attempt to match the entire regex, the second &lt;b&gt;.*?&lt;&#x2F;b&gt;
receives the rest of the string one character at a time until the position
is the end of the string.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;b&gt;&#x27;$&#x27;&lt;&#x2F;b&gt; - Position is now the end of string, the entire regex
matches!&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
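&lt;p&gt;The two walkthroughs above can be run side by side.  In this minimal sketch the helper names &lt;b&gt;greedy_capture&lt;&#x2F;b&gt; and &lt;b&gt;lazy_capture&lt;&#x2F;b&gt; are assumptions; the regexes differ only in the &#x27;?&#x27; that follows each &lt;b&gt;.*&lt;&#x2F;b&gt;.&lt;&#x2F;p&gt;

```perl
use strict;
use warnings;

my $string = '1 2 3 4 5 6 7 8 9 10 11 12 13 14 15';

# Hypothetical helpers: each returns what the first group captured,
# or undef when the regex does not match.
sub greedy_capture { my ($s) = @_; return $s =~ /^.*(\d+).*$/   ? $1 : undef }
sub lazy_capture   { my ($s) = @_; return $s =~ /^.*?(\d+).*?$/ ? $1 : undef }

print 'greedy: ', greedy_capture($string), "\n";
print 'lazy:   ', lazy_capture($string),   "\n";
```

&lt;p&gt;The greedy version captures &quot;5&quot;, the final digit of the string, while the lazy version captures &quot;1&quot;, the first.&lt;&#x2F;p&gt;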
&lt;h2&gt;Notes on Greed&lt;&#x2F;h2&gt;
&lt;p&gt;There are two dangers when dealing with quantifiers.  Using the
wrong quantifier could match the wrong instance of text by being over- or
under-zealous in its attempt to position itself to satisfy the entire regex.
It could also lead to huge performance hits as the regex engine backtracks
across itself and the target text several times to find the &lt;b&gt;first&lt;&#x2F;b&gt;
arrangement that satisfies the regex entirely.&lt;&#x2F;p&gt;
&lt;p&gt;The use of &quot;&lt;b&gt;.*&lt;&#x2F;b&gt;&quot; is common with beginners, and is misused or
entirely unnecessary in most cases.  The use of anchors (&lt;b&gt;^&lt;&#x2F;b&gt;, &lt;b&gt;$&lt;&#x2F;b&gt;)
should allow the programmer enough freedom to maneuver the regex engine
to the data they are seeking.&lt;&#x2F;p&gt;
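&lt;p&gt;As a sketch of that advice, the first or last number in the earlier sample string can be captured with an anchor alone and no &lt;b&gt;.*&lt;&#x2F;b&gt; at all.  The helper names are hypothetical.&lt;&#x2F;p&gt;

```perl
use strict;
use warnings;

my $string = '1 2 3 4 5 6 7 8 9 10 11 12 13 14 15';

# Hypothetical helpers: the anchor pins the digits to one end of the
# string, so no leading or trailing .* is needed.
sub first_number { return $_[0] =~ /^(\d+)/ ? $1 : undef }
sub last_number  { return $_[0] =~ /(\d+)$/ ? $1 : undef }

print 'first: ', first_number($string), "\n";
print 'last:  ', last_number($string),  "\n";
```

&lt;p&gt;This prints &quot;1&quot; and &quot;15&quot;.  The anchored pattern recovers the whole trailing number, which the greedy &lt;b&gt;.*&lt;&#x2F;b&gt; version could not do.&lt;&#x2F;p&gt;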
&lt;p&gt;Headaches caused by matching the wrong data need to be addressed by
breaking down the regular expression as done in this article.  Remember, the
regex engine wants to match the regular expression at any cost, so long as
it&#x27;s the cheapest route.  The greedy matches will always get the maximum
number of instances that still allows the entire regex to match.  The
non-greedy matches will always get the minimum number of instances that still
allows the entire regex to match.&lt;&#x2F;p&gt;
&lt;h2&gt;Revisiting the IP Address Match&lt;&#x2F;h2&gt;
&lt;p&gt;Armed with knowledge of the regex engine&#x27;s inner workings, dissection
of the earlier IP Address match can reveal its shortcomings:&lt;&#x2F;p&gt;
&lt;code&gt;&#x2F;(\d{1,3}\.?){4}&#x2F;&lt;&#x2F;code&gt;
&lt;p&gt;This regex will match an IP Address such as &quot;192.168.0.1&quot;.  It will,
however, also match strings like &quot;1234567.234&quot;.  Anxiously deciding to
optimize the regex to use quantifiers, a hapless programmer noted that the
pattern &quot;digit digit digit period&quot; repeated 3 times in an IP Address and
then was followed by a &quot;digit digit digit.&quot;  The &quot;digit digit digit&quot; was
repeated 4 times in the string!  The &quot;period&quot; happens &quot;0 or 1&quot; times
depending on which octet the cursor is at the end of.  So the attempt to
shorten the regex inadvertently led to it not being as specific as intended.
Had the programmer stopped at &quot;digit digit digit period&quot; repeated 3 times,
they would&#x27;ve had a workable solution.  Again, this example doesn&#x27;t account
for the fact that the maximum value of each octet in an IP address is 255, but
it demonstrates a powerful regex.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;&#x2F;(\d{1,3}\.){3}\d{1,3}&#x2F;&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Literally, &quot;one to three digits followed by a period, three times,
followed by one to three digits.&quot;  This regex would be good enough to pick
out IP Address-like strings from text, then validation could be done one
octet at a time.&lt;&#x2F;p&gt;
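&lt;p&gt;One way to sketch that two-step approach: let the regex pick out an IP-like string, then check each octet with ordinary Perl.  The helper name &lt;b&gt;find_ip&lt;&#x2F;b&gt; and the sample inputs are assumptions, and the extra outer parentheses are added here only so the whole address lands in the first capture.&lt;&#x2F;p&gt;

```perl
use strict;
use warnings;

# Hypothetical helper: pulls the first IP-like string out of $text and
# returns it only when every octet is no larger than 255.
sub find_ip {
    my ($text) = @_;
    # The outer group captures the whole address; the inner group only
    # exists so that {3} can repeat "digits followed by a period".
    my ($candidate) = $text =~ /((\d{1,3}\.){3}\d{1,3})/;
    return unless defined $candidate;
    for my $octet (split /\./, $candidate) {
        return if $octet > 255;    # validate one octet at a time
    }
    return $candidate;
}

for my $text ('host 192.168.0.1 up', 'bad 888.888.888.0 here', '1234567.234') {
    my $ip = find_ip($text);
    print defined $ip ? "found $ip\n" : "nothing valid in '$text'\n";
}
```

&lt;p&gt;Only the first sample yields an address.  The second looks like an IP address but fails the octet check, and the third never matches the regex because it lacks three periods.&lt;&#x2F;p&gt;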
&lt;h2&gt;Closing Notes on Regular Expressions&lt;&#x2F;h2&gt;
&lt;p&gt;Writing the &quot;right&quot; Regular Expression is often very difficult to do
as machines and humans see text patterns completely differently.  Humans are
keen to pick up on spatial patterns, while machines are left to process the
text one character at a time.  Learning to read regular expressions exactly
as the engine does can help write more efficient, more effective regular
expressions.&lt;&#x2F;p&gt;
&lt;p&gt;In most circumstances it is possible for the programmer to have
access to the data they are attempting to match or extract from using
regular expressions.  A programmer should build their regular expressions
utilizing a relatively complete data set as the template.  Do not attempt to
write a regular expression to solve every problem.  Specialize regular
expressions as much as needed to get them to work right.&lt;&#x2F;p&gt;
&lt;p&gt;The regex engine in Perl is surrounded by millions of useful tools:
Perl itself.  Do not forget that.  Most regex beginners are content to solve
everything in a regex.  Questions like &quot;How do I loop in a regex in perl?&quot;
are not as uncommon as one might hope.  Regular Expressions match text; if
looping is necessary, use &lt;b&gt;foreach&lt;&#x2F;b&gt;, &lt;b&gt;for&lt;&#x2F;b&gt;, &lt;b&gt;while&lt;&#x2F;b&gt;, or
&lt;b&gt;until&lt;&#x2F;b&gt;.  Remember Perl is a huge tool chest with a million tools
inside; there&#x27;s no need to solve everything with a big hammer (or regex)
even if it might be more fun initially.&lt;&#x2F;p&gt;
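&lt;p&gt;As a small illustration of letting Perl do the looping, the sketch below uses a plain &lt;b&gt;while&lt;&#x2F;b&gt; loop to walk over every match in a string.  The helper name &lt;b&gt;all_numbers&lt;&#x2F;b&gt; and the sample text are assumptions, and the &lt;b&gt;&#x2F;g&lt;&#x2F;b&gt; modifier is a standard Perl feature that this article does not otherwise cover.&lt;&#x2F;p&gt;

```perl
use strict;
use warnings;

# Hypothetical helper: collects every run of digits in $text, letting an
# ordinary while loop drive the repeated matches.
sub all_numbers {
    my ($text) = @_;
    my @found;
    # In scalar context, /g resumes just after the previous match on
    # each pass and returns false once no further match exists.
    while ($text =~ /(\d+)/g) {
        push @found, $1;
    }
    return @found;
}

print join(', ', all_numbers('1 fish, 2 fish, 10 more')), "\n";
```

&lt;p&gt;This prints &quot;1, 2, 10&quot;.  The regex stays small and the loop stays in Perl.&lt;&#x2F;p&gt;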
</content>
	</entry>
</feed>
