Azure Batch on Azure Government

[MUSIC]
>>Hi. This is Steve Michelotti of the Azure Government Engineering team. I'm joined here today by Peter Schultz, a Program Manager on the Azure Batch team. Thanks for joining us, Peter.
>>Great to be here.
>>We're here to talk about Azure Batch in Azure Government. But before we do that, we should probably make sure everyone understands: what is Azure Batch?
>>Yes. So Azure Batch is a service designed to run large-scale parallel and high-performance computing applications efficiently in the Cloud. With Batch, you can easily allocate and manage hundreds or even thousands of VMs in a logical grouping that we call a pool. Batch also makes it really easy to distribute work across those VMs in the form of jobs, which are made up of tasks, and you can think of tasks as basically just commands that you can run on the command line.
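To make the pool, job, and task vocabulary concrete, here is a minimal sketch using the Azure Batch Python SDK (azure-batch). The account name, key, endpoint, pool ID, and job ID are placeholders, and the pool is assumed to already exist; this illustrates the object model rather than the exact code used in the demo later in the video.

```python
from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials
import azure.batch.models as batchmodels

# Placeholder credentials. In Azure Government the account URL typically ends in
# batch.usgovcloudapi.net rather than the public cloud's batch.azure.com; check
# your Batch account's properties blade for the exact endpoint.
credentials = SharedKeyCredentials("mybatchaccount", "<batch-account-key>")
client = BatchServiceClient(
    credentials,
    batch_url="https://mybatchaccount.usgovvirginia.batch.usgovcloudapi.net")

# A job is scheduled onto an existing pool of VMs (nodes).
client.job.add(batchmodels.JobAddParameter(
    id="transcode-job",
    pool_info=batchmodels.PoolInformation(pool_id="my-pool")))

# Tasks are just command lines; Batch distributes them across the pool's nodes.
tasks = [
    batchmodels.TaskAddParameter(
        id=f"task-{i:03d}",
        command_line=f"/bin/bash -c 'echo processing input file {i}'")
    for i in range(100)]
client.task.add_collection("transcode-job", tasks)
```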
>>Okay. So I think this is one of those things for me: when we talk about the elastic scalability of the Cloud, Azure Batch is something that truly embodies that. Some people just provision a VM in the Cloud. Great, that's not that interesting. But in this case, you can provision a whole series of resources for parallel jobs.
>>Yeah, absolutely. The ability to schedule jobs is really what makes Batch stand out.
>>All right, cool. So let's dive a little deeper. What are some of the things we should know about Azure Batch?
>>Yeah. So Batch has a number of capabilities that make it the best Cloud-native way to run large-scale compute jobs. For one, there are numerous access methods. So in addition to our REST APIs and command line tools for Windows or Linux, we also have SDKs for .NET, Java, Node.js, and Python. There are also some highly configurable VM options. So you can choose between Windows or Linux, standard or custom images, and dedicated or low-priority VMs to create a VM configuration that works best for you. Lastly, we also enable pool scaling. So you can choose to manually scale the size of your pools according to your needs, or you can create logic to automatically scale your pools for you according to a formula that you write, which is pretty powerful.
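As a rough illustration of those VM options, the sketch below creates a pool of Ubuntu 18.04 VMs with a mix of dedicated and low-priority nodes using the Python SDK. The VM size, image reference, node counts, and the client object (created as in the earlier snippet) are illustrative placeholders, not the exact configuration from the demo.

```python
import azure.batch.models as batchmodels

# A pool of Linux VMs: a marketplace Ubuntu image, a couple of dedicated nodes
# for a predictable baseline, plus cheaper low-priority nodes for extra capacity.
pool = batchmodels.PoolAddParameter(
    id="ocr-pool",
    vm_size="Standard_F2s_v2",
    virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
        image_reference=batchmodels.ImageReference(
            publisher="Canonical",
            offer="UbuntuServer",
            sku="18.04-LTS",
            version="latest"),
        node_agent_sku_id="batch.node.ubuntu 18.04"),
    target_dedicated_nodes=2,
    target_low_priority_nodes=8)

client.pool.add(pool)  # 'client' is the BatchServiceClient from the earlier sketch
```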
>>That's funny, because we're talking about spinning up resources to do work on demand. It almost sounds like we're talking about Azure Functions, but you're talking about VMs. Are there any limitations in terms of what programming language I can use to utilize Azure Batch?
>>No. Between .NET, Java, Node.js, and Python, we've found customers are pretty happy with the array of programming languages that we offer them. Functions and Batch are a little different. Batch is really good for when you want to choose the underlying hardware that you're running on. So any VM in Azure, you can use Batch to schedule jobs on it, and any task that requires more than a few milliseconds to execute is a great fit; tasks of basically any size are perfect to execute with Batch.
>>All right, great. So what are some example workloads that we might talk about, especially in the Government?
>>Sure. So the array of high-performance computing applications is pretty broad to begin with, as you can imagine, so the array of workloads that run on Batch is going to be pretty broad as well. But I'll list a few examples to try and get people's minds going and see if they can imagine some things as well. One would be transcoding, either audio or video, for different formats, resolutions, bit rates, things like that. Another would be Monte Carlo simulations for risk analysis. You could have weather simulations for local, state, or federal agencies. Genome sequencing for scientific agencies, or even applying optical character recognition, or OCR, to hundreds or even thousands of archived documents.
>>Great. Definitely, in the Government space, something I'm constantly harping on is spending taxpayer dollars more efficiently. So the ability to spin up resources and then, even more importantly, spin them down when you're not using them is hugely important to saving money.
>>Yeah, and between manually resizing pools or using our auto-scaling formulas, you can certainly use those taxpayer dollars efficiently.
>>So we have auto-scaling formulas, I heard you say.
>>Yeah, absolutely. So when I was talking earlier about creating logic to scale your pools up and down, you can actually write a tailored formula according to things like the number of tasks that are waiting to be run, the number of tasks that have already been completed, things like that.
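For reference, here is a sketch of what one of those auto-scale formulas can look like, applied to a pool with the Python SDK. The $PendingTasks and $TargetDedicatedNodes variables are part of Batch's documented auto-scale formula language; the thresholds, node cap, pool ID, and evaluation interval below are illustrative choices, not values from the video.

```python
import datetime

# Scale the pool based on how many tasks are waiting to run, capped at 10 nodes,
# and let running tasks finish before nodes are removed.
autoscale_formula = """
$samples = $PendingTasks.GetSamplePercent(TimeInterval_Minute * 5);
$tasks = $samples < 70 ? max(0, $PendingTasks.GetSample(1))
                       : avg($PendingTasks.GetSample(TimeInterval_Minute * 5));
$TargetDedicatedNodes = min(10, $tasks);
$NodeDeallocationOption = taskcompletion;
"""

client.pool.enable_auto_scale(
    pool_id="ocr-pool",
    auto_scale_formula=autoscale_formula,
    auto_scale_evaluation_interval=datetime.timedelta(minutes=5))
```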
If you think about these workloads, a lot of them are scalable and parallelizable, such that the work can be broken down into discrete pieces and operated on independently. So if I go back to this slide here, you can see we're distributing tasks to each VM in our pool to transcode video. You can imagine you have tens or hundreds of videos in your archives that you're looking to change into a different resolution or format. So you can work on each of those videos independently on a different VM and parallelize the work so that you get it done in a shorter amount of time.
>>Great. So we've got the various workloads, and we've talked about the concept of these parallelizable jobs. So let's talk a little bit about some of the capabilities, some of the tools we have to implement these workloads.
>>Certainly. So I mentioned the SDKs (.NET, Java, Python, Node.js) and command line tools for Windows and Linux. But we also have a cross-platform application that we developed called Batch Explorer, which makes it really easy to create, monitor, and debug Batch applications.
>>Batch Explorer works very well on Azure Government as well.
>>Yes, it does.
>>So these tools not only work for Azure public, but also Azure Government?
>>Yeah.
>>Okay. So I've heard you talk about some of these tools, I've heard you mention VMs a couple of times, and I heard you mention low-priority VMs. We have, I think, a lot of customers that don't necessarily understand the concept of what those low-pri VMs are. Can you talk a little bit more about that?
>>Yes, certainly. So when you think of a VM in Azure, you're probably thinking of one that you can allocate and get access to whenever you want: once you deploy it, it's yours. Those are what we on the Batch team call dedicated VMs. Low-priority VMs are a little different. Low-priority VMs are allocated from excess capacity in Azure at about an 80 percent discount over their dedicated equivalent.
>>Eighty percent discount? That sounds good for spending taxpayer dollars more efficiently.
>>Yes, absolutely. They can be reclaimed at any time if demand in a region goes up, so we don't offer an SLA for them, but the discount is significant. They're great for workloads that have a flexible job completion time, and where the work is distributed really well across multiple nodes or VMs.
>>So if you don't need an SLA for the job getting done in X number of seconds or minutes, they're a perfect fit for a more efficient workload.
>>Yeah, and so there are different ways you can mix and match dedicated and low-priority VMs to optimize for price, or speed, or both. If you have low-priority VMs serve as a variable proportion of your Batch pools, you can get a far lower cost for those pools.
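As a sketch of that mix-and-match idea, an existing pool can be resized at any time to shift the balance between dedicated and low-priority nodes. The pool ID and node counts here are arbitrary examples, reusing the client object from the earlier snippets.

```python
import azure.batch.models as batchmodels

# Keep a small dedicated baseline for predictable capacity, and lean on
# low-priority nodes (heavily discounted, but reclaimable) for the bulk of the work.
client.pool.resize(
    pool_id="ocr-pool",
    pool_resize_parameter=batchmodels.PoolResizeParameter(
        target_dedicated_nodes=2,
        target_low_priority_nodes=18))
```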
>>Great. All right. So we've talked a lot about Azure Batch, the concepts, the capabilities. I know I'm ready for a demo.
>>Sure.
>>Why don't we jump into a demo here? What do you have for us?
>>Yeah. So I'll give a quick overview here. Imagine you're an agency with a ton of scanned documents and you want to make them easily searchable. What this demo is going to show you is how you can upload those documents into a storage blob and have a blob-triggered function automatically execute a Batch job, which will apply optical character recognition, or OCR, to those documents, and then Batch will immediately write them back to that same storage account, into a different blob container.
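The demo's function is written against the Batch .NET SDK; as a language-neutral illustration of the same pattern, here is roughly what a blob-triggered Azure Function looks like in Python. The container binding, blob path, and the submit_ocr_job helper are placeholders standing in for the logic Peter walks through next.

```python
import logging

import azure.functions as func


def submit_ocr_job(blob_name: str) -> None:
    # Placeholder: in the demo this is where the Batch SDK creates a job and task
    # for the uploaded document (see the task sketch further below).
    logging.info("Would submit a Batch OCR task for %s", blob_name)


def main(myblob: func.InputStream) -> None:
    # Blob trigger: fires once per file dropped into the configured input container
    # (the trigger binding lives in this function's function.json, not shown here).
    logging.info("New scan uploaded: %s (%s bytes)", myblob.name, myblob.length)
    submit_ocr_job(myblob.name)
```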
>>Awesome. So if I look at the workflow that you have shown in this diagram here, I can see that it goes from storage, to an Azure Function, to Batch. It's not that I have to use Azure Functions to kick off Azure Batch. It's just that, if I want to take advantage of serverless computing in Azure Government, I certainly can.
>>Absolutely.
>>But as you said, Azure Batch has direct REST APIs. You can invoke it directly if you want.
>>Yes.
>>Or you can integrate it within another alternative architecture; either makes sense.
>>Yes.
>>Okay, great. All right. So let's see this in action.
>>Super. So I'll save us some time here: I have the storage account, Azure Function, and Batch pool already allocated, but we have documentation on our website that can walk you through how to do that. So what I'll do right now is show you the function that I have written. What this function does is take any file added to a storage blob that I've pre-configured and create a job for that file.
>>So the Azure Function itself is using the Batch SDK APIs?
>>Yeah. It's using the Batch .NET SDK, exactly. And so for every document that gets added to our storage blob container, it will run this ocrmypdf command on it.
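The demo does this with the .NET SDK; a rough Python equivalent of the per-document task it creates might look like the following. The job ID, container SAS URLs, output naming, and the exact ocrmypdf flags are illustrative placeholders (the real command and setup are in the tutorial on the Batch docs page mentioned just below), and the pool's nodes are assumed to have ocrmypdf installed, for example via a pool start task.

```python
import azure.batch.models as batchmodels


def submit_ocr_task(client, job_id: str, blob_name: str,
                    input_blob_sas_url: str, output_container_sas_url: str) -> None:
    """Add one Batch task that OCRs a single uploaded PDF and writes the results
    back to an output blob container. All URLs are assumed to carry SAS tokens."""
    task = batchmodels.TaskAddParameter(
        id="ocr-" + blob_name.replace(".pdf", ""),
        # ocrmypdf produces a searchable PDF; --sidecar also emits the plain text.
        command_line=(f"/bin/bash -c 'ocrmypdf --sidecar ocr-{blob_name}.txt "
                      f"{blob_name} ocr-{blob_name}'"),
        # Pull the scanned PDF from the input container onto the compute node.
        resource_files=[batchmodels.ResourceFile(
            http_url=input_blob_sas_url, file_path=blob_name)],
        # Push the OCR'd PDF and text file back to the output container on success.
        output_files=[batchmodels.OutputFile(
            file_pattern="ocr-*",
            destination=batchmodels.OutputFileDestination(
                container=batchmodels.OutputFileBlobContainerDestination(
                    container_url=output_container_sas_url)),
            upload_options=batchmodels.OutputFileUploadOptions(
                upload_condition=batchmodels.OutputFileUploadCondition.task_success))])
    client.task.add(job_id=job_id, task=task)
```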
>>It looks like you're spinning up some Linux VMs.
>>Yes.
>>All right.
>>That's right, I'm using some Fv2s running Ubuntu 18.04.
>>Great.
>>So what I can do right now is show you, up at the top here, that this tutorial is available on the Batch docs page, so viewers can go ahead and use it themselves. All that it requires is a Batch account with some credentials that I list here, a job ID that you've already created, and then a connection string for your input container for your blobs, and an output container that you want to write everything to.
>>So the connection string might have slight differences in Azure Government, but other than that the code is identical to what you would run in Azure public?
>>Exactly, and getting all this information is just as easy as looking at the access keys for either your Batch account or your storage account.
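For readers following along, those settings boil down to a handful of values like the ones below. The names are placeholders, and the endpoint suffixes shown are the usual Azure Government ones (batch.usgovcloudapi.net for the Batch account URL, core.usgovcloudapi.net in the storage connection string); confirm them against the access keys blades of your own accounts.

```python
# Batch account credentials (from the Batch account's Keys blade).
BATCH_ACCOUNT_NAME = "mybatchaccount"
BATCH_ACCOUNT_KEY = "<batch-account-key>"
# In Azure Government the account URL uses the usgovcloudapi.net suffix
# instead of the public cloud's batch.azure.com.
BATCH_ACCOUNT_URL = "https://mybatchaccount.usgovvirginia.batch.usgovcloudapi.net"

# Job that tasks get added to (created ahead of time, as in the demo).
JOB_ID = "ocr-job"

# Storage connection string (from the storage account's Access keys blade).
# Note the Government-specific EndpointSuffix.
STORAGE_CONNECTION_STRING = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=mystorageaccount;"
    "AccountKey=<storage-account-key>;"
    "EndpointSuffix=core.usgovcloudapi.net")

INPUT_CONTAINER = "input"
OUTPUT_CONTAINER = "output"
```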
>>Great. Okay.
>>So what I'm going to do is go into Storage Explorer here, which is another great tool that works with Azure Government, and I'm going to load up a bunch of scans. Each of these is just a scanned document with some text. I think there are about 100 of them here, and I'll just open them in this blob container and upload them; these are PDFs.
>>Okay, yes.
>>So they're uploading here.
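Storage Explorer is the point-and-click route; the same upload can also be scripted. Here is a minimal sketch with the azure-storage-blob Python package, reusing the placeholder connection string and container name from the settings above and an assumed local ./scans folder:

```python
from pathlib import Path

from azure.storage.blob import BlobServiceClient

# STORAGE_CONNECTION_STRING and INPUT_CONTAINER as in the earlier settings sketch.
blob_service = BlobServiceClient.from_connection_string(STORAGE_CONNECTION_STRING)
container = blob_service.get_container_client(INPUT_CONTAINER)

# Upload every scanned PDF in a local folder; each new blob fires the function's
# blob trigger, which in turn queues a Batch OCR task for that document.
for pdf in Path("./scans").glob("*.pdf"):
    with pdf.open("rb") as data:
        container.upload_blob(name=pdf.name, data=data, overwrite=True)
```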
>>So just by the fact that they uploaded into storage, the Azure Function is going to detect that new blobs came in, and it's going to invoke Batch for each of the blobs it just detected, using the automatic trigger functionality of Azure Functions.
>>Yeah. So we're going to have a Batch job for each of these scans. What that Batch job will do is just apply that ocrmypdf command that I was showing you earlier to create a PDF file with OCR, so you can search that PDF directly. It will also output a text file as well.
>>Okay.
>>So if you were just interested in the text and not the actual PDF format, you'll have that text file there for you as well. So if we go over to the Batch pool here, shortly you'll see a heat map of about 10 nodes spinning up and running these jobs, one job for each file.
>>I can see that something is starting to happen. Now we have a few running, it looks like there are several running, and now all ten are running. So this actually updates in real time, and as the jobs are spinning up, you have visibility into some of the monitoring you were talking about earlier.
>>Yeah, absolutely.
>>Insight into what's happening.
>>Yeah, and so this is available both in the Azure portal itself as well as in Batch Explorer.
>>These tiles, are they representing an individual VM, or a core, or what do the tiles represent?
>>Yes, individual VMs.
>>Okay.
>>Or what we would call nodes.
>>Okay.
>>Yeah. Interchangeable vocabulary there.
>>Okay. So these are running. How do we know when the job is done?
>>Yes, so if I go over here, back to the Batch account page, we have a Jobs blade right here where you can check on the status of your jobs. So you can see that this one is active, which means that there are tasks still being run under that job.
>>Okay.
>>Yeah.
>>Then a similar view, if you were to look at Batch Explorer versus the portal, will show you the same thing in a different way. Okay, and this drills into the tasks, as you can see.
>>So you can see up here that we have about 32 jobs, or tasks, left before this job completes.
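Checking the same status programmatically is a short call with the Python SDK. The job ID is the placeholder from earlier, and this simply summarizes task states rather than reproducing either portal view.

```python
import azure.batch.models as batchmodels

# Poll the job and count how many of its tasks have finished.
job = client.job.get("ocr-job")
print(f"Job state: {job.state}")

tasks = list(client.task.list("ocr-job"))
completed = sum(1 for t in tasks if t.state == batchmodels.TaskState.completed)
print(f"{completed}/{len(tasks)} tasks completed")
```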
>>But we can see how quickly this is processing.
>>Yes.
>>We can see, whatever that is, 85 percent complete.
>>Yeah. We can just refresh and see it's picked up a few more.
>>It's interesting because we are parallelizing these things. If we felt like, oh, that took two minutes and I want it to be one minute, or that took an hour and I want it to be a half hour, okay, you're in control of that. You can just spin up more machines, because it's optimized for jobs that are parallelizable, and therefore you decide how much money you want to spend and how fast you want it to be. It's all customizable to whatever is appropriate for your work.
>>Correct. So when you're thinking of large-scale parallel compute jobs, you're thinking of things that you can break down into discrete pieces and operate on those pieces independently.
>>Right.
>>So when you do that, you enable a scale-out capability, so jobs of any size can be completed as quickly as you need them to be. If you want it done quicker, simply spin up more VMs; if you don't care about the time it takes to complete, you can cut those VMs down into a smaller pool and wait for it to finish up.
>>Yeah, and because you get to pick the compute, the cores each VM has, what the processor is, again, you're in control of that.
>>Yeah, and you can optimize for your workload by using specialized hardware underneath. Batch offers support for GPU VMs and HPC VMs as well.
>>Great. Okay. So let's see the status of our jobs.
>>Sure.
>>I'm guessing we're probably getting close to complete.
>>Yeah, we are 100 percent complete.
>>Okay, so what do we have? Now what's the result?
>>Sure. So I'll go into Storage Explorer here and go into our output container right here, and if I refresh, you can see that all of the scans we added into the input container now have a corresponding PDF and text file. So if I open up the text file, you can see that pretty much all of the text that was in the initial PDF is now available just as a text file.
>>Right. So we had PDFs, we did OCR, and now we have text files as a result.
>>Yeah. I can even open up the new PDF files to show that they have optical character recognition in place. All you have to do is hit something like Ctrl+F and then you can see each one is searchable.
>>I see. Makes sense. All right, so this demo has been really insightful in terms of seeing these things running on Azure Government: not just Azure Batch by itself and the parallelizable processing, but also the ability to integrate it within a larger workflow using Azure Storage and serverless computing.
>>Yeah.
>>Okay, great. All right, so where can someone go to get started? What are some resources and links that people should know about?
>>Yes, so we have documentation available on the docs.microsoft.com website. We also offer some code samples on GitHub, and then I have the link for Batch Explorer right there as well, if people want to check that out.
>>Awesome. That sounds great.
>>Yeah.
>>Okay. This has been Steve Michelotti, with Peter Schultz of the Azure Batch team, talking about Azure Batch on Azure Government. Thanks for watching. [MUSIC]
