BTW, a huge amount of this is covered in this whitepaper.  I would most heartily recommend you read it if you are looking for more details.  I have cribbed mightily.
In reality, vSphere is the brand name for VMware’s cloud operating system.
vSphere is a collection of tools that manages large collections of infrastructure (such as CPUs, storage, and networking) as a seamless and dynamic operating environment, and also manages the complexity of a datacenter.
From the database point of view, you need to understand a few key technologies from the VMware stack:

ESX

ESX is the hypervisor that runs on physical servers and abstracts processor, memory, storage, and other resources into multiple virtual machines.  ESX runs directly on the physical server and presents a virtual operating platform to the guest operating systems running on top of it.  This is something handled by the Virtual Infrastructure team, not a DBA thing.

vCenter

vCenter is the central point for configuring, provisioning, and managing virtualized IT environments.  It is a web- or Windows-based application that allows you to view and control all the virtual machines across multiple ESX servers.  vCenter Server aggregates physical resources from multiple ESX/ESXi hosts and presents a central collection of simple and flexible resources for the system administrator to provision to virtual machines in the virtual environment.  It is also used by the Virtual Infrastructure team.

vMotion

vMotion enables the live migration of running virtual machines from one physical server to another with zero downtime, continuous service availability, and complete transaction integrity.  Conceptually, it is relatively easy to do, because you can:
  1. Take a snapshot of a VM.
  2. Copy the snapshot of the VM to a new location.
  3. Start the VM.
  4. Synchronize the changes in the VM since step 1.
  5. Cut over to the new VM.
With vSphere, all of this happens under the covers and is transparent to users.  Imagine having a database server that is too busy and being able to move your database to a more lightly used server with a single mouse click, while the database is running, with zero downtime.
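
The five steps above can be sketched as a toy simulation. This is only an illustration of the copy-then-synchronize idea, not VMware's actual implementation: the function name `live_migrate`, the page-version dictionary, and the `dirty_rate` and `cutover_threshold` parameters are all assumptions made up for the example.

```python
import random

def live_migrate(source_pages, dirty_rate=0.1, max_rounds=10,
                 cutover_threshold=2, seed=0):
    """Toy model of live migration. source_pages maps page id -> version number."""
    rng = random.Random(seed)
    target_pages = {}
    to_copy = set(source_pages)   # steps 1-2: start by copying everything
    rounds = 0
    while len(to_copy) > cutover_threshold and rounds < max_rounds:
        for page in to_copy:
            target_pages[page] = source_pages[page]
        # The source VM keeps running, so some pages change during the copy.
        dirtied = {p for p in source_pages if rng.random() < dirty_rate}
        for page in dirtied:
            source_pages[page] += 1
        to_copy = dirtied         # step 4: only the changes need another pass
        dirty_rate /= 2           # assumption: shorter passes dirty fewer pages
        rounds += 1
    # Step 5: pause briefly, copy the last few dirty pages, then cut over.
    for page in to_copy:
        target_pages[page] = source_pages[page]
    return target_pages, rounds

source = {page: 0 for page in range(100)}
target, rounds = live_migrate(source)
print(target == source, rounds)
```

The point of the loop is that each pass has less to copy than the one before, so the final pause before cutover only has to move a handful of pages.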

vSphere High Availability

High Availability (HA) provides high availability for applications running in virtual machines. If a server fails, affected virtual machines are restarted on other production servers that have spare capacity.  vSphere keeps a heartbeat connection to a machine under HA and is able to detect a hardware or software failure and initiate HA automatically.
Often with databases, HA is something that is either overlooked or overpaid for.  Most companies will not have HA in their dev/test environments, and will purchase incredibly expensive and proprietary solutions for production machines.  vSphere HA makes the same features available to all machines at a fraction of the cost.
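
To make the heartbeat-and-restart idea concrete, here is a minimal sketch. The `Host` class, the `failover` function, the 15-second timeout, and the "capacity counted in VMs" model are all hypothetical simplifications for illustration and are not how vSphere HA is actually configured or implemented.

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    capacity: int                 # hypothetical unit: number of VMs the host can run
    last_heartbeat: float = 0.0   # time the host was last heard from
    vms: list = field(default_factory=list)

    def spare(self):
        return self.capacity - len(self.vms)

def failover(hosts, now, timeout=15.0):
    """Restart VMs from hosts with a stale heartbeat on hosts with spare capacity."""
    alive = [h for h in hosts if now - h.last_heartbeat <= timeout]
    failed = [h for h in hosts if now - h.last_heartbeat > timeout]
    restarted = {}
    for dead in failed:
        for vm in list(dead.vms):
            # Pick the surviving host with the most room left.
            target = max(alive, key=Host.spare, default=None)
            if target is None or target.spare() <= 0:
                continue          # no spare capacity anywhere: this VM stays down
            dead.vms.remove(vm)
            target.vms.append(vm)
            restarted[vm] = target.name
    return restarted

hosts = [
    Host("esx-a", capacity=2, last_heartbeat=100.0, vms=["db1"]),
    Host("esx-b", capacity=3, last_heartbeat=100.0),
    Host("esx-c", capacity=2, last_heartbeat=50.0, vms=["db2", "db3"]),
]
print(failover(hosts, now=105.0))
```

Note that the VMs are restarted, not migrated: anything running on the failed host goes down and comes back up elsewhere, which is why the amount of spare capacity you keep in the cluster matters.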

vSphere Fault Tolerance

When Fault Tolerance (FT) is enabled for a virtual machine, a secondary copy of the original (or primary) virtual machine is created. All actions completed on the primary virtual machine are also applied to the secondary virtual machine. If the primary virtual machine becomes unavailable, the secondary machine becomes active, providing continual availability.
FT takes place at the CPU level and is currently limited to a single CPU per protected virtual machine, which makes it untenable for most database operations.  As this feature is further refined, it may become more useful for database operations.

Distributed Resource Scheduling

Distributed Resource Scheduling (DRS) allocates and balances computing capacity dynamically across collections of hardware resources for virtual machines. This feature includes distributed power management (DPM) capabilities that enable a datacenter to significantly reduce its power consumption.
Essentially, DRS uses vMotion to place VMs in the right location initially, and to balance VMs across multiple servers based on the state of the entire system.  For example, if there are three databases running on a server and one takes a large number of resources, the other databases will automatically be moved away with vMotion while maintaining uptime and database performance.
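
The three-database example can be sketched with a simple greedy balancer. This is an illustrative assumption about how such balancing could work, not the real DRS algorithm: the `rebalance` function, the load numbers, and the `threshold` parameter are invented for the example.

```python
def rebalance(hosts, threshold=0.2):
    """Greedy balancing sketch.

    hosts maps host name -> {vm name: load}, where load is a fraction of one
    host's capacity. Returns the list of (vm, source, target) moves made.
    """
    def load(name):
        return sum(hosts[name].values())

    moves = []
    while True:
        busiest = max(hosts, key=load)
        idlest = min(hosts, key=load)
        gap = load(busiest) - load(idlest)
        if gap <= threshold or not hosts[busiest]:
            break                 # balanced enough, or nothing left to move
        # Move the smallest VM off the busiest host, but only if that
        # actually narrows the gap instead of just swapping the problem.
        vm = min(hosts[busiest], key=hosts[busiest].get)
        if not 0 < hosts[busiest][vm] < gap:
            break
        hosts[idlest][vm] = hosts[busiest].pop(vm)
        moves.append((vm, busiest, idlest))
    return moves

cluster = {
    "esx-a": {"db1": 0.6, "db2": 0.15, "db3": 0.15},
    "esx-b": {},
    "esx-c": {},
}
for vm, source, target in rebalance(cluster):
    print(f"vMotion {vm}: {source} -> {target}")
```

In this run the busy database stays where it is and the two lighter ones are moved away, which matches the behavior described above.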

Why This Matters

In practical terms vSphere is a layer that sits on top of your enterprise servers, networks, and storage and abstracts the details of operation away from the operating system.  This allows you incredible flexibility in how to operate your data center and all the applications within it.
There are additional challenges to making databases run in virtual infrastructure, but it is more than worth it to make the effort.

The Takeaway

This is pretty different from how most database shops run today.  Most DBAs, analysts, and developers really have to sweat the details of the infrastructure they are running on.  With vSphere, a huge number of these worries are removed through abstraction.


When most people think of virtualization, what they really mean is VMware, and that is where I will focus this article.  If you aren’t familiar with VMware technology, you should read the basics about virtual machines.

To quote:

 A virtual machine is a tightly isolated software container that can run its own operating systems and applications as if it were a physical computer. A virtual machine behaves exactly like a physical computer and contains its own virtual (i.e., software-based) CPU, RAM, hard disk, and network interface card (NIC).

An operating system can’t tell the difference between a virtual machine and a physical machine, nor can applications or other computers on a network. Even the virtual machine thinks it is a “real” computer. Nevertheless, a virtual machine is composed entirely of software and contains no hardware components whatsoever. As a result, virtual machines offer a number of distinct advantages over physical hardware.

In general, VMware virtual machines possess four key characteristics that benefit the user:

  • Compatibility: Virtual machines are compatible with all standard x86 computers
  • Isolation: Virtual machines are isolated from each other as if physically separated
  • Encapsulation: Virtual machines encapsulate a complete computing environment
  • Hardware independence: Virtual machines run independently of underlying hardware

Multiple systems, one machine, no one any the wiser except for the system administrator who teaches them how to share.  From the database point of view, this is FANTASTIC, because it removes a huge number of headaches for DBAs that aren’t related to the database.

Also, because the operating system is encapsulated and machine independent, there are some important things that you can do with VMs that are difficult to do with physical machines.

Snapshots

A snapshot preserves the state and data of a virtual machine at a specific point in time.  The state includes the virtual machine’s power state (for example, powered-on, powered-off, suspended).  The data includes all of the files that make up the virtual machine. This includes disks, memory, and other devices, such as virtual network interface cards.

This operation is typically VERY fast compared to a database backup, often taking only seconds.  This can be a great way to do a simple checkpoint or an incremental backup.  You just quiesce the database, take a VM snapshot, and continue on your merry way.
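
The quiesce, snapshot, and continue sequence can be sketched as follows. `ToyDatabase`, `ToyVM`, and `quiesced` are hypothetical stand-ins invented for this example, not a VMware or database API, and the deep copy is a simplification: a real snapshot does not copy the data up front, which is a large part of why it is so fast.

```python
import copy
from contextlib import contextmanager

class ToyDatabase:
    """Hypothetical stand-in for a database running inside the VM."""
    def __init__(self):
        self.rows = []
        self.accepting_writes = True

    def write(self, row):
        if not self.accepting_writes:
            raise RuntimeError("database is quiesced")
        self.rows.append(row)

@contextmanager
def quiesced(db):
    """Pause writes for the duration of the block, then always resume them."""
    db.accepting_writes = False
    try:
        yield db
    finally:
        db.accepting_writes = True

class ToyVM:
    def __init__(self, db):
        self.db = db
        self.power_state = "powered-on"
        self.snapshots = {}

    def take_snapshot(self, name):
        # Record the power state and the data as of this moment.
        self.snapshots[name] = (self.power_state, copy.deepcopy(self.db.rows))

    def revert(self, name):
        self.power_state, rows = self.snapshots[name]
        self.db.rows = copy.deepcopy(rows)

vm = ToyVM(ToyDatabase())
vm.db.write("order 1")
with quiesced(vm.db):             # no writes can land mid-snapshot
    vm.take_snapshot("before-upgrade")
vm.db.write("order 2")            # business as usual afterwards
vm.revert("before-upgrade")
print(vm.db.rows)
```

Using a context manager for the quiesce step means writes are resumed even if taking the snapshot fails, so the database is never left paused by accident.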

Cloning

A full clone is an independent virtual machine, with no need to access the parent. Full clones do not require an ongoing connection to the parent virtual machine. Because a full clone does not share virtual disks with the parent virtual machine, full clones generally perform better than linked clones. However, full clones take longer to create than linked clones. Creating a full clone can take several minutes if the files involved are large.

This operation is also typically VERY fast compared to a database clone, and the clone can be deployed anywhere you want.  While there is more work to be done inside the new VM around database configuration, there isn’t anything that needs to be done around hardware.

Linked Clones

A linked clone is made from a snapshot of a parent virtual machine. All files available on the parent at the moment of the snapshot continue to remain available to the linked clone. Ongoing changes to the virtual disk of the parent do not affect the linked clone, and changes to the disk of the linked clone do not affect the parent.
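
One way to picture how parent and clone can share the snapshot without affecting each other is a copy-on-write layer. The sketch below is a simplified model under that assumption; the `BaseDisk` and `DeltaDisk` classes and the block contents are hypothetical and do not describe VMware's on-disk format.

```python
class BaseDisk:
    """Read-only image of the parent's disk as of the snapshot."""
    def __init__(self, blocks):
        self._blocks = dict(blocks)

    def read(self, block_no):
        return self._blocks[block_no]

class DeltaDisk:
    """Copy-on-write layer that stores only blocks written since the snapshot."""
    def __init__(self, base):
        self.base = base
        self.delta = {}

    def read(self, block_no):
        # Prefer our own changes, otherwise fall through to the shared snapshot.
        if block_no in self.delta:
            return self.delta[block_no]
        return self.base.read(block_no)

    def write(self, block_no, data):
        self.delta[block_no] = data   # the shared snapshot is never modified

snapshot = BaseDisk({0: "boot", 1: "customers v1"})
parent = DeltaDisk(snapshot)      # the parent keeps running on its own delta
clone = DeltaDisk(snapshot)       # the linked clone gets a separate delta

parent.write(1, "customers v2")
clone.write(1, "test data")
print(parent.read(1), "|", clone.read(1), "|", clone.read(0))
```

Because the clone only stores the blocks it changes, creating it does not require copying the whole disk, which is what makes handing out a copy of a very large database practical.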

Imagine finding a problem in production and in just a few minutes being able to hand your developer a full copy of a 1 TB database that they can query and change at will without impacting anything.  Very difficult to do in the traditional model, easy to do when virtualized.

Pretty cool stuff.

Why This Matters 

In reality, virtualization wouldn’t be much more than a nice parlor trick if all you did was create individual VMs and run them on your laptop.  However, virtualizing resources really opens up a world of interesting things for databases, administration, server efficiency, and data movement.

The Takeaway

Virtualization makes running existing databases cheaper and easier.  Also, if you are clever operationally, you can manipulate the container to make the database inside run exactly the way you want it to and at the cost you want to pay.  This is the domain of virtual resource management, which we will cover in detail.  However, the first thing we need to understand is VMware’s virtualization stack.

References 

Introduction to vSphere

Snapshots

Understanding Clones

How does a Snapshot Work

 


In my work evangelizing VMware vFabric Data Director I have found an interesting problem:

  1. Databases and virtualization are both great technologies.
  2. DBAs typically don’t know or care much about virtualization.  It simply isn’t the way things are done.
  3. Virtual Infrastructure (VI) Admins typically don’t know or care much about databases.  A payload is a payload and a VM is a black box.

However, I understand exactly how it happens.  I have been working with big data, databases, data management, and ETL for the last 10 years and I had very little interaction with virtualization or centralized IT.  Databases are the crown jewels of the IT infrastructure, and are typically managed completely separately from the rest of IT.

Now, it is possible to virtualize Oracle and SQL Server, and some companies have had some notable success doing it.  However, it takes a great deal of effort and isn’t something I have really seen at scale.  The power that enterprise virtualization offers is significantly beyond what can be accomplished by cobbling together a number of virtual machines.  However, both databases and virtual infrastructure have a huge amount to gain from working together and people from the database side of the house need to understand virtualization in order to exploit it.
So, I have put together a series on virtualization designed for the database professional.  Enjoy!

One of the most highly valued features of information architecture is accuracy. Everyone wants everything to be perfect: every answer should be as factually accurate as possible and available immediately to whomever needs it. This was the promise of the internet as a whole, and of the web specifically (especially the “semantic web”, which I have ranted about before).

The Economist has a great article about digital libraries and the effects of imprecision searching for information. Large voices have talked in great detail about the Long Tail of information on the web. That is, information doesn’t lose its value on the internet as quickly as it does elsewhere because it remains available forever. For example, an article printed in a newspaper could easily be lost in the dustbin of history simply because it can’t be easily indexed and read by the people who want to see it the most. The same information in a blog will be available through Google for the rest of eternity.

However, researchers have found that exactly the opposite is happening with scientific media. The number of links to older materials actually declined as they were moved from paper to the web. In other words, the newest, most popular material was linked to more often, creating a shorter tail than had otherwise existed. You can read the article itself for more details on the nature of the experiment.

I think that this study actually shows the value of inaccuracy in search. Humans don’t need less accurate information to do their job today. However, in the long term we need less accurate information in order to take us down paths we might not expect. The very act of turning pages in a book might allow some bit of information to catch the eye of the reader (either consciously or unconsciously) and take them down a path they might not otherwise have traveled. Think ask.com and iTunes cover flow vs. the bubble sort.

IMHO, this highlights the fact that the brain needs a level of accuracy in order to function, but also requires a bit of fuzziness in order to grow and thrive. This is just a part of our biology and our cognition. As Information Architects, we need to understand this in order to build the right systems for people.


News.com has an article about HorizonOne, a company that is utilizing technology in a way that makes a difference and makes a profit.  They have created new, high-tech vending machines for schools that dispense healthy food and teach children how to eat in a more nutritious manner.

According to the article:

Software installed in the refrigerated box connects the student IDs and purchase data to Horizon’s point of sale servers, which automatically track students’ prepaid account IDs, along with information on whether a student qualifies for free or discounted lunch rates based on household income. School districts get reimbursed by the government for a fraction of the discounted cost when they sell a balanced meal–under USDA rules, three of five of a bread, protein, dairy or fruit and vegetables–to low-income students.

Finally, and this is the selling point for parents who want oversight of their child’s eating habits, parents can log onto a secure Web site, called MealPayPlus, to see what their child ate for lunch, or how they snacked on any given day. They can also add money to their child’s account directly on the Web site

This is a very, very cool idea and a true win-win-win for school districts, parents, and children.  It is the kind of solution that really can only be done with integrated technology and cooperation between everyone involved.  The only losers here are the existing vending industry and its suppliers.  They have stubbornly insisted on delivering food that maximizes their profits and tried to lock schools in with exclusive contracts and payments that border on bribes.

Kudos to HorizonOne for their innovation and vision!


In the IT world, I am not sure if I should take user-driven innovation to be a sign of progress or a sad display of how difficult things have become. With all the talk about Web 2.0, wikis, social media, and the cathedral and the bazaar, I would think that any technologist who isn’t at least passingly familiar with these ideas must be on a contract, working out of a cave, typing programs on a dusty green-screen terminal connected to the corporate mainframe by a thick token ring cable.

However, the ‘new trend’ of allowing users to actually provide feedback on the products that they are using has produced a professorship at MIT and been mentioned in the New York Times ::sigh::  IMHO, this is simply the realization that ignoring people is less effective than communicating with them, and less effective means less profitable.  I think the technology world should be embarrassed that it is so dysfunctional that this is at all new or interesting enough to study.

On a similar note, Ben Stein has some interesting thoughts about how to have a business conversation.  If you plan on doing anything other than pure solo work for the rest of your life it is mandatory reading.  Interestingly, it seems that the best conversationalists are the ones who are the most considerate and do the most listening.  Sounds like organizations might want to focus on having an actual conversation with their customers …


Steve Hamm at Business Week has a short post about the hiring practices of Chinese IT firms, which hopefully will open the eyes of some of the leaders here in the west. Symbio is an outsourcing company that was facing a problem with not having enough qualified recruits. Their response?

[Symbio CEO Jacob] Hsu and his colleagues decided they needed a feeder program to prepare college students to work for them, so they recently established software institutes in the Harbin Institute of Technology and Shandong University, both in the coastal city of Weihai. That’s where Symbio is about to establish a new development center. Says Hsu, who grew up in San Francisco: “Other companies have university partnerships; we run the university departments.”

This isn’t something Symbio undertakes lightly. “We’re a human potential factory. We’re in the talent management business,” says Hsu. “In the next couple of years the companies that win will be the ones who manage talent the best.”

The Chinese have an abundance mentality, a positive outlook for their long-term future. Not only are they doing business successfully today, they are investing in building a generation of leaders for tomorrow. At the same time, western companies are shedding jobs and looking to outsource jobs and import labor to meet the needs of the moment. The west isn’t going to lose its edge because of quarterly profit outlooks; it is going to lose it because it lacks the vision to see the future and the audacity to put itself at the center of it!


GigaOM has an interesting article about the impact of web 2.0 on network engineers. Namely, that the maturation of the internet has made the skills of a good network person a lot less important:

I see the current state of the Internet as the ultimate success … You can deploy a wildly successful Web 2.0 application that serves millions of users and never know how a router, switch or load-balancer works. Even network security and firewalls that were making headline news not more than a few years ago are considered perfunctory. The success of these networking devices and technologies has enabled them to become part of the technology landscape that exists for all to use as they see fit, similar to the microprocessor or electricity.

It is always odd to see the once-glamorous jobs of your youth thrown onto the scrap heap of history (think about the differences in perception between the masons of the middle ages and your local bricklayer). Network Engineers were once the masters of a difficult and arcane field, literally bringing information from chaos. Now, the wizards have been trapped in tiny control panels, until they can be embedded in silicon for all time.

This has really got me thinking about my own field, and its future. What specialties are going to disappear if data becomes as reliable as electricity? For one thing, I think we would see ETL and Business Analysis become a single career path that is much more abstract and tools-based. With the advent of good BPM, I could see a lot of the scheduling and other mechanics pushed off towards the DBAs and Systems Administrators. Also, I think that a lot of the hardware could be appliance based, or outsourced completely. Of course, this leaves a great opportunity for open source BI and for nimble players to attack the market and take advantage of the innovator’s dilemma.

A brave (and infinitely more useful) new world!

