You can isolate malicious programs or risky tasks by sandboxing
them in different ways to stop them from affecting your main system.
This article gives the reader a working knowledge of sandboxing in
Linux.
Securing your system is a big priority in every production environment, whether you are a systems administrator or a software developer. The best way to protect your operating system from dubious programs or processes is sandboxing (also termed jailing). Sandboxing provides a safe environment for a program or piece of software so that you can play around with it without harming your system. It keeps the program isolated from the rest of the system, using one of several mechanisms available in the Linux kernel. Sandboxing is useful to systems administrators who want to test tasks without risking any damage, and to developers who want to test pieces of code. A sandbox lets you create an environment separate from your base operating system. It has become trendy due to its extensive use by PaaS and SaaS providers.
The idea of jailing is not new: it has long been available in UNIX-based BSD operating systems. BSD has used jails for years, while Solaris has used zones. In Linux, the concept started with chroot and has become much more powerful thanks to the namespaces present in the Linux kernel.
Namespaces
Namespaces are a Linux feature for isolating processes with respect to different system resources. Up to kernel 4.0, six types of namespaces are available, and more will be added in the future. These are:
- mnt (mount points, file systems)
- pid (processes)
- net (network stack)
- ipc (System V IPC)
- uts (hostname)
- user (UIDs)
Linux namespaces are not new. The first one was added to Linux in 2008 (Linux kernel 2.6), but they became widely used only from Linux kernel 3.6 onwards, when work on the most complex of them all, the user namespace, was completed. The Linux kernel provides the clone(), unshare() and setns() system calls to create and control namespaces.
New namespaces are created with the clone() system call, which is also used to start a process. The setns() system call adds a running process to an existing namespace. The unshare() call, in contrast, moves the calling process into a new namespace; its main purpose is to isolate the caller without having to create a new process or thread (as clone() does). Several services use these namespaces directly to provide their features. The CLONE_NEW* identifiers are passed to these system calls to select the type of namespace: CLONE_NEWIPC, CLONE_NEWNS, CLONE_NEWNET, CLONE_NEWPID, CLONE_NEWUSER and CLONE_NEWUTS. Each namespace is identified by a unique inode number, assigned when it is created.
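You can see the namespaces a process belongs to under /proc. For example, the listing below comes from inspecting the current shell:
# ls -l /proc/$$/ns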
lrwxrwxrwx 1 root root 0 Feb 7 13:52 ipc -> ipc:[4026532253]
lrwxrwxrwx 1 root root 0 Feb 7 15:39 mnt -> mnt:[4026532251]
lrwxrwxrwx 1 root root 0 Feb 7 13:52 net -> net:[4026531957]
lrwxrwxrwx 1 root root 0 Feb 7 13:52 pid -> pid:[4026532254]
lrwxrwxrwx 1 root root 0 Feb 7 13:52 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Feb 7 15:39 uts -> uts:[4026532252]
Mount namespace: A process sees a set of mount points different from the original system's. This creates a separate file system tree for the processes in the namespace, which restricts them from making changes to the host's root file system.
PID namespace: This isolates a process ID from the main PID hierarchy. A process inside a PID namespace can have the same PID as a process outside it, and inside the namespace you can even have a different init with PID 1.
UTS namespace: In the UTS (UNIX Timesharing System)
namespace, a process can have a different set of domain names and host
names than the main system. It uses sethostname() and setdomainname() to
do that.
IPC namespace: This isolates inter-process communication resources, such as System V IPC objects and POSIX message queues.
User namespace: This isolates user and group IDs, so a process can have a different UID or GID inside the namespace than it has on the host machine. Unprivileged processes can even create user namespaces in which they have full privileges.
Network namespace: Inside this namespace, processes can
have different network stacks, i.e., different network devices, IP
addresses, routing tables, etc.
Sandboxing tools available in Linux use this namespaces feature to isolate a process or create a new virtual environment; broadly speaking, a tool that uses more namespaces for isolation is more secure. Now, let's talk about the different methods of sandboxing, from soft to hard isolation.
chroot
chroot is the oldest sandboxing tool available in Linux. It does much the same job as the mount namespace, but it was implemented much earlier. chroot changes the root directory of a process to a chroot directory (like /chroot). As the root directory is the top of the file system hierarchy, the application cannot access directories higher up than its root directory, and so is isolated from the rest of the system. This prevents applications inside the chroot from interfering with files elsewhere on your computer. To create an isolated environment on old SysV-based operating systems, you first need to copy all the required binaries and libraries into that directory. For demonstration purposes, I am running ls inside a chroot directory.
First, create a directory that will serve as the root file system for the process:
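For example:
# mkdir /chroot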
Next, make the required directories inside it.
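A minimal sketch, assuming only the directories needed for the binaries and libraries copied below:
# mkdir /chroot/bin /chroot/lib /chroot/lib64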
Now, the most important step is to copy the executables and their libraries. To get a shell inside the chroot, you also need /bin/bash.
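Assuming ls and bash are the two binaries being demonstrated, something like this should do:
# cp /bin/ls /bin/bash /chroot/bin/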
To see the libraries required by these binaries, run the following commands:
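The first listing below is for bash and the second for ls; the exact libraries and addresses will differ from system to system:
# ldd /bin/bash
# ldd /bin/ls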
linux-vdso.so.1 (0x00007fff70deb000)
libncurses.so.5 => /lib/x86_64-linux-gnu/libncurses.so.5 (0x00007f25e33a9000)
libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007f25e317f000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f25e2f7a000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f25e2bd6000)
/lib64/ld-linux-x86-64.so.2 (0x00007f25e360d000)
linux-vdso.so.1 (0x00007fff4f8e6000)
libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f9f00aec000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9f00748000)
libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f9f004d7000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f9f002d3000)
/lib64/ld-linux-x86-64.so.2 (0x00007f9f00d4f000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f9f000b6000)
Now, copy these files to lib or lib64 under /chroot, as required.
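Based on the ldd output above, the copy step would look something like this (paths vary per system, and the remaining libraries are copied the same way):
# cp /lib/x86_64-linux-gnu/libncurses.so.5 /lib/x86_64-linux-gnu/libtinfo.so.5 /chroot/lib/
# cp /lib64/ld-linux-x86-64.so.2 /chroot/lib64/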
Once you have copied all the necessary files, it's time to enter the chroot.
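A likely invocation, given the directory created above:
# chroot /chroot /bin/bash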
You will be dropped into a shell running inside your virtual environment. Here, you don't have much to run besides ls, but the root file system for this process has been changed to /chroot.
To get a more full-featured environment you can use the
debootstrap utility to bootstrap a basic Debian system:
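For instance (the suite and mirror chosen here are illustrative):
# debootstrap stable /chroot http://deb.debian.org/debian/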
It will download a minimal system to run under chroot. You can use this even to test 32-bit applications on 64-bit systems, or to test your program before installation. To get process management, mount proc into the chroot; and to make the contents of home disappear on exit, mount a tmpfs over the home directory inside the chroot:
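A plausible pair of mounts, assuming the chroot lives at /chroot:
# mount -t proc proc /chroot/proc
# mount -t tmpfs tmpfs /chroot/home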
To get an Internet connection inside it, use the following command:
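Typically this means giving the chroot the host's DNS resolver configuration:
# cp /etc/resolv.conf /chroot/etc/resolv.conf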
After that, you are ready to enter your environment.
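Again, via chroot:
# chroot /chroot /bin/bash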
Here, you get a whole basic operating system inside your chroot. But it differs from your main system only by its mount point, because the mount property is the only isolator it uses. It still has the same hostname, IP address and running processes as the main system. That's why it is much less secure (this is even mentioned in the chroot man page), and any process running inside can still harm your computer by killing your tasks or affecting network-based services.
Note: To run graphical applications inside the chroot, open the X server to local connections by running the following command on the main system:
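A common, if blunt, way to do this is to disable X access control; treat this as an illustrative sketch rather than a security recommendation:
# xhost +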
On systemd-based systems, chrooting a service is pretty straightforward: you only need to define the root directory in the process's unit file.
[Unit]
Description=my_chroot_Service

[Service]
RootDirectory=/chroot/foobar
ExecStartPre=/usr/local/bin/pre.sh
ExecStart=/bin/my_program
RootDirectoryStartOnly=yes
Here, RootDirectory specifies where the root directory for the foobar service is.
Note: The path given in ExecStart is interpreted relative to that chroot, which makes the full path of the program /chroot/foobar/bin/my_program.
Before the daemon is started, a shell script pre.sh is invoked, the
purpose of which is to set up the chroot environment as necessary, i.e.,
mount
/proc and similar file systems into it, depending on
what the service might need. You can start your service by using the
following command:
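Assuming the unit file above is saved as my_chroot_service.service, that would be:
# systemctl start my_chroot_service.service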
ip-netns
The ip-netns utility is one of the few tools that use network namespaces directly to create virtual network environments. To create a new network namespace, use the following command:
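For instance (netns1 is just an illustrative name):
# ip netns add netns1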
To check the interfaces inside, use the command shown below:
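Continuing with the same name:
# ip netns exec netns1 ip addr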
You can even get the shell inside it, as follows:
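For example:
# ip netns exec netns1 /bin/bash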
This will take you inside the network namespace, which has only a single network interface (the loopback) with no IP address. So you are not connected to the external network and can't even ping.
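The next step, run inside the namespace shell, would be something like:
# ip link set dev lo up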
This will bring the loopback interface up. But to connect to the external network, you need to create a virtual Ethernet pair and move one end of it into the netns, as follows:
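Run on the host, a sketch could be (the device names are illustrative):
# ip link add veth0 type veth peer name veth1
# ip link set veth1 netns netns1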
Now, it's time to assign IP addresses to these devices, as follows:
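For example (the addresses are placeholders):
# ip addr add 10.0.0.1/24 dev veth0
# ip link set veth0 up
# ip netns exec netns1 ip addr add 10.0.0.2/24 dev veth1
# ip netns exec netns1 ip link set veth1 up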
Unshare
The unshare utility creates an environment isolated in any of the namespaces and runs a program or shell inside it.
To get a network namespace and run the shell inside it, use the command shown below:
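That is:
# unshare --net /bin/bash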
The shell you get back will come with a different network stack. You can check this by using
#ip addr, as follows:
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
To create a user namespace environment, use the following command:
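That is:
# unshare --user /bin/bash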
You can check your user inside the shell by using the command below:
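For example:
# whoami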
To get the PID namespace, use the following command:
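A likely invocation (with --fork, so that the new shell becomes PID 1 of the namespace):
# unshare --pid --fork /bin/bash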
Inside this namespace, you can see all the processes but cannot kill any.
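For instance, grepping for a running Firefox and then trying to kill it produces output like that shown below (PID 1110 comes from that output):
# ps aux | grep firefox
# kill 1110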
root 1110 42.6 11.0 1209424 436756 tty1 Sl 23:36 0:15 ./firefox1/./firefox
root 1208 0.0 0.0 12660 1648 pts/2 S+ 23:37 0:00 grep firefox
bash: kill: (1110) - No such process
To get a completely isolated process tree, you need to mount a new proc file system for the namespace, as follows:
# unshare --pid --fork --mount-proc /bin/bash
In this way, you can use unshare to create a single namespace. More details can be found in the unshare man page.
Note: Namespaces created using unshare can also be combined, giving a single shell that uses several namespaces at once. For example:
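One possible combination, matching the PID and user namespaces mentioned below:
# unshare --pid --user --fork /bin/bash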
This will create an isolated environment using the PID and user namespaces.
Firejail
Firejail is a SUID sandbox program that is used to isolate programs for testing or security purposes. It is written in C and can be configured to use most of the namespaces. To start a program in Firejail, use the following command:
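For example, to sandbox Firefox (the program discussed next):
$ firejail firefox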
It will start Firefox in a sandbox with the root file system mounted read-only. To start Firefox with only ~/Downloads and ~/.mozilla mounted writeable, use the following command:
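A sketch using Firejail's whitelist option:
$ firejail --whitelist=~/Downloads --whitelist=~/.mozilla firefox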
By default, Firejail uses the user namespace, and in private mode it mounts an empty temporary file system (tmpfs) on top of the user's home directory. To start a program in private mode, use the command given below:
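For example:
$ firejail --private firefox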
To start firejail in a new network stack, use the following command:
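For instance (eth0 is an assumed host interface):
$ firejail --net=eth0 firefox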
To assign an IP address to the sandbox, use the following command:
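Building on the previous command (the address is a placeholder):
$ firejail --net=eth0 --ip=192.168.1.105 firefox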
Note: To sandbox all programs run by a user, you can change that user's default shell to /usr/bin/firejail.
Containers
When learning about virtualisation technologies, what attracted me most were containers, because of their easy deployment. Containers (also known as lightweight virtualisation) are isolation tools that use namespaces for the purpose. They make a better sandboxing utility because they generally use more than one namespace, and are focused on creating a whole virtual system instance rather than isolating a single process.
Containers are not a new technology; they have been in UNIX and Linux for decades. But due to their increasing use in SaaS and PaaS, they have become a hot topic, since they provide a secure environment in which to deliver and use these services. They are called lightweight virtualisation because they provide process-level isolation, which means they depend on the host's Linux kernel; hence, only instances that use the same base kernel can be created. There are lots of container technologies available for Linux that have gained popularity over the past few years.
Systemd-nspawn
This is a utility available by default with systemd, which creates separate containers for isolation. It uses the mount and PID namespaces by default, but other namespaces can also be configured. To create a container or isolated shell, you need a basic distribution tree, which we have already downloaded using debootstrap. To get inside this container, use the command below:
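Pointing it at the tree bootstrapped earlier:
# systemd-nspawn -D /chroot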
This container is stronger than chroot because it not only has a different mount point but also a separate process tree (check it with ps aux).
But the hostname and IP interfaces are still the same as the host system's. To get your own network stack, you need to connect the container to an existing network bridge.
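A plausible invocation (br0 is an assumed existing bridge on the host):
# systemd-nspawn -D /chroot --network-bridge=br0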
This will start the container in a network namespace with a pair of veth devices. You can even boot the instance with the -b option, as follows:
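For example:
# systemd-nspawn -bD /chroot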
Note: While booting the container, you will be required to enter the password of the root user; so first run #passwd inside the container to set the root password.
The whole nspawn project is relatively young, so there is still a lot that needs to be developed.
Docker
Docker is the smartest and most prominent container technology in Linux for running application environments; over the past few years, it has grabbed the most attention. Docker containers use most of the namespaces and cgroups provided by the Linux kernel to build a strongly isolated environment. Docker relies on the Docker daemon, which starts an isolated instance, much like systemd-nspawn, in which any service can be deployed with just a few tweaks. It can be used as a sandboxing tool to run applications securely or to deploy a software service inside it.
To get your first Docker container running, you need to first start the
Docker daemon, and then download the base image from the Docker online
repository, as follows:
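On a systemd machine, that would look something like this (the Kali image matches the listing below):
# systemctl start docker
# docker pull kalilinux/kali-linux-docker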
Note: You can also download other Docker images from the Docker Hub (https://hub.docker.com/).
It will download the base Kali Linux image. You can see all the available images on your system by using the following code:
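That is:
# docker images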
REPOSITORY                    TAG      IMAGE ID       CREATED        VIRTUAL SIZE
kalilinux/kali-linux-docker   latest   63ae5ac8df0f   1 minute ago   325 MB
centos                        centos6  b9aeeaeb5e17   9 months ago   202.6 MB
hello-world                   latest   91c95931e552   9 months ago   910 B
To run a program inside your container, use the command given below:
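For example, listing the container's root directory, which matches the output below:
# docker run kalilinux/kali-linux-docker ls /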
bin dev home lib64 mnt proc run selinux sys usr
boot etc lib media opt root sbin srv tmp var
This will start (run) your container, execute the command and then
close the container. To get an interactive shell inside the container,
use the command given below:
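That is:
# docker run -t -i kalilinux/kali-linux-docker /bin/bash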
This will get you inside the container, where you can do your work isolated from the host machine. Here, 24a70cb3095a is your container's ID. You can check all the running containers by using the following command:
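That is:
# docker ps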
CONTAINER ID   IMAGE                         COMMAND     CREATED              STATUS              PORTS   NAMES
24a70cb3095a   kalilinux/kali-linux-docker   /bin/bash   About a minute ago   Up About a minute           angry_cori
When a container runs, Docker automatically creates a veth interface for it, which connects the container to the main system. You can check this by using #ifconfig and pinging your main system. At any point, you can save the current state of your container as a new image by using the command given below:
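Using the container ID and image name from the listings in this example, followed by docker images to confirm the result:
# docker commit 24a70cb3095a new_image
# docker images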
REPOSITORY                    TAG      IMAGE ID       CREATED         VIRTUAL SIZE
new_image                     latest   a87c73abca9d   6 seconds ago   325 MB
kalilinux/kali-linux-docker   latest   63ae5ac8df0f   1 hour ago      325 MB
centos                        centos6  b9aeeaeb5e17   9 months ago    202.6 MB
hello-world                   latest   91c95931e552   9 months ago    910 B
You can remove that image by using #docker rmi new_image. To stop a container, use docker stop; after that, docker rm removes the files created on the host node for that container.
To run real applications in a Docker instance, you may need to attach it to the host system in some way. To mount external storage into the Docker container, you can use the -v flag, as follows:
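A sketch matching the paths described below:
# docker run -v /temp:/home -t -i kalilinux/kali-linux-docker /bin/bash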
This will mount /temp/ from the main system on /home/ inside the container. To map a container port to an external system port, use -p:
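For example, using the ports mentioned below:
# docker run -p 4567:80 -t -i kalilinux/kali-linux-docker /bin/bash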
This will attach the external port 4567 to the container's port 80.
This can be very useful for SaaS and PaaS, where the deployed application needs to connect to the external network. Running GUI applications in Docker is often another requirement. Docker doesn't have an X server of its own, so to do that, you need to mount the host's X server socket into the Docker instance.
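One common way to do this, sketched here with the usual X11 socket path and the DISPLAY variable passed through:
# docker run -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY -t -i kalilinux/kali-linux-docker /bin/bash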
This will forward the X11 socket to the container inside Docker. To ship the Docker image to another system, you need to push it to the Docker online repository, as follows:
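Assuming the image has been tagged with your Docker Hub user name (yourname is a placeholder):
# docker tag new_image yourname/new_image
# docker push yourname/new_image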
You can even save the image in a tar archive:
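For instance:
# docker save -o new_image.tar new_image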
There is a lot more to learn about Docker, but going deeper into the subject is not the purpose of this article. A positive point about Docker is the many tutorials and hacks available online, from which you can easily get a better understanding of how to use it to get your work done. Since its first release in 2013, Docker has improved so much, and is so easy to use, that it can readily be deployed in production or testing environments.
There are other solutions built around Docker, designed for a variety of scenarios. These include Kubernetes (a Google project for the orchestration of Docker containers), Swarm, and many more services for Docker migration, graphical dashboards, etc. Automation tools for systems admins, like Puppet and Chef, are also starting to provide support for Docker containers. Even systemd now provides management utilities for nspawn and other containers, with tools like machinectl and journalctl.
machinectl
This comes pre-installed with the systemd init manager. It is used to manage and control the state of systemd-based virtual machines and containers running underneath the systemd service. To see all the containers running on your system, use the command given below:
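That is:
# machinectl list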
To get a status of any running container, use the command given below:
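For example (mycontainer is a placeholder for the machine name shown by machinectl list):
# machinectl status mycontainer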
Note: machinectl doesn't show Docker containers, since the latter run behind the Docker daemon.
To log in to a container, use the command given below:
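Using the same placeholder name:
# machinectl login mycontainer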
Switch off a container, as follows:
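For example:
# machinectl poweroff mycontainer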
To kill a container forcefully, use the following command:
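For example:
# machinectl terminate mycontainer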
To view the logs of a container, you can use journalctl, as follows:
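For example:
# journalctl -M mycontainer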
What to get from this article
Sandboxes are important for every IT professional, but different professionals may require different solutions. If you are a developer or application tester, chroot may not be a good choice, as attackers can escape from a chroot jail. Lighter containers like systemd-nspawn, or a tool like Firejail, can be a good solution because they are easy to deploy. Using Docker-like containers for application testing can be a minor headache, as getting your container ready for your process to run smoothly can be a little painful.
If you are a SaaS or PaaS provider, containers will always be the best solution for you because of their strong isolation, easy shipping, live migration and clustering features. You may go with traditional virtualisation solutions (virtual machines), but fine-grained resource management and quick booting can only be achieved with containers.