Tuesday, 5 December 2017

Many approaches to sandboxing in Linux

You can isolate malicious programs or risky tasks by sandboxing them in different ways to stop them from affecting your main system. This article gives the reader a working knowledge of sandboxing in Linux.
Securing your system is a big priority in every production environment, whether you are a systems administrator or a software developer. The best way to protect your operating system from untrusted programs or processes is sandboxing (also termed jailing). Sandboxing provides a safe environment in which you can play around with a program without hurting your system: it keeps the program isolated from the rest of the system using one of several mechanisms available in the Linux kernel. Sandboxing is useful to systems administrators who want to test tasks without risking damage, and to developers who want to test their code. A sandbox lets you create an environment separate from your base operating system, and it has become trendy due to its extensive use by PaaS and SaaS providers.
The idea of jailing is not new: BSD has used the concept of 'jails' for years, while Solaris has 'zones'. In Linux, the concept started with chroot and has since become far more capable thanks to the namespaces available in the Linux kernel.
Namespaces
Namespaces are a Linux kernel feature that isolates processes with respect to different system resources. As of kernel 4.0 there are six types of namespaces, and more may be added in the future. These are:
  • mnt (mount points, file systems)
  • pid (processes)
  • net (network stack)
  • ipc (System V IPC)
  • uts (host name)
  • user (UIDs)
Linux namespaces are not new. The first of them, the mount namespace, appeared back in kernel 2.4.19, but namespaces only became widely usable once work on the most complex of them all, the user namespace, was completed around kernel 3.8. The kernel exposes the clone(), unshare() and setns() system calls to create and control namespaces.
New namespaces are created with the clone() system call, which is also used to start a new process. The setns() system call moves a running process into an existing namespace, while unshare() creates new namespaces and moves the calling process into them; its main purpose is to isolate a process without having to create a new process or thread (as clone() does). The CLONE_NEW* flags passed to these system calls identify the type of namespace: CLONE_NEWNS (mount), CLONE_NEWPID, CLONE_NEWNET, CLONE_NEWIPC, CLONE_NEWUSER and CLONE_NEWUTS. Several user-space tools are built directly on these calls. Every namespace is identified by a unique inode number, which you can inspect under /proc/<PID>/ns:
#ls -al /proc/<PID>/ns
lrwxrwxrwx 1 root root 0 Feb 7 13:52 ipc -> ipc:[4026532253]
lrwxrwxrwx 1 root root 0 Feb 7 15:39 mnt -> mnt:[4026532251]
lrwxrwxrwx 1 root root 0 Feb 7 13:52 net -> net:[4026531957]
lrwxrwxrwx 1 root root 0 Feb 7 13:52 pid -> pid:[4026532254]
lrwxrwxrwx 1 root root 0 Feb 7 13:52 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Feb 7 15:39 uts -> uts:[4026532252]
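A quick way to see this in practice is to compare the namespace inode of your current shell with that of a process started in a new namespace; the inode numbers below are only illustrative and will differ on your system:
#readlink /proc/$$/ns/net
net:[4026531957]
#unshare --net readlink /proc/self/ns/net
net:[4026532389]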
Mount namespace: A process sees a set of mount points different from the original system's. This creates a separate file system tree for the processes in the namespace and prevents them from changing the host's file system tree.
PID namespace: The PID namespace isolates process IDs from the main PID hierarchy. A process inside a PID namespace can have the same PID as a process outside it, and the namespace can even have its own init process with PID 1.
UTS namespace: In the UTS (UNIX Timesharing System) namespace, a process can have a different set of domain names and host names than the main system. It uses sethostname() and setdomainname() to do that.
IPC namespace: This isolates inter-process communication resources (System V IPC objects and POSIX message queues).
User namespace: This isolates user and group IDs, so a process can have a different UID and GID inside the namespace than it has on the host. Notably, an unprivileged process can create a user namespace in which it has full (root) privileges.
Network namespace: Inside this namespace, processes can have different network stacks, i.e., different network devices, IP addresses, routing tables, etc.
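As a quick illustration of the UTS namespace (using the unshare utility, covered in more detail later), a hostname change made inside the namespace does not affect the host; the name sandbox-demo is just an example:
#unshare --uts /bin/bash
#hostname sandbox-demo
#hostname
sandbox-demo
#exit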
The sandboxing tools available in Linux use these namespaces to isolate a process or to create a whole new virtual environment; broadly, the more namespaces a tool uses, the stronger the isolation it provides. Now, let's walk through the different methods of sandboxing, from soft to hard isolation.
chroot
chroot is the oldest sandboxing tool available in Linux. It provides isolation similar to the mount namespace, but predates namespaces by many years. chroot changes the root directory of a process to a chosen directory (such as /chroot). Since the root directory is the top of the file system hierarchy, the application cannot access anything above it and is therefore isolated from the rest of the system; this prevents applications inside the chroot from interfering with files elsewhere on your computer. On older SysV-init based systems, creating such an isolated environment means copying all the required binaries and libraries into that directory by hand. For demonstration purposes, I will run 'ls' inside the chroot.
First, create a directory that will serve as the root file system for the process:
#mkdir /chroot
Next, make the required directories inside it:
#mkdir /chroot/{lib,lib64,bin,etc}
Now, the most important step is to copy the executables and their libraries. To get a shell inside the chroot, you also need /bin/bash.
#cp -v /bin/{bash,ls} /chroot/bin
To see the libraries required by these binaries, run the following commands:
#ldd /bin/bash
linux-vdso.so.1 (0x00007fff70deb000)
libncurses.so.5 => /lib/x86_64-linux-gnu/libncurses.so.5 (0x00007f25e33a9000)
libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007f25e317f000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f25e2f7a000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f25e2bd6000)
/lib64/ld-linux-x86-64.so.2 (0x00007f25e360d000)
 
#ldd /bin/ls
linux-vdso.so.1 (0x00007fff4f8e6000)
libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f9f00aec000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9f00748000)
libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f9f004d7000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f9f002d3000)
/lib64/ld-linux-x86-64.so.2 (0x00007f9f00d4f000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f9f000b6000)
Now, copy these files into the lib or lib64 directories of /chroot as required; a small loop like the one shown below can automate this.
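Here is a minimal sketch that copies every library reported by ldd into the chroot while preserving the original directory layout, which saves you working out which directory each library belongs in (it assumes GNU cp, which provides the --parents option):
#ldd /bin/bash /bin/ls | grep -o '/[^ :)]*' | sort -u | xargs -I{} cp -v --parents {} /chroot/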
Once you have copied all the necessary files, it’s time to enter the chroot.
#sudo chroot /chroot/ /bin/bash
You will get a shell prompt running inside your virtual environment. There is not much to run here besides ls, but the root file system for this process has been changed to /chroot.
To get a more full-featured environment you can use the debootstrap utility to bootstrap a basic Debian system:
#debootstrap --arch=amd64 unstable my_deb/
This downloads a minimal Debian system to run under chroot. You can use it to test 32-bit applications on a 64-bit system, or to test your program before installation. To get process management inside, bind-mount /proc into the chroot; and to make the contents of the home directory disappear on exit, mount a tmpfs over it:
#sudo mount -o bind /proc my_deb/proc
#sudo mount -t tmpfs -o size=100m tmpfs my_deb/home
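Depending on what you plan to run inside the chroot, you may also want to bind-mount a few more pseudo file systems; these are optional and only needed if the software inside expects them:
#sudo mount -o bind /sys my_deb/sys
#sudo mount -o bind /dev my_deb/dev
#sudo mount -o bind /dev/pts my_deb/dev/pts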
To get DNS resolution (and with it a working Internet connection) inside the chroot, copy over the host's resolv.conf:
#sudo cp /etc/resolv.conf my_deb/etc/resolv.conf
After that, you are ready to enter your environment.
#chroot my_deb/ /bin/bash
Here, you get a whole basic operating system inside your chroot. However, it differs from your main system only in its mount points, because mount isolation is the only mechanism being used: it still has the same hostname, the same IP addresses and the same visible processes as the main system. That is why chroot is far from a real security boundary (the chroot man page says as much), and a process running inside can still harm your computer by killing your tasks or interfering with network-based services.
Note: To run graphical applications inside the chroot, allow access to the X server by running the following command on the main system:
#xhost +
and then, inside the chroot:
#export DISPLAY=:0.0
On systemd based systems, chrooting a service is pretty straightforward: you only need to define the root directory in the process's unit file.
[Unit]
Description=my_chroot_Service
[Service]
RootDirectory=/chroot/foobar
ExecStartPre=/usr/local/bin/pre.sh
ExecStart=/bin/my_program
RootDirectoryStartOnly=yes
Here, RootDirectory= sets the directory that will become the root for the service's main process.
Note: The program path in ExecStart= is resolved inside the chroot, so the binary must actually be placed at /chroot/foobar/bin/my_program on the host.
Before the daemon is started, the shell script pre.sh is invoked; its purpose is to set up the chroot environment as necessary, i.e., to mount /proc and similar file systems into it, depending on what the service needs (a sketch of such a script is shown after the start command below). Since RootDirectoryStartOnly=yes is set, only the main ExecStart= process is chrooted, so pre.sh runs outside the chroot and can prepare it. You can start your service by using the following command:
#systemctl start my_chroot_Service.service
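As an illustration, a minimal pre.sh might look something like the following; the paths match the unit file above, and you should adjust the mounts to whatever your service actually needs:
#!/bin/sh
# pre.sh: prepare the chroot before the daemon starts.
# This script runs outside the chroot (RootDirectoryStartOnly=yes).
mount -t proc proc /chroot/foobar/proc
mount -t tmpfs tmpfs /chroot/foobar/tmp
cp /etc/resolv.conf /chroot/foobar/etc/resolv.conf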
ip netns
The ip netns subcommand (part of the iproute2 package) works directly with network namespaces, letting you create them and manage virtual interfaces inside them. To create a new network namespace, use the following command:
#ip netns add netns1
To check the interfaces inside, use the command shown below:
#ip netns exec netns1 ip addr
You can even get the shell inside it, as follows:
#ip netns exec netns1 /bin/bash
This will take you inside the network namespace, which has only a single loopback interface, down and without an IP address, so you are not connected to the external network and cannot even ping.
#ip netns exec netns1 ip link set dev lo up
This will bring the loopback interface up. But to connect to the external network you need to create a virtual Ethernet (veth) pair and move one end into netns1, as follows:
# ip link add veth0 type veth peer name veth1
# ip link set veth1 netns netns1
Now, it’s time to set the IP to these devices, as follows:
# ip netns exec netns1 ifconfig veth1 10.1.1.1/24 up
# ifconfig veth0 10.1.1.2/24 up
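You can now verify that the two ends of the veth pair reach each other (the 10.1.1.x addresses are simply the example values used above):
# ip netns exec netns1 ping -c 2 10.1.1.2
# ping -c 2 10.1.1.1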
Unshare
The unshare utility creates an environment isolated by any combination of namespaces and runs a program or shell inside it.
To get a network namespace and run the shell inside it, use the command shown below:
#unshare --net /bin/bash
The shell you get back will come with a different network stack. You can check this by using #ip addr, as follows:
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
To create a user namespace environment, use the following command:
#unshare --user /bin/bash
You can check your user inside the shell by using the command below:
#whoami
nobody
To get the PID namespace, use the following command:
#unshare --pid --fork /bin/bash
Inside this namespace you can still see all the host's processes (ps still reads the host's /proc), but you cannot kill any of them.
#ps -aux |grep firefox
root 1110 42.6 11.0 1209424 436756 tty1 Sl 23:36 0:15 .firefox1/./firefox
root 1208 0.0 0.0 12660 1648 pts/2 S+ 23:37 0:00 grep firefox
#kill 1110
bash: kill: (1110) - No such process
To get a truly isolated view of the process tree, you also need to mount a fresh proc file system for the namespace, as follows:
#unshare --pid --fork --mount-proc /bin/bash
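Inside this shell, the process tree is now completely private: ps shows only processes started within the namespace, and the shell itself runs as PID 1, which you can confirm as follows:
#echo $$
1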
In this way, you can use unshare to create individual namespaces. More details can be found in the unshare man page.
Note: Namespaces created with unshare can also be combined, so that a single shell uses several namespaces at once. For example:
#unshare --pid --fork --user /bin/bash
This will create an isolated environment using the PID and user namespaces.
Firejail
Firejail is an SUID sandbox program that is used to isolate programs for testing or security purposes. It is written in C and can be configured to use most of the namespaces. To start a program in Firejail, use the following command:
#firejail firefox
It will start Firefox in a sandbox with the root file system mounted read-only. To start Firefox with only ~/Downloads and ~/.mozilla mounted writable, use the following command:
#firejail --whitelist=~/.mozilla --whitelist=~/Downloads firefox
Firejail uses the user namespace by default and, in private mode, mounts an empty temporary file system (tmpfs) on top of the user's home directory. To start a program in private mode, use the command given below:
#firejail --private firefox
To start firejail in a new network stack, use the following command:
#firejail --net=eth0 --whitelist=~/.mozilla --whitelist=~/Downloads firefox
To assign an IP address to the sandbox, use the following command:
#firejail --net=eth0 --ip=192.168.1.155 firefox
Note: To sandbox all the programs run by a user, you can change that user's default shell to /usr/bin/firejail.
#chsh --shell /usr/bin/firejail
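To see the sandboxes that are currently running, Firejail provides a listing option:
#firejail --list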
Containers
When learning about virtualisation technologies, what attracted me most was containers, because of how easy they are to deploy. Containers (also known as lightweight virtualisation) are isolation tools that use namespaces for the purpose. They make better sandboxes because they generally use more than one namespace, and they focus on creating a whole virtual system instance rather than isolating a single process.
Containers are not a new technology; the underlying ideas have been around in UNIX and Linux for a long time, but their growing use in SaaS and PaaS has made them a hot topic, as they provide a convenient and reasonably secure environment for delivering such services. They are called lightweight virtualisation because they provide process-level isolation and share the host's Linux kernel; you can therefore only create instances that run on the same base kernel. Several container runtimes for Linux have gained popularity over the past few years.
Systemd-nspawn
This is a utility that ships by default with systemd and creates separate containers for isolation. It uses the mount and PID namespaces by default, but other namespaces can also be configured. To create a container or isolated shell, you need a basic distribution tree, which we have already downloaded using debootstrap. To get inside this container, use the command below:
#systemd-nspawn -D my_deb
This container is stronger than chroot because it not only has its own mount points but also a separate process tree (check it with ps -aux). Still, the hostname and network interfaces are the same as on the host system. To get a private network stack, connect the container to an existing network bridge:
#systemd-nspawn -D my_deb --network-bridge=br0
This will start the container in its own network namespace, connected to the bridge through a pair of veth devices (the bridge br0 must already exist on the host). You can even boot the instance with the -b option, as follows:
#systemd-nspawn -bD my_deb
Note: While booting the container, you will be required to enter the password of the root user; so first run #passwd inside to set the root password.
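If the container needs access to files from the host, systemd-nspawn can also bind-mount host directories into it; the paths below are only an example:
#systemd-nspawn -D my_deb --bind=/home/user/src:/mnt/src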
The whole nspawn project is relatively young; hence there is still a lot that needs to be developed.
Docker
Docker is the most prominent container platform on Linux for running application environments, and over the past few years it has grabbed the most attention. Docker containers use most of the kernel namespaces together with cgroups to provide a strongly isolated environment. Docker relies on the Docker daemon, which starts isolated instances much like systemd-nspawn does, and almost any service can be deployed inside one with just a few tweaks. It can be used as a sandboxing tool to run applications securely, or to deploy a software service inside it.
To get your first Docker container running, you need to first start the Docker daemon, and then download the base image from the Docker online repository, as follows:
#service docker start
#docker pull kalilinux/kali-linux-docker
Note: You can also download other Docker images from the Docker Hub (https://hub.docker.com/).
It will download the base Kali Linux image. You can see all the images available on your system by using the following command:
#docker images
REPOSITORY                   TAG      IMAGE ID      CREATED        VIRTUAL SIZE
kalilinux/kali-linux-docker  latest   63ae5ac8df0f  1 minute ago   325 MB
centos                       centos6  b9aeeaeb5e17  9 months ago   202.6 MB
hello-world                  latest   91c95931e552  9 months ago   910 B
To run a program inside your container, use the command given below:
#docker run -i -t kalilinux/kali-linux-docker ls
bin dev home lib64 mnt proc run selinux sys usr
boot etc lib media opt root sbin srv tmp var
This will start (run) your container, execute the command and then close the container. To get an interactive shell inside the container, use the command given below:
#docker run -t -i kalilinux/kali-linux-docker /bin/bash
root@24a70cb3095a:/#
This will get you inside the container where you can do your work, isolated from your host machine. 24a70cb3095a is your container’s ID. You can check all the running containers by using the following command:
#docker ps
CONTAINER ID  IMAGE                        COMMAND      CREATED             STATUS             PORTS  NAMES
24a70cb3095a  kalilinux/kali-linux-docker  "/bin/bash"  About a minute ago  Up About a minute         angry_cori
When a container starts, Docker automatically creates a veth pair that connects it, through the docker0 bridge, to the main system. You can check this by running #ifconfig inside the container and pinging your main system. At any point, you can save the current state of a container as a new image by using the commands given below:
#docker commit 24a70cb3095a new_image
#docker images
REPOSITORY                   TAG      IMAGE ID      CREATED         VIRTUAL SIZE
new_image                    latest   a87c73abca9d  6 seconds ago   325 MB
kalilinux/kali-linux-docker  latest   63ae5ac8df0f  1 hour ago      325 MB
centos                       centos6  b9aeeaeb5e17  9 months ago    202.6 MB
hello-world                  latest   91c95931e552  9 months ago    910 B
You can remove that image by using #docker rmi new_image. To stop a container, use docker stop, and then remove the files it created on the host with docker rm:
#docker stop 24a70cb3095a
#docker rm 24a70cb3095a
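If you are unsure which containers are still present, you can list running and stopped containers alike:
#docker ps -a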
For running applications on a Docker instance, you may need to attach it to the host system in some way. To mount external storage into the container, use the -v flag, as follows:
#docker run -it -v /temp/:/home/ kalilinux/kali-linux-docker /bin/bash
This will mount /temp/ from the main system at /home/ inside the container. To map a container port to an external host port, use -p:
#docker run -it -v /temp/:/home/ -p 4567:80 kalilinux/kali-linux-docker /bin/bash
This will map the external port 4567 to the container's port 80, which is very useful for SaaS and PaaS where the deployed application needs to be reachable from the external network. Running GUI applications in Docker is often another requirement; the container has no X server of its own, so you need to share the host's X11 socket with the instance:
#docker run -it -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix kalilinux/kali-linux-docker /bin/bash
This will forward the host's X11 socket to the container. To ship a Docker image to another system, you can push it to an online registry such as Docker Hub (the image must first be tagged with your repository name), as follows:
#docker push new_image
You can also save an image to a local tar archive (docker export does the same for a container's file system):
#docker save -o new_image.tar new_image
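On the destination machine, the archive can then be loaded back into the local image store:
#docker load -i new_image.tar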
There is a lot more to learn about Docker, but going deeper is not the purpose of this article. A big plus is the wealth of tutorials and tips available online, from which you can easily get a better understanding of how to use it for your own work. Since its first release in 2013, Docker has matured to the point where it is routinely deployed in production and testing environments, largely because it is easy to use.
A whole ecosystem has also grown up around Docker. It includes Kubernetes (a Google-initiated project for orchestrating containers), Swarm, and many other services for Docker migration, graphical dashboards and so on. Automation tools for systems admins, such as Puppet and Chef, are also starting to support Docker containers. Even systemd provides management utilities for nspawn and other containers, such as machinectl and journalctl.
machinectl
This comes pre-installed with the systemd init manager. It is used to manage and inspect the state of virtual machines and containers registered with systemd (such as those started by systemd-nspawn). To see all containers running on your system, use the command given below:
#machinectl list
To get a status of any running container, use the command given below:
#machinectl status my_deb
Note: machinectl doesn’t show Docker containers, since the latter run behind the Docker daemon.
To log in to a container, use the command given below:
#machinectl login my_deb
Switch off a container, as follows:
#machinectl poweroff my_deb
To kill a container forcefully, use the following command:
#machinectl kill my_deb --signal=SIGKILL
To view the logs of a container, you can use journalctl, as follows:
#journalctl -M my_deb
What to get from this article
Sandboxes are important for every IT professional, but different roles call for different solutions. If you are a developer or application tester, chroot is probably not a good choice, as attackers can escape from a chroot jail. Lighter sandboxes such as systemd-nspawn or Firejail can be a good fit because they are easy to set up. Using Docker-like containers for application testing can be a minor headache, since getting a container ready for your process to run smoothly takes some effort.
If you are a SaaS or PaaS provider, containers are usually the best solution because of their strong isolation, easy shipping, live migration and clustering features. You could go with traditional virtualisation (virtual machines), but the fine-grained resource management and near-instant start-up of containers are hard to match.
