Reinstalling Ubuntu

  1. Back up the important directories: /home, /root
  2. Migrate Docker, see: http://sunie.top/blog/post/sunie/修改docker容器存放位置
  3. Install directly from the USB drive; the installer offers several options

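The second option needs a separate new disk, so only the first one is usable. Since the important directories are already backed up, go ahead without worrying! The steps are as follows: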

Change the hostname:

hostnamectl set-hostname mofs

Remount the disk:

By default the disk is mounted under /media/sy.

# Free the disk from the processes using it
fuser -m /dev/sda
kill -9 <PIDs listed by the command above>

# Remount
umount /dev/sda
mkdir -p /mnt/data1
mount /dev/sda /mnt/data1

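# Mount automatically at boot
blkid	# shows the partition UUID and filesystem type (ext4)

# Append (-a) instead of overwriting: /etc/fstab also holds the root filesystem entry
# fsck pass 2 = checked after the root filesystem (1 is reserved for /)
tee -a /etc/fstab <<-'EOF'
UUID=XXX	/mnt/data1	ext4	defaults	0	2
EOF

mount -a # no error means the configuration is correct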
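Because /etc/fstab also contains the root filesystem entry, it should only ever be appended to. Re-running the step should also not add a second line for the same disk. Below is a minimal sketch of an idempotent append. `add_fstab_entry` is a hypothetical helper name, and it assumes the UUID is read with `blkid -s UUID -o value`:

```shell
# add_fstab_entry (hypothetical helper): append one fstab line unless an
# entry for the same UUID already exists. The fstab path is a parameter so
# the helper can be tried on a copy before touching /etc/fstab.
add_fstab_entry() {
    local uuid=$1 mountpoint=$2 fstype=$3 fstab=$4
    if grep -q "^UUID=${uuid}[[:space:]]" "$fstab"; then
        echo "fstab already has an entry for $uuid" >&2
        return 0
    fi
    # fields: device, mount point, type, options, dump, fsck pass
    printf 'UUID=%s\t%s\t%s\tdefaults\t0\t2\n' "$uuid" "$mountpoint" "$fstype" >> "$fstab"
}

# Usage (as root):
#   uuid=$(blkid -s UUID -o value /dev/sda)
#   add_fstab_entry "$uuid" /mnt/data1 ext4 /etc/fstab
#   mount -a
```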

Enable SSH remote login

apt install openssh-server
apt install net-tools
ssh-keygen -t rsa -q -f ~/.ssh/id_rsa -P ""

# Set a password for root
passwd root
sed -i "s|#PermitRootLogin prohibit-password|PermitRootLogin yes|g" /etc/ssh/sshd_config
systemctl restart ssh
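The sed above only matches the exact default line `#PermitRootLogin prohibit-password`. If that line has already been edited, sed silently does nothing. Below is a sketch that rewrites any commented or active line for a given key, or appends the key if it is missing. `set_sshd_option` is a hypothetical helper name, and `sshd -t` (config test mode) is used to check the file before restarting:

```shell
# set_sshd_option (hypothetical helper): set "KEY VALUE" in an sshd_config-style
# file, replacing commented-out or existing lines for KEY, else appending.
set_sshd_option() {
    local file=$1 key=$2 value=$3
    if grep -Eq "^[#[:space:]]*${key}[[:space:]]" "$file"; then
        # rewrite every commented or active line for this key
        sed -i -E "s|^[#[:space:]]*${key}[[:space:]].*|${key} ${value}|" "$file"
    else
        echo "${key} ${value}" >> "$file"
    fi
}

# Usage (as root):
#   set_sshd_option /etc/ssh/sshd_config PermitRootLogin yes
#   sshd -t && systemctl restart ssh
```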

Set up passwordless (key-based) login:

(base) [root@localhost ~]# ssh-copy-id root@192.168.0.185
(base) [root@localhost ~]# ssh root@192.168.0.185
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0601 for '/root/.ssh/id_rsa' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
Load key "/root/.ssh/id_rsa": bad permissions
# This happens because the local id_rsa has the wrong permissions
(base) [root@localhost ~]# chmod 600 ~/.ssh/id_rsa
(base) [root@localhost ~]# ssh root@192.168.0.185
Welcome to Ubuntu 20.04.3 LTS (GNU/Linux 5.13.0-40-generic x86_64)
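As the warning shows, ssh ignores a private key that other users can access. Below is a sketch that resets the usual modes in one go: 700 for the .ssh directory, 600 for private keys and authorized_keys, 644 for public keys. `fix_ssh_perms` is a hypothetical helper name:

```shell
# fix_ssh_perms (hypothetical helper): reset the usual permissions on ~/.ssh
fix_ssh_perms() {
    local dir=${1:-$HOME/.ssh} key
    chmod 700 "$dir"
    for key in "$dir"/id_*; do
        [ -e "$key" ] || continue          # no keys matched the glob
        case $key in
            *.pub) chmod 644 "$key" ;;     # public keys may be world-readable
            *)     chmod 600 "$key" ;;     # private keys: owner only
        esac
    done
    if [ -e "$dir/authorized_keys" ]; then
        chmod 600 "$dir/authorized_keys"
    fi
}

# Usage:
#   fix_ssh_perms            # defaults to ~/.ssh
```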

Add a mirror source

(Using https triggers an expired-certificate error: "does not have a Release file".)

cp /etc/apt/sources.list /etc/apt/sources.list.bac
tee /etc/apt/sources.list <<-'EOF'
# Source-code mirrors are commented out by default to speed up apt update; uncomment them if needed
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal main restricted universe multiverse
# deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-updates main restricted universe multiverse
# deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-updates main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-backports main restricted universe multiverse
# deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-backports main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-security main restricted universe multiverse
# deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-security main restricted universe multiverse
EOF
apt update
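The four mirror lines above differ only in the suite suffix. Below is a sketch that generates the same list from a mirror URL and a release codename, so it can be reused on other releases, for example with `lsb_release -cs`. `write_sources_list` is a hypothetical helper name:

```shell
# write_sources_list (hypothetical helper): write a sources.list with the
# main, -updates, -backports and -security suites for one mirror.
write_sources_list() {
    local mirror=$1 codename=$2 out=$3 suite
    : > "$out"                                   # truncate the output file
    for suite in "$codename" "$codename-updates" "$codename-backports" "$codename-security"; do
        echo "deb $mirror $suite main restricted universe multiverse" >> "$out"
        # source mirrors stay commented out, as in the list above
        echo "# deb-src $mirror $suite main restricted universe multiverse" >> "$out"
    done
}

# Usage (as root):
#   cp /etc/apt/sources.list /etc/apt/sources.list.bac
#   write_sources_list http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ "$(lsb_release -cs)" /etc/apt/sources.list
#   apt update
```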

Install the graphics driver:

Update the system first, then reboot.

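Do not use the official NVIDIA driver installer!!! Do not blacklist nouveau!!! The failed attempt with the official installer is kept below for reference: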

# "cc"/"make" not in path
apt install -y gcc make

# "fatal error: asm/kmap_types.h: No such file or directory"
apt install -y linux-headers-$(uname -r)

# The Nouveau kernel driver is currently in use by your system.  This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding.
echo -e 'blacklist nouveau\noptions nouveau modeset=0' >> /etc/modprobe.d/blacklist.conf

update-initramfs -u
reboot  # Damn it, dual monitors became a single monitor. Never try this method!!!
lsmod | grep nouveau

cat /var/log/nvidia-installer.log

Follow this instead: https://askubuntu.com/questions/1077061/how-do-i-install-nvidia-and-cuda-drivers-into-ubuntu/1077063#1077063 (Ubuntu 20.04 LTS, CUDA 11.5.0, NVIDIA 495 and libcudnn 8.0.4)

# Install the graphics driver and CUDA 11.5
apt install -y nvidia-driver-495
reboot
wget https://developer.download.nvidia.com/compute/cuda/11.5.0/local_installers/cuda_11.5.0_495.29.05_linux.run
sh cuda_11.5.0_495.29.05_linux.run  # nvidia-driver-495 is already installed, so deselect "Driver" in the installer menu

tee /etc/profile.d/cuda.sh  <<-'EOF'
# set PATH for cuda 11.5 installation
if [ -d "/usr/local/cuda-11.5/bin/" ]; then
    export PATH=/usr/local/cuda-11.5/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda-11.5/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
fi
EOF
source /etc/profile.d/cuda.sh
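The `${PATH:+:${PATH}}` expansion above adds the colon only when the variable is already non-empty. This avoids an empty entry, which would mean "current directory". Sourcing the file several times still prepends duplicates, though. Below is a sketch of a duplicate-free prepend. `path_prepend` is a hypothetical helper name, and it relies on bash-only features (`${!var}`, `printf -v`), so it belongs in a bash file such as ~/.bashrc rather than in /etc/profile.d, which may be read by dash:

```shell
# path_prepend (hypothetical helper, bash only): prepend DIR to the
# colon-separated variable named VAR unless DIR is already in it.
path_prepend() {
    local var=$1 dir=$2
    local cur=${!var}                      # indirect expansion: value of $var
    case ":$cur:" in
        *":$dir:"*) ;;                     # already present, nothing to do
        *) printf -v "$var" '%s' "$dir${cur:+:$cur}" ;;
    esac
    export "$var"
}

# Usage:
#   if [ -d /usr/local/cuda-11.5/bin ]; then
#       path_prepend PATH /usr/local/cuda-11.5/bin
#       path_prepend LD_LIBRARY_PATH /usr/local/cuda-11.5/lib64
#   fi
```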

# Download and install cuDNN (downloading cuDNN from abroad is too slow, so a local backup is used)
url="http://sunie.tpddns.cn:9051/Server/files"
test -s libcudnn8_8.1.0.77-1+cuda11.2_amd64.deb || wget "$url/libcudnn8_8.1.0.77-1+cuda11.2_amd64.deb"
test -s libcudnn8-dev_8.1.0.77-1+cuda11.2_amd64.deb || wget "$url/libcudnn8-dev_8.1.0.77-1+cuda11.2_amd64.deb"
test -s libcudnn8-samples_8.1.0.77-1+cuda11.2_amd64.deb || wget "$url/libcudnn8-samples_8.1.0.77-1+cuda11.2_amd64.deb"

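# Tarball method only (when the cuDNN .tgz has been extracted to ./cuda); the .deb packages below install cuDNN themselves
cp cuda/include/cudnn.h /usr/local/cuda/include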
cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

dpkg -i libcudnn8*
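The three `test -s ... || wget ...` lines and the `dpkg -i` step follow one pattern. Below is a sketch that wraps them in a loop. `install_cudnn_debs` is a hypothetical helper name, and its default version string is the one from the file names above:

```shell
# install_cudnn_debs (hypothetical helper): download any missing cuDNN .deb
# files from a base URL, then install all of them with one dpkg call.
install_cudnn_debs() {
    local url=$1 ver=${2:-8.1.0.77-1+cuda11.2}
    local pkg f
    for pkg in libcudnn8 libcudnn8-dev libcudnn8-samples; do
        f="${pkg}_${ver}_amd64.deb"
        # skip files that were already downloaded (non-empty)
        test -s "$f" || wget "$url/$f" || return 1
    done
    # one dpkg call, so dependencies between the three packages resolve
    dpkg -i libcudnn8*_"${ver}"_amd64.deb
}

# Usage (as root, inside the download directory):
#   install_cudnn_debs "http://sunie.tpddns.cn:9051/Server/files"
```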

# Test cuDNN
apt install -y libfreeimage3 libfreeimage-dev

cp -r /usr/src/cudnn_samples_v8/ ~ 
cd ~/cudnn_samples_v8/mnistCUDNN/
make clean && make
./mnistCUDNN

conda

Environment rebuild: see the separate note "Rebuilding the Anaconda environment"

./anaconda/bin/conda init

pytorch

mkdir -p whl && cd whl
wget https://download.pytorch.org/whl/cu115/torch-1.11.0%2Bcu115-cp39-cp39-linux_x86_64.whl
wget https://download.pytorch.org/whl/cu115/torchaudio-0.11.0%2Bcu115-cp39-cp39-linux_x86_64.whl
wget https://download.pytorch.org/whl/cu115/torchvision-0.12.0%2Bcu115-cp39-cp39-linux_x86_64.whl

pip install *.whl

python -c "import torch;print(torch.cuda.device_count(), torch.cuda.get_device_name(0), torch.__version__)"
# 1 NVIDIA GeForce RTX 3060 1.11.0+cu115

Download Docker

Install Docker and set up its environment

apt install -y ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
apt update
apt install -y docker-ce docker-ce-cli containerd.io

# Set the container storage location ("data-root" replaces the deprecated "graph" key)
echo '{"data-root":"/mnt/data1/docker"}' > /etc/docker/daemon.json
systemctl restart docker
docker info
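The `echo ... >` above overwrites any existing daemon.json. Below is a sketch that writes the file only when it does not exist yet, using the current "data-root" key rather than the deprecated "graph". `write_daemon_json` is a hypothetical helper name. Afterwards, `docker info --format '{{.DockerRootDir}}'` shows which directory Docker actually uses:

```shell
# write_daemon_json (hypothetical helper): create daemon.json with a
# data-root setting, refusing to clobber an existing non-empty file.
write_daemon_json() {
    local file=$1 root=$2
    if [ -s "$file" ]; then
        echo "$file already exists; merge \"data-root\" into it by hand" >&2
        return 1
    fi
    mkdir -p "$(dirname "$file")"
    printf '{\n    "data-root": "%s"\n}\n' "$root" > "$file"
}

# Usage (as root):
#   write_daemon_json /etc/docker/daemon.json /mnt/data1/docker
#   systemctl restart docker
#   docker info --format '{{.DockerRootDir}}'
```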

# Download docker-compose (from GitHub, or from the local mirror on the second line)
curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

curl -L "http://d.sunie.top:9009/Server/files/docker-compose-Linux-x86_64" -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose

If curl fails with the following error, install libssl1.0.0 from the xenial security repository:

curl: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory

echo 'deb http://security.ubuntu.com/ubuntu xenial-security main' >> /etc/apt/sources.list
apt update
apt install libssl1.0.0

Create users:

Restore the important directories

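cd /mnt/data2/backup/home
cp -r gitlab /home/

for user in */; do
    user=${user%/}                          # strip the trailing slash
    if [ "$user" != "gitlab" ]; then
        echo "$user"
        useradd -m -s /bin/bash "$user"
        echo "$user:$user" | chpasswd       # useradd -p expects a hash; set password = user name here
        cp -r "$user"/. "/home/$user/"      # "/." also copies hidden files
        chown -R "$user:$user" "/home/$user"
    fi
done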
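`useradd` fails for an account that already exists. Before running the loop above, it can help to preview which accounts it would create. Below is a read-only sketch that lists the backup's subdirectories except gitlab and flags names that `getent passwd` already knows. `list_restore_users` is a hypothetical helper name:

```shell
# list_restore_users (hypothetical helper): dry run of the restore loop.
# Prints one user name per backup subdirectory, skipping gitlab.
list_restore_users() {
    local backup=$1 d name
    for d in "$backup"/*/; do
        [ -d "$d" ] || continue            # glob matched nothing
        name=$(basename "$d")
        if [ "$name" = "gitlab" ]; then
            continue
        fi
        if getent passwd "$name" > /dev/null; then
            echo "$name (account already exists)"
        else
            echo "$name"
        fi
    done
}

# Usage:
#   list_restore_users /mnt/data2/backup/home
```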

Setting up the Linux environment:

System version: Ubuntu 20.04 LTS

Clash

mkdir -p clash && cd clash
wget 'http://nas.sunie.top/files/clash-linux-amd64-v1.9.0.gz'
gunzip -c clash-linux-amd64-v1.9.0.gz > clash   # a plain .gz of the binary, not a tar archive
chmod +x clash
wget -O config.yaml 'https://ipy.ipyipy.buzz/link/U5qRPYTMHb55YWSn?clash=2&log-level=info'
./clash -d .

nodejs

https://segmentfault.com/a/1190000039973959

cd /usr/local/src/
wget https://nodejs.org/dist/v14.16.1/node-v14.16.1-linux-x64.tar.xz    # download
tar xf node-v14.16.1-linux-x64.tar.xz         # extract
cd node-v14.16.1-linux-x64/                   # enter the extracted directory
ln -s /usr/local/src/node-v14.16.1-linux-x64/bin/node /usr/bin/node
ln -s /usr/local/src/node-v14.16.1-linux-x64/bin/npm /usr/bin/npm
node -v
npm -v
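The symlinks above hard-code the Node.js version in two places. Below is a sketch that links every tool found in an extracted tarball's bin directory, so upgrading only needs the new directory path. `link_node_bins` is a hypothetical helper name. `ln -sf` replaces existing links, and the default target /usr/bin matches the commands above:

```shell
# link_node_bins (hypothetical helper): symlink node/npm/npx from an
# extracted Node.js tarball into a bin directory.
link_node_bins() {
    local node_dir=$1 bin_dir=${2:-/usr/bin} tool
    for tool in node npm npx; do
        if [ -e "$node_dir/bin/$tool" ]; then
            ln -sf "$node_dir/bin/$tool" "$bin_dir/$tool"
        fi
    done
}

# Usage (as root):
#   link_node_bins /usr/local/src/node-v14.16.1-linux-x64
#   node -v && npm -v
```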

Typora+picgo+gitee

Download and themes: https://blog.csdn.net/y_universe/article/details/107184300

Crack: https://www.bilibili.com/video/BV11a41187zh/

tar -zxvf Typora-linux-x64.tar.gz
cd bin/Typora-linux-x64/

echo "[Desktop Entry]
Name=Typora
Exec=`pwd`/Typora
Type=Application
Icon=`pwd`/resources/assets/icon/icon_512x512.png
" > /usr/share/applications/typora.desktop

cp ~/Downloads/app.asar resources/

PicGo + gitee image hosting: https://blog.csdn.net/qq_20549061/article/details/106796119

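Note: the gitee repository name is the one in the URL, which can differ from the name displayed on the page. Create a desktop shortcut: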

echo "[Desktop Entry]
Name=picgo
Exec=/home/sy/Downloads/PicGo-2.3.0.AppImage
Type=Application
" > /usr/share/applications/picgo.desktop
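The Typora and PicGo launchers above share the same shape. Below is a sketch that writes either one from its name, executable and optional icon. `make_desktop_entry` is a hypothetical helper name. The result can optionally be checked with `desktop-file-validate` from the desktop-file-utils package:

```shell
# make_desktop_entry (hypothetical helper): write a minimal .desktop launcher.
make_desktop_entry() {
    local out=$1 name=$2 exec_path=$3 icon=${4:-}
    {
        echo "[Desktop Entry]"
        echo "Name=$name"
        echo "Exec=$exec_path"
        echo "Type=Application"
        if [ -n "$icon" ]; then
            echo "Icon=$icon"
        fi
    } > "$out"
}

# Usage (as root):
#   make_desktop_entry /usr/share/applications/picgo.desktop picgo /home/sy/Downloads/PicGo-2.3.0.AppImage
```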

PicGo(app) can only be selected as Typora's image uploader when Typora's language is set to Chinese.

flameshot

https://zhuanlan.zhihu.com/p/45919661

Baidu Cloud (Baidu Netdisk)

Download: https://zhuanlan.zhihu.com/p/77330173

Direct-link download helper: https://greasyfork.org/zh-CN/scripts/418182-百度网盘简易下载助手-直链下载复活版

Aria2:https://www.jianshu.com/p/d05d9226323a

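A free, open-source, lightweight, multi-protocol command-line utility for downloading files from the Internet. It supports protocols such as HTTP, HTTPS, FTP and even BitTorrent, and runs on Windows, Linux and macOS. Download test: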

 aria2c http://down.qq.com/qqweb/LinuxQQ/linuxqq_2.0.0-b2-1084_x86_64.rpm

Baidu Cloud files now download normally.
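For large files, aria2c can open several connections and resume interrupted downloads. Below is a small wrapper using standard aria2c options: `-c` (continue), `-x` (max connections per server, capped at 16), `-s` (number of pieces) and `-d` (output directory). `fast_download` is a hypothetical helper name. Whether a server accepts 16 parallel connections is up to the server:

```shell
# fast_download (hypothetical helper): resumable multi-connection download.
fast_download() {
    local url=$1 dir=${2:-.}
    aria2c -c -x 16 -s 16 -d "$dir" "$url"
}

# Usage:
#   fast_download http://down.qq.com/qqweb/LinuxQQ/linuxqq_2.0.0-b2-1084_x86_64.rpm ~/Downloads
```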

Xunlei (Thunder)

https://wwa.lanzoui.com/ij7qxgle7ud

VMware

Download: https://www.vmware.com/products/workstation-pro/workstation-pro-evaluation.html

Install: https://www.cnblogs.com/garyzhuang/p/9580062.html

Error when launching: https://askubuntu.com/questions/1096619/install-vmware-on-ubuntu-18-10-build-environment-error

Activation & installing Windows 11: https://dbmer.com/win/computer-course/vmware-install-windows11/

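# kernel-devel, libX11, libXinerama, libXcursor, libXtst are Red Hat package names; these are the Ubuntu equivalents
apt install -y perl gcc linux-headers-$(uname -r) libx11-6 libxinerama1 libxcursor1 libxtst6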

./VMware-Workstation-Full-16.2.3-19376536.x86_64.bundle --console \
    --eulas-agreed \
    --required \
    -s vmware-workstation serialNumber MA491-6NL5Q-AZAM0-ZH0N2-AAJ5A

vmware-modconfig --install-status

VScode

Official download: https://code.visualstudio.com/download

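Settings sync: with the Settings Sync extension, put the GitHub token into the extension's settings file, then press Shift+Alt+U to upload (sync) the settings.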


Graphics driver pitfalls

I. Exploring by following tutorials

1 Problem description

Running tf-gpu and torch-gpu prints the following:

failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected

root@sunie:/home/sunie# nvidia-smi
Failed to initialize NVML: Driver/library version mismatch

root@sunie:/home/sunie# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  460.32.03  Sun Dec 27 19:00:34 UTC 2020 
# the kernel module is 460.32.03

root@sunie:/home/sunie# ls /usr/src | grep nvidia
nvidia-460.39
# the installed (system) driver is 460.39

root@sunie:/home/sunie# ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001C8Csv0000103Csd00008478bc03sc00i00
vendor   : NVIDIA Corporation
model    : GP107M [GeForce GTX 1050 Ti Mobile]
driver   : nvidia-driver-450 - distro non-free
driver   : nvidia-driver-450-server - distro non-free
driver   : nvidia-driver-390 - distro non-free
driver   : nvidia-driver-460 - distro non-free recommended
driver   : nvidia-driver-418-server - distro non-free
driver   : nvidia-driver-460-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin

Cause: the NVIDIA kernel module version does not match the installed system driver version.
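The two versions above were compared by eye. Below is a sketch that extracts both and reports a mismatch. It assumes the kernel side is read from /proc/driver/nvidia/version and the installed side from the `nvidia-<version>` directory under /usr/src, as in the output above. All three function names are hypothetical helpers, and the file paths are parameters so the check can be tried on saved copies:

```shell
# kernel_module_version (hypothetical): version after "Kernel Module" in the NVRM line
kernel_module_version() {
    awk '/Kernel Module/ { for (i = 1; i <= NF; i++) if ($i == "Module") { print $(i + 1); exit } }' \
        "${1:-/proc/driver/nvidia/version}"
}

# installed_driver_version (hypothetical): version from /usr/src/nvidia-<version>
installed_driver_version() {
    ls "${1:-/usr/src}" | sed -n 's/^nvidia-\([0-9.]*\)$/\1/p' | head -n1
}

# check_nvidia_versions (hypothetical): print OK or MISMATCH, return 1 on mismatch
check_nvidia_versions() {
    local k u
    k=$(kernel_module_version "$1")
    u=$(installed_driver_version "$2")
    if [ "$k" = "$u" ]; then
        echo "OK: $k"
    else
        echo "MISMATCH: kernel module $k, installed driver $u"
        return 1
    fi
}

# Usage:
#   check_nvidia_versions
```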

2 Completely uninstall the driver

Uninstall both the kernel module and the system driver, then reinstall:

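sudo apt-get purge 'nvidia*'   # quoted so the shell does not expand the pattern against local files
apt autoremove  # required, otherwise the removal is not complete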

rmmod nvidia_uvm
rmmod nvidia_drm
rmmod nvidia_modeset
rmmod nvidia

sudo apt-get install nvidia-driver-460 nvidia-settings nvidia-prime
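`rmmod` fails on modules that are not loaded. It also fails while a dependent module or process, such as Xorg, still holds them. Below is a sketch that prints only the currently loaded NVIDIA modules, in the same dependent-first order used above. It reads `lsmod` output on stdin, so it can be tried without root. `nvidia_modules_to_unload` is a hypothetical helper name:

```shell
# nvidia_modules_to_unload (hypothetical helper): filter lsmod output (stdin)
# down to loaded NVIDIA modules, in a safe unload order.
nvidia_modules_to_unload() {
    local loaded m
    loaded=$(awk 'NR > 1 { print $1 }')    # module names, header skipped
    for m in nvidia_uvm nvidia_drm nvidia_modeset nvidia; do
        if printf '%s\n' "$loaded" | grep -qx "$m"; then
            echo "$m"
        fi
    done
}

# Usage (as root):
#   for m in $(lsmod | nvidia_modules_to_unload); do rmmod "$m"; done
```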

3 Checking again, the problem remains:

root@sunie:/home/sunie# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

root@sunie:/Server/files# cat /proc/driver/nvidia/version
cat: /proc/driver/nvidia/version: No such file or directory

root@sunie:/Server/files# ls /usr/src | grep nvidia
nvidia-460.39

root@sunie:/docker# apt-get install dkms
root@sunie:/docker# dkms install -m nvidia -v 460.39
Module nvidia/460.39 already installed on kernel 5.8.0-43-generic/x86_64

root@sunie:/docker# apt install nvidia-driver-460
Reading package lists... Done
Building dependency tree       
Reading state information... Done
nvidia-driver-460 is already the newest version (460.39-0ubuntu0.20.04.1).
0 upgraded, 0 newly installed, 0 to remove and 67 not upgraded.

Some say a reboot fixes it, but this is a server; I can't just reboot it whenever I like!

II. Final solution

Reboot, completely uninstall all drivers, then reinstall with autoinstall (ubuntu-drivers autoinstall).

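Trying to keep the driver from being updated gave the following:

root@sunie:/home/sunie# sudo apt-mark hold nvidia-460
E: Unable to locate package nvidia-460
E: No packages found
# the installed package is actually named nvidia-driver-460 (see the apt output above)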

III. nvcc -V: command not found (nvcc is not on PATH)

Add the environment variables and source them; don't forget to source:

export PATH=/usr/local/cuda-11.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH

IV. Unable to correct problems, you have held broken packages

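The driver saga wasn't over. One day after a reboot the GPU disappeared again, so I reinstalled following the steps above, but got stuck at the last step: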

vim /etc/modprobe.d/blacklist.conf

# Add the modules to disable:
# for nvidia display device install
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist rivatv
blacklist nvidiafb

update-initramfs -u
reboot
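Appending with `>>` (as in the earlier `echo -e ... >> /etc/modprobe.d/blacklist.conf`) adds the same lines again on every re-run. Below is a sketch that appends each line only if that exact line is not already in the file. `ensure_lines` is a hypothetical helper name:

```shell
# ensure_lines (hypothetical helper): append each given line to FILE unless
# an identical line is already present (fixed-string, whole-line match).
ensure_lines() {
    local file=$1 line
    shift
    touch "$file"
    for line in "$@"; do
        grep -qxF -- "$line" "$file" || echo "$line" >> "$file"
    done
}

# Usage (as root):
#   ensure_lines /etc/modprobe.d/blacklist.conf "blacklist nouveau" "options nouveau modeset=0"
#   update-initramfs -u
```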

add-apt-repository ppa:graphics-drivers/ppa
apt update
apt install nvidia-driver-460

V. No devices were found

https://www.nvidia.com/download/driverResults.aspx/166883/en-us