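Reinstalling Ubuntu
- Back up the important directories: /home, /root
- Migrating Docker: see http://sunie.top/blog/post/sunie/修改docker容器存放位置
- Install directly from a USB drive; the installer presents several options
The second option needs a new disk, so the first is the only choice. The important directories are already backed up, so go ahead. The steps are as follows:
Change the hostname: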
hostnamectl set-hostname mofs
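Remount the disk:
By default the disk is mounted under /media/sy.
# Release the device
fuser -m /dev/sda
kill -9 <PID shown above>
# Remount
umount /dev/sda
mkdir -p /mnt/data1
mount /dev/sda /mnt/data1
# Mount automatically at boot
blkid # look up the partition UUID and filesystem type (ext4)
# tee -a appends; without -a the existing /etc/fstab would be overwritten
tee -a /etc/fstab <<-'EOF'
UUID=XXX /mnt/data1 ext4 defaults 0 1
EOF
mount -a # check that the new entry mounts without errors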
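The fstab step can be made safe to repeat. The following is a minimal sketch and not part of the original procedure: `add_fstab_entry` is a hypothetical helper that appends a line in the same format as above only when the mount point is not already listed. Its optional fourth argument (the target file) is an assumption added so the function can be tried on a scratch copy first.

```shell
# Hypothetical helper: append an fstab entry unless the mount point is already listed.
# Usage: add_fstab_entry UUID MOUNTPOINT FSTYPE [FSTAB_FILE]
add_fstab_entry() {
    local uuid="$1" mountpoint="$2" fstype="$3" fstab="${4:-/etc/fstab}"
    # Look for a non-comment line whose second field is the mount point.
    if awk -v mp="$mountpoint" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$fstab" 2>/dev/null; then
        echo "entry for $mountpoint already present in $fstab" >&2
        return 0
    fi
    printf 'UUID=%s %s %s defaults 0 1\n' "$uuid" "$mountpoint" "$fstype" >> "$fstab"
}
```

Run as root, `add_fstab_entry XXX /mnt/data1 ext4` would add the line once; running it again leaves the file unchanged.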
Enable SSH remote login
apt install openssh-server
apt install net-tools
ssh-keygen -t rsa -q -f ~/.ssh/id_rsa -P ""
# Set a password for root
passwd root
sed -i "s|#PermitRootLogin prohibit-password|PermitRootLogin yes|g" /etc/ssh/sshd_config
systemctl restart ssh
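The sed command above only matches the exact commented default line. As a hedged alternative, the sketch below sets an option whether it is commented out, already set to another value, or missing. `set_sshd_option` is a hypothetical helper name, and the optional third argument (the config file) is an assumption that allows testing on a copy; it relies on GNU sed's `-i`, which Ubuntu ships.

```shell
# Hypothetical helper: set KEY VALUE in an sshd_config-style file.
# Usage: set_sshd_option KEY VALUE [CONFIG_FILE]
set_sshd_option() {
    local key="$1" value="$2" conf="${3:-/etc/ssh/sshd_config}"
    if grep -Eq "^[#[:space:]]*${key}[[:space:]]" "$conf"; then
        # Replace the existing (possibly commented) line in place.
        sed -i -E "s|^[#[:space:]]*${key}[[:space:]].*|${key} ${value}|" "$conf"
    else
        # Option not present at all: append it.
        printf '%s %s\n' "$key" "$value" >> "$conf"
    fi
}
```

After `set_sshd_option PermitRootLogin yes`, the service still has to be restarted for the change to take effect.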
Enable passwordless login:
(base) [root@localhost ~]# ssh-copy-id root@192.168.0.185
(base) [root@localhost ~]# ssh root@192.168.0.185
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: UNPROTECTED PRIVATE KEY FILE! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0601 for '/root/.ssh/id_rsa' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
Load key "/root/.ssh/id_rsa": bad permissions
# This happens because the permissions on the local id_rsa are wrong
(base) [root@localhost ~]# chmod 600 ~/.ssh/id_rsa
(base) [root@localhost ~]# ssh root@192.168.0.185
Welcome to Ubuntu 20.04.3 LTS (GNU/Linux 5.13.0-40-generic x86_64)
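The permission fix above can be generalized to the whole key directory. This is a small sketch under the usual OpenSSH conventions (directory 700, private keys 600, public keys 644); `fix_ssh_perms` is a hypothetical name, and it only touches files named `id_*`.

```shell
# Hypothetical helper: normalize permissions in an SSH key directory.
# Usage: fix_ssh_perms [DIR]   (defaults to ~/.ssh)
fix_ssh_perms() {
    local dir="${1:-$HOME/.ssh}" f
    chmod 700 "$dir"
    for f in "$dir"/id_*; do
        [ -e "$f" ] || continue        # no keys present: the glob did not match
        case "$f" in
            *.pub) chmod 644 "$f" ;;   # public keys may be world-readable
            *)     chmod 600 "$f" ;;   # private keys must be owner-only
        esac
    done
}
```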
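Add a mirror source
(Using https runs into an expired-certificate problem: does not have a Release file.)
cp /etc/apt/sources.list /etc/apt/sources.list.bac
tee /etc/apt/sources.list <<-'EOF'
# Source (deb-src) mirrors are commented out by default to speed up apt update; uncomment them if needed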
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal main restricted universe multiverse
# deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-updates main restricted universe multiverse
# deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-updates main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-backports main restricted universe multiverse
# deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-backports main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-security main restricted universe multiverse
# deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-security main restricted universe multiverse
EOF
apt update
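The eight lines written above follow one pattern per suite. As an illustration only, the hypothetical function `gen_sources` below prints the same list for a given mirror URL and release codename, which makes it easier to reuse on another release. Nothing here is taken from the mirror's own tooling.

```shell
# Hypothetical helper: print a sources.list for MIRROR and CODENAME.
# Usage: gen_sources MIRROR_URL CODENAME
gen_sources() {
    local mirror="$1" codename="$2" suite
    for suite in "$codename" "$codename-updates" "$codename-backports" "$codename-security"; do
        printf 'deb %s %s main restricted universe multiverse\n' "$mirror" "$suite"
        # Source packages stay commented out to keep apt update fast.
        printf '# deb-src %s %s main restricted universe multiverse\n' "$mirror" "$suite"
    done
}
```

For example, `gen_sources http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal` reproduces the entries used here and can be piped into `tee /etc/apt/sources.list`.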
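Install the GPU driver:
Update the software and reboot first.
Do not use the official driver installer, and do not blacklist nouveau! The failed attempt is recorded here for reference: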
# "cc"/"make" not in path
apt install -y gcc make
# "fatal error: asm/kmap_types.h: No such file or directory"
apt install -y linux-headers-$(uname -r)
# The Nouveau kernel driver is currently in use by your system. This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding.
echo -e 'blacklist nouveau\noptions nouveau modeset=0' >> /etc/modprobe.d/blacklist.conf
update-initramfs -u
reboot # This turned the dual-monitor setup into a single monitor. Never try this method!
lsmod | grep nouveau
cat /var/log/nvidia-installer.log
Follow this answer instead: https://askubuntu.com/questions/1077061/how-do-i-install-nvidia-and-cuda-drivers-into-ubuntu/1077063#1077063 (Ubuntu 20.04 LTS, CUDA 11.5.0, NVIDIA 495 and libcudnn 8.0.4)
# Install the GPU driver and CUDA 11.5
apt install -y nvidia-driver-495
reboot
wget https://developer.download.nvidia.com/compute/cuda/11.5.0/local_installers/cuda_11.5.0_495.29.05_linux.run
sh cuda_11.5.0_495.29.05_linux.run
tee /etc/profile.d/cuda.sh <<-'EOF'
# set PATH for cuda 11.5 installation
if [ -d "/usr/local/cuda-11.5/bin/" ]; then
export PATH=/usr/local/cuda-11.5/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.5/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
fi
EOF
source /etc/profile.d/cuda.sh
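The `${PATH:+:${PATH}}` expansion in the profile script adds the separating colon only when the variable is already non-empty, so an empty variable does not end up with a stray `:`. The sketch below shows the same idea as a reusable function that also avoids adding a directory twice when the script is sourced repeatedly. `prepend_path` is a hypothetical name, not something the CUDA installer provides.

```shell
# Hypothetical helper: print LIST with DIR prepended, unless DIR is already in it.
# Usage: prepend_path DIR LIST
prepend_path() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;              # already present: unchanged
        *)        printf '%s\n' "$1${2:+:$2}" ;;     # colon only if LIST is non-empty
    esac
}
```

It could be used as `export PATH=$(prepend_path /usr/local/cuda-11.5/bin "$PATH")`.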
# Download and install cuDNN (downloading cuDNN from the official site is too slow, so use a local backup)
url="http://sunie.tpddns.cn:9051/Server/files"
test -s libcudnn8_8.1.0.77-1+cuda11.2_amd64.deb || wget "$url/libcudnn8_8.1.0.77-1+cuda11.2_amd64.deb"
test -s libcudnn8-dev_8.1.0.77-1+cuda11.2_amd64.deb || wget "$url/libcudnn8-dev_8.1.0.77-1+cuda11.2_amd64.deb"
test -s libcudnn8-samples_8.1.0.77-1+cuda11.2_amd64.deb || wget "$url/libcudnn8-samples_8.1.0.77-1+cuda11.2_amd64.deb"
cp cuda/include/cudnn.h /usr/local/cuda/include
cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
dpkg -i libcudnn8*
# Test cuDNN
apt install -y libfreeimage3 libfreeimage-dev
cp -r /usr/src/cudnn_samples_v8/ ~
cd ~/cudnn_samples_v8/mnistCUDNN/
make clean && make
./mnistCUDNN
conda
Environment rebuild: rebuilding the Anaconda environment
./anaconda/bin/conda init
pytorch
mkdir -p whl && cd whl
wget https://download.pytorch.org/whl/cu115/torch-1.11.0%2Bcu115-cp39-cp39-linux_x86_64.whl
wget https://download.pytorch.org/whl/cu115/torchaudio-0.11.0%2Bcu115-cp39-cp39-linux_x86_64.whl
wget https://download.pytorch.org/whl/cu115/torchvision-0.12.0%2Bcu115-cp39-cp39-linux_x86_64.whl
pip install *.whl
python -c "import torch;print(torch.cuda.device_count(), torch.cuda.get_device_name(0), torch.__version__)"
# 1 NVIDIA GeForce RTX 3060 1.11.0+cu115
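The three wheel URLs above share one naming pattern: package, version, `%2B` (a URL-encoded `+`), CUDA tag, and the Python tag twice. The hypothetical function `wheel_url` below only reproduces that pattern as seen in these three links; it is an illustration, and other version combinations are not guaranteed to exist on the server.

```shell
# Hypothetical helper: rebuild a wheel URL in the pattern used above.
# Usage: wheel_url PACKAGE VERSION CUDA_TAG PYTHON_TAG
wheel_url() {
    local pkg="$1" ver="$2" cuda="$3" py="$4"
    # %%2B prints a literal %2B, the URL-encoded "+" in e.g. 1.11.0+cu115.
    printf 'https://download.pytorch.org/whl/%s/%s-%s%%2B%s-%s-%s-linux_x86_64.whl\n' \
        "$cuda" "$pkg" "$ver" "$cuda" "$py" "$py"
}
```

The Python tag must match the interpreter in the active conda environment (cp39 means CPython 3.9), otherwise pip rejects the wheel.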
Download Docker
Install Docker and bind it to the environment
apt install -y ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
apt update
apt install -y docker-ce docker-ce-cli containerd.io
# Set where containers are stored (data-root replaces the deprecated "graph" key)
echo '{"data-root":"/mnt/data1/docker"}' > /etc/docker/daemon.json
systemctl restart docker
docker info
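Writing daemon.json with a bare redirect silently discards any existing settings. The sketch below keeps a backup first and writes the storage location under `data-root`, which is the current name of the key formerly called `graph`. `write_daemon_json` is a hypothetical helper, and the optional second argument (output path) is an assumption added so it can be tried outside /etc/docker.

```shell
# Hypothetical helper: write a daemon.json that sets only the data root.
# Usage: write_daemon_json DATA_ROOT [OUTPUT_FILE]
write_daemon_json() {
    local data_root="$1" out="${2:-/etc/docker/daemon.json}"
    # Keep a copy of any existing configuration before overwriting it.
    if [ -s "$out" ]; then
        cp "$out" "$out.bak"
    fi
    printf '{\n  "data-root": "%s"\n}\n' "$data_root" > "$out"
}
```

After `write_daemon_json /mnt/data1/docker` and a restart of the service, the `Docker Root Dir` line of `docker info` should show the new location.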
# Download docker-compose
curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
curl -L "http://d.sunie.top:9009/Server/files/docker-compose-Linux-x86_64" -o /usr/local/bin/docker-compose
curl: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory
echo 'deb http://security.ubuntu.com/ubuntu xenial-security main' >> /etc/apt/sources.list
apt update
apt install libssl1.0.0
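Create users:
Restore the important directories
cd /mnt/data2/backup/home
cp -r gitlab /home/
# Recreate one account per backed-up home directory (gitlab is copied above)
for dir in */
do
    user=${dir%/}
    if [ "$user" != "gitlab" ]; then
        echo "$user"
        useradd "$user" -m -s /bin/bash
        # useradd -p expects an already-hashed password, so set the initial password (same as the username) with chpasswd
        echo "$user:$user" | chpasswd
        # Copy dotfiles too, then hand ownership to the new account
        cp -a "$user/." "/home/$user/"
        chown -R "$user:$user" "/home/$user"
    fi
done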
Setting up the Linux environment:
System version: Ubuntu 20.04 LTS
Clash
mkdir -p clash && cd clash
wget 'http://nas.sunie.top/files/clash-linux-amd64-v1.9.0.gz'
tar -zxvf clash-linux-amd64-v1.9.0.gz
mv clash-linux-amd64 clash
chmod +x clash
wget -O config.yaml 'https://ipy.ipyipy.buzz/link/U5qRPYTMHb55YWSn?clash=2&log-level=info'
./clash -d .
nodejs
https://segmentfault.com/a/1190000039973959
cd /usr/local/src/
wget https://nodejs.org/dist/v14.16.1/node-v14.16.1-linux-x64.tar.xz # download
tar xf node-v14.16.1-linux-x64.tar.xz # extract
cd node-v14.16.1-linux-x64/ # enter the extracted directory
ln -s /usr/local/src/node-v14.16.1-linux-x64/bin/node /usr/bin/node
ln -s /usr/local/src/node-v14.16.1-linux-x64/bin/npm /usr/bin/npm
node -v
npm -v
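The two `ln -s` commands fail if the links already exist, for example after upgrading to a newer tarball. A hedged sketch of a repeatable version follows; `link_node_bins` is a hypothetical helper, and including `npx` in the list is an assumption (it ships in the same bin directory but is not linked in the steps above).

```shell
# Hypothetical helper: (re)create symlinks for the Node.js executables.
# Usage: link_node_bins SOURCE_BIN_DIR DEST_DIR
link_node_bins() {
    local src="$1" dest="$2" name
    for name in node npm npx; do
        [ -e "$src/$name" ] || continue      # skip tools this release lacks
        ln -sfn "$src/$name" "$dest/$name"   # -f replaces an existing link
    done
}
```

For the layout used here the call would be `link_node_bins /usr/local/src/node-v14.16.1-linux-x64/bin /usr/bin`.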
Typora+picgo+gitee
Download and themes: https://blog.csdn.net/y_universe/article/details/107184300
破解:https://www.bilibili.com/video/BV11a41187zh/
tar -zxvf Typora-linux-x64.tar.gz
cd bin/Typora-linux-x64/
echo "[Desktop Entry]
Name=Typora
Exec=`pwd`/Typora
Type=Application
Icon=`pwd`/resources/assets/icon/icon_512x512.png
" > /usr/share/applications/typora.desktop
cp ~/Downloads/app.asar resources/
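PicGo + Gitee image hosting: https://blog.csdn.net/qq_20549061/article/details/106796119
Note that the Gitee repository name is the one in the URL, which differs from the name displayed on the page. Create a launcher shortcut: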
echo "[Desktop Entry]
Name=picgo
Exec=/home/sy/Downloads/PicGo-2.3.0.AppImage
Type=Application
" > /usr/share/applications/picgo.desktop
PicGo(app) can only be selected when the interface is in Chinese mode
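Both launcher files above are the same few keys with different values. As a small sketch, the hypothetical function `make_desktop_entry` prints such an entry so it can be redirected into /usr/share/applications; it covers only the keys used in these notes (Name, Exec, Type, and optionally Icon).

```shell
# Hypothetical helper: print a minimal .desktop entry.
# Usage: make_desktop_entry NAME EXEC_PATH [ICON_PATH]
make_desktop_entry() {
    local name="$1" exec_path="$2" icon="${3:-}"
    printf '[Desktop Entry]\nName=%s\nExec=%s\nType=Application\n' "$name" "$exec_path"
    # The Icon key is optional and only written when a path is given.
    if [ -n "$icon" ]; then
        printf 'Icon=%s\n' "$icon"
    fi
}
```

For example: `make_desktop_entry picgo /home/sy/Downloads/PicGo-2.3.0.AppImage > /usr/share/applications/picgo.desktop`.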
flameshot
https://zhuanlan.zhihu.com/p/45919661
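Baidu Cloud
Download: https://zhuanlan.zhihu.com/p/77330173
Direct-link download helper: https://greasyfork.org/zh-CN/scripts/418182-百度网盘简易下载助手-直链下载复活版
Aria2: https://www.jianshu.com/p/d05d9226323a
Aria2 is a free, open-source, lightweight multi-protocol command-line utility for downloading files from the Internet. It supports protocols such as HTTP, HTTPS, FTP and BitTorrent, and runs on Windows, Linux and Mac OSX. Download test:
aria2c http://down.qq.com/qqweb/LinuxQQ/linuxqq_2.0.0-b2-1084_x86_64.rpm
Baidu Cloud files then download normally
Xunlei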
https://wwa.lanzoui.com/ij7qxgle7ud
VMware
Download: https://www.vmware.com/products/workstation-pro/workstation-pro-evaluation.html
Installation: https://www.cnblogs.com/garyzhuang/p/9580062.html
Error after launching: https://askubuntu.com/questions/1096619/install-vmware-on-ubuntu-18-10-build-environment-error
Activation & installing Windows 11: https://dbmer.com/win/computer-course/vmware-install-windows11/
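# kernel-devel, libX11, etc. are RPM package names; these are the Ubuntu equivalents
apt install -y perl gcc linux-headers-$(uname -r) libx11-6 libxinerama1 libxcursor1 libxtst6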
./VMware-Workstation-Full-16.2.3-19376536.x86_64.bundle --console \
> --eulas-agreed \
> --required \
> -s vmware-workstation serialNumber MA491-6NL5Q-AZAM0-ZH0N2-AAJ5A
vmware-modconfig --install-status
VScode
Download from the official site: https://code.visualstudio.com/download
Syncing settings: with Settings Sync, set the GitHub token in the following file, then press Shift+Alt+U to sync
GPU driver pitfalls
I. Following the tutorials
1 Problem description
Running tf-gpu and torch-gpu produces the following message:
failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
root@sunie:/home/sunie# nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
root@sunie:/home/sunie# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 460.32.03 Sun Dec 27 19:00:34 UTC 2020
# The loaded kernel module is 460.32.03
root@sunie:/home/sunie# ls /usr/src | grep nvidia
nvidia-460.39
# The driver installed on the system is 460.39
root@sunie:/home/sunie# ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001C8Csv0000103Csd00008478bc03sc00i00
vendor : NVIDIA Corporation
model : GP107M [GeForce GTX 1050 Ti Mobile]
driver : nvidia-driver-450 - distro non-free
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-390 - distro non-free
driver : nvidia-driver-460 - distro non-free recommended
driver : nvidia-driver-418-server - distro non-free
driver : nvidia-driver-460-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
The cause: the version of the loaded NVIDIA kernel module does not match the driver installed on the system
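The manual comparison above (the version in /proc/driver/nvidia/version against the directory name under /usr/src) can be scripted. This is a minimal sketch built on the two outputs shown in these notes; the function names are hypothetical, and the parsing assumes the line formats look like the ones above.

```shell
# Extract the loaded kernel module version from /proc/driver/nvidia/version text on stdin.
loaded_version() {
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Extract the version from an nvidia-<version> directory name (as listed in /usr/src) on stdin.
source_version() {
    sed -n 's/^nvidia-\([0-9][0-9.]*\)$/\1/p'
}

# Report whether the two version strings agree.
compare_versions() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: kernel module $1, installed driver $2"
    fi
}
```

A possible invocation is `compare_versions "$(loaded_version < /proc/driver/nvidia/version)" "$(ls /usr/src | source_version)"`.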
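2 Uninstall the driver completely
Remove both the kernel module and the system driver, then reinstall
sudo apt-get purge 'nvidia*' # quoted so the shell does not expand the pattern against local files
apt autoremove # this is required for a clean removal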
rmmod nvidia_uvm
rmmod nvidia_drm
rmmod nvidia_modeset
rmmod nvidia
sudo apt-get install nvidia-driver-460 nvidia-settings nvidia-prime
3 Checking again, the problem persists:
root@sunie:/home/sunie# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
root@sunie:/Server/files# cat /proc/driver/nvidia/version
cat: /proc/driver/nvidia/version: No such file or directory
root@sunie:/Server/files# ls /usr/src | grep nvidia
nvidia-460.39
root@sunie:/docker# apt-get install dkms
root@sunie:/docker# dkms install -m nvidia -v 460.39
Module nvidia/460.39 already installed on kernel 5.8.0-43-generic/x86_64
root@sunie:/docker# apt install nvidia-driver-460
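Reading package lists... Done
Building dependency tree
Reading state information... Done
nvidia-driver-460 is already the newest version (460.39-0ubuntu0.20.04.1).
0 upgraded, 0 newly installed, 0 to remove and 67 not upgraded.
Some say a reboot fixes it, but this is a server and cannot be rebooted casually!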
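II. The final solution
Reboot, uninstall all drivers completely, and reinstall with autoinstall
Trying to hold the driver so that it is not updated gave the following: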
root@sunie:/home/sunie# sudo apt-mark hold nvidia-460
E: Unable to locate package nvidia-460
E: No packages found
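The hold fails because no package is named nvidia-460; the package installed earlier in these notes is nvidia-driver-460. The sketch below filters a list of package names down to the driver packages so they can be passed to `apt-mark hold`. `driver_packages` is a hypothetical helper, and the name pattern is an assumption based on the `ubuntu-drivers devices` output above.

```shell
# Keep only driver package names such as nvidia-driver-460 or nvidia-driver-460-server.
# Reads one package name per line on stdin.
driver_packages() {
    grep -E '^nvidia-driver-[0-9]+(-server)?$'
}

# Possible use (as root): hold every matching package known to dpkg.
# dpkg-query -W -f='${Package}\n' 'nvidia-driver-*' | driver_packages | xargs -r apt-mark hold
```

For a single known version, `apt-mark hold nvidia-driver-460` is enough.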
III. nvcc -v: no such file or directory
Add the environment variables and source them; the source step is essential
export PATH=/usr/local/cuda-11.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PA
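IV. Unable to correct problems, you have held broken packages
The driver saga was not over. After a reboot one day the GPU disappeared again, so I reinstalled following the steps above, but got stuck at the last step:
vim /etc/modprobe.d/blacklist.conf
# Add the modules to blacklist: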
# for nvidia display device install
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist rivatv
blacklist nvidiafb
update-initramfs -u
reboot
add-apt-repository ppa:graphics-drivers/ppa
apt update
apt install nvidia-driver-460
V. No devices were found
https://www.nvidia.com/download/driverResults.aspx/166883/en-us