Migrating Scheduled Python Data Scripts to Docker: A Practical Guide

Concepts

Docker is implemented in Go and builds on Linux kernel features such as cgroups, namespaces, and union filesystems like AUFS to package and isolate processes; it is operating-system-level virtualization. Because each isolated process is independent of the host and of the other isolated processes, it is also called a container. Docker has three basic concepts: the image, the container, and the repository.

  • More efficient use of system resources
    Because containers avoid the overhead of hardware virtualization and of running a full guest operating system, Docker uses system resources more efficiently. Whether measured by application execution speed, memory footprint, or file storage speed, it outperforms traditional virtual machine technology.

  • Faster startup
    Starting an application service on a traditional virtual machine often takes minutes. A Docker container runs directly on the host kernel and does not boot a full operating system, so it can start in seconds or even milliseconds, saving a great deal of development, testing, and deployment time.

  • A consistent runtime environment
    A common problem during development is environment inconsistency: differences between the development, test, and production environments let some bugs escape detection during development. A Docker image provides a complete runtime environment (everything except the kernel), keeping environments consistent.

  • Continuous delivery and deployment
  • Easier migration
  • Easier maintenance and extension
    Docker's layered storage and image technology make it easy to reuse the parts applications have in common, simplify maintenance and updates, and make extending a base image into a new one straightforward.

Goal

To achieve faster continuous delivery and deployment and easier migration, the data-statistics Python scripts were migrated to Docker as a whole.

Practice

1. Install Docker

2. Prepare the image

a. Prepare the script sources under /python3-project-all, for example a script that pulls Bug information from JIRA:

import requests
from mysql_connect import Connect
from config import Bj3Mysql


def obtain_insert_data():
    """Fetch every issue from the JIRA REST API, paging 1000 at a time."""
    project_url = 'http://bug.xingshulin.com/rest/api/2/project'
    search_url = 'http://bug.xingshulin.com/rest/api/2/search'

    s = requests.Session()  # save the login cookie
    s.auth = ('username', 'password')  

    total = s.post(search_url, json={"startAt": 0, "maxResults": 1}).json()['total']
    insert_list = []

    for startAt in range(0, total, 1000):
        search_params = {
            "startAt": startAt,
            "maxResults": 1000
        }
        print("Begin to get the issues from {} to {}".format(startAt, startAt+1000))

        issues = s.post(search_url, json=search_params).json()['issues']

        for issue in issues:
            fields = issue['fields']
            components_str = list_to_str(fields, 'components')
            version_str = list_to_str(fields, 'versions')
            fix_version_str = list_to_str(fields, 'fixVersions')
            resolution_str = fields['resolution']['name'] if fields['resolution'] else None
            assignee = fields['assignee']['displayName'] if fields['assignee'] else None
            environment = fields['environment'].replace('"', '') if fields['environment'] else None

            tmp_value = (issue['id'], fields['creator']['displayName'], fields['created'],
                         fields['issuetype']['name'], fields['summary'].replace('"', ' '),
                         fields['priority']['name'], components_str, environment,
                         fields['status']['name'], resolution_str, version_str,
                         fix_version_str, fields['updated'], fields['resolutiondate'],
                         fields['project']['name'], assignee)
            insert_list.append(tmp_value)
    print("The script fetched %s issues in total." % len(insert_list))
    return insert_list
def list_to_str(fields, key):
    """Join the 'name' of each element of fields[key] into a space-separated string."""
    if not fields[key]:
        return None
    return " ".join(x['name'] for x in fields[key])
def update_info_to_statistics(insert_list):
    """Upsert issue rows into xsl_statistics.jira_bug_info."""
    insert_update_sql = 'insert into xsl_statistics.jira_bug_info(bug_id, creator, created_time, bug_type, summary, priority, ' \
                        'components, environment, status, resolution, affect_version, fix_version, updated_time, ' \
                        'resolution_time, project, assignee) ' \
                        'values("{0}","{1}","{2}","{3}","{4}","{5}","{6}","{7}","{8}","{9}","{10}","{11}","{12}","{13}","{14}","{15}") ' \
                        'on duplicate key update creator = "{1}", created_time = "{2}", bug_type = "{3}", summary = "{4}", ' \
                        'priority = "{5}", components = "{6}", environment = "{7}", status = "{8}", resolution = "{9}", ' \
                        'affect_version = "{10}", fix_version = "{11}", updated_time = "{12}", ' \
                        'resolution_time = "{13}", project = "{14}", assignee = "{15}"'

    conn = Connect(Bj3Mysql)
    try:
        for i in insert_list:
            update_sql = insert_update_sql.format(*i)
            #print(update_sql)
            conn.cursor.execute(update_sql)
        conn.connection.commit()
    except Exception as e:  # Exception, not BaseException: don't swallow KeyboardInterrupt/SystemExit
        print("updating failed:", e)
    finally:
        conn.close_connect()
if __name__ == "__main__":  
    insert_list = obtain_insert_data()
    update_info_to_statistics(insert_list)
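The upsert above interpolates values straight into the SQL string, which is why the script has to strip double quotes from fields by hand. A safer pattern is a parameterized query. The sketch below is a minimal, self-contained illustration: it uses SQLite's `INSERT OR REPLACE` and `?` placeholders so it can run anywhere, standing in for the production setup (MySQL's `insert ... on duplicate key update` through pymysql, whose placeholder is `%s`); the trimmed table and columns are for the demo only.

```python
import sqlite3

# Parameterized upsert sketch: the driver escapes values, so quotes in the
# data are harmless and no str.replace('"', '') workaround is needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jira_bug_info (bug_id TEXT PRIMARY KEY, summary TEXT, status TEXT)")

rows = [
    ("JIRA-1", 'crash on "Save"', "Open"),      # quotes are safe with placeholders
    ("JIRA-1", 'crash on "Save"', "Resolved"),  # second row overwrites the first
]
for row in rows:
    conn.execute(
        "INSERT OR REPLACE INTO jira_bug_info (bug_id, summary, status) VALUES (?, ?, ?)",
        row,
    )
conn.commit()

print(conn.execute("SELECT count(*), max(status) FROM jira_bug_info").fetchone())  # (1, 'Resolved')
```

With pymysql the equivalent call is `cursor.execute(sql, row)` with `%s` placeholders, letting the driver handle escaping instead of the script.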

b. Write the Dockerfile for the image:

FROM python:3.5

ADD ./sources.list /etc/apt/sources.list
RUN apt-get update && apt-get install -y python-mysqldb python3-dev libmysqlclient-dev cron

RUN cp /usr/local/lib/python3.5/configparser.py /usr/local/lib/python3.5/ConfigParser.py && \
    pip install python_dateutil requests datetime pytz pymysql mysqlclient MySQL-python numpy openpyxl pandas SensorsAnalyticsSDK

RUN cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && apt-get clean all

ADD / /
RUN useradd suoper && \
    mv /crontab.conf /etc/cron.d/crontab.conf && \
    crontab -u suoper /etc/cron.d/crontab.conf && \
    chown -R suoper:suoper /logs

RUN chmod +x /start.sh

CMD ["cron", "-f"]

c. Configure the scheduled tasks in crontab.conf:

SHELL=/bin/sh  
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# m h dom mon dow user    command
17 *    * * *   root    cd / && run-parts --report /etc/cron.hourly  
25 6    * * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )  
47 6    * * 7   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )  
52 6    1 * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )

1 * * * *  echo "Hello $(date)" >> /logs/python3log.log 2>&1  
#
# worktile
00 17 * * 5  /start.sh  /projects/worktile_extractor.py >> /logs/python3log.log 2>&1  
05 11 * * *  /start.sh  /projects/worktile_extractor.py >> /logs/python3log.log 2>&1  
0 * * * *  /start.sh  /projects/sd_event_failed_reason.py >> /logs/python3log.log 2>&1

# sensors
20 5 * * 1  /start.sh /projects/sd_extract_event.py >> /logs/python3log.log 2>&1  
#10 1 * * *  python /projects/video_view_event_track.py >> /logs/python3log.log 2>&1
# paused
30 8-22 * * * /start.sh /projects/executor_virtual_events.py >> /logs/python3log.log 2>&1  
30 2 * * * /start.sh /projects/sd_track_monitor.py >> /logs/python3log.log 2>&1  
0 3 * * * /start.sh /projects/qa_quan_track_monitor.py >> /logs/python3log.log 2>&1  
0 2 * * 7 /start.sh /projects/sd_update_user_attributes.py >> /logs/python3log.log 2>&1

# Umeng
10 2 * * * /start.sh /projects/umeng_data_extractor.py daily >> /logs/python3log.log 2>&1  
10 8 * * 1 /start.sh /projects/umeng_data_extractor.py weekly >> /logs/python3log.log 2>&1  
10 4 * * * /start.sh /projects/umeng_channel_version.py >> /logs/python3log.log 2>&1

# Jira
10 3 * * 1-5 /start.sh /projects/jira_bug_info_extractor.py >> /logs/python3log.log 2>&1  
# monitoring
0 8 * * * /start.sh /projects/birt_report_monitor.py >> /logs/python3log.log 2>&1

#business
#0 17 * * 1-5 d /projects/az_conference_email.py>> /logs/python3log.log 2>&1

3. Build the image

docker build -t python3-project-all /apps/python3-project/python3-project-all  

4. Run a container

docker run -d  -v /apps/python3-project/python3-project-all/logs:/logs  python3-project-all  

The tasks defined in crontab.conf will then run automatically at their scheduled times.

5. Updating the scripts
After changing the script files, the recommended approach is to rebuild the image with a new version tag:

docker build -t python3-project-all:V1 /apps/python3-project/python3-project-all  

You can also save a container as an image with commit, but this is a black-box operation: from the outside it is unclear what has changed inside the container.

docker commit \
    --author "Eavn Zhang <zpy@98ki.com>" \
    --message "Updated the Jira script" \
    python3 \
    python3-project-all:v1

You can then list the result:

docker image ls python3-project-all

Once the new image is ready, run it:

docker run --name staticsGroupData -d -p 81:80 python3-project-all:v2  

Appendix A: common Docker commands

docker ps                       # list containers (note the CONTAINER ID, "CID" below)
docker exec -it ${CID} bash     # open a shell inside the container
docker exec -it ${CID} python /projects/xxxx    # run a script inside the container by hand
docker stop ${CID}              # stop the container
docker rm ${CID}                # remove the container

Appendix B: Resources

Docker — From Beginner to Practice (Docker 从入门到实践)
Deploying services with Docker, with some caveats (使用docker部署服务和一些注意事项)

张鹏宇
