Computer Science Basics (1)

25-04-22 5

This article collects notes on Computer Science Basics (1), together with practical material from: 1. "Want to Be a Hacker? You Can't Do Without These Command Lines" (Learn Enough Command Line to Be Dangerous): Basics; 2016/08/28 Back to the Basics Essentials of Modern C++ Style; 4. Character encoding basics; and ACM-ICPC 2018 Shenyang Online Contest, Lattice's basics in digital electronics (simulation).

Contents:

Computer Science Basics (1)
1. "Want to Be a Hacker? You Can't Do Without These Command Lines" (Learn Enough Command Line to Be Dangerous): Basics
2016/08/28 Back to the Basics Essentials of Modern C++ Style
4. Character encoding basics
ACM-ICPC 2018 Shenyang Online Contest, Lattice's basics in digital electronics (simulation)

Computer Science Basics (1)

1) DMA and multiprogramming

DMA, or Direct Memory Access, is a technique for speeding up transfers between memory and I/O devices. With DMA, the contents of memory can be transferred without involving the CPU, which leaves the CPU free while I/O is in progress. Without it, the CPU would be busy copying bytes back and forth.

Multiprogramming is the rapid switching of the CPU between several processes. Its main purpose is to keep the CPU busy while some process is waiting for I/O to complete.

From these two points we can deduce that if a computer has no DMA, multiprogramming is of little use, because the CPU is still busy during I/O and has no idle time to give to another process.

2) The idea of computer family

A computer family is a series of computers whose programs are compatible and which differ only in performance and price. For example, Intel's Pentium I, II, III and 4.

3) Hyperthreading and Superscalar CPU

A superscalar CPU architecture is an advanced pipeline design. It enables instruction-level parallelism by offering multiple execution units. A superscalar architecture does not mean that two or more threads can be executed simultaneously. It only makes a single thread execute faster.

Hyperthreading, or multithreading, is the idea of one CPU holding the state of two or more threads at the same time. Thread switching time is thereby reduced to the order of a nanosecond. Hyperthreading does not provide a second CPU either: the threads share one set of execution units, so the main gain is much faster thread switching.

4) Trap instruction

A trap instruction switches the execution mode of the CPU from user mode to kernel mode. This instruction allows a user program to invoke functions in the operating system kernel.

5) Separation of policy and mechanism

Put the mechanism for doing something in the kernel, but not the policy.

Consider a process scheduling algorithm. Mechanism: finding the highest-priority process and running it. Policy: assigning priorities to processes.

6) Microkernel-based operating systems vs. monolithic operating systems

Monolithic: all drivers are in the kernel, so a buggy driver may crash the kernel, for example by referencing an invalid memory address.

Microkernel: the operating system is split into small, well-defined modules, only one of which (the microkernel) runs in kernel mode. This design is aimed at high reliability.

Linux has a monolithic kernel, although it borrows some modular ideas, such as loadable kernel modules.

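1. "Want to Be a Hacker? You Can't Do Without These Command Lines" (Learn Enough Command Line to Be Dangerous): Basics

Basics

As the well-known author Neal Stephenson put it, 'In the beginning (of development; translator's addition) was the command line'. Using a computer through a graphical user interface is extremely simple, but in many situations the most efficient and flexible way to interact with a computer is a command-line interface. In a command-line interface, the user types commands that instruct the computer to carry out the required tasks. These commands can be combined in many ways to produce a wide variety of results. A typical command is shown in Figure 2.

<center>Figure 2: Prototype of a command-line command</center>

This tutorial covers basic Unix commands, where "Unix" refers to a family of operating systems that includes Linux, Android, iOS (iPhone/iPad) and macOS. Unix systems serve most of the software on the World Wide Web, run on most mobile and tablet devices, and also power a large number of the world's desktop computers. Because Unix sits at the core of modern computing, this tutorial covers the Unix way of developing software. The one exception to Unix's dominance is Microsoft Windows, which is not part of the Unix family, but even developers who mostly use native Windows development tools still benefit from learning the Unix command line. In addition, some users need to issue commands on Unix servers (for example, through the "secure shell" command ssh), and at that point familiarity with Unix commands becomes very important. Finally, Windows users are strongly encouraged to run a free Linux virtual machine (as described in "Running a virtual machine" below) to follow this tutorial. Another good option is a cloud IDE such as [Cloud9](http://c9.io/); if you take this route, see the [Development environment](https://www.railstutorial.org/book/beginning#sec-development_environment) section of the [Ruby on Rails Tutorial book](http://railstutorial.org/book).

Note: the important commands in this section are summarized in Table 2.

Box 2. Running a virtual machine:

To complete this tutorial, Windows users should install a couple of free programs to run a virtual machine (a simulated computer), which lets Windows host a version of the Linux operating system. The steps are as follows:
1. Install the correct version of VirtualBox for your system (free).
2. Download the learning virtual machine (a large file).
3. Once the download has finished, double-click the resulting "OVA" file and follow the instructions to install the virtual machine (VM).
4. Double-click the VM itself and log in with the default user password 'foobar!'.

(If you complete these steps successfully, you are off to a good start in technical sophistication; this is explored further in Box 5 of Section 1.3.) The result is a Linux desktop environment preconfigured for this tutorial (including a command-line terminal program), as shown in Figure 3. In the long run, I recommend getting a Mac as soon as you can. You may need to save up for a while, because Macs usually cost more than Windows machines, but in most cases the higher price also buys higher productivity. (You may find that you prefer Linux and stick with it, but a Mac is easier to use thanks to its better user interface; besides, you can always run Linux in a virtual machine, even on a Mac.)

<center>Figure 3: A Linux virtual machine running inside the host system</center>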

2016/08/28 Back to the Basics Essentials of Modern C++ Style

One recommended English-language video every day

http://v.qq.com/page/y/j/i/y0...
https://www.youtube.com/watch...

Today's highlights

4. Character encoding basics

Text in a computer or on the Web is composed of characters. Characters represent letters of the alphabet, punctuation, or other symbols. Characters are grouped into a character set. This is then called a coded character set when each character is assigned a particular number, called a code point. These code points will be represented in the computer by one or more bytes. A character encoding is a key to unlock (i.e., crack) the code. It is a set of mappings between the bytes representing numbers in the computer and characters in the coded character set. Without the key, the data looks like garbage.

To start with, you have to understand a little bit about computers. Information on a computer is stored and transmitted in what are called bits. Characters in a character set are stored as one or more bytes in a computer. Certain bits or combinations of bits equate to certain characters. There are many different character encodings. If the wrong encoding is applied to the bytes in memory, the result will be unintelligible text. It is therefore important, if people are to read your content, that you correctly label the character encoding used.

At first developers thought that the values that fit in a single byte (at most 256) must be enough. Computers were for English-speaking people only, and few special characters were allowed. There were good reasons for this: memory was expensive, and a fixed size for characters made programming easier. The first widely used encoding of this kind was ASCII, which defines only 128 characters (7 bits); 8-bit character sets later used the remaining 128 values.

Time went by and people discovered that they needed letters that were not in the original set. So what they did was replace a few special characters with the ones they needed. Different organizations assembled different sets of characters and created encodings for them. But different people needed different characters, and soon we had hundreds of sets to select from. This mess is what you see if you select View -> (Character) encodings -> More (encodings) in your browser. In addition, it is usually impossible to combine different encodings on the same Web page or in a database, so it is usually very difficult to support multilingual pages using 'legacy' approaches to encoding.

For example, in the coded character set called ISO 8859-1 (also known as Latin1) the decimal code point value for the letter é is 233. In ISO 8859-5, the same code point represents the Cyrillic character щ. These character sets contain fewer than 256 characters and map code points to byte values directly. So a code point with the value 233 is represented by a single byte with a value of 233. Note however that that byte may represent either é or щ, depending on the context.

There are other ways of handling characters from a range of scripts. For example, with the Unicode character set, you can represent both characters in the same set. In fact, Unicode contains, in a single set, most characters you are likely to ever need. While the value of 233 still represents the é, the Cyrillic character щ now has a code point value of 1097. This is too large a number to be represented by a single byte. If you use the character encoding for Unicode text called UTF-8, щ will be represented by two bytes, but the code point value is not simply derived from the value of the two bytes spliced together; some more complicated decoding is needed. Other Unicode characters map to one, three or four bytes in the UTF-8 encoding.

UTF-8 is the most widely used way to represent Unicode text in web pages. But UTF-8 is only one of the possible ways of encoding Unicode characters. In other words, a single code point in the Unicode character set can actually be mapped to different byte sequences, depending on which encoding was used for the document. Unicode code points can be mapped to bytes using any one of the encodings called UTF-8, UTF-16 or UTF-32. The Devanagari character क, with code point 2325 (which is 915 in hexadecimal notation), will be represented by two bytes when using the UTF-16 encoding (09 15), three bytes with UTF-8 (E0 A4 95), or four bytes with UTF-32 (00 00 09 15).

There can be further complications beyond those described above (such as byte order and escape sequences), but the detail described there shows why it is important that the application you are working with knows which character encoding is appropriate for your data, and knows how to handle that encoding.

The Unicode Consortium provides a large, single character set that aims to include all the characters needed for any writing system in the world. It is now fundamental to the architecture of the Web and operating systems, and is supported by all major web browsers and applications.

A font is a collection of glyph definitions, i.e., definitions of the shapes used to display characters.

Once your application has worked out what characters it is dealing with, it will then look in the font for glyphs in order to display or print those characters. (Of course, if the encoding information was wrong, it will be looking up glyphs for the wrong characters.)

A given font will usually cover a single character set, or in the case of a large character set like Unicode, just a subset of all the characters in the set. If your font doesn't have a glyph for a particular character, some applications will look for the missing character in other fonts on your system (which will mean that the glyph will look different from the surrounding text, like a ransom note). Otherwise you will typically see a square box, a question mark or some other character instead.

It is important to clearly distinguish between the concepts of a character set versus a character encoding.

...The "charset" parameter identifies a character encoding, which is a method of converting a sequence of bytes into a sequence of characters. This conversion fits naturally with the scheme of Web activity: servers send HTML documents to user agents as a stream of bytes; user agents interpret them as a sequence of characters. The conversion method can range from simple one-to-one correspondence to complex switching schemes or algorithms...

Reference: Section 5.2 Character encodings of the HTML Document Representation W3C Recommendations

Character encoding tells the browser and validator what set of characters to use when converting the bits to characters.

Choosing an encoding

Everyone developing content, whether content authors or programmers, must decide what character encoding to use. UTF-8 is a popular recommendation these days, but there may still be things you should consider before using it. Content developers and webmasters may also need to ensure that the server delivers content with the correct character encoding declarations, since server settings can override in-document declarations.

Reference: http://www.w3.org/International/articles/definitions-characters/ http://www.w3.org/International/questions/qa-what-is-encoding

ACM-ICPC 2018 Shenyang Online Contest, Lattice's basics in digital electronics (simulation)

Lattice's basics in digital electronics

Pass rate: 44.08%
Time limit: 1000 ms
Memory limit: 131072 K

LATTICE is learning Digital Electronic Technology. He is talented, so he understood all those pieces of knowledge in $10^{-9}$ second. In the next $10^{-9}$ second, he built a data decoding device that decodes data encoded with his special binary coding rule to meaningful words.

His coding rule is called "prefix code", a type of code system (typically a variable-length code) distinguished by its possession of the "prefix property", which requires that there is no whole code word in the system that is a prefix (initial segment) of any other code word in the system. Note that his code is composed of only 0 and 1.

LATTICE's device only receives data that perfectly matches LATTICE's rules; in other words, people who send messages to LATTICE will always obey his coding rule. However, in the process of receiving data, there are errors that cannot be avoided, so LATTICE uses a parity check to detect erroneous bytes: after every 8 bits of data there is 1 bit called the parity bit, which should be '0' if there is an odd number of '1's in the previous 8 bits and should be '1' if there is an even number of '1's. If the parity bit does not match, then the whole 9 bits (including the parity bit) should be considered invalid data and ignored. Data without a parity bit is also considered invalid data. Parity bits will be deleted after the parity check.

For example, consider the given data "101010101010101010101010". It should be divided into 3 parts: "101010101", "010101010" and "101010". For the first part, there are 4 '1's in the first 8 bits and the parity bit is '1', so this part passed the check. For the second part, there are 4 '1's and the parity bit is '0', so this part failed the check. The third part has fewer than 9 bits, so it contains no parity bit and also failed the check. The data after the parity check is "10101010", which is the first 8 bits of the first part.

Data that passed the parity check will go into a process that decodes LATTICE's code. The process is described in the following example: consider a situation in which "010" represents 'A' and "1011" represents 'B'. If the data after the parity check is "01010110101011010010", it can be divided into "010"+"1011"+"010"+"1011"+"010"+"010", which means "ABABAA". LATTICE's device is so exquisite that it can decode all visible characters in the ASCII table.

LATTICE is famous for his talk show. Some reporters have sneaked into his mansion and stolen the data that LATTICE was to decode, in hexadecimal, together with the coding rule, which consists of $N$ pairs of corresponding relations from a bit string $S_i$ to an ASCII code $C_i$, and the message length $M$. They want to peek into his privacy, so they come to you to write a program that decodes the messages that LATTICE receives.

Input

The first line contains an integer $T$ $(T < 35)$, the number of test cases.

Every test case starts with one line containing two integers: $M$ $(0 < M \le 100000)$, the number of original characters, and $N$ $(1 \le N \le 256)$. Then $N$ lines follow; each line contains an integer $C_i$ and a string $S_i$ $(0 < |S_i| \le 10)$, meaning that $S_i$ represents $C_i$, the ASCII code of a visible character. $S_i$ contains only '0' or '1', and there are no two indices $i$ and $j$ such that $S_i$ is a prefix of $S_j$.

Then one line contains the data that is going to be received, in hexadecimal $(0 < |data| < 200000)$.

Output

For each test case, output the decoded message on a new line. The length of the decoded message should be the same as the number of original characters, which means you can stop decoding after having output $M$ characters. The input guarantees that it contains no fewer than $M$ valid characters and that all given ASCII codes represent visible characters.

Hint

Lattice's encoding rule for test case 2:

ASCII code	character	lattice's code
49	1	0001
50	2	01001
51	3	011

the device takes this input in hex

14DB24722698

input in binary

0001 0100 1101 1011 0010 0100 0111 0010 0010 0110 1001 1000

formatted into 6 lines, each line contains 8 data bits and one parity bit

00010100 1

10110110 0

10010001 1

10010001 0

01101001 1

000

The parity check of the third line and the last line failed, so ignore those two lines. Parity bits should also be ignored.

00010100

10110110

10010001

01101001

arrange those bits by the rules informed

0001 01001 011 011 01001 0001 011 01001

output the result

12332132

Sample input

2
15 9
32 0100
33 11
100 1011
101 0110
104 1010
108 00
111 100
114 0111
119 0101
A6Fd021171c562Fde1
8 3
49 0001
50 01001
51 011
14DB24722698

Sample output

hello world!!!!
12332132

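Source

ACM-ICPC 2018 Shenyang Regional Online Preliminary Contest

A large simulation problem: base conversion followed by decoding. The original submission ran in about 900 ms and was accepted only after a few resubmissions.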

#include <bits/stdc++.h>
using namespace std;

// Expand one hexadecimal digit (upper or lower case) into 4 bits.
string hexToBits(char c) {
    int v = isdigit((unsigned char)c) ? c - '0' : toupper((unsigned char)c) - 'A' + 10;
    string bits(4, '0');
    for (int k = 0; k < 4; k++)
        if ((v >> (3 - k)) & 1) bits[k] = '1';
    return bits;
}

// Count the '1' characters in a bit string.
int countOnes(const string& s) {
    return (int)count(s.begin(), s.end(), '1');
}

int main() {
    int t;
    scanf("%d", &t);
    while (t--) {
        int m, n;  // m: characters to output, n: number of code words
        scanf("%d%d", &m, &n);
        map<string, int> code;  // code word -> ASCII value
        for (int i = 0; i < n; i++) {
            int c;
            char w[16];  // |S_i| <= 10
            scanf("%d %15s", &c, w);
            code[w] = c;
        }
        string hex, bits, data;
        cin >> hex;
        for (char c : hex) bits += hexToBits(c);
        // Keep the 8 data bits of each complete 9-bit group whose parity bit is valid:
        // the parity bit is '0' for an odd number of ones and '1' for an even number.
        for (size_t i = 0; i + 9 <= bits.size(); i += 9) {
            string chunk = bits.substr(i, 8);
            if ((countOnes(chunk) & 1) != bits[i + 8] - '0') data += chunk;
        }
        // Prefix decoding: grow the current word bit by bit until it matches a code word.
        // find() is used instead of operator[] so that lookups never insert keys.
        string word;
        for (size_t i = 0; i < data.size() && m > 0; i++) {
            word += data[i];
            auto it = code.find(word);
            if (it != code.end()) {
                putchar(it->second);
                m--;
                word.clear();
            }
        }
        printf("\n");
    }
    return 0;
}
